Advances in Intelligent Systems and Computing 1261
Aboul Ella Hassanien · Adam Slowik · Václav Snášel · Hisham El-Deeb · Fahmy M. Tolba Editors
Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2020
Advances in Intelligent Systems and Computing Volume 1261
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Nikhil R. Pal, Indian Statistical Institute, Kolkata, India Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba Emilio S. Corchado, University of Salamanca, Salamanca, Spain Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil Ngoc Thanh Nguyen , Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. ** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **
More information about this series at http://www.springer.com/series/11156
Editors

Aboul Ella Hassanien
Faculty of Computers and Artificial Intelligence, Information Technology Department, and Chair of the Scientific Research Group in Egypt, Cairo University, Cairo, Egypt

Adam Slowik
Department of Electronics and Computer Science, Koszalin University of Technology, Koszalin, Poland

Václav Snášel
Faculty of Electrical Engineering and Computer Science, VŠB-Technical University of Ostrava, Ostrava-Poruba, Moravskoslezsky, Czech Republic

Hisham El-Deeb
Rector of the Electronic Research Institute, Cairo, Egypt

Fahmy M. Tolba
Faculty of Computers and Information, Ain Shams University, Cairo, Egypt
ISSN 2194-5357 ISSN 2194-5365 (electronic)
Advances in Intelligent Systems and Computing
ISBN 978-3-030-58668-3 ISBN 978-3-030-58669-0 (eBook)
https://doi.org/10.1007/978-3-030-58669-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This volume constitutes the refereed proceedings of the 5th International Conference on Advanced Intelligent Systems and Informatics (AISI 2020), which took place in Cairo, Egypt, during October 19–21, 2020. AISI is an international interdisciplinary conference covering research and development in the field of informatics and intelligent systems. In response to the call for papers for AISI 2020, 113 papers were submitted for the main conference and 57 for three special sessions, for a total of 170 papers submitted for presentation and inclusion in the proceedings of the conference. After a careful blind refereeing process, 79 papers were selected for inclusion in the conference proceedings. The papers were evaluated and ranked on the basis of their significance, novelty, and technical quality by at least two reviewers per paper. The papers cover current research in intelligent systems, deep learning technology, document and sentiment analysis, blockchain and cyber-physical systems, health informatics and AI against COVID-19, data mining, power and control systems, business intelligence, social media and digital transformation, robotics, control design, and smart systems.

We express our sincere thanks to the plenary speakers, workshop chairs, and International Program Committee members for helping us to formulate a rich technical program. We would like to extend our sincere appreciation for the outstanding work contributed over many months by the Organizing Committee: the local organization chair and the publicity chair. We also wish to express our appreciation to the SRGE members for their assistance. We would like to emphasize that the success of AISI 2020 would not have been possible without the support of many committed volunteers who generously contributed their time, expertise, and resources toward making the conference an unqualified success. Finally, our thanks go to the Springer team for their support in all stages of the production of the proceedings. We hope that you will enjoy the conference program.
Organization
Honorary Chair
Fahmy Tolba, Egypt
General Chairs
Vaclav Snasel, Rector of the Technical University of Ostrava, Czech Republic
Hesham El-deeb, Rector of the Electronic Research Institute, Egypt
Co-chairs
Aboul Ella Hassanien, Scientific Research Group in Egypt (SRGE)
Allam Hamdan, Ahlia University, Manama, Bahrain
International Advisory Board
Norimichi Tsumura, Japan
Kuo-Chi Chang, China
Tarek Sobh, USA
Mahmoud Abdel-Aty, Egypt
Reda Salah, Egypt
Nagwa Badr, Egypt
Vaclav Snasel, Czech Republic
Janusz Kacprzyk, Poland
Siddhartha Bhattacharyya, India
Ahmed Hassan, Egypt
Hesham El-deeb, Egypt
Khaled Shaalan, Egypt
Ayman Bahaa, Egypt
Ayman El Desoky, Egypt
Nouby Mahdy Ghazaly, Egypt
Hany Harb, Egypt
Alaa El-Sadek, Egypt
Arabi Keshk, Egypt
Magdy Zakariya, Egypt
Saleh Mesbah, Egypt
Fathi El-Sayed Abd El-Samie, Egypt
Tarek Ghareb, Egypt
Mohamed Belal, Egypt
Program Chair
Adam Slowik, Koszalin University of Technology, Poland
Track Chairs
Intelligent Natural Language Processing Track: Khaled Shaalan, Egypt
Informatics Track: Diego Alberto Oliva, Mexico
Intelligent Systems Track: Ashraf Darwish, Egypt
Robotics, Automation and Control: Ahmad Taher Azar
Internet of Things and Big Data Analytics Track: Sherine Abd El-Kader
Publicity Chairs
Khaled Ahmed, USA
Mohamed Abd Elfattah, Egypt
Assem Ahmed Alsawy, Egypt
Technical Program Committee
Milan Stehlik, Johannes Kepler University Linz, Austria
Fatmah Omara, Egypt
Wael Badawy, Egypt
Passent ElKafrawy, Egypt
Walaaa Medhat, Egypt
Aarti Singh, India
Tahani Alsubait, UK
Ahmed Fouad, Egypt
Ali R. Kashani, USA
Arun Kumar Sangaiah, India
Rizwan Patan, India
Gaurav Dhiman, India
Nand Kishor Meena, UK
Evgenia Theodotou, Greece
Pavel Kromer, Czech Republic
Irma Aslanishvili, Czech Republic
Jan Platos, Czech Republic
Ivan Zelinka, Czech Republic
Sebastian Tiscordio, Czech Republic
Natalia Spyropoulou, Hellenic Open University, Greece
Dimitris Sedaris, Hellenic Open University, Greece
Vassiliki Pliogou, Metropolitan College, Greece
Pilios Stavrou, Metropolitan College, Greece
Eleni Seralidou, University of Piraeus, Greece
Stelios Kavalaris, Metropolitan College, Greece
Litsa Charitaki, University of Athens, Greece
Elena Amaricai, University of Timișoara, Greece
Qing Tan, Athabasca University, Greece
Pascal Roubides, Broward College, Greece
Manal Abdullah, King Abdulaziz University, KSA
Mandia Athanasopoulou, Metropolitan College, Greece
Vicky Goltsi, Metropolitan College, Greece
Mohammad Reza Noruzi, Tarbiat Modarres University, Iran
Abdelhameed Ibrahim, Egypt
Ahmed Elhayek, Germany
Amira S. Ashour, KSA
Boyang Albert Li
Edgard Marx, Germany
Fatma Helmy, Egypt
Ivan Ermilov, Germany
Mahmoud Awadallah, USA
Minu Kesheri, India
Mona Solyman, Egypt
Muhammad Saleem, Germany
Nabiha Azizi, Algeria
Namshik Han, UK
Noreen Kausar, KSA
Noura Semary, Egypt
Rania Hodhod, Georgia
Reham Ahmed, Egypt
Sara Abdelkader, Canada
Sayan Chakraborty, India
Shoji Tominaga, Japan
Siva Ganesh Malla, India
Soumya Banerjee, India
Sourav Samanta, India
Suvojit Acharjee, India
Swarna Kanchan, India
Takahiko Horiuchi, Japan
Tommaso Soru, Germany
Wahiba Ben Abdessalem, KSA
Zeineb Chelly, Tunisia
Local Arrangement Chairs
Ashraf Darwish, Egypt
Mohamed Abd Elfattah, Egypt
Heba Aboul Ella, Egypt
Contents
Intelligence and Decision Making System

A Context-Based Video Compression: A Quantum-Inspired Vector Quantization Approach . . . 3
Osama F. Hassan, Saad M. Darwish, and Hassan A. Khalil

An Enhanced Database Recovery Model Based on Game Theory for Mobile Applications . . . 16
Yasser F. Mokhtar, Saad M. Darwish, and Magda M. Madbouly

Location Estimation of RF Emitting Source Using Supervised Machine Learning Technique . . . 26
Kamel H. Rahouma and Aya S. A. Mostafa

An Effective Offloading Model Based on Genetic Markov Process for Cloud Mobile Applications . . . 38
Mohamed S. Zalat, Saad M. Darwish, and Magda M. Madbouly

Toward an Efficient CRWSN Node Based on Stochastic Threshold Spectrum Sensing . . . 51
Reham Kamel Abd El-Aziz, Ahmad A. Aziz El-Banna, HebatAllah Adly, and Adly S. Tag Eldien

Video Captioning Using Attention Based Visual Fusion with Bi-temporal Context and Bi-modal Semantic Feature Learning . . . 65
Noorhan K. Fawzy, Mohammed A. Marey, and Mostafa M. Aref

Matchmoving Previsualization Based on Artificial Marker Detection . . . 79
Houssam Halmaoui and Abdelkrim Haqiq

Research Method of Blind Path Recognition Based on DCGAN . . . 90
Ling Luo, Ping-Jun Zhang, Peng-Jun Hu, Liu Yang, and Kuo-Chi Chang
The Impact of the Behavioral Factors on Investment Decision-Making: A Systemic Review on Financial Institutions . . . . . . . 100 Syed Faisal Shah, Muhammad Alshurideh, Barween Al Kurdi, and Said A. Salloum Deep Learning Technology and Applications A Deep Learning Architecture with Word Embeddings to Classify Sentiment in Twitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 Eman Hamdi, Sherine Rady, and Mostafa Aref Deep Neural Networks for Landmines Images Classification . . . . . . . . . 126 Refaat M. Fikry and H. Kasban Deep Convolutional Neural Networks for ECG Heartbeat Classification Using Two-Stage Hierarchical Method . . . . . . . . . . . . . . . 137 Abdelrahman M. Shaker, Manal Tantawi, Howida A. Shedeed, and Mohamed F. Tolba Study of Region Convolutional Neural Network Deep Learning for Fire Accident Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 Ntawiheba Jean d’Amour, Kuo-Chi Chang, Pei-Qiang Li, Yu-Wen Zhou, Hsiao-Chuan Wang, Yuh-Chung Lin, Kai-Chun Chu, and Tsui-Lien Hsu Document and Sentiment Analysis Norm-Referenced Achievement Grading: Methods and Comparison . . . 159 Thepparit Banditwattanawong and Masawee Masdisornchote Review of Several Address Assignment Mechanisms for Distributed Smart Meter Deployment in Smart Grid . . . . . . . . . . . . . . . . . . . . . . . . 171 Tien-Wen Sung, Xiaohui Hu, and Haiyan Ou An Approach for Sentiment Analysis and Personality Prediction Using Myers Briggs Type Indicator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 Alàa Genina, Mariam Gawich, and Abdelfatah Hegazy Article Reading Sequencing for English Terminology Learning in Professional Courses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 Tien-Wen Sung, Qingjun Fang, You-Te Lu, and Xiaohui Hu Egyptian Student Sentiment Analysis Using Word2vec During the Coronavirus (Covid-19) Pandemic . . . . . . . . . . . . . . . . . . . . . . . . . . 195 Lamiaa Mostafa Various Pre-processing Strategies for Domain-Based Sentiment Analysis of Unbalanced Large-Scale Reviews . . . . . . . . . . . . . . . . . . . . . 204 Sumaia Mohammed AL-Ghuribi, Shahrul Azman Noah, and Sabrina Tiun
Arabic Offline Character Recognition Model Using Non-dominated Rank Sorting Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215 Saad M. Darwish, Osama F. Hassan, and Khaled O. Elzoghaly Sentiment Analysis of Hotel Reviews Using Machine Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227 Sarah Anis, Sally Saad, and Mostafa Aref Blockchain and Cyber Physical System Transparent Blockchain-Based Voting System: Guide to Massive Deployments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237 Aicha Fatrah, Said El Kafhali, Khaled Salah, and Abdelkrim Haqiq Enhanced Technique for Detecting Active and Passive Black-Hole Attacks in MANET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247 Marwa M. Eid and Noha A. Hikal A Secure Signature Scheme for IoT Blockchain Framework Based on Multimodal Biometrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261 Yasmin A. Lotfy and Saad M. Darwish An Evolutionary Biometric Authentication Model for Finger Vein Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271 Saad M. Darwish and Ahmed A. Ismail A Deep Blockchain-Based Trusted Routing Scheme for Wireless Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282 Ibrahim A. Abd El-Moghith and Saad M. Darwish A Survey of Using Blockchain Aspects in Information Centric Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292 Abdelrahman Abdellah, Sherif M. Saif, Hesham E. ElDeeb, Emad Abd-Elrahman, and Mohamed Taher Health Informatics and AI Against COVID-19 Real-Time Trajectory Control of Potential Drug Carrier Using Pantograph “Experimental Study” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305 Ramy Farag, Ibrahim Badawy, Fady Magdy, Zakaria Mahmoud, and Mohamed Sallam Early Detection of COVID-19 Using a Non-contact Forehead Thermometer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 Ahmed G. Ebeid, Enas Selem, and Sherine M. Abd El-kader The Mass Size Effect on the Breast Cancer Detection Using 2-Levels of Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324 Ghada Hamed, Mohammed Abd El-Rahman Marey, Safaa El-Sayed Amin, and Mohamed Fahmy Tolba
An Integrated IoT System to Control the Spread of COVID-19 in Egypt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336 Aya Hossam, Ahmed Magdy, Ahmed Fawzy, and Shriene M. Abd El-Kader Healthcare Informatics Challenges: A Medical Diagnosis Using Multi Agent Coordination-Based Model for Managing the Conflicts in Decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347 Sally Elghamrawy Protection of Patients’ Data Privacy by Tamper Detection and Localization in Watermarked Medical Images . . . . . . . . . . . . . . . . 358 Alaa H. ElSaadawy, Ahmed S. ELSayed, M. N. Al-Berry, and Mohamed Roushdy Breast Cancer Classification from Histopathological Images with Separable Convolutional Neural Network and Parametric Rectified Linear Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370 Heba Gaber, Hatem Mohamed, and Mina Ibrahim Big Data Analytics and Service Quality Big Data Technology in Intelligent Distribution Network: Demand and Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385 Zhi-Peng Ye and Kuo-Chi Chang Memory Management Approaches in Apache Spark: A Review . . . . . . 394 Maha Dessokey, Sherif M. Saif, Sameh Salem, Elsayed Saad, and Hesham Eldeeb The Influence of Service Quality on Customer Retention: A Systematic Review in the Higher Education . . . . . . . . . . . . . . . . . . . . 404 Aisha Alshamsi, Muhammad Alshurideh, Barween Al Kurdi, and Said A. Salloum The Impact of Ethical Leadership on Employees Performance: A Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417 Hind AlShehhi, Muhammad Alshurideh, Barween Al Kurdi, and Said A. Salloum Data Mining, Decision Making, and Intelligent Systems Evaluating Non-redundant Rules of Various Sequential Rule Mining Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429 Nesma Youssef, Hatem Abdulkader, and Amira Abdelwahab Impact of Fuzzy Stability Model on Ad Hoc Reactive Routing Protocols to Improve Routing Decisions . . . . . . . . . . . . . . . . . . . . . . . . . 441 Hamdy A. M. Sayedahmed, Imane M. A. Fahmy, and Hesham A. Hefny
A Multi-channel Speech Enhancement Method Based on Subband Affine Projection Algorithm in Combination with Proposed Circular Nested Microphone Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455 Ali Dehghan Firoozabadi, Pablo Irarrazaval, Pablo Adasme, Hugo Durney, Miguel Sanhueza Olave, David Zabala-Blanco, and Cesar Azurdia-Meza Game Theoretic Approach to Optimize Exploration Parameter in ACO MANET Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465 Marwan A. Hefnawy and Saad M. Darwish Performance Analysis of Spectrum Sensing Thresholding Methods for Cognitive Radio Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475 Rhana M. Elshishtawy, Adly S. Tag Eldien, Mostafa M. Fouda, and Ahmed H. Eldeib The Impacts of Communication Ethics on Workplace Decision Making and Productivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488 Alyaa Alyammahi, Muhammad Alshurideh, Barween Al Kurdi, and Said A. Salloum A Comparative Study of Various Deep Learning Architectures for 8-state Protein Secondary Structures Prediction . . . . . . . . . . . . . . . . 501 Moheb R. Girgis, Enas Elgeldawi, and Rofida Mohammed Gamal Power and Control Systems Energy Efficient Spectrum Aware Distributed Clustering in Cognitive Radio Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517 Randa Bakr, Ahmad A. Aziz El-Banna, Sami A. A. El-Shaikh, and Adly S. Tag ELdien The Autonomy Evolution in Unmanned Aerial Vehicle: Theory, Challenges and Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527 Mohamed M. Eltabey, Ahmed A. Mawgoud, and Amr Abu-Talleb A Non-destructive Testing Detection Model for the Railway Track Cracks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537 Kamel H. Rahouma, Samaa A. Mohammad, and Nagwa S. Abdel Hameed Study of Advanced Power Load Management Based on the Low-Cost Internet of Things and Synchronous Photovoltaic Systems . . . . . . . . . . 548 Elias Turatsinze, Kuo-Chi Chang, Pei-Qiang Li, Cheng-Kuo Chang, Kai-Chun Chu, Yu-Wen Zhou, and Abdalaziz Altayeb Ibrahim Omer Power Grid Critical State Search Based on Improved Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558 Jie Luo, Hui-Qiong Deng, Qin-Bin Li, Rong-Jin Zheng, Pei-Qiang Li, and Kuo-Chi Chang
Study of PSO Optimized BP Neural Network and Smith Predictor for MOCVD Temperature Control in 7 nm 5G Chip Process . . . . . . . . 568 Kuo-Chi Chang, Yu-Wen Zhou, Hsiao-Chuan Wang, Yuh-Chung Lin, Kai-Chun Chu, Tsui-Lien Hsu, and Jeng-Shyang Pan Study of the Intelligent Algorithm of Hilbert-Huang Transform in Advanced Power System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577 Cheng Zhang, Jia-Jing Liu, Kuo-Chi Chang, Hsiao-Chuan Wang, Yuh-Chung Lin, Kai-Chun Chu, and Tsui-Lien Hsu Study of Reduction of Inrush Current on a DC Series Motor with a Low-Cost Soft Start System for Advanced Process Tools . . . . . . 586 Governor David Kwabena Amesimenu, Kuo-Chi Chang, Tien-Wen Sung, Hsiao-Chuan Wang, Gilbert Shyirambere, Kai-Chun Chu, and Tsui-Lien Hsu Co-design in Bird Scaring Drone Systems: Potentials and Challenges in Agriculture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598 Moammar Dayoub, Rhoda J. Birech, Mohammad-Hashem Haghbayan, Simon Angombe, and Erkki Sutinen Proposed Localization Scenario for Autonomous Vehicles in GPS Denied Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608 Hanan H. Hussein, Mohamed Hanafy Radwan, and Sherine M. Abd El-Kader Business Intelligence E-cash Payment Scheme in Near Field Communication Based on Boosted Trapdoor Hash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621 Ahmed M. Hassan and Saad M. Darwish Internal Factors Affect Knowledge Management and Firm Performance: A Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632 Aaesha Ahmed Al Mehrez, Muhammad Alshurideh, Barween Al Kurdi, and Said A. Salloum Enhancing Our Understanding of the Relationship Between Leadership, Team Characteristics, Emotional Intelligence and Their Effect on Team Performance: A Critical Review . . . . . . . . . . 644 Fatima Saeed Al-Dhuhouri, Muhammad Alshurideh, Barween Al Kurdi, and Said A. Salloum Factors Affect Customer Retention: A Systematic Review . . . . . . . . . . . 656 Salama S. Alkitbi, Muhammad Alshurideh, Barween Al Kurdi, and Said A. Salloum
The Effect of Work Environment Happiness on Employee Leadership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668 Khadija Alameeri, Muhammad Alshurideh, Barween Al Kurdi, and Said A. Salloum Performance Appraisal on Employees’ Motivation: A Comprehensive Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681 Maryam Alsuwaidi, Muhammad Alshurideh, Barween Al Kurdi, and Said A. Salloum Social media and Digital transformation Social Media Impact on Business: A Systematic Review . . . . . . . . . . . . 697 Fatima Ahmed Almazrouei, Muhammad Alshurideh, Barween Al Kurdi, and Said A. Salloum Digital Transformation and Organizational Operational Decision Making: A Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708 Ala’a Ahmed, Muhammad Alshurideh, Barween Al Kurdi, and Said A. Salloum The Impact of Innovation Management in SMEs Performance: A Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 720 Fatema Al Suwaidi, Muhammad Alshurideh, Barween Al Kurdi, and Said A. Salloum The Effect of Digital Transformation on Product Innovation: A Critical Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731 Jasim Almaazmi, Muhammad Alshurideh, Barween Al Kurdi, and Said A. Salloum Women Empowerment in UAE: A Systematic Review . . . . . . . . . . . . . . 742 Asma Omran Al Khayyal, Muhammad Alshurideh, Barween Al Kurdi, and Said A. Salloum Robotic, Control Design and Smart Systems Lyapunov-Based Control of a Teleoperation System in Presence of Time Delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 759 Mohamed Sallam, Ihab Saif, Zakaria Saeed, and Mohamed Fanni Development and Control of a Micro-robotic System for Medical Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769 Fady Magdy, Ahmed Waheed, Ahmed Moustafa, Ramy Farag, Ibrahim M. Badawy, and Mohamed Sallem Wake-up Receiver for LoRa-Based Wireless Sensor Networks . . . . . . . 779 Amal M. Abdel-Aal, Ahmad A. Aziz El-Banna, and Hala M. Abdel-Kader
Smart Approach for Discovering Gateways in Mobile Ad Hoc Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 793 Kassem M. Mostafa and Saad M. Darwish Computational Intelligence Techniques in Vehicle to Everything Networks: A Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 803 Hamdy A. M. Sayedahmed, Emadeldin Mohamed, and Hesham A. Hefny Simultaneous Sound Source Localization by Proposed Cuboids Nested Microphone Array Based on Subband Generalized Eigenvalue Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 816 Ali Dehghan Firoozabadi, Pablo Irarrazaval, Pablo Adasme, Hugo Durney, Miguel Sanhueza Olave, David Zabala-Blanco, and Cesar Azurdia-Meza A Framework for Analyzing 4G/LTE-A Real Data Using Machine Learning Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 826 Nihal H. Mohammed, Heba Nashaat, Salah M. Abdel-Mageid, and Rawia Y. Rizk Robust Kinematic Control of Unmanned Aerial Vehicles with Non-holonomic Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 839 Ahmad Taher Azar, Fernando E. Serrano, Nashwa Ahmad Kamal, and Anis Koubaa Nonlinear Fractional Order System Synchronization via Combination-Combination Multi-switching . . . . . . . . . . . . . . . . . . . . . . 851 Shikha Mittal, Ahmad Taher Azar, and Nashwa Ahmad Kamal Leader-Follower Control of Unmanned Aerial Vehicles with State Dependent Switching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 862 Ahmad Taher Azar, Fernando E. Serrano, Nashwa Ahmad Kamal, and Anis Koubaa Maximum Power Extraction from a Photovoltaic Panel Connected to a Multi-cell Converter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 873 Arezki Fekik, Ahmad Taher Azar, Nashwa Ahmad Kamal, Fernando E. Serrano, Mohamed Lamine Hamida, Hakim Denoun, and Nacira Yassa Hidden and Coexisting Attractors in a New Two-Dimensional Fractional Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 883 Amina-Aicha Khennaoui, Adel Ouannas, and Giuseppe Grassi Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 891
Intelligence and Decision Making System
A Context-Based Video Compression: A Quantum-Inspired Vector Quantization Approach

Osama F. Hassan (1), Saad M. Darwish (2), and Hassan A. Khalil (3)

(1) Department of Mathematics, Faculty of Science, Damanhour University, Damanhour, Egypt, [email protected]
(2) Department of Information Technology, Institute of Graduate Studies and Research, Alexandria University, Alexandria, Egypt, [email protected]
(3) Department of Mathematics, Faculty of Science, Zagazig University, Zagazig 44519, Egypt, [email protected]
Abstract. This paper proposes a modified video compression model that optimizes the vector quantization codebook by using an adapted Quantum Genetic Algorithm (QGA) that uses the quantum features of superposition and entanglement to build an optimal codebook for vector quantization. A context-based initial codebook is created by using a background subtraction algorithm; then, the QGA is adapted to get the optimal codebook. This optimal feature vector is then utilized as an activation function inside the neural network's hidden layer to remove redundancy. Furthermore, approximation wavelet coefficients are losslessly compressed with Differential Pulse Code Modulation (DPCM), whereas detail coefficients are lossy compressed using Learning Vector Quantization (LVQ) neural networks. Finally, Run Length Encoding is engaged to encode the quantized coefficients to achieve a high compression ratio. As individuals in the QGA are actually the superposition of multiple individuals, it is less likely that good individuals will be lost. Experiments have proven the system's ability to achieve a higher compression ratio with acceptable efficiency measured by PSNR.

Keywords: Video compression · Neural Network · Quantum Genetic Algorithm · Context-based compression
1 Introduction

The immense use of multimedia technology during the past decades has increased the demand for digital information. This enormous demand, with a massive amount of data, made current technology unable to deal with it efficiently. However, removing redundancies in video compression solved this problem [1]. Reducing the bandwidth and storage capacity while preserving the quality of a video is the main goal of video compression.
Compression techniques are divided into two types: lossless and lossy compression. Nevertheless, there are still many problems or challenges that hinder video compression from being popular. The main issue is how to make a trade-off between the video quality, in terms of Peak Signal to Noise Ratio (PSNR), and the compression ratio. Moreover, researchers are sometimes unable to reach applicable perceptual compression techniques because of the application- and context-based quality expectations of users. Nevertheless, perceptual video compression has great potential as a solution to facilitate multimedia content management due to its efficiency for data rate reduction [2]. Recently, several approaches have been presented that attempt to tackle the above problems. The taxonomy of these approaches can be categorized as spatial, temporal, statistical, and psycho-visual redundancies. Readers looking for more information regarding these types can refer to [3]. In general, spatial redundancies can be exploited to remove or reduce higher frequencies in an effective way without affecting the perceived quality. Vector Quantization (VQ) is an efficient and easy technique for video compression. VQ includes three main steps: encoding, codebook generation, and decoding; see [3] for more information. Neural networks (NNs) are commonly used in video coding algorithms [4]. Such a network is made of two main components: a spatial component, which encodes intra-frame visual patterns, and a reconstruction component, which aggregates information to predict details. Some of these algorithms can better exploit spatial redundancies for rebuilding high-frequency structures by making the spatial component deep. The neural video compression method based on the predictive VQ algorithm requires the correct detection of key frames in order to improve its performance. Recently, evolutionary optimization techniques (e.g., genetic algorithms and swarm intelligence) have been exploited to enhance the NN learning process and build an intelligent vector quantization [5]. Quantum computation is an interdisciplinary science that emerged from information science and quantum science. The Quantum Genetic Algorithm (QGA) is an optimization technique that adapts the Genetic Algorithm (GA) to quantum computing. It is mainly based on qubits and the state superposition of quantum mechanics. Unlike the classical representation of chromosomes (a binary string, for instance), here they are represented by vectors of qubits (quantum registers). Thus, a chromosome can represent the superposition of all possible states [6]. Some efforts have been spent to use QGA for exploring search spaces for finding an optimal solution [7, 8]. The codebook design is a crucial process in video compression and can be regarded as a searching problem that seeks to find the most representative codebook which could correctly be applied in video compression.

1.1 Novelty and Contribution
The novelty of the proposed video compression model is that it removes different types of redundancies in one package. The model handles the frame's spatial redundancy by dropping duplication in the high-frequency coefficients of the Discrete Wavelet Transform (DWT) through an adapted vector-quantization-based NN, whereas the redundancy inside the low-frequency (high-energy) coefficients will be eliminated by using DPCM.
The model controls the inter-frame temporal redundancy by utilizing a background subtraction algorithm to extract moving objects within frames to generate the condensed initial codebook. Regarding statistical redundancy, the model employs run-length encoding to increase the compression ratio. Overall, the model performance depends mainly on the construction of the optimal codebook for vector quantization. It exploits QGA with a fitness function based on the Euclidean distance between the initial codebook and each frame in the video. Utilizing QGA helps in that the effective statistical size of the population appears to be increased. This means that the advantage of good building blocks is magnified, with the aim of enhancing the optimal feature selection process [8].
2 Literature Survey

Research in the video compression domain has attracted tremendous interest in recent years. This is mainly due to its challenging nature in effectively satisfying a high compression ratio and quality after decoding without degradation of the reconstructed video. An insight into the potential of using vector quantization for a real-time neural video codec is provided in [4]. This technique utilizes Predictive Vector Quantization (PVQ) that combines vector quantization and differential pulse code modulation. Another work involving hybrid transformation-based video compression may be seen in [1]. The hybrid compressed frame is quantized and entropy coded with Huffman coding. This method utilized motion vectors, estimated using an adaptive rood pattern search, that are compensated globally. Their system was more complex because the hybrid transforms with quantization need a lot of time to compress the video. With this same objective, in 2015, Elmolla et al. [1] introduced run-length and Huffman coding as a means of packaging hybrid coding. This type of compression has the ability to overcome the drawbacks of wavelet analysis, but there are some limitations: it is not optimal for the sparse approximation of curve features beyond singularities. A more formal description, as well as a review of video compression based on Huffman coding, can be found in [9]. Yet, Huffman coding requires two passes: the first pass is used to build a statistical model of the data, whereas the second pass is used to encode it, so it is a relatively slow process. Due to that, some other techniques are faster than Huffman coding when reading or writing files. A lot of research interest is being shown in optimization techniques that can capture the temporal redundancy related to motion estimation and compensation based on edge matching, which can alleviate the problem of local minima and, at the same time, reduce computational complexity [10]. The ant colony edge detector is used to create edges for motion compensation. The main disadvantages of block matching are the heavy computation involved and the motion-averaging effect of the blocks. Another approach was proposed by Rubina in 2015 [11], defining a technique to provide temporal-based video compression based on a fast three-dimensional cosine transform. To minimize the influence caused by the hybrid transformation in terms of compression quality and increase the compression ratio, Esakkirajan et al. [12] incorporated the advantages of multiwavelet coefficients, which possess more than one scaling function, and an adaptive vector quantization scheme in which the design of the codebook is based on the dynamic range of the input data.
Another approach was suggested by Nithin et al. in 2016 [13]. It defined a technique to provide component-level information to support spatial redundancy reduction based on properties of the fast curvelet transform, the Burrows-Wheeler transform, and Huffman coding. Although video compression has been studied for many decades, there is still room to make it more efficient and practical in real applications. According to the aforementioned review, it can be found that past studies primarily did not address the issues associated with building the codebook for vector quantization compression algorithms (most often built randomly). To the best of our knowledge, little attention has been paid to devising new optimal codebooks and improving their efficiency for vector quantization.
3 Methodology

This paper proposes a new model that combines the two types of video coding, intra-frame and inter-frame coding, in a unified framework to remove different types of redundancies (spatial, temporal, and statistical). The intra-frame coding is achieved by fusing the information coming from both the wavelet transform and the quantization information; the wavelet transform decorrelates the pixels of the input frame, converting them into a set of coefficients that can be coded more efficiently than the original pixel values themselves. In contrast, the quantization information originates from DPCM, which forms the core of essentially all lossless compression algorithms. For inter-frame coding, the vector quantization technique is adapted based on the background subtraction algorithm to condense the codebook length. Finally, the Run Length Encoding (RLE) algorithm is used to merge information from the two coding techniques to achieve high compression by removing the statistical redundancy. Figure 1 shows the main model components for both the compression and decompression phases, respectively, and how they are linked to each other.
Fig. 1. Flow diagram of the proposed system: (Left) compression phase. (Right) Decompression phase
Step 1: Generate Initial Codebook. In this step, a codebook for each video is built offline by extracting the moving parts of the frames (foreground) besides the background; each of them is represented as a codeword. The separation of moving objects is performed based on the background subtraction technique. Background subtraction is a widely used approach for detecting moving objects in videos from static cameras [14]. The accuracy of this approach depends on the speed of movement in the scene; faster movements may require a higher threshold [3]. (A sketch of this step is given after Step 2.)

Step 2: Codebook Optimization. Given the initial codebook, the next step is to tune the codewords inside the codebook given a specific objective function. The quantum genetic algorithm is adopted here to realize this step; the domain of QGA is optimization problems where the set of feasible solutions is discrete or can be reduced to a discrete one, and the goal is to find the best possible solution [6–8, 15]. The structure of a QGA is illustrated in Fig. 2. The suggested model utilizes quantum parallelism, which refers to the process of evaluating a function once on a "superposition" of all possible inputs to produce a superposition of all possible outputs. It means that the time required to calculate all possible outputs is the same as the time required to calculate only one output with a classical computer. A quantum register with superposition can store exponentially more data than a classical register of the same size. In the quantum algorithm, superimposed states are connected by a quantum connection called entanglement. In general, quantum superposition gives quantum algorithms the advantage of having less complexity than their classic equivalent algorithms.
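As an illustration only, the following minimal sketch shows one way the Step 1 codebook initialization could be realized for grayscale frames from a static camera. The 4x4 block size and the difference threshold of 25 are assumptions made for this example, not values prescribed by the paper (which notes only that faster motion may need a higher threshold).

```python
import numpy as np

def initial_codebook(frames, block=4, thresh=25.0):
    # Estimate the static background as the per-pixel median over time,
    # a simple background-subtraction baseline for fixed-camera video.
    stack = np.stack(frames).astype(np.float32)        # shape (T, H, W)
    background = np.median(stack, axis=0)
    h, w = background.shape
    codewords = []
    for frame in stack:
        moving = np.abs(frame - background) > thresh   # foreground mask
        for y in range(0, h - block + 1, block):
            for x in range(0, w - block + 1, block):
                if moving[y:y + block, x:x + block].any():
                    codewords.append(frame[y:y + block, x:x + block].ravel())
    # The background itself also contributes codewords.
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            codewords.append(background[y:y + block, x:x + block].ravel())
    # Deduplicate to keep the initial codebook condensed.
    return np.unique(np.round(np.array(codewords)), axis=0)
```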
Fig. 2. QGA structure (left) flowchart, (right) pseudocode
A chromosome is simply a vector of m qubits that forms a quantum register. Herein, the easiest way to create the initial population (a combination of different codewords) is to initialize all the amplitudes of the qubits to the value 1/√2. All quantum superposition states will then be expressed by a chromosome with equal probability. In order to perform reproduction, the evaluation phase quantifies the quality of each quantum chromosome in the population. The evaluation is based on an objective function (Euclidean distance in our case) that assigns an adaptation value to each individual after measurement; this permits ranking the individuals in the population. In order to effectively exploit the superposed states of the qubits, each qubit must be observed, known as measuring chromosomes, which extracts a classical chromosome. In order to intensify the search and improve performance, the interference operation modifies the amplitudes of individuals by moving the state of each qubit toward the value of the best solution. This can be done using a unitary transformation that performs a rotation whose angle is a function of the amplitudes and of the value of the corresponding bit in the reference solution. The value of the rotation angle must be chosen so as to avoid premature convergence. It is often determined empirically, and its direction is determined as a function of the probabilities of a qubit being in state 0 or state 1. The quantum genetic algorithm uses quantum gates to perform the rotation of an individual's amplitudes. Quantum gates can also be designed according to practical problems. The qubits constituting individuals are rotated by quantum gates to update the population Q(t). The quantum rotating gates are given by the following equation [8]:
\[
\begin{bmatrix} \alpha_i^{t+1} \\ \beta_i^{t+1} \end{bmatrix}
=
\begin{bmatrix} \cos(\Delta\theta_i) & -\sin(\Delta\theta_i) \\ \sin(\Delta\theta_i) & \cos(\Delta\theta_i) \end{bmatrix}
\begin{bmatrix} \alpha_i^{t} \\ \beta_i^{t} \end{bmatrix}
\tag{1}
\]
where α_i^t and β_i^t are the probability amplitudes associated with the 0 state and the 1 state of the i-th qubit at time t. The values α² and β² represent the probabilities of seeing a qubit in states 0 and 1, respectively, when the value of the qubit is measured. As such, the equation α² + β² = 1 is a physical requirement. Δθ_i is the rotation angle of the quantum gate of qubit i of each quantum chromosome. It is often obtained from a lookup table to ensure convergence, as illustrated in Table 1.

Table 1. Rotation angle selection strategy

x_i  b_i  f(x_i) > f(b_i)  Δθ_i  s(α_i, β_i):
                                 α_iβ_i > 0  α_iβ_i < 0  α_i = 0  β_i = 0
0    0    False            0     0           0           0        0
0    0    True             0     0           0           0        0
0    1    False            0     0           0           0        0
0    1    True             δ     1           −1          0        1
1    0    False            0     0           0           0        0
1    0    True             δ     −1          1           1        0
1    1    False            0     0           0           0        0
1    1    True             0     0           0           0        0
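To make the update concrete, the sketch below implements the initialization, measurement, and rotation steps described above in plain NumPy. The rotation-step magnitude of 0.01π is an assumed empirical value (the paper leaves the angle open), and the sign handling for the α_i = 0 / β_i = 0 edge cases of Table 1 is simplified.

```python
import numpy as np

DELTA = 0.01 * np.pi  # assumed rotation step; the paper says only that
                      # the angle is determined empirically

def init_population(pop_size, m):
    # Every amplitude starts at 1/sqrt(2): each chromosome is an equal
    # superposition of all 2^m classical bit strings.
    amp = 1.0 / np.sqrt(2.0)
    return np.full((pop_size, m), amp), np.full((pop_size, m), amp)

def measure(alpha):
    # Observe each qubit: it collapses to 1 with probability 1 - alpha^2,
    # yielding a classical binary chromosome.
    return (np.random.rand(*alpha.shape) >= alpha ** 2).astype(np.int8)

def rotate(alpha, beta, x, b, x_is_fitter):
    # Quantum rotation gate of Eq. (1), driven by the non-zero rows of
    # Table 1: a qubit rotates only when its measured bit differs from
    # the best solution b and the measured chromosome is the fitter one;
    # the direction moves the amplitudes toward b.
    theta = np.zeros_like(alpha)
    active = (x != b) & x_is_fitter[:, None]
    direction = np.where(alpha * beta > 0, 1.0, -1.0)
    direction = np.where(x == 1, -direction, direction)  # sign flips between the two active rows
    theta[active] = (direction * DELTA)[active]
    new_alpha = np.cos(theta) * alpha - np.sin(theta) * beta
    new_beta = np.sin(theta) * alpha + np.cos(theta) * beta
    return new_alpha, new_beta
```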
The increase in the production of a good building block seems to be the most significant advantage of QGA. The promotion of good building blocks in the classical GA is statistically due to their ability to produce fit offspring, which will survive and further deploy that building block. However, when a new building block appears in the population, it only has one chance to ‘prove itself’. By using superimposed individuals, the QGA removes much of the randomness of the GA. Thus, the statistical advantage of good building blocks should be much greater in the QGA. This, in turn, should cause the number of good building blocks to grow much more rapidly. This is clearly a very significant benefit [6–8].
3.1 Compression
The design of data compression schemes involves trade-offs among the compression rate, the distortion rate (when using lossy data compression), and the computational resources required to compress and decompress the data [1, 2]. The compression phase consists of two main stages, lossless compression based on DPCM and lossy compression based on an enhanced Learning Vector Quantization (LVQ) neural network. Both stages operate on the wavelet coefficients of each frame [16].

(a) Lossless Compression: for each frame, the low-frequency wavelet coefficients, which carry a large amount of energy, are losslessly compressed to preserve the most important features from loss. Here, Differential Pulse Code Modulation (DPCM) is employed as a signal encoder that uses the baseline of pulse-code modulation (PCM) but adds some functionality based on the prediction of the samples of the signal [16]. DPCM takes the values of consecutive samples; if they are analog samples, it quantizes them; it calculates the difference between successive values; then it entropy-codes the difference. Applying this process eliminates the short-term redundancy (positive correlation of nearby values) of the signal. (A minimal sketch of this stage, together with the run-length coding of step (c), is given after this list.)

(b) Lossy Compression: for each frame, the high-frequency wavelet coefficients, which carry a small amount of energy (not salient features), are lossy compressed to achieve a high compression ratio. Here, an LVQ neural network is adapted to compress these coefficients; the LVQ neural network utilizes an optimized codebook for each video as a dynamic vector quantizer embedded into the hidden layer as an activation function. Unlike current methods that employ the neural network as a black box for lossy compression, the suggested model adapts the optimized VQ derived from Step 2 as an activation function embedded in each hidden layer's neurons.

(c) Run Length Coding: given both the quantized coefficients vector obtained from the DPCM lossless compression stage and the VQ index vector obtained from the LVQ neural network lossy compression stage, the two vectors are merged into a unified vector with a specific delimitation between them for decoding. In this case, there exists one unified vector for each frame. To increase the compression ratio, RLE is utilized to handle statistical redundancy among the unified vector elements [1].
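A minimal sketch of the lossless DPCM stage of step (a) and the run-length packaging of step (c) follows. It works on already-quantized integer coefficients and omits the wavelet and LVQ machinery, so it illustrates the coding logic only.

```python
import numpy as np

def dpcm_encode(samples):
    # Keep the first value, then only the differences between neighbours;
    # on integer (quantized) input this is exactly invertible, i.e. lossless.
    s = np.asarray(samples, dtype=np.int32)
    return np.concatenate(([s[0]], np.diff(s)))

def dpcm_decode(residuals):
    # The cumulative sum undoes the differencing.
    return np.cumsum(residuals)

def rle_encode(vec):
    # (value, run-length) pairs over the merged coefficient/index vector.
    vec = list(vec)
    out, i = [], 0
    while i < len(vec):
        j = i
        while j + 1 < len(vec) and vec[j + 1] == vec[i]:
            j += 1
        out.append((vec[i], j - i + 1))
        i = j + 1
    return out
```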
3.2 Decompression
The decompression process is done in a reverse way to the compression process, as illustrated in Fig. 1, and includes the following steps. First, apply run-length decoding to each row of the matrix Vr that contains the compressed video to retrieve the merged coefficients vector fr. This vector comprises the quantized coefficients LLc and the VQ index vector Idx for each frame. Then, for the quantized coefficients LLc, apply inverse DPCM to obtain the uncompressed (low-frequency) coefficients LL. For the given VQ index vector Idx, utilizing the stored codebook table, each index value is converted to the equivalent vector to retrieve the high-frequency coefficients (each frame has one vector that contains the HL coefficients). Given LL and HL from the previous steps, these bands are combined with the other two unaltered bands (LH, HH), taken from the database, and the inverse DWT is applied to get the decompressed frame. Repeat the previous steps for all rows in the compressed matrix Vr; collect the frames to retrieve the original video.
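Mirroring the encoder, the following sketch (reusing dpcm_decode and the RLE conventions from the compression sketch above) shows the per-frame decoding flow. The sentinel delimiter value and the inverse_dwt callback are illustrative placeholders, since the paper specifies only that the merged vector carries a "specific delimitation" between its two parts.

```python
def rle_decode(pairs):
    # Expand (value, run-length) pairs back to the merged vector.
    out = []
    for value, run in pairs:
        out.extend([value] * run)
    return out

def decompress_frame(row, codebook, lh, hh, inverse_dwt, delim=-1):
    # Undo RLE, split at the (assumed) delimiter, invert DPCM for LL,
    # look HL blocks up in the codebook, and inverse-DWT with the
    # stored LH/HH bands.
    merged = rle_decode(row)
    sep = merged.index(delim)
    ll = dpcm_decode(merged[:sep])                      # lossless branch
    hl = [codebook[int(i)] for i in merged[sep + 1:]]   # VQ lookup branch
    return inverse_dwt(ll, hl, lh, hh)
```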
4 Experimental Results

Experiments were conducted on a benchmark video dataset (available at http://www.nada.kth.se/cvap/actions/ and https://media.xiph.org/video/derf/). The testbed is a set of videos with different resolutions, different numbers of frames, and various extensions such as avi and mpeg. The testbed includes eight videos, as shown in Fig. 3. Herein, the background for all these videos is static, while their foreground varies from near stability, as in Miss America, to movement, as in Aquarium. In this paper, the suggested intelligent vector quantization model that relies on the quantum genetic algorithm has been tested with several benchmark videos. The parameter values were chosen according to the values most commonly found in the literature [6, 8, 15]. The first set of experiments compares the compression ratio and quality performance of the proposed model, which utilizes the quantum genetic algorithm to build an optimal codebook for vector quantization used as an activation function inside the neural network's hidden layer, with the LBG-based video compression technique (without QGA), which relies on randomness to build the codebook. As shown in Table 2, using QGA achieves an improvement of about 6% in the compression ratio and 8% in PSNR compared with the LBG video coding technique. Furthermore, QGA achieves better results by about 0.2% compared to traditional GA. Figure 4 shows the visual difference between the original and the reconstructed video frame.
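For reference, the two evaluation measures used in these experiments follow their standard definitions; this small helper is an illustration, not code from the paper.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    # Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def compression_ratio(original_bytes, compressed_bytes):
    # CR = size before compression / size after compression.
    return original_bytes / compressed_bytes
```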
Fig. 3. Benchmark dataset

Table 2. Performance evaluation with random, GA, and QGA-based codebook generation.

Video         | LBG (random codebook)   | Codebook with GA        | Codebook with QGA
              | CR        PSNR          | CR        PSNR          | CR        PSNR
Man running   | 30.365    37.547        | 32.512    40.271        | 33.312    41.234
Traffic road  | 29.354    27.241        | 30.665    30.048        | 32.335    31.324
Aquarium      | 28.968    28.248        | 30.552    31.810        | 31.310    32.239
Akiyo         | 28.785    27.954        | 30.512    30.141        | 32.234    31.325
Miss America  | 28.417    40.696        | 30.512    44.254        | 31.865    45.087
Boxing        | 29.657    37.857        | 30.512    40.936        | 31.469    45.632
Fig. 4. (a) Original frame. (b) Reconstructed frame (PSNR = 31.810)
An advantageous point of a QGA is its ability to find a globally optimal solution in a multidimensional space. This ability is also useful for constructing an optimal codebook of VQ for video compression. This means that we can obtain a better quality of a representative codebook. The reason for the low compression ratio is that the proposed model utilizes lossless compression to compress a large number of important coefficients. In general, in the case of a small number of elements within the quantization vector, both algorithms were equivalent for all problem instances. However, augmenting the number of items leads QGA to behave better than GA, and this holds for all problem solution variants. The next experiment compares the proposed model with other related video compression systems. The first comparative algorithm [10] utilized a motion estimation technique based on the ant colony algorithm and a modified fast Haar wavelet transform to remove temporal redundancy. The second algorithm [1] employed the fast curvelet transform with run-length encoding and Huffman coding to remove spatial redundancy. On the contrary, the proposed model removes temporal redundancy by utilizing optimal vector quantization, spatial redundancy by employing DPCM, and finally statistical redundancy by implementing run-length encoding. Table 3 and Table 4 show that the proposed model gives better results in terms of PSNR of the reconstructed video, with about 23% improvement compared to the first algorithm and 3% improvement compared to the second system. In addition, the proposed model improves the compression ratio by 22% compared to the second system. The rationale for these results is that using QGA helps to build an accurate codebook with minimum distortion for the vector quantization technique. Furthermore, using RLE for statistical redundancy removal, beside DPCM and vector quantization, yields more CR compared with the second algorithm.

Table 3. Comparative result with an optimized technique.

Video   | A. Suri et al. method [10] PSNR | Proposed model PSNR
Tennis  | 30.438                          | 38.347
Suzie   | 34.5746                         | 42.908
Table 4. Comparative result with a traditional technique for video coding.

Video         | A. Elmolla method [1]   | Proposed model
              | CR        PSNR          | CR        PSNR
Traffic road  | 25.11     31.08         | 32.335    31.324
Aquarium      | 24.93     30.64         | 31.310    32.239
In video compression, designing a codebook can be regarded as an optimization problem whose goal is to find the optimal solution, that is, the most representative codebook [5]. It is assumed that the input vectors are mapped to their nearest representative in the codebook with respect to a distortion function (i.e., higher PSNR). QGA is applied with different natural selection to find the most representative codebook, the one with the best fitness value, for video compression. To expedite evolution and prevent the solution from leaving the search space, the crossover and mutation ratios are first explicitly tuned. Moreover, quantum algorithms generally have the ability to reduce the complexity of equivalent algorithms that run on classic computers. Regarding global complexity, the global complexity for QGA (evaluation + interference) is of the order of O(N), while for a standard GA (evaluation + selection + crossover + mutation) the global complexity is often of the order of O(N²), where N is the size of the population. Therefore, this result is very encouraging, since the complexity has been reduced to become linear. Indeed, one can imagine what happens if we consider a very large population of chromosomes; it will be very useful to use QGA instead of GA. There are some potential difficulties with the QGA presented here, even as a theoretical model: (1) some fitness functions may require "observing" the superimposed individuals in a quantum mechanical sense, which would destroy the superposition of the individuals and ruin the quantum nature of the algorithm; (2) the difficulty of reproduction is more fundamental. However, while it is not possible to make an exact copy of a superposition, it is possible to make an inexact copy. If the copying errors are small enough, they can be considered a "natural" form of mutation [6–8].
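As a sketch of this objective, the fitness of a candidate codebook can be computed as the total nearest-codeword distortion over the training blocks (illustrative only; lower distortion corresponds to higher PSNR):

```python
import numpy as np

def codebook_distortion(blocks, codebook):
    # Total squared Euclidean distance between each input block and its
    # nearest codeword; the QGA seeks the codebook minimizing this value.
    total = 0.0
    for v in blocks:
        total += np.min(np.sum((codebook - v) ** 2, axis=1))
    return total
```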
5 Conclusion

Like most other problems, the design of a suitable video compression scheme involves multiple design criteria and specifications. Finding an optimal codebook in vector quantization is not a simple task. Consequently, there is a need for optimization-based methods that can be used to obtain an optimal solution that satisfies the requirements. Ideally, the optimization method should lead to the global optimum of the objective function. In the work presented in this paper, QGA has been used to achieve an optimal solution to a multidimensional nonlinear problem of conflicting nature (a high compression ratio with an acceptable quality of the reconstructed video).
In general, the application of the proposed model faces some constraints: (1) it requires static-background videos (a fixed rather than movable camera) so that the background subtraction algorithm can be applied efficiently, and (2) extra memory space is needed to store some information (the wavelet HH and LH bands) used in the decoding process. Our experimental results have shown that QGA can be a very promising tool for exploring large search spaces while preserving the efficiency/performance relation. Our future work will focus on comparing different QGA strategies to study the effect of choosing the rotation gate angles. Furthermore, the model can be upgraded to work with videos from a movable camera instead of a static camera.
References

1. Elmolla, A., Salama, G., Elbayoumy, D.: A novel video compression scheme based on fast curvelet transform. Int. J. Comput. Sci. Telecommun. 6(3), 7–10 (2015)
2. Shraddha, P., Piyush, S., Akhilesh, T., Prashant, K., Manish, M., Rachana, D.: Review of video compression techniques based on fractal transform function and swarm intelligence. Int. J. Mod. Phys. B 34(8), 1–10 (2020)
3. Ponlatha, S., Sabeenian, R.: Comparison of video compression standards. Int. J. Comput. Electr. Eng. 5(6), 549–554 (2013)
4. Knop, M., Cierniak, R., Shah, N.: Video compression algorithm based on neural network structures. In: Proceedings of the International Conference on Artificial Intelligence and Soft Computing, Poland, pp. 715–724 (2014)
5. Haiju, F., Kanglei, Z., En, Z., Wenying, W., Ming, L.: Subdata image encryption scheme based on compressive sensing and vector quantization. Neural Comput. Appl. 33(8), 1–17 (2020)
6. Zhang, J., Li, H., Tang, Z., Liu, C.: Quantum genetic algorithm for adaptive image multithresholding segmentation. Int. J. Comput. Appl. Technol. 51(3), 203–211 (2015)
7. Mousavi, S., Afghah, F., Ashdown, J.D., Turck, K.: Use of a quantum genetic algorithm for coalition formation in large-scale UAV networks. Ad Hoc Netw. 87(1), 26–36 (2019)
8. Tian, Y., Hu, W., Du, B., Hu, S., Nie, C., Zhang, C.: IQGA: a route selection method based on quantum genetic algorithm-toward urban traffic management under big data environment. World Wide Web 22(5), 2129–2151 (2019)
9. Atheeshwar, M., Mahesh, K.: Efficient and robust video compression using Huffman coding. Int. J. Adv. Res. Eng. Technol. 2(8), 5–8 (2014)
10. Suri, A., Goraya, A.: Hybrid approach for video compression using ant colony optimization and modified fast Haar wavelet transform. Int. J. Comput. Appl. 97(17), 26–30 (2014)
11. Rubina, I.: Novel method for fast 3D DCT for video compression. In: International Conference on Creativity in Intelligent Technologies and Data Science, Russia, pp. 674–685 (2015)
12. Esakkirajan, S., Veerakumar, T., Navaneethan, P.: Adaptive vector quantization based video compression scheme. In: IEEE International Conference on Signal Processing and Communication Technologies, India, pp. 40–43 (2009)
13. Nithin, S., Suresh, L.P.: Video coding on fast curvelet transform and Burrows–Wheeler transform. In: IEEE International Conference on Circuit, Power and Computing Technologies, India, pp. 1–5 (2016)
14. Boufares, O., Aloui, N., Cherif, A.: Adaptive threshold for background subtraction in moving object detection using stationary wavelet transforms 2D. Int. J. Adv. Comput. Sci. Appl. 7(8), 29–36 (2016)
15. Wang, W., Yang, S., Tung, C.: Codebook design for vector quantization using genetic algorithm. Int. J. Electron. Bus. 3(2), 83–89 (2005)
16. Singh, A.V., Murthy, K.S.: Neuro-curvelet model for efficient image compression using vector quantization. In: International Conference on VLSI Communication Advanced Devices Signals and Systems and Networking, India, pp. 179–185 (2013)
An Enhanced Database Recovery Model Based on Game Theory for Mobile Applications

Yasser F. Mokhtar, Saad M. Darwish, and Magda M. Madbouly

Institute of Graduate Studies and Research, Department of Information Technology, Alexandria University, Alexandria, Egypt
{yasser_fakhry,Saad.darwish,magda_madbouly}@alexu.edu.eg
Abstract. In the mobile environment, communication between Mobile Hosts (MHs) and database servers faces many challenges in the Mobile Database System (MDS). It suffers from factors such as handoff, insufficient bandwidth, frequent transaction updates, and recurrent failures, which are significant problems for information-system stability. Fault-tolerance techniques, however, enable systems to perform their tasks in the presence of faults. The aim of this paper is to find the optimal recovery solution among the available state-of-the-art techniques in MDS using game theory. Some of the published recovery protocols are selected and analyzed to choose the most important variables that affect the recovery process, such as the number of processes, the time for sending messages, and the number of messages logged in time. The game-theory technique is then adapted, based on the payoff matrix, to choose the optimal recovery technique according to the given environmental variables. Game Theory (GT) is distinguished from other evaluation methods in that it uses a utility function to reach the optimal solution, whereas others use particular parameters. The experiments were carried out with the NS2 simulator to implement the selected recovery protocols, and they confirm the effectiveness of the proposed model compared to other techniques.

Keywords: Mobile database system · Game theory · Recovery control · Decision making
1 Introduction

With improved networking capabilities, mobile communication is considered one of the most necessary parts of our life. Mobile Computing (MC) refers to various nodes or devices that allow people to access information or data from wherever they are [1]. However, many limitations of the mobile environment introduce several challenges for managing mobile data transactions, such as frequent disconnections [2–5]. The main challenges of transaction control come from the mobility of MHs and the limited wireless bandwidth [6–8].
To recover from failures, many recovery modules have been developed by researchers to ensure that sensitive information can be recovered. They take into consideration the obstacles of wireless communications when dealing with the mobile
database, such as energy shortage, repeated disconnection from the server, and handoff problems. In this regard, there are two types of recovery approaches in the literature: forward recovery and backward recovery; see [9, 10] for more details. Recovery is a time-consuming and resource-intensive operation, and these protocols require plenty of both. Managing the log is considered the most expensive operation in recovery protocols, so for the MDS an economical and efficient management scheme is necessary [11]. MD recovery becomes very hard because most of the nodes are mobile. Nevertheless, many parameter requirements for the recovery module have been specified to obtain a recovery protocol that adapts dynamically to the environmental conditions [7–10]. The difficulty is how to select a method suited to the environmental conditions, and which of the given factors (parameters) are most influential in the recovery process. A better solution must strike an optimal balance between the applicable algorithm and the performance at all levels of operation. The aim of this proposal is therefore to introduce an optimal selection protocol (decision making) using GT to pick the best recovery algorithm from a pool of available algorithms, taking into consideration the circumstances in which the interruption occurred; this can greatly improve system performance and provides highly available data management for handoff and disconnected operations. The essential difference between current evaluation-based recovery techniques and GT-based approaches is that the first group tries to find a suitable solution with particular parameters, whereas the second makes a decision based on a utility function over many conflicting parameters. Most previous recovery techniques did not take into consideration the nature of the circumstances in which the disconnection occurred in the mobile environment. Therefore, the goal is to develop a new intelligent approach for mobile database recovery that considers some of the environmental factors using GT. GT is applied here since no single solution is optimal for all recovery circumstances among the available recovery protocols; GT is thus a suitable tool for selecting the best strategy (best solution) according to the utility function of each player (the alternative recovery approaches are represented as players) [12–15].

1.1 Research Motivation and Challenges
Mobile systems are often subject to different environmental circumstances, which may cause data loss or communication disconnection. Therefore, traditional recovery schemes cannot be directly utilized in these systems. The most important challenges facing traditional mechanisms are as follows: (1) some recovery protocols depend on MH storage, which is unstable and very limited; (2) some protocols affect the logging scheme, which may lead to machine load overhead; (3) some schemes require a great amount of data transfer; (4) some protocols may be slow during the recovery process, depending on the distance between the MH and the Base Station (BS); (5) several algorithms suffer in performance from the number of repeated processes and exchanged messages. In general, despite the efforts that have been made to address mobile database recovery, there is still room to improve it significantly.
1.2 Novelty
The novelty of the suggested model is to provide high capability for recovery treatment in MDS by applying a new smart technique based on a two-player GT mechanism as a decision-maker, picking out the suitable protocol to increase recovery-processing efficiency. The real problem is not choosing one of the known recovery techniques; it is choosing the most appropriate method given the changes imposed by the operating circumstances, which are often vague and variable. In this regard, the present work ensures the selection of the best available recovery method through GT based on its important parameters. In the proposed work, a comparison between a set of different strategies for each recovery protocol is made. These strategies highlight the effect of certain parameters (features) on the dynamic performance of the chosen protocol. The work demonstrates that well-chosen parameters improve performance significantly, which shows the great potential of the proposed method. Furthermore, the proposed model offers a high degree of flexibility, since newer recovery methods can be added to improve the results. Besides this introduction, the remaining sections are organized as follows: Sect. 2 explains state-of-the-art MD recovery techniques, Sect. 3 introduces the proposed MD recovery model, Sect. 4 defines the criteria used to evaluate the proposed GT technique in MD recovery and displays the results, and finally, Sect. 5 draws conclusions and future work directions.
2 State-of-the-Art Mobile Database Recovery Algorithms

Recovery in MDS is not a new area of research, but there are still many possibilities for improving existing protocols and creating new ones. The authors in [16, 17] investigated recovery by combining movement-based checkpointing with message logging. Their algorithms depend basically on a mobility-handoff threshold: checkpoints are taken after a specified number of host migrations across cells, rather than periodically, as in [16], or a checkpoint is taken once while the host travels in the same region, as in [17]. These algorithms rely on factors such as the failure rate and the mobility rate, which decreases the overhead of maintaining recovery data and reduces the recovery cost. However, a wrong choice of the mobility-handoff threshold may impact performance adversely. In a different contribution, the authors in [18] advised an algorithm for rollback recovery based on message logging and independent checkpointing. This method exploits mobile agents to manage the message logs and checkpoints with a certain threshold and the latest checkpoint; therefore, the recovery time of the MH never exceeds a particular threshold. The benefit of this framework is that neither the send nor the receive message-log size can be large, because only a few messages are exchanged in the networks. Maryam et al. [19] presented another method based on a log-management technique combined with a mobile agent-based framework to decrease recovery time. The MH performs checkpointing using a frequency-based model. This protocol
reduces recovery time by reducing the exchanged messages. However, complexity increases, especially when many agents are required. Similarly, the authors in [20] recommended another protocol using log management to support recovery in the MC environment. Here, the log information is stored in the controller of the BS, which covers a geographical area as a cell. Their system is based on a tracking agent in the BS controller that obtains the MH location update when a transaction is launched. The main advantage of the log-management method is that it is easy to apply, while its main disadvantage is that recovery is likely to be slow if the home agent is far from the mobile unit. According to the provided analysis, we find that most of the presented works can be characterized as follows: (1) most recovery studies are based on different techniques in the recovery process, such as log management, checkpointing, movement-based checkpointing, and agent-based logging; (2) these methods are completely different, so one method may not work as an alternative to another; each algorithm has different parameters and different assumptions for solving the recovery problem, and each method works in a different environment from the others; (3) although some proposals tried to merge more than one technique into one contribution (hybrid methods), they still struggle to select the best fusion from this pool of methods, which may cause a high recovery cost and overly complex recovery; (4) finally, most schemes did not take the environmental conditions into consideration as influential factors in the recovery process. Based on the above, the practical implementation of the recovery algorithms is limited. However, to the best of our knowledge, it is possible to develop a scheme that optimizes performance by choosing the best available recovery method according to the current situation (the variables of each case). Hence, GT has been employed because of its importance for decision making; it works through conflict analysis or interactive decision theory to choose the best solution through competition between the strategies provided by each method.
3 The Proposed Model

A typical architecture for an MDS includes a small database fragment residing on the MH, derived from the main database. This design manages the mobility constraints so as to facilitate the MHs and the Mobile Support Stations (MSSs). If the MH exists in the cell serviced by an MSS, a local MH can communicate directly with that MSS. The MH can move freely from one cell to another, and each cell has a BS and a number of MHs. BSs are configured stations with a wireless interface that can communicate with the MHs to transfer data over the wireless network. Each MH connects to the BS through a wireless channel; the BS connects to the database server over wired links, and the MH cannot connect directly to the database server without the BS [10, 21]. The concentration of this research is on how to employ GT to find the optimal recovery protocol among alternative recovery algorithms in the MCS. Herein, two important selected recovery algorithms, each working according to its own algorithmic architecture, are implemented. The proposed model has been prototyped using a two-player GT
tool to obtain an optimal decision for the best recovery mechanism. The proposed work differs from other recovery systems because it takes into account some important variables of the mobile environment during handoffs or service outages, which differ from one situation to another, whereas conventional recovery algorithms depend on fixed assumptions about the environment and build their work on those assumptions. Figure 1 depicts the suggested model’s architecture.
Fig. 1. Overview of the proposed MDS recovery model.
To explain the technicality of the suggested recovery model in the MDS, we evaluated a portion of the most significant algorithms developed for MD recovery to specify which of them are picked for investigation. In this case, we classified the available recovery algorithms into categories depending on how they work and on their characteristics. These categories differ in how the recovery methodology is applied, as discussed in [9–11]. In our proposal, we selected two recovery protocols that represent two algorithms that differ in terms of application: the log-management method [20] and the mobile-agent method [19]. Herein, each player must have a group of strategies to enter the competition with the other player. To get these strategies, feature analysis and extraction are performed on each method to find the most influential aspects of each protocol. Thus, for the GT, each selected method is represented as a player, and each player's strategies come from the use of each effective variable. For instance, the first protocol (player 1) uses factors such as the log arrival rate, handoff rate, average log size, and mobility rate. Likewise, the second protocol (player 2) uses factors such as the number of processes in the checkpoint, the handoff threshold, and the log size. To prepare the recovery algorithms' parameters needed for GT as a decision-making technique, we first implement the selected protocols using the selected important factors that each protocol depends on. To obtain each player's strategies,
each algorithm is implemented based on real database transactions. An objective function for the total recovery cost is computed, whose calculation varies from one algorithm to another. Based on the previous steps, we build the GT matrix according to each method's (player's) output values, which reflect the operating results with its factors, so that the outputs of each run can be evaluated objectively. These results are considered the gains of each player, called utility or payoff in the game. GT is a branch of mathematics for analyzing conflict-of-interest situations that investigates interactions involving strategic decision making. Any interaction between competitors is considered a game, and the players are assumed to act rationally. At the end of the game, each player acquires a value related to the actions of the others, called the payoff or utility; these payoffs estimate the degree of satisfaction a player extracts from the conflicting situation. Each player in this game has some available choices of action, called strategies [14, 15]. The GT could be described as follows: (1) a set of players (the selected algorithms for negotiation); (2) a pool of strategies for each player (the strategies reflect the assumed values of the significant coefficients in each protocol, which also reflect the possible environmental changes); (3) the benefits or payoffs (utility) to each player for every possible list of strategies chosen by the players. Accordingly, the payoff matrix, created from the recovery algorithms' implementations under different parameters, is produced. In this game, players pick their action plans simultaneously. When all players choose their strategies independently to earn maximum profit, the game is called a non-cooperative game; on the contrary, if the players form a coalition, the game is known as a co-operative game [21, 22]. Herein, non-cooperative GT is applied in the analysis of strategic choices [15]. Table 1 shows the bi-matrix for the two players with their payoffs. Herein, $a_{11}$ is the benefit value for player 1's strategy given payoff function $u_1(s_1, t_1)$ if $(s_1, t_1)$ is chosen, and $b_{11}$ is the payoff value for player 2's strategy given payoff function $u_2(s_1, t_1)$ if $(s_1, t_1)$ is chosen.

Table 1. The bi-matrix for the two players (player 1: log-management method, rows; player 2: mobile-agent method, columns)

Strategy | t1         | t2         | ... | th
s1       | (a11, b11) | (a12, b12) | ... | (a1h, b1h)
s2       | (a21, b21) | (a22, b22) | ... | (a2h, b2h)
...      | ...        | ...        | ... | ...
sm       | (am1, bm1) | (am2, bm2) | ... | (amh, bmh)
Herein, the payoff calculation for each player relies on the selection of the other player (i.e., choosing a strategy for one player impacts the gain value of the other). In our proposal, there are three utility components: the time consumed, the memory used as a performance cost, and the probability of completing the recovery (recovery done).
The supposition is that the execution time for each algorithm can take values between 0 and 5 s, so the payoff values for the time function are distributed as follows:

$$u_i = \begin{cases} 6 & \text{if } C_{1,i} \in [0, 0.1] \\ 4 & \text{if } C_{1,i} \in (0.1, 0.5] \\ 2 & \text{if } C_{1,i} \in (0.5, 0.9] \\ 0 & \text{if } C_{1,i} \in (0.9, 1] \\ -2 & \text{if } C_{1,i} \in (1, 2] \\ -4 & \text{if } C_{1,i} \in (2, 5] \end{cases} \qquad (1)$$

Similarly, $C_{2,i} = \mathrm{MEMO}_i$, where $\mathrm{MEMO}_i$ is the memory cost used by each algorithm. The assumption is that the memory cost for each algorithm can take values between 0 and 4000 KB, so the payoff values for the memory cost are:

$$u_i = \begin{cases} 6 & \text{if } C_{2,i} \in [0, 500] \\ 4 & \text{if } C_{2,i} \in (500, 1000] \\ 2 & \text{if } C_{2,i} \in (1000, 1500] \\ 0 & \text{if } C_{2,i} \in (1500, 2000] \\ -2 & \text{if } C_{2,i} \in (2000, 2500] \\ -4 & \text{if } C_{2,i} \in (2500, 4000] \end{cases} \qquad (2)$$

For the recovery probability rate, $C_{3,i} = \mathrm{DONE\_PROB}_i$, where $\mathrm{DONE\_PROB}_i$ checks whether recovery has completed according to the handoff-rate threshold. The payoff values for recovery probability rates between 0% and 100% are distributed as follows:

$$u_i = \begin{cases} 1 & \text{if } C_{3,i} \in [0, 20\%] \\ 2 & \text{if } C_{3,i} \in (20\%, 40\%] \\ 3 & \text{if } C_{3,i} \in (40\%, 60\%] \\ 4 & \text{if } C_{3,i} \in (60\%, 80\%] \\ 5 & \text{if } C_{3,i} \in (80\%, 100\%] \end{cases} \qquad (3)$$
The player's total winnings are the sum of the payoff values for the variables described above ($C_1, C_2, C_3$). To get the final solution, there are two ways to figure out the best method: (1) find a single dominant equilibrium strategy in the game, or (2) use the Nash Equilibrium (NE); see [23] for more details. Unfortunately, most games do not contain dominated strategies. Thus, if there is more than one solution (more than one Nash equilibrium) to the given problem, another handling mechanism is needed. To overcome this problem, all values in the payoff matrix are subjected to a system of adding or deducting points according to one of the important variables, say the execution time, so that bonus points are given to the fastest element and vice versa (a normalization and reduction phase). Finally, the revised payoff matrix is sent to the GT model again to find a better solution (pure Nash) that fits the different environmental variables: the optimum solution for this problem.
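The following sketch shows how the payoff bookkeeping and the pure-Nash search could be coded. The scoring functions follow Eqs. (1)–(3), while the 3×3 payoff numbers at the end are invented solely to demonstrate the search; they are not measured values:

```python
import numpy as np

def banded_payoff(value, bins, last):
    # Return the payoff of the first band whose upper bound covers the value.
    for upper, u in bins:
        if value <= upper:
            return u
    return last

def time_payoff(t_sec):      # Eq. (1): execution time in [0, 5] s
    return banded_payoff(t_sec, [(0.1, 6), (0.5, 4), (0.9, 2), (1, 0), (2, -2)], -4)

def memory_payoff(kb):       # Eq. (2): memory cost in [0, 4000] KB
    return banded_payoff(kb, [(500, 6), (1000, 4), (1500, 2), (2000, 0), (2500, -2)], -4)

def done_payoff(p):          # Eq. (3): recovery-completion probability in [0, 1]
    return banded_payoff(p, [(0.2, 1), (0.4, 2), (0.6, 3), (0.8, 4)], 5)

def total_payoff(t_sec, kb, p):
    return time_payoff(t_sec) + memory_payoff(kb) + done_payoff(p)

def pure_nash(A, B):
    """Cells (i, j) where row i is a best response to column j and vice versa."""
    return [(i, j) for i in range(A.shape[0]) for j in range(A.shape[1])
            if A[i, j] == A[:, j].max() and B[i, j] == B[i, :].max()]

# Invented bi-matrix: rows = player 1 (log management), cols = player 2 (mobile agent).
A = np.array([[8, 5, 6], [7, 9, 4], [6, 5, 10]])
B = np.array([[7, 6, 5], [5, 10, 6], [6, 4, 9]])
print(total_payoff(0.4, 800, 0.9))   # -> 4 + 4 + 5 = 13
print(pure_nash(A, B))               # -> [(0, 0), (1, 1), (2, 2)]
```

In this toy example, three pure equilibria coexist; it is exactly this situation that triggers the normalization/reduction phase above, after which the search is rerun on the revised matrix to isolate a single solution.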
4 Experimental Results
The NS2 simulator is used to evaluate the proposed recovery-based GT model in mobile database applications. For our implementation, we employed mobile log files of different sizes that contain many process data items to be recovered according to each algorithm's mechanism. NS2 is a discrete-event simulator aimed at networking research; it gives substantial support for routing, simulation of TCP, and multicast protocols over wired and wireless networks (local and satellite) [24]. Furthermore, Matlab software is utilized for building the GT. Herein, both the mobile agent-based and the log management-based recovery algorithms [19, 20] are fed to GT as a decision-maker that selects the better of them in relation to the available parameters. The parameter values for each recovery protocol were chosen according to the most common values found in the literature and are used as input to the suggested model to examine the performance of each.
Fig. 2. Comparison results in terms of total recovery execution time (seconds) for the three strategies of the log management, mobile agent, and proposed models.
Fig. 3. Comparison results in terms of total recovery performance payoffs for the three strategies of the log management, mobile agent, and proposed models.
Figures 2 and 3 reveal that the suggested recovery model yields the best results in terms of required execution time and total payoff function in comparison with each recovery algorithm individually, using their default factor values as stated in [19, 20]. Game theory is used to anticipate and explain the actions of all agents (recovery protocols) involved in competitive situations and to test and determine the relative optimality of different strategies. In this case, the scenarios have a well-defined outcome, and decision-makers receive a “payoff” (the value of the outcome to the participants); that is, participants will gain or lose something, depending on the outcome. Furthermore, the decision-makers are rational: when faced with two alternatives, players will choose the option that provides the greater benefit. The results depict that the log management and agent-based protocols suffer in execution time, especially when the MH goes far away from the first BS, so this choice becomes more difficult with a large log size.
5 Conclusion and Future Work

In this paper, a novel game-theory model is proposed to find the optimal recovery solution in MDS. The new algorithm has been demonstrated as a competition between two of the most important recovery protocols in MDS. The idea is that each algorithm chooses its most appropriate strategy, in terms of the time for sending messages and the number of messages logged in time, so that the proper recovery solution is detected according to the environmental variables. A key step in a game-theoretic analysis is to discover which strategy is a recovery protocol's best response to the strategies chosen by the others. The experimental results show the superiority of the suggested recovery model. In the future, the model could be extended to new settings (three players, for example) to achieve the best efficiency, to apply a game theory-based mixed strategy, and to apply the proposed model in cloud algorithms to reach better results.
References

1. Bhagat, B.V.: Mobile database review and security aspects. Int. J. Comput. Sci. Mob. Comput. 3(3), 1174–1182 (2014)
2. Vishnu, U.S.: Concept and management issues in mobile distributed real time database. Int. J. Recent Trends Electr. Electron. Eng. 1(2), 31–42 (2011)
3. Ibikunle, A.: Management issues and challenges in mobile database system. Int. J. Eng. Sci. Emerg. Technol. 5(1), 1–6 (2013)
4. Anandhakumar, M., Ramaraj, E., Venkatesan, N.: Query issues in mobile database systems. Asia Pac. J. Res. 1(XX), 24–36 (2014)
5. Veenu, S.: A comparative study of transaction models in mobile computing environment. Int. J. Adv. Res. Comput. Sci. 3(2), 111–117 (2012)
6. Adil, S., Areege, S., Nawal, E., Tarig, E.: Transaction processing techniques in mobile database: an overview. Int. J. Comput. Sci. Appl. 5(1), 1–10 (2015)
7. Roselin, T.: A survey on data and transaction management in mobile databases. Int. J. Database Manag. Syst. 4(5), 1–20 (2012)
8. Vehbi, Y., Ymer, T.: Transactions management in mobile database. Int. J. Comput. Sci. Issues 13(3), 39–44 (2016)
9. Geetika, A.: Mobile database design: a key factor in mobile computing. In: Proceedings of the 5th National Conference on Computing for Nation Development, India, pp. 1–4 (2011)
10. Renu, P.: Fault tolerance approach in mobile distributed systems. Int. J. Comput. Appl. 2015(2), 15–19 (2015)
11. Vijay, R.: Fundamentals of Pervasive Information Management Systems, 2nd edn. Wiley, Hoboken (2013)
12. John, J.: Decisions in disaster recovery operations: a game theoretic perspective on organization cooperation. J. Homel. Secur. Emerg. Manage. 8(1), 1–14 (2011)
13. Murthy, N.: Game theoretic modelling of service agent warranty fraud. J. Oper. Res. Soc. 68(11), 1399–1408 (2017)
14. Sarah, A.: A case for behavioral game theory. J. Game Theory 6(1), 7–14 (2017)
15. Benan, L.: Game theory and engineering applications. IEEE Antennas Propag. Mag. 56(3), 256–267 (2014)
16. Sapna, I.: Movement-based check pointing and logging for failure recovery of database applications in mobile environments. Distrib. Parallel Databases 23(3), 189–205 (2008)
17. Parmeet, A.: Log based recovery with low overhead for large mobile computing systems. J. Inf. Sci. Eng. 29(5), 969–984 (2013)
18. Chandreyee, S.: Check pointing using mobile agents for mobile computing system. Int. J. Recent Trends Eng. 1(2), 26–29 (2009)
19. Maryam, A., Mohammad, A.: Recovery time improvement in the mobile database systems. In: International Conference on Signal Processing Systems, Singapore, pp. 688–692 (2009)
20. Miraclin, K.: Log management support for recovery in mobile computing environment. Int. J. Comput. Sci. Inf. Secur. 3(1), 1–6 (2009)
21. Xueqin, Z.: A survey on game theoretical methods in human–machine networks. Future Gener. Comput. Syst. 92(1), 674–693 (2019)
22. Hitarth, V., Reema, N.: A survey on game theoretic approaches for privacy preservation in data mining and network security. In: 2nd International Workshop on Recent Advances on Internet of Things: Technology and Application Approaches, vol. 155, pp. 686–691 (2019)
23. Georgios, G.: Dominance-solvable multi criteria games with incomplete preferences. Econ. Theory Bull. 7(2), 165–171 (2019)
24. Himanshu, M.: A review on network simulator and its installation. Int. J. Res. Sci. Innov. 1(IX), 115–116 (2014)
Location Estimation of RF Emitting Source Using Supervised Machine Learning Technique

Kamel H. Rahouma and Aya S. A. Mostafa

Department of Electrical Engineering, Faculty of Engineering, Minia University, Minia, Egypt
[email protected], [email protected]
Abstract. Supervised machine learning algorithms deal with a known set of input data and a pre-calculated response for that set (output or target). In the present work, supervised machine learning is applied to estimate the x-y location of an RF emitter. The Matlab Statistics and Machine Learning Toolbox 2019b is used to build the training algorithms and create the predictive models. The true emitter position is calculated from the data gathered by two sensing receivers; these data are the training data fed to the learner to generate the predictive model. A linearly x-y moving emitter-sensor platform is considered for generality and simplicity. Regression algorithms in the toolbox regression learner are tried for the nearest prediction and better accuracy. It is found that three regression algorithms, Fine Tree regression, Linear SVM regression, and Gaussian process regression (Matern 5/2), achieve better results than the other algorithms in the learner library. The resulting location errors of the three algorithms in the training phase are about 1%, 2.5%, and 0.07%, respectively, and the coefficient of determination is about 1.0 for all three. When testing new data, the errors reach about 4%, 5.5%, and 1%, and the coefficient of determination is about 0.9. The technique is tested for near and far platforms. It is shown that the emitter location problem can be solved with good accuracy using a supervised machine learning technique.

Keywords: Supervised machine learning · Machine learning applications · Geolocation with machine learning · Emitter-sensor data collection · Regression with Matlab toolbox
1 Introduction

Machine learning is considered an automated analytical process. Two related procedures, train and predict, carry out all the essential mathematics to build a predictive model, apply the predictive model to test unlabeled data, and then check the model accuracy. The training process of the learner uses a data set with a known response. A new set of data is then fed to the predictive model to predict the output response. Machine learning utilizes computational algorithms to learn information directly from data without depending on a predetermined equation as a model [1]. The algorithms enhance their performance as the available number of data samples
increases. Machine learning uses two types of techniques. Unsupervised machine learning detects concealed patterns or natural structures in data [2]; it is used to draw inferences from input datasets with unlabeled responses, with applications including gene-sequence analysis, market research, and object recognition. The other type is supervised learning, which trains a model on known input data and known output responses so that it can predict upcoming outputs [3]. A supervised learning algorithm takes a known set of input data and known responses to the data (output, target) and trains a model to generate reasonable predictions for the response to new data. Supervised learning uses classification and regression techniques to develop predictive models. In the present work, the prediction of continuous numeric responses is carried out for continuous numeric input data; regression prediction techniques can achieve this demand [4]. Figure 1 illustrates the steps of supervised machine learning using regression algorithms.
Fig. 1. Supervised machine learning technique, (a) Training process, (b) Learning process.
As shown in Fig. 1, the learning procedure is composed of two processes and two related data sets. The training process is performed first, using a training data set that includes a known target field or response; the response field contains the known output numeric value associated with each record or observation in the training data set. Using different regression algorithms, the training process generates a predictive model for every algorithm, so as to select the model that best predicts the output response. The prediction process follows the training process: it applies the predictive model selected by the training process to new data of the same type as the trained data set. In the present application, the dataset is collected from the emitter-sensor platform. The dataset is [location of sensors (x-y), angles of arrival (ϕ1 and ϕ2) w.r.t.
the two sensors, and a true calculated emitter location (x-y)]. This dataset is used to train the selected model. New data are then collected, but the emitter location is now the output response of the predicting model.
2 Literature Preview

Many machine learning packages are now available for use in different applications; the list is vast. Caret (R), Python, Matlab, and more are ready to use [5]. In the present work, we chose the Matlab ML Toolbox 2019b for our study. The Matlab ML toolbox has many advantages: it is easy to learn, has an attractive GUI interface, requires no command-line work, and supports the data type used in the application. Jain et al. [6] describe a semi-supervised locally linear embedding algorithm to estimate the location of a mobile device in indoor wireless networks; a mean error distance of 3.7 m is obtained with a standard deviation of 3.5 m. Feng et al. [7] used machine learning to identify unknown information of radar emitters; all radars used in training and testing are stationary, Matlab 2017b is used to implement the model, and the identification accuracy of the unknown radar parameters reaches 90% within a measurement error range of 15%. Canadell Solana [8] predicted user equipment (UE) geolocation using Matlab to prepare the data and create the models; multiple supervised regression algorithms were tested and evaluated, giving a median accuracy of 5.42 m and a mean error of 61.62 m. We can say that machine learning algorithms are a promising solution for the geolocation problem.
3 Supervised Machine Learning Procedure

Supervised machine learning is applied in the present work using Matlab 2019b Statistics and Machine Learning Toolbox algorithms [9]. The procedure steps are:

3.1 Data Collection and Preparation
The proposed geolocation system assumes an emitting source tracked by two sensors (receivers) [10]. The sensors continuously collect the data needed for the emitter x-y location, namely the angle of arrival aoa1 w.r.t. sensor 1 and aoa2 w.r.t. sensor 2, assuming known sensor locations xs1,2, ys1,2, and zs1,2, where s1 or s2 stands for sensor 1 or sensor 2 [11]. The collected data are then used mathematically to calculate the x and y emitter location, xe and ye. A set of training data with a known output response is arranged and fed to the data-import tool. Figure 2 shows the data table received by the regression learner importing-data tool. The input training-data variables and the calculated output response are the sensor coordinates xs1, ys1, xs2, and ys2 and the resulting emitter coordinates xe and ye. The data length is selected to be about 300 observations as a medium data length. The input data response (xe) varies from 2000 up to 5000 m according to the locations and angles of arrival of the sensors [12].
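As a concrete illustration of the underlying geometry, the sketch below intersects the two bearing lines defined by the sensors' positions and angles of arrival. It is a generic 2-D triangulation written in Python, not necessarily the exact formulation of [12]; the angles are assumed to be measured from the x-axis, and the bearings are assumed non-parallel:

```python
import numpy as np

def emitter_xy(s1, s2, aoa1, aoa2):
    """Intersect two bearing lines from sensors s1, s2 = (x, y),
    with angles of arrival aoa1, aoa2 in radians from the x-axis."""
    d1 = np.array([np.cos(aoa1), np.sin(aoa1)])   # unit direction from sensor 1
    d2 = np.array([np.cos(aoa2), np.sin(aoa2)])   # unit direction from sensor 2
    # Solve s1 + t1*d1 = s2 + t2*d2 for the range parameters t1, t2.
    A = np.column_stack((d1, -d2))
    t = np.linalg.solve(A, np.asarray(s2, float) - np.asarray(s1, float))
    return np.asarray(s1, float) + t[0] * d1

# Two sensors on the x-axis seeing the emitter at 45 and 135 degrees.
print(emitter_xy((0, 0), (1000, 0), np.deg2rad(45), np.deg2rad(135)))  # -> [500. 500.]
```

Records of (xs1, ys1, xs2, ys2, aoa1, aoa2) with the emitter location computed in this way form the labeled training table imported above.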
Fig. 2. Importing data to the regression training learner.
The training learner can deal with only one response, so the training process starts by predicting xe first; the same procedure is then repeated for ye. Figure 3 presents the classification of data as predictors and data response before starting the training session.
Fig. 3. Predictors and response for a training session.
3.2 Cross-Validated Regression Model
A regression model is trained on cross-validated folds [13]: the quality of the regression is estimated by cross-validation using k data folds (disjoint sets), where k = 1 or more. Every fold trains the model using out-of-fold observations and tests the model performance using in-fold observations. For example, suppose the number of folds for cross-validation is five, as shown in Fig. 3. In this case, every training fold contains about 4/5 of the data set and every test fold contains about 1/5 of the data. This procedure means that the response for every observation is computed by using the
model trained without this observation. Then the average test error is calculated over all folds. For big data sets, cross-validation may take a long time, so no validation could be chosen instead.
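Outside the toolbox GUI, the same five-fold scheme can be reproduced in a few lines. The sketch below uses scikit-learn, with synthetic numbers standing in for the sensor observations; the feature ranges and tree depth are arbitrary placeholders:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Stand-in predictors: sensor coordinates and angles of arrival per record.
X = rng.uniform(0, 5000, size=(300, 6))
y = X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 10, size=300)  # synthetic response

model = DecisionTreeRegressor(max_depth=8)
# cv=5: each fold is scored by a model trained on the other four folds.
rmse = -cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
print("per-fold RMSE:", rmse.round(1), "average:", rmse.mean().round(1))
```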
3.3 Choose a Regression Training Algorithm
Now the regression trainer is ready to start a training session. Pushing the start-session button, as shown in Fig. 3, opens a window to choose a regression algorithm from the learner library; the learner then carries out the training process according to the selected algorithm, builds the training model, and gives statistical values that help in evaluating the created training model. All data handling and the analytical process are carried out automatically by the learner. Figure 4 shows the toolbox regression algorithms contained in the learner library.

3.4 Creating Predictive Model
The regression training algorithm is selected, and pressing the start button initiates the training process. Figure 5 illustrates the result of applying the Fine Tree algorithm as an example and exporting the predictive model that will be used for predicting responses of new data sets. As shown in the figure, the RMSE is about 34 m and the mean absolute error (MAE) is about 29 m, which indicates an acceptable result considering the long coordinate distances involved. The model name is written in the export-model text box; the shown name, trainedModel, is the default name. The model structure is saved to the workspace to be used in testing and predicting new data and responses.
Fig. 4. Selecting regression algorithm.
Fig. 5. Exporting the predictive model to the workspace.
As shown in Figs. 4 and 5, the History box shows the algorithms used in training the input data set and the resulting model root mean square error (RMSE); a frame is drawn around the smallest RMSE value. The Current Model box specifies the quantitative estimates that evaluate the output of the current model training process.

3.5 Evaluating the Predictive Model Accuracy
The accuracy of the regression model describes the performance of the model. The regression learner uses the following metrics as measures of accuracy [14, 15]:

– Mean Squared Error (MSE) represents the difference between the true and predicted response values and is given by:

$$\mathrm{MSE} = \frac{1}{k}\sum_{i=1}^{k}\left(x_i - \hat{x}_i\right)^2 \qquad (1)$$

where k is the total number of observations or fields, i is the observation or field number such that i = 1 : k, $x_i$ is the true response value, and $\hat{x}_i$ is the predicted value of x.

– Root Mean Squared Error (RMSE) is the error rate defined by the square root of the MSE:

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \qquad (2)$$

– Mean Absolute Error (MAE) represents the difference between the true and predicted values and is given by:

$$\mathrm{MAE} = \frac{1}{k}\sum_{i=1}^{k}\left|x_i - \hat{x}_i\right| \qquad (3)$$

– The coefficient of determination (R-squared) represents how well the predicted values fit the original values:

$$R^2 = 1 - \frac{\sum_{i=1}^{k}\left(x_i - \hat{x}_i\right)^2}{\sum_{i=1}^{k}\left(x_i - \bar{x}\right)^2} \qquad (4)$$

where $\bar{x}$ is the mean value of x. The value of $R^2$ is between 0 and 1; a higher value means a better model.
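For reference, the four metrics can be computed directly from a vector of true and predicted responses; the following small helper (with an invented numeric example) mirrors Eqs. (1)–(4):

```python
import numpy as np

def regression_metrics(x_true, x_pred):
    """MSE, RMSE, MAE, and R-squared as defined in Eqs. (1)-(4)."""
    x_true = np.asarray(x_true, float)
    x_pred = np.asarray(x_pred, float)
    err = x_true - x_pred
    mse = np.mean(err ** 2)                       # Eq. (1)
    rmse = np.sqrt(mse)                           # Eq. (2)
    mae = np.mean(np.abs(err))                    # Eq. (3)
    r2 = 1 - np.sum(err ** 2) / np.sum((x_true - x_true.mean()) ** 2)  # Eq. (4)
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}

# Illustrative x-coordinates in metres (not measured values).
print(regression_metrics([2000, 3000, 4000, 5000], [2010, 2985, 4020, 4995]))
```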
4 Geolocation of Emitter Source in Moving Platform

The geolocation of an emitter source can be estimated using two receivers (sensors). The data collected by the sensors in the x-y plane are the azimuth angles of arrival θ1 and θ2 for sensors 1 and 2, respectively. The regression learner is now used to predict the emitter location by observing the available data. The data set fed to the learner is aoa1 (θ1), aoa2 (θ2), and the sensor x-y coordinates (xs1, ys1, xs2, and ys2), as shown in Fig. 2. The best performances found when training and testing the different regression algorithms in the learner library are those of the Fine Tree, Linear SVM, and Gaussian Process Regression algorithms.

4.1 Output Results of Training Algorithms
The three mentioned regression algorithms are trained using a sensor data set of about 200 observations (records). The corresponding x-coordinate of the emitter (the response) is mathematically calculated using the equations in [12]. The learner output results are shown in Fig. 6: (a) and (b) for the Fine Tree prediction model, (c) and (d) for Linear SVM, and (e) and (f) for Gaussian Process Regression (Matern 5/2).
Fig. 6. Training results for Fine Tree (a, b), LSVM (c, d), and GPR (e, f).
Evaluation and accuracy of the three learning predictive models are shown in Table 1.
Table 1. Evaluation statistics of Fine Tree, LSVM, and GPR (Matern 5/2) regression models.

Regression algorithm | RMSE   | MAE    | R-squared
Fine Tree            | 33.129 | 28.67  | 1.00
LSVM                 | 82.213 | 74.66  | 0.99
GPR (Matern 5/2)     | 0.0343 | 0.0297 | 1.00

4.2 Testing Predictive Models
Ten data records are selected randomly from the training data set to test the predictive models. Figure 7 illustrates the results: (a) for part of the training data set, and (b) for new data representing a near platform.
Fig. 7. Results of testing regression models, (a) data from the trained set, and (b) new data.
Tables 2 and 3 present the accuracy of the regression prediction models for the two datasets described in Fig. 7a and 7b.
Table 2. Evaluation statistics of the tested dataset.

Regression algorithm | RMSE (m) | MAE (m) | R-squared
Fine Tree            | 19.3     | 13.636  | 0.999
LSVM                 | 73.149   | 63.622  | 0.986
GPR (Matern 5/2)     | 0.0359   | 0.0297  | 1.00
Table 3. Evaluation statistics of the tested new dataset.

Regression algorithm | RMSE (m) | MAE (m) | R-squared
Fine Tree            | 4.6811   | 4.1636  | 0.8808
LSVM                 | 5.9279   | 5.4923  | 0.8727
GPR (Matern 5/2)     | 0.9994   | 0.9944  | 0.9091
5 Results Analysis

The Matlab Statistics and Machine Learning Toolbox is used here to apply machine learning to geolocating emitting sources in the x-y plane. The toolbox machine learning tool for regression prediction, the Regression Learner, is introduced, and the steps of the learning and training procedure are discussed as shown in Fig. 1. Predictive models are created and evaluated. The x-coordinate of the emitter location is predicted using information collected from the receiving sensors. The learner application offers different regression algorithms for prediction; in the present work, most of the regression algorithms were tested, and three achieved the best performance for the current case study: Fine Tree Regression, Linear SVM Regression, and Gaussian Process Regression (Matern 5/2). Figures 2, 3, 4 and 5 tell the story of supervised machine learning, from data preparation up to training, evaluating, and exporting a predictive model. The output of each of the three regression predictive models is illustrated in Fig. 6, and the accuracy metrics are listed in Table 1. From the figure and the table, it is clear that the created predictive models perform well, and the prediction in the training phase is near ideal. The testing phase is carried out using two sets of data: a set selected randomly from the training data set, and a new data set. The test procedure on the selected data gives good prediction results with high accuracy, as shown in Fig. 7a and Table 2. The prediction on the new data sample is considered satisfactory, as shown in Fig. 7b and Table 3; less accuracy is expected for new data, but the achieved result is good. Table 4 summarizes the average error of the predicted emitter x-position for the three regression models and the three used datasets. The GPR (Matern 5/2) model achieves the best result compared with the other two models.
Table 4. Distance percentage error of the three regression models.

Data set         | Fine Tree (%) | Linear SVM (%) | GPR (Matern 5/2) (%)
Training         | 1             | 2.5            | 0.009
Testing-training | 1.3           | 3.2            | 0.01
New data         | 3.6           | 5              | 1.4
6 Conclusion

Supervised machine learning is applied to solve the problem of geolocating an emitting source in the x-y plane. The Matlab Statistics and Machine Learning Toolbox is discussed and applied. Regression prediction algorithms built into the toolbox library are applied, saving the time and complexity of mathematical calculations and analytical processes. Near-optimum prediction is achieved for the training algorithms discussed in the present work. Considering the testing of the three algorithms, the minimum percentage error reaches less than 1% for the GPR algorithm, while the maximum error reaches about 5% for the SVM algorithm. The same procedure is used to predict the ye coordinate of the emitter. For new data, acceptable results are obtained. It is shown that machine learning can help in geolocation applications, achieving good results.
References

1. Smola, A.J.: An introduction to machine learning basics and probability theory, statistical machine learning program. ACT 0200, Canberra, Australia (2007)
2. Osisanwo, F.Y., Akinsola, J.E.T., Awodele, O., Hinmikaiye, J.O., Olakanmi, O., Akinjobi, J.: Supervised machine learning algorithms: classification and comparison. Int. J. Comput. Trends Technol. 48(3), 128–138 (2017)
3. Dangeti, P.: Statistics for Machine Learning: Techniques for Exploring Supervised, Unsupervised, and Reinforcement Learning Models with Python and R. Packt Publishing Ltd., Birmingham (2017)
4. Malmström, M.: 5G positioning using machine learning. Master of Science thesis in Applied Mathematics, Department of Electrical Engineering, Linköping University (2018)
5. Zhang, X., Wang, Y., Shi, W.: CAMP: performance comparison of machine learning packages on the edges. In: Computer Science HotEdge (2018)
6. Jain, V.K., Tapaswi, S., Shukla, A.: Location estimation based on semi-supervised locally linear embedding (SSLLE) approach for indoor wireless networks. Wirel. Pers. Commun. 67(4), 879–893 (2012). https://doi.org/10.1007/s11277-011-0416-2
7. Feng, Y., Wang, G., Liu, Z., Feng, R., Chen, X., Tai, N.: An unknown radar emitter identification method based on semi-supervised and transfer learning. Algorithms 12(12), 1–11 (2019)
8. Canadell Solana, A.: MDT geolocation through machine learning: evaluation of supervised regression ML algorithms. MSc thesis, College of Engineering and Science, Florida Institute of Technology, Melbourne, Florida (2019)
9. MathWorks: Statistics and Machine Learning Toolbox, R2019b
10. Progri, I.: Geolocation of RF Signals: Principles and Simulations, 1st edn. Springer, Heidelberg (2011)
11. Diethert, A.: Machine and Deep Learning with MATLAB. Application Engineering, MathWorks Inc., London (2018)
12. Rahouma, K.H., Mostafa, A.S.A.: 3D geolocation approach for moving RF emitting source using two moving RF sensors. In: Advances in Intelligent Systems and Computing, vol. 921, pp. 746–757. Springer (2019)
13. Varoquaux, G.: Cross-validation failure: small sample sizes lead to large error bars. NeuroImage 180, 68–77 (2018)
14. Prairie, Y.T.: Evaluating the predictive power of regression models. Can. J. Fish. Aquat. Sci. 53(3), 490–492 (1996)
15. Raschka, S.: Model evaluation, model selection, and algorithm selection in machine learning, pp. 1–49. arXiv:1811.12808v2 [cs.LG] (2018)
An Effective Offloading Model Based on Genetic Markov Process for Cloud Mobile Applications

Mohamed S. Zalat, Saad M. Darwish, and Magda M. Madbouly

Department of Information Technology, Institute of Graduate Studies and Research, Alexandria University, Alexandria, Egypt
{mohamed.zalat,saad.darwish}@alexu.edu.eg, [email protected]
Abstract. Mobile Cloud Computing (MCC) has drawn significant research attention as the capability of mobile devices has improved in recent years. MCC forms the platform for a broad range of mobile cloud solutions. Its key idea is to use powerful back-end computing nodes to enhance the capabilities of small mobile devices and provide better user experiences. In this paper, we propose a novel idea for solving multisite computation offloading in dynamic mobile cloud environments that considers the environmental changes during an application's life cycle and the relationships among the components of an application. Our proposal, called Genetic Markov Mobile Cloud Computing (GM-MCC), adopts a Markov Decision Process (MDP) framework to determine the best offloading decision, assigning components of the application to target sites so that the minimum amount of the mobile device's energy is consumed, using cost metrics to identify the overhead of each component. Furthermore, the suggested model utilizes a genetic algorithm to tune the MDP parameters to achieve the highest benefit. Simulation results demonstrate that the proposed model considers the different capabilities of sites when allocating components and attains a lower energy cost for data transfer from the mobile device to the cloud.

Keywords: Mobile Cloud Computing · Offloading · Application partitioning algorithm · Genetic algorithm · Markov Decision Process
1 Introduction

Mobile Cloud Computing (MCC) is an emerging technology linked to a broad range of mobile learning applications, healthcare, context-aware navigation, and the social cloud. MCC is an infrastructure where data storage and data processing are performed outside the mobile device but inside the cloud [1]. A mobile device itself has limitations such as limited network bandwidth, energy consumed by transmission and computation, network availability, and little storage [2]. However, limited battery life is still a big obstacle to the further growth of mobile devices. Several known power-conservation techniques include turning off the mobile device's screen when not used, optimizing I/O, and slowing down the CPU [3]. One accessible
technology to reduce the energy consumption of mobile devices is MCC. Its fundamental idea is computation offloading, or cyber foraging, which means that parts of an application execute on a remote server, with the results communicated back to the local device [4]. The offloading mechanism divides the application between local and remote execution. The decision may have to change with fluctuations in operating conditions such as computation cost, communication cost, expected total cost of execution, user input, response time, and security [5]. Some critical issues concerning the partitioning problem include application component classification, application component weighing, reduced communication overhead, and reduced algorithm complexity [6, 7]. An elastic application is an application that can separate its components at runtime while preserving its semantics [8]. Messer et al. [9] list the fundamental attributes of an elastic application as a distributed platform: ad-hoc platform construction, application partitioning, transparent and distributed execution, adaptive offloading, and beneficial offloading. Many researchers describe the MCC application partitioning taxonomy [1, 10–17]; they tag the granularity level of application partitioning algorithms as module-, object-, thread-, class-, task-, component-, bundle-, allocation-site-, and hybrid-level partitioning. In general, synchronization in the distributed deployment of elastic applications represents the main challenge for MCC. Traditional computational offloading algorithms focus on single-site offloading, where an application is divided between the mobile device and a single remote server. In recent years, several researchers have studied and implemented algorithms that focus on multisite offloading. A multisite model has more valuable resources than a single site, saves more power, and consumes less time. We focus on the partitioning/scheduling process, the most important phase of cyber foraging, which places each task at the surrogate(s) or the mobile device most capable of performing it, based on the context information and the predicted cost of doing so.

1.1 Mobile Application Graph Problem
Mobile application execution can be represented as a sequence of components in several graph topologies, such as linear, tree, or mesh, based on the fact that offloaded components are likely to be executed sequentially [6]. This assumption is generally used and tested in other research for simplification and convenience. Thus, we use a weighted directed acyclic graph $G = (V, E)$ to represent the relationships among the components of a mobile application. Each vertex $v \in V$ denotes a component, and each edge $e(u, v)$ denotes the communication channel between components u and v. Figure 1 illustrates a weighted directed execution graph of a mobile application. The unshaded vertices are offloadable components, and the shaded vertices are unoffloadable components because of I/O, hardware, or external constraints. The weight on a vertex, denoted by $C_i$ for vertex i, is a vector weight representing the cost of executing the component on each site. The weight on an edge, denoted by $C_{i,j}$ for edge e(i, j), is also a vector weight that represents the cost of transmitting data between components on different sites. The cost metric can be either time or energy consumption. Our algorithm adopts both cost metrics to determine offloading decisions using an adaptive genetic-based Markov decision process. Those weights are constructed using
Fig. 1. Mobile application represented as a weighted directed graph.
both static analysis and dynamic profiling methods for an application. The application structure is analyzed for possible migration points and run several times with different inputs and environments to identify the overhead on each of its components. A small sketch of this graph structure is given below.
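The following is a minimal Python sketch of the weighted application graph G = (V, E) described above. The site list, all numeric costs, and the offloadable flags are illustrative assumptions, not values from the paper:

```python
# Sites on which a component may execute.
sites = ["mobile", "site1", "site2", "site3"]

# Each vertex carries a vector weight: the execution cost of the
# component on each site. Unoffloadable (shaded) vertices are pinned
# to the mobile device, e.g. because of I/O or hardware constraints.
components = {
    # id: (offloadable?, cost per site, aligned with `sites`)
    0: (False, [4.0, None, None, None]),   # unoffloadable
    1: (True,  [9.0, 2.5, 1.8, 1.2]),
    2: (True,  [7.0, 2.0, 1.5, 1.0]),
}

# Each edge carries the cost of transmitting data between components
# when they are placed on different sites.
edges = {
    (0, 1): 0.6,
    (1, 2): 1.1,
}
```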
1.2 Genetic Algorithm
Genetic algorithms were developed to study the adaptive processes of natural systems and to create artificial systems based on them. Genetic algorithms differ from traditional optimization methods in the following ways: they use a coding of the parameter set rather than the parameters themselves; they search from a population of candidate solutions instead of a single one; and they use probabilistic transition rules [19]. A genetic algorithm consists of a string representation ("genes") of the solutions (called individuals) in the search space, a set of genetic operators for generating new search individuals, and a stochastic assignment to control the genetic operators. Typically, a genetic algorithm consists of the following steps. (1) Initialization: an initial population of candidate solutions is randomly generated. (2) Fitness evaluation: the fitness value of each individual is calculated according to the fitness (objective) function. (3) Genetic operators: new individuals are generated by examining the fitness values of the search individuals and applying genetic operators to them. (4) Repeat steps 2 and 3 until the algorithm converges. From the above description, we can see that genetic algorithms use the notion of survival of the fittest by passing fitter individuals on to successive generations.
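A minimal Python sketch of this generic loop follows. The binary gene encoding, population size, and operator rates are illustrative assumptions; in the proposed model the fitness would be the mobile energy consumption (see Sect. 3):

```python
import random

def genetic_algorithm(fitness, gene_len, pop_size=20, pc=0.8, pm=0.05,
                      generations=100):
    # (1) Initialization: random binary individuals.
    pop = [[random.randint(0, 1) for _ in range(gene_len)]
           for _ in range(pop_size)]
    for _ in range(generations):                      # (4) repeat
        pop.sort(key=fitness)                         # (2) evaluation (minimize)
        parents = pop[:pop_size // 2]                 # survival of the fittest
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = random.sample(parents, 2)
            cut = random.randrange(1, gene_len)       # (3) crossover...
            child = p1[:cut] + p2[cut:] if random.random() < pc else p1[:]
            child = [g ^ (random.random() < pm) for g in child]  # ...and mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

# Example: minimize the number of ones in a 10-bit string.
best = genetic_algorithm(sum, gene_len=10)
```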
1.3 Markov Decision Process
Markov Decision Processes [20] are a tool of artificial intelligence that can be used to obtain optimal action policies in a stochastic domain. Given an action executed in a known state of the world, it is possible to calculate the probability of the next state of the world. The probability of reaching a state s′ when action a has occurred is calculated by summing the conditional probabilities of reaching s′ from every possible state s_i of the world given the action a. A formal description of an MDP is the tuple (S, A, P, R, t), where:
• S is a finite set of states of the world, with s ∈ S. Here a state is the system state information, characterized by the combination of the channel state and the execution location of the component, so the system state at decision epoch t is X_{t,i} = (t, i, c), where c is the channel state for the next epoch between the mobile and the offloading sites (i.e., either g or b) and i ∈ [0, k] is the location of the executed component t [21].
• A is a finite set of actions; the decision is chosen from two major actions: migrate the execution, or continue executing the next component locally. In our case, the actions available in a state s are represented by A_s, and the action taken at each stage t is represented by a ∈ A_s.
• P[s′ | s, a] is the transition probability of states, and P represents the transition matrix for the next stage s′. For each action and state of the world, there is a probability distribution over the states of the world that can be reached by executing this action.
• R : S × A → ℝ, or R(s, a, s′), is a reward (or cost) function: to each action in each state of the world, a real number is assigned. The function R(s, a, s′) is defined as the reward of executing action a in state s when the resulting state is s′.
• A decision epoch represents a point in time (t) at which an action a is decided in state s; we consider a finite-horizon discrete-time problem where decisions are made at the beginning of a period/stage. A decision rule d_t : S → A is a mapping from states to actions at decision epoch t that indicates which action to choose when the system is in a specific state at a given time. The policy π = (d_0, d_1, …, d_n) represents the sequence of decision rules to be used at all decision epochs t.
According to [22] there exists a stationary policy π* that is optimal over all policies. Therefore, in this paper, our goal is to determine an optimal stationary policy that suggests the best action minimizing the sum of the cost incurred at the current stage and the least total expected cost that can be incurred from all subsequent stages. We denote by V^π(s) the expected total cost of executing the application given initial state s and policy π, calculated as:

V^{\pi}(s) = E^{\pi}\left[ \sum_{t=0}^{n} C(X_t, a_t) \;\middle|\; X_0 = s \right]    (1)
where E^π represents the conditional expectation with respect to policy π and C(X_t, a_t) is the cost incurred at stage t by taking action a_t. The cost function is considered to be either the amount of energy consumed or the time spent by the mobile as a result of taking the specific action, which will be explained in the following sections. The remainder of this paper is structured as follows: Sect. 2 presents some related work. Section 3 introduces the proposed offloading technique in detail. Section 4 reports the performance evaluation of the proposed model and gives experimental results. Finally, conclusions and lines for future work are drawn in Sect. 5.
2 Related Work

In the literature, several offloading algorithms focus on single-site offloading, where an application is divided between the mobile device and a single remote server. Cuervo et al. [11] suggested a fine-grained code offloading algorithm to maximize energy savings with minimal burden on the programmer under the mobile device's current connectivity constraints. Chun et al. [12] designed and implemented the CloneCloud system, a flexible application partitioner and execution runtime that enables unmodified mobile applications running in an application-level virtual machine
to seamlessly offload part of their execution from mobile devices onto device clones operating in a computational cloud. Kovachev et al. [13] modeled the partitioning of a data stream application using a data flow graph, where a genetic algorithm is used to maximize the throughput of the application. Over the last few years, many researchers have focused on multisite offloading. Most current approaches make offloading decisions based on profiling information that assumes a stable network environment [15]. However, this assumption is not always correct because the mobility of a user can create dynamic bandwidth between the mobile and the server. As a result, if the network profile information does not match the actual post-decision bandwidth, the offloading decision could lead to a critical Quality-of-Service (QoS) failure [16]. Sinha and Kulkarni [2] developed a multisite offloading algorithm that uses differently weighted nodes and different network bandwidths. However, their work assumes a stable channel state when making the offloading decision. Terefe et al. [17] presented a model to describe the energy consumption of multisite application execution. They adopt a Markov Decision Process (MDP) framework to formulate the multisite partitioning problem as a delay-constrained, least-cost shortest path problem on a state transition graph. However, they depend on the constancy of the Markov values for decision making. Ou et al. [18] proposed a (K + 1)-way partitioning algorithm to keep the component interaction as small as possible. They utilized the Heavy Edge and Light Vertex Matching (HELVM) algorithm, which splits the application graph across multiple servers while satisfying some pre-defined constraints. Unfortunately, HELVM assumes all servers are alike (i.e., they have homogeneous capacities). Niu et al. [3] presented a multi-way partition algorithm called EMSO (Energy-Efficient Multisite Offloading) that formulates the partitioning problem as a weighted directed acyclic graph and traverses the search tree using depth-first search. It then computes the energy consumption of nodes under the current and critical bandwidth. The algorithm determines the most energy-efficient multisite partitioning decision, but it does not provide a guarantee for completion time. Hence, it is not suitable for real-time multimedia applications. This paper focuses on multisite offloading in MCC, a substantial extension of the work presented in [17], which adopted the MDP framework to formulate the multisite partitioning problem as a delay-constrained shortest path problem on a state transition graph. As a new idea, the proposed model is built on a quantified data structure for each site's node and determines the optimal offloading policy according to the optimal time and energy for each state. This is accomplished by using a genetic algorithm in association with MDP.
3 Proposed Model

The main goal of the suggested algorithm is to obtain a fine-grained offloading mechanism that maximizes energy savings with minimal time cost and without affecting mobile performance, by finding the optimal policy (π*) to distribute application components among multiple sites. The main diagram of the suggested model is depicted in Fig. 2 and is divided into three steps. Step 1: the profiler subsystem gathers
the environment data using static analysis and dynamic profiling mechanisms [23–26]. Step 2: the GM-GA engine employs the GA to get the probabilities of the best offloading sites. Step 3: the GM-MCC engine utilizes the best population to achieve the optimal policy (π*). The algorithms are specified in the following subsections.
Fig. 2. The proposed GM-MCC model.
Algorithm 1: GM-MCC Engine
Input: Environment Data (ED).
Output: Optimal Policy (π*).
1. Formulate the problem costs according to the current ED.
2. Get the best transition probability (Pbest). // see Algorithm 2
3. Compute the Optimal Iteration Energy Cost (OIEC) and the Optimal Iteration Time Cost (OITC). // see Algorithm 3
4. Construct π*. // see Algorithm 4
5. Return π*.
Algorithm 1 constructs the optimal policy (π*) after handling the environment data (ED). First, it formulates the problem costs according to the current ED. Secondly, it calls Algorithm 2 to obtain the best probability (Pbest) for the Energy Cost (EC) and ED. Thirdly, it computes the Optimal Iteration Energy Cost (OIEC) and the Optimal Iteration Time Cost (OITC) by Algorithm 3. Then, it constructs the optimal policy (π*) using Algorithm 4 and returns it.
Algorithm 2: GM-Probability Generating
Input: Environment Data (ED), Energy Cost (EC).
Output: The best transition probability (Pbest).
Initialize Pm, Pc, maxGenerationNo.
Generate initial population P0.
Evaluate P0 and find the Iteration Energy Cost (IEC). // call Algorithm 3
Repeat
    For j = 1 to PopulationSize/2 do:
        Select two parents P1 and P2 from P_{i−1}.
        Crossover(P1, P2) and generate two new children C1, C2 with crossover probability Pc.
        Mutate C1, C2 randomly with mutation probability Pm.
        Add C1, C2 to the new population Pnew.
    Evaluate Pnew. // see Algorithm 3
    Sort Pnew.
Until the stop criterion holds (the fitness improvement is 0 or the generation count reaches maxGenerationNo).
Pbest = Elitism(Pbest, Pnew).
Return Pbest.
Algorithm 3: GM-Value Iteration (GM-VI)
Initialize V^π(s) = 0 for all s ∈ S
For k = 1 to n + 1
    For all s ∈ S
        For all a ∈ A_s
            Compute Q_k(s, a)
        End for
        π(s) = arg min_a [Q_k(s, a)]
    End for
End for
Return π
Herein, the generation of a suitable probability (Algorithm 2) is based on genetic algorithms [19]. Algorithm 2 begins by initializing the population, where each chromosome is an individual that represents a possible solution to the problem and is composed of a string of genes. The fitness evaluation of each gene is the minimal mobile energy consumption value. Algorithm 3 calculates the value of every state with its probability based on Bellman's equation [22], which expresses the optimality condition. The solution of the optimality equation represents the minimum expected total cost and the MDP policy π. Note that the MDP policy indicates to which site the execution should migrate given the current state, or whether to stay at the same site. Algorithms such as the Value Iteration Algorithm (VIA) and linear programming can be applied to solve Bellman's optimality equation [20]. We implement the VIA as GM-Value Iteration in our work because of its theoretical simplicity and ease of coding and implementation. Furthermore, Algorithm 4 compares the policies according to delay time and checks whether any change in the energy policy is possible to reach the optimal policy (π*).
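For concreteness, the following Python sketch shows finite-horizon value iteration in the style of GM-VI. The dictionary-based states, actions, transition probabilities, and stage costs are assumed placeholders; in the model they come from the profiled energy/time costs and the (t, i, c) states of Sect. 1.3:

```python
def value_iteration(states, actions, P, C, horizon):
    """P[s][a][s2]: transition probability; C[s][a][s2]: stage cost."""
    V = {s: 0.0 for s in states}   # zero terminal cost-to-go
    policy = {}
    for _ in range(horizon):
        V_new = {}
        for s in states:
            # Bellman backup: expected stage cost plus cost-to-go.
            q = {a: sum(P[s][a][s2] * (C[s][a][s2] + V[s2]) for s2 in states)
                 for a in actions[s]}
            best = min(q, key=q.get)           # minimize expected total cost
            policy[s], V_new[s] = best, q[best]
        V = V_new
    # The policy of the last backup is returned as the stationary policy.
    return policy, V
```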
4 Experimental Results

The simulation platform is a PC with Windows 10, 64-bit, and 8 GB RAM. The simulation is performed to determine the role of the GM-MCC engine and its helpers, GM-Probability Generating and GM-Value Iteration, and how they can enhance the performance of the offloading methodology. The experiments consider three offloading sites (a multisite model) with the environment attributes listed in Table 1. The mobile application graph parameters are listed in Table 2, where there are three application performance behavior categories: computation-intensive (CI), data-intensive (DI), and a random one that changes according to user requests.

Table 1. Sites characteristics.
Site   | Characteristics     | Specification
Mobile | Mobile              | f0 = 500 MHz; ps = 1.3 W; pr = 1.0 W; pc = 0.9 W; pidle = 0.3 W
Site 1 | Local/Private cloud | f1 = 2 GHz; rg = 100 kb/s; rb = 50 kb/s
Site 2 | Local/Private cloud | f2 = 3 GHz; rg = 50 kb/s; rb = 10 kb/s
Site 3 | Remote/Public cloud | f3 = 5 GHz; rg = 50 kb/s; rb = 10 kb/s

fi: CPU clock speed (cycles/second) of offloading site i; ps: mobile power consumption when sending data; pr: mobile power consumption when receiving data; pc: mobile power consumption when computing; pidle: mobile power consumption at idle; rg: data transmission rate in a good channel state; rb: data transmission rate in a bad channel state.
Table 2. Mobile application parameters.
Category | Parameter   | Symbol/Units | Value
CI       | Node weight | wv (Mcycles) | 500–650
         |             | dvs (KB)     | 4–6
         |             | dvr (KB)     | 5–8
DI       | Node weight | wv (Mcycles) | 100–150
         |             | dvs (KB)     | 25–30
         |             | dvr (KB)     | 15–17
Random   | Node weight | wv (Mcycles) | 100–650
         |             | dvs (KB)     | 14–30
         |             | dvr (KB)     | 5–35
         | Edge weight | du,v (KB)    | 100–120

wv: total CPU cycles needed by the instructions of component v; dvs: data (bytes) sent by component v to the database; dvr: data (bytes) received by component v from the database; du,v: data transferred from component u to v.
The experiments were conducted to validate the efficiency of the suggested offloading model under single-site and multisite settings in terms of energy consumption. Fig. 3 shows the amount of energy consumed for each category: CI, DI, and Random. There are savings of 19–40% depending on the graph type. We also compare the energy saving between the first-generation population and the best one. Fig. 4 shows that our algorithm saves 19–40%, while the first generation saves only 3–16%. Furthermore, the GM-MCC algorithm, as shown in Fig. 5, saves more time when it performs application offloading. The saving percentage is 2–48% across the different categories.
Fig. 3. Energy consumption per site and multisite.
Fig. 4. Power saving comparison between the first generation and the best energy value generation.
Fig. 5. GA-MCC time saving.
5 Conclusion and Future Work

We presented a modified partitioning model to find the optimal policy for mobile application offloading. The suggested model employs a genetic algorithm to find populations for a Markov decision model, choosing the best solution to distribute the
mobile application among multiple sites instead of the mobile only or a single cloud site. The simulation results showed improvements in terms of both time and energy. In the future, other evolutionary algorithms could be used to enhance the algorithm's performance.
References

1. De, D.: Mobile Cloud Computing: Architectures, Algorithms and Applications, 1st edn. CRC Press LLC, Florida (2015)
2. Sinha, K., Kulkarni, M.: Techniques for fine-grained, multi-site computation offloading. In: Proceedings of the 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, USA, pp. 184–194 (2011)
3. Niu, R., Song, W., Liu, Y.: An energy-efficient multisite offloading algorithm for mobile devices. Int. J. Distrib. Sens. Netw. 9(3), 1–6 (2013)
4. Hyytiä, E., Spyropoulos, T., Ott, J.: Offload (only) the right jobs: robust offloading using the Markov decision processes. In: Proceedings of IEEE 16th International Symposium on A World of Wireless, Mobile and Multimedia Networks, USA, pp. 1–9 (2015)
5. Balan, K., Gergle, D., Satyanarayanan, M., Herbsleb, J.: Simplifying cyber foraging for mobile devices. In: Proceedings of the 5th International Conference on Mobile Systems, Applications and Services, Puerto Rico, pp. 272–285 (2007)
6. Yuan, Z., Hao, L., Lei, J., Xiaoming, F.: To offload or not to offload: an efficient code partition algorithm for mobile cloud computing. In: Proceedings of the IEEE 1st International Conference on Cloud Networking, France, pp. 80–86 (2012)
7. Ou, S., Yang, K., Liotta, A.: An adaptive multi-constraint partitioning algorithm for offloading in pervasive systems. In: Proceedings of the Fourth Annual IEEE International Conference on Pervasive Computing and Communications, Italy, pp. 10–125 (2006)
8. Veda, A.: Application partitioning: a dynamic, runtime, object-level approach. Master's thesis, Indian Institute of Technology Bombay (2006)
9. Messer, A., Greenberg, I., Bernadat, P., Milojicic, D., Deqing, C., Giuli, T., et al.: Towards a distributed platform for resource-constrained devices. In: Proceedings of the 22nd International Conference on Distributed Computing Systems, Austria, pp. 43–51 (2002)
10. Ahmed, E., Gani, A., Sookhak, M., Hamid, S., Xiam, F.: Application optimization in mobile cloud computing: motivation, taxonomies, and open challenges. J. Netw. Comput. Appl. 52(1), 52–68 (2015)
11. Cuervo, E., Balasubramanian, A., Cho, D.K., Wolman, A., Saroiu, S., Chandram, R., et al.: MAUI: making smartphones last longer with code offload. In: Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, USA, pp. 49–62 (2010)
12. Chun, B.-G., Ihm, S., Maniatis, P., Naik, M., Patti, A.: CloneCloud: elastic execution between mobile device and cloud. In: Proceedings of the Sixth Conference on Computer Systems, Austria, pp. 301–314 (2011)
13. Kovachev, D., Klamma, R.: Framework for computation offloading in mobile cloud computing. Int. J. Interact. Multimedia Artif. Intell. 1(7), 6–15 (2012)
14. Kumar, K., Lu, Y.H.: Cloud computing for mobile users: can offloading computation save energy? Computer 43(4), 51–56 (2010)
15. Zhou, B., Dastjerdi, A., Calheiros, R., Srirama, S., Buyya, R.: A context sensitive offloading scheme for mobile cloud computing service. In: Proceedings of the IEEE 8th International Conference on Cloud Computing, USA, pp. 869–876 (2015)
16. Bakshi, A., Dujodwala, Y.: Securing cloud from DDoS attacks using intrusion detection system in virtual machine. In: Proceedings of the Second International Conference on Communication Software and Networks, Singapore, pp. 260–264 (2010)
17. Terefe, M., Lee, H., Heo, N., Fox, G., Oh, S.: Energy-efficient multisite offloading policy using Markov decision process for mobile cloud computing. Pervasive Mob. Comput. 27(1), 75–89 (2016)
18. Ou, S., Yang, K., Zhang, J.: An effective offloading middleware for pervasive services on mobile devices. Pervasive Mob. Comput. 3(4), 362–385 (2007)
19. Simon, H.A.: The Sciences of the Artificial. MIT Press, Cambridge (2019)
20. Thrun, M.: Projection-Based Clustering Through Self-Organization and Swarm Intelligence: Combining Cluster Analysis with the Visualization of High-Dimensional Data. Springer, Berlin (2018)
21. Zhang, W., Wen, Y., Wu, D.: Energy-efficient scheduling policy for collaborative execution in mobile cloud computing. In: Proceedings of the IEEE INFOCOM, Italy, pp. 190–194 (2013)
22. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
Toward an Efficient CRWSN Node Based on Stochastic Threshold Spectrum Sensing

Reham Kamel Abd El-Aziz 1,2, Ahmad A. Aziz El-Banna 1, HebatAllah Adly 1, and Adly S. Tag Eldien 1

1 Electrical Engineering Department, Faculty of Engineering at Shoubra, Benha University, Cairo, Egypt
[email protected]
2 Electronics and Communication Department, Modern Academy for Engineering and Technology, Cairo, Egypt
Abstract. The demand for wireless sensor networks (WSNs) is growing across different applications. Most WSNs use the unlicensed ISM band, which leads to congestion in that band. On the other hand, minimizing the consumed energy without damaging the quality of service (QoS) of the network is vital in sensor network design. Cognitive radio-based wireless sensor networks (CRWSNs) afford some solutions to the problem of the scarce unlicensed spectrum. Spectrum sensing is the main function of cognitive radio networks. In this paper, to maximize both the sensing accuracy and the energy efficiency of the network, a novel method is proposed that employs adaptive spectrum sensing. Spectrum sensing is performed by the Secondary User (SU) to identify whether the Primary User (PU) is idle; then, to verify that the primary user is actually idle, the secondary user senses the spectrum again in order to provide better protection for the primary user. Because a CRWSN is energy constrained, the adaptive sensing interval can also be modified to optimize the energy efficiency of the network according to the varying activity of the PU. Simulation results are provided to validate the efficacy of the proposed algorithms in enhancing both spectrum sensing performance and energy efficiency.

Keywords: Wireless sensor network · Cognitive radio-based wireless sensor network · Spectrum sensing · Sensing time · Energy efficiency · Sensing performance
1 Introduction

The challenge of spectrum shortage has become more significant due to the massive rise of wireless communication techniques. Owing to restricted frequency deployment systems, the limited available spectrum cannot satisfy the increasing demand for wireless communications [1]. Cognitive radio (CR), with its agile and flexible access to spectrum, has emerged to solve this problem. Based on a software-defined radio, cognitive radio is identified as an intelligent wireless communication platform that is conscious of its surroundings and communicates efficiently with optimal use of the radio spectrum [2].
In this paper, we consider a new WSN technology with cognitive radio, called the Cognitive Radio Wireless Sensor Network (CRWSN). The CR technology enables sensor nodes to identify appropriate licensed bands by implementing spectrum sensing in CRWSNs, where the Secondary Users (SUs) can opportunistically use spectrum gaps, or white spaces, to increase bandwidth utilization while the Primary Users (PUs) are detected as idle. Because SUs must not clash with PUs, it is essential for an SU to track the activity of PUs accurately. The sensing time is a key factor that can improve sensing performance. In general, a longer sensing time will reduce sensing errors and provide the PU with better protection. The optimum sensing time therefore strikes a balance between sensing performance and secondary throughput [3]. In this paper, we employ the sensing optimization results to enhance the efficiency of the CRWSN, making it more robust to noise fluctuations: a stochastic method of threshold level determination helps in overcoming the noise fluctuations, while the sensing time optimization offers more sensing accuracy and leads to high energy efficiency. The remainder of this paper is organized as follows. Section 2 provides a literature review on spectrum sensing using the energy detection technique, in addition to describing the threshold expression under noise uncertainty using a stochastic method. Section 3 illustrates how the proposed stochastic threshold for the energy detection technique is implemented in the design phase of the proposed method, and formulates the energy-efficient optimization problem for the optimum sensing time. In Sect. 4, simulations are used to test the efficiency of the proposed scheme. Finally, Sect. 5 concludes the paper and discusses future work.
2 Basic Concepts

2.1 Energy Detection Based Spectrum Sensing
The most popular spectrum sensing methods currently are matched filter detection [4], energy detection [5, 6], cyclostationary detection [7], and eigenvalue-based detection [8]. Various comparisons between these approaches are widely covered in the literature, e.g. in [9]; we can summarize the main differences between these techniques as follows. The eigenvalue-based detection method does not require knowledge of the PU signal properties, but its computation is complex. The cyclostationary detection method is robust to noise uncertainty and able to distinguish noise from the PU, which leads to high sensing accuracy, but it is complex. The matched filter detection method has the lowest execution time and is robust under low SNR conditions, but PU signal information is needed and the computational complexity is high. The energy detector method, also known as radiometry or periodogram, is the most popular way of detecting spectrum occupancy due to its low implementation complexity and fast performance [10]. To formulate the spectrum sensing problem, a binary hypothesis is used: H0 and H1 denote the idle and busy PU states, while pi and pb specify the H0 and H1 probabilities, respectively. Therefore, pi + pb = 1.
In this paper, we consider the energy detection method to detect the PU operation. The SU compares the energy obtained to a predefined threshold; if the energy obtained is greater than the threshold, the PU is considered busy, and otherwise idle. The energy detector test statistic G(z) can be expressed as follows [3]:

G(z) = \frac{1}{\sigma_v^2} \sum_{n=1}^{N} |z(n)|^2    (1)
where z(n) is the sampled signal and N is the number of samples taken during the sensing process. Under H1, z(n) = f(n) + v(n), where f(n) is the PU's signal, assumed to be an independent and identically distributed (i.i.d.) random process with zero mean and variance σ_f², and v(n) is white Gaussian noise with zero mean and variance σ_v². Under H0, z(n) = v(n). The test statistic follows the central and non-central chi-square distributions with 2N degrees of freedom under hypotheses H0 and H1, respectively. The test statistic can be approximated as Gaussian because the central limit theorem applies when N is large enough [2]. We can then describe the test statistic as follows:

G(z) \sim \begin{cases} \mathcal{N}(N,\ 2N) & H_0 \\ \mathcal{N}\big(N(1+\gamma),\ 2N(1+\gamma)^2\big) & H_1 \end{cases}    (2)
where γ = σ_f²/σ_v² is the Signal-to-Noise Ratio (SNR) received from the PU. On this basis, it is possible to define the probability of detection p_D and the probability of false alarm p_FA as follows:

p_D = P(H_1 \mid H_1) = Q\!\left(\frac{\lambda}{\sqrt{2N}\,(1+\gamma)} - \sqrt{\frac{N}{2}}\right)    (3)

p_{FA} = P(H_1 \mid H_0) = Q\!\left(\frac{\lambda}{\sqrt{2N}} - \sqrt{\frac{N}{2}}\right)    (4)
where λ is the sensing threshold against which the received energy is compared. Specifically, the PU is considered active when the SU senses the PU and the energy obtained is greater than λ; otherwise the PU is considered idle. Q(·) is the Q-function. The number of samples N can be calculated as N = 2 t_s W, where W is the PU signal bandwidth and t_s denotes the sensing time [2]. Using (3), the sensing threshold λ can be obtained as:

\lambda = \sqrt{2N}\,(1+\gamma)\left(Q^{-1}(p_D) + \sqrt{\frac{N}{2}}\right)    (5)
where Q^{-1}(·) is the inverse of the Q-function defined above. Substituting λ into (4), p_FA is obtained as:

p_{FA} = Q\!\left((1+\gamma)\,Q^{-1}(p_D) + \gamma\sqrt{\frac{N}{2}}\right)    (6)
p_D should be greater than or equal to a predefined threshold p_D^th to ensure essential protection for the PU in CRWSNs. Based on (6), since p_D is a fixed value, p_FA decreases as the sensing time increases. Moreover, as p_D decreases, the value of Q^{-1}(p_D) increases, and p_FA decreases as Q^{-1}(p_D) increases. Therefore, p_D is set to p_D^th (p_D = p_D^th) to make sure that the available secondary throughput is maximal.
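The trade-off expressed by Eq. (6) can be made concrete with a small Python sketch. The target p_D, the bandwidth, and the −20 dB sensing SNR used in the example are illustrative assumptions, not prescribed values:

```python
import numpy as np
from scipy.stats import norm

def p_false_alarm(p_d, snr, t_s, bandwidth):
    """Eq. (6) with N = 2 * t_s * W samples."""
    n = 2.0 * t_s * bandwidth
    q_inv = norm.isf(p_d)                      # Q^{-1}(p_D)
    arg = (1.0 + snr) * q_inv + snr * np.sqrt(n / 2.0)
    return norm.sf(arg)                        # Q(arg)

# Example: target p_D = 0.9, W = 6 MHz, assumed sensing SNR of -20 dB.
snr_lin = 10 ** (-20 / 10)
for t_ms in (1.0, 8.2, 20.0):
    print(t_ms, "ms ->", p_false_alarm(0.9, snr_lin, t_ms * 1e-3, 6e6))
```

As the sensing time grows, the printed p_FA values shrink, matching the behaviour described above.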
2.2 Noise Uncertainty Stochastic Approach
The fluctuations in noise are treated as random signals because we cannot accurately determine their values; that is, they carry uncertainty. Let us denote the estimated noise variance as [11]:

10\log \hat{\sigma}_v^2 = \alpha + 10\log \sigma_v^2    (7)

where α follows a uniform distribution over the interval [−U dB, U dB], and at U = 0 there is no ambiguity about the noise. The resulting estimated noise variance falls in the interval [(1/ρ)σ_v², ρσ_v²], with ρ = 10^{U/10}. Under the condition of noise uncertainty, the signal power P_s should be larger than the entire size of the noise power interval to distinguish the presence of a signal from a mere fluctuation of the noise σ_v² [2], i.e.,

P_s > \rho\,\sigma_v^2 - (1/\rho)\,\sigma_v^2 = (\rho - 1/\rho)\,\sigma_v^2    (8)

\mathrm{SNR} = P_s/\sigma_v^2 > (\rho - 1/\rho)    (9)
Under both hypotheses, the mean of the test statistic is related to the noise variance. In practice, the estimated noise variance is used in place of the true noise variance σ_v². In the simulations, noise uncertainty is considered to better match realistic implementation settings. Noise uncertainty causes two major problems in spectrum sensing: the false alarm probability p_FA increases, and the probability of detection p_D decreases. In addition, a fixed-threshold energy detection algorithm suffers degraded quality under noise uncertainty. This indicates that, in the presence of noise uncertainty, a dynamic threshold yields better performance [10].
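To tie Sect. 2 together, the sketch below implements the test statistic of Eq. (1) against the threshold of Eq. (5). The sample count, noise variance, and −20 dB SNR are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

def threshold(p_d_target, snr, n):
    # Eq. (5): lambda = sqrt(2N)(1+gamma)(Q^{-1}(p_D) + sqrt(N/2)).
    return np.sqrt(2 * n) * (1 + snr) * (norm.isf(p_d_target)
                                         + np.sqrt(n / 2.0))

def energy_detect(z, sigma_v2, lam):
    g = np.sum(np.abs(z) ** 2) / sigma_v2   # Eq. (1) test statistic
    return g > lam                          # True -> PU declared busy

rng = np.random.default_rng(0)
n, sigma_v2, snr = 98_400, 1.0, 0.01        # N = 2*t_s*W; gamma = -20 dB
lam = threshold(0.9, snr, n)
noise = rng.normal(0.0, np.sqrt(sigma_v2), n)
print(energy_detect(noise, sigma_v2, lam))  # mostly False under H0
```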
3 System Model

In this paper, we consider a typical CRWSN consisting of a single PU and a secondary transmitter-receiver pair, as shown in Fig. 1. In addition, other source and sink nodes exist for data transmission, and the main spectrum access links are the PU link, i.e. the licensed band, and the SU link, i.e. the opportunistic spectrum.
Fig. 1. Proposed system model.
3.1 The Proposed Threshold Expression

The Old Threshold Under Noise Uncertainty. The value of the threshold λ0 can be determined as follows. Under hypothesis H0, which corresponds to the presence of noise only, the noise samples are i.i.d. Gaussian random variables with zero mean and variance σ_v². When the number of samples is large enough, by the Central Limit Theorem (CLT) the noise statistic approaches a Gaussian distribution with mean μ_v and variance σ_v², which can be determined from simulation. The threshold value λ0 is then [2, 6]:

\lambda_0 = \mu_v + \sigma_v \, Q^{-1}\!\left(1 - (1 - P_{FA})^{1/N}\right)    (10)
Stochastic Threshold. In conventional single-threshold detection with U dB uncertainty, the false alarm probability p_FA increases if the actual noise variance σ_v² is greater than the estimated noise variance. The decision threshold λ could be selected for an optimal trade-off between p_D and p_FA; obtaining the optimum threshold value λ_S requires knowledge of both the noise intensity and the signal strength. The noise power can be estimated, but obtaining the signal power requires the transmission and propagation characteristics. In practice, the threshold is usually chosen to fulfill a certain p_FA, which requires knowledge of the noise power only. When the signal SNR is small, the situation is similar to hypothesis H0. If the actual σ_v² is lower than the estimated noise variance, the test statistic falls below the threshold more often, which is equivalent to an increase of the threshold, and the detection performance degrades accordingly. The high and low threshold values can be set using the maximum and minimum noise uncertainty values, respectively [12], as follows:
\lambda_H = \lambda_0 + U, \qquad \lambda_L = \lambda_0 - U    (11)

There are three cases for the signal decision according to Eq. (11): 1) if G(z) > λ_H, the signal is present; 2) if G(z) < λ_L, the signal is absent; 3) if λ_L < G(z) < λ_H, no decision is made. In this last scenario, the sensing fails and the receiver requests a new spectrum sensing from the cognitive user [12]. To overcome the problem of G(z) lying between λ_L and λ_H, the suggested stochastic threshold λ_S investigates various threshold values between the lower and higher thresholds and draws their histogram to gain insight, as follows. After building the histogram, the most repeated threshold value, i.e., the one with the greatest histogram count, is selected for use when the signal lies between λ_L and λ_H; λ_S is then defined as:

\lambda_S = \operatorname{Max}_n\left(\sum_{i=1}^{k} \lambda_i\right)    (12)
where n is the total number of observations, i is the iteration index, and λ_i is a histogram function that counts the number of observations falling between the threshold values (known as bins). Additionally, k is the total number of bins, k = (λ_H − λ_L)/h, where h is the bin size. The threshold value whose summation is largest is selected and used as the stochastic threshold. A sketch of this selection procedure is given below.
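A minimal Python sketch of this histogram-based selection follows. The uniform draw of the uncertainty term α mirrors Eq. (7); the trial count, bin count, and the use of per-trial Eq. (10) thresholds as the histogrammed candidates are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

def stochastic_threshold(mu_v, sigma_v, p_fa, n_samples, u_db,
                         n_trials=1000, n_bins=50):
    # Per-trial uncertainty alpha ~ U(-U, U) dB on the variance, Eq. (7);
    # 10**(alpha/20) converts the dB variance offset to a std factor.
    alpha = np.random.uniform(-u_db, u_db, n_trials)
    sigma_est = sigma_v * 10.0 ** (alpha / 20.0)
    # Per-trial threshold from Eq. (10) with the estimated noise std.
    q_arg = 1.0 - (1.0 - p_fa) ** (1.0 / n_samples)
    lam = mu_v + sigma_est * norm.isf(q_arg)
    # Eq. (12): pick the centre of the most populated histogram bin.
    counts, edges = np.histogram(lam, bins=n_bins)
    best = np.argmax(counts)
    return 0.5 * (edges[best] + edges[best + 1])
```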
3.2 The Proposed Sensing Time Scheme
Following the frame structure in [3], time is divided equally into frames, each consisting of a sensing phase followed by a data transmission phase. The SU's spectrum sensing is assumed to be imperfect, causing sensing errors (i.e., false alarms and miss detections). During the sensing phase, the SU performs spectrum sensing to detect the behavior of the PU. If the sensing result indicates that the PU is idle, the SU always has data to transmit during the data transmission phase; otherwise, the SU remains silent. To simplify the problem, the PU's operation is assumed to follow a time-framed structure; in other words, during one frame the spectrum is either occupied by the PU or vacant. Figure 2 displays the spectrum sensing frame structure, where T represents the frame time, ts denotes the spectrum sensing time, and D0 and D1 denote the sensing outcomes when the PU is found idle and active, respectively. In the proposed scheme, the SU conducts the second spectrum sensing dynamically based on the first spectrum sensing result. In particular, the SU performs spectrum sensing for time ts and keeps quiet if the sensing output is D1. If the sensing result is D0, the SU performs spectrum sensing for time ts again to verify the absence of the PU for better protection. The SU transmits data if the second sensing result is still D0; otherwise it keeps quiet. Moreover, when the result of the second sensing differs from the first, the final sensing decision is taken from the second sensing result.
Fig. 2. Structure of spectrum sensing frame.
It is possible to save the energy needed for spectrum sensing at the start of each frame by taking the primary activity level into account: the sensing interval is expanded to several frames if the final sensing results indicate that the primary user is active. As a result of improving the accuracy of spectrum sensing, the number of incorrect data transmissions is reduced, which in turn increases network energy efficiency by avoiding the excessive energy consumption of incorrect data transmissions. Six possible cases based on the results of the first and second spectrum sensing are shown in Fig. 3.
Fig. 3. Sensing time and sensing interval frame structure.
Based on the six cases, the PU activity is successfully detected in Cases 1 and 2. Case 3 triggers the miss detection problem, while Cases 4 and 6 contribute to the false alarm problem. Only in Case 5 is a fully correct result achieved. Of the discussed cases, only Case 3 leads to the miss detection problem, so p_m^1 can be specified as:

p_m^1 = p_b (1 - p_D)^2    (13)
while p_m^2 is identified as:

p_m^2 = p_b (1 - p_D)    (14)
Based on Fig. 3, when data is transmitted (i.e., Case 3 and Case 5), the SU must have executed spectrum sensing twice. As discussed above, data transmission is valid only when the primary user's actual state is U0. The invalid throughput probability resulting from miss detection, p_{x1}, and the valid throughput probability, p_{x2}, can be estimated by the following equations [3]:

p_{x1} = p_b (1 - p_D)^2    (15)

p_{x2} = p_i (1 - p_{FA})^2    (16)
Thus, the probability that the SU transmits data can be determined using the following expression, as modeled in [3]:

P_x = p_{x1} + p_{x2}    (17)
As shown in Fig. 3, the SU may be silent in two situations: 1) the SU performs spectrum sensing once, the result is D1, and it becomes silent for n frames, as in Cases 1 and 4; 2) the SU performs spectrum sensing once with result D0, then performs the second spectrum sensing with result D1, and becomes silent for n frames, as in Cases 2 and 6. The probabilities of performing spectrum sensing once and twice are p_{y1} and p_{y2}, expressed as:

p_{y1} = p_b\, p_D + p_i\, p_{FA}    (18)

p_{y2} = p_b (1 - p_D)\, p_D + p_i (1 - p_{FA})\, p_{FA}    (19)
when D1 is the final sensing result. Thus, P_y, the probability that the final sensing result is D1, is given as follows:

p_y = p_{y1} + p_{y2}    (20)
When P_x and P_y are known, and assuming two successive frames, the throughput can be discussed. There are four situations for the secondary throughput: 1) transmit data, then remain silent for n frames; 2) remain silent for n frames, then remain silent for n frames; 3) transmit data, then transmit data; 4) remain silent for n frames, then transmit data. Situations 1, 3, and 4 return valid secondary throughput ST1, ST3, and ST4, respectively, whose equations are defined as follows [3]:

ST_1 = ST_4 = \frac{p_{x2}\, p_y\, (T - t_s)\, C}{n + 1}    (21)

ST_3 = p_{x2}^2 (T - t_s) C + p_{x1}\, p_{x2} (T - t_s) C    (22)
where C denotes the channel capacity of the SU without PU interference, described according to the Shannon theorem as C = log2(1 + γ_s), where γ_s indicates the SU transmitter's received SNR. Furthermore, n gives the number of frames that the SU stays silent. For ST3, the valid throughput of the two frames (the current and the next) is represented by the term p_{x2}^2 (T − t_s)C, while the term p_{x1} p_{x2} (T − t_s)C is the throughput achieved by one frame when the other frame suffers a miss detection. The total average valid throughput per average frame, ST, is:

ST(t_s) = ST_1 + ST_3 + ST_4    (23)
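For illustration, the sketch below composes Eqs. (15)-(23) into a single throughput estimate in Python; the numeric arguments in the example call are assumptions, not the paper's results:

```python
import math

def secondary_throughput(p_i, p_d, p_fa, T, t_s, snr_s, n):
    p_b = 1.0 - p_i
    C = math.log2(1.0 + snr_s)                 # Shannon capacity
    p_x1 = p_b * (1.0 - p_d) ** 2              # Eq. (15)
    p_x2 = p_i * (1.0 - p_fa) ** 2             # Eq. (16)
    p_y1 = p_b * p_d + p_i * p_fa              # Eq. (18)
    p_y2 = p_b * (1 - p_d) * p_d + p_i * (1 - p_fa) * p_fa  # Eq. (19)
    p_y = p_y1 + p_y2                          # Eq. (20)
    st1 = st4 = p_x2 * p_y * (T - t_s) * C / (n + 1)              # Eq. (21)
    st3 = p_x2 ** 2 * (T - t_s) * C + p_x1 * p_x2 * (T - t_s) * C  # Eq. (22)
    return st1 + st3 + st4                     # Eq. (23)

# Assumed values, e.g. gamma_s = 20 dB (linear 100), t_s = 8.2 ms.
print(secondary_throughput(p_i=0.6, p_d=0.9, p_fa=0.1,
                           T=0.2, t_s=0.0082, snr_s=100.0, n=1))
```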
Since n is the number of frames, it is restricted to positive integers; its value can be modified depending on the PU activity. The values of p_i and p_b determine the value of n: as the PU gets busier, the sensing interval increases, so n depends essentially on the probability that the PU's busy state will continue. Assume that the PU's current state is U1; then the probability that it will stay occupied for exactly n frames, p_s(n), is determined by:

p_s(n) = p_b^{\,n-1}(1 - p_b), \quad n \in \{1, 2, 3, \ldots\}    (24)
Then the probability of the PU being occupied for at most n frames, P_s(n), can be described as:

P_s(n) = \sum_{i=1}^{n} p_s(i)    (25)
A threshold w is specified for P_s(n), with 0 ≤ w ≤ 1. The value of n depends on w according to the following equation:

n = \min\{\, n : P_s(n) \ge w \,\}    (26)
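A minimal Python sketch of Eqs. (24)-(26) follows; the example p_b values are assumptions for illustration:

```python
def sensing_interval(p_b, w):
    """Smallest n with P_s(n) >= w: Eqs. (24)-(26)."""
    n, cumulative = 0, 0.0
    while cumulative < w:
        n += 1
        cumulative += p_b ** (n - 1) * (1.0 - p_b)   # Eq. (24) term
    return n

# With w = 0.5 as in the simulations: a busier PU (larger p_b)
# yields a longer sensing interval.
for p_b in (0.2, 0.5, 0.8):
    print(p_b, "->", sensing_interval(p_b, 0.5))
```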
Given the two proposed schemes, the stochastic threshold and the sensing time, the proposed CRWSN node uses them as follows. When the CRWSN node is turned on, it discovers the working environment to determine the appropriate value of the stochastic threshold, which is then used as an offline threshold for all spectrum sensing operations, whether performed as a first or a second sensing; this enhances the sensing performance.
4 Performance Evaluation

This section investigates the performance of the proposed scheme using MATLAB. The proposed scheme is compared with two other schemes: the joint optimization of sensing time and adaptive sensing threshold discussed in [13], and the hybrid threshold discussed in [14].
4.1 Simulation Parameters
The simulation scenario is a basic CRWSN consisting of a single PU and a secondary link with a randomly placed transmitter-receiver sensor node pair within the PU's communication range. The licensed band occupied by the PU is assigned to the SUs. The other simulation parameters are: p_D^th = 0.9, W = 6 MHz, T = 0.2 s, w = 0.5, γ_s = 20 dB, γ = −20 dB, and C = 6.6582 bits/s/Hz.
4.2 Simulation Results
Figure 4-a illustrates the false alarm probability and the missed detection probability of the old threshold expression for different SNRs at U = 0 dB and U = 1 dB noise uncertainty. The missed detection probability is higher in the U = 1 dB case than in the U = 0 dB case, indicating that the sensing result using the old threshold is affected by noise uncertainty; the false alarm probability also increases.
Fig. 4. a) Missed detection and false alarm probabilities of the old threshold for different SNRs, at U = 0 dB and U = 1 dB noise uncertainty, respectively. b) Histogram of the stochastic threshold in dB in a noise uncertainty environment.
Figure 4-b shows the histogram of the number of trials against the threshold levels in dB under ambient noise uncertainty. The value with the highest number of iterations occurs at a stochastic threshold of −28.2 dB, so that value was selected. Figure 5-a plots the false alarm and missed detection probabilities of the stochastic and double thresholds versus SNR for a noise uncertainty of 1 dB; the two schemes have approximately the same performance, with the stochastic threshold having the advantage of resolving the ambiguity when the received signal level lies midway between the higher and lower thresholds. Figure 5-b demonstrates the probability of missed detection against SNR using the old, double, and stochastic thresholds for noise uncertainties U = 0 dB and U = 1 dB. Additionally, a missed detection probability Pm = 0.1 was obtained, in accordance with the 802.22 standard's maximum acceptable value of PFA = 0.1, at an 8.2 ms sensing duration
Fig. 5. a) False alarm and missed detection probabilities of the stochastic and double thresholds for different SNRs at U = 1 dB noise uncertainty. b) Missed detection probability for the old threshold at U = 0 dB and U = 1 dB noise uncertainty, and for the double and stochastic thresholds under U = 1 dB noise uncertainty, for different SNRs.
for the original threshold at an SNR of −25.5 dB at U = 0 dB, and at −25 dB SNR at U = 1 dB only with PFA = 0.4, which is not accepted by the standard. For U = 1 dB, the target PFA = 0.1 is obtained at −22.6 dB SNR for the stochastic threshold and −22.5 dB SNR for the double threshold. When the noise uncertainty U is 0 dB, i.e., accurate noise information exists, the old threshold outperforms the stochastic and double thresholds by 3 dB, as illustrated in Fig. 5-a. Furthermore, if the expected noise uncertainty is U = 1 dB, the performance of the original threshold is worse than that of the stochastic and double thresholds with respect to the PFA = 0.1 requirement. Moreover, the stochastic threshold outperforms the double threshold by 0.1 dB, as seen in Fig. 5-b. To cover the various levels of noise uncertainty that occur in different operating environments, we conducted a study to compute the corresponding stochastic threshold values. Figure 6-a summarizes the results of this study: increasing the noise uncertainty reduces the stochastic threshold. Prior knowledge of these values in the initialization phase helps the node save time in the subsequent operation phases. Figure 6-b illustrates the sensing interval n as a function of pi when the threshold w is 0.5 and the sensing output is D1. As can be seen, as pi increases, the sensing interval n shrinks. If pi is greater than 0.5, the sensing interval n is always 1; in other words, for pi > 0.5 the SU executes spectrum sensing at the start of each frame. As the sensing interval n shrinks with increasing pi, the SU executes spectrum sensing more frequently and there are additional chances to utilize spectrum holes, so the secondary throughput is enhanced. Moreover, as pi decreases, the sensing interval increases, thereby reducing the spectrum sensing energy consumption and boosting energy efficiency. In effect, the threshold w governs the trade-off between energy efficiency and secondary throughput. Figure 7-a shows the comparison of the three schemes in terms of secondary throughput for fixed T = 0.2 s as a function of pi. As is obvious, the proposed scheme's secondary throughput is lower than those suggested in [13] and [14]. This is
Fig. 6. a) Stochastic threshold values for different noise uncertainties. b) The sensing interval as a function of pi.
because the SU must execute the sensing again to confirm the initial sensing result when it indicates that the PU is idle, which reduces the available data transmission time. Moreover, the sensing interval becomes extended when the primary user is more active and the sensing outcomes show that the primary user is busy. This decreases the energy consumed by spectrum sensing; however, while the SU stays silent for n frames, chances for data transmission are also wasted. These are the two key reasons why the suggested scheme's secondary throughput is lower than that of the other two schemes.
Fig. 7. a) Secondary throughput comparison. b) Miss detection probability comparison.
Figure 7-b compares the probability of miss detection p_m^1 of the proposed scheme with the existing ones in [13] and [14]. The suggested scheme's probability of miss detection is consistently smaller than that of the two other schemes. Because the SU executes spectrum sensing only once in [13] and [14], the miss probability of those two approaches equals p_m^2; according to Eq. (14), the ratio between p_m^1 and p_m^2 is 1 − p_D. Since p_D is set to the same value as in the two other techniques (i.e., 0.9),
with our suggested model the probability of miss detection is reduced by a factor of 10. In particular, as pi becomes smaller, the difference between p_m^1 and p_m^2 becomes significantly larger. Even though more time is spent detecting the primary user when the sensing result is D0, and the secondary throughput per average frame is lower, the primary user gets more protection because of the lower probability of miss detection. Therefore, the decrease in the number of incorrect data transmissions reduces energy consumption, which increases the energy efficiency of the network and extends the network lifespan. Within the extended lifetime, the network can be used more, compensating for the sacrificed secondary throughput.
5 Conclusion and Future Work

This paper presents a novel model for a CRWSN node, which determines the proposed stochastic threshold in the first run of the node, helping to overcome the noise uncertainty in the sensing environment and thereby enhancing the sensing result. The node then executes sensing, based on the result of the first sensing, for either one or two intervals. This allows the CRWSN node, as a secondary user, to conduct spectrum sensing again to verify that the primary user is in fact silent when the first sensing result indicates so. Furthermore, the proposed solution respects the PU's level of activity: the sensing interval may be varied to further enhance the energy efficiency depending on the primary user's activity level. Simulation analysis validates that the proposed scheme achieves higher energy efficiency and better spectrum sensing performance; in particular, the proposed stochastic threshold, in the presence of 1 dB noise uncertainty, exceeds the double threshold at PFA = 0.1 and a sensing time of 8.2 ms by more than 0.1 dB. As future work, the proposed CRWSN node model can be tried in a cooperative sensing situation to study its effect on cooperative sensing performance.
References

1. Ivanov, A., Dandanov, N., Christoff, N., Poulkov, V.: Modern spectrum sensing techniques for cognitive radio networks: practical implementation and performance evaluation. Int. J. Comput. Inf. Eng. 12(7), 572–577 (2018)
2. Rabie, A., Yousry, H., Bayomy, M.: Stochastic threshold for spectrum sensing of professional wireless microphone systems. Int. J. Comput. Sci. Netw. 4(4) (2015)
3. Kong, F., Cho, J., Lee, B.: Optimizing spectrum sensing time with adaptive sensing interval for energy-efficient CRSNs. IEEE Sens. J. 17(22), 7578–7588 (2017)
4. Lee, J.W., Kim, J.H., Oh, H.J., Hwang, S.H.: Energy detector using hybrid threshold in cognitive radio systems. IEICE Trans. Commun. E92-B(10), 3079–3083 (2009)
5. Kay, S.M.: Fundamentals of Statistical Signal Processing: Detection Theory. Prentice-Hall, Upper Saddle River (1998)
6. Atapattu, S., Tellambura, C., Jiang, H.: Analysis of area under the ROC curve of energy detection. IEEE Trans. Wireless Commun. 9(3), 1216–1225 (2010)
7. Sutton, P.D., Nolan, K.E., Doyle, L.E.: Cyclostationary signatures in practical cognitive radio applications. IEEE J. Sel. Areas Commun. 26(1), 13–24 (2008)
8. de Souza Lima Moreira, G., de Souza, R.A.A.: On the throughput of cognitive radio networks using eigenvalue-based cooperative spectrum sensing under complex Nakagami-m fading. In: Proceedings of the International Symposium on Networks, Computers and Communications (ISNCC), pp. 1–6, May 2016
9. Kyryk, M., Matiishyn, L., Yanyshyn, V., Havronskyy, V.: Performance comparison of cognitive radio networks spectrum sensing methods. In: Proceedings of the International Conference on Modern Problems of Radio Engineering, Telecommunications and Computer Science (TCSET), pp. 597–600, February 2016
10. Farag, H.M., Ehab, M.: An efficient dynamic thresholds energy detection technique for cognitive radio spectrum sensing. In: Proceedings of the Computer Engineering Conference (ICENCO), pp. 139–144, December 2014
11. Prashob, R.N., Vinod, A.P., Krishna, A.K.: An adaptive threshold based energy detector for spectrum sensing in cognitive radios at low SNR. In: The 7th IEEE VTS Asia Pacific Wireless Communication Symposium (2010)
12. Xie, S., Shen, L.: Double-threshold energy detection of spectrum sensing for cognitive radio under noise uncertainty environment. In: International Conference on Wireless Communications & Signal Processing (2012)
13. Luo, L., Roy, S.: Efficient spectrum sensing for cognitive radio networks via joint optimization of sensing threshold and duration. IEEE Trans. Commun. 60(10), 2851–2860 (2012)
14. Li, X., Cao, J., Ji, Q., Hei, Y.: Energy efficient techniques with sensing time optimization in cognitive radio networks. In: Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), pp. 25–28, April 2013
Video Captioning Using Attention Based Visual Fusion with Bi-temporal Context and Bi-modal Semantic Feature Learning

Noorhan K. Fawzy, Mohammed A. Marey, and Mostafa M. Aref

Faculty of Computer and Information Sciences, Ain Shames University, Cairo, Egypt
{norhan.khaled,mohammed.marey,mostafa.aref}@cis.asu.edu.eg
Abstract. Video captioning is a recently emerging task that describes a video by generating a natural language sentence. In practice, videos are untrimmed, and both localizing and describing the event of interest are crucial for many vision-based real-life applications. This paper proposes a deep neural network framework for effective video event localization using a bidirectional Long Short-Term Memory (LSTM) that encodes past, current, and future context information. Our framework adopts an encoder-decoder network that accepts, for captioning, the event proposal with the highest temporal intersection with the ground truth. Our encoder is fed with attentively fused visual features, extracted by a two-stream 3D convolutional neural network, along with the proposal's context information, to generate an effective representation. Our decoder accepts learnt semantic features that represent bi-modal (two-mode) high-level semantic concepts. We conduct experiments demonstrating that utilizing both semantic features and contextual information provides better captioning performance.

Keywords: Video to natural language · Encoder-decoder · Recurrent neural network · Attention-based LSTM · Bidirectional LSTM · Temporal action localization · Deep learning
1 Introduction

Video captioning is a machine intelligence task whose ultimate goal is to generate a natural language description for the content of a video clip, just like humans do. Many recent real-life intelligent applications, such as video search and retrieval and automatic video subtitling for supporting blind people, have created an emerging need for this capability. Recent large-scale activity datasets [9, 10] have highlighted the success of many models that solve the video action recognition task. Such models output labels like jumping, dancing, or sporting archery. The limited level of detail provided by those models is a key limitation. To compensate for this limitation, many subsequent works in the research community [1–3] have embraced the task of explaining video semantics using natural language sentences. These works would
likely describe a video with an informative sentence such as "A man is shooting a bow towards a target". In practice, videos are not pre-segmented into short clips that contain the action of interest to be described in natural language. The research community has recently been striving to develop models that are able to identify all events in a single pass of the video [4]. We design a temporal localization module and a video captioning module such that, for each input video, a descriptive sentence along with its time location is generated automatically. The main tasks that our framework covers can be listed as:

1. The detection of the temporal event segment. For this, we adopt a bidirectional single-pass temporal event proposal generation model that encodes past, current, and future video information (Fig. 4).
2. The representation of this event segment. It is important to note that representing the target event within the video sequence in isolation, without considering its contextual information, will not produce a consistent video caption. Considering the temporally neighbouring video contents of the target event is necessary, since they provide antecedents and consequences for understanding the event (Fig. 3).
3. Generating a description for the detected event. Linking visual features to textual captions directly, as done in many works [1–3], may neglect many rich intermediate and high-level descriptions, such as people, objects, scenes, and actions. To address this issue, this study extracts two types (bi-modal) of high-level semantic concepts: concepts that describe objects and backgrounds/scenes are referred to as static semantic concepts, whereas concepts that describe dynamic actions are referred to as dynamic semantic concepts.

In Sect. 2 we discuss the related works. Our proposed framework is described in Sect. 3. Sections 4 and 5 discuss the performance evaluation and implementation details, respectively. Finally, our conclusion and future work are given in Sect. 6.
2 Related Works

Our video captioning framework requires both temporal localization and descriptions for all events that may happen in a video. We review related works on these two tasks.
2.1 Action Recognition and Localization
Deep convolutional networks with 3D kernels, such as the 3D ResNet [5, 6] and 3D Inception [7, 8] architectures, are able to extract motion characteristics from input frame volumes over time. They achieve surprisingly high recognition performance on a variety of action datasets [9, 10]. The combination of these 3D convolutional networks with temporally recurrent layers such as LSTMs [6, 8, 12] has also shown great improvement in performance. [13] argued that untrimmed videos contain target actions that usually occupy only a small portion of the whole video stream. Some current methods for temporal action localization [13, 14] rely on applying action classifiers at every time
location and at multiple temporal scales, in a temporal sliding window fashion. The major drawbacks of these methods are their high computational complexity, the fact that the scales of the sliding windows are predetermined from the statistics of the dataset, and that the generated temporal boundaries are approximate and fixed during classification. Recent studies [4, 15, 27] have worked on avoiding the high computational cost of sliding windows by using a CNN-RNN architecture. The work in [4] followed an end-to-end proposal generation model. Their model scans an untrimmed video stream of length L frames divided into T = L/d non-overlapping time steps, where d = 16 frames, as in Fig. 1 (a). Each time step is encoded with the activations from the top layer of a 3D convolutional network pre-trained for action classification (the C3D network [16]), as in Fig. 1 (b). A recurrent neural network (RNN) models the sequential information as a discriminative sequence of hidden states, as in Fig. 1 (c). The hidden representation at each time step is used to produce confidence scores for multiple proposals with multiple time scales that all end at time t, as illustrated in Fig. 1 (d). However, these methods simply neglect future event context and only encode past and current context information when predicting proposals.
2.2 Action Captioning
Orthogonal to the work studying action proposals, early deep learning approaches [1, 2] directly connected video with language using the unified encoder-decoder framework. Their aim was to translate video pixels to natural language with a single deep neural network. A convolutional neural network (CNN) such as ResNet [5], C3D [15] or a two-stream network [17] was used as an encoder for extracting features from the video. Mean pooling of the features across all frames then produces a fixed-length vector representation, which is a simple and reasonable semantic representation for short video clips, Fig. 2 (a). Translation to natural language is done via a stacked two-layer recurrent neural network (RNN), typically implemented with long short-term memory (LSTM) units [18, 19], Fig. 2 (b). However, these approaches treat all frame features of the video equally, without any particular focus. Using a single temporally collapsed feature vector to represent such videos, Fig. 2 (a), leads to an incoherent fusion of the dependencies and ordering of activities within an event. Extracting the temporal structure implied within the input video is important. Many follow-up works such as [20, 21] explore improving the model's capability of encoding both local motion features and global temporal structure. They proposed a novel spatio-temporal 3D CNN that accepts a 3D spatio-temporal grid of cuboids. These cuboids encode histograms of oriented gradients, oriented flow and motion boundaries (HoG, HoF, and MbH) [22]. They argued that average pooling these local temporal motion features would collapse the sequence and destroy the model's ability to exploit the video's global temporal structure. For this reason a soft attention mechanism was adopted, which permits the RNN decoder to weight each temporal feature vector. Although the attention-based approaches have achieved excellent results, they still ignore high-level video concepts/attributes. The work in [23] extracted high-level explicit semantic concepts, which further improved visual captioning.
3 Framework
The framework of our approach consists of three components: 1) visual feature extraction, 2) event proposal generation, and 3) captioning (sentence generation). In this section, we introduce each component of the framework in detail.
3.1 Visual Features Extraction
The input video with L frames is discretized by dividing it into T non-overlapping time steps, where each time step has a size of d = 16 frames. We adopt a two-stream [16] 3D residual neural network for the extraction of spatio-temporal features [28] from each clip: one stream learns to extract motion features from RGB frames using a 3D ResNet-18 [5], and the other learns abstract high-level motion features from motion boundary frames using a 3D ResNeXt-101 [5]. Motion boundary frames carry optimized, smoothed optical flow inputs. The reason for using the two-stream 3D approach is that the study in [7] found that using optical flow as input to 3D CNNs yields a higher level of performance than can be obtained from RGB inputs, but that the best performance is achieved by combining both. The reason we use motion boundaries is that optical flow represents the absolute motion between two frames, which contains motion from both foreground objects and background camera motion. Motion boundaries are the derivative of the flow: in many cases, camera motion is locally translational and varies smoothly across the image plane, so it is eliminated in the motion boundary frames.
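As an illustration of this two-stream step, the sketch below extracts one fused feature vector per 16-frame time step using the 3D ResNets shipped with torchvision. It is a minimal sketch under our own assumptions: r3d_18 stands in for both the 3D ResNet-18 and the 3D ResNeXt-101 of the paper, and the motion boundary input is assumed to be a two-channel (x/y derivative) volume.

import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

class TwoStreamExtractor(nn.Module):
    """Per-time-step feature extractor; a sketch, not the authors' implementation."""
    def __init__(self):
        super().__init__()
        rgb = r3d_18(pretrained=True)            # stand-in for the 3D ResNet-18 stream
        self.rgb_stream = nn.Sequential(*list(rgb.children())[:-1])
        mb = r3d_18(pretrained=False)            # stand-in for the 3D ResNeXt-101 stream
        # Re-wire the stem for 2-channel motion boundary input (our assumption).
        mb.stem[0] = nn.Conv3d(2, 64, kernel_size=(3, 7, 7),
                               stride=(1, 2, 2), padding=(1, 3, 3), bias=False)
        self.mb_stream = nn.Sequential(*list(mb.children())[:-1])

    def forward(self, rgb_clip, mb_clip):
        # rgb_clip: (B, 3, 16, H, W); mb_clip: (B, 2, 16, H, W)
        f_rgb = self.rgb_stream(rgb_clip).flatten(1)   # (B, 512)
        f_mb = self.mb_stream(mb_clip).flatten(1)      # (B, 512)
        return torch.cat([f_rgb, f_mb], dim=1)         # fused per-time-step feature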
3.2 Proposal Generation
This pipeline is an updated version of the work in [4], which used only a single-direction RNN. The fused two-stream features are fed to a bidirectional LSTM recurrent sequence model. At each time step t, we pass the hidden state, which encodes the sequence of visual features observed up to time t, through a fully connected layer with a sigmoid nonlinearity, as in Eq. 1 and Eq. 2. This produces scores for multiple (K) proposals, as in Fig. 1 (d). The proposals have different time scales with a fixed ending boundary t and K confidence scores. This is done for each LSTM direction (forward/backward) independently, Fig. 4:

{C_i^→}_{i=1}^{K} = σ(W_c h_t^→ + b_c)   (1)

{C_i^←}_{i=1}^{K} = σ(W_c h_t^← + b_c)   (2)

Finally, after the passes from the two directions, we obtain a number N of proposals collected from all time steps of both directions. We fuse the two sets of scores for the
same proposals, yielding the final scores, as in Eq. 3. Thus, at each time step t, we take both the forward confidence score C_i^→ and the backward confidence score C_i^← of each proposal to compute the final proposal confidence score C_p. A proposal with a score larger than a threshold is finally selected for further captioning.
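A minimal sketch of this bidirectional scoring scheme (Eqs. 1–3) is given below; the feature dimension, hidden size and K are placeholders, and the module name is ours, not the authors'.

import torch
import torch.nn as nn

class ProposalScorer(nn.Module):
    def __init__(self, feat_dim=1024, hidden=512, k=8):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.fc_fwd = nn.Linear(hidden, k)     # K proposal scores per time step (forward)
        self.fc_bwd = nn.Linear(hidden, k)     # K proposal scores per time step (backward)

    def forward(self, feats):                  # feats: (B, T, feat_dim)
        h, _ = self.lstm(feats)                # (B, T, 2 * hidden)
        h_fwd, h_bwd = h.chunk(2, dim=-1)      # split forward/backward hidden states
        c_fwd = torch.sigmoid(self.fc_fwd(h_fwd))   # Eq. (1)
        c_bwd = torch.sigmoid(self.fc_bwd(h_bwd))   # Eq. (2)
        return c_fwd * c_bwd                   # Eq. (3): fused confidence per proposal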
Fig. 1. Single pass temporal proposal generation.
Fig. 2. (a) CNN encoder for extracting frame visual features. (b) Collapsing the features across the entire video through mean pooling and passing them to the stacked LSTM decoder.
Fig. 3. Fusing both local context information and target event’s content for generating caption words.
C_t = {C_i^→ × C_i^←}_{i=1}^{K}   (3)
The context of a proposal, meaning its future and past context, can be obtained from the hidden states h_f^→ and h_p^← of the final forward and backward LSTM layers respectively, as illustrated in Fig. 4. An action proposal has a start time step S and an end time step E, Fig. 5.
3.3 Caption Generation
To implement caption generation using semantic features, a dynamic semantic concept network (DSC-N) is built upon an encoding LSTM. It accepts the visual features from the temporal stream at each time step within the action proposal. Similarly, the static semantic concept network (SSC-N) accepts visual features from the spatial stream. The last output of each of the dynamic and static semantic concept LSTM networks is passed to a fully connected layer with a sigmoid activation function. Each outputs a probability distribution, where p_d gives the probabilities of the set of dynamic concepts (verbs) and p_s gives the probabilities of the set of static concepts (nouns) extracted from the dataset. We follow the encoder-decoder framework using LSTMs for generating the captions. The potential power of the encoder-decoder lies in the fact that it is a generative neural network that can map sequences of different lengths to each other. The input to the LSTM sequence decoder is obtained by applying an attention mechanism on the
concatenated semantic concept features, referred to as Attended Semantic Concepts (ASC), to treat each semantic feature differently at each time step, Fig. 5. The dynamic and static semantic concepts are concatenated into E_t and serve as inputs to the attention layer. A weight matrix W_a, reflecting which semantic concept features to focus on at the current time step t, is learnt within the attention layer from the input semantic features. The weights of the semantic concept features c_t can be calculated using Eq. 4, where b_a is the bias:

c_t = softmax(W_a · E_t + b_a)   (4)
The converted semantic concept features c_t serve as inputs to the decoding LSTM. Conventionally, the attention should be directed to an object if the word to be generated is a noun, and similarly the focus should be on behaviour if the word is a verb. The output hidden state of the decoder at each time step is passed to a fully connected layer and a softmax operation that yields the probability distribution over caption words. Finally, the caption is formed from the output words, in order, from the first word to “EOS”, which indicates the end of the statement. Since context is vital when captioning a detected proposal, we initialize the hidden state at t = 0 of the decoding LSTM by fusing the proposal states from the forward and backward passes, which capture the past and future contexts h_f^→ and h_p^←, together with the visual features of the detected proposal, using a sequence encoder, Fig. 5. The visual input x_t to the sequence encoder is defined in Eq. 5 and Eq. 6:

x_t = F_t(S_n)   (5)

F_t(S_n) = fusion(h_f^→, h_p^←, V^t, H_{t-1}),  n ∈ {1 … N}   (6)
Here V^t = {v_i}_{i=1}^{P} denotes the two-stream visual features at each time step (of d = 16 frames) within the proposal S_n; as mentioned before, each proposal has P time steps, starting at S and ending at E, Fig. 5. We design a dynamic attention mechanism to fuse the visual features V = {v_i}_{i=S}^{E} and the context vectors h_f^→ and h_p^←. Dynamically attending to the features at each time step can effectively improve the decoder's captioning performance. To this end, we adopt an attention-based LSTM encoder: for each proposal we fuse its hidden states together with its visual features through a weighted linear combination. At the t-th time step of the sequence encoder, the un-normalized relevance score r_i^t of the features at each time step i within a proposal is obtained as in Eq. 7, where S and E denote the start and end time steps of the proposal, H_{t-1} is the hidden state of the sequence encoder at time step t-1, and vector concatenation is applied to h_f^→ and h_p^←. The weights of the v_i are obtained by softmax normalization, Eq. 8. The attended visual feature is generated by a weighted sum, Eq. 9. The final input to the sequence encoder is expressed in Eq. 10. The last output of the sequence encoder (the context vector) is used to initialize the hidden state h_{t=0} of the LSTM caption decoder, Fig. 5. This vector encapsulates information from all input elements, aiding the decoder in making accurate predictions.
Fig. 4. Representing both future and past context information encoded in the hidden states h_f^→ and h_p^← of the forward and backward LSTMs at a time step t. The proposal prediction step multiplies the backward and forward confidence scores of each proposal in the current time step, producing the final score per proposal.
Fig. 5. Caption generation module. The sequence decoder is initialized with attended visual features along with past and future context information, whereas the input to the decoder is the attended semantic features.
r_i^t = w_a^T · tanh(W_v v_i + W_h [h_f^→; h_p^←] + W_H H_{t-1} + b),  i ∈ {S, …, E}   (7)

β_i^t = exp(r_i^t) / Σ_{m=S}^{E} exp(r_m^t)   (8)

V^t = Σ_{i=1}^{P} β_i^t · v_i   (9)

F_t(S_n) = [V^t; h_f^→; h_p^←]   (10)
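The attention fusion of Eqs. 7–10 can be sketched as follows; all dimensions and module names are our assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class VisualContextAttention(nn.Module):
    def __init__(self, v_dim=1024, ctx_dim=1024, h_dim=512, att_dim=256):
        super().__init__()
        self.Wv = nn.Linear(v_dim, att_dim, bias=False)
        self.Wh = nn.Linear(ctx_dim, att_dim, bias=False)
        self.WH = nn.Linear(h_dim, att_dim, bias=True)   # the bias plays the role of b
        self.wa = nn.Linear(att_dim, 1, bias=False)

    def forward(self, V, ctx, H_prev):
        # V: (P, v_dim) time-step features of one proposal; ctx = [h_f; h_p]; H_prev: encoder state
        r = self.wa(torch.tanh(self.Wv(V) + self.Wh(ctx) + self.WH(H_prev)))  # Eq. (7)
        beta = torch.softmax(r.squeeze(-1), dim=0)                            # Eq. (8)
        v_att = (beta.unsqueeze(-1) * V).sum(dim=0)                           # Eq. (9)
        return torch.cat([v_att, ctx], dim=-1)                                # Eq. (10)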
4 Performance Evaluation
4.1 Dataset
To train and assess the performance of caption generation, we used the MSR-VTT (video to text) dataset [1], which is divided into 20 categories of different activities (music, cooking, sports, education, politics, etc.). From these categories we worked on the sports category, which consists of 785 videos, divided into 80% training, 10% validation and 10% testing. Each video has around 20 natural language captions. For training the semantic concept feature networks, a set of candidate words of size C is defined from all training captions. Among them, we choose the most frequent 500 verbs and 1500 nouns as the designated vocabularies that set the sizes of the fully connected layers within the dynamic and static semantic concept networks, respectively. Each video is given a ground-truth multi-hot vector with 1s at the positions of the caption words that appear in the designated vocabulary.
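Since the paper cites NLTK [24] for text processing, the vocabularies and ground-truth vectors described above could be built as in the following sketch; the function names and the use of NLTK's default tokenizer and tagger are our assumptions.

from collections import Counter
import nltk  # requires nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')

def build_concept_vocabs(captions, n_verbs=500, n_nouns=1500):
    verbs, nouns = Counter(), Counter()
    for cap in captions:
        for word, tag in nltk.pos_tag(nltk.word_tokenize(cap.lower())):
            if tag.startswith('VB'):
                verbs[word] += 1
            elif tag.startswith('NN'):
                nouns[word] += 1
    return ([w for w, _ in verbs.most_common(n_verbs)],    # dynamic concept vocabulary
            [w for w, _ in nouns.most_common(n_nouns)])    # static concept vocabulary

def multi_hot(caption_words, vocab):
    # Ground truth: 1 at every vocabulary position whose word appears in the captions.
    present = set(caption_words)
    return [1.0 if w in present else 0.0 for w in vocab]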
4.2 Experiments
We evaluate our adopted bidirectional LSTM temporal proposal module for detecting event segments that are close to the real segments within MSR-VTT (sports). The recall of this module, Table 1, is better than that of a single-direction LSTM, which confirms that bidirectional prediction, encoding past, current and future context, indeed improves proposal quality compared to single-direction prediction, which encodes past and current context only. To assess the performance of our video captioning module we conduct the following experiments: 1. Measuring the accuracy of each semantic feature extraction network. We used the mean squared error (MSE) metric to evaluate the difference between the generated semantic words and the ground truth words per network.
2. Investigating the effects of the semantic features (DSC-N, SSC-N) and of the visual context captured through temporal visual attention (TVA-C) on caption generation performance. BLEU [25] and CIDEr-D [26] are the typical evaluation metrics for measuring the performance (precision and recall) of caption generation. During caption word prediction, the sequence encoder model encodes the input sequence once and returns the hidden states, of which we use the last one to initialize the sequence decoder model. In Table 3 we report the evaluation of the following methods:
– (TVA-C) + ASC: uses the temporal visual attention with context encoder to initialize the caption generation sequence decoder, and the attended concatenated dynamic and static semantic concepts (ASC), composed of DSC-N + SSC-N, as input to the decoder at each time step.
– Bi-H + ASC: uses the hidden states of the proposal event from both directions of the bidirectional LSTM, Fig. 4, concatenated, as the initial hidden state of the decoder, and the attended concatenated dynamic and static semantic concepts (ASC), composed of DSC-N + SSC-N, as input to the decoder at each time step.
– (TVA-C) + DSC-N: uses only the dynamic semantic concepts DSC-N as input to the decoder at each time step; initialization is done by the temporal visual attention with context encoder.
– (TVA-C) + SSC-N: uses only the static semantic concepts SSC-N as input to the decoder at each time step; initialization is done by the temporal visual attention with context encoder.
– (TVA-C): uses the temporal visual attention with context encoder to initialize the caption generation sequence decoder. There is no input to the decoder and no semantic features are used, so the output word probability at each time step is based only on the hidden state of the previous time step.
– (TVA-C) + Bi-H: uses the hidden states of the proposal event from both directions of the bidirectional LSTM, Fig. 4, concatenated, as input to the decoder. The temporal visual attention with context encoder initializes the caption generation sequence decoder. No semantic features are used.

Table 1. Recall of the proposal module on the MSR-VTT (sports) test set.
Method                                                      TIoU = 0.8
Bidirectional temporal proposal generation                  0.93
Single-direction temporal proposal generation (forward)     0.84
Table 2. Performance of the semantic feature networks on MSR-VTT (Sports).
Networks   Val-accuracy   Test-accuracy
SSC-N      99.66          99.68
DSC-N      99.84          99.87
Table 3. Caption generation performance on the MSR-VTT (Sports) test set.
Method            BLEU-1   BLEU-2   BLEU-3   BLEU-4   CIDEr
(TVA-C) + ASC     87.8     75.2     64.7     60.0     96.4
Bi-H + ASC        83.8     70.8     60.0     56.0     94.3
(TVA-C) + DSC-N   79.8     62.2     54.3     59.5     78.6
(TVA-C) + SSC-N   77.0     58.1     50.7     47.8     57.0
(TVA-C) + Bi-H    67.1     53.3     47.5     43.6     48.3
(TVA-C)           62.4     49.2     42.2     37.1     41.7

Table 4. Performance comparison against another framework on the MSR-VTT (sports) test set.
Method          BLEU-1   BLEU-2   BLEU-3   BLEU-4   CIDEr
hLSTMat [29]    80.3     68.2     57.5     53.7     91.1
(TVA-C) + ASC   87.8     75.2     64.7     60.0     96.4
4.3 Experimental Results and Discussion
The values recorded in Table 2 indicate a high accuracy of semantic feature extraction in both networks. The results in Table 3 indicate that utilizing the semantic concept networks (ASC, DSC-N, and SSC-N) during captioning is more effective than caption generation without semantics. It is important to note that the performance of the (TVA-C) + DSC-N model is better than that of the (TVA-C) + SSC-N model. This may be related to the effect of the dynamic semantic concept network, which indicates the activity present within the video. Another reason is that the caption of a video usually contains a single activity (verb) and multiple objects (nouns). Furthermore, incorporating both the dynamic and static semantic concept features within the model (ASC) is the most effective. Also, when we reused the proposal's hidden states (backward, forward) as context vectors and fused them with the event's visual features via the attention mechanism (TVA-C), along with the semantic concept features (ASC), we obtained better captioning results than when using the context vectors alone. The comparison in Table 4 indicates that our framework (TVA-C) + ASC has better caption generation performance than the similar work in [29], which applies temporal attention on mean-pooled visual inputs. Our framework outperforms [29] because we apply attention to the fusion of visual inputs along with context vectors; our utilization of semantic features also contributes to the better performance.
5 Implementation Details
In this work, we utilized the PyTorch deep learning library for Python. The proposed framework was implemented with Anaconda on an Ubuntu 14.04 LTS environment. The GPU used for the experiments was a GeForce RTX 2080 Ti. For training the semantic concept feature networks (SSC-N and DSC-N), we used the Adam optimizer with the binary cross-entropy loss function. For caption generation, RMSprop was used as the optimization algorithm and the categorical cross-entropy cost function was selected as the loss. To train the bidirectional proposal generation module of our framework, Fig. 4, we train the network with samples that express long, temporally overlapping segments, longer than the K proposals we want to detect at each time step, so as to discourage saturation of the hidden states in both directions. Through dense sampling, each time step in the input video sequence can be considered multiple times, each in a different context.
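A minimal sketch of this training configuration, with stand-in modules and assumed learning rates (the paper does not report them), is:

import torch
import torch.nn as nn

# Stand-ins for the networks of Sect. 3; architectures and lr values are assumptions.
ssc_net = nn.LSTM(512, 512)
dsc_net = nn.LSTM(512, 512)
caption_decoder = nn.LSTM(2000, 512)

concept_opt = torch.optim.Adam(
    list(ssc_net.parameters()) + list(dsc_net.parameters()), lr=1e-4)
concept_criterion = nn.BCELoss()             # binary cross-entropy for SSC-N / DSC-N

caption_opt = torch.optim.RMSprop(caption_decoder.parameters(), lr=1e-4)
caption_criterion = nn.CrossEntropyLoss()    # categorical cross-entropy for captioning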
6 Conclusion
In this paper, a deep neural network framework is proposed that identifies and handles three challenges related to the task of video captioning: (1) semantic concept feature learning, to reduce the gap between low-level video features and sentence descriptions, (2) event representation, and (3) context fusion with visual features. First we adopt a bidirectional LSTM framework for localizing events, encoding both past and future contexts, both of which help localize the current event better. We further reuse the proposal's context information (hidden states) from the localization module as context vectors and dynamically fuse them with the event clip features extracted by two-stream 3D ResNets. Using an attention-based mechanism to fuse visual content with context information produced superior results compared to using the context alone. The proposed model additionally learns semantic features that describe the video content effectively, using LSTMs. Experiments on the MSR-VTT (sports) dataset demonstrate the performance of the proposed framework. Our future work is as follows: 1. Coupling the proposal and captioning modules into one unified framework, trained in an end-to-end manner. 2. Investigating how to exploit the temporal event proposal module and the bi-modal features for multiple-sentence generation for videos (dense captioning).
References 1. Xu, J., Mei, T., Yao, T., Rui, Y.: MSR-VTT: a large video description dataset for bridging video and language. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, pp. 5288–5296 (2016)
2. Venugopalan, S., Xu, H., Donahue, J., Rohrbach, M., Mooney, R., Saenko, K.: Translating videos to natural language using deep recurrent neural networks. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics, pp. 1494–1504. North American Chapter of the Association for Computational Linguistics (NAACL), Colorado (2015) 3. Mahdisoltani, F., Berger, G., Gharbieh, W., Fleet, D., Memisevic, R.: Fine-grained video classification and captioning. ArXiv_CV (2018) 4. Buch, S., Escorcia, V., Shen, C., Ghanem, B., Carlos Niebles, J.: SST: single-stream temporal action proposals. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, pp. 6373–6382 (2017) 5. Hara, K., Kataoka, H., Satoh, Y.: Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City , pp. 6546–6555 (2018) 6. Hara, K., Kataoka, H., Satoh, Y.: Learning spatio-temporal features with 3D residual networks for action recognition. In: IEEE International Conference on Computer Vision Workshop (ICCVW), pp. 3154–3160 (2017) 7. Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the kinetics dataset. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, pp. 4724–4733 (2017) 8. Wang, X., Miao, Z., Zhang, R., Hao, S.: I3D-LSTM: a new model for human action recognition. In: IOP Conference Series: Materials Science and Engineering (2019) 9. Kay, W., et al.: The kinetics human action video dataset. ArXiv (2017) 10. Soomro, K., Zamir, A.R., Shah, M.: A dataset of 101 human actions classes from videos in the wild. ArXiv (2012) 11. Zhao, Y., Yang, R., Chevalier, G., Xu, X., Zhang, Z.: Deep residual bidir-LSTM for human activity recognition using wearable sensors. Math. Prob. Eng. 1–13 (2018) 12. Kuppusamy, P.: Human action recognition using CNN and LSTM-RNN with attention model. Int. J. Innov. Technol. Exploring Eng. (IJITEE) 8, 1639–1643 (2019) 13. Shou, Z., Wang, D., Chang, S.-F.: Temporal action localization in untrimmed videos via multi-stage CNNs. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), USA, pp. 1049–1058 (2016) 14. Lin, T., Zhao, X., Fan, Z.: Temporal action localization with two-stream segment-based RNN. In: IEEE International Conference on Image Processing (ICIP), Beijing, pp. 3400– 3404 (2017) 15. Yao, G., Lei, T., Liu, X., Jiang, P.: Temporal action detection in untrimmed videos from fine to coarse granularity. Appl. Sci. 8(10), 1924 (2018) 16. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the 15th IEEE International Conference on Computer Vision, ICCV 2015, pp. 4489–4497 (2015) 17. Karen, S., Andrew, Z.: Two-stream convolutional networks for action recognition in videos. Adv. Neural. Inf. Process. Syst. 1, 568–576 (2014) 18. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) 19. Pan, P., Xu, Z., Yang, Y., Wu, F., Zhuang, Y.: Hierarchical recurrent neural encoder for video representation with application to captioning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1029–1048 (2016) 20. Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., Courville, A.: Describing videos by exploiting temporal structure. In: IEEE International Conference on Computer Vision (ICCV), USA, pp. 4507–4515 (2015)
21. Jeff, D., et al.: Long-term recurrent convolutional networks for visual recognition and description. IEEE Trans. Patt. Anal. Mach. Intell. 39(4), 677–691 (2017) 22. Wang, H., et al.: Action recognition by dense trajectories. In: IEEE Conference on Computer Vision & Pattern Recognition (CVPR), USA (2011) 23. Yu, Y., Ko, H., Choi, J., Kim, G.: End-to-end concept word detection for video captioning, retrieval, and question answering. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), USA, pp. 3261–3269 (2017) 24. Bird, S., et al.: Natural Language Processing with Python. O’Reilly Media Inc, California (2009) 25. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL 2002), USA, pp. 311–318 (2002) 26. Vedantam, R., Lawrence Zitnick, C., Parikh, D.: CIDEr: consensus based image description evaluation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, Massachusetts, pp. 4566–4575 (2015) 27. Escorcia, V., Caba Heilbron, F., Niebles, J.C., Ghanem, B.: DAPs: deep action proposals for action understanding. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 768–784. Springer, Cham (2016). https://doi.org/10.1007/978-3-31946487-9_47 28. Wang, L., Qiao, Y., Tang, X.: Action recognition and detection by combining motion and appearance features (2014) 29. Jingkuan, S., et al.: Hierarchical LSTM with adjusted temporal attention for video captioning. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Australia, pp. 2737–2743 (2017)
Matchmoving Previsualization Based on Artificial Marker Detection
Houssam Halmaoui 1,2 and Abdelkrim Haqiq 2
1 ISMAC - Higher Institute of Audiovisual and Film Professions, Rabat, Morocco
[email protected]
2 Faculty of Sciences and Techniques, Computer, Networks, Mobility and Modeling Laboratory: IR2M, Hassan First University of Settat, 26000 Settat, Morocco
[email protected]
Abstract. In this article, we propose a method for inserting a 3D synthetic object into a video of a real scene. The originality of the proposed method lies in the combination, and the application to visual effects, of different computer vision and computer graphics algorithms. First, the intrinsic parameters and distortion coefficients of the camera are estimated using a planar checkerboard pattern with Zhang's algorithm. Then, the ArUco marker dictionary and the corresponding feature detection algorithm are used to detect the four corners of a single artificial marker added to the scene. A perspective-4-point method is used to estimate the rotation and the translation of the camera with respect to a 3D reference system attached to the marker. The camera perspective model is then used to project the 3D object onto the image plane, while respecting perspective variations when the camera is moving. The 3D object is illuminated with diffuse and specular shading models, in order to match the object to the lighting of the scene. Finally, we conducted an experiment to quantitatively and qualitatively evaluate the stability of the method.
Keywords: Camera pose · Fiducial markers · Diffuse and specular shading · Augmented reality · Visual effects
1 Introduction
Matchmoving, or camera tracking, is an augmented reality method used in visual effects [12], which consists in inserting 3D synthetic objects into a video of a real scene in such a way that the objects coexist coherently with the other elements, while respecting the geometry and the lighting of the scene (see Fig. 3b). Usually, in filmmaking, this effect is achieved in post-production. However, in case of a problem during shooting, whether technical (lighting or tracking) or artistic (object position), it is difficult to do the matchmoving in post-production without spending a lot of time processing the video frame by frame or without re-shooting the scene. This implies significant time and cost losses. In this article, we propose a method for previsualizing the result on set, which allows problems to be detected and corrected during shooting.
The problem of matchmoving has both geometric and lighting aspects, which are often treated separately in the literature. The estimation of the camera pose (rotation and translation of the camera) is central to the geometric aspect. In [8,9,13], external sensors (inertial sensor, WiFi or HoloLens headset) are used for this purpose. In [1,7,10] the estimation is performed by deep learning; this solves the motion blur problem, but requires the presence of textures in the image. Traditional methods are based on the detection of features, which are patterns such as corners or similar connected regions (blobs) that have the particularity of being reliable for tracking [4]. Some detectors are invariant to scale, affine transformations and orientation [11]. Artificial markers can be added to the scene in order to handle non-textured areas and to obtain a more robust detection, thanks to the unique binary code included in the markers. For this study, we use the ArUco detection algorithm [2] and the corresponding marker dictionary, because of its detection performance compared to other artificial markers [14] and its computational speed. By locating the four corners of a single marker, it can be used as a 3D reference system for the estimation of the camera pose, thanks to a perspective-4-point method [16]. Concerning the lighting aspect, the two most used rendering methods are rasterization [3] and ray tracing [15]. We use rasterization because of its processing speed on CPU and its rendering quality, which is sufficient for previsualization. Finally, instead of using an automatic illumination estimation method [6], we choose the lighting position of the 3D object manually when adjusting its geometric aspect (a manual intervention which is necessary since it depends on the user's choice). The originality of our method is to propose a matchmoving previsualization solution that combines all the geometric and shading aspects in a single system, thanks to various computer vision and computer graphics algorithms. Moreover, the method is accessible through the use of a single camera, an artificial marker, and fast camera pose estimation and rendering algorithms.
Fig. 1. Steps of the proposed method.
The steps of the proposed method are summarized in Fig. 1. First, the camera is calibrated in order to estimate its intrinsic parameters. Then, we proceed to
the detection of the artificial marker corners in order to estimate the camera pose. After adjusting the desired geometric appearance of the 3D object, we project it onto the image using the camera perspective model and the estimated camera parameters. The visible faces of the object are computed by a hidden surface removal algorithm. Finally, we assign to each face a color using diffuse and specular shading models, according to a lighting position chosen by the user. The article is organized as follows. In Sect. 2, we present the camera perspective model used for camera parameter estimation and for 3D object projection. In Sect. 3, we present the steps of the proposed algorithm in detail. In Sect. 4, we present the results of the quantitative and qualitative evaluation.
2 The Camera Perspective Model
Calibration Matrix. The aim is to establish the relationship between the coordinates of a point in the image, in pixels, and the corresponding 3D point in world space, in meters [12]. The camera is considered to follow a pinhole model. The 3D coordinates are specified in a camera reference system (XYZ) as shown in Fig. 2.
Fig. 2. Pinhole model and reference systems for camera and image.
We consider a point of the scene with coordinates (X_c, Y_c, Z_c)^T. Its projection (x̃, ỹ)^T in the image, as a function of the focal length f, is:

x̃ = f X_c / Z_c,  ỹ = f Y_c / Z_c   (1)
The values (x̃, ỹ)^T are physical measurements in meters. The values (x, y)^T in pixels are written as a function of the width d_x and the height d_y of a pixel and of the center of the image (x_0, y_0)^T, which corresponds to the projection of the origin of the camera reference system onto the image plane:

x = x̃ / d_x + x_0,  y = ỹ / d_y + y_0   (2)

Thus, we obtain:
x = (f / d_x)(X_c / Z_c) + x_0,  y = (f / d_y)(Y_c / Z_c) + y_0   (3)
Equation 3 can be written in matrix form:

(x, y, 1)^T ∼ K (X_c, Y_c, Z_c)^T   (4)

with

K = [ α_x  0    x_0
      0    α_y  y_0
      0    0    1  ],  α_x = f / d_x,  α_y = f / d_y.

K is called the calibration matrix and the symbol ∼ means that equality is obtained up to a factor (by dividing the right-hand term by Z_c).
Distortions. The cameras used in real life are more complicated than a simple pinhole model. The image usually suffers from radial distortions caused by the spherical shape of the lens. The relationship between the coordinates (x̃, ỹ)^T of the ideal image (without distortion) and the coordinates (x̃_dist, ỹ_dist)^T of the observed image (with distortion) is as follows [16]:

x̃_dist = (1 + κ_1 (x̃² + ỹ²) + κ_2 (x̃² + ỹ²)²) x̃
ỹ_dist = (1 + κ_1 (x̃² + ỹ²) + κ_2 (x̃² + ỹ²)²) ỹ   (5)
The coefficients κ_1 and κ_2 control the amount of distortion. In the case of large distortions (wide angle lens), a third coefficient κ_3 can be added as a third-order term in the polynomial formula. This distortion model is combined with Eq. 2 in order to obtain a model as a function of the pixel coordinates (x_dist, y_dist)^T:

x_dist = (1 + κ_1 (x̃² + ỹ²) + κ_2 (x̃² + ỹ²)²)(x − x_0) + x_0
y_dist = (1 + κ_1 (x̃² + ỹ²) + κ_2 (x̃² + ỹ²)²)(y − y_0) + y_0   (6)
Camera Matrix. Assuming the distortions are compensated (see Sect. 3), image formation can be modeled by Eq. 4. This model expresses the coordinates of a point in the camera reference system. For the matchmoving problem, the 3D object coordinates can be defined in any reference system of the scene. Before applying the projection model, we must therefore transform the coordinates (X, Y, Z)^T expressed in an arbitrary reference system into coordinates (X_c, Y_c, Z_c)^T expressed in the camera reference system. The transformation formula is:

(X_c, Y_c, Z_c)^T = R (X, Y, Z)^T + t   (7)
R and t are the extrinsic parameters. R is a 3 × 3 rotation matrix defined by 3 angles and t is a translation vector. Thus, Eq. 4 becomes:
(x, y, 1)^T ∼ P (X, Y, Z, 1)^T   (8)

with P = K [R | t]. P is called the camera matrix. The matchmoving problem is equivalent to computing the matrix P for each frame of the video.
3 Proposed Method
Calibration Matrix Estimation. Assuming that the focal length of the camera does not change during video acquisition, we need to estimate the calibration matrix only once, at the beginning. This is done using several images of a planar checkerboard with known dimensions, captured from different points of view [16]. Figure 3a shows the calibration configuration. We used a phone camera mounted on a tripod and controlled remotely, for more acquisition stability, to reduce the calibration error.
Fig. 3. (a): Camera calibration configuration. (b): Result of 3D object projection in a video acquired with a moving camera.
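Zhang's calibration as described here maps directly onto the OpenCV API; in the sketch below the board size, square size and image paths are our assumptions.

import glob
import cv2
import numpy as np

pattern = (9, 6)                                   # inner corners of the checkerboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * 0.025  # 25 mm squares

obj_pts, img_pts = [], []
for path in glob.glob('calib/*.jpg'):
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

# K is the calibration matrix of Eq. (4); dist holds kappa_1, kappa_2, ... of Eq. (6).
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)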
Considering that Z = 0 corresponds to the plane of the checkerboard pattern, Eq. 8 is transformed into a homography:

(x, y, 1)^T = λ K [r_1 r_2 r_3 t] (X, Y, 0, 1)^T   (9)
where r_1, r_2 and r_3 are the columns of R, and λ is an arbitrary coefficient. Thus, we have:

(x, y, 1)^T = λ K [r_1 r_2 t] (X, Y, 1)^T   (10)
Therefore, the homography formula is:
(x, y, 1)^T = H (X, Y, 1)^T   (11)
We denote by H the 3 × 3 homography matrix:

H = λ K [r_1 r_2 t]   (12)
H is estimated using a DLT (Direct Linear Transformation) algorithm [5]. We start by detecting the corners of the squares of the checkerboard using a feature detection algorithm. Each corner (x, y)^T gives us two equations from Eq. 11. The corresponding (X, Y, 1)^T is known, since we have the physical distance between the corners. Note that H has 8 degrees of freedom, since the homography equation is defined up to a factor. Thus, we need a minimum of four corners to find H. Since the position of the corners is subject to noise, we use more than four corners; the least squares method is then used to find an approximate solution, by singular value decomposition, corresponding to the best homography H minimizing a cost function. Once we have H for each view, we can deduce K from Eq. 12: by adding constraints on K, using the fact that r_1 and r_2 are orthonormal [16], Eq. 12 is simplified into a linear equation and the estimation of K is again performed with the DLT algorithm.
Distortion Coefficient Estimation. Once the intrinsic parameters (x_0, y_0, α_x, α_y) are known, the images of the checkerboard are used again. The positions of the ideal points (without distortion) are known, since the dimensions of the checkerboard are also known. The corresponding distorted points in the image are detected using a feature detection algorithm. Then, the distortion model (Eq. 6) is solved using a least squares method in order to estimate the distortion coefficients κ_1 and κ_2.
Markers Detection. Since the camera is moving, this step and all those that follow must be done for each frame. As mentioned previously, we use ArUco artificial markers [2]. A single marker is added to the scene in order to be used as a 3D reference (see Fig. 3b). The detection algorithm is as follows: 1) edge detection and thresholding; 2) polygonal approximation of concave rectangles with four corners; 3) perspective projection to obtain a frontal rectangle, and identification of the internal code by comparing it to the markers in the dictionary. For more details see [2].
Pose Estimation. Once the corners of the marker are detected, their positions are used to estimate R and t [16]. First, the four corners are used to estimate the homography between the marker and the corresponding image (we consider the marker to be in the plane Z = 0). As mentioned earlier, a minimum of four coplanar points is necessary to estimate a homography with the DLT algorithm [5]. Finally, knowing K and H, the external parameters are computed using Eq. 12:

r_1 = K^{-1} h_1,  r_2 = K^{-1} h_2,  t = K^{-1} h_3   (13)
where h_1, h_2 and h_3 are the columns of H. We deduce r_3 by using a cross product:

r_3 = r_1 × r_2   (14)
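The marker detection and pose estimation steps can be sketched with the legacy OpenCV contrib ArUco bindings (the paper uses the C++ ArUco library); the dictionary choice, the 5 cm marker size and the placeholder K and dist are our assumptions.

import cv2
import numpy as np

K = np.eye(3)                     # replace with the calibrated matrix K
dist = np.zeros(5)                # replace with the estimated distortion coefficients
frame = cv2.imread('frame.png')   # one video frame (placeholder path)

# opencv-contrib ArUco API (versions <= 4.6).
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
corners, ids, _ = cv2.aruco.detectMarkers(frame, dictionary)
if ids is not None:
    # Perspective-4-point estimation of the pose, cf. Eqs. (13)-(14).
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(corners, 0.05, K, dist)
    R, _ = cv2.Rodrigues(rvecs[0])    # 3x3 rotation matrix
    t = tvecs[0].reshape(3, 1)        # translation vector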
3D to 2D Projection. After adjusting the desired object size and position through geometric transformations, we project the vertices of the 3D object into the image as follows: 1) transformation of the 3D object's coordinates into the camera reference system using the extrinsic parameters R and t; 2) projection of the 3D points into the image using the calibration matrix K; 3) application of the distortion model.
Hidden Surface Removal. When projecting 3D objects, as in real life, we want to see only the front of the objects and not the back of them. This process is called HSR (Hidden Surface Removal). For this, we use the Z-buffering HSR algorithm. Two buffers are used, one for color and one for depth (the Z-buffer). We start by computing, for each polygon (face formed by three vertices), the distance d_poly from its center (X_poly, Y_poly, Z_poly)^T to the camera, using the translation vector t = (t_x, t_y, t_z)^T:

d_poly = sqrt((t_x − X_poly)² + (t_y − Y_poly)² + (t_z − Z_poly)²)   (15)

For each pixel of the projected 3D object, if the distance of the corresponding polygon is less than the one stored in the depth buffer, the distance in the depth buffer and the color in the color buffer are replaced by those of the corresponding polygon. In the following, we see how to calculate the polygon color.
Ambient, Diffuse and Specular Shading. In order to illuminate the projected 3D object, we use the ambient, diffuse and specular models, which are the most widely used [3]. Ambient reflection simulates lighting that affects all objects equally; its contribution is a constant value. Diffuse reflection scatters the light in all directions; its contribution is:

I_diffuse = C_d max((N · L), 0)   (16)
with L the light direction, N the normal vector and C_d the diffuse color. The light position is selected by the user to best match that of the real scene. The color C_d depends on the type of lighting and the object material, but we do not cover this aspect here; we simply use a constant color value. Finally, specular reflection simulates the shininess of an object; its contribution is:

I_specular = C_s max(0, (R · V)^n)   (17)

with C_s the specular color, V the direction of view, n a factor controlling the width of the gloss spot, and R the direction of reflection. Figure 5 shows an example of projection using diffuse and specular lighting. To accelerate the process on CPU, we assign the same color to the whole polygon (flat shading). For better quality, a value obtained by interpolation from the three vertices of the polygon can be assigned to each pixel. Also, the CPU does not allow fast rendering for thousands of vertices, so a GPU implementation would be necessary. Note that all the steps, including the rendering part, were implemented on CPU using C++, OpenCV and the ArUco library.

Fig. 4. Angles of incidence of light and angle of view.
Fig. 5. A 3D object projection and illumination using, from left to right, the diffuse model, the specular model, and a combination of both.
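As an illustration, the flat-shading color of one polygon (the ambient term plus Eqs. 16 and 17) can be computed as in this sketch; the parameter names and the clipping to [0, 1] are our choices, not the authors' code.

import numpy as np

def shade_polygon(normal, light_dir, view_dir, ca, cd, cs, n=32):
    # Flat shading: a single color for the whole polygon.
    N = normal / np.linalg.norm(normal)
    L = light_dir / np.linalg.norm(light_dir)
    V = view_dir / np.linalg.norm(view_dir)
    R = 2.0 * np.dot(N, L) * N - L                 # direction of reflection
    diffuse = cd * max(np.dot(N, L), 0.0)          # Eq. (16)
    specular = cs * max(np.dot(R, V), 0.0) ** n    # Eq. (17)
    return np.clip(ca + diffuse + specular, 0.0, 1.0)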
4 Quantitative and Qualitative Evaluations
We carried out an experiment to measure the stability of the method. We used a tripod-mounted mobile phone to capture videos of a fixed scene (without any camera movement), in order to have a ground truth. Given that the camera and the scene are static, a vertex should ideally stay at the same position in the image. We compute a repeatability score between each frame and the first frame of the sequence, considered as the reference: S = R / N_f, with R the number of repeatable detections, which corresponds to the number of projected vertices whose position has not changed compared to the reference image, and N_f the total number of vertices. We evaluated three kinds of sequences: high lighting, low lighting, and variable lighting, and for each type of sequence we placed the camera at different distances. The aim is to measure the impact of the lighting and the marker size on the stability of the matchmoving. Figure 6 shows the evaluation results.
Fig. 6. Quantitative evaluation results.
Figure 6a corresponds to the sequences with high lighting. We can see that when the marker size is big enough, the score is around 0.9, so the matchmoving is very stable. When the size of the marker decreases, the score drops to 0.8. Figure 6b corresponds to the cases where the lighting is low. We observe the same behavior as previously, but with a slightly larger drop in score (around 0.7) when the size of the marker is small. Figure 6c shows the case of variable lighting. We can see that when we vary the lighting (after 500 frames), by introducing shadows on the marker and varying its contrast, the score can become very low. In order to measure by how many pixels the projected vertices have moved, we recomputed the score of the sequences where the marker size is small (the worst results), introducing a displacement tolerance of ±1 pixel. Figure 6d shows the result obtained. We can see that the score remains equal to 1 all the time, which means that the errors obtained in the previous experiments are due to vertex displacements of ±1 pixel, which is usually not very visible. In addition to the size of the markers and the lighting, another factor that impacts the stability of the matchmoving is the distance of the projected object from the marker. Indeed, the projection error is larger when the object is far away from the marker, due to the calibration error. We repeated the last experiment with a small marker size and variable lighting (the worst results), choosing a tolerance of ±1 and placing the object at different distances. Figure 6e shows the result. We can see that when the lighting varies, the further the object moves away from the marker, the more the score drops. In the majority of the frames, except when the objects are very far from the marker, the score remains very high. One solution to correct this stability problem is to manually process the few low-score frames in post-production. From these experiments, it can be deduced that in order to have a stable projection result, it is necessary to use a large marker and high, homogeneous lighting, and to place the object close to the marker.
It is also important to reduce the calibration error, by making sure that the checkerboard is perfectly planar and by using a professional camera.
Fig. 7. Examples of results for qualitative evaluation.
To qualitatively evaluate the result, we projected a 3D object into video sequences captured with a moving mobile phone camera. Figure 7 shows result images acquired from different points of view. We can see that the geometry of the 3D object follows the perspective of the scene during the camera movement. The 3D object is immobile to facilitate the evaluation of the result: for an accurate result, the object must remain immobile during the camera movement. We found that the detection of markers suffers from instability and missed detections when the camera moves fast. Among our perspectives, we propose to solve this problem with a Kalman filter. Finally, in matchmoving, the marker should not be visible in the output video. In the case of immobile objects, the object can be placed above the marker. In the case of a moving object, a solution is to place the marker in a homogeneous area and then remove it using an inpainting algorithm.
5 Conclusion and Perspective
We presented a method for matchmoving previsualization, estimating the camera pose using a single artificial marker as a 3D reference. The different modules are independent, which allows each of them to be replaced by the most appropriate one for the desired application. In our case, we chose the methods that allow us to process the images accurately and quickly. The quantitative and qualitative evaluations on videos of real scenes attest to the effectiveness of the proposed algorithms, but we found that the method suffers from instability on some frames when the marker has a small image size, when the lighting is low or inhomogeneous, when the projected objects are far from the marker, or when the camera moves fast. We intend to test other methods for feature detection, pose estimation and calibration, in order to improve the accuracy. All the algorithms have been implemented on CPU; on the one hand this allows easier migration to any platform, but to get a more realistic rendering quality it is necessary to use a parallel GPU implementation. In this context, we plan to work on the lighting aspect by considering the type of materials and other lighting models. We also intend to apply the method to more practical cases, such as the creation of virtual environments, and to consider the case of animated 3D objects.
References
1. En, S., Lechervy, A., Jurie, F.: RPNet: an end-to-end network for relative camera pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018)
2. Garrido-Jurado, S., Muñoz-Salinas, R., Madrid-Cuevas, F.J., Medina-Carnicer, R.: Generation of fiducial marker dictionaries using mixed integer linear programming. Pattern Recognit. 51, 481–491 (2016)
3. Gordon, V.S., Clevenger, J.L.: Computer Graphics Programming in OpenGL with C++. Stylus Publishing, LLC (2018)
4. Harris, C.G., Stephens, M.: A combined corner and edge detector. In: Proceedings of the Alvey Vision Conference, pp. 147–151 (1988)
5. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press (2003)
6. Hold-Geoffroy, Y., Sunkavalli, K., Hadap, S., Gambaretto, E., Lalonde, J.F.: Deep outdoor illumination estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7312–7321 (2017)
7. Kendall, A., Grimes, M., Cipolla, R.: PoseNet: a convolutional network for real-time 6-DOF camera relocalization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2938–2946 (2015)
8. Kotaru, M., Katti, S.: Position tracking for virtual reality using commodity WiFi. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 68–78 (2017)
9. Lee, J., Hafeez, J., Kim, K., Lee, S., Kwon, S.: A novel real-time match-moving method with HoloLens. Appl. Sci. 9(14), 2889 (2019)
10. Melekhov, I., Ylioinas, J., Kannala, J., Rahtu, E.: Relative camera pose estimation using convolutional neural networks. In: International Conference on Advanced Concepts for Intelligent Vision Systems, pp. 675–687. Springer (2017)
11. Mikolajczyk, K., Schmid, C.: Scale & affine invariant interest point detectors. Int. J. Comput. Vis. 60(1), 63–86 (2004)
12. Radke, R.J.: Computer Vision for Visual Effects. Cambridge University Press (2013)
13. Rambach, J.R., Tewari, A., Pagani, A., Stricker, D.: Learning to fuse: a deep learning approach to visual-inertial camera pose estimation. In: International Symposium on Mixed and Augmented Reality (ISMAR), pp. 71–76. IEEE (2016)
14. Romero-Ramirez, F.J., Muñoz-Salinas, R., Medina-Carnicer, R.: Speeded up detection of squared fiducial markers. Image Vis. Comput. 76, 38–47 (2018)
15. Wyman, C., Marrs, A.: Introduction to DirectX raytracing. In: Ray Tracing Gems, pp. 21–47. Springer (2019)
16. Zhang, Z.: A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 22(11), 1330–1334 (2000)
Research Method of Blind Path Recognition Based on DCGAN Ling Luo1, Ping-Jun Zhang1, Peng-Jun Hu1, Liu Yang1, and Kuo-Chi Chang1,2,3,4(&) 1
2
School of Information Science and Engineering, Fujian University of Technology, Fuzhou, China [email protected] Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China 3 College of Mechanical and Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan 4 Department of Business Administration, North Borneo University College, Sabah, Malaysia
Abstract. In order to solve the problem that there are few blind path data sets and a lot of manual data collection work in the current blind guide system, computer vision algorithm is used to automatically generate blind path images in different environments. Methods a blind path image generation method based on the depth convolution generative adversary network (DCGAN) is proposed. The method uses the characteristics of typical blind path, which is the combination of depression and bulge. The aim of long short memory network’ (LSTM) is to encode the depression part, and the aim of convolution neural network (CNN) is to encode the bulge part. The two aspects of information are combined to generate blind path images in different environments. It can effectively improve the blind path recognition rate of the instrument and improve the safe travel of the visually impaired. Conclusion generative adversarial networks (GANs) can be used to generate realistic blind image, which has certain application value in expanding blind channel recognition data, but it still needs to be improved in some details. Keywords: Convolutional neural network (CNN) Depth convolution generative adversary network (DCGAN) Generative adversarial networks (GANs) Advanced blind path recognition system Algorithm
1 Introduction According to a study published in The Lancet Global Health, by 2050, the number of cases of blindness in the world will increase from 36 to 115 million. China is one of the countries with the largest number of vision disorders in the world. At the end of 2016, the number of vision disorders was about 17.31 million, of which more than 5 million were blind, with an annual increase of 450,000 blind people and 1.35 million low vision people [1]. Most of the information obtained by human beings is transmitted by vision, accounting for 80% [2]. Because of the physiological defects and the © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 90–99, 2021. https://doi.org/10.1007/978-3-030-58669-0_8
increasingly complex living environment, blind people face many inconveniences in daily life. In view of this, guide dogs and guide canes have gradually become tools to help blind people travel. However, guide dogs are difficult to train and costly, and the detection range of a guide cane is limited. The design of an intelligent guide system is therefore beneficial to the daily life of blind people. Blind path recognition is an important part of an intelligent guide system. Blind path recognition belongs to image recognition, which has long been a hot research topic in the field of computer vision. CNNs are the most widely used method in image recognition. The biggest disadvantage of a CNN is that it requires a large-scale training data set, which slows down the convergence of the network model and requires many training tricks to improve the recognition rate. This article draws on the advances and usability of the following two lines of work in image recognition, studies GANs, and uses DCGAN for blind path recognition. Lu et al. used DCGAN to generate face images [3]. Zeng et al. built a semi-supervised deep generative network that improves the recognition rate by increasing the training samples through label distribution [4]. Goodfellow et al. proposed the generative model known as GANs in 2014 [5], which can generate high-quality images and extract features. The difference between this generative model and traditional generative models lies in that it includes two parts: a generator network and a discriminator network, which are placed in an adversarial relationship. The idea of GANs originates from the zero-sum game: the gains of the two players are mutually dependent, an increase in the gain of one player leads to a decrease in the gain of the other, and the two players are in an adversarial game state. This idea is applied in GANs as follows. The generator takes the input data and, through a series of fittings, produces data that is infinitely close to the real data. The discriminator compares the “fake data” generated by the fitting with the “real data” in order to decide whether the generated data is real. At the same time, the generator learns the data distribution from the discriminator, optimizing itself so that the two networks finally reach a Nash equilibrium [6]. The operation process of GANs is shown in Fig. 1 [7]. The core principle of the GAN algorithm is described as follows: given the generator, the optimal discriminator is a binary classification model, and the training process is expressed by Formula (1):

min_G max_D V(D, G) = E_{x∼p_data(x)}[ln D(x)] + E_{z∼p_z(z)}[ln(1 − D(G(z)))]   (1)
In the formula: x is shows the real sample of input; z is shows the random noise in the input generation model; D(x) represents the proportion that the discrimination model think the input sample as the real sample; the value of D(x) is 1, X represents 100% of the real sample; the value of D(x) is 0, x cannot be the real sample; G(z) shows the samples created by the generation model after receiving random noise; Pdate (x) shows the real data situation; PZ(x) shows the generated data situation. It is
Fig. 1. The operation process diagram of GANs
It is the task of the discriminator to accurately determine whether the input samples are real. The V(D, G) to be obtained is the maximum, that is, to find max over D: D works so that D(G(z)) is infinitely close to 0 and D(x) is infinitely close to 1. It is the task of G to make the generated samples infinitely close to the real samples; to get the minimum V(D, G), that is, to find min over G, G works to make D(G(z)) infinitely close to 1 [8]. This is a process of dynamic adversarial play. In this training process, GANs run an alternating optimization mode: to make the accuracy of the discrimination network as low as possible, the discrimination model is fixed and the generation model is optimized; to improve the accuracy of the discrimination network, the generation model is fixed and the discrimination model is optimized.
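As a numerical illustration of Formula (1), the value V(D, G) on a mini-batch can be estimated from the discriminator's outputs by two sample means; the following minimal sketch uses illustrative placeholder values rather than outputs of a trained model:

import numpy as np

d_real = np.array([0.90, 0.80, 0.95])  # D(x) on three real samples
d_fake = np.array([0.10, 0.20, 0.05])  # D(G(z)) on three generated samples

# Formula (1): E[ln D(x)] + E[ln(1 - D(G(z)))]
V = np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
print(V)  # D tries to maximize this value; G tries to minimize it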
2 Methodology

DCGAN is a generalized model of GANs proposed by Radford et al. in 2015 [9]. Its overall structure is the same as that of GANs; the difference is that, on the basis of GANs, a CNN replaces the internal structure of the generator, and this is used to generate image samples. In DCGAN, the structure of D is a CNN used to determine the probability that the input image is a real image. The peculiarity of this CNN is that the down-sampling operation of the pooling layer is replaced by strided convolution: the pooling layer's down-sampling is only a simple extraction of pixel points, while strided convolution can extract deep features while constantly reducing the size of the feature map, carrying out the convolution operation during the down-sampling itself. To make the network easier to converge, LeakyReLU is used as the activation function in D; its formula is given in (2). As the formula shows, LeakyReLU is composed of two linear segments: on the positive half-axis it is a proportional mapping, and on the negative half-axis the value is adjusted by the slope leak. The input image is convolved through each layer in D in order to
extract the convolutional features fed to the final logistic function, so the final output is the probability that the input is a real image [10].

\mathrm{LeakyReLU}(x) = \begin{cases} x, & x > 0 \\ \mathrm{leak} \cdot x, & \text{else} \end{cases} \qquad (2)
There is no difference between the input of G and the input of a GAN: the input of G is a 100-dimensional noise vector z, and the extended feature points then pass through each layer. The first layer of G is a fully connected layer; its purpose is to reshape the 100-dimensional Gaussian noise vector into a 4 × 4 × 1024 feature map. Next, deconvolution is used for the up-sampling operation to gradually reduce the number of channels. The Sigmoid function is used only in the last layer, because Sigmoid performs better when the features are distinct and can continuously enhance the feature effect over the cycles. Every other layer uses ReLU as the activation function; its formula is given in (3). Its characteristic is that the positive half-axis behaves the same as LeakyReLU, while the negative half-axis is 0 under any condition, so as to filter out negative values; i.e., a neuron whose input is negative is not activated, which improves the calculation efficiency. The final output is a 64 × 64 × 3 image. The network structure of G is shown in Fig. 2.

\mathrm{ReLU}(x) = \begin{cases} x, & x > 0 \\ 0, & \text{else} \end{cases} \qquad (3)
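Formulas (2) and (3) translate directly into code; a minimal NumPy sketch, with the common slope value leak = 0.2 assumed since the text does not state it:

import numpy as np

def leaky_relu(x, leak=0.2):
    # Formula (2): identity on the positive half-axis, slope `leak` otherwise
    return np.where(x > 0, x, leak * x)

def relu(x):
    # Formula (3): identity on the positive half-axis, 0 otherwise
    return np.where(x > 0, x, 0.0)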
Fig. 2. Structure of G in DCGAN
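The structure of G described above can be sketched with the Keras API as follows; the intermediate channel counts between the 4 × 4 × 1024 feature map and the 64 × 64 × 3 output are illustrative assumptions, not values given in the text:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Reshape, Conv2DTranspose

def build_generator(latent_dim=100):
    model = Sequential()
    # fully connected layer: 100-dim noise -> 4 x 4 x 1024 feature map
    model.add(Dense(4 * 4 * 1024, activation='relu', input_dim=latent_dim))
    model.add(Reshape((4, 4, 1024)))
    # deconvolution up-sampling: 4 -> 8 -> 16 -> 32 -> 64, channel count shrinking
    model.add(Conv2DTranspose(512, (5, 5), strides=(2, 2), padding='same', activation='relu'))
    model.add(Conv2DTranspose(256, (5, 5), strides=(2, 2), padding='same', activation='relu'))
    model.add(Conv2DTranspose(128, (5, 5), strides=(2, 2), padding='same', activation='relu'))
    # only the last layer uses Sigmoid, giving a 64 x 64 x 3 image
    model.add(Conv2DTranspose(3, (5, 5), strides=(2, 2), padding='same', activation='sigmoid'))
    return model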
3 Using DCGAN to Enhance Blind Path Images

With DCGAN, the enhancement of blind path images mainly goes through the following processes: first, the training environment for blind path recognition is built; then a sufficient number of original data samples are collected; finally, on the basis of the first two processes, DCGAN model training is carried out and the generated data is obtained.
3.1 Build a Training Environment for Blind Path Recognition
Based on the TensorFlow learning framework, this paper builds the DCGAN model. The training environment of blind path recognition is shown in Table 1.

Table 1. The training environment of blind path recognition

Project name | Detailed information
Operating system | Windows 10, 64-bit
RAM | 32 GB memory
CPU | Intel(R) Core(TM) i7-7700HQ @2.80 GHz
Graphics card | NVIDIA GeForce GTX 1050 Ti
Python version | Python 3.7.7
Tensorflow version | TensorFlow 1.8.0
Development environment | Jupyter Notebook
Primary library | Numpy/Matplotlib/Tensorflow, etc.

3.2 Original Data Sample
Since there is no public blind path data set, the blind path data set used in this paper was collected and produced by the authors; it contains images from different scenes. The DCGAN model is used to enhance the data.

3.3 Model Training
During training, the data is loaded in batches, with the batch size set to 20; that is, each training batch loads 20 blind path images. During testing, the batch size is set to 12, so each test batch loads 12 blind path images. An Adam optimizer with a global learning rate of 0.0002 and a momentum of 0.5 optimizes the loss function [11]. The training cycle can be described as follows: the first step is to generate the blind path images output by G; the second step is to submit real and fake images to D for discrimination; the third step is to compute the generator loss and the discriminator loss from the outputs of G and D, respectively. In order to maintain the balance of the adversarial game, the ratio of update counts between D and G is 1:2. The training of GANs is usually unstable, so dropout with a value of 0.5 is used in the first hidden layer of G to prevent overfitting from appearing during training; it is likewise used in the first hidden layer of D, where its value is set to 0.9. At the same time, L2 regularization is applied to all convolutional layers with stride 1 and to all parameters of the fully connected layers.
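A sketch of this training configuration, assuming Keras models generator, discriminator and a stacked gan model built elsewhere (hypothetical names standing in for the paper's own code), with the batch size, optimizer settings and 1:2 update ratio taken from the text; iterate_batches is an assumed data iterator:

import numpy as np
from tensorflow.keras.optimizers import Adam

adam = Adam(lr=0.0002, beta_1=0.5)  # global learning rate 0.0002, momentum 0.5
discriminator.compile(loss='binary_crossentropy', optimizer=adam)
gan.compile(loss='binary_crossentropy', optimizer=adam)

for real_images in iterate_batches(train_set, batch_size=20):  # assumed helper
    # one D update ...
    z = np.random.normal(size=(20, 100))
    fakes = generator.predict(z, verbose=0)
    discriminator.train_on_batch(real_images, np.ones((20, 1)))
    discriminator.train_on_batch(fakes, np.zeros((20, 1)))
    # ... per two G updates (1:2 D:G ratio)
    for _ in range(2):
        z = np.random.normal(size=(20, 100))
        gan.train_on_batch(z, np.ones((20, 1)))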
3.4 Experimental Results Display
After a certain number of cycles, the model converges gradually. Figures 4, 5 and 6 show the images output by G when arbitrarily distributed data is input. Figure 3 is the original blind path photo. Figure 4 is the image output after running 5 cycles under DCGAN; it can be seen that the image cannot yet show the shape of the blind path. Figure 5 is the image output after running 100 cycles under DCGAN; at this point the shape of the blind path is clearly displayed, but the finer details of the image are not well rendered. Figure 6 is the image output after running 400 cycles under DCGAN; after 400 cycles, the output image is stable and the blind path in the image is displayed well.
Fig. 3. Original image
Fig. 4. 5-cycle image output.
Fig. 5. 100-cycle image output.
Fig. 6. 400-cycle image output.
3.5 Loss Function
Figure 7 and Fig. 8 show how the loss of D on real data (d_loss_real) and its loss on the generated forged data (d_loss_fake) change as the number of training iterations increases. It can be seen from the figures that both show an overall downward trend, with large fluctuations in the later period of training, which is due to the adversarial effect once the generative adversarial network model has stabilized. Figure 9 and Fig. 10 show the total loss for D over the real data
Fig. 7. The change process d_loss_real of blind path image
Fig. 8. The change process d_loss_fake of blind path image
and the generated forged data (d_loss) and the generator loss (g_loss), respectively. It can be concluded that both G and D train relatively smoothly in the early stage and fluctuate noticeably in the later stage. This is because the generation network and the discrimination network are gradually optimized as the number of training iterations grows. As the training process is a dynamic zero-sum game, an increase in one party's interests, as shown in the figures, results in a drop in the other party's [12].
Fig. 9. The change process d_loss of blind path image
Fig. 10. The change process g_loss of blind path image
3.6 Feasibility Verification of Generated Samples
In order to verify the feasibility of the samples generated by DCGAN, this paper builds a model based on the TensorFlow deep learning framework for validation on the CIFAR-10 data set. A real problem is that some categories contain only a small number of images and can hardly be distinguished; the images generated by DCGAN are used to fill the categories with few images, so that the number of images in each category becomes approximately the same. The following is a partial code description of the discriminator, the generator and the training of both. For the definition of the D model: LeakyReLU and Dropout are used, the binary cross-entropy loss function is minimized, and Adam with a learning rate of 0.0002 and a momentum of 0.5 is used for stochastic gradient descent. Standard convolution: model.add(Conv2D(64, (3,3), padding = 'same', input_shape = in_shape)). Three convolutional layers with 2 × 2 strides and zero padding down-sample the input image: model.add(Conv2D(64, (3,3), strides = (2,2), padding = 'same')). There is no pooling layer in the classifier, and the output layer has only one node with a sigmoid activation function, which predicts whether the input sample is real or fake: model.add(Dense(1, activation = 'sigmoid')). For G, the first layer, the dense layer, needs enough nodes to process multiple versions of the output image: model.add(Dense(n_nodes, input_dim = latent_dim)). The node activations can be reshaped into something like the images entering a convolutional layer, namely 256 different 4 × 4 feature maps: model.add(Reshape((4, 4, 256))). The up-sampling process, i.e. the deconvolution process, uses Conv2DTranspose layers configured with (2 × 2) strides; the effect is to double the width and height of the input feature map, and this doubling is carried out three times to reach the 32 × 32 image we need: model.add(Conv2DTranspose(128, (4,4), strides = (2,2), padding = 'same')). The output layer is a Conv2D layer with a kernel size of 3 × 3 and zero padding; its purpose is to create the final feature map and, because the image has three channels, it needs three filters, keeping the size at 32 × 32 × 3. Tanh activation ensures that the output values lie in [−1, 1]: model.add(Conv2D(3, (3,3), activation = 'tanh', padding = 'same')). Figure 11 shows part of the code for training the generator and the discriminator.
Fig. 11. Train the generator and discriminator
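The fragments above, together with the training code in Fig. 11, can be assembled into complete model definitions; the following is a minimal sketch assuming the Keras API, where the LeakyReLU slope, the dropout rate and the filter counts of the strided layers are illustrative assumptions not stated in the text:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, Conv2DTranspose, Dense,
                                     Dropout, Flatten, LeakyReLU, Reshape)
from tensorflow.keras.optimizers import Adam

def define_discriminator(in_shape=(32, 32, 3)):
    model = Sequential()
    model.add(Conv2D(64, (3, 3), padding='same', input_shape=in_shape))
    model.add(LeakyReLU(alpha=0.2))
    # three stride-2 convolutions down-sample the image (no pooling layers)
    for _ in range(3):
        model.add(Conv2D(128, (3, 3), strides=(2, 2), padding='same'))
        model.add(LeakyReLU(alpha=0.2))
    model.add(Flatten())
    model.add(Dropout(0.5))                      # rate assumed
    model.add(Dense(1, activation='sigmoid'))    # real/fake probability
    model.compile(loss='binary_crossentropy',
                  optimizer=Adam(lr=0.0002, beta_1=0.5))
    return model

def define_generator(latent_dim=100):
    n_nodes = 256 * 4 * 4  # 256 different 4x4 feature maps
    model = Sequential()
    model.add(Dense(n_nodes, input_dim=latent_dim))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Reshape((4, 4, 256)))
    # three stride-2 deconvolutions: 4 -> 8 -> 16 -> 32
    for _ in range(3):
        model.add(Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same'))
        model.add(LeakyReLU(alpha=0.2))
    # three filters for the three channels; tanh keeps outputs in [-1, 1]
    model.add(Conv2D(3, (3, 3), activation='tanh', padding='same'))
    return model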
3.7 Comparison with Other Blind Path Recognition Methods
In order to improve the travel safety of blind people, many scholars have put forward their own ideas. For example, document [13] proposed identifying the blind path through adaptive threshold segmentation acting on the whole image in the Lab color space; document [14] proposed detecting all lines in an image containing a blind path using edges and the Hough transform, and selecting the blind path boundary from the parallel relationships of the blind path; document [15] proposed distinguishing the sidewalk from the blind path based on the characteristics of the blind path, integrating color continuity space and texture, line detection and threshold segmentation. These methods are very applicable when the blind path is straight and the road surface is otherwise regular, but the real environment of blind paths is not ideal: for example, the blind path is cut off in the middle by a manhole cover, pressed by vehicles parked on the side of the road, covered by green trees, or occupied by the various signboards of a commercial street. In such cases these methods are not applicable. Document [16] proposed a blind path recognition method combining the biogeography-based optimization (BBO) algorithm and the kernel fuzzy C-means (KFCM) algorithm; however, the model of this method is complex and its practicability is not high. The method adopted in this paper follows the trend of the times: in the age of AI, DCGAN is applied to blind path identification. It only needs to be embedded in the guide product, so it neither occupies the product's volume nor increases its weight, and it reduces the user's load. DCGAN offers a high recognition rate and is not affected by environmental conditions. The use of new science and technology is the greatest encouragement for the development of science and technology, and it pushes scientific research forward in a direction more convenient for human beings.
4 Conclusion and Suggestion

This paper introduces the principles of the generative adversarial network and the deep convolutional generative adversarial network, and then trains the deep convolutional generative adversarial network on the original sample images to realize the recognition of blind path images. The results show that the deep convolutional generative network can identify the blind path in an image well, and the significance of the results is that deep learning can be used to enhance the blind path data of a guide system and thus effectively address the shortage of training data.
References

1. Li, S.: Research on the guidance system based on ultrasonic sensor array. Chongqing University of Technology (2013)
2. Yin, L.: Research on 3D reconstruction method of computer vision based on OpenCV. Anhui University (2011)
3. Lu, P., Dong, H.: Face image generation based on deep convolution antagonism generation network. Mod. Comput. (21), 56–58, 64 (2019)
4. Zeng, Q., Xiang, D., Li, N., Xiao, H.: Image recognition method based on semi supervised depth generation countermeasure network. Meas. Control Technol. 38(08), 37–42 (2019)
5. Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
6. Ye, C., Guan, W.: Application of generative adversary network. J. Tongji Univ. (Nat. Sci. Ed.) 48(04), 591–601 (2020)
7. Guo, Q.: Generation of countermeasure samples based on generation countermeasure network. Mod. Comput. 07, 24–28 (2020)
8. Ke, J., Xu, Z.: Research on speech enhancement algorithm based on generation countermeasure network. Inf. Technol. Netw. Secur. 37(05), 54–57 (2018)
9. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)
10. Ke, Y., Wang, X., Zheng, Y.: Deep convolution generation countermeasure network structure. Electron. Technol. Softw. Eng. 24, 5–6 (2018)
11. Tang, X., Du, Y., Liu, Y., Li, J., Ma, Y.: An image recognition method based on conditional depth convolution generation countermeasure network. Acta Automatica Sinica 44(05), 855–864 (2018)
12. Jia, J., Li, J.: Pest image recognition algorithm based on semi supervised generation network. J. Wuhan Light Ind. Univ. 38(04), 45–52 (2019)
13. Ke, J.: Blind path recognition system based on image processing. Shanghai Jiaotong University (2008)
14. Ke, J., Zhao, Q., Shi, P.: Blind path recognition algorithm based on image processing. Comput. Eng. 35(01), 189–191, 197 (2009)
15. Yang, X., Yang, J., Yu, X.: Blind path recognition algorithm in image processing. Shang (15), 228, 206 (2015)
16. Wang, M., Li, Y., Zhang, L.: Blind path region segmentation algorithm based on texture features. Inf. Commun. 07, 23–26 (2017)
The Impact of the Behavioral Factors on Investment Decision-Making: A Systemic Review on Financial Institutions

Syed Faisal Shah1, Muhammad Alshurideh1,2(&), Barween Al Kurdi3, and Said A. Salloum4

1 University of Sharjah, Sharjah, UAE
[email protected]
2 Faculty of Business, University of Jordan, Amman, Jordan
3 Amman Arab University, Amman, Jordan
4 Research Institute of Sciences and Engineering, University of Sharjah, Sharjah, UAE
Abstract. The purpose of the study is to identify the effects of behavioral factors (cognitive biases) on financial decision-making. A systematic review method was implemented: 29 research studies published between 2010 and 2020 were selected and critically reviewed. The main findings indicate that the factors appearing most often in the papers were overconfidence (18), anchoring bias (11), the herding effect (10) and loss aversion (9), which have a significant impact on the financial decision-making process. Moreover, almost half of the articles were survey-based (questionnaire) quantitative studies, while the rest were qualitative and mixed-method studies. The study concludes that behavioral/psychological factors as a whole highly influence financial decision-making. However, the time frame and the restriction of the keyword search to paper titles were key limitations that prevented a deeper investigation. For future research, the most repetitive cognitive biases should be measured during the uncertain situation of the COVID-19 pandemic.

Keywords: Behavioral finance · Behavioral factor · Cognitive biases · Decision-making · Systemic review

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 100–112, 2021. https://doi.org/10.1007/978-3-030-58669-0_9
1 Introduction

The financial markets operate in a highly competitive environment whose activity is being revolutionized by technological and geopolitical developments; globalization has a strong impact on financial centers in developed countries [1]. Behavioral finance consists of two elements, cognitive psychology and limits to arbitrage, where "cognitive" refers to human subjective thinking. Humans make systematic errors in judgment and decision-making; for instance, overconfident individuals rely heavily on recent experience in decision-making, which leads to distortions [2]. The author also suggested that behavioral finance should rely on knowledge rather than an arrogant approach. Moreover, behavioral finance is the most notable field for experiments on
psychological influence on human irrational decision-making in the financial markets [3, 4]. Likewise, [5] claim that decision-making is an essential skill and mental function governing how people act while making investment decisions. Similarly, humans rely on system 1 (unconscious), system 2 (conscious), or both at the same time in the judgment and decision-making process [6]. System 1 is a spontaneous, easy, unconscious and quick answering process that depends on heuristics and naive, non-statistical decisions; on the contrary, system 2 works consciously: it is a slow, deliberate, effortful process that utilizes statistics and requires time and resources for decision-making [7]. Since system 1 is an unconscious, intuitive decision process, it falls into different types of biases and traps, so in this systematic review process 82 independent factors affecting decision-making have been identified, which also helped to form a new model. The proposed model takes the current uncertain COVID-19 situation as a mediator alongside the most repetitive factors understood to influence the individual decision process. This paper conducts a systematic review to identify the impact of behavioral factors on financial decision-making; the reason for using a systematic review is to summarize the large number of studies on a subject [8]. Overall, the result of the study shows that cognitive biases have a significant impact on financial decision-making. The arrangement of the paper is as follows: the second part presents the literature review; the third part explains the method with the help of different tables; the fourth part shows the results (table) and analyses the studies; the final part is the conclusion.
2 Literature Review

Several researchers have studied the impact of psychological biases on investment decision-making from various points of view, i.e. culture, environment and so on, and have obtained significant and useful results. How did the behavioral perspective come to the area of finance? The behavioral concept was first established in the early 1980s by a few scholars from the fields of economics, psychology and engineering at the Russell Sage Foundation in New York. Authors also explain that behavioral finance is informed by three elements of psychology: the first is cognitive or behavioral psychology, which explains how the investor's mind performs the requisite calculations needed to increase wealth; the second is social psychology, which appreciates how a person acts and seeks acceptance; and the third is the emotional response to the intensity of trading, where investors focus on decision-making rather than precisely on calculation [9]. The research on prospect theory in 1979 and on judgment under uncertainty using heuristics and biases in 1982 profoundly affected behavioral economics. Studying the impact of behavioral factors on financial decision-making through a systematic review therefore contributes to the existing literature in the field. A systematic review is research that uses literature from data sources about an established issue. Such investigations offer a body of evidence related to an intervention through the employment of a clear and systemized research method, critically reviewing the selected studies. It helps to incorporate useful information from groups of studies conducted separately on certain topics, which may otherwise yield conflicting or coincident results, and also to classify the themes requiring further evidence, providing
direction for future investigation [10]. Likewise, a systematic review is a form of literature review in which authors collect and critically analyze related research papers, applying methods in which research question(s) and answer(s) are selected within a structured methodology [11]. In this study, the systematic review uses reliable sources, and the main elements of each study were elicited and summarized in a planned manner. The most repetitive factors are as follows:

1. Overconfidence: the tendency to overstate the degree to which a person's judgments are accurate [12]. [13] suggested that overconfidence leads to errors in judgment, resulting in financial losses and failure in most new ventures. On the other hand, overconfidence can help to promote professional performance [14]. It is also noted that overconfidence can enhance others' perception of one's abilities, which may help to achieve faster promotion and longer investment duration [15]. Overconfidence has a significant impact on decision-making [16–21].
2. Anchoring bias: this arises when an individual forms an estimate starting from an initial value (information) that stays in mind until the final answer. As humans are not aware of the bias, the adjustment away from the anchor is insufficient, and any change in the starting point leads to different estimates, so the bias depends on the initial values (information) [22]. Anchoring bias has a significant impact on decision-making [14, 23, 24].
3. Loss aversion: losses are felt roughly twice as strongly as gains, and even lotteries with attractive expected values are not accepted if they involve potential loss [25–27]. Loss aversion likewise has a significant impact on decision-making [9, 28–30].
4. Herding effect: a kind of imitation behavior leading to an alignment of individual behaviors [31]. According to scholars, herding behavior occurs when a person conceals his or her own beliefs and imitates the actions of others. The herding effect is found to influence decision-making [18, 32–34].
3 Methods

The method used to conduct a critical review of the studies is the systematic review, following the guidelines used by [35–37]; several of their studies conducted systematic reviews on different topics [38–40]. Systematic reviews are a kind of literature review that gives a rigorous and clear picture of a subject; they also help to reduce implicit researcher bias by adopting comprehensive search strategies, specific search keywords and standardized inclusion and exclusion criteria. Moreover, systematic reviews ease the search for evidence beyond individual researchers on a subject [41–44]. This methodological section consists of three stages: the first stage describes the inclusion and exclusion criteria; the second stage presents the data sources and search strategy; and the third stage covers data coding and analysis. The details of these stages are shown below.
3.1 Inclusion/Exclusion Criteria
The selected papers should be analyzed in depth and should satisfy the inclusion and exclusion criteria described in Table 1 [38, 45, 46]. Moreover, this helps to maintain focus and keep the quality of the research in mind [42, 47–53].

Table 1. Inclusion and exclusion criteria.

No. | Criteria | Inclusion | Exclusion
1 | Source type | Peer-reviewed articles, scholarly journals, case studies, academic journals, dissertations & theses | Others removed
2 | Selected language | English only | Others removed
3 | Type of studies | Quantitative, qualitative, empirical studies, systematic review | Verbal/visual tape, film, documentary, reports, other studies
4 | Study design | Prior and controlled studies, survey, interview, case study | –
5 | Measurement | Behavioral factors (heuristics and biases) and decision-making | –
6 | Outcome | Relationship between cognitive biases and decision-making | –
7 | Context | Should involve behavioral factors (heuristics and biases) and decision-making | All contexts that do not mention separately (behavioral finance), (behavioral factors), (psychological factors) and (behavioral economics) AND (decision-making) in the title

3.2 Data Sources and Research Strategies
The research articles included in this systematic literature review come from a broad search of existing research papers using various databases (ProQuest One Academic, Google Scholar, Emerald Insight, Taylor & Francis, Springer and Science Direct). Systematic reviews are noteworthy, as they fulfill the need for a comprehensive look at all existing research studies related to a research question [54]. The search covered studies from 2010 to 2020 and was undertaken in March 2020. The keywords included in the search terms were (behavioral finance, behavioral factors, psychological factors and behavioral economic AND decision-making); see Table 2. The search terms were used only in the title, as advanced search criteria in the databases; only the two most relevant articles were collected through a basic search.

Table 2. The data sources and search keywords

No. | Keywords search | Google Scholar | Emerald | IEEE | ProQuest One Academic | Science Direct | Springer | Wiley | Taylor & Francis | Total frequency each keyword
1 | "Behavioral Finance" AND "Decision-making" | 17 | 6 | 0 | 3 | 3 | 1 | 1 | 1 | 32
2 | "Behavioral factors" AND "Decision-making" | 12 | 2 | 1 | 2 | 2 | 2 | 0 | 0 | 21
3 | "Psychological factors" AND "Decision-making" | 28 | 2 | 0 | 8 | 2 | 0 | 0 | 2 | 42
4 | "Behavioral economic" AND "Decision-making" | 36 | 1 | 0 | 5 | 4 | 0 | 2 | 1 | 49
Total frequency each database | | 93 | 11 | 1 | 18 | 11 | 3 | 3 | 4 | 144
Table 3. The filtration criteria for articles from each database

Serial | Database | Initial search in databases | Filtration of articles in databases | Number of frequency (after removing duplication) | Number of frequency (irrelevant articles removed) | Number of frequency (after skimming the papers) | Number of frequency (after critically reading papers)
1 | Google Scholar | 93 | 38 | 25 | 21 | 15 | 10
2 | Emerald | 11 | 11 | 11 | 10 | 9 | 8
3 | IEEE | 1 | 1 | 1 | 1 | 1 | 0
4 | ProQuest One Academic | 18 | 18 | 12 | 8 | 4 | 3
5 | Science Direct | 11 | 11 | 11 | 6 | 5 | 4
6 | Springer | 3 | 3 | 3 | 3 | 0 | 0
7 | Wiley | 3 | 3 | 3 | 2 | 2 | 2
8 | Taylor & Francis | 4 | 4 | 4 | 2 | 2 | 2
 | Total frequency of articles | 144 | 89 | 70 | 53 | 38 | 29
The initial database search yielded: Google Scholar (N = 93), Emerald (N = 11), IEEE (N = 1), ProQuest One Academic (N = 18), Science Direct (N = 11), Springer (N = 3), Wiley (N = 3) and Taylor & Francis (N = 4), for a total of N = 144 across all databases. Filtration was then carried out step by step: first, articles were filtered by excluding books and other documents (N = 89 remaining); the second step removed duplication (N = 70 remaining); the third step removed irrelevant papers (N = 53 remaining); in the fourth step, N = 38 papers remained after skimming; and finally 9 papers were removed after critically reading the documents, leaving a total of N = 29 papers, see Table 3. In this manner, the relevant studies were selected and included in the systematic review process, as in Table 4. [55] suggest that this process ensures the replicability of the study. The field formally started in 1979 with [22], the study of prospect theory, which concerns decision-making under risk.
4 Result

Table 4 is coded as follows: (a) author(s), (b) database, (c) year, (d) place, (e) dependent variable(s), (f) context, (g) data collection method, (h) methodology and (i) sample size. The selected studies were critically filtered, and the exclusion process strictly followed the factor(s) affecting the dependent variable(s) [56, 58]. The articles were taken from 2010 to 2020, and of the 29 selected studies, 16 papers (55%) were performed in the stock market. Most studies (50%) were conducted in Asian financial markets, 6 articles (20.6%) in the United States, and the other papers were from other parts of the world. Quantitative methods were used in 13 papers, qualitative methods in 11 papers, and the remaining 5 studies used mixed methods. Besides, the highest numbers of articles were collected from the databases of Google Scholar (10) and Emerald (8), with each remaining database contributing 4 papers or fewer.
Table 4. Analysis of included research articles

Author(s) | Database | Year | Place | Dependent variable(s) | Context | Data collection method | Methodology | Sample size
[23] | Google Scholar | 2014 | Universidade Católica de Brasília | Gender (male & female) financial decision-making of individual investors | Real estate | Questionnaire (survey) | Quantitative method | 217 (108 men & 109 women)
[14] | Google Scholar | 2016 | Tabriz city, Iran | Investors’ decision-making | Stock exchange | Questionnaire (survey) | Quantitative method | Statistical sample, 385 people
[56] | Google Scholar | 2018 | Pakistan | Financial decisions of investors | Stock exchange | Semi-structured interview method | Qualitative research strategy | 30 interviews
[18] | Google Scholar | 2011 | Vietnam | Investment performance of individual investors (moderator: decision-making) | Ho Chi Minh Stock Exchange | Structured interviews, semi-structured interviews, unstructured interviews, self-completion questionnaire, observation, group discussion | Mixed methods | 172 respondents (questionnaire) & 2 managers of HOSE
[28] | Google Scholar | 2017 | Pakistan | Investors’ decision-making and investment performance | Pakistani stock markets | Questionnaire (survey) | Quantitative method | 41 respondents
[20] | Google Scholar | 2018 | Pakistan | Investment decisions and perceived market efficiency | Pakistan Stock Exchange | Questionnaire (survey) | Quantitative method | 143 investors trading on the PSX
[57] | Google Scholar | 2016 | Pakistan | Decision of investment | Islamabad Stock Exchange | Questionnaire (survey) | Quantitative method | 100 investors from Islamabad Stock Exchange
[58] | Google Scholar | 2011 | United States | Behavioral decision-making in Americans’ retirement savings decisions (individuals’ savings behavior) | The retirement savings decision | EBRI/ICI 401(k) database (retirement saving) & JDM and behavioral-economics literatures | Qualitative method | 21 million participants in the sample & literature review
[59] | Google Scholar | 2013 | Sydney, Australia | Strategic decisions within firms | Organization (company) | Empirical studies | Qualitative method | Empirical studies (# not specified)
[60] | Google Scholar | 2012 | Croatia | CEO’s process | Business firm | Literature reviews | Qualitative method | Research studies (literature reviews) (# not specified)
[9] | Emerald | 2010 | UK | Discrepancy between the academic and the professional world when it comes to utilizing behavioral finance research (finance industry) | Wholesale and retail financial markets; held in London, 11 December 2009 at Armourers’ Hall | Round table discussion on behavioral finance attended by academics and practitioners (viewpoint) | Qualitative method | Round table discussion on behavioral finance attended by academics and practitioners
[29] | Emerald | 2010 | United States | Decision-making | The behavioral financial paradigm (conceptual paper) | Empirical studies | Qualitative method | A cross-disciplinary review of relevant natural and social sciences conducted to identify common foundational concepts
[61] | Emerald | 2012 | UK | Financial decisions | Behavioral finance use of psychological experimental methods | Research papers | Qualitative method | Research papers (# not specified)
[4] | Emerald | 2015 | UK | Influence of moods and emotions on financial behavior | Financial markets | Empirical studies (literature reviews) | Qualitative method | Empirical studies (# not specified)
[62] | Emerald | 2019 | Tunisia | Unexpected earnings (UE) and surprise unexpected earnings (SUE), earnings per share (EPS) and the revision of earnings forecast (REV) | Tunisian Stock Exchange | (Financial Market Council Tunisia) announcements | Quantitative methods | Sample of 39 publicly traded companies (2010–2014)
[17] | Emerald | 2019 | Egypt | Investment decisions (demographic characteristics: age, gender, education level and experience) | Egyptian stock market | Questionnaire (survey) | Quantitative method | Structured questionnaire survey carried out among 384 local Egyptian, foreign, institutional and individual investors
[63] | Emerald | 2020 | Malaysia | Individual investment decisions | Generation Y in Malaysia | Questionnaire (survey) | Quantitative method | A total of 502 respondents (male and female)
[34] | Emerald | 2012 | Malaysia | Day-of-the-week anomaly | Conceptual investigation of the role of psychological biases in the day-of-the-week anomaly (DOWA) in the stock market | Literature reviews | Qualitative method | Psychological biases literature and links (# not specified)
[30] | ProQuest | 2014 | India | Decision-making for investment | Indian stock market | Questionnaire (survey) | Quantitative method | 150 respondents
[64] | ProQuest | 2012 | United States | Financial practices | The financial processes of individuals, groups and organizations | Qualitative meta-analysis of the current state of financial behavior | Qualitative method | Meta-analysis (# not specified)
[65] | ProQuest | 2015 | Pakistan | Investment decision (moderator: risk perception) | Islamabad Stock Exchange, Islamabad (risk perception in the Pakistani cultural context) | Questionnaire (survey) | Quantitative method | 200 financial investors (respondents)
[32] | Science Direct | 2017 | United States | Decision-making | Four major US industries (manufacturing, construction, wholesale and services) | All firm-level data from the annual Compustat database from 1996 to 2015; annual gross domestic product (GDP) data extracted from the World Bank database | Mixed methods | Sample period 1996–2015; all financial services firms (SIC codes 6000–6999), regulated utilities (SIC codes 4900–4999) and firms with less than ten (10) years of continuous data were excluded
[19] | Science Direct | 2016 | Malaysia | Financial decisions | Malaysian stock market | Questionnaire (survey) | Quantitative method | 200 respondents
[21] | Science Direct | 2013 | China | Venture enterprise value and the investment | The irrational behavior of venture entrepreneurs and venture capitalists has a significant impact on the corporation’s investment | Double-sided moral hazard model | Qualitative method | Studies on the double-sided moral hazard (# not specified)
[66] | Science Direct | 2019 | United States | Decision-making for oneself and for others (DMfO) | Whether phenomena identified in the behavioral-economics literature apply in decision-making for others (DMfO) | Literature reviews and laboratory experiment (questionnaire) | Mixed methods | 190 subjects (SCU students recruited by email) and studies (literature reviews)
[67] | Wiley | 2015 | Hong Kong | Financial decision-making | Stock market | Five experiments | Mixed methods | 5 studies: 1. end anchoring 155; 2. visual bias 202; 3. consequential stakes 48; 4. eye tracking 50; 5. run-length 162; total sample size 617
[68] | Wiley | 2017 | United States | Remediation decision behaviors | Trihydro Corporation | Questionnaire (survey) | Quantitative method | Survey completed by 118 respondents representing academia, consultants, clients and others
[69] | Taylor & Francis | 2013 | Spain | Quality of financial decisions | Financial institutions and markets | Research papers | Qualitative method | Selected studies of cognitive biases as well as cognitive models (# not specified)
[33] | Taylor & Francis | 2016 | Pakistan | Individual investors | The Lahore Stock Exchange (LSE) | Questionnaire (survey) | Quantitative method | 254 investors of the stock exchange
5 Conclusion

Prior papers explain the effects of behavioral factors on decision-making, which provides insight into the topic, helping to identify human biases and improve investment decision-making. The current study applied a systematic review method to the influence of behavioral or psychological factors on financial decision-making; the aim was to provide a comprehensive analysis of the published papers and to review their conclusions. The result of this study shows that the most frequently appearing factors in financial decision-making are overconfidence, anchoring bias (heuristic bias), loss aversion (prospect factor) and the herding effect. Moreover, most of the articles focused on the
financial sector and used a quantitative method (13 studies), a qualitative method (11 studies) or mixed methods (5 studies). Finally, half (50%) of the studies were conducted in Asian financial markets, 20% in the United States, and the rest of the articles were from other parts of the world. The constraints of the study were the time frame and the restriction of the keyword search to article titles. As a future direction, the most repetitive cognitive biases should be measured during the uncertain situation of the COVID-19 pandemic.
References

1. Hilton, D.J.: The psychology of financial decision-making: applications to trading, dealing, and investment analysis. J. Psychol. Financ. Mark. 2(1), 37–53 (2001)
2. Ritter, J.R.: Behavioral finance. Pacific-Basin Financ. J. 11(4), 429–437 (2003)
3. Bazerman, M.H., Moore, D.A.: Judgment in Managerial Decision Making. Wiley, New York (1994)
4. Duxbury, D.: Behavioral finance: insights from experiments II: biases, moods and emotions. Rev. Behav. Financ. (2015)
5. Fünfgeld, B., Wang, M.: Attitudes and behaviour in everyday finance: evidence from Switzerland. Int. J. Bank Mark. (2009)
6. Kahneman, D.: Maps of bounded rationality: psychology for behavioral economics. Am. Econ. Rev. 93(5), 1449–1475 (2003)
7. Kahneman, D.: Thinking, Fast and Slow. Macmillan, New York (2011)
8. Corrêa, V.S., Vale, G.M.V., de R. Melo, P.L., de A. Cruz, M.: O ‘Problema da Imersão’ nos Estudos do Empreendedorismo: Uma Proposição Teórica. Rev. Adm. Contemp. 24(3), 232–244 (2020)
9. DeBondt, W., Forbes, W., Hamalainen, P., Muradoglu, Y.G.: What can behavioural finance teach us about finance? Qual. Res. Financ. Mark. (2010)
10. Linde, K., Willich, S.N.: How objective are systematic reviews? Differences between reviews on complementary medicine. J. R. Soc. Med. 96(1), 17–22 (2003)
11. Salloum, S.A.S., Shaalan, K.: Investigating students’ acceptance of E-learning system in Higher Educational Environments in the UAE: applying the Extended Technology Acceptance Model (TAM). Br. Univ. Dubai (2018)
12. Fischhoff, B., Slovic, P., Lichtenstein, S.: Knowing with certainty: the appropriateness of extreme confidence. J. Exp. Psychol. Hum. Percept. Perform. 3(4), 552 (1977)
13. Singh, R.P.: Overconfidence. New Engl. J. Entrep. (2020)
14. Shabgou, M., Mousavi, A.: Behavioral finance: behavioral factors influencing investors’ decisions making. Adv. Soc. Humanit. Manag. 3(1), 1–6 (2016)
15. Oberlechner, T., Osler, C.L.: Overconfidence in currency markets (2004). https://faculty.haas.berkeley.edu/lyons/Osler%20overconfidence%20in%20FX.pdf. Accessed 20 Apr 2011
16. Robinson, A.T., Marino, L.D.: Overconfidence and risk perceptions: do they really matter for venture creation decisions? Int. Entrep. Manag. J. 11(1), 149–168 (2015)
17. Metawa, N., Hassan, M.K., Metawa, S., Safa, M.F.: Impact of behavioral factors on investors’ financial decisions: case of the Egyptian stock market. Int. J. Islam. Middle East. Financ. Manag. (2019)
18. Le Luong, P., Thi Thu Ha, D.: Behavioral factors influencing individual investors’ decision-making and performance: a survey at the Ho Chi Minh Stock Exchange (2011)
19. Bakar, S., Yi, A.N.C.: The impact of psychological factors on investors’ decision making in Malaysian stock market: a case of Klang Valley and Pahang. Procedia Econ. Financ. 35, 319–328 (2016)
20. Shah, S.Z.A., Ahmad, M., Mahmood, F.: Heuristic biases in investment decision-making and perceived market efficiency. Qual. Res. Financ. Mark. (2018)
21. Jing, G.U., Hao, C., Xian, Z.: Influence of psychological and emotional factors on the venture enterprise value and the investment decision-making. In: ITQM, pp. 919–929 (2013)
22. Tversky, A., Kahneman, D.: Prospect theory: an analysis of decision under risk. Econometrica 47(2), 263–291 (1979)
23. Matsumoto, A.S., Fernandes, J.L.B., Ferreira, I., Chagas, P.C.: Behavioral finance: a study of affect heuristic and anchoring in decision making of individual investors. Available at SSRN 2359180 (2013)
24. Copur, Z.: Handbook of Research on Behavioral Finance and Investment Strategies: Decision Making in the Financial Industry. IGI Global (2015)
25. Merkle, C.: Financial loss aversion illusion. Rev. Financ. 24(2), 381–413 (2020)
26. Tversky, A., Kahneman, D.: Advances in prospect theory: cumulative representation of uncertainty. J. Risk Uncertain. 5(4), 297–323 (1992)
27. Tversky, A., Kahneman, D.: Loss aversion in riskless choice: a reference-dependent model. Q. J. Econ. 106(4), 1039–1061 (1991)
28. Anum, B.A.: Behavioral factors and their impact on individual investors decision making and investment performance: empirical investigation from Pakistani stock market. Glob. J. Manag. Bus. Res. (2017)
29. Olsen, R.A.: Toward a theory of behavioral finance: implications from the natural sciences. Qual. Res. Financ. Mark. 2(2), 100–128 (2010)
30. Roopadarshini, S.: A study on implication of behavioral finance towards investment decision making on stock market. Asia Pacific J. Manag. Entrep. Res. 3(1), 202 (2014)
31. Yang, J., Cashel-Cordo, P., Kang, J.G.: Empirical research on herding effects: case of real estate markets. J. Account. Financ. 20(1), 1–9 (2020)
32. Camara, O.: Industry herd behaviour in financing decision making. J. Econ. Bus. 94, 32–42 (2017)
33. Sarwar, A., Afaf, G.: A comparison between psychological and economic factors affecting individual investor’s decision-making behavior. Cogent Bus. Manag. 3(1), 1232907 (2016)
34. Brahmana, R.K., Hooy, C., Ahmad, Z.: Psychological factors on irrational financial decision making. Humanomics 28, 236–257 (2012)
35. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Mining in educational data: review and future directions. In: Joint European-US Workshop on Applications of Invariance in Computer Vision, pp. 92–102 (2020)
36. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Machine learning and deep learning techniques for cybersecurity: a review. In: Joint European-US Workshop on Applications of Invariance in Computer Vision, pp. 50–57 (2020)
37. Mallett, R., Hagen-Zanker, J., Slater, R., Duvendack, M.: The benefits and challenges of using systematic reviews in international development research. J. Dev. Eff. 4(3), 445–455 (2012)
38. Meline, T.: Selecting studies for systemic review: inclusion and exclusion criteria. Contemp. Issues Commun. Sci. Disord. 33(Spring), 21–27 (2006)
39. Alhashmi, S.F.S., Alshurideh, M., Al Kurdi, B., Salloum, S.A.: A systematic review of the factors affecting the artificial intelligence implementation in the health care sector. In: Joint European-US Workshop on Applications of Invariance in Computer Vision, pp. 37–49 (2020)
40. Alshurideh, M.: A qualitative analysis of customer repeat purchase behaviour in the UK mobile phone market. J. Manag. Res. 6(1), 109 (2014)
41. Ghannajeh, A., et al.: A qualitative analysis of product innovation in Jordan’s pharmaceutical sector. Eur. Sci. J. 11(4), 474–503 (2015)
42. Alshurideh, et al.: Loyalty program effectiveness: theoretical reviews and practical proofs. Uncertain Supply Chain Manag. 8(3), 1–10 (2020)
43. Assad, N.F., Alshurideh, M.T.: Financial reporting quality, audit quality, and investment efficiency: evidence from GCC economies. WAFFEN-UND Kostumkd. J. 11(3), 194–208 (2020)
44. Assad, N.F., Alshurideh, M.T.: Investment in context of financial reporting quality: a systematic review. WAFFEN-UND Kostumkd. J. 11(3), 255–286 (2020)
45. Aburayya, A., Alshurideh, M., Albqaeen, A., Alawadhi, D., Al A’yadeh, I.: An investigation of factors affecting patients waiting time in primary health care centers: an assessment study in Dubai. Manag. Sci. Lett. 10(6), 1265–1276 (2020)
46. Alshurideh, et al.: Understanding the quality determinants that influence the intention to use the mobile learning platforms: a practical study. Int. J. Interact. Mob. Technol. 13(11), 157–183 (2019)
47. Al Kurdi, B.: Investigating the factors influencing parent toy purchase decisions: reasoning and consequences. Int. Bus. Res. 10(4), 104–116 (2017)
48. Kurdi, B.A., Alshurideh, M., Salloum, S.A., Obeidat, Z.M., Al-dweeri, R.M.: An empirical investigation into examination of factors influencing university students’ behavior towards elearning acceptance using SEM approach. Int. J. Interact. Mob. Technol. 14(2), 19–41 (2020)
49. Alzoubi, H., Alshurideh, M., Al Kurdi, B., Inairata, M.: Do perceived service value, quality, price fairness and service recovery shape customer satisfaction and delight? A practical study in the service telecommunication context. Uncertain Supply Chain Manag. 8(3), 1–10 (2020)
50. Al Kurdi, B.: Healthy-food choice and purchasing behaviour analysis: an exploratory study of families in the UK. Durham University (2016)
51. Al-Dmour, H., Al-Shraideh, M.T.: The influence of the promotional mix elements on Jordanian consumer’s decisions in cell phone service usage: an analytical study. Jordan J. Bus. Adm. 4(4), 375–392 (2008)
52. Alshurideh, M., Nicholson, M., Xiao, S.: The effect of previous experience on mobile subscribers’ repeat purchase behaviour. Eur. J. Soc. Sci. 30(3), 366–376 (2012)
53. Ashurideh, M.: Customer service retention – a behavioural perspective of the UK mobile market. Durham University (2010)
54. García-Feijoo, M., Eizaguirre, A., Rica-Aspiunza, A.: Systematic review of sustainable-development-goal deployment in business schools. Sustainability 12(1), 440 (2020)
55. González, I.F., Urrútia, G., Alonso-Coello, P.: Revisiones sistemáticas y metaanálisis: bases conceptuales e interpretación. Rev. Española Cardiol. 64(8), 688–696 (2011)
56. Shahid, M.N., Aftab, F., Latif, K., Mahmood, Z.: Behavioral finance, investors’ psychology and investment decision making in capital markets: an evidence through ethnography and semi-structured interviews. Asia Pacific J. Emerg. Mark. 2(1), 14 (2018)
57. Hunjra, A.I., Qureshi, S., Riaz, L.: Psychological factors and investment decision making: a confirmatory factor analysis. J. Contemp. Manag. Sci. 2(1) (2016)
58. Knoll, M.A.Z.: The role of behavioral economics and behavioral decision making in Americans’ retirement savings decisions. Soc. Sec. Bull. 70, 1 (2010)
59. Garbuio, M., Lovallo, D., Ketenciouglu, E.: Behavioral economics and strategic decision making (2013)
60. Galetic, L., Labas, D.: Behavioral economics and decision making: importance, application and development tendencies. In: An Enterprise Odyssey. International Conference Proceedings, p. 759 (2012)
61. Muradoglu, G., Harvey, N.: Behavioural finance: the role of psychological factors in financial decisions. Rev. Behav. Financ. 4, 68–80 (2012)
62. Bouteska, A., Regaieg, B.: Psychology and behavioral finance. EuroMed J. Bus. (2019)
63. Rahman, M., Gan, S.S.: Generation Y investment decision: an analysis using behavioural factors. Manag. Financ. (2020)
64. Howard, J.A.: Behavioral finance: contributions of cognitive psychology and neuroscience to decision making. J. Organ. Psychol. 12(2), 52–70 (2012)
65. Riaz, L., Hunjra, A.I.: Relationship between psychological factors and investment decision making: the mediating role of risk perception. Pakistan J. Commer. Soc. Sci. 9(3), 968–981 (2015)
66. Ifcher, J., Zarghamee, H.: Behavioral economic phenomena in decision-making for others. J. Econ. Psychol. 77, 102180 (2019)
67. Duclos, R.: The psychology of investment behavior: (de)biasing financial decision-making one graph at a time. J. Consum. Psychol. 25(2), 317–325 (2015)
68. Clayton, W.S.: Remediation decision-making and behavioral economics: results of an industry survey. Groundw. Monit. Remediat. 37(4), 23–33 (2017)
69. De Bondt, W., Mayoral, R.M., Vallelado, E.: Behavioral decision-making in finance: an overview and assessment of selected research. Spanish J. Financ. Accounting/Revista Española Financ. y Contab. 42(157), 99–118 (2013)
Deep Learning Technology and Applications
A Deep Learning Architecture with Word Embeddings to Classify Sentiment in Twitter

Eman Hamdi(&), Sherine Rady, and Mostafa Aref

Faculty of Computer and Information Sciences, Ain Shams University, Abbassia, Cairo, Egypt
{emanhamdi,srady,mostafa.aref}@cis.asu.edu.eg
Abstract. Social media networks are one of the main platforms for expressing our feelings. The emotions we put into text tell a lot about our behavior towards any topic, so the analysis of text is needed for detecting emotions in many fields. This paper introduces a deep learning model that classifies sentiment in tweets using different types of word embeddings. The main component of the model is the Convolutional Neural Network (CNN), and the main features used are word embeddings. Trials are made with randomly initialized word embeddings and with pre-trained ones; the pre-trained word embeddings used are of different variants, such as the Word2Vec, GloVe and fastText models. The model consists of three CNN streams that are concatenated and followed by a fully-connected layer; each stream contains only one convolutional layer and one max-pooling layer. The model detects positive and negative emotions in the Stanford Twitter Sentiment (STS) dataset. The accuracy achieved is 78.5% when using randomly initialized word embeddings, with a maximum accuracy of 84.9% using Word2Vec word embeddings. The model not only proves that randomly initialized word embeddings can achieve good accuracy, it also shows the power of pre-trained word embeddings, which help to achieve a higher, competitive accuracy in sentiment classification.

Keywords: Sentiment classification · Deep learning · Convolutional neural networks · Word embeddings · Social media networks

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 115–125, 2021. https://doi.org/10.1007/978-3-030-58669-0_10
1 Introduction

Emotions felt by humans are a very significant characteristic for understanding their psychological traits. As text is the main method of communication between humans, there are many approaches to studying text for identifying emotions and classifying sentiment [1]. Social media networks such as Twitter and Facebook are main platforms in which every event, situation and news item is posted and discussed; people use them as windows to describe their emotions and beliefs toward different types of topics [2]. Posts on these sites have a natural and realistic character, as people post freely throughout the day expressing themselves. That fact makes these sites valuable sources of textual data that can be studied and analyzed to detect sentiments and emotions [3]. Despite having this enormous amount of textual data, it is nearly impossible to process it manually. That raised the need for various techniques to
automatically process and classify text [4]. Detecting emotions from text can therefore be approached in different ways, which we can roughly divide into lexicon-based approaches and machine learning approaches. Recently, as a machine learning approach, deep learning models such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) have been used in text classification tasks [5, 6]. These models can detect implicit semantic relations between words, and high-level features can be learnt even when starting from low-level features; this is achieved in text classification tasks whenever a sufficient amount of textual data is present in the training phase [7]. One of the most essential aspects of text classification tasks is the selection of features. In lexicon-based approaches, words are represented as discrete values, which makes the data sparse because words are treated as distinct and discrete items. This problem can be solved by using a different representation, and therefore word embeddings are used. Word embeddings are dense, continuous vector representations of words; these vectors give insight into word meaning, as words with similar semantics appear close to each other in this space [8]. For initializing word embeddings, two methods are used (a sketch of both is given at the end of this section): (1) randomly initialized word embeddings and (2) pre-trained word embeddings such as GloVe [9], Word2Vec [10] or fastText [11]. The first initialization assigns random values to the word embeddings; these values can be kept static or updated through the training phase to learn task-specific word embeddings. The second initialization uses word embeddings that have already been pre-trained on huge data sets, and these can likewise be static or trainable during the training phase. In this paper, a Convolutional Neural Network model is proposed that classifies negative and positive sentiment in tweets. The introduced model works with both randomly initialized word embeddings and pre-trained word embeddings. The model contains three CNN streams, which are merged and followed by a fully-connected Sigmoid layer. The Stanford Twitter Sentiment (STS) dataset [12] is used to train, validate and test the model. The model achieved 78.5% accuracy using randomly initialized word embeddings and 84.9% accuracy using pre-trained Word2Vec word embeddings. The paper is organized as follows: Sect. 2 discusses related work in emotion detection and sentiment classification from text. In Sect. 3, the architecture of the CNN model is explained, and each layer is described in detail. Experimental results are shown and discussed in Sect. 4, and Sect. 5 summarizes the proposed work.
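The two initialization schemes described above can be expressed compactly; a minimal sketch assuming the Keras API, where the vocabulary size, embedding dimension, sequence length and the precomputed embedding-matrix file are illustrative assumptions rather than the paper's exact setup:

import numpy as np
from tensorflow.keras.layers import Embedding

vocab_size, embed_dim, max_len = 20000, 300, 50  # assumed values

# (1) Randomly initialized embeddings, updated during training
random_embedding = Embedding(vocab_size, embed_dim, input_length=max_len,
                             trainable=True)

# (2) Pre-trained embeddings (e.g. Word2Vec/GloVe/fastText vectors loaded
#     into a (vocab_size, embed_dim) matrix); can be kept static or fine-tuned
pretrained_matrix = np.load('embedding_matrix.npy')  # assumed precomputed file
pretrained_embedding = Embedding(vocab_size, embed_dim, input_length=max_len,
                                 weights=[pretrained_matrix], trainable=False)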
2 Related Work Traditional machine learning techniques and deep learning techniques are both applied in text classification tasks. One of the early works on using machine learning with sentiment classification is introduced in [13]. The main idea of this work is studying the ability of machine learning methods in classifying text by sentiment (positive or negative). Three methods are employed which are Naive Bayes, Maximum Entropy and Support Vector Machines (SVMs). These methods are tested using a movie review
dataset from IMDB. Support Vector Machines give the best performance, recording 82.9% accuracy, while Naive Bayes gives the lowest, recording 78.7% accuracy. In [14], a Naive Bayes model and a Maximum Entropy model are implemented to classify positive and negative sentiment from tweets. The authors used the Java version of the Twitter Application Programming Interface (API) to collect tweets. As features, multinomial unigrams, Bernoulli unigrams, and bigrams are tested; the best performance is given by the Naive Bayes model with unigrams. In [12], Naive Bayes, Maximum Entropy, and SVMs are applied to the classification of tweets, with unigrams, bigrams, and unigrams with part-of-speech tags as the selected features; a maximum accuracy of 83% is achieved using Maximum Entropy with both unigrams and bigrams. Recently, deep learning techniques have been applied to different classification tasks in speech recognition [15–17], image processing [18–20], and text classification [6, 21–24]. One of the deep learning models used for text classification is the CNN. CNNs were originally created for image processing and then applied to various classification tasks, including text classification [25]. In [6], a combination of a CNN and a Recurrent Neural Network (RNN) model is implemented. The ability of the model to understand semantic and orthographic information using characters only is proven, as it is tested on the English Penn Treebank dataset and achieves competitive results even when using 60% fewer parameters. In [21], a character-level CNN for text classification is proposed. Large datasets are created to test the model on, collected from the AG's news corpus, the Sogou news corpus, the DBPedia ontology dataset, Yelp reviews, the Yahoo! Answers dataset, and Amazon reviews. Two deep CNNs were built, and the model achieved good performance in sentiment analysis and topic classification compared against traditional and deep learning models. In [22], a one-layer CNN is implemented and tested on different sentence classification tasks. The used features are word embeddings, either randomly initialized or pretrained (Word2Vec). Four variations of the model were built and tested on 7 text classification tasks using different datasets, outperforming previous work on 4 tasks out of 7. In [23], a CNN architecture is proposed consisting of a convolutional layer, a max-pooling layer, and a soft-max layer for classifying a tweet as negative or positive. The model is tested on two sentiment analysis tasks from the SemEval-2015 Twitter Sentiment Analysis challenge: for the first, phrase-level task the accuracy achieved is 84.79%, and for the second, message-level task the accuracy achieved is 64.59%. In [24], a CNN model is applied to a binary sentiment task (positive or negative label) and a ternary classification task (positive, negative, or neutral label). Using Movie Reviews, the model achieved an 80.96% F1 score on the binary classification task and 68.31% on the ternary task. Using the Stanford Sentiment Treebank (SST), the model achieved an 81.4% F1 score for binary classification, while it achieved a 70.2% F1 score for the same task on Customer Reviews (CR).
3 The Proposed Deep Learning Architecture for Sentiment Classification

The proposed deep learning architecture, shown in Fig. 1, includes text preprocessing, CNN streams, and a fully-connected layer as its main modules. Text preprocessing is applied by filtering, tokenizing, and indexing the sentences. The processed text is then represented in the form of word embeddings and passed to the three CNN streams. The outputs of the CNN streams are merged together and passed to the fully-connected layer. A detailed explanation of each module is given in this section.
Fig. 1. The main modules of the Deep learning architecture
3.1 Text Preprocessing
Text preprocessing is important to prepare data for deep learning models. We applied filtering, tokenizing, and indexing to the sentences. The purpose of the filtering stage is to clean the data of unneeded words or symbols: the sentences are filtered by removing punctuation marks, stop words, links, mentions, hashtags (e.g., #happy, #home, #sick), emoticons, and names. The filtered sentences are then tokenized, splitting each sentence into words. Finally, indexing gives each distinct word in the vocabulary a specific index, which will be the index of its word embedding. In our work, this means that when a sentence is fed into the CNN, it is fed as indices that are passed to the embedding layer to fetch the corresponding word embedding for each index in the sentence.
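A minimal sketch of this pipeline, assuming the Keras framework named in Sect. 4.3 (the regular expressions, variable names, and the omission of stop-word removal here are our illustrative choices, not the authors' code):

```python
import re
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

def filter_tweet(text):
    """Filtering: drop links, mentions, hashtags, punctuation, and emoticons."""
    text = re.sub(r"https?://\S+", " ", text)   # links
    text = re.sub(r"[@#]\w+", " ", text)        # mentions and hashtags
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # punctuation, emoticons, digits
    return text.lower()

tweets = ["Feeling #happy today :) http://t.co/xyz", "@friend this is awful..."]
cleaned = [filter_tweet(t) for t in tweets]

# Tokenizing and indexing: each distinct word receives an integer index that
# later addresses its row in the embedding layer.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(cleaned)
indexed = pad_sequences(tokenizer.texts_to_sequences(cleaned), maxlen=16)
```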
3.2 Word Embeddings
Word embeddings are continuous word representations that can be applied on top of a CNN model. Each word is represented by a unique vector containing either random values or meaningful values from pre-training. Pretrained word embeddings are trained on huge unlabelled corpora and can be used in many natural language processing tasks. In the proposed model, both randomly initialized and pretrained word embeddings are tested in the embedding layers. For each word in the vocabulary there is a word embedding d. Given a sentence s of m words, it is represented as a sentence matrix d_{1:m}, the concatenation of all the word embeddings in it, such that:

d_{1:m} = d_1 ⊕ d_2 ⊕ … ⊕ d_m    (1)

3.3 CNN Streams
The main building block of our model is the CNN. The model contains three CNN streams, each consisting of an embedding layer, a convolutional layer, and a max-pooling layer. Figure 2 shows the detailed flow of a sentence through one CNN stream.
Fig. 2. The sentence flow through one CNN stream
The embedding layer holds the word embeddings of all the words in the vocabulary. As an indexed sentence is passed to that layer, a sentence matrix, as described in the previous section, is produced and then passed to the convolutional layer. The number of filters is 100, but the filter sizes differ between the three CNN streams: 3, 5, and 7, respectively. Feature maps are produced by
the filtering and passed to the max-pooling layer, where the most important feature is chosen from each map to generate feature vectors. The feature vectors generated from the three CNN streams are merged and passed to the fully-connected layer.

3.4 Fully-Connected Layer
The fully-connected layer is responsible for classifying an input as negative or positive sentiment. As shown in Fig. 3, the output of the CNN streams is flattened before being fed to this layer: whatever the dimensions of the output of the previous layer, it is converted to a 1D vector so that the fully-connected layer can work on it.
Fig. 3. The input of the fully-connected layer
The activation function used is the Sigmoid function, given by the equation:

φ(Z) = 1 / (1 + e^{-Z})    (2)
where Z is the flattened vector passed to the output neuron. Dropout, a regularization technique introduced in [26], is also applied to this layer to prevent overfitting, with the ratio set to 0.5.
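Putting Sects. 3.2–3.4 together, the architecture can be sketched in Keras as follows; this is a hedged reconstruction from the text (100 filters per stream, kernel sizes 3, 5, and 7, dropout of 0.5, sigmoid output), and the pooling/flattening details are our reading of the description rather than the authors' exact code:

```python
from tensorflow.keras import layers, models

def build_model(vocab_size, embed_dim=300, max_len=16):
    inp = layers.Input(shape=(max_len,))
    streams = []
    for kernel_size in (3, 5, 7):               # one CNN stream per filter size
        x = layers.Embedding(vocab_size, embed_dim)(inp)
        x = layers.Conv1D(100, kernel_size, activation="relu")(x)
        x = layers.GlobalMaxPooling1D()(x)       # strongest feature per map
        streams.append(x)
    merged = layers.concatenate(streams)         # merge the three streams
    merged = layers.Dropout(0.5)(merged)         # regularization (Sect. 3.4)
    out = layers.Dense(1, activation="sigmoid")(merged)  # positive vs. negative
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```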
4 Experimental Results

In this section, the dataset, evaluation metrics, configurations, and results are explained and discussed in detail.
4.1 Dataset
The model is tested on the labelled Stanford Twitter Sentiment dataset, which consists of 1.6 million tweets. 80K randomly selected sentences are used for training and 16K sentences are chosen for validation. For testing, we used the original testing set, manually annotated by [12], which consists of 359 sentences. The sentences are labelled with one of three class labels: positive emotion, negative emotion, or neutral; we used the positive and negative classes.

4.2 Evaluation Metrics
Four evaluation metrics are used: Accuracy, Precision, Recall, and F1-score, defined as in [27]. Accuracy is the ratio of correctly predicted sentences to the total number of sentences, as given in the equation:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (3)
in which TP (True Positives) are the correctly classified positive sentences, TN (True Negatives) the correctly classified negative sentences, FP (False Positives) the wrongly classified positive sentences, and FN (False Negatives) the wrongly classified negative sentences. Precision is the ratio of correctly predicted positive sentences to the total predicted positive sentences:

Precision = TP / (TP + FP)    (4)
Recall is the ratio of correctly predicted positive sentences to all sentences in the actual class:

Recall = TP / (TP + FN)    (5)
F1-score is the weighted average of Precision and Recall; this score takes both false positives and false negatives into account:

F1-score = 2 · (Recall · Precision) / (Recall + Precision)    (6)
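For concreteness, the four metrics reduce to a few lines of code (an illustrative helper, not taken from the paper):

```python
def evaluate(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)           # Eq. (3)
    precision = tp / (tp + fp)                           # Eq. (4)
    recall = tp / (tp + fn)                              # Eq. (5)
    f1 = 2 * recall * precision / (recall + precision)   # Eq. (6)
    return accuracy, precision, recall, f1
```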
4.3 Configurations
The maximum length of sentences has been adjusted to 16. The embedding layer dimensions are [vocabulary size × word embedding dimensions] and each sentence matrix has dimensions [maximum length of sentences × word embedding length]. For word embedding initialization, we used four different settings, as follows. Randomly initialized word embeddings: each word embedding has dimensions [300 × 1], with the embedding layer and the sentence matrix dimensions as [vocab size × 300] and [14 × 300], respectively. It took 10 epochs to train the network.
Pre-trained GloVe word embeddings: both GloVe Wikipedia and GloVe Twitter were used, in their 100-dimension versions. For both settings, the word embedding has dimensions [100 × 1], with the embedding layer and the sentence matrix dimensions as [vocab size × 100] and [14 × 100], respectively. The model took 4 epochs to train in both settings. Pre-trained Word2Vec word embeddings: Google's pretrained word embeddings are used in this setting. The word embedding dimensions are [300 × 1], with the embedding layer and the sentence matrix dimensions as [vocab size × 300] and [14 × 300], respectively. It took only 4 epochs to train the network. Pre-trained fastText word embeddings: we used both fastText Wikipedia and fastText Crawl, in their 300-dimension versions. For both settings, the word embedding has dimensions [300 × 1], with the embedding layer and the sentence matrix dimensions as [vocab size × 300] and [14 × 300], respectively. Using fastText Wikipedia, the model needed 4 epochs of training, while it took 5 epochs using fastText Crawl. The results are obtained on a personal computer with an Intel(R) Core(TM) i7-5500U CPU @2.40 GHz and 16.0 GB of RAM. The experiments have been developed using the Keras deep learning framework with a TensorFlow backend and CUDA.
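Wiring pretrained vectors into the embedding layer can be sketched as follows, building on the tokenizer from the Sect. 3.1 sketch; the gensim usage and the file path are assumptions for illustration:

```python
import numpy as np
from gensim.models import KeyedVectors

# Hypothetical local copy of Google's pretrained 300-d Word2Vec vectors.
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin",
                                        binary=True)

vocab_size = len(tokenizer.word_index) + 1      # index 0 is reserved for padding
embedding_matrix = np.zeros((vocab_size, 300))
for word, idx in tokenizer.word_index.items():
    if word in w2v:                             # out-of-vocabulary words keep zeros
        embedding_matrix[idx] = w2v[word]

# The matrix initializes the Embedding layer; trainable=True fine-tunes the
# vectors during training, trainable=False keeps them static:
# layers.Embedding(vocab_size, 300, weights=[embedding_matrix], trainable=True)
```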
4.4 Results Discussion
The testing results obtained using the different word embeddings are shown in Table 1 and Fig. 4. The model achieved 78.5% accuracy using the randomly initialized word embeddings and higher accuracies using the pretrained word embeddings.

Table 1. Model evaluation in terms of Accuracy, Precision, Recall and F1-score

Word embeddings   Accuracy  Precision  Recall  F1-score
Random            78.5      73.9       89.0    80.7
Word2Vec          84.9      83.3       87.9    85.5
Glove Wiki        79.3      81.7       76.3    78.9
Glove Twitter     83.0      84.9       80.7    82.8
fastText Crawl    82.1      82.0       82.8    82.5
fastText Wiki     81.3      83.2       79.1    81.1
Fig. 4. Model evaluation using different word embeddings
The model's performance improves with the pretrained word embeddings, which is expected, since they already hold information about English words before the training of the model even starts, whereas in the randomly initialized setting the values are completely random and the relations between words must be learned from scratch. Even in the latter setting, the model achieved good accuracy, indicating the ability of the CNN to capture syntactic and semantic relations between words from raw text alone, without any prior knowledge. The model reached its maximum accuracy of 84.9% using the Word2Vec word embeddings. The other types of pretrained word embeddings also raise the accuracy measures compared to the randomly initialized setting, confirming that pretrained word embeddings are powerful and carry prior insights about words before training starts. Word2Vec scoring the maximum accuracy means it is the pretrained type most related to the training, validation, and testing data in our work: it represents our data best among the tested word embeddings.
5 Conclusion

A deep convolutional architecture is proposed in this paper. The model contains three main modules, namely text preprocessing, CNN streams, and a fully-connected sigmoid layer for classification. The model is tested on the sentiment task (negative or positive emotion) using the Stanford Twitter Sentiment dataset. Using either randomly initialized or pre-trained word embeddings, the accuracy scores are good. The model achieved 78.5% accuracy using only randomly initialized word embeddings that are trainable through the training phase, and achieved the maximum accuracy of 84.9% using the pretrained word embeddings (Word2Vec). The CNN model works well with random initialization of the word embeddings, but using pretrained ones gives the model insights into the relations between words that help to improve the performance. For future work, new types of pretrained word embeddings such as ELMo and BERT will be explored. Also, character-level embeddings will be tested, either individually or together with word embeddings.
References

1. Polignano, M., De Gemmis, M., Basile, P., Semeraro, G.: A comparison of word embeddings in emotion detection from text using BiLSTM, CNN and Self-Attention. In: The 27th Conference on User Modeling, Adaptation and Personalization, pp. 63–68 (2019). https://doi.org/10.1145/3314183.3324983
2. Gaind, B., Syal, V., Padgalwar, S.: Emotion detection and analysis on social media. arXiv preprint arXiv:1901.08458 (2019)
3. De Choudhury, M., Gamon, M., Counts, S., Horvitz, E.: Predicting depression via social media. In: Seventh International AAAI Conference, pp. 128–137 (2013)
4. Deng, X., Li, Y., Weng, J., Zhang, J.: Feature selection for text classification: a review. Multimed. Tools Appl. 78(3), 3797–3816 (2019)
5. Cheng, H., Yang, X., Li, Z., Xiao, Y., Lin, Y.: Interpretable text classification using CNN and max-pooling. arXiv preprint arXiv:1910.11236 (2019)
6. Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. In: Thirtieth AAAI Conference on Artificial Intelligence (2016)
7. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
8. Word embeddings and their use in sentence classification tasks. arXiv preprint arXiv:1610.08229 (2016)
9. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
10. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space, pp. 1–12. arXiv preprint arXiv:1301.3781 (2013)
11. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
12. Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision. Processing 150(12), 1–6 (2009)
13. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of ACL-02, Conference on Empirical Methods in Natural Language Processing, vol. 10, pp. 79–86. Association for Computational Linguistics (2002)
14. Parikh, R., Movassate, M.: Sentiment analysis of user-generated twitter updates using various classification techniques. CS224N Final Report, vol. 118 (2009)
15. Zhang, Y., Pezeshki, M., Brakel, P., Zhang, S., Laurent, C., Bengio, Y., Courville, A.: Towards end-to-end speech recognition with deep convolutional neural networks. arXiv preprint arXiv:1701.02720 (2017)
16. Qian, Y., Woodland, P.C.: Very deep convolutional neural networks for robust speech recognition. In: IEEE Spoken Language Technology Workshop (SLT), San Diego, CA, USA, vol. 1, no. 16, pp. 481–488 (2016)
17. Purwins, H., Li, B., Virtanen, T., Schlüter, J., Chang, S.Y., Sainath, T.: Deep learning for audio signal processing. IEEE J. Sel. Top. Signal Process. 13(2), 206–219 (2019)
18. Ren, S., He, K., Girshick, R., Zhang, X., Sun, J.: Object detection networks on convolutional feature maps. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1476–1481 (2016)
19. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition, pp. 1–14. arXiv preprint arXiv:1409.1556 (2014)
20. Shang, R., He, J., Wang, J., Xu, K., Jiao, L., Stolkin, R.: Dense connection and depthwise separable convolution based CNN for polarimetric SAR image classification. Knowl. Based Syst. 105542 (2020)
21. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. In: Advances in Neural Information Processing Systems, pp. 649–657 (2015)
22. Kim, Y.: Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014)
23. Severyn, A., Moschitti, A.: UNITN: training deep convolutional neural network for twitter sentiment classification. In: Proceedings of the 9th International Workshop on Semantic Evaluation, pp. 464–469 (2015). https://doi.org/10.18653/v1/S15-2079
24. Kim, H., Jeong, Y.S.: Sentiment classification using convolutional neural networks. Appl. Sci. 9(11), 2347 (2019)
25. Johnson, R., Zhang, T.: Semi-supervised convolutional neural networks for text categorization via region embedding. In: Advances in Neural Information Processing Systems, pp. 919–927 (2015)
26. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
27. Tharwat, A.: Classification assessment methods. Appl. Comput. Inform. (2018). https://doi.org/10.1016/j.aci.2018.08.003
Deep Neural Networks for Landmines Images Classification Refaat M. Fikry and H. Kasban(&) Nuclear Research Center, Atomic Energy Authority, P.O. 13759, Inshas, Cairo, Egypt [email protected], [email protected]
Abstract. This paper presents an efficient solution for automatic classification between Anti-Tank (AT) landmine signatures and standard hyperbolic signatures obtained from other objects, including Anti-Personnel (AP) landmines, based on pretrained deep Convolutional Neural Networks (CNNs). Specifically, two deep learning techniques have been tested and compared with a previously published landmine classification method. The first technique is based on a VGG-16 pretrained network for both feature extraction and classification of landmine images, while the second uses a ResNet-18 pretrained network for feature extraction and a Support Vector Machine (SVM) for classification. The proposed algorithms have been tested using a dataset of landmine images taken by the Laser Doppler Vibrometer based Acoustic-to-Seismic (LDV-A/S) landmine detection system. The results show that the deep learning-based techniques give higher classification accuracy than the published landmine classification method, and that the ResNet-18 pretrained network with SVM classification gives better average accuracy than the VGG-16 pretrained network-based classification.

Keywords: Landmines · LDV-A/S · CNN · Deep neural networks
1 Introduction

There is no doubt that landmines are among the main problems affecting the entire world, and Egypt is one of the countries affected. Over 20 million landmines are subject to explosion at any time, and they inhabit a large region estimated at 3800 km2. Landmines are victim-operated explosive devices positioned on or near the ground, detonated when a person or animal triggers their firing mechanism. Some landmines contain only explosives, while others contain shrapnel pieces. They are categorized into AP landmines, made for killing humans, and AT landmines, created for destroying vehicles [1]. AT landmines are heavier than AP landmines, carry more explosives, and need extra pressure or weight to detonate [2]. The main obstacles that face landmine clearance
are: the absence of mine maps; changes in mine locations due to physical and climatic conditions; the different types of AP and AT landmines; and the high removal costs, while the production cost of a landmine is lower than its detection and removal cost [3]. Many algorithms and techniques have been developed for landmine detection and removal, each with its own pros and cons [1, 4, 5]. One of the accurate and efficient landmine detection systems is the LDV-A/S system. The data collected from this system include mixtures of mines, blank spots, and clutter [4]. LDV-A/S data interpretation is performed manually and off-line, depending on the experience and skills of the trained operator. This process is time- and effort-consuming, and its dependence on human skill leads to inconsistencies and errors in the interpretation process. There are many algorithms used in landmine detection based on the LDV detection system [5–10]; other techniques are based on the GPR system [11–13], whereas [14–16] classify between AP and AT mines based on the LDV-A/S system using MFCC algorithms as a feature extractor and ANN and SVM as classifiers with different techniques. Also, [17] succeeds in classifying AT and AP mines based on GPR using a CNN. The classification of hidden landmine objects consists of two phases, performing both the training on the input image patterns and the evaluation of the testing image sets: a feature extraction phase and a classification phase, as in Fig. 1. The feature extraction stage converts the input training images into a series of feature vectors containing the information that discriminates the main image features. The output of this stage is the feature matrix, which combines the feature vector coefficients [15, 16]. This feature matrix is used as a reference for two processes during classification: for model training, the classifier employs a collection of feature matrices from various training images to build some type of model; after that, the model is tested using a separate test image set to validate its performance and attain the best design that enables classification to take place. Recently, deep learning algorithms have been commonly used for difficult feature extraction in conventional RGB object classification, medical imaging, and other image fields. Conventional deep learning algorithms, such as convolutional neural networks, can solve object classification and recognition problems and achieve good classification accuracy, but they rely heavily on large amounts of data and long training times [18–24]. For the classification and recognition of objects, preserving the validity of a deep learning algorithm with fewer samples is therefore of great importance; such problems were significantly alleviated when the transfer learning algorithm came into the vision of researchers [25–27]. In the case of landmine recognition, the manual process also poses considerable danger to the people involved, which motivates automated approaches, given the very good results obtained by deep learning in several recognition and classification tasks [28–30]. In this paper, two deep learning techniques are chosen for landmine classification based on CNNs. In the first, a VGG-16 pretrained network is chosen as both feature extractor and classifier; in the second, a ResNet-18 pretrained network is chosen as feature extractor and an SVM is used as classifier.
Fig. 1. Landmine classification system.
Our experimental campaign, run on actual LDV-A/S data obtained from a test site, highlights the key benefits of the proposed methods with respect to other state-of-the-art solutions. More specifically: (i) since our proposed approach does not rely on any theoretical modeling, it is less vulnerable to mistakes due to simplified interpretations or model simplifications (e.g., linearization); (ii) the proposed method is able to detect images with small patches; and (iii) the possibility of also embedding real acquisitions in the training step enables improving system performance up to 100% accuracy. This paper is organized as follows. In Sect. 2, a brief overview of the LDV-A/S system is presented. Section 3 presents the traditional landmine classification approach. Section 4 shows the two proposed deep learning techniques used for classification. In Sect. 5, the obtained results are shown, including the different performance measures used to evaluate classification accuracy. Finally, some concluding remarks are given in Sect. 6.
2 LDV-A/S System

Figure 2 shows the block diagram of the main components used in the LDV-A/S technique for detecting buried objects. Examples of landmine images obtained with this technique are shown in Fig. 3. The LDV emits laser beams onto a vibrating surface of the land area under test [1]. The surface vibrations cause Doppler frequency shifting of the reflected laser light. A photodetector detects the backward light returning into the LDV from the buried landmine object along the opposite path; this light is modulated and provides details about the surface velocity along the direction of the laser beam. The voltage of the output signal is directly proportional to the instantaneous vibration velocity at the surface point. A personal computer (PC) monitor shows a 2-D image of the ground surface scanned by the XY mirrors. A measuring grid is defined before scanning and superimposed on the image of the ground surface. Due to their high sensitivity in detecting AP and AT mines, excellent spatial resolution, and long working distances, LDVs are especially suitable for this measuring application.
Fig. 2. Simple diagram of the LDV-A/S system
Fig. 3. Samples of landmine images
3 Conventional Landmine Classification Technique

The current conventional landmine classification, based on geometric details, has many drawbacks. Object identification from the landmine image is accomplished by thresholding the image with a certain threshold to exclude the dark background. This mask removes the background and leaves only details about the important objects. Then an area thresholding step based on the areas of the objects is carried out to remove unwanted area clutters. An image preprocessing algorithm, such as morphological operations, is performed prior to thresholding [1]. Figure 4 shows a block diagram of a conventional landmine detection method [18]. Intensity thresholding may not remove all of the unwanted noise and clutter in the images; the problem of noise effects is a matter seldom investigated by researchers in this field.
Fig. 4. Steps of the conventional landmine classification technique
4 Proposed Landmine Classification Approach

4.1 VGG-16 Pretrained Network Implementation
Fig. 5. VGG-16 model architecture
This technique uses transfer learning to retrain a CNN to classify a new set of images. The VGG-16 pretrained image classification network has been trained on a large number of images and can classify images into up to 1000 object categories, so this type of CNN provides rich feature representations for a wide range of images. The very deep convolutional network for large-scale image recognition (VGG-16) architecture was introduced by Simonyan and Zisserman [31] and consists of a 16-layer network of convolutional layers, as shown in Fig. 5. Since small datasets in deep learning contribute to over-fitting and poor local solutions, transfer learning solves this problem to a certain extent; however, it is often difficult to solve the under-adaptation problem of transfer learning. Figure 6 shows the steps of transfer learning from a pretrained network.
Fig. 6. Transfer learning workflow
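The paper implements this workflow in MATLAB (Sect. 5); for illustration, an equivalent transfer-learning setup in Keras might look like the sketch below, where the replacement head and the training settings are our assumptions:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Load VGG-16 pretrained on ImageNet, dropping its 1000-class head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                        # freeze the pretrained features

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(2, activation="softmax"),    # AT vs. AP landmine
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```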
4.2 ResNet-18 Deep Featurizer and SVM Classifier
In this technique, the learned image features are extracted from a pretrained CNN, and these features are used to train the classifier. Feature extraction is the fastest and simplest way to utilize the representational power of pretrained deep networks. An SVM is used as the classifier in this paper. The ResNet-18 network builds a hierarchical representation of the input images, with the deeper layers encoding higher-level features. The global pooling layer pools the input features over all spatial locations, yielding a total of 512 features. Figure 7 shows the proposed workflow of this technique.
Fig. 7. Feature extraction workflow from pretrained network.
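Again, the authors work in MATLAB; a comparable Python sketch using torchvision's ResNet-18 (torchvision ≥ 0.13 weights API assumed) and scikit-learn's SVM is:

```python
import torch
import torchvision.models as tvm
from sklearn.svm import SVC

# ResNet-18 pretrained on ImageNet; dropping the final FC layer exposes the
# global-average-pooled output (512 features per image).
resnet = tvm.resnet18(weights=tvm.ResNet18_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

def extract_features(batch):                  # batch: (N, 3, 224, 224) tensor
    with torch.no_grad():
        return extractor(batch).flatten(1).numpy()   # -> (N, 512)

# train_x/test_x and train_y/test_y are placeholders for the preprocessed
# landmine image tensors and their AT/AP labels.
svm = SVC(kernel="linear")
svm.fit(extract_features(train_x), train_y)
print(svm.score(extract_features(test_x), test_y))
```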
5 Results and Discussion

In this section, the accuracy and performance of the proposed deep learning techniques based on transfer learning from pretrained networks are demonstrated. The algorithms are implemented in the MATLAB R2019a environment using the MATLAB Deep Learning Toolbox, and are run on a laptop with the following technical parameters: an Intel Core i3-3120M CPU @2.50 GHz (2 cores, 4 logical processors), 6 GB of RAM, and a 64-bit system type.
5.1 Image Dataset
The proposed automated classification techniques are applied in this section to 100 images of different types of AT and AP landmines buried at varying depths and scanned by the LDV-A/S device. The collected dataset contains 100 images: 50 AP images and 50 AT images. A summary of the image data is shown in Table 1. For both techniques, 70% of the collected images of the two landmine types are chosen for training and 30% for testing and validating the accuracy of each technique. To test the robustness of both proposed deep learning techniques, three types of noise with different degrees are added to the test dataset images and the accuracy is measured, according to the following equations. For Gaussian noise:

P_G(z) = (1 / (σ√(2π))) · e^{-(z-µ)² / (2σ²)}    (1)
where z is the gray level, µ is the mean value, and σ is the standard deviation; in our case µ = 0 and σ = 10^(x/10), x = 1:10. For speckle noise:

R_sn(i, j) = M_sn(i, j) · K_sn(i, j) + n_sn(i, j)    (2)
where M_sn is the original image, K_sn is the multiplicative component, R_sn is the observed image, and n_sn is the additive component of the speckle noise, with mean = 0 and variance = x/5000, x = 1:10. For salt-and-pepper noise, the mean is 0 and the variance = x/1000, x = 1:10. After the training process is run on the training set of both landmine types using the modified VGG-16 architecture, the accuracy and loss curves shown in Fig. 8 indicate a perfect 100% accuracy for the training dataset classification.
Fig. 8. Accuracy and loss values versus number of training epochs during the training process.
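The three noise models can be reproduced with numpy as below; the salt-and-pepper mapping from variance to flipped-pixel fraction follows MATLAB's imnoise density convention and is our assumption:

```python
import numpy as np

def add_gaussian(img, x):            # Eq. (1): mu = 0, sigma = 10**(x/10)
    return img + np.random.normal(0.0, 10 ** (x / 10), img.shape)

def add_speckle(img, x):             # Eq. (2): multiplicative, variance x/5000
    k = np.random.normal(1.0, np.sqrt(x / 5000), img.shape)
    return img * k

def add_salt_pepper(img, x):         # density x/1000 of pixels flipped
    out, p = img.copy(), x / 1000
    mask = np.random.rand(*img.shape)
    out[mask < p / 2] = img.min()      # pepper
    out[mask > 1 - p / 2] = img.max()  # salt
    return out
```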
Table 1. Image dataset

Type & Model  Buried depth (cm)  No. of images
AT VS 2.2     2.5                11
AT VS 2.2     5                  4
AT VS 2.2     10                 3
AT VS 1.6     2.5                3
AT VS 1.6     7.5                3
AT M 15       7.5                6
AT M 19       5                  8
AT EM 12      5                  6
AT TMA 4      15                 4
AT TMA 4      10                 2
AP VS 5.0     5                  25
AP VAL 69     2.5                3
AP VAL 69     5                  8
AP PMD 6      5                  7
Then the testing process is applied on the test dataset, chosen randomly as 30% of the total images of each type, which also shows 100% classification accuracy. Applying the different types of noise to the testing dataset to check the robustness of the techniques, Tables 2, 3, and 4 show the classification accuracy using the different classification methods for images distorted with AWGN, speckle, and salt-and-pepper noise. The results demonstrate an increase in the classification rate with rising SNR: the ResNet-18 pretrained network + SVM gives a classification rate of 98.67% at SNR = 0 dB, which increases up to 100% at 10 dB SNR. When the ResNet-18 pretrained network technique is applied to the training set of images and the extracted features feed the SVM classifier, 100% accuracy is verified for both training and testing classification between AP and AT landmines.
Table 2. Classification accuracy (%) using different classification methods for images distorted with AWGN noise

SNR (dB)  Traditional approach [18]  GoogleNet pretrained network  ResNet-18 pretrained network + SVM
0         90                         97.33                         98.67
5         94                         98.33                         99.33
10        96                         99.33                         100
15        98                         99.67                         100
20        98                         100                           100
Table 3. Classification accuracy (%) using different classification methods for images distorted with speckle noise

SNR (dB)  Traditional approach [18]  GoogleNet pretrained network  ResNet-18 pretrained network + SVM
0         98                         99.33                         100
5         98                         99.67                         100
10        98                         100                           100
15        98                         100                           100
20        98                         100                           100
Table 4. Classification accuracy (%) using different classification methods for images distorted with salt & pepper noise

SNR (dB)  Traditional approach [18]  GoogleNet pretrained network  ResNet-18 pretrained network + SVM
0         92                         97.33                         100
5         98                         98.67                         100
10        98                         99.33                         100
15        98                         99.67                         100
20        98                         100                           100
6 Conclusion

In this article, an effective solution is presented for automatic classification between AT mine signatures and the standard hyperbolic signatures of other items, such as AP mines, based on pretrained deep CNNs with the LDV-A/S system. Specifically, two deep learning techniques are chosen: in the first, a VGG-16 pretrained network is used as both feature extractor and classifier, while in the second, a ResNet-18 pretrained network is used as feature extractor and an SVM as classifier. Both techniques give very high accuracy (100%) on landmine images without any added noise, while with noise added, only the ResNet-18 pretrained network technique maintains 100% accuracy for both training and testing classification between AP and AT landmines.
References

1. Kasban, H.: Detection of buried objects using acoustic waves, M.Sc. thesis, Faculty of Electronic Engineering, Department of Electronics and Electrical Communications Engineering, Menoufia University (2008)
2. Paik, J., Lee, C., Abidi, M.: Image processing-based mine detection techniques: a review. Subsurf. Sens. Technol. Appl. 3, 153–202 (2002)
3. El-Qady, G., Al-Sayed, A.S., Sato, M., Elawadi, E., Ushijima, K.: Mine detection in Egypt: evaluation of new technology. International Atomic Energy Agency (IAEA), IAEA (2007)
4. Kasban, H., Zahran, O., El-Kordy, M., Elaraby, S., Abd El-Samie, F.: Automatic object detection from acoustic to seismic landmine images. Presented at the International Conference on Computer Engineering & Systems, Cairo, Egypt (2008)
5. Travassos, X.L., Avila, S.L., Ida, N.: Artificial neural networks and machine learning techniques applied to ground penetrating radar: a review. Appl. Comput. Inform. (2018)
6. Kasban, H., Zahran, O., Elaraby, S., El-Kordy, M., Abd El-Samie, F.: A comparative study of landmine detection techniques. Sens. Imaging Int. J. 11, 89–112 (2010)
7. Kasban, H., Zahran, O., El-Kordy, M., Elaraby, S., El-Rabaie, E.S., Abd El-Samie, F.: Efficient detection of landmines from acoustic images. Prog. Electromagnet. Res. C 6, 79–92 (2009)
8. Kasban, H., Zahran, O., El-Kordy, M., Elaraby, S., Abd El-Samie, F.: False alarm rate reduction in the interpretation of acoustic to seismic landmine data using mathematical morphology and the wavelet transform. Sens. Imaging 11(3), 113–130 (2010)
9. Makki, I.: Hyperspectral imaging for landmine detection, Ph.D. thesis, Electrical and Electronics Engineering Department, Lebanese University and Politecnico Di Torino (2017)
10. Kasban, H., Zahran, O., El-Kordy, M., Elaraby, S., El-Rabaie, E.S., Abd El-Samie, F.: Optimizing automatic object detection from images in laser doppler vibrometer based acoustic to seismic landmine detection system. In: National Radio Science Conference, NRSC, Proceedings (2009)
11. Lameri, S., Lombardi, F., Bestagini, P., Lualdi, M., Tubaro, S.: Landmine detection from GPR data using convolutional neural networks. Presented at the 25th European Signal Processing Conference (EUSIPCO), Kos, Greece (2017)
12. Bestagini, P., Lombardi, F., Lualdi, M., Picetti, F., Tubaro, S.: Landmine detection using autoencoders on multi-polarization GPR volumetric data. arXiv, vol. abs/1810.01316 (2018)
13. Silva, J.S., Guerra, I.F.L., Bioucas-Dias, J., Gasche, T.: Landmine detection using multispectral images. IEEE Sens. J. 19, 9341–9351 (2019)
14. Elshazly, E., Elaraby, S., Zahran, O., El-Kordy, M., Abd El-Samie, F.: Cepstral detection of buried landmines from acoustic images with a spiral scan. Presented at the ICENCO'2010 - 2010 International Computer Engineering Conference: Expanding Information Society Frontiers (2010)
15. Elshazly, E., Zahran, O., Elaraby, S., El-Kordy, M., Abd El-Samie, F.: Cepstral identification techniques of buried landmines from degraded images using ANNs and SVMs based on spiral scan. CIIT Int. J. Digit. Image Process. 5(12), 529–539 (2013)
16. Elshazly, E., Elaraby, S., Zahran, O., El-Kordy, M., El-Rabaie, E.S., Abd El-Samie, F.: Identification of buried landmines using Mel frequency cepstral coefficients and support vector machines (2012)
17. Almaimani, M.: Classifying GPR images using convolutional neural networks, M.Sc. thesis, Computer Science Department, University of Tennessee at Chattanooga, Chattanooga, Tennessee (2018)
18. Zhang, L., Liu, J., Zhang, B., Zhang, D., Zhu, C.: Deep cascade model-based face recognition: when deep-layered learning meets small data. IEEE Trans. Image Process. 29, 1016–1029 (2020)
19. Yayeh Munaye, Y., Lin, H.P., Adege, A.B., Tarekegn, G.B.: UAV positioning for throughput maximization using deep learning approaches. Sensors 19(12), 2775 (2019)
20. Vishal, V., Ramya, R., Srinivas, P.V., Samsingh, R.V.: A review of implementation of artificial intelligence systems for weld defect classification. Mater. Today Proc. 16, 579–583 (2019)
21. Verma, A., Singh, P., Alex, J.S.R.: Modified convolutional neural network architecture analysis for facial emotion recognition. In: 2019 International Conference on Systems, Signals and Image Processing (IWSSIP), pp. 169–173 (2019)
22. Treebupachatsakul, T., Poomrittigul, S.: Bacteria classification using image processing and deep learning. In: 2019 34th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC), pp. 1–3 (2019)
23. Talo, M., Baloglu, U.B., Yıldırım, Ö., Rajendra Acharya, U.: Application of deep transfer learning for automated brain abnormality classification using MR images. Cogn. Syst. Res. 54, 176–188 (2019)
24. Stephen, O., Maduh, U.J., Ibrokhimov, S., Hui, K.L., Al-Absi, A.A., Sain, M.: A multiple-loss dual-output convolutional neural network for fashion class classification. In: 2019 21st International Conference on Advanced Communication Technology (ICACT), pp. 408–412 (2019)
25. Deng, C., Xue, Y., Liu, X., Li, C., Tao, D.: Active transfer learning network: a unified deep joint spectral-spatial feature learning model for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 57, 1741–1754 (2019)
26. Côté-Allard, U., Fall, C.L., Drouin, A., Campeau-Lecours, A., Gosselin, C., Glette, K., et al.: Deep learning for electromyographic hand gesture signal classification using transfer learning. IEEE Trans. Neural Syst. Rehabil. Eng. 27, 760–771 (2019)
27. Ahn, E., Kumar, A., Feng, D., Fulham, M., Kim, J.: Unsupervised deep transfer feature learning for medical image classification. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pp. 1915–1918 (2019)
28. Harkat, H., Ruano, A.E., Ruano, M.G., Bennani, S.D.: GPR target detection using a neural network classifier designed by a multi-objective genetic algorithm. Appl. Soft Comput. 79, 310–325 (2019)
29. Giovanneschi, F., Mishra, K.V., Gonzalez-Huici, M.A., Eldar, Y.C., Ender, J.H.G.: Dictionary learning for adaptive GPR landmine classification. IEEE Trans. Geosci. Remote Sens. 57, 10036–10055 (2019)
30. Dumin, O., Plakhtii, V., Shyrokorad, D., Prishchenko, O., Pochanin, G.: UWB subsurface radiolocation for object location classification by artificial neural networks based on discrete tomography approach. In: 2019 IEEE 2nd Ukraine Conference on Electrical and Computer Engineering (UKRCON), pp. 182–187 (2019)
31. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014)
Deep Convolutional Neural Networks for ECG Heartbeat Classification Using Two-Stage Hierarchical Method Abdelrahman M. Shaker(&), Manal Tantawi, Howida A. Shedeed, and Mohamed F. Tolba Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt {Abdelrahman.shaker,manalmt,dr_howida,fahmytolba}@cis.asu.edu.eg
Abstract. Electrocardiogram (ECG) is widely used in computer-aided systems for arrhythmia detection because it provides essential information about heart functionality. Cardiologists use it to diagnose and detect abnormalities of the heart; hence, automating the process of ECG heartbeat classification plays a vital role in clinical diagnosis. In this paper, a two-stage hierarchical method is proposed using deep Convolutional Neural Networks (CNN) to determine the category of a heartbeat in the first stage and then classify the classes belonging to that category in the second stage. This work is based on 16 different classes from the public MIT-BIH arrhythmia dataset. The MIT-BIH dataset is unbalanced, which degrades the classification accuracy of deep learning models; this problem is solved by using an adaptive synthetic sampling technique to generate synthetic heartbeats and restore the balance of the dataset. In this study, an overall accuracy of 97.30% and an average accuracy of 91.32% are obtained, which surpasses several ECG classification methods.

Keywords: Heartbeat classification · Arrhythmias · Convolutional Neural Networks (CNN)
1 Introduction

Cardiovascular Disease (CVD) is one of the most serious health problems and the world's leading cause of death. CVD involves a large number of health conditions, including heart and blood vessel disease, heart attack, stroke, heart failure, and arrhythmia. Every year around 17.9 million people die from CVD, which is 31% of all deaths worldwide [1]. Arrhythmia refers to an abnormal heartbeat; it may be too slow, too fast, or irregular. An abnormal heartbeat can affect the normal work of the heart, such as pumping inadequate blood to the body [2]. ECG is a common tool used for diagnosing cardiac arrhythmias by measuring the electrical activity of the heart over time. An ECG record consists of consecutive heartbeats; each heartbeat is composed of three complex waves: the P, QRS, and T waves. The cardiac activities of the heart can be measured by these complex waves, and their study is vital in arrhythmia diagnosis [3].
The category of an arrhythmia can be identified by determining the classes of some consecutive heartbeats [4]; therefore, there is a need to determine the class of each heartbeat. The process of analyzing an enormous number of ECG records is very complex and requires much time from cardiologists. Hence, automating this process is crucial for discovering different cardiac disorders. There are two approaches to automatic heartbeat classification. The first is to extract features using hand-crafted methods and feed the extracted features to classification algorithms such as Support Vector Machines (SVM) [5, 6], Feedforward Neural Networks (FNN) [7], Probabilistic Neural Networks (PNN) [8], and General Regression Neural Networks (GRNN) [9]. The second is to use deep neural networks, whose structure combines the feature extraction and classification stages into a single learning method without the need for hand-engineered features; examples include Convolutional Neural Networks (CNN) [10, 11], Deep Convolutional Neural Networks (DCNN) [12], Long Short-Term Memory (LSTM) [13], and combinations of CNN and LSTM [14]. Deep learning has advanced rapidly in the last several years, and its techniques have shown remarkable success in various fields, including computer vision [15], bioinformatics [16], and medical diagnosis [17]. In this paper, a two-stage hierarchical approach is proposed that classifies 16 classes of the public MIT-BIH arrhythmia dataset into one of the five main categories in the first stage and then determines the class belonging to that category in the second stage, using Convolutional Neural Networks (CNN), with superior performance to other existing studies. The rest of the paper is structured as follows: related work is reviewed in Sect. 2, the proposed architecture and methodology are described in Sect. 3, the experimental results are presented in Sect. 4, and the conclusion and future work are provided in Sect. 5.
2 Related Work

In the literature, many researchers utilized the first approach to ECG heartbeat classification by using various feature extraction methods, including Principal Component Analysis (PCA), Discrete Wavelet Transform (DWT), Independent Component Analysis (ICA), and Higher Order Spectra (HOS). In the classification stage, several classification algorithms have been utilized, including PNN, GRNN, FNN, and SVM. R.J. Martis et al. [18] used DWT and PCA to extract the features and an SVM to classify five different classes from the MIT-BIH arrhythmia dataset, obtaining an overall accuracy of 98.11%. Yazdanian et al. [19] considered the same five classes and achieved an overall accuracy of 96.67% using a combination of wavelet transform features with morphological and time-domain features, fed into an SVM in the classification stage. S. Sahoo et al. [20] classified four different classes from the MIT-BIH arrhythmia dataset using an SVM with DWT in the feature extraction stage, achieving an overall accuracy of 98.39%. On the other hand, S.N. Yu [21] used ICA to
extract the features and classified eight different classes using neural networks with an overall accuracy of 98.71%. In [22], the authors proposed a feature combination of ICA and DWT with a PNN for classification among five classes from the MIT-BIH dataset, and an overall accuracy of 99.28% was obtained. In [23], the feature set was composed of a combination of linear and non-linear features, and an SVM was used to classify five different classes, obtaining an overall accuracy of 98.91%. El-Saadawy et al. [24] considered 15 classes of the MIT-BIH arrhythmia dataset and proposed a hierarchical method based on two stages: DWT and PCA were used to extract the morphological features, the extracted features were concatenated with four RR features, and an SVM was used in the classification stage, obtaining an overall accuracy of 94.94%. The approaches of deep learning have the capability of learning the most relevant features automatically from the data. Hence, the traditional steps required in the first approach, namely feature extraction, feature reduction, and classification, can be combined in one learning method, which is called end-to-end learning. Recently, several studies applied deep learning methods to ECG classification. Zhang [10] proposed a 6-layer CNN model comprising two convolutional layers, two downsampling layers, and two fully connected layers; they considered five classes of the MIT-BIH dataset and obtained an overall accuracy of 97.50%. In [11], the authors considered 14 classes and proposed a 1D-CNN consisting of 10 layers, obtaining an overall accuracy of 97.8%. Acharya et al. [25] classified five different categories of the MIT-BIH arrhythmia dataset using a 9-layer CNN model. To overcome the imbalance problem in the MIT-BIH dataset, they calculated the Z-score of the ECG heartbeats and generated synthetic ones by varying the standard deviation and the mean. They achieved an overall accuracy of 94.03% using the synthetic data, and an overall accuracy of 89.07% when the model was trained only on the original data. A.M. Shaker et al. [12] provided a generalization method of deep CNNs for classifying ECG heartbeats into 15 different types of arrhythmias from the MIT-BIH dataset. They solved the imbalance problem by generating synthetic heartbeats using Generative Adversarial Networks (GANs); after the dataset had been balanced using the GAN, they obtained an overall accuracy above 98.0%, precision above 90.0%, specificity above 97.4%, and sensitivity above 97.7%. In this study, the imbalance problem of the MIT-BIH dataset is solved by using the Adaptive Synthetic (ADASYN) sampling technique [26], which generates synthetic heartbeats based on the density distribution of the data. Also, we propose a two-stage hierarchical approach to overcome the hand-engineered feature extraction methods in the literature. The proposed approach classifies 16 different classes from the MIT-BIH arrhythmia dataset using data only from lead 1.
3 Methodology

We discuss in this section the proposed techniques for preprocessing and classification. A detailed description of each technique will be presented in the following subsections.

3.1 Preprocessing Stage
The first step of this stage is to increase the signal-to-noise ratio by enhancing the quality of the signal. The noise of each ECG record is reduced by removing the undesirable low and high frequencies from the signal using a Butterworth filter with a pass band of [0.5–40] Hz. Each ECG record of the MIT-BIH dataset is then divided dynamically into multiple heartbeats using the positions of the R peaks; each heartbeat should contain the P, QRS, and T waves. Detecting the beginning and the end of each heartbeat using a fixed segmentation method is not always reliable because such an assumption does not consider heart rate variations. Hence, a dynamic heartbeat segmentation method is utilized to overcome the heart rate variability, as proposed in [24]. The dynamic segmentation strategy measures the number of samples before and after each R peak based on the duration between the current and previous R peaks (RR previous) as well as the duration between the current and next R peaks (RR next). Thereafter, the number of samples of the largest interval is divided into a part before the R peak and another part after the R peak. This method ensures that each heartbeat contains the three main waves in a way that is invariant to the variability of the heart rate. Finally, each heartbeat is resized to contain 300 samples and the amplitude is normalized to [0–1]. Figure 1 shows the results of the preprocessing stage.
Fig. 1. Segmented and filtered heartbeats after applying the proposed preprocessing: (a) normal heartbeat; (b) premature ventricular contraction heartbeat.
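A hedged scipy sketch of this preprocessing (the filter order, helper names, and the exact windowing rule are our reading of the description, not the authors' code):

```python
import numpy as np
from scipy.signal import butter, filtfilt, resample

def bandpass(ecg, fs=360, low=0.5, high=40.0, order=4):
    """Butterworth band-pass: removes baseline wander and high-frequency noise."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, ecg)

def segment(ecg, r_peaks):
    """Dynamic segmentation: the window around each R peak follows the local
    RR intervals, so every beat keeps its P, QRS, and T waves."""
    beats = []
    for i in range(1, len(r_peaks) - 1):
        rr_prev = r_peaks[i] - r_peaks[i - 1]
        rr_next = r_peaks[i + 1] - r_peaks[i]
        half = max(rr_prev, rr_next) // 2        # split the larger interval
        beat = ecg[r_peaks[i] - half : r_peaks[i] + half]
        beat = resample(beat, 300)               # fixed length of 300 samples
        beat = (beat - beat.min()) / (beat.max() - beat.min())  # scale to [0, 1]
        beats.append(beat)
    return np.array(beats)
```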
3.2 The Proposed Method for Classification Stage
Based on the ANSI/AAMI EC57:1998 standard, the 16 classes of the MIT-BIH dataset are mapped into five categories, as shown in Table 1. The proposed method, as shown in Fig. 2, classifies the heartbeats into one of the five main categories in the first stage and recognizes the class that falls in that category in the second stage.

Table 1. The five main categories and MIT-BIH classes mapping.

Category  MIT-BIH Classes
N         NOR, LBBB, RBBB, AE, NE
S         APC, AP, BAP, NP
V         PVC, VE, VF
Q         FPN, PACE, UN
F         VFN
Fig. 2. The proposed architecture of the two-stage hierarchical method.
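The two-stage decision itself is a short routine; the sketch below uses illustrative model objects and category order (not the authors' code):

```python
CATEGORIES = ["N", "S", "V", "Q", "F"]

def classify(beat, stage1, stage2):
    """stage1: the 5-category CNN; stage2: dict of per-category class CNNs."""
    cat = CATEGORIES[stage1.predict(beat[None, :, None]).argmax()]
    if cat == "F":                   # F holds a single class (VFN), no stage 2
        return cat, "VFN"
    class_idx = stage2[cat].predict(beat[None, :, None]).argmax()
    return cat, class_idx            # index of the class within the category
```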
There is no need for a classification network in stage 2 for the F category because it has only one class. Only the heartbeats that have been correctly classified in stage 1 are passed to the second stage. The proposed CNN model is inspired by the VGG network [27], with some modifications, because the VGG network is very deep and dedicated to large-scale images rather than 1D signals. The first two layers of the proposed network are 1D convolutional layers with 64 filters and a kernel size of three, followed by one max-pooling
layer with a pool size of two, followed by another two 1D convolutional layers with 128 filters and a kernel size of five, followed by a max-pooling layer with a pool size of two, followed by three 1D convolutional layers with 256 filters and a kernel size of five. After that, two fully connected layers are added with 128 and 64 neurons, respectively. Finally, the output layer contains N neurons, where N is the number of classes of each category. The proposed model is shown in Fig. 3.
Fig. 3. The proposed CNN model for each category.
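A Keras sketch matching the layer description above (the padding and compile settings are our assumptions):

```python
from tensorflow.keras import layers, models

def build_category_cnn(n_classes, beat_len=300):
    model = models.Sequential([
        layers.Input(shape=(beat_len, 1)),
        layers.Conv1D(64, 3, activation="relu", padding="same"),
        layers.Conv1D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 5, activation="relu", padding="same"),
        layers.Conv1D(128, 5, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.Conv1D(256, 5, activation="relu", padding="same"),
        layers.Conv1D(256, 5, activation="relu", padding="same"),
        layers.Conv1D(256, 5, activation="relu", padding="same"),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),  # N classes per category
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

stage1 = build_category_cnn(5)  # N, S, V, Q, F; stage-2 models use 5/4/3/3 classes
```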
4 Experimental Results

In this section, we describe the utilized dataset and demonstrate how the data is divided into training and testing sets. Also, the achieved results and a comparison with the existing studies are provided.

4.1 Dataset Description
The MIT-BIH arrhythmia dataset [28] is the most utilized dataset in the literature. It contains 48 ECG records from subjects of different ages and genders; each record is 30 minutes long with a sampling frequency of 360 Hz. Each record is accompanied by the beats' annotations
and the locations of the R peaks, which are used as the ground truth for the training and testing stages. In this study, ECG records only from lead 1 are considered. The beats of the records are divided into training and testing sets; the data division in [24] is followed for the sake of comparison. The ratio of training to testing is not the same for all the classes because the heartbeats are not distributed equally among the classes. The training set of the normal class, which is the dominant class in the dataset, consists of 13% of the total normal heartbeats, whereas a training percentage of 40% is considered for some classes that have a lower number of beats. On the other hand, a training percentage of 50% is considered for the classes that have a very limited number of heartbeats. The division of the heartbeats is shown in Table 2.
Table 2. Training ratio for each class utilized in this study.
Heartbeat type | Number of total beats | Training ratio | Number of training beats
Normal beat (N) | 75017 | 13% | 9753
Left Bundle Branch block (LBBB) | 8072 | 40% | 3229
Right Bundle Branch block (RBBB) | 7255 | 40% | 2902
Atrial Premature Contraction (APC) | 2546 | 40% | 1019
Premature Ventricular Contraction (PVC) | 7129 | 40% | 2852
Paced (PACE) | 7025 | 40% | 2810
Aberrated Atrial Premature (AP) | 150 | 50% | 75
Ventricular Flutter Wave (VF) | 472 | 50% | 236
Fusion of Ventricular and Normal (VFN) | 802 | 50% | 401
Blocked Atrial Premature (BAP) | 193 | 50% | 97
Nodal (junctional) Escape (NE) | 229 | 50% | 115
Fusion of Paced and Normal (FPN) | 982 | 50% | 491
Ventricular Escape (VE) | 106 | 50% | 53
Nodal (junctional) Premature (NP) | 83 | 50% | 42
Atrial Escape (AE) | 16 | 50% | 8
Unclassifiable (UN) | 15 | 50% | 7
16 Classes | 110092 | 21.88% | 24090
4.2 Results
During the preprocessing stage, the records of the MIT-BIH dataset are segmented into separate heartbeats. A training set of 24090 heartbeats is selected randomly based on the data division in Table 2, and the remaining 86002 heartbeats are used as the testing set. The Adam optimizer [29] is utilized to train the proposed network, and the weights of the network are initialized with a standard normal distribution. In this study, data from lead 1 only of the MIT-BIH arrhythmia dataset is utilized and 16 classes are considered. The data augmentation using ADASYN is done across
the two stages: in the first stage, the number of samples for the classes of categories S, V, F, and Q is increased to match the number of heartbeats in category N. In the second stage, the number of heartbeats for each category is balanced separately based on the dominant class in each category. The performance is evaluated by measuring the average accuracy for each class and the overall accuracy across the two stages. The achieved overall accuracy in the first stage is 98.2%, whereas the overall accuracy across the two stages is 97.3%. The achieved average accuracy for each class per category is shown in Table 3.

Table 3. Average accuracy for each class per category.
Class | Average accuracy
Normal beat (N) | 99.49%
Left Bundle Branch block (LBBB) | 99.79%
Right Bundle Branch block (RBBB) | 99.77%
Atrial Premature Contraction (APC) | 99.71%
Premature Ventricular Contraction (PVC) | 99.52%
Paced (PACE) | 99.90%
Aberrated Atrial Premature (AP) | 85.71%
Ventricular Flutter Wave (VF) | 95.57%
Fusion of Ventricular and Normal (VFN) | 78.30%
Blocked Atrial Premature (BAP) | 97.62%
Nodal (junctional) Escape (NE) | 80.00%
Fusion of Paced and Normal (FPN) | 99.57%
Ventricular Escape (VE) | 96.00%
Nodal (junctional) Premature (NP) | 97.22%
Atrial Escape (AE) | 100.00%
Unclassifiable (UN) | 33.33%
16 Classes | 91.32%
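The balancing step itself can be sketched with the imbalanced-learn implementation of ADASYN [26]; the toy arrays below are illustrative only and not the paper's actual data pipeline.

```python
# Hedged sketch of the ADASYN balancing step with imbalanced-learn.
import numpy as np
from imblearn.over_sampling import ADASYN

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 8)),        # majority-class beats (toy data)
               rng.normal(2, 1, (20, 8))])        # minority-class beats (toy data)
y = np.array([0] * 200 + [1] * 20)

# Synthesize minority samples until the classes are roughly balanced
X_res, y_res = ADASYN(random_state=0).fit_resample(X, y)
print(np.bincount(y_res))                          # approx. [200, 200]
```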
The comparison between the presented work and other recent existing studies is given in Table 4. It demonstrates that a larger number of classes is considered while the overall accuracy is improved compared to the published results.
Table 4. Comparison of this work with other studies.
Study | # of classes | Feature set | Classifier | Overall accuracy
Martis et al. [18] | 5 | PCA | SVM | 98.11%
Yazdanian et al. [19] | 5 | Wavelet | SVM | 96.67%
Sahoo et al. [20] | 4 | DWT | SVM | 98.39%
Yu et al. [21] | 8 | ICA | NN | 98.71%
Zhang et al. [10] | 5 | End-to-end | 1D-CNN | 97.50%
Acharya [25] | 5 | End-to-end | 1D-CNN | 94.03%
El-Saadawy [24] | 15 | DWT | SVM | 94.94%
Proposed method | 16 | Two-stage hierarchical method | 1D-CNN | 97.30%
5 Conclusion and Future Work

In this paper, a two-stage hierarchical method has been proposed to classify 16 classes of the public MIT-BIH arrhythmia dataset. A dynamic heartbeat segmentation method is used in the preprocessing stage to overcome the variability of the heart rate. The imbalance problem of the MIT-BIH dataset is addressed by using an oversampling technique (ADASYN) to restore the balance of the dataset. An overall accuracy of 97.30% across the two stages and an average accuracy of 91.32% are achieved, which surpasses other existing studies while considering more classes (16 classes). Further research will be done to utilize the ECG records of the two leads. Also, we aim to deploy the proposed model in real-time monitoring systems.
References

1. World Health Organization: Cardiovascular diseases (CVDs) (2017). http://www.who.int/mediacentre/factsheets/fs317/en/
2. American Heart Association: Arrhythmia (2017). https://www.heart.org/en/health-topics/consumer-healthcare/what-is-cardiovascular-disease
3. Artis, S.G., Mark, R.G., Moody, G.B.: Detection of atrial fibrillation using artificial neural networks. In: Proceedings of the Computers in Cardiology, Venice, Italy, 23–26 September 1991, pp. 173–176. IEEE, Piscataway (1991)
4. Kastor, J.A.: Arrhythmias, 2nd edn. W.B. Saunders, London (1994)
5. Moody, G.B., Mark, R.G.: The impact of the MIT-BIH arrhythmia database. IEEE Eng. Med. Biol. Mag. 20(3), 45–50 (2001)
6. El-Saadawy, H., Tantawi, M., Shedeed, H.A., Tolba, M.F.: Electrocardiogram (ECG) classification based on dynamic beats segmentation. In: Proceedings of the 10th International Conference on Informatics and Systems - INFOS'16 (2016). https://doi.org/10.1145/2908446.2908452
7. Perez, R.R., Marques, A., Mohammadi, F.: The application of supervised learning through feed-forward neural networks for ECG signal classification. In: Proceedings of the IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Vancouver, BC, Canada, 15–18 May 2016, pp. 1–4. IEEE, Piscataway (2016)
8. Zebardast, B., Ghaffari, A., Masdari, M.: A new generalized regression artificial neural networks approach for diagnosing heart disease. Int. J. Innov. Appl. Stud. 4, 679 (2013)
9. Alqudah, A.M., Albadarneh, A., Abu-Qasmieh, I., Alquran, H.: Developing of robust and high accurate ECG beat classification by combining gaussian mixtures and wavelets features. Australas. Phys. Eng. Sci. Med. 42(1), 149–157 (2019)
10. Li, D., Zhang, J., Zhang, Q., Wei, X.: Classification of ECG signals based on 1D convolution neural network. In: 2017 IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom) (2017). https://doi.org/10.1109/healthcom.2017.8210784
11. Shaker, A.M., Tantawi, M., Shedeed, H.A., Tolba, M.F.: Heartbeat classification using 1D convolutional neural networks. In: Hassanien, A., Shaalan, K., Tolba, M. (eds.) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2019. AISI 2019. Advances in Intelligent Systems and Computing, vol. 1058. Springer, Cham (2020)
12. Shaker, A.M., Tantawi, M., Shedeed, H.A., Tolba, M.F.: Generalization of convolutional neural networks for ECG classification using generative adversarial networks. IEEE Access 8, 35592–35605 (2020)
13. Yildirim, Ö.: A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification. Comput. Biol. Med. 96, 189–202 (2018). https://doi.org/10.1016/j.compbiomed.2018.03.016
14. Shaker, A.M., Tantawi, M., Shedeed, H.A., Tolba, M.F.: Combination of convolutional and recurrent neural networks for heartbeat classification. In: Hassanien, A.E., Azar, A., Gaber, T., Oliva, D., Tolba, F. (eds.) Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2020). AICV 2020. Advances in Intelligent Systems and Computing, vol. 1153. Springer, Cham (2020)
15. Voulodimos, A., Doulamis, N., Doulamis, A., Protopapadakis, E.: Deep learning for computer vision: a brief review. Comput. Intell. Neurosci. 2018, 1–13 (2018). https://doi.org/10.1155/2018/7068349
16. Min, S., Lee, B., Yoon, S.: Deep learning in bioinformatics. Brief. Bioinf. 18, 851–869 (2017)
17. Bakator, M., Radosav, D.: Deep learning and medical diagnosis: a review of literature. Multimodal Technol. Interact. 2, 47 (2018). https://doi.org/10.3390/mti2030047
18. Martis, R.J., Acharya, U.R., Mandana, K., Ray, A.K., Chakraborty, C.: Application of principal component analysis to ECG signals for automated diagnosis of cardiac health. Expert Syst. Appl. 39, 11792–11800 (2012)
19. Yazdanian, H., Nomani, A., Yazdchi, M.R.: Autonomous detection of heartbeats and categorizing them by using support vector machines. IEEE (2013)
20. Sahoo, S., Kanungo, B., Behera, S., Sabut, S.: Multiresolution wavelet transform based feature extraction and ECG classification to detect cardiac abnormalities. Measurement 108, 55–66 (2017)
21. Yu, S.N., Chou, K.T.: Integration of independent component analysis and neural networks for ECG beat classification. Expert Syst. Appl. 34, 2841–2846 (2008)
22. Martis, R.J., Acharya, U.R., Min, L.C.: ECG beat classification using PCA, LDA, ICA and discrete wavelet transform. Biomed. Sign. Process. Contr. 8, 437–448 (2013)
23. Elhaj, F.A., Salim, N., Harris, A.R., Swee, T.T., Ahmed, T.: Arrhythmia recognition and classification using combined linear and nonlinear features of ECG signals. Comput. Meth. Progr. Biomed. 127, 52–63 (2016)
24. El-Saadawy, H., Tantawi, M., Shedeed, H.A., Tolba, M.F.: Hybrid hierarchical method for electrocardiogram heartbeat classification. IET Sig. Process. 12(4), 506–513 (2018). https://doi.org/10.1049/iet-spr.2017.0108
25. Acharya, U.R., Oh, S.L., Hagiwara, Y., Tan, J.H., Adam, M., Gertych, A., San, T.R.: A deep convolutional neural network model to classify heartbeats. Comput. Biol. Med. (2017). https://doi.org/10.1016/j.compbiomed.2017.08.022
26. He, H., Bai, Y., Garcia, E.A., Li, S.: ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pp. 1322–1328 (2008)
27. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
28. MIT-BIH Arrhythmias Database. http://www.physionet.org/physiobank/database/mitdb/. Accessed 3 Apr 2020
29. Kingma, D.P., Jimmy, B.: Adam: a method for stochastic optimization. CoRR, abs/1412.6980 (2014)
Study of Region Convolutional Neural Network Deep Learning for Fire Accident Detection

Ntawiheba Jean d'Amour1,2, Kuo-Chi Chang1,2,6(&), Pei-Qiang Li1, Yu-Wen Zhou1,2, Hsiao-Chuan Wang3, Yuh-Chung Lin1,2, Kai-Chun Chu4, and Tsui-Lien Hsu5

1 School of Information Science and Engineering, Fuzhou University, Fujian University of Technology, No. 33 Xuefu South Road, New District, Fuzhou 350118, Fujian, China
[email protected]
2 Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China
3 Institute of Environmental Engineering, National Taiwan University, Taipei, Taiwan
4 Department of Business Management, Fujian University of Technology, Fuzhou, China
5 Institute of Construction Engineering and Management, National Central University, Taoyuan, Taiwan
6 College of Mechanical & Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan
Abstract. Fire accidents are disasters that take human lives and destroy infrastructure, owing to their violence or to delays in rescue. Object detection has been a popular topic in recent years; it can have a robust impact on detecting fire and provide information about this kind of disaster more efficiently. This study presents fire detection using a region-based convolutional neural network. We train images of different objects on fire using ground-truth labeling. After labeling the images and determining the regions of interest (ROI), features are extracted from the training data, and the detector is trained and applied to each image of fire. To validate the effectiveness of this system, the algorithm is demonstrated on images taken from our dataset.

Keywords: Fire accident detection · Convolutional neural network (CNN) · Region convolutional neural network (R-CNN) · Region of interest · Image processing
1 Introduction

The latest world fire statistics report, published in 2019 and based on data from 34 countries and 35 cities, showed that in 2017 there were 49 million calls to fire safety centers, 3.1 million fires, 16.8 thousand civilian deaths, and 47.9 thousand civilian fire injuries, as shown in Table 1. The information about a fire is mainly provided by sensors,
which are suitable for indoor situations such as houses, government buildings, and industries [1, 2].
Table 1. Summary of 10 countries with many fires (2019, No. 24, CTIF)
Country | Calls | Fires | Fire deaths
USA | 34683500 | 1349500 | 3400
Russia | 146544 | N/A | 132844
Vietnam | 93000 | N/A | 4197
France | 66628 | 4658600 | 306600
Great Britain | 63786 | 687413 | 199894
Italy | 61000 | 1000071 | 325941
Spain | 46570 | 335317 | 130915
Argentina | 44536 | 166952 | 55265
Ukraine | 42486 | 229313 | 84083
Poland | 38413 | 519902 | 125892
Nowadays, technology is developing very fast and can be used as a tool in different activities for the good not only of human beings but of the entire environment. Deep learning is one of the trending technologies, in which a system can work like the human neural system. CNN is a type of artificial neural network that originates from neuroscience, dating back to the proposal of the first artificial neuron in 1943 [3]. It has achieved great success in computer vision tasks and shares many properties with the visual system of the human brain [4, 5]. CNN is a feed-forward architecture that introduces nonlinearity, performs feature extraction, and plays a robust role in classification [6]. In recent years, researchers have attempted to tackle various object detection techniques; one that has presented great success is the region-based CNN (R-CNN) method, an extension of CNN for performing object detection tasks [7]. R-CNN is defined as a visual object detection system that combines bottom-up region proposals with features computed by a convolutional neural network. R-CNN uses a selective search method in which region proposals are taken from an external network [8, 9]. This study presents the use of R-CNN object detection technology for detecting different kinds of fire. The images are labeled and trained using ground-truth labeling in the MATLAB deep learning toolbox. This paper is organized as follows: the first section is the introduction, the second covers related works, the third is the implementation, and the last is the conclusion and future works.
2 Related Works

2.1 Fire Accident Detection
Recently, many extensive studies have demonstrated strong results using new technologies. An ordinary camera on the scene can detect fire from real-time video data processing, where flame and fire flicker are detected by analyzing the video in the wavelet domain. In video-based fire detection (VFD), surveillance cameras and computer vision are used; a camera placed on a hilltop can cover an area of 100 km², which is well suited for wildfire detection and can give accurate information about the fire [10, 11]. Cost-effective fire detection with CNN for surveillance videos has been proposed, where the idea is inspired by the GoogLeNet architecture and refined with a special focus on computational complexity and detection accuracy [12, 13]. To detect wildfire without barriers, unmanned aerial vehicles have been proposed, using deep learning networks to achieve high accuracy over a wide range of aerial photographs [14, 15]. On the other hand, the wireless sensor network (WSN) is the most used system, in which a gateway and coordinating node are in contact and exchange data. WSNs are used indoors for house fire detection and outdoors mostly for forest fire detection. Sensors detect the smoke and temperature of the place where they are installed; if the smoke or temperature reaches or exceeds the set level, they send information to the central node, and the central node forwards it to the base station. The Global Positioning System (GPS) and positioning techniques can be used to find the location of and information about the place on fire. We believe that WSNs will face the challenges of limited memory, limited computing, and limited power consumption in the future [16, 17]. Moreover, fire detection with IoT has been introduced, where for real-time fire detection a neural network is developed from scratch and trained on a dataset compiled from multiple sources; the model is then tested on a real-world fire dataset [18].

2.2 Region Convolutional Neural Network Object Detection
Region proposals and feature extraction include two parts, discussed below. The first is to generate the region proposals that define the set of candidates for the detector. From an input image, R-CNN computes 2000 bottom-up proposals. The ROI is the part of the image to which we pay more attention than other parts, and it holds the information needed for detection [19]. The second is feature extraction: reducing the number of resources without losing important or relevant information, which helps the learning speed. From each region proposal generated by selective search, R-CNN produces a 4096-dimensional feature vector. The CNN-compatible architecture requires an input image of 227 × 227 pixels, which is also needed for computing the features of a region proposal. The image is then processed through five convolutional layers and two fully connected layers, and the features extracted from the image are fed to a support vector machine (SVM) to classify the presence of the object within that candidate region proposal. Localizing objects with a deep network and training a high-quality model with only a small quantity of annotated data has shown that CNNs can lead to high object detection performance (Fig. 1) [20].
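A hedged sketch of this pipeline (selective-search proposals, CNN features, then an SVM) is given below; it uses OpenCV's contrib selective-search module and a pretrained AlexNet from torchvision as stand-ins, so the exact models, sizes, and preprocessing are illustrative rather than those of the cited works.

```python
# Hedged sketch of an R-CNN-style pipeline: selective-search proposals,
# 4096-d CNN features, and an SVM classifier.  Requires
# opencv-contrib-python, torch/torchvision, and scikit-learn.
import cv2
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.svm import LinearSVC

def propose_regions(image_bgr, max_regions=2000):
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image_bgr)
    ss.switchToSelectiveSearchFast()
    return ss.process()[:max_regions]            # (x, y, w, h) boxes

cnn = models.alexnet(weights="DEFAULT")
cnn.classifier = cnn.classifier[:-1]             # keep the 4096-d feature layer
cnn.eval()
prep = T.Compose([
    T.ToTensor(),
    T.Resize((227, 227)),                        # warp each proposal to 227 x 227
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def region_features(image_bgr, boxes):
    feats = []
    with torch.no_grad():
        for (x, y, w, h) in boxes:
            crop = cv2.cvtColor(image_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)
            feats.append(cnn(prep(crop).unsqueeze(0)).squeeze(0).numpy())
    return np.stack(feats)

# A linear SVM is then trained on labeled proposal features, e.g.:
# svm = LinearSVC().fit(train_features, train_labels)
```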
Fig. 1. Result of R-CNN: region with CNN features.
3 System Implementation and Verification

3.1 Labeling of Selected Images
This study is implemented using the MATLAB deep learning toolbox, where the dataset consists of different images of fire downloaded from Google. The dataset is made up of 50 images for each category: cars, forests, and houses. From our dataset, 40 images are used for training and 10 images for testing. Each image to be trained is labeled using the ground truth labeler app in the MATLAB deep learning toolbox. A rectangular ROI is labeled for training the object detector, as shown in Fig. 2. The ground truth label data facilitates creating the box label datastore that is used for training the detector. It is composed of four coordinates; see Table 2.
Fig. 2. ROI ground truth labeling.
3.2 Object Detector Training
After labeling each image, the training uses the sigmoid function (Sgmd), a nonlinear activation function. It is used because its output lies between 0 and 1, which makes it especially suitable for models where we want to predict a probability as an output (Fig. 3). The sigmoid function is the right choice because the probability of anything lies between 0 and 1. The sigmoid function is differentiable, which means it is possible to find the slope of the curve at any point. When updating the curve, knowing in which direction and by how much to change or update it depends upon the slope. That is
Table 2. Box label datastore (Blds).
Fig. 3. Sigmoid function graph.
why we use differentiation in almost every part of machine learning and deep learning. The object detector is trained through the combination of an image datastore (Imds) and a box label datastore (Blds). The bounding box plays the biggest role in reducing the range of searching for features extracted from fire images, which conserves computing resources (time, processor, and memory) and reduces error. Figure 4 presents the flow chart of object detector training.
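A small illustration of the sigmoid function and its derivative discussed above follows; this is purely illustrative Python, not the MATLAB toolbox code used in the implementation.

```python
# Illustrative sigmoid activation and its derivative.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))            # output always in (0, 1)

def sigmoid_slope(x):
    s = sigmoid(x)
    return s * (1.0 - s)                         # slope used in weight updates

print(sigmoid(0.0), sigmoid_slope(0.0))          # 0.5 0.25
```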
3.3 Results and Discussion
This fire detector has been trained on a single CPU with 8 GB RAM for 25 epochs. The training completed with a score of 0.995. For the results, 10 images of
Fig. 4. Train objects detector flow chart.
houses, cars, and forests are used for verification. The aim of this work is to present a method that can be more efficient and more reliable for fire detection. Therefore, it also becomes inevitable to use a test dataset of images of fire found on Google. After all the processing, our model has been compared with those of previous researches (Table 3). However, there are some failures because R-CNN uses selective search to find the region proposals; for future work we will use Faster R-CNN, which is faster and more accurate than R-CNN.

Table 3. Comparison of proposed techniques with previous researches.
Technique | Precision | Recall | Accuracy (%)
Proposed method | 0.83 | 0.97 | 95.6
Khan Muhammad et al. [10] | 0.8 | 0.93 | 94.43
Arpit Jadon et al. [18] | 0.97 | 0.94 | 93.91
K. Muhammad et al. [19] | 0.82 | 0.98 | 94.3
The images in Fig. 5, which were used for testing, present the performance of the model for fire detection. For images of cars the detection confidence is 99.9%; for houses it is 99.9% and 92.3%; and finally for forests it is 97.3% and 99.9%, which is the range for most of the images used for testing.
Fig. 5. Fire detection results: column one shows results of forest, column two shows car and column three shows house.
4 Conclusion and Future Works

This study builds its own object detector. It is implemented from images of fire on different objects, especially houses, cars, and forests. This project designs and implements fire detection. From the international fire statistics report, we have seen a huge number of fires in different countries, accompanied by deaths and injuries. Fire detection can be the solution as a fast provider of information to firefighters. It will have a robust impact in reducing the number of invalid and redundant calls and in providing full information about the incident. For future work, the project will be implemented with Faster R-CNN to reduce failures and to classify the object on fire, and the detection will be linked with the network so that, whenever there is a detection, the system provides the firefighters with full information such as the location, the object on fire, and the fire intensity.
References

1. Brushlinsky, N.N., Aherens, M., Skolov, S.V., Wagner, P.: World fire statistics. Russia, International Association of Fire and Rescue Service (CTIF) (2019)
2. Chih-Cheng, L., Chang, K.-C., Chen, C.-Y.: Study of high-tech process furnace using inherently safer design strategies (III) advanced thin film process and reduction of power consumption control. J. Loss Prev. Process Ind. 43, 280–291 (2015)
3. McCulloch, W., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943)
4. Liang, M., Hu, X.: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3367–3375 (2015)
5. Chang, K.C., Chu, K.C., Wang, H.C., Lin, Y.C., Pan, J.S.: Energy saving technology of 5G base station based on internet of things collaborative control. IEEE Access 8, 32935–32946 (2020)
6. Symeonidis, G.: Recurrent attention for deep neural object detection. Springer (2019)
7. Kriszhevsky, A., Sutkever, I., Hinton, G.E.: Image classification with deep convolution neural network. In: Advances in Neural Information Processing System (NIPS), pp. 1097–1105 (2012)
8. Uğur Töreyin, B., Dedeoğlu, Y., Güdükbay, U., Enis Çetin, A.: Computer vision based method for real-time fire and flame detection, pp. 49–57. Elsevier (2006)
9. Enis Çetin, A., Dimitropoulos, K., Gouverneur, B., Grammalidis, N., Günay, O., Hakan Habiboǧlu, Y., Uǧur Töreyin, B., Verstockt, S.: Video fire detection. Rev. Digit. Signal Process. 23(6), 1827–1843 (2013). https://doi.org/10.1016/j.dsp.2013.07.003. ISSN 1051-2004
10. Muhammad, K., Ahmad, J., Mehmood, I., Rho, S., Baik, S.W.: Convolutional neural networks based fire detection in surveillance videos. IEEE Access 6, 18174–18183 (2018)
11. Chu, K.C., Horng, D.J., Chang, K.C.: Numerical optimization of the energy consumption for wireless sensor networks based on an improved ant colony algorithm. IEEE Access 7, 105562–105571 (2019)
12. Uğur Töreyin, B., Dedeoğlu, Y., Güdükbay, U., Enis Çetin, A.: Computer vision based method for real-time fire and flame detection (2006)
13. Chang, K.-C., Chu, K.-C., Wang, H.-C., Lin, Y.-C., Pan, J.-S.: Agent-based middleware framework using distributed CPS for improving resource utilization in smart city. Future Gener. Comput. Syst. 108, 445–453 (2020). https://doi.org/10.1016/j.future.2020.03.006. ISSN 0167-739X
14. Lee, W., Kim, S., Lee, Y.-T., Lee, H.-W., Choi, M.: Deep neural networks for wild fire detection with unmanned aerial vehicle. In: 2017 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, pp. 252–253 (2017)
15. Dener, M., Özkök, Y., Bostancıoğlu, C.: Fire detection systems in wireless sensor networks. Procedia – Soc. Behav. Sci. 195, 1846–1850 (2015). https://doi.org/10.1016/j.sbspro.2015.06.408. ISSN 1877-0428
16. Girshick, R., Donahue, J., Darrell, T., Malik, J.: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580–587 (2014)
17. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 38(1), 142–158 (2016)
18. Jadon, A., Omama, M., Varshney, A., Ansari, M.S., Sharma, R.: FireNet: a specialized lightweight fire & smoke detection model for real-time IoT applications. In: IEEE (2019)
19. Muhammad, K., Ahmad, J., Baik, S.W.: Early fire detection using convolutional neural networks during surveillance for effective disaster management. Neurocomputing 288, 30–42 (2018)
20. Weinzaepfel, P., Csurka, G., Cabon, Y., Humenberger, M.: Visual localization by learning objects-of-interest dense match regression. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Document and Sentiment Analysis
Norm-Referenced Achievement Grading: Methods and Comparison

Thepparit Banditwattanawong1(&) and Masawee Masdisornchote2

1 Department of Computer Science, Kasetsart University, Bangkok, Thailand
[email protected]
2 School of Information Technology, Sripatum University, Bangkok, Thailand
[email protected]
Abstract. Grading informs learners and instructors of both current learning ability levels and necessary improvement. For norm-referenced grading, instructors conventionally use a statistical method. This paper proposes an algorithm for norm-referenced grading. Moreover, the rise of artificial intelligence nowadays makes us curious how efficient a machine learning technique is in norm-referenced grading. We therefore compare the statistical method and our algorithm with the machine learning method. The experiment relies on data sets of both normal and skewed distributions. The comparative evaluation reveals that our algorithm and the machine learning method yield similar grading results in several cases. On the other hand, overall, the algorithm, machine learning, and statistical methods produce the best, moderate, and lowest grading qualities, respectively.

Keywords: K-means · Z score · Clustering · Nonbinary grading · T score
1 Introduction

There are basically two types of nonbinary grading systems [1]: criterion-referenced grading and norm-referenced grading. The former normally calculates the percentage of a learning score and maps it to a predefined range of percentages to determine a grade. This grading system is suitable for an examination that covers all topics of learning and thus requires long exam-taking as well as answer-checking times. In contrast, large classes and/or large courses widely use the norm-referenced grading system to meet exam-taking time constraints and to save exam-answer checking resources. The system compares the score of each individual to relative criteria defined based on all individuals' scores to decide a grade. The criteria are set by a conventionally statistical means either with or without conditions (e.g., a class's grade point average (GPA) must be kept below 3.25). This paper focuses on unconditional norm-referenced grading. We separately implement such grading by three attractive means: our proposed algorithm, a conventional statistical method, and an unsupervised machine-learning technique, namely K-means. K-means is a popular clustering algorithm that is easy to apply to grading. The grading results of each approach will be measured and compared to one another based on practical data sets of various distribution characteristics.
The main contributions of this paper are a simple and efficient grading algorithm and a novel insight into the comparative performance of the statistical method, the machine learning method, and our algorithm in unconditional norm-referenced grading. To the best of our knowledge, we also demonstrate for the first time the applicability of the K-means clustering technique for norm-referenced grading. The merit of this paper would help graders worldwide with the selection of the right grading method to meet their objectives.
2 Related Work

As for applying a machine learning clustering technique to learners' achievement, [2] analyzed the performance of students by using K-means to cluster the 10-subject marks of 118 students. The centroid of each cluster was mapped to one of 7 grades ranging from A to G. The resulting grade of each cluster was the performance indicator of the students in the cluster. Academic planners could use such an indicator to take appropriate action to remedy the students. Similarly, [3] clustered previous GPAs and internal class assessments (e.g., class test marks, lab performance, assignments, quizzes, and attendance) separately by using K-means. Therefore, each student's performance was associated with several clusters, which were used to create a set of rules for classifying the student's final grade. In this way, weak students were identified before the final exam to reduce the ratio of failing students. [4] employed K-means to create 9 groups of GPAs: exceptional, excellent, superior, very good, above average, good, high pass, pass, and fail. Students whose GPAs belonged to the exceptional and the fail groups were called gifted and dunce, respectively. The gifted students had their knowledge enhanced whereas the dunce students were remedied through differentiated instruction. [5] clustered students from different countries based on their attributes: average grade, the number of participated events, the number of active days, and the number of attended chapters. They determined an optimal k value for K-means by means of the Silhouette index, resulting in k = 3. Among the 3 clusters, the most compact cluster (i.e., the cluster with the least within-cluster sum of squares) was further analyzed for correlation between the average grade and the other attributes. [6] utilized K-means to cluster 190 students' test scores into 4 classes (excellent, good, moderate, and underachiever) to take the appropriate self-development and teaching strategy of treatment. [7] explored several machine learning techniques for early grade prediction to allow instructors to improve students' performance in early stages. The Restricted Boltzmann Machine was found to be most accurate for students' grade prediction. K-means was also used to cluster students based on technical and non-technical course performance. Regarding automated grading and scoring approaches, [8] proposed a peer grading method to enable student evaluation at scale by having students assess each other. Since students are not trained in grading, the method enlisted probabilistic models and ordinal peer feedback to solve a rank aggregation problem. [9] proposed a method to automatically construct grade membership functions (lenient-type grades, strict-type grades, and normal-type grades) to perform fuzzy reasoning to infer students' scores.
3 Grading Algorithm

In this section, we propose an algorithm for norm-referenced unconditional grading that improves our previous heuristic in [10].

Algorithm 1. Proposed Algorithm.
The algorithm is explained as follows. Line 1 initially ranks the scores of learners within a group from the best down to the worst. Lines 2 and 3 figure out the maximum and minimum scores from the rank to decide the best and the worst grades to be assigned to the learners. For example, the best learner in the group might not perform well enough to deserve A, so nobody receives A. Line 4 counts the number of eligible grades to be assigned. Once the eligible grades are determined, line 5 sequentially goes through the rank to calculate the gaps between every pair of contiguous scores. Line 6 sorts the gaps in descending order. Line 7 selects the maximum gaps, which are used to define score ranges on line 8 to match the number of eligible grades. For instance, four eligible grades require four score ranges to be defined, thus selectWidestGaps() returns the first three maximum gaps. Finally, line 9 assigns grades based on the ranges. In this way, the algorithm is simple and straightforward. Its performance will be proved in Sect. 6.
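A compact Python sketch of this procedure is shown below; it assumes all five grades are eligible (i.e., it omits the best/worst-grade decision of lines 2–4), and the tie-breaking rule among equal gaps is an assumption the paper does not specify.

```python
# Sketch of the gap-based grading algorithm (Algorithm 1).
def gap_grading(scores, grades=("A", "B", "C", "D", "F")):
    ranked = sorted(scores, reverse=True)                      # line 1
    gaps = [(ranked[i] - ranked[i + 1], i)                     # line 5
            for i in range(len(ranked) - 1)]
    widest = sorted(gaps, reverse=True)[:len(grades) - 1]      # lines 6-7
    cuts = sorted(i for _, i in widest)                        # range bounds
    result, g = {}, 0
    for i, score in enumerate(ranked):                         # lines 8-9
        result.setdefault(score, grades[g])
        if g < len(cuts) and i == cuts[g]:
            g += 1
    return result                                              # score -> grade

print(gap_grading([88, 86, 84, 63, 62, 40, 38]))
```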
4 Statistical Grading

A conventional statistical grading method relies on z scores and t scores [1]. A z score is a measure of how many standard deviations below or above the population mean a raw score is. The z score (z) is technically defined in (1) as the signed fractional number of standard deviations σ by which the value of an observation or a data point x is above the mean value μ of what is being observed or measured.
z = (x − μ) / σ    (1)
Observed values above the mean have positive z scores, while values below the mean have negative z scores. A t score converts individual scores into a standard form and is much like a z score when the sample size is above 30. In psychometrics, the t score (t) is a z score shifted and scaled to have a mean of 50 and a standard deviation of 10, as in (2).

t = 10z + 50    (2)
The statistical grading method begins by converting raw scores to z scores. The z scores are further converted to t scores to simplify interpretation, because t scores normally range from 0 to 100, unlike z scores, which can be negative real numbers. The t scores are then sorted, and the range between the maximum and minimum t scores is divided by the desired number of grades to obtain identical score intervals. The intervals are used to define the t-score ranges of all grades. In this way, we can map raw scores to z scores, z scores to t scores, t scores to t-score intervals, and t-score intervals to resulting grades, respectively. One advantage of using z scores is that some grades can be skipped if no score falls in the corresponding t-score intervals of such grades.
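This method can be sketched as follows; the use of population statistics and the fixed grade order are assumptions consistent with the description above.

```python
# Sketch of z/t-score grading with equal-width t-score intervals.
import statistics

def t_score_grading(scores, grades=("A", "B", "C", "D", "F")):
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    t = {x: 10 * (x - mu) / sigma + 50 for x in set(scores)}   # Eqs. (1)-(2)
    lo, hi = min(t.values()), max(t.values())
    step = (hi - lo) / len(grades)                             # identical intervals
    def grade(x):
        idx = min(int((hi - t[x]) / step), len(grades) - 1)
        return grades[idx]
    return {x: grade(x) for x in scores}

print(t_score_grading([88, 86, 84, 63, 62, 40, 38]))
```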
5 Machine Learning-Based Grading

In this section, we explain how to apply the K-means clustering algorithm to grading, as norm-referenced grading is natural to unsupervised learning rather than supervised learning. K-means [11] is an unsupervised machine learning technique for partitioning n objects into k clusters. K-means begins by randomizing k centroids, one for each cluster. Every object is assigned to the cluster whose centroid is nearest to the object. The means of all assigned objects within each cluster are then recalculated to serve as k new centroids, the barycenters of the clusters. The object assignment and the centroid re-calculation are iterated until no more objects move between clusters. In other words, the K-means algorithm aims at minimizing the objective function Σ_{j=1}^{k} Σ_{i=1}^{n_j} ‖x_i − c_j‖, where n_j is the number of objects in cluster j, x_i is an object in cluster j whose centroid is c_j, and ‖x_i − c_j‖ is the Euclidean distance. Also note that the initial centroid randomization can result in different final clusters. When applying the K-means algorithm to higher educational grading, k is set to the number of eligible grades. Graders must decide such a number to avoid some best and worst grades if appropriate. The quality of clustering results can be measured by using a well-known metric, namely the Davies–Bouldin index. Let us denote by d_j the mean intra-cluster distance of the points belonging to cluster C_j to their barycenter c_j: d_j = (1/n_j) Σ_{i=1}^{n_j} ‖x_i − c_j‖. Let us also denote the distance between the barycenters c_j′ and c_j of clusters C_j′ and C_j by D_jj′ = ‖c_j′ − c_j‖.
DBI is figured out by using (3) [12]. The lower the DBI, the better the clustering results (i.e., low-DBI clusters have low intra-cluster distances and high inter-cluster distances).

DBI = (1/k) Σ_{j=1}^{k} max_{j′ ∈ {1,…,k}, j′ ≠ j} ((d_j + d_j′) / D_jj′)    (3)
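Using scikit-learn, K-means grading and its DBI measurement can be sketched as below; mapping clusters to grades by descending centroid value is an assumption consistent with the text.

```python
# Sketch of K-means grading with DBI quality measurement.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def kmeans_grading(scores, grades=("A", "B", "C", "D", "F")):
    X = np.asarray(scores, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=len(grades), n_init=10, random_state=0).fit(X)
    dbi = davies_bouldin_score(X, km.labels_)                 # Eq. (3)
    order = np.argsort(-km.cluster_centers_.ravel())          # high to low
    to_grade = {int(c): grades[rank] for rank, c in enumerate(order)}
    return [to_grade[int(c)] for c in km.labels_], dbi
```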
6 Evaluation

We evaluated the z score method, our algorithm, and K-means in norm-referenced unconditional grading. We initially describe the experimental configuration and the data sets' characteristics. Then, the grading results along with the performance metrics are provided.

6.1 Experimental Configuration
We used five different data sets of accumulative term scores. They were scored on a scale of 0.0 to 100.0 points and held practical patterns of distributions. The first data set, namely ND, has a normal distribution. Table 1 shows the raw scores of the ND set. The mean and median are 63. The mode is unavailable as every score has the same frequency of 1. The standard deviation (SD) is 13.9.
Table 1. Sorted scores of ND data set (records 1–31).
88 86 84 79 78 77 76 75 74 73 72 67 66 65 64 63 62 61 60 59 54 53 52 51 50 49 48 47 42 40 38
Figure 1 projects the normal distribution of the ND set. The horizontal axis represents the z score. The area under the curve represents the normal distribution value computed with (4) [1].

f(x) = (1 / (σ√(2π))) · e^(−(1/2)((x − μ)/σ)²)    (4)
The second and the third data sets have positively and negatively skewed distributions, namely SD+ and SD−, respectively. A positively skewed distribution is an asymmetric bell shape skewed to the left, probably caused by overly difficult exam questions from the viewpoint of learners. Table 2 shows the raw scores of the SD+ set. The mode, median, and mean are 52, 53, and 60.9, respectively. Figure 2 depicts the distribution of the SD+ set. The skewness equals 1.006.
Fig. 1. Distribution of ND data set
Fig. 2. Distribution of SD+ data set
Table 2. Sorted scores of SD+ data set (records 1–31).
92 90 89 86 77 74 73 73 73 65 62 61 60 54 53 53 53 52 52 52 52 52 51 51 51 51 50 50 46 46 45
A negatively skewed distribution is an asymmetric bell shape skewed to the right, probably caused by overly easy exam questions from the viewpoint of learners. Table 3 shows the raw scores of the SD− set. The mode, median, and mean equal 87, 82, and 73.5, respectively. Figure 3 depicts the distribution of the SD− set. The skewness is −1.078. These 3 data sets contain the same number of raw scores and were synthesized to be realistic and to clarify the extreme behaviors of the three compared methods. The fourth data set, RD−, was collected from a group of real learners taking the same undergraduate course in academic year 2019. Unlike SD+ and SD−, which are heavily skewed, RD− (and RD+) represent imperfectly normal distributions (i.e., slightly skewed). The RD− data set has a negative skew of −0.138, as shown in Table 4 and Fig. 4. The mode, median, and mean equal 66.7, 56.6, and 57.9, respectively. The last data set, RD+, comprises the real term scores of another group of learners from a university different from that of RD−. Opposite to RD−, the RD+ data set has a slightly positive skew of 0.155. The characteristics of RD+ are shown in Table 5 and Fig. 5. The mode, median, and mean equal 82.5, 66.4, and 65.7, respectively.

Table 3. Sorted scores of SD− data set (records 1–31).
94 93 87 87 87 87 86 86 86 85 85 85 84 84 83 82 77 75 74 73 72 65 64 63 62 61 52 50 38 36 34
Table 4. Sorted scores of RD− data set (records 1–61).
80.8 80.2 78.7 76.8 76.1 75.2 75.1 72.5 72.1 71.6 70.8 70.6 69.1 68.7 68 67.6 66.7 66.7 65.8 63.5 61.6 61.5 61.4 60.7 60.5 59.2 58.7 58.5 57.8 57.4 56.6 55.7 55.5 55.5 55.2 55.2 55.1 54.7 53.9 52.6 52.5 51.7 51.3 51 50.7 50 48.8 48.7 48.6 46.7 46.4 46.2 45 44.9 44.6 44.5 43.5 42 35.7 28.4 28
Table 5. Sorted scores of RD+ data set (records 1–100).
89.47 87.1 82.73 82.53 82.53 82.17 80.7 80.5 79.97 79.43 79.3 78.9 78.47 78.27 77.87 77.87 75.73 74.57 73.3 73.2 73.1 72.83 72.63 72.1 71.83 71.77 70.8 70.4 70.23 70.2 70.2 69.43 69.17 69.17 69.1 68.77 68.6 68.27 67.87 67.77 67.63 67.63 67.57 67.33 67.1 67 66.77 66.73 66.4 66.37 66.37 66.1 65.87 65.8 64.77 64.73 64.73 64.57 64.57 64.3 64.17 64.13 63.93 63.9 63.57 63 62.83 60.63 60.33 59.83 58.93 58.87 58.53 58.47 58.27 57.53 57 56.77 55 54.8 54.57 54.5 54.5 54.43 54.37 53.8 53.73 53.37 53.37 52.87 52.47 52.1 52 51.97 51.8 50.9 50.7 50.2 50.1 45

Fig. 3. Distribution of SD−
Fig. 4. Distribution of RD−
Fig. 5. Distribution of RD+
Besides the data sets, we employed a grading system that evaluated the scores into 5 eligible grades, A, B, C, D, and F, without any class GPA constraint. We made the assumption that there was no skipped grade. We realized the grading system in 3 ways by using our algorithm, the z score method, and K-means separately. The results of each method had their quality measured with the DBI metric, as if the grades represented distinct clusters. The underlying reason for using DBI as the quality metric in norm-referenced grading is intuitive. Recall that a DBI value becomes low if clusters are compact and far from one another. Learners with very similar achievement should receive the same grade (i.e., low intra-cluster distances), and different grades must be able to discriminate achievements between groups of learners as clearly as possible (i.e., high inter-cluster distances). To interpret the quality results of each method: the lower the DBI, the better the method.

6.2 Grading Results
We graded the ND data set, having a normal distribution, by using our algorithm, the z score method, and K-means, and report their results in Table 6. The algorithm delivered exactly the same results as the K-means method; their DBIs equaled 0.330. The z score method yielded an equivalent DBI of 0.443. It might be questionable from the student viewpoint why graders using the z score method gave the learners who scored 78 and 79 the same grade A as the holder of 84, and gave the holder of 47 the same grade F as the holder of 42. Technically, this is because 78 and 79 fell in the same t-score interval as grade A while 47 fell in the t-score interval of grade F.

Table 6. Results of 3 grading methods for ND.
(Each of the 31 ND scores from Table 1 is listed with the grades assigned by the algorithm, the z score method, and K-means; the grade annotations appeared in angle brackets in the original table.)
We graded the SD+ data set with the algorithm, the z score method, and K-means, as shown in Table 7. The algorithm delivered exactly the same results as the K-means method; the DBI was 0.222. The z score method gave an equivalent DBI of 0.575. There were many F grades when using the z score method.

Table 7. Results of 3 grading methods for SD+.
(Each of the 31 SD+ scores from Table 2 is listed with the grades assigned by the three methods.)
Similarly, we graded the SD− data set as shown in Table 8. Again, the algorithm delivered an equivalent DBI of 0.299, while the z score method's and K-means method's DBIs were equally 0.233.

Table 8. Results of 3 grading methods for SD−.
(Each of the 31 SD− scores from Table 3 is listed with the grades assigned by the three methods.)
Since, in practice, there is no perfectly normal distribution with respect to learners' achievement, we describe experimental results based on real data sets having slightly skewed distributions. We graded the RD− data set with the 3 methods in Table 9. The gaps between every two consecutive raw scores were utilized by our algorithm, where the four widest gaps (italicized in the original table) were used as grading steps.

Table 9. Results of 3 grading methods for RD−.
(Each of the 61 RD− scores from Table 4 is listed with the gap to the preceding score and the grades assigned by the three methods; the four widest gaps, used by the algorithm as grading steps, are 2.6 (between 75.1 and 72.5), 2.3 (between 65.8 and 63.5), 6.3 (between 42 and 35.7), and 7.3 (between 35.7 and 28.4).)
All 3 methods produced different grading results overall. In particular, the algorithm and K-means methods assigned A to the same group of learners, whereas the z score and K-means methods assigned F to the same group of learners. To evaluate the quality of each method's results, we consider the DBIs: the algorithm had a DBI of 0.375, whereas the K-means method and the z score method gave equivalent DBIs of 0.469 and 0.492, respectively. Therefore, the algorithm delivered the best grading results with respect to the RD− data set. The algorithm accomplished the lowest DBI because grade D has only one member score; such a single-member grading result is comparable to the smallest possible cluster, which DBI prefers.
We graded the RD+ data set with the algorithm, the z score method, and the K-means method, as shown in Table 10. We found that the algorithm, the z score method, and the K-means method yielded DBIs of 0.345, 0.529, and 0.486, respectively. This means that the algorithm outperformed the others.

Table 10. Results of 3 grading methods for RD+.
Gap
Grade
Score
Gap
Grade
Score
Gap
Grade
Score
Gap
Grade
Score
Gap
Grade
89.47
–
73.1
0.1
67.63
0.14
64.17
0.13
54.57
0.23
87.1
2.37
72.83
0.27
67.63
0
64.13
0.04
54.5
0.07
82.73
4.37
72.63
0.2
67.57
0.06
63.93
0.2
54.5
0
82.53
0.2
72.1
0.53
67.33
0.24
63.9
0.03
54.43
0.07
82.53
0
71.83
0.27
67.1
0.23
63.57
0.33
54.37
0.06
82.17
0.36
71.77
0.06
67
0.1
63
0.57
53.8
0.57
80.7
1.47
70.8
0.97
66.77
0.23
62.83
0.17
53.73
0.07
80.5
0.2
70.4
0.4
66.73
0.04
60.63
2.2
53.37
0.36
79.97
0.53
70.23
0.17
66.4
0.33
60.33
0.3
53.37
0
79.43
0.54
70.2
0.03
66.37
0.03
59.83
0.5
52.87
0.5
79.3
0.13
70.2
0
66.37
0
58.93
0.9
52.47
0.4
78.9
0.4
69.43
0.77
66.1
0.27
58.87
0.06
52.1
0.37
78.47
0.43
69.17
0.26
65.87
0.23
58.53
0.34
52
0.1
78.27
0.2
69.17
0
65.8
0.07
58.47
0.06
51.97
0.03
77.87
0.4
69.1
0.07
64.77
1.03
58.27
0.2
51.8
0.17
77.87
0
68.77
0.33
64.73
0.04
57.53
0.74
50.9
0.9
75.73
2.14
68.6
0.17
64.73
0
57
0.53
50.7
0.2
74.57
1.16
68.27
0.33
64.57
0.16
56.77
0.23
50.2
0.5
73.3
1.27
67.87
0.4
64.57
0
55
1.77
50.1
0.1
73.2
0.1
67.77
0.1
64.3
0.27
54.8
0.2
45
5.1
7 Finding, Discussion, and Implication
Fig. 6. Detailed performance
Fig. 7. Overall performance
Figure 6 compares all measured DBIs of the 3 methods with respect to each data set. Since a lower DBI means better clustering quality, we can conclude that the K-means method is suitable for heavily skewed distributions (i.e., the SD+ and SD− data sets) and unfriendly to normal (i.e., ND) and nearly normal (or slightly skewed) distributions (i.e., RD− and RD+). K-means' DBIs have μ = 0.348 and σ = 0.112. The algorithm is
generally appropriate for all kinds of distributions (μ = 0.314 and σ = 0.052). In contrast, the z score method is not recommended for any case (μ = 0.454 and σ = 0.119). The absolute degree of skewness, rather than its positive or negative polarity, has an impact on the methods' grading qualities. Figure 7 comparatively projects the overall performance of each method across all 5 data sets. Our algorithm is optimal, while the K-means method produces slightly inferior results with about 10.77% higher DBI. The z score method performs worst (44.67% greater DBI than that of the algorithm), mainly because it is totally blind to raw-score gaps between different grading levels. The key findings are that the algorithm and the K-means method lead to the same grading results on normal and positively skewed distributions; the z score method and the K-means method yield identical grading results on a negatively skewed distribution; and the K-means method is suitable for skewed distributions (i.e., SD+ and SD−). The implication of these findings is that the algorithm is generally appropriate for all kinds of distributions. When grading the imperfectly-normal-distribution data sets, our algorithm yields the best DBIs, followed by the K-means method and the z score method, respectively.
References

1. Wadhwa, S.: Handbook of Measurement and Testing. Ivy Publishing House, New Delhi (2008)
2. Arora, R.K., Badal, D.: Evaluating student's performance using k-means clustering. Int. J. Comput. Sci. Technol. 4, 553–557 (2013)
3. Borgavakar, S.P., Shrivastava, A.: Evaluating student's performance using k-means clustering. Int. J. Eng. Res. Technol. 6, 114–116 (2017)
4. Parveen, Z., Alphones, A., Naz, S.: Extending the student's performance via k-means and blended learning. Int. J. Eng. Appl. Comput. Sci. 2, 133–136 (2017)
5. Shankar, S., Sarkar, B.D., Sabitha, S., Mehrotra, D.: Performance analysis of student learning metric using k-mean clustering approach. In: 6th International Conference - Cloud System and Big Data Engineering, India, pp. 341–345 (2016)
6. Xi, S.: A new student achievement evaluation method based on k-means clustering algorithm. In: 2nd International Conference on Education Reform and Modern Management, pp. 175–178. Atlantis Press, Hong Kong (2015)
7. Iqbal, Z., Qayyum, A., Latif, S., Qadir, J.: Early student grade prediction: an empirical study. In: 2nd International Conference on Advancements in Computational Sciences, Pakistan, pp. 1–7 (2019)
8. Ramen, K., Joachims, T.: Methods for ordinal peer grading. In: 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, New York, pp. 1037–1046 (2014)
9. Bai, S.M., Chen, S.M.: Automatically constructing grade membership functions for students' evaluation for fuzzy grading systems. In: 2006 World Automation Congress, Hungary, pp. 1–6 (2006)
10. Banditwattanawong, T., Masdisornchote, M.: Norm-referenced achievement grading of normal, skewed, and imperfectly normal distributions based on machine learning versus statistical techniques. In: 2020 IEEE Conference on Computer Applications, Myanmar, pp. 1–8 (2020)
11. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers, Burlington (2016)
12. Desgraupes, B.: Clustering Indices. University Paris Ouest, France (2017)
Review of Several Address Assignment Mechanisms for Distributed Smart Meter Deployment in Smart Grid

Tien-Wen Sung, Xiaohui Hu(&), and Haiyan Ou

College of Information Science and Engineering, Fujian University of Technology, Fuzhou, China
[email protected], [email protected], [email protected]
Abstract. Deploying wireless objects or devices is a fundamental basis for many network-based applications. The objects could be smart meters in a smart grid, sensors or actuators in a WSN, or IoT things. After the physical deployment of these devices, network address allocation becomes another essential procedure to enable network communications among the devices for device control or message delivery purposes. This paper gives a review of several notable address assignment mechanisms by describing the key technique of each of these proposed approaches. The advantages as well as the weaknesses are also introduced, and a brief comparison is given. The review provides a valuable reference for making further improvements to distributed address allocation and a meaningful reference for relevant applications in the topics of smart grids, sensor networks, and the Internet of Things.

Keywords: Address assignment · Distributed smart meter deployment · Smart grid
1 Introduction

Smart grid technology and applications are being adopted for the development of intelligent power systems in many countries [1]. Smart meter deployment is one of the essentials in a smart grid infrastructure. The smart meters are deployed and utilized for measuring energy consumption as well as billing information of smart houses. A robust communication network is fundamental for message delivery among the meters. The communication network to be used in smart grids is not yet clearly defined, and several choices can be selected as needed [2]. There are three common network structures used in smart meter communications: communication using a mobile network, communication using a data concentrator (DC), and communication using a gateway [3]. The success of a communication technology depends on various factors such as the network medium, topology, address allocation, routing, installation environment, etc. Address allocation is related to network topology and routing. This paper focuses on network address allocation and provides a review of five notable address assignment mechanisms which were proposed for and
can be used in the distributed node deployments of smart grid, sensor network, and Internet of Things applications. The purpose is to present the relevant concepts of these mechanisms for further improvement and employment in smart grid applications.
2 Related Works

Nowadays, wireless communication is utilized in various network-based applications. It is also employed in the infrastructure of smart grids, in which smart meters are interconnected with wireless connections [4, 5]. ZigBee, WiFi, Bluetooth, and cellular networks are available wireless technologies for use in smart grids [6, 7]. Tree-based or cluster-based networking methods are usually utilized as a basis for message routing in networks which consist of smart meters, data aggregation points, gateways, and the control center [8]. Regarding network address allocation, several notable approaches have been proposed for these kinds of networks consisting of wireless distributed devices. To provide a research basis and a valuable reference for improving distributed address allocation and adopting it in relevant applications with smart meter networking, this paper reviews the Distributed Address Assignment Mechanism (DAAM) [9], the ZigBee Adaptive Joining Mechanism (ZAJM) [10], the Expanded Distributed Address Assignment Mechanism (EDAAM) [11], the Multi-step Distributed Address Assignment Mechanism (M-DAAM) [12], and the ZigBee Connectivity Enhancement Mechanism (ZCEM) [13]. The key technique, advantage, and weakness of each of these five mechanisms are introduced and described in Sect. 3.
3 Network Address Allocation Approaches
This section reviews and describes five network address allocation approaches which are applicable to a variety of well-known wireless ad hoc networks, including the distributed networks of wireless smart meters, sensors, IoT things, etc.
3.1 Distributed Address Assignment Mechanism (DAAM)
The Distributed Address Assignment Mechanism is defined in the ZigBee specification [9]. The address allocation is hierarchical and distributed, and thus yields a tree-based topology. A parent node owns a block of network addresses and allocates a sub-block of the addresses to each child node which is a potential parent of a possible subtree. Once the child becomes a parent node of a subtree, it performs the same step to allocate addresses to its children. If it is certain that the child of a parent node is a leaf node, the parent allocates a single address instead of a sub-block of addresses to the child. In ZigBee networks, there is an algorithm to determine the range of an address sub-block or the value of a single address to be allocated. The algorithm is based on the device types and the network configuration parameters. The device types are ZigBee Coordinator (ZC), ZigBee Router (ZR), and ZigBee End Device (ZED). The ZigBee coordinator, router, and end device can be treated as a control center, data
aggregation point, and smart meter in a neighborhood area network, respectively. A ZED has no capability to accept a connection request from any other node. The network configuration parameters are nwkMaxChildren (Cm), nwkMaxRouters (Rm), and nwkMaxDepth (Lm). The address assignment can be achieved by using the formulas:

$$C_{skip}(d) = \begin{cases} 1 + C_m (L_m - d - 1), & R_m = 1 \\ \dfrac{1 + C_m - R_m - C_m R_m^{\,L_m - d - 1}}{1 - R_m}, & \text{otherwise} \end{cases} \qquad (1)$$

$$A_n = A_{parent} + C_{skip}(d) \cdot R_m + n \qquad (2)$$
A parent node will assign an address that is one greater than its own to its first ZR child. Later addresses assigned to other ZR children are separated from each other by the value of Cskip(d) given in Eq. (1). The network address assigned to the nth ZED child is the value A_n given in Eq. (2). Fig. 1 shows an example of address allocation using DAAM.
Fig. 1. An example of DAAM
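To make the address computation concrete, the following is a minimal Python sketch (ours, not part of the specification text; the parameter values are only illustrative) of Eqs. (1) and (2):

def cskip(d, Cm, Rm, Lm):
    # Address-block offset Cskip(d) for a router at depth d, per Eq. (1).
    if Rm == 1:
        return 1 + Cm * (Lm - d - 1)
    return (1 + Cm - Rm - Cm * Rm ** (Lm - d - 1)) // (1 - Rm)

def router_child_address(a_parent, d, k, Cm, Rm, Lm):
    # Address of the k-th ZR child (k = 1..Rm) of a parent at depth d.
    return a_parent + 1 + (k - 1) * cskip(d, Cm, Rm, Lm)

def end_device_address(a_parent, d, n, Cm, Rm, Lm):
    # Address A_n of the n-th ZED child, per Eq. (2).
    return a_parent + cskip(d, Cm, Rm, Lm) * Rm + n

# Example with Cm = 4, Rm = 4, Lm = 3 and the coordinator (depth 0) at address 0:
print(cskip(0, 4, 4, 3))                       # 21
print(router_child_address(0, 0, 2, 4, 4, 3))  # 22 (second router child)
print(end_device_address(0, 0, 1, 4, 4, 3))    # 85 (first end device child)

With these parameters the coordinator reserves a block of 21 addresses for each router child, which matches the spacing between consecutive router children.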
The advantage of DAAM is that the addresses are computable and unique, and DAAM is designed for and well suited to tree routing. Its weakness is that it can waste address space in some cases, for example, when the number of nodes in a subtree is less than the number of addresses reserved in the address block for that subtree. Moreover, DAAM can cause an orphan node problem due to the network constraints imposed by its configuration parameters.
3.2 ZigBee Adaptive Joining Mechanism (ZAJM)
The ZigBee Adaptive Joining Mechanism [10] basically follows the DAAM address assignment, but it adds a mechanism, called connection shifting, to mitigate the orphan node problem of DAAM. The basic idea of ZAJM is that a node may have more than one candidate parent to choose from. In this case, the node can change its connection target from the original parent to another parent, and the address assigned by the original parent can then be released and assigned to another new node. Fig. 2 illustrates the connection shifting of ZAJM and the resulting reduction of orphan nodes. Nodes P, D, M, and N are orphan nodes because no potential parent can accept their join requests and assign addresses to them, due to the network constraints of the configuration parameters nwkMaxChildren (Cm), nwkMaxRouters (Rm), and nwkMaxDepth (Lm). However, node R can change its parent from Z to G, so that Z can accept the join request of P: node Z takes the address of R back and assigns it to P. Similarly, node E can change its parent from A to G, so that A can take the address block back and accept the join request of D. Moreover, once D has connected to A and obtained an address block, it can accept the join requests of M and N. In this case, the connection shifting of node E (from A to G) eliminates three orphan nodes and raises the connection ratio.
Fig. 2. An example of ZAJM connection shifting
One advantage of ZAJM is that it decreases the number of orphan nodes, so the ratio of connected nodes increases; in other words, the utilization of the address space is improved. Another advantage is that ZAJM usually makes the sizes of subtrees more balanced when connection shifting is performed, that is, the loads among parent nodes are more balanced. The most important advantage is that ZAJM fully retains the rules and features of tree routing. The weakness of ZAJM is that the address of a node changes when the node performs the connection shifting mechanism.
3.3 Expanded Distributed Address Assignment Mechanism (EDAAM)
The Expanded Distributed Address Assignment Mechanism [11] is similar to the stochastic addressing scheme [12]: both generally follow the rules of DAAM as long as no addressing failure occurs. The difference is that once DAAM addressing fails, the stochastic addressing scheme uses stochastic numbers generated from the unreserved addresses, whereas EDAAM uses one more DAAM-based address block of the unreserved addresses to allocate addresses. Both EDAAM and the stochastic addressing scheme have an essential precondition for achieving the desired effect: the maximum number of addresses belonging to the address block reserved by the DAAM networking parameters (nwkMaxChildren, nwkMaxRouters, and nwkMaxDepth) should be much less than the entire address space, so that sufficient unreserved addresses are available. Under this precondition, the utilization of the overall address space is improved. Besides the requirement of sufficient unreserved addresses, another weakness of EDAAM is that additional routing tables are needed for routing among the different DAAM-based address blocks. Fig. 3 illustrates an example of EDAAM.
Fig. 3. An example of EDAAM
3.4 Multi-step Distributed Address Assignment Mechanism (M-DAAM)
The Multi-step Distributed Address Assignment Mechanism [13] is basically like DAAM in that each router allocates its own sub-block of addresses to its children. To reduce the number of useless addresses held by a router, M-DAAM uses a multi-step method and adjusts the network configuration parameters to improve the connection ratio. The
network parameters can be changed in the stepwise method of M-DAAM. The type of a node can be changed to router or end device, so the network depth can increase. After the change of nwkMaxDepth, M-DAAM goes into another step to adjust nwkMaxRouters(d) and nwkMaxChildren(d), i.e., the maximum numbers of routers and children that a router at depth d can support. The adjustment is based on the depth of the network: a lower depth brings a higher value of nwkMaxRouters(d), and conversely, a higher depth brings a lower value. In addition, routers that have no children are changed to the logical node type of end device. This decreases the number of useless routers and reduces the waste of address space. The advantage of M-DAAM is its variable network parameters, which can adjust the network topology to improve the utilization of the entire address space. It is applicable to large-scale networks and alleviates the orphan node problem. The weaknesses are the assumption that the node type of a node can be changed, and the increased memory size required at each node for the multi-step process.
3.5 Zigbee Connectivity Enhancement Mechanism (ZCEM)
The Zigbee Connectivity Enhancement Mechanism [14] is a connection-shifting-based approach similar to ZAJM. ZCEM refines the connection shifting mechanism of ZAJM by addressing the issue of multiple potential parents: after two or more potential parents perform connection shifting processes, an orphan node may have multiple candidates to choose from for joining the network and getting an address. The orphan nodes in ZAJM use a first-come-first-served (FCFS) strategy to choose their parent while all the candidates periodically broadcast beacons. Although this decision making is fast, the resulting choice may not be the best. Accordingly, ZCEM introduces an improvement probability (IP) calculated for each node, which sums the remaining capacity ratios of the corresponding subtree. An orphan node determines its parent by choosing the potential parent that brings the largest increase in improvement probability after joining the network. The simulation results of ZCEM indicate that orphan nodes can be well reduced and that the connection ratio of ZCEM is slightly better than that of ZAJM; thus, the utilization of the network address space is improved. The weakness is the overhead of the improvement probability calculation and the latency of determining the parent: an orphan node needs to wait for a certain duration to receive the related information from all the candidates before making the decision.
4 Comparison
Table 1 shows a relative comparison of the address allocation approaches described above. The comparison is made along several aspects: (1) is the approach suitable for cooperating with a tree routing scheme? (2) is the complexity low, moderate, or high when the approach performs? (3) is the scalability low, moderate, or high when the approach is applied to a larger-scale network? (4) what is the main weakness of the approach? The routing used in EDAAM can only be partially tree routing: EDAAM cannot fully perform tree routing
because once addressing by normal DAAM fails, it uses an additional DAAM-based address block of the unreserved addresses to allocate addresses, which breaks the regular address numbering rule used in tree routing. EDAAM and M-DAAM have a higher complexity than the other approaches since they generally involve more operations and overheads, for example, the adjustment and reconfiguration of the networking parameters in M-DAAM. However, this gives M-DAAM a higher scalability than the other approaches, so M-DAAM performs well in a large-size distributed network.

Table 1. A brief comparison of the reviewed address allocation approaches.
| Approach | Tree routing  | Relative complexity | Scalability | Weakness                                                  |
|----------|---------------|---------------------|-------------|-----------------------------------------------------------|
| DAAM     | Yes           | Low                 | Moderate    | Limitation caused by networking parameters                |
| ZAJM     | Yes           | Low                 | Moderate    | Address change caused by connection shifting              |
| EDAAM    | Partially yes | Moderate            | Moderate    | Requirement of enough unreserved addresses                |
| M-DAAM   | Yes           | Moderate            | High        | Assumption of changeable node type of a node              |
| ZCEM     | Yes           | Low                 | Moderate    | Improvement probability (IP) calculation and acquisition  |
5 Conclusion
There are various wireless ad hoc networks, such as smart meter networks, sensor networks, and the Internet of Things. These kinds of networks usually consist of many wireless distributed devices deployed in a certain area or region. Message delivery among these devices relies on a constructed and configured network, and a network address allocation method has to be designed for the communication network. This paper reviewed the network addressing approaches DAAM, ZAJM, EDAAM, M-DAAM, and ZCEM, and gave an introduction, illustration, and brief comparison of these address allocation schemes. The review can provide an important reference for further improvement and advanced utilization in related applications. In future work, this preliminary survey will facilitate our research project on smart grid applications based on smart meter networks. Acknowledgement. This work is supported by the Fujian Provincial Natural Science Foundation in China (Project Number: 2017J01730), the Fujian University of Technology (Project Number: GY-Z18183), and the Education Department of Fujian Province (Project Number: JT180352).
References
1. Chan, J., Ip, R., Cheng, K.W., Chan, K.S.P.: Advanced metering infrastructure deployment and challenges. In: Proceedings of the 2019 IEEE PES GTD Grand International Conference and Exposition Asia (GTD Asia), Bangkok, Thailand, 19–23 March 2019, pp. 435–439 (2019)
2. Abdulla, G.: The deployment of advanced metering infrastructure. In: Proceedings of the 2015 First Workshop on Smart Grid and Renewable Energy (SGRE), Doha, Qatar, 22–23 March 2015, pp. 1–3 (2015)
3. Chren, S., Rossi, B., Pitner, T.: Smart grids deployments within EU projects: the role of smart meters. In: Proceedings of the 2016 Smart Cities Symposium Prague (SCSP), Prague, Czech Republic, 26–27 May 2016, pp. 1–5 (2016)
4. Aboelmaged, M., Abdelghani, Y., Abd El Ghany, M.A.: Wireless IoT based metering system for energy efficient smart cities. In: Proceedings of the 2017 29th International Conference on Microelectronics (ICM), Beirut, Lebanon, 10–13 December 2017, pp. 1–4 (2017)
5. Dhivya, M., Valarmathi, K.: IoT based smart electric meter. In: Hemanth, D., Kumar, V., Malathi, S., Castillo, O., Patrut, B. (eds.) Emerging Trends in Computing and Expert Technology, COMET 2019. Lecture Notes on Data Engineering and Communications Technologies, vol. 35, pp. 1260–1269. Springer, Cham (2019)
6. Hlaing, W., Thepphaeng, S., Nontaboot, V., Tangsunantham, N., Sangsuwan, T., Pira, C.: Implementation of WiFi-based single phase smart meter for Internet of Things (IoT). In: Proceedings of the 2017 International Electrical Engineering Congress (iEECON), Pattaya, Thailand, 8–10 March 2017, pp. 1–4 (2017)
7. Burunkaya, M., Pars, T.: A smart meter design and implementation using ZigBee based wireless sensor network in smart grid. In: Proceedings of the 2017 4th International Conference on Electrical and Electronic Engineering (ICEEE), Ankara, Turkey, 8–10 April 2017, pp. 158–162 (2017)
8. Wang, G., Zhao, Y., Ying, Y., Huang, J., Winter, R.M.: Data aggregation point placement problem in neighborhood area networks of smart grid. Mob. Netw. Appl. 23(4), 696–708 (2018)
9. ZigBee Alliance: ZigBee Specification (Document 053474r13), 1 December 2006
10. Sung, T.W., Yang, C.S.: An adaptive joining mechanism for improving the connection ratio of ZigBee wireless sensor networks. Int. J. Commun. Syst. 23(2), 231–251 (2010)
11. Hwang, H., Deng, Q., Jin, X., Kim, K.: An expanded distributed address assignment mechanism for large scale wireless sensor network. In: Proceedings of the 2012 8th International Conference on Wireless Communications, Networking and Mobile Computing (WiCom), Shanghai, China, 21–23 September 2012, pp. 1–3 (2012)
12. Kim, H.S., Yoon, J.: Hybrid distributed stochastic addressing scheme for ZigBee/IEEE 802.15.4 wireless sensor networks. ETRI J. 33(5), 704–711 (2011)
13. Kim, H.S., Bang, J.S., Lee, Y.H.: Distributed network configuration in large-scale low power wireless networks. Comput. Netw. 70, 288–301 (2014)
14. Chang, H.-Y.: A connectivity-increasing mechanism of ZigBee-based IoT devices for wireless multimedia sensor networks. Multimed. Tools Appl. 78(5), 5137–5154 (2017). https://doi.org/10.1007/s11042-017-4584-2
An Approach for Sentiment Analysis and Personality Prediction Using Myers Briggs Type Indicator
Alàa Genina1, Mariam Gawich2, and Abdelfatah Hegazy1
1 College of Computing & Information Technology, Arab Academy for Science, Technology & Maritime Transport, Sheraton, Cairo, Egypt
[email protected], [email protected]
2 GRELITE, Université Française en Egypte, Cairo, Egypt
[email protected]
Abstract. Due to the rapid development of "Web 5.0" in the last few years, researchers have started to pay attention to social media using personality prediction and sentiment analysis. However, due to the high costs and the privacy of these datasets, this paper presents a study on sentiment analysis and personality prediction through social media using the Myers–Briggs Type Indicator (MBTI) personality assessment test to analyze textual data with the use of different classification algorithms. The data are collected from Kaggle, with approximately 8600 rows of Twitter data. The system is tested using 25% of the dataset, and the remaining 75% is used for the training set. The results show an average accuracy rate of 78.2% with the use of different classification algorithms, and a 100% accuracy rate using the Random Forest (RF) and Decision Tree classifiers. Keywords: Data mining · Text mining · Sentiment analysis · Emoticons analysis · MBTI personality prediction · Machine learning · Classification techniques
1 Introduction
Due to the rise of virtual assistants, the upcoming intelligent web "Web 5.0" will see applications able to interpret information on more complex levels, emotionally as well as logically. Social media is a place designed to allow people to share, comment, and post according to their beliefs and thoughts through websites and applications [1]. This is the reason for the increasing amount of sentiment data. Opinion mining or Sentiment Analysis (SA) is the way of discovering people's feelings, opinions, and emotions toward a service or a product with the use of natural language processing (NLP) to determine whether they are positive, negative, or neutral [2]. Sentiment analysis can be applied through these approaches: machine learning, lexicon-based, and hybrid. Machine learning utilizes different features to build a classifier that characterizes the text expressing the sentiment, from supervised and
unsupervised data. The lexicon-based approach utilizes a wide range of words annotated with polarity scores in order to determine the general assessment score of the given content; the main asset of this technique is that it does not require any training data [3]. Emoticons as well are used to create pictorial icons that display a sentiment or emotion with letters, numbers, and punctuation marks [4]. Personality takes into account various dimensions to define its type based on the Myers-Briggs Type Indicator (MBTI): Introversion (I) – Extroversion (E), Sensing (S) – Intuition (N), Thinking (T) – Feeling (F), and Judging (J) – Perceiving (P). These four dimensions lead to 16 types of personality; each type consists of four letters that help in predicting human personalities and the interactions between them, such as ISTJ, ISFJ, INFJ, INTJ, etc. The personality assessment will help the marketplace and organizations to know customers' feedback on a product or a service. For this reason, sentiment analysis techniques can facilitate the personality assessment through the texts and the emoticons that have been expressed on social media websites [3]. The purpose of this work is to use sentiment analysis for the expressed emoticons and to predict the type of personality using the MBTI personality assessment. This paper is organized as follows: Sect. 2 provides the literature review; Sect. 3 proposes a model for the approach; Sect. 4 demonstrates the traditional machine learning algorithm approaches; Sect. 5 presents the conclusion.
2 Background and Related Work
Thanks to the data available through social networking websites, people can share their feelings, opinions, and emotions, which drove researchers to experiment with sentiment analysis and personality type to analyze and predict the behavior of users. In the early 2000s, sentiment analysis and opinion mining were introduced as a way to recognize and analyze opinions and feelings. Sentiment analysis can be applied at various levels of analysis (document level, sentence level, feature level, and word level). At the document level, the sentiment of the full document is taken in order to locate the general sentiment of the document. The sentence level is the same as the document level, but each sentence is considered individually when calculating sentiment. At the feature level, sentiment is measured on the attribute level, particularly when applying sentiment analysis to customers' or products' feedback [3, 5, 6]. Personality prediction has different types of assessments, such as MBTI, the Winslow Personality Profile, DISC Assessment, Big Five, the Holtzman Inkblot Technique, and the Process Communication Model. There are many machine learning algorithms that can be applied to predict the type of personality. Authors in [Sentiment Analysis of Teachers Using Social Information in Educational Platform] created an automatic sentiment analysis system that analyzes textual reviews (in the Greek language) and determines the users' attitude and their satisfaction [5].
Authors in [Reddit: A Gold Mine for Personality Prediction] extracted a number of linguistic and user-activity features across the MBTI dimensions. They also evaluated a number of benchmark models built with machine learning algorithms and achieved macro F-scores between 67% and 82% on the individual dimensions and 82% accuracy for exact or one-off accurate type prediction [7]. Authors in [Machine Learning-Based Sentiment Analysis for Twitter Accounts] made a comparison and found that TextBlob and WordNet with word sense disambiguation (W-WSD) give greater accuracies. They used machine learning techniques (Naïve Bayes, SVM) and sentiment lexicons (W-WSD, SentiWordNet) with Python code, the Tweepy API, and the Weka tool [8]. Authors in [Sentiment Analysis on Social Media using Morphological Sentence Pattern Model] proposed an approach that is useful in solving the partial matching and mismatching problems [9]. Authors in [A Comparative Study of Different Classifiers for Automatic Personality Prediction] compared the results of several classifiers provided in Weka, based on correctly classified instances, F-measure, time taken, mean errors, and Kappa statistics, using data of undergraduate students extracted from Twitter [10]. The aim of this work is to compare the performance of different machine learning (ML) algorithms in order to get the sentiment of tweets, taking into account the emoticons expressed in each tweet by converting them into text to determine whether the polarity is positive, negative, or neutral, and to predict the type of personality based on the MBTI personality assessment.
3 Proposed Model for Predicting Personality Using Sentiment Analysis and MBTI
The proposed model analyzes tweets collected from a Twitter dataset to predict the sentiment of these tweets, taking into consideration the emoticons expressed in each tweet, and to predict the type of personality using the MBTI personality assessment. Different techniques are applied in order to determine the most efficient one. Figure 1 shows the proposed system, which includes seven components: data collection, converting emoticons to text, pre-processing, feature extraction, sentiment analysis, classification, and the algorithm stage, which comprises a comparison between different classifiers in order to predict the personality type.
Fig. 1. The proposed model for predicting personality using sentiment analysis & MBTI.
3.1 Data Collection Phase
The data has been collected from Kaggle and includes approximately 8600 rows. The data contains two columns: the first is the Type, the four-letter MBTI type/code of the person; the second is the Posts, which includes the last 50 tweets that have been posted, separated by "|||" (3 pipe characters) [11]. A loading sketch is shown below.
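A minimal loading sketch (our code; the file name mbti_1.csv and the column names "type" and "posts" follow the usual Kaggle release and are assumptions, since the paper does not state them):

import pandas as pd

df = pd.read_csv("mbti_1.csv")                              # columns: type, posts
df["tweets"] = df["posts"].apply(lambda p: p.split("|||"))  # 50 tweets per row
print(df["type"].value_counts().head())                     # distribution of MBTI codes
print(df.loc[0, "tweets"][:2])                              # first two tweets of user 0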
3.2 Converting Emoticons to Text and the Pre-processing Phase
A dictionary has been made containing all the emoticons in order to convert them into text, using Kaggle Cloud, Python, and Anaconda, e.g., converting :D into a smiley face. The pre-processing phase is the most important one in the data mining process and may affect the final outcomes [12]. It includes the following steps (a code sketch follows the list):
– Removal of noise, to remove characters and digits that can interfere with text analysis, i.e. removing all special characters except hashtags in tweets.
– Removal of punctuation, i.e. ",." ":;"
– Removal of numbers.
– Removal of re-duplicated tweets.
– Using lowercase: lowercasing all the text data, i.e. CANADA = canada.
– Removal of stop words, to remove all the commonly used words in the English language, i.e. the movie was great = movie great.
– Using lemmatization, to return all words to their roots, i.e. Troubles = Trouble.
– Using bag-of-words, a model for classification where word frequencies are utilized as features for training a classifier.
– Using tokenization, to break up the strings into pieces such as phrases, words, symbols, keywords, and other elements called "Tokens".
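A simplified sketch of the steps above (our code, not the authors'; the emoticon dictionary here holds only two illustrative entries), using NLTK:

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")

EMOTICONS = {":D": "smiley_face", ":(": "sad_face"}  # illustrative entries only

def preprocess(post):
    for emo, text in EMOTICONS.items():              # emoticons -> text
        post = post.replace(emo, " " + text + " ")
    post = re.sub(r"[^a-zA-Z#_\s]", " ", post)       # drop digits/punctuation, keep hashtags
    tokens = post.lower().split()                    # lowercase + tokenize
    stops = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens if t not in stops]

print(preprocess("The movie was GREAT :D in 2020 #fun"))
# ['movie', 'great', 'smiley_face', '#fun']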
3.3 Feature Extraction Phase
This step analyzes the tweets in order to extract certain features; each tweet is represented by a vector so that it can be understood by the classifier.
– Count Vectorizer (CV): a simple way to build a vocabulary of known words, tokenize a series of text documents, and encode new documents using that vocabulary.
– TF-IDF: learns the vocabulary, tokenizes documents, weights terms by inverse document frequency, and permits encoding of new documents [13]. A usage sketch of both extractors follows.
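A short usage sketch with scikit-learn (our library choice; the paper names the techniques but not the tooling):

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["movie great smiley_face", "movie bad sad_face"]  # toy cleaned tweets

cv = CountVectorizer()
X_counts = cv.fit_transform(docs)       # raw term-frequency matrix
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)     # idf-weighted matrix
print(cv.get_feature_names_out())       # learned vocabulary
print(X_tfidf.shape)                    # (2, vocabulary size)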
3.4 Sentiment Analysis and the Classification Phase
Sentiment Analysis (SA) is the process of analyzing the posts to detect whether the polarity is positive, negative, or neutral. In order to classify the tweets, different techniques have been applied: XGBoost, Stochastic Gradient Descent (SGD), Decision Tree, K-Nearest Neighbors (KNN), Naïve Bayes, Logistic Regression (LR), and Random Forest (RF); a brief training sketch follows.
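A minimal training sketch (ours) with three of the listed classifiers on toy data; the 25%/75% split mirrors the paper's test/train ratio:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

docs = ["love online class", "hate exam time", "great lecture", "bad internet"]
labels = ["INFP", "ESTJ", "INFP", "ESTJ"]      # toy MBTI labels
X = TfidfVectorizer().fit_transform(docs)

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=42)
for clf in (MultinomialNB(), DecisionTreeClassifier(), RandomForestClassifier()):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, accuracy_score(y_te, clf.predict(X_te)))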
3.5 Algorithm Stages
The model is implemented through the use of Python in Anaconda Software and Kaggle Cloud. As shown in Fig. 1, the algorithm consists of six stages:
– Importing the dataset.
– Converting emoticons into text.
– Pre-processing or cleansing and creating a tidy dataset.
– Feature extraction.
– Sentiment analysis through detecting the polarity of each post.
– Predicting personality, to understand the behavior or language style of each person through the MBTI personality assessment; for example, to help the marketplace if a company wants to know the feedback of its customers on a product or service.
The following parameters have been used to evaluate the performance of the proposed model (a computation sketch follows the definitions):
– Accuracy: the ratio of correctly predicted observations (true positive and true negative) to the total number of observations. Accuracy = (TP + TN) / (TP + FP + FN + TN)
– Recall (Sensitivity): the ratio of true positive observations to all true positive and false negative observations. Recall = TP / (TP + FN)
– Specificity: the ratio of true negative observations to all true negative and false positive observations. Specificity = TN / (TN + FP)
– Precision: the ratio of true positive observations to the total number of true positive and false positive observations. Precision = TP / (TP + FP)
Whereas:
– True Positive (TP): the result where the approach correctly predicts the positive instances.
– True Negative (TN): the result where the approach correctly predicts the negative instances.
– False Positive (FP): the result where the approach incorrectly predicts the positive instances.
– False Negative (FN): the result where the approach incorrectly predicts the negative instances [14].
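A small sketch of these formulas computed from the four counts (the values below are arbitrary):

def evaluate(tp, tn, fp, fn):
    # Metrics per the definitions above.
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

print(evaluate(tp=80, tn=50, fp=20, fn=10))
# accuracy 0.8125, recall ~0.889, specificity ~0.714, precision 0.8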
4 Experimental Results and Discussion
The dataset used consists of two columns: the type of personality and the last 50 tweets, separated by "|||" (3 pipe characters) between each tweet. Different classifiers, namely XGBoost, SGD, Random Forest (RF), Logistic Regression (LR), KNN, Naïve Bayes, and Decision Tree, have been used to predict the personality type using the MBTI personality assessment, taking into consideration the emoticons expressed in each tweet by converting them into text through a dictionary of emoticons and their corresponding texts. The system is tested using 25% of the dataset, and the remaining 75% is used for the training set.
Table 1. Classification of different classifiers.
| Classifiers                       | Sensitivity% | Specificity% | Precision% |
|-----------------------------------|--------------|--------------|------------|
| XGBoost Classifier                | 87.27%       | 30.49%       | 66.81%     |
| Stochastic Gradient Descent (SGD) | 82.26%       | 40.3%        | 68.85%     |
| Random Forest (RF)                | 92.29%       | 18%          | 64.35%     |
| Logistic Regression (LR)          | 84.58%       | 38.6%        | 68.86%     |
| KNN                               | 69.23%       | 43.5%        | 66.30%     |
| Naïve Bayes                       | 63.39%       | 54.5%        | 69%        |
| Decision Tree Classifier          | 62.35%       | 42.7%        | 63.58%     |
The results show that, for sensitivity, Random Forest has the highest percentage with 92.29%, followed by the XGBoost classifier with 87.27%, while Naïve Bayes has the highest specificity (54.50%) and precision (69%).
Fig. 2. The accuracy rate of different classifiers (Decision Tree and Random Forest: 100%; KNN: 78.20%; XGBoost: 76.19%; SGD: 72.81%; Logistic Regression: 72.41%; Naïve Bayes: 66.45%)
The performance was examined with the use of different classifiers. The results show that the Decision Tree and Random Forest (RF) have the highest accuracy with 100%, followed by KNN with 78.2%. After that come XGBoost with 76.19% and SGD, which reported the fourth-best performance with a 72.81% accuracy rate. Finally, Naïve Bayes reports the lowest accuracy.
5 Conclusion
This paper presents an approach to classify the sentiment of tweets, whether they are positive, negative, or neutral, and to predict the type of personality based on the MBTI personality assessment, along with a comparison between different machine learning classifier results, in order to understand the behaviour of users and assist organizations, for example, in knowing the feedback on a product or a service. According to Table 1 and Fig. 2, the KNN classifier has an average accuracy of 78.2%, SGD has a sensitivity of 82.26%, and XGBoost and KNN have precisions of 66.81% and 66.30%, respectively. The experimental results also show that the XGBoost classifier improved the model performance and speed. Further work aims to increase the amount of data and to use deep learning techniques.
References
1. Bayu, P., Riyanarto, S.: Personality classification based on Twitter text using Naive Bayes, KNN and SVM. In: Proceedings of the International Conference on Data and Software Engineering (ICoDSE), Yogyakarta, Indonesia (2015)
2. http://medium.com/retailmenot-engineering/sentiment-analysis-series-1-15-min-reading-b807db860917. Accessed 21 Sept 2018
3. Alàa, G., Mariam, G., Abdelfatah, H.: A survey for sentiment analysis and personality prediction for text analysis. In: The First World Conference on Internet of Things: Applications & Future (ITAF 2019). Springer, Singapore, April 2020
4. http://britannica.com/story/whats-the-difference-between-emoji-andemoticons. Accessed 22 Dec 2019
5. Nikolaos, S., Isidoros, P., Iosif, M., Michael, P.: Sentiment analysis of teachers using social information in educational platform environments. Int. J. Artif. Intell. Tools 29(2), 1–28 (2020)
6. Murugan, A., Chelsey, H., Thomas, N.: Modeling text sentiment: learning and lexicon models. Adv. Anal. Data Sci. 2, 151–164 (2018)
7. Matej, G., Jan, Š.: Reddit: a gold mine for personality prediction. In: Proceedings of the Second Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media, New Orleans, Louisiana, pp. 87–97, June 2018
8. Ali, H., Sana, M., Ahmad, K., Shahaboddin, S.: Machine learning-based sentiment analysis for Twitter accounts. Math. Comput. Appl. 23(11), 1–15 (2018)
9. Youngsub, H., Kwangmi, K.: Sentiment analysis on social media using morphological sentence pattern model. In: 15th International Conference on Software Engineering Research, Management and Applications (SERA), London, UK (2017)
10. Nor, N., Zurinahni, Z., Tan, Y.: A comparative study of different classifiers for automatic personality prediction. In: 6th IEEE International Conference on Control System, Computing and Engineering, Penang, Malaysia, November 2016
11. (MBTI) Myers-Briggs Personality Type Dataset | Kaggle. https://www.kaggle.com/datasnaek/mbti-type. Accessed 9 Sept 2018
12. Wikipedia. https://en.wikipedia.org/wiki/Data_pre-processing. Accessed 22 Dec 2019
13. Machine learning mastery. https://machinelearningmastery.com/prepare-text-data-machine-learning-scikit-learn/. Accessed 23 Dec 2019
14. Machine learning crash course. https://developers.google.com/machine-learning/crash-course/classification/true-false-positive-negative. Accessed 27 May 2020
Article Reading Sequencing for English Terminology Learning in Professional Courses
Tien-Wen Sung1, Qingjun Fang2, You-Te Lu3, and Xiaohui Hu1
1 College of Information Science and Engineering, Fujian University of Technology, Fuzhou, China
[email protected], [email protected]
2 College of Civil Engineering, Fujian University of Technology, Fuzhou, China
[email protected]
3 Department of Information and Communication, Southern Taiwan University of Science and Technology, Tainan, Taiwan
[email protected]
Abstract. Reading is one of the key methods to learn English, especially for EFL (English as Foreign Language) students. For the purpose of assisting students in learning the English terminology of a professional (technical) course, this study aims at finding reading sequences among pre-collected articles and determining one ending article for each reading sequence. The topics or contents described in the articles are all highly related to the specific professional course. The determination of the ending article depends on the relations among the articles. The relation is defined by the professional terminology overlap between articles. The Floyd-Warshall algorithm and clustering concepts are utilized in the recommendation of the ending article and reading sequences. Brief simulation results of the algorithm are also illustrated in the paper. Keywords: Reading sequencing · Terminology · English learning · Floyd-Warshall algorithm · Clustering
1 Introduction
With the rapid development of science and technology, professional knowledge changes with each passing day, and a great deal of professional information and knowledge is written and published in English. In order to strengthen ESL (English as Second Language) or EFL (English as Foreign Language) [1] students' acquisition and absorption of professional knowledge not covered in the traditional teaching materials, improve students' ability to read and understand professional English literature, and meet the requirements of enterprise development, it is necessary to adjust the current teaching mechanism and personnel training mode for the major courses by enhancing the reading and learning of English information and knowledge literature, cultivating students' ability and habit of English reading, and improving their self-competitiveness. With the development and popularization of the Internet and information technology, it is necessary to add the assistance of network and information technology into learning activities, and properly combine the technology and
good learning guidance, which not only makes it more convenient for students to understand and obtain professional knowledge written in English, but also strengthens students' English ability: English word meanings, word usage, sentence composition, sentence meaning, article structure, and professional knowledge can all be further learned and understood [2]. Accordingly, this study attaches extracurricular learning tasks to a professional course and proposes an article reading sequencing mechanism that lets students read English professional articles in a better sequence. The reading guidance is based on a collection of English literature for the professional course. After the analysis of the terminology in the articles, an appropriate reading sequence is obtained and suggested to students to achieve better learning efficiency of English terminology and reading ability in the professional course.
2 Related Works
Information and communication technology (ICT) has been widely used in modern learning platforms and environments, and is especially utilized in English language learning [3]. Different types of technology have been introduced to improve English learning. For instance, social networking services (SNS) can be used in second and foreign language teaching and learning [4, 5]. Gamification is also a good model for English teaching and learning [6]. Virtual Reality (VR) technology has become an innovative multimedia-based language learning assistance [7]. Mobile devices and wireless communication technology are employed for mobile or ubiquitous English learning [8], and location-based services (LBS) can provide good applications in these kinds of learning environments [9]. Moreover, some related works carried out research on reading activities in English learning [9, 10]. In these kinds of learning approaches, artificial intelligence (AI) algorithms can be utilized to provide functions such as classifying learning contents, recommending appropriate articles to read, and giving optimal reading sequences to learners. This paper also focuses on reading sequence recommendation and employs the Floyd-Warshall algorithm [11] and the concept of clustering to find reading sequences among collected articles. The articles are used to improve the acquisition and learning of the English terminology of a professional course for EFL learners.
3 Article Reading Sequencing
The basic method of improving the English terminology learning of a professional course for EFL students is to read a series of English articles containing related contents about the course. A reading/learning assistance tool or platform is also essential for an information-technology-enhanced learning environment, as shown in Fig. 1. The system provides functions such as translation, learning portfolio recording, etc. This paper focuses on the approach of article reading sequencing and does not describe the reading assistance system in detail. Before the system gives a recommendation of an article reading sequence, a sufficient number of articles containing the professional contents should be collected in advance. The sources could be Wiki
pages, book chapters, magazine articles, etc. Moreover, a set of terminology terms related to the professional course should also be defined in advance.
Fig. 1. A basic learning system
3.1 Article Relation
The basic rule of article sequencing used in this study is that two articles with the most terminology overlap are preferably adjacent in the sequence. This strengthens the memory of terminology terms and the corresponding knowledge read in the articles. The terminology overlap is treated as the relation between two articles. Let $A = \{a_1, a_2, \ldots, a_n\}$ be the set of collected articles containing related contents of the professional course, where $n$ is the number of articles. Let $T = \{t_1, t_2, \ldots, t_m\}$ be the set of terminology terms related to the course, where $m$ is the number of terms. The article relation between $a_i$ and $a_j$ is denoted by $r_{ij}$ and defined as:

$$r_{ij} = \frac{1}{2}\left|T_i \cap T_j \cap T\right|\left(\frac{1}{\left|T_i \cap T\right|} + \frac{1}{\left|T_j \cap T\right|}\right) \qquad (1)$$
where $T_i$ and $T_j$ represent the sets of terminology terms appearing in articles $a_i$ and $a_j$, respectively. A low value of $r_{ij}$ means that there is little relation between the articles $a_i$ and $a_j$; in that case, it is not appropriate to read $a_j$ immediately after reading $a_i$. A possible reading sequence could instead be $a_i \to a_k \to a_j$ if the values of $r_{ik}$ and $r_{kj}$ are high enough. This study defines a threshold value, denoted by $\theta_R$. If $r_{ij} \ge \theta_R$, the relation between the articles $a_i$ and $a_j$ is high; otherwise it is low, and one or more intermediate articles need to be read after $a_i$ and before $a_j$. To support reading sequence planning for the cases of a specified ending article, for any article
there must be at least one other article whose relation with it is higher than or equal to $\theta_R$, as shown:
3.2
ð2Þ
Article Sequencing
To find a reading sequence with the article collection, a clustering operation can be performed as the next step. Each article can be treated as a node in the cluster, and a specified ending article of reading can be chosen as the cluster head. This cluster is a little different from the one-hop cluster (a star topology). There should be a one-hop or multi-hop path from an article (a node) to the ending article (cluster head). Each connection between two articles in the path indicates a relation of higher than or equal to hR between the two directly connected articles. Firstly, a 2-dimensional matrix D½i; jnn is defined as follows, and it converts the concept of relation into distance between any article pair ai and aj . A higher relation implies a closer distance between the articles ai and aj . 8 < 1 rij ; 0; D½i; jnn ¼ : 1;
i 6¼ j; rij hR i¼j otherwise
ð3Þ
By utilizing the matrix D½i; jnn and the concept of Floyd-Warshall algorithm, a sequence matrix S½i; jnn can be obtained. The matrix S½i; jnn represents as the next one article from article ai to aj . Before Floyd-Warshall algorithm performs, the S½i; jnn is initialized as: S½i; jnn ¼
j; i 6¼ j; rij hR null; otherwise
Algorithm 1. Determine the article reading sequence from article /* initialization */
for each ( , ) where 1 ≤ i, j ≤ n and i ≠ j ≥ℎ if [ , ]=j end if end for /* Floyd-Warshall algorithm */
for k = 1 to n for each ( , ) where 1 ≤ i, j ≤ n if D [i, j] > D [i, k] + D [k, j] D [i, j] = D [i, k] + D [k, j] S [i, j] = S [i, k] end if end for end for
ð4Þ
to
After the Floyd-Warshall algorithm is performed with the matrix $D[i,j]_{n \times n}$ and the initialized $S[i,j]_{n \times n}$, the final $S[i,j]_{n \times n}$ indicates a sequence (path) from article $a_i$ to $a_j$, hop by hop. For example, if a student reads $a_i$ as the first article and will read $a_j$ as the ending article, the next article after $a_i$ is $a_{S[i,j]}$; if $S[i,j]$ is $k$, the next article after $a_k$ is $a_{S[k,j]}$. A Python rendering of this procedure is sketched below.
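A compact Python rendering (our sketch, not the authors' code) of Algorithm 1 together with the hop-by-hop read-out just described; INF marks article pairs whose relation is below theta_R:

INF = float("inf")

def reading_sequence(D):
    # Run Floyd-Warshall on distance matrix D (Eq. (3)); return next-hop matrix S.
    n = len(D)
    S = [[j if i != j and D[i][j] < INF else None for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
                    S[i][j] = S[i][k]
    return S

def path(S, i, j):
    # Article reading order from a_i to a_j, following S one hop at a time.
    if i != j and S[i][j] is None:
        return None
    seq = [i]
    while i != j:
        i = S[i][j]
        seq.append(i)
    return seq

D = [[0.0, 0.4, INF],
     [0.4, 0.0, 0.7],
     [INF, 0.7, 0.0]]
print(path(reading_sequence(D), 0, 2))  # [0, 1, 2]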
3.3 Ending Article
In cases where article reading for professional terminology learning is an extracurricular learning activity, the beginning and ending articles to read could be unspecified, and the number of articles to read could also be unspecified. Students can freely select any article from the provided collection as their first read. For this kind of reading activity, the simple clustering method can be modified to recommend an ending article with an optimal reading sequence. This article will be an optimal ending article for any selected beginning article. The optimal reading sequence is obtained by giving the shortest multi-hop path from the beginning article to the ending article; the shortest path implies a better correlation among the articles on the path. The average length of all the paths from every article $a_i \in A$ to a recommended ending article $b \in A$ is defined as $g(b)$, shown in Eq. (5):

$$g(b) = \frac{1}{n}\sum_{i=1}^{n} D[a_i, b] \qquad (5)$$
To find an optimal ending article in the article collection (the article set $A$), the learning system computes $g(a_i)$ for each $a_i \in A$. The article $a_{END} \in A$ with the minimum value of $g$ is the optimal ending article for reading:

$$a_{END} = \arg\min_{1 \le i \le n} g(a_i) \qquad (6)$$
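A short sketch (ours) of Eqs. (5) and (6); it assumes D already holds the all-pairs shortest-path distances produced by the Floyd-Warshall step of Algorithm 1:

def ending_article(D):
    # Return the index a_END that minimizes g(b) = (1/n) * sum_i D[a_i, b].
    n = len(D)
    g = [sum(D[i][b] for i in range(n)) / n for b in range(n)]
    return min(range(n), key=lambda b: g[b])

When the collection forms a single connected cluster, every g(b) is finite and the argmin is well defined.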
Algorithm 2 performs clustering on the collected articles. It determines an optimal ending article for reading when the beginning article is not specified. In other words, no matter which article students select as their first reading article, they will reach the same ending article at the end. However, the number of articles read by each student differs, depending on the length of the reading sequence (path). Once the article collection changes, the result of clustering, as well as each path, will also change.
4 Simulation
This study uses simulations to validate the algorithm of article reading sequencing, with total numbers of 100, 200, and 300 articles, respectively. The ending article to read is acquired and recommended by the algorithm, and the simulation results show the reading sequence for every freely selected beginning article. Figure 2 shows the article clustering results. The hollow squares represent the articles, and the solid square represents an ending article. Each connection between two articles indicates their distance and implies their terminology relation. Figure 2(a) and Fig. 2(b) show the cases of 100 and 200 total articles, respectively; each path shown in the figure represents an article reading sequence. Figure 3 shows the case of 300 total articles. Figure 3(a) also shows the reading path from each beginning article to the ending article acquired and recommended by the algorithm. Figure 3(b) shows a special case in which there are two clusters, each with a cluster head (an ending article). This can be done by modifying the clustering algorithm to have two initial cluster heads. After the algorithm is performed, new (final) cluster heads are acquired, and each article has a path to one of the two cluster heads, depending on the distances (relations) among articles. In other words, reading from one beginning article will reach one of the two ending
Fig. 2. The reading sequence. (a) Number of articles = 100; (b) Number of articles = 200.
articles. In the case of a large number of articles, multi-cluster clustering is a good approach to classifying the articles into different topics.
Fig. 3. The reading sequence. (a) 300 articles, 1 cluster; (b) 300 articles, 2 clusters.
5 Conclusion
This paper proposes a method which utilizes the Floyd-Warshall algorithm and clustering concepts to find the reading paths among pre-collected English professional (technical) articles. The study assists EFL students in learning the English terminology terms of a professional course. The calculation of the reading sequences depends on the relationship strengths among the articles: the relation between articles is defined as a key parameter to find the reading sequence and the ending article. The algorithms and brief simulation results for 100, 200, and 300 articles divided into 1 or 2 clusters are shown in the paper. In future works, an additional parameter can be introduced into the algorithm to control the length of the reading path, that is, to specify the number of articles to read in the learning activity. This can solve the problem of
different students reading different numbers of articles due to their selection of different beginning articles. More statistical and questionnaire results could also be presented in the future. Acknowledgement. This work is supported by the Research Project of University Teaching Reformation granted by the Education Department of Fujian Province (Project Number: FBJG20170130).
References
1. Mousavian, S., Siahpoosh, H.: The effects of vocabulary pre-teaching and pre-questioning on intermediate Iranian EFL learners' reading comprehension ability. Int. J. Appl. Linguist. Engl. Lit. 7(2), 58–63 (2018)
2. Wu, T.T., Sung, T.W., Huang, Y.M., Yang, C.S., Yang, J.T.: Ubiquitous English learning system with dynamic personalized guidance of learning portfolio. Educ. Technol. Soc. 14(4), 164–180 (2011)
3. Ahmadi, D., Reza, M.: The use of technology in English language learning: a literature review. Int. J. Res. Engl. Educ. 3(2), 115–125 (2018)
4. Reinhardt, J.: Social media in second and foreign language teaching and learning: blogs, wikis, and social networking. Lang. Teach. 52(1), 1–39 (2019)
5. Andujar, A., Cakmak, F.: Foreign language learning through Instagram: a flipped learning approach. In: New Technological Applications for Foreign and Second Language Learning and Teaching, pp. 135–156. IGI Global (2020)
6. Pujolà, J.T., Appel, C.: Gamification for technology-enhanced language teaching and learning. In: New Technological Applications for Foreign and Second Language Learning and Teaching, pp. 93–111. IGI Global (2020)
7. Pinto, R.D., Peixoto, B., Krassmann, A., Melo, M., Cabral, L., Bessa, M.: Virtual reality in education: learning a foreign language. In: Proceedings of the World Conference on Information Systems and Technologies (WorldCIST), Galicia, Spain, 16–19 April 2019, pp. 589–597 (2019)
8. Elaish, M.M., Shuib, L., Ghani, N.A., Yadegaridehkordi, E., Alaa, M.: Mobile learning for English language acquisition: taxonomy, challenges, and recommendations. IEEE Access 5, 19033–19047 (2017)
9. Wu, T.T., Sung, T.W., Huang, Y.M., Yang, C.S.: Location awareness mobile situated English reading learning system. J. Internet Technol. 11(7), 923–934 (2010)
10. Sung, T.W., Wu, T.T.: Dynamic e-book guidance system for English reading with learning portfolio analysis. Electron. Libr. 35(2), 358–373 (2017)
11. Rosen, K.H.: Discrete Mathematics and Its Applications, 8th edn. McGraw-Hill, New York (2019)
Egyptian Student Sentiment Analysis Using Word2vec During the Coronavirus (Covid-19) Pandemic
Lamiaa Mostafa
Business Information System Department, Arab Academy for Science and Technology and Maritime Transport, Alexandria, Egypt
[email protected], [email protected]
Abstract. The education field has been affected by the COVID-19 pandemic, which has also affected how universities, schools, companies, and communities function. One area that has been significantly affected is education at all levels, both undergraduate and graduate. The COVID-19 pandemic stresses the psychological status of students, since they have changed their learning environment. The e-learning process focuses on electronic means of communication and online support communities; social networking sites help students manage their emotional and social needs during the pandemic period and allow them to express their opinions without controls. This paper proposes a sentiment analysis model that analyzes the sentiments of students in the learning process during the pandemic using the Word2vec technique and machine learning techniques. The sentiment analysis model starts by processing the students' sentiments and selects features through word embedding, then uses three machine learning classifiers: Naïve Bayes, SVM, and Decision Tree. Precision, recall, and accuracy results of all these classifiers are described in this paper. The paper helps understand Egyptian students' opinions on the learning process during the COVID-19 pandemic.
Keywords: Sentiment analysis · Coronavirus · COVID-19 · Word2Vec · Learning · Pandemic
1 Introduction
Due to the new environmental changes arising from the spread of Covid-19, which led to a global pandemic, e-learning techniques are emerging all over the world. Educational institutions should manage the stress and provide a healthy learning environment [1, 2]. The author of [3] described the shift in the role of learning technology from 'nice to have' to 'mission-critical' for the educational process; he also divided the educational process into two vital parts, technology and learning design, and emphasized that educational institutions must develop and sustain their learning contents to provide an effective teaching and learning process. Student satisfaction is very important for educational institutions. Teachers can understand students using their feedback [18]. Students can express their opinions in the following ways: classroom feedback, clickers, mobile phones, and social
media like Facebook and Twitter, though social media feedback has its own problems (discussed in Sect. 2). Sentiment classification is used to understand the feeling expressed in a written piece of text by classifying whether it is positive, negative, or neutral [4–7]. Data fed to a classifier must be cleaned and represented in an accurate way. In the sentiment classification process, a feature vector is used as the representation of the data. There are different types of feature vectors; well-known techniques include the Bag of Words (BOW) and word embedding. Researchers in [5] used a combination of Word2vec and a Bag-of-Centroids feature vector for the sentiment classification of online mobile consumer reviews; testing the results with different machine learning classifiers, they found that the proposed feature vector performed well in comparison with the Word2vec feature vector. The rest of this paper is organized as follows: sentiment analysis is covered in Sect. 2, word embedding is described in Sect. 3, the student sentiment analysis model is described in Sect. 4, and its results are analyzed in Sect. 5; Sect. 6 includes the conclusion and future work.
2 Sentiment Analysis
Students use social media to express their feelings. This creates several problems: the teacher has to read all the feedback, which is time-consuming; a database that holds all the text must be maintained; and students' addiction level increases due to the high usage of social media. Authors in [18] analyzed student emotions in learning using Long Short-Term Memory (LSTM) and concluded that a fusion of multiple layers accompanied by LSTM improves the result over a common Natural Language Processing (NLP) method. Researchers divide the sentiment analysis process into the following stages: data acquisition, data preparation, review analysis, and sentiment classification [13–16]. Data acquisition is the process of identifying the source of the sentiment text; data preparation includes removing irrelevant terms; review analysis uses techniques such as Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BOW), and Word2vec; and finally the classification stage depends on machine learning techniques like Naïve Bayes (NB) and Support Vector Machine (SVM). Authors in [8] conducted sentiment analysis on 400,000 Amazon product reviews using various NLP techniques and classification algorithms. They classified the reviews through the following steps: preprocessing the reviews and converting them to clean reviews, after which, using word embedding, words are converted into numerical representations. The classifiers used are Naïve Bayes, logistic regression, and the random forest algorithm. The accuracies of all classifiers are compared, which helps companies understand their customers' opinions. Authors in [9] used a bag of n-grams for feature selection together with mutual information; a Naïve Bayes classifier achieved an accuracy level of 88.80% on the IMDB movie reviews dataset. Authors in [4] proposed a sequential approach that uses automatic machine learning to improve the quality of natural language processing development. Gamification techniques are used in different fields; authors in [15] proposed a Sentiment Analysis
Classifier that analyzes the sentiments of students when using gamification tools in an educational course. The results showed that the best classifier accuracy was obtained by Naïve Bayes; also, in a test on the 1000 students, the group agreeing with the use of gamification in learning showed better results compared to the disagreeing group, which indicates that gamification can enhance student performance in learning. Student sentiments have also been collected to understand the prioritization of the appropriate service facilities, to optimize facility output in order to increase student satisfaction and decrease building life-cycle costs: sentiments were collected from 100 students, term frequency was used for feature extraction, and SVM and NB classifiers were used. The most important factor that affects student satisfaction is the communication between the university management and the students [17]. Two techniques are used for sentiment classification: machine learning and lexicon-based [14, 17]. Machine learning uses traditional mining algorithms such as Naïve Bayes, Support Vector Machines, and Neural Networks (NN) [14–17]. The following section explores some of the research that implements word embedding.
3 Word Embedding

Word embedding is the process of capturing the relationships between words by mapping sentences to vectors, in which words with a similar context are usually mapped to similar vector representations. Word embeddings are fixed-dimension vector representations of word types. Examples of word embedding techniques are Latent Semantic Indexing, Random Indexing, and Word2Vec [5]; the most popular word embedding model today is Skip-gram [23]. Word embedding is widely used in sentiment analysis [10–13]. Different dimensions of Word2vec were used in [10], showing an improvement over Bag of Words. Word2Vec was used to cluster product features, and the results, compared with a TF-IDF feature vector, were improved [11]. Authors in [12] used a combination of feature extraction methods, namely Bag of Words, TF-IDF, and Word2vec, with logistic regression and random forest classifiers; they concluded that Word2vec along with a random forest classifier is the better method for text classification. Word2vec is also more efficient than traditional feature vectors, such as Bag-of-Words and term-based feature vectors, according to [13]. Authors in [19] investigated how a single word can affect the sentiment context. Authors in [20] trained their systems using a combination of word embeddings, focusing on validating the quality of the system using precision and recall analysis. Movie reviews were classified and labeled into five values (negative, somewhat negative, neutral, somewhat positive, positive), which were combined with word embedding for polarity detection of movie reviews [21]. The adoption of word embedding techniques in a content-based recommendation scenario was examined in [22], where a content-based recommendation framework learned user profiles from Wikipedia based on such embeddings; the authors concluded that the algorithms showed efficient results compared to Collaborative Filtering and Matrix Factorization, especially in high-sparsity recommendation scenarios.
4 Student Sentiment Analysis Model

The Student Sentiment Analysis Model is used to analyze students' opinions of the learning process during the pandemic. The model passes through different stages, including data collection, text processing, feature selection, and classification. Figure 1 represents the components of the Student Sentiment Analysis Model.
Fig. 1. Student Sentiment Analysis model.
4.1 Data Collection and Text Pre-processing

Sentiments were collected, in English text via a Google sheet, from 1000 students taking a business course in the College of Management and Technology at the Arab Academy for Science and Technology and Maritime Transport (AAST) University. 700 students rejected the e-learning process and described its problems, while 300 students were interested in the e-learning process. Table 1 classifies the problems of the e-learning process; the largest number of sentiments focuses on the lack of communication between the student and the teacher.

Table 1. Classification of sentiments describing the problems of e-learning

Problem of e-learning                       # of sentiments
Internet connection                         117
Online lecture voice latency                173
Online examination limited time             164
Lack of direct communication with teacher   246
The second phase of the proposed sentiment model is text processing, the process of cleaning the text of unneeded words [14–16]. The preprocessing steps include punctuation erasure, case conversion, stop-word removal, and Porter stemming [16]; a sketch of these steps is given below.
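The model performs these operations with Knime nodes [27]; the following Python sketch (using NLTK, which is an illustrative substitute rather than the paper's toolchain) shows an equivalent of the same four steps.

import string
from nltk.corpus import stopwords      # requires: nltk.download('stopwords')
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(sentiment):
    # 1) punctuation erasure
    text = sentiment.translate(str.maketrans("", "", string.punctuation))
    # 2) case conversion
    text = text.lower()
    # 3) stop-word removal and 4) Porter stemming
    return [stemmer.stem(tok) for tok in text.split() if tok not in stop_words]

print(preprocess("The online lectures were delayed, and the voice kept cutting out!"))
# e.g. ['onlin', 'lectur', 'delay', 'voic', 'kept', 'cut']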
4.2 Feature Selection
Text is represented using a collection of features. Feature selection is the process of selecting keywords that represent the text's meaning while removing irrelevant features. Examples of feature extraction methods are Document Frequency (DF), Information Gain (IG), Mutual Information (MI) [16], and Word2Vec [5, 22]. Word2vec is a two-layer neural network technique that produces word vectors from large amounts of input text by observing the contextual data in which the input words appear [22]. Every word is assigned a corresponding vector in the Word2vec space, and the algorithm positions words with similar contexts in close proximity to one another in that space. Word2vec has two approaches: Skip-gram and Continuous Bag of Words (CBOW); Fig. 2 describes how each technique works, where w(t) is the target (input) word.
Fig. 2. CBOW and Skip-gram Techniques [24].
Skip-gram is the inverse of CBOW: the CBOW architecture predicts the center (target) word from the neighboring words [24], whereas the Skip-gram model predicts the nearby context words from the center word [23]; Skip-gram is more efficient when the corpus is small. Skip-gram passes through the following steps: data preparation, setting hyperparameters, generating training data, model training, and inference. Data preparation defines the corpus and cleans the required text; the second step is building the vocabulary; the third step is encoding the words, calculating the error rate, and adjusting the weights; the last step is obtaining the word vectors and finding similar words. The training complexity of Skip-gram is given by the following equation [25]:

$$Q = C \times (D + D \log_2 V) \tag{1}$$

where C is the maximum distance between words (the context window size), D is the dimensionality of the word vectors, and V is the vocabulary size. The Student Sentiment Analysis model uses Document Frequency and Skip-gram; an illustrative training sketch follows.
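The sketch below trains Skip-gram vectors with the gensim library (version 4+ API) on a toy corpus; the corpus and hyperparameter values are placeholders and not the model's actual settings.

from gensim.models import Word2Vec

# Toy corpus: each sentiment is a list of preprocessed tokens.
corpus = [
    ["internet", "connection", "weak", "lecture"],
    ["lecture", "voice", "latency", "bad"],
    ["exam", "time", "limited", "short"],
    ["teacher", "communication", "missing", "lecture"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=100,   # dimensionality of the word vectors (D in Eq. 1)
    window=5,          # context window size (C in Eq. 1)
    min_count=1,
    sg=1,              # sg=1 selects the Skip-gram architecture (CBOW when 0)
)

print(model.wv["lecture"][:5])            # the learned vector for a word
print(model.wv.most_similar("lecture"))   # nearby words in the embedding space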
4.3 Machine Learning Classifiers
The features are selected, and the classifiers work on the selected features. Statistical methods and machine learning classifiers are commonly used in sentiment classification, including Multivariate Regression Models, Decision Trees, Neural Networks, Support Vector Machines, the Concept Vector Space Model (CVSM), and Naïve Bayes [26]. The Student Sentiment Analysis model uses Knime [27] for the preprocessing, keyword extraction, and classification processes. Each of the classifiers is described below.

• A Naïve Bayes classifier depends on Bayes' theorem, based on the following equation [16]:

$$P_{NB}(c \mid d) = \frac{p(c)\,\prod_{i=1}^{m} p(f_i \mid c)^{n_i(d)}}{p(d)} \tag{2}$$
where c is the class, d is the document, m is the number of features, and f_i is the feature vector.

• A Support Vector Machine separates positive and negative samples using the best separating surface; SVM solves the following optimization problem [16]:

$$\min_{w,b,\epsilon}\;\frac{1}{2} w^T w + C \sum_{i=1}^{l} \epsilon_i \tag{3}$$
given training vectors x_i, where w is the weight vector, b is the bias, ε_i are the slack variables, C is the penalty parameter, and l is the number of training vectors.

• A Decision Tree is a k-ary tree in which each node has specific characteristics; the Decision Tree uses the following entropy equation [16]:

$$\text{info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i) \tag{4}$$

where D is the document set, m is the number of classes, and p_i is the probability that an arbitrary vector in D belongs to class c_i. A scikit-learn sketch of these three classifiers follows.
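The model itself runs these classifiers inside Knime [27]; the following sketch reproduces the same NB/SVM/Decision-Tree comparison with scikit-learn on a toy dataset (the texts, labels, and train-on-all shortcut are illustrative only).

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the labelled sentiments (1 = accept e-learning, 0 = reject).
texts = [
    "lectures online are flexible and clear",
    "recorded sessions help me revise anytime",
    "internet connection keeps dropping during class",
    "no direct communication with the teacher",
]
labels = [1, 1, 0, 0]

X = CountVectorizer().fit_transform(texts)   # bag-of-words term frequencies

for clf in (MultinomialNB(), LinearSVC(), DecisionTreeClassifier()):
    clf.fit(X, labels)                       # in practice: fit on a train split only
    pred = clf.predict(X)
    print(type(clf).__name__,
          accuracy_score(labels, pred),
          precision_score(labels, pred),
          recall_score(labels, pred))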
5 Student Sentiment Analysis Model Results

The Student Sentiment Analysis Model passes through three steps. The first step is data collection, represented by the 1000 student sentiments. Step two is the processing, in which documents are cleaned and keywords are extracted using Document Frequency and Skip-gram. Step three is the classification and scoring, which uses three classifiers (NB, SVM, and Decision Tree); the scorer calculates the accuracy level. Precision and recall are calculated based on the following equations [16]:
$$\text{recall} = \frac{\text{number of relevant items retrieved}}{\text{number of relevant items in the collection}} \tag{5}$$

$$\text{precision} = \frac{\text{number of relevant items retrieved}}{\text{total number of items retrieved}} \tag{6}$$
The Student Sentiment Analysis Model tests the machine learning classifiers on the 1000 student sentiments, which are classified into accept-e-learning or reject-e-learning sentiments. Naïve Bayes, SVM, and Decision Tree were used for the classification of the sentiments. Table 2 shows the results of the sentiment model.

Table 2. Student Sentiment Analysis Model results

Classifier      Accuracy             Precision            Recall
                DF      Skip-gram    DF      Skip-gram    DF      Skip-gram
NB              87%     91%          0.85    0.91         0.82    0.90
SVM             79%     89%          0.77    0.82         0.79    0.89
Decision Tree   76%     85%          0.74    0.84         0.71    0.83
The results of the Student Sentiment Analysis Model agree with the conclusions of [5, 15, 16], confirming that NB yields the highest accuracy for sentiment classification. The results also agree with [5], where the accuracy of the decision tree classifier on mobile reviews was lower than the rest of the classifiers used, namely logistic regression CV (LRCV), multilayer perceptron (MLP), Random Forest (RF), and Gaussian Naïve Bayes (GNB). Most students dislike the e-learning process, and this was clearly reflected in their sentiments, making the sentiments a good indicator of student opinions; this is in line with the conclusions of [17, 18].
6 Conclusion and Future Work

A Student Sentiment Analysis Model was designed and created. Its dataset consisted of 1000 student sentiments, divided into 700 sentiments rejecting the e-learning process and 300 accepting it. The model passes through text processing, feature selection (using Document Frequency and Word2Vec Skip-gram), and machine learning classification; three classifiers were used: NB, SVM, and Decision Tree. The results showed that the best classifier accuracy is obtained with NB. The limitations of the Student Sentiment Analysis Model are as follows: the sample size must be enlarged to involve different student majors, and different classifiers should be tested, such as the Multilayer Perceptron (MLP), Random Forest (RF), and Gaussian Naïve Bayes (GNB). Arabic sentiments are a future plan of the author, since the sentiments are collected from Egyptian students; however, processing the Arabic language is complicated.
References
1. Cheston, C.C., Flickinger, T.E., Chisolm, M.S.: Social media use in medical education: a systematic review. Acad. Med. 88(6), 893–901 (2013)
2. Marshall, A., Spinner, A.: COVID-19: challenges and opportunities for educators and generation Z learners. Mayo Foundation for Medical Education and Research. In: Mayo Clinic Proceedings (2020)
3. Schaffhauser, D.: National Federation of the Blind takes on e-text pilots. Campus Technology (2012)
4. Polyakov, E.V., Voskov, L.S., Abramov, P.S., Polyakov, S.V.: Generalized approach to sentiment analysis of short text messages in natural language processing. Informatsionno-upravliaiushchie sistemy [Inf. Control Syst.] (1), 2–14 (2020). https://doi.org/10.31799/1684-8853-2020-1-2-14
5. Choudhari, P., Veenadhari, S.: Sentiment classification of online mobile reviews using combination of Word2vec and Bag-of-Centroids. In: Swain, D., et al. (eds.) Machine Learning and Information Processing. Advances in Intelligent Systems and Computing, vol. 1101. Springer (2020)
6. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2, 1–135 (2008)
7. Van Looy, A.: Sentiment analysis and opinion mining (business intelligence 1). In: Social Media Management. Springer Texts in Business and Economics. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-21990-5_7
8. Meenakshi, M., Banerjee, A., Intwala, N., Sawan, V.: Sentiment analysis of Amazon mobile reviews. In: Tuba, M., et al. (eds.) ICT Systems and Sustainability. Advances in Intelligent Systems and Computing, vol. 1077. Springer (2020)
9. Narayanan, V., Arora, I., Bhatia, A.: Fast and accurate sentiment classification using an enhanced naive Bayes model. In: Intelligent Data Engineering and Automated Learning, IDEAL 2013. Lecture Notes in Computer Science, vol. 8206, pp. 194–201 (2013)
10. Bansal, B., Shrivastava, S.: Sentiment classification of online consumer reviews using word vector representations. Procedia Comput. Sci. 132, 1147–1153 (2018). International Conference on Computational Intelligence and Data Science, ICCIDS 2018. Elsevier (2018)
11. Zhang, D., Xu, H., Su, Z., Xu, Y.: Chinese comments sentiment classification based on word2vec and SVMperf. Expert Syst. Appl. 42, 1857–1863 (2015)
12. Waykole, R.N., Thakare, A.D.: A review of feature extraction methods for text classification. Int. J. Adv. Eng. Res. Dev. 5(04) (2018)
13. Fang, X., Zhan, J.: Sentiment analysis using product review data. J. Big Data (2015). https://doi.org/10.1186/s40537-015-0015-2
14. Mostafa, L., Abd Elghany, M.: Investigating game developers' guilt emotions using sentiment analysis. Int. J. Softw. Eng. Appl. (IJSEA) 9(6) (2018)
15. Mostafa, L.: Student sentiment analysis using gamification for education context. In: Hassanien, A., Shaalan, K., Tolba, M. (eds.) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2019, AISI 2019. Advances in Intelligent Systems and Computing, vol. 1058. Springer, Cham (2019)
16. Mostafa, L.: Machine learning-based sentiment analysis for analyzing the travelers reviews on Egyptian hotels. In: Hassanien, A.E., Azar, A., Gaber, T., Oliva, D., Tolba, F. (eds.) Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2020). Advances in Intelligent Systems and Computing, vol. 1153. Springer, Cham (2020)
17. Abd Elghany, M., Abd Elghany, M., Mostafa, L.: The analysis of the perception of service facilities and their impact on student satisfaction in higher education. IJBR 19(1) (2019)
18. Sangeetha, K., Prabha, D.: Sentiment analysis of student feedback using multi-head attention fusion model of word and context embedding for LSTM. J. Ambient Intell. Hum. Comput. (2020)
19. Dessí, D., Dragoni, M., Fenu, G., Marras, M., Reforgiato Recupero, D.: Deep learning adaptation with word embeddings for sentiment analysis on online course reviews. In: Agarwal, B., Nayak, R., Mittal, N., Patnaik, S. (eds.) Deep Learning-Based Approaches for Sentiment Analysis. Algorithms for Intelligent Systems. Springer, Singapore (2020)
20. Buscaldi, D., Gangemi, A., Reforgiato Recupero, D.: Semantic Web Challenges: Fifth SemWebEval Challenge at ESWC 2018, Heraklion, Crete, Greece, 3–7 June 2018, Revised Selected Papers. Springer (2018)
21. Li, Y., Pan, Q., Yang, T., Wang, S., Tang, J., Cambria, E.: Learning word representations for sentiment analysis. Cogn. Comput. 9(6), 843–851 (2017)
22. Musto, C., Semeraro, G., de Gemmis, M., Lops, P.: Learning word embeddings from Wikipedia for content-based recommender systems. In: Ferro, N., et al. (eds.) ECIR 2016. LNCS, vol. 9626, pp. 729–734. Springer (2016). https://doi.org/10.1007/978-3-319-30671-1_60
23. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)
24. Yilmaz, S., Toklu, S.: A Deep Learning Analysis on Question Classification Task Using Word2vec Representations. Springer, London (2020)
25. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv:1301.3781 (2013)
26. Gyongyi, Z., Molina, H., Pedersen, J.: Web content categorization using link information. Technical report, Stanford University (2006)
27. Knime. https://www.knime.com/. Accessed 11 Sept 2019
Various Pre-processing Strategies for Domain-Based Sentiment Analysis of Unbalanced Large-Scale Reviews

Sumaia Mohammed AL-Ghuribi1,2(&), Shahrul Azman Noah1, and Sabrina Tiun1

1 Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
[email protected]
2 Faculty of Applied Sciences, Department of Computer Science, Taiz University, Taizz, Yemen
Abstract. User reviews are important resources for many processes such as recommender systems and decision-making programs. Sentiment analysis is one of the processes that is very useful for extracting valuable information from these reviews. The data preprocessing step is of importance in the sentiment analysis process, in which suitable preprocessing methods are necessary. Most of the available research that studies the effect of preprocessing methods focuses on balanced, small-sized datasets. In this research, we apply different preprocessing methods for building a domain lexicon from unbalanced, big-sized reviews. The applied preprocessing methods study the effects of stopwords, negation words, and the number of word occurrences; we then apply further preprocessing methods to determine the words that have high sentiment orientations when calculating the total review sentiment score. Two main experiments with five cases are tested on the Amazon dataset for the movie domain. The most suitable preprocessing method is then selected for building the domain lexicon as well as for calculating the total review sentiment score using the generated lexicon. Finally, we evaluate the proposed lexicon by comparing it with a general-based lexicon. The proposed lexicon outperforms the general lexicon in calculating the total review sentiment score in terms of accuracy and F1-measure. Furthermore, the results prove that sentiment words are not restricted to adjectives and adverbs only (as commonly claimed); nouns and verbs also contribute to the sentiment score and thus affect the sentiment analysis process. Moreover, the results show that negation words have positive effects in the sentiment analysis process.

Keywords: User reviews · Sentiment analysis · Data preprocessing methods · Domain-based lexicon · Unbalanced dataset · Sentiment words
1 Introduction

Millions of people share their opinions on goods, services and deals on a regular basis, using, among others, online channels such as social networks, forums, wikis, and discussion boards. These reviews reflect the users' experiences on the consumed
services and have significant importance for users, vendors, and companies. The nature of these reviews is complicated: they are short, unstructured, and sensitive to noise, since they are written by regular, non-professional users [1]. To benefit from these reviews, many fields are involved in processing them, such as sentiment analysis. Sentiment Analysis (SA) is used to extract the feeling or opinion of people towards a specific entity; it focuses on predicting the sentiment score (i.e. positive, neutral, or negative) of the given entity. SA usually works at three main levels: document level, sentence level, and aspect level. In this research, we are interested in the document-level opinion mining task, which aims to extract the sentiment score of a whole document. Approaches to SA can be roughly divided into machine learning and lexicon-based approaches; we are interested in the lexicon-based approach, as it does not require pre-labeled training data. SA is considered a text analysis task, whereby the pre-processing and feature selection processes are significant and affect the efficacy of SA performance. As a result, the words chosen for building the lexicon and their scores (i.e. polarities) greatly affect SA performance: if the lexicon incorrectly assigns a value to an opinion word or misses important sentiment-indicator words, the accuracy of the SA results is negatively affected. Much research has been done to study the effects of various preprocessing on many languages, such as Arabic [2], English [3], and Indonesian [4]. The role of the pre-processing step differs based on the nature of the data; for example, stemming methods were claimed to significantly improve the performance of SA for the Arabic language but not for the Indonesian language. As a result, not all datasets require the same preprocessing methods: some methods have a positive effect, while others have little or no effect. Most of the works that study the effect of pre-processing used balanced, small-sized datasets (i.e. the numbers of positive and negative reviews are almost equal), such as [5, 6]. Furthermore, those works mainly observed the effect of simple pre-processing methods such as stemming and stopword removal. In this paper, however, the focus is on an unbalanced, big-sized dataset, as well as on exploring the effect of word types such as nouns, adjectives, adverbs, and verbs on the performance of SA.
2 Related Work

The preprocessing step is the process of removing noisy data from the dataset, since such data can negatively affect the overall result of a specific task. User reviews contain a lot of noisy data, which makes preprocessing a crucial step for such data and improves the performance of SA classifiers [5]. Many studies have demonstrated the importance of the preprocessing step in the SA process; the following are some of them. Haddi, Liu et al. [5] explored the role of pre-processing in the SA process for the movie domain. The preprocessing steps used were removing non-alphabetic signs, removing stopwords, removing movie-domain-specific words, tagging negation words, and stemming. The reported results of their experiment showed that appropriate preprocessing methods can clearly improve the performance of the classifier.
Jianqiang [6] studied the effect of six preprocessing methods on sentiment classification in the Twitter domain. The methods were removing stopwords, removing numbers, replacing negative indications, removing URLs, replacing acronyms with their original words using an acronym dictionary, and reverting words with repeated letters to their original form. The results of his experiments showed that replacing negative indications and replacing acronyms with their original words improved the classification accuracy, while the remaining methods hardly affected the classifier. Krouska, Troussas et al. [7] presented five preprocessing techniques to study their effects on classification performance: a weighting scheme using TF-IDF, stemming using the Snowball stemmer, removing stopwords, tokenization (unigram, bigram, trigram), and various feature selection methods. The results of their experiment showed that unigrams and trigrams with the information gain feature selection method improve the accuracy of the SA process. Zin, Mustapha et al. [3] studied the effects of various preprocessing strategies on SA for the movie domain. They used three tiers of cleaning strategies for the preprocessing phase: the first removes stopwords; the second removes stopwords and meaningless words; and the last removes stopwords, meaningless words, numbers, and words of fewer than three characters. Their experiments were applied to the Internet Movie Database (IMDb), and the last tier gave the best results for improving SA. In summary, most of the research studying the effect of the preprocessing step in SA focuses only on balanced, small-sized datasets, and the methods used focus only on stemming or removing stopwords while ignoring word types. Our work differs in that we focus on an unbalanced, big-sized dataset and on the syntax of each sentence, studying the effect of each word type on the SA process.
3 Methodology

In this section, the methodology of this research is described in detail; Fig. 1 illustrates the framework of the study. The study consists of two phases: a training phase and a testing phase. In the training phase, the lexicon is built based on three parameters: the occurrence of words in a review, the words appended to the lexicon, and the effect of stopwords and negation words. In the testing phase, the total review sentiment score is calculated, with each word's polarity taken from the lexicon generated in the training phase; the score is calculated based on two parameters: the part of speech of each word in the review and the effect of the negation words. Finally, we evaluate each of the above parameters to extract the main ones that positively affect the lexicon building and the calculation of the total review sentiment score, and then compare the best method with other baselines.
Fig. 1. The framework of the study: a training part (Amazon movie reviews → apply different preprocessing → building a domain-based lexicon) and a testing part (calculating the review sentiment score → evaluation and comparison with baselines).
A. Training Phase (Building a Domain-Based Lexicon)
In this study, we focus on the domain-based lexicon, and we select the movie domain because it is one of the most used and popular domains. We choose Kevin's method for building the lexicon [8], a hybrid method that estimates the sentiment score using a probability method and an information-theoretic method. The efficiency of this hybrid method in SA has been proven on many large and diverse corpora, and it overcomes the poor performance of supervised machine learning techniques [9]. Our work differs from Kevin's method in that we apply various preprocessing methods while building the lexicon, in order to find the most effective method for significantly improving the SA process. We experiment with two approaches to building the domain lexicon, based on the presence/absence of words and on word frequencies. In each experiment a few cases are considered, as illustrated in Table 1.
Table 1. The two approaches for building the domain lexicon

Experiment                                              Case  Removal of stopwords  Removal of negation words
1 – Words represented by the presence/absence of        #1    No                    No
    each word in a review                               #2    Yes                   No
                                                        #3    Yes                   Yes
2 – Words represented by the frequency (occurrences)    #1    Yes                   Yes
    of each word in a review                            #2    Yes                   No
As mentioned before, Kevin’s method depends on both the probability method and the information theoretic method, thus the word occurrences feature is an important factor. In experiment 1, the occurrence of words are not taken into account in each review. (i.e. words are considered to present and absent); whereas otherwise for experiment 2.
B. Testing Phase (Calculating the Total Review Sentiment Score)
The testing phase focuses on converting the review text into a numerical score, known as the sentiment score (sometimes called a virtual or implicit rating). This value is used in many applications, such as recommender systems and summarization, and its accuracy affects the accuracy of the application that uses it. We implement a few methods to generate the sentiment score. Some existing research considers only adjectives as sentiment words, while other research considers both adjectives and adverbs. Our work differs in that we study the effects of all word types (noun, verb, adjective, and adverb) and thus do not restrict the sentiment words to a specific type. Additionally, we explore the effect of using negation words in calculating the sentiment score of the review. In this step, the lexicons built during the training phase are used to assign the sentiment polarity of each word, and for the five cases illustrated previously in Table 1 (three cases in experiment 1 and two cases in experiment 2), the effect of choosing the sentiment words based on the following parameters is explored (a scoring sketch is given at the end of this subsection):

i. Consider all the words as sentiment words.
ii. Use a combination of Part-of-Speech (POS) tags to derive the sentiment words (adjective; adjective and adverb; noun, adjective, and adverb; noun, verb, adjective, and adverb).
iii. Assign higher priorities to the adjectives and/or adverbs when calculating the sentiment scores.
iv. Take negation words into account in parameters (ii) and (iii); the negation words described in [10] are used.

After choosing the sentiment words from each review based on the previous parameters, we use the blocking technique mentioned in [11] for storing the generated lexicon. The blocking technique stores the lexicon in blocks (27 different blocks, named from A to Z), where each block contains the words that start with the same letter. Since our experiments deal with a big-sized dataset, the blocking technique speeds up looking up each word's score in the lexicon, because the algorithm does not scan the whole lexicon to get a word's score; it searches only the word's block. This reduces the number of search comparisons and shortens the execution time, which are the main parameters for any search algorithm.
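A hedged sketch of parameters (i)–(iv): NLTK's POS tagger stands in for whatever tagger the authors used, the weights shown correspond to one possible ADJ(x3)+ADV(x2) configuration, and avg_neg_score is a hypothetical placeholder for the AVG negation scheme described in Sect. 4 (with SEP, each negation word contributes its own lexicon score).

import nltk   # assumes: nltk.download('averaged_perceptron_tagger')

KEEP = {"NN": "N", "VB": "V", "JJ": "ADJ", "RB": "ADV"}        # coarse Penn-tag map
WEIGHTS = {"N": 1.0, "V": 1.0, "ADJ": 3.0, "ADV": 2.0}         # an ADJ(x3)+ADV(x2) setting
NEGATIONS = {"no", "not", "nor", "none", "nothing", "hardly"}  # subset of the words in [10]

def review_score(tokens, lexicon, neg_mode="SEP", avg_neg_score=-0.5):
    score = 0.0
    for word, tag in nltk.pos_tag(tokens):
        if word in NEGATIONS:
            # SEP: the negation word's own lexicon score; AVG: one shared score.
            score += lexicon.get(word, 0.0) if neg_mode == "SEP" else avg_neg_score
            continue
        pos = KEEP.get(tag[:2])
        if pos is not None and word in lexicon:
            score += WEIGHTS[pos] * lexicon[word]
    return score   # a positive total predicts a positive review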
4 Results and Evaluation

In this section, the results of the two experiments mentioned in Sect. 3 are described in detail. We use the Amazon dataset for the movie domain [12], which consists of 1,697,533 reviews, of which 86.52% are positive and 13.47% are negative. We divide the dataset into 80% for building the lexicon and 20% for calculating the total review sentiment score using the generated lexicon.
A. Training Phase (Building a Domain-Based Lexicon)
As discussed in the methodology section, there are two experimental approaches for building the lexicon, with different cases for each (refer to Table 1). As a result, five domain lexicons have been built: three for experiment 1 and two for experiment 2. Each lexicon was built based on three parameters; for example, the lexicon of experiment #1, case #1 was built based on the presence or absence of words in a review, according to a parameter O, and without removing stopwords. The parameter O refers to the minimum number of reviews in which a word must be mentioned. We tested three values of O, namely 50, 30, and 10 (i.e. 50 means that the appended words are mentioned in at least 50 reviews). The tests showed that O = 10 gave the best results in terms of F1-measure and accuracy in calculating the review sentiment score for all cases. Table 2 shows the generated lexicons based on O = 10.
Table 2. Details of the generated lexicons

Lexicon                  Total size  Positive words  Negative words
Experiment #1, Case #1   115,242     112,153         3,089
Experiment #1, Case #2   92,181      88,923          3,258
Experiment #1, Case #3   92,168      88,868          3,300
Experiment #2, Case #1   123,738     118,104         5,634
Experiment #2, Case #2   123,178     117,657         5,521
B. Testing Phase (Calculating the Total Review Sentiment Score)
The aim of this phase is to calculate the total review sentiment score using the lexicon generated in the previous phase, based on four parameters. The following describes in more detail how the sentiment scores are generated using these parameters.

i. Consider all the words as sentiment words. The review is tokenized into words, the score of each token is taken from the generated lexicon, and the total review sentiment score is the summation of the sentiment scores of all tokens.
ii. Use POS to choose the sentiment words. Here we check which types of words carry stronger sentiment orientations. We use four different cases for selecting the sentiment words: adjectives only (ADJ); adjectives and adverbs only (ADJ+ADV); adjectives, adverbs, and nouns only (N+ADJ+ADV); and adjectives, adverbs, nouns, and verbs (N+V+ADJ+ADV).
iii. Give higher priority to adjectives and/or adverbs when calculating their scores. The idea is to give the adjectives and adverbs higher priority than the other word types by multiplying their scores by a value (different values are tested). For example, Exp1.Case1 (ADJ(x3)+ADV(x2)) means that in Exp1.Case1 we select only adjectives and adverbs as sentiment words and multiply the sentiment score of each adjective by 3 and each adverb by 2.
iv. Use negation words with the previous two parameters. While implementing the previous parameters, we noticed that the number of true negatives was always small, due to the difficulty of identifying negative reviews. Here we use the negation words to increase the possibility of detecting true negative reviews, which in turn enhances the overall performance measures. Two scoring schemes for the negation words are tested, AVG and SEP: AVG means that all negation words receive the same score, namely the average of all the negation words' sentiment scores; SEP means that each negation word keeps its own sentiment score from the lexicon. For experiment 2, besides AVG and SEP, a third scheme named SMALL is used, in which only the following subset of the negation words mentioned in [10] is used:

Small_Negation_Words = ['no', 'not', 'nor', 'none', 'nothing', 'isnt', 'cannot', 'werent', 'rather', 'might', 'neither', 'dont', 'could', 'doesnt', 'couldnt', 'couldve', 'wasnt', 'didnt', 'would', 'wouldnt', 'wouldve', 'should', 'shouldnt', 'shouldve', 'neverending', 'nobody', 'scare', 'scares', 'less', 'hardly']
The sentiment score resulting from the previous four parameters is a value in the range [−1, 1], which we compare with the real rating provided in the dataset to calculate the two performance measures (accuracy and F1-measure). If the real rating is between 4 and 5 and the resulting sentiment score is positive, it counts as a correct calculation of the review score (a True Positive); if the resulting score is negative instead, it counts as a misclassification of a positive review (a False Negative, and analogously for negative reviews; a sketch of this rule follows the observations below). Tables 3 and 4 present the results of the experiments in calculating the total review sentiment score, sorted in descending order of accuracy. For experiment #2, only the results for case 2 are presented, as the results for case 1 were rather low. The observations from the best results of each experiment presented in Tables 3 and 4 (the top rows of each case) can be summarized as follows:

1. Experiment 2 gives better results than experiment 1; this means that, in building the lexicon, counting the occurrences of a word in a review is more accurate than recording only its presence or absence.
2. All the best results across the experiments include all POS types (nouns, verbs, adjectives, and adverbs). This means that words with sentiment orientation are not restricted to adjectives and adverbs; nouns and verbs also carry sentiment orientations that strongly affect the SA process.
3. All the best results include negation words, which shows that negation words have a large effect on the SA process.
4. Four of the best results give priority to adjectives only or to both adjectives and adverbs; this shows that adjectives and adverbs have higher sentiment orientations than nouns and verbs, although this does not diminish the contribution of nouns and verbs to the SA process.
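A sketch of this evaluation rule follows; treating 3-star reviews as excluded is an assumption, since the text only specifies the 4–5-star case explicitly, and the naming follows the standard confusion-matrix convention.

def evaluate(scores, ratings):
    # ratings 4-5 are positive ground truth, 1-2 negative (3-star assumed excluded);
    # the sign of the computed sentiment score is the prediction.
    tp = fp = tn = fn = 0
    for s, r in zip(scores, ratings):
        actual_pos = r >= 4
        pred_pos = s > 0
        if actual_pos and pred_pos:       tp += 1
        elif actual_pos and not pred_pos: fn += 1
        elif not actual_pos and pred_pos: fp += 1
        else:                             tn += 1
    accuracy = (tp + tn) / max(tp + tn + fp + fn, 1)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return accuracy, f1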
Table 3. Accuracy and F1-measure for the methods of Experiment 1, Cases 1, 2, and 3.

Case  POS/Words                 Negation  Accuracy  F1-Measure
#1    N+ADJ+ADV                 SEP       87.754    93.147
      N+V+ADJ(x3)+ADV(x2)       SEP       87.718    93.191
      N+ADJ(x3)+ADV(x2)         –         87.693    93.103
      N+ADJ+ADV                 AVG       87.686    93.144
      N+V+ADJ(x2)+ADV           SEP       87.68     93.181
      N+ADJ+ADV                 –         87.435    93.027
      N+V+ADJ+ADV               SEP       87.363    93.029
      N+V+ADJ+ADV               AVG       87.204    92.961
      N+V+ADJ+ADV               –         87.02     92.872
      ADJ(x5)+ADV(x2)           –         86.442    92.579
      ADJ(x4)+ADV(x2)           –         86.293    92.513
      ADJ(x2)                   –         85.764    92.263
      All words                 –         85.5      92.133
      ADJ+ADV                   –         83.829    90.57
      ADJ+ADV                   AVG       82.895    89.794
      ADJ                       –         82.151    89.58
      ADJ+ADV                   SEP       81.936    89.144
#2    N+V+ADJ(x3)+ADV           SEP       88.197    93.405
      N+V+ADJ(x4)+ADV           SEP       88.173    93.379
      N+V+ADJ(x2)+ADV           SEP       88.162    93.401
      N+V+ADJ(x3)+ADV(x1.5)     SEP       88.137    93.38
      N+V+ADJ(x1.5)+ADV         SEP       88.08     93.366
      N+V+ADJ(x2)+ADV(x1.5)     SEP       88.05     93.348
      N+V+ADJ(x5)+ADV(x2)       SEP       88.047    93.311
      N+V+ADJ(x3)+ADV(x2)       SEP       88.047    93.336
      N+V+ADJ(x3)+ADV(x2)       SEP       87.978    93.282
      N+V+ADJ+ADV               SEP       87.894    93.271
      N+ADJ(x3)+ADV(x2)         SEP       87.742    93.135
      N+ADJ(x5)+ADV(x3)         SEP       87.63     93.06
      N+ADJ+ADV                 SEP       87.593    93.065
      All words                 SEP       87.208    92.963
      ADJ+ADV                   –         83.737    90.391
#3    N+V+ADJ(x3)+ADV           SEP       88.148    93.402
      N+V+ADJ(x3)+ADV(x2)       SEP       88.147    93.404
      N+V+ADJ(x2)+ADV           SEP       88.04     93.363
      N+ADJ(x3)+ADV(x2)         SEP       87.925    93.248
      N+V+ADJ+ADV               SEP       87.738    93.223
      N+ADJ+ADV                 SEP       87.641    93.133
      N+V+ADJ+ADV               –         87.527    93.124
      All words                 –         87.497    93.11
      N+ADJ+ADV                 –         87.445    93.05
Table 4. Accuracy and F1-measure for the methods of Experiment 2, Case 2.

Case  POS/Words                 Negation  Accuracy  F1-Measure
#2    N+V+ADJ+ADV               SEP       89.431    94.018
      N+V+ADJ+ADV               SMALL     89.412    94.008
      N+V+ADJ+ADV               AVG       89.207    93.91
      N+V+ADJ+ADV               –         89.12     93.876
      N+V+ADJ(x3)+ADV(x2)       SEP       89.084    93.806
      N+V+ADJ(x3)+ADV           SEP       89.02     93.77
      N+V+ADJ(x3)+ADV           SMALL     89.009    93.771
      N+ADJ+ADV                 SEP       88.594    93.436
      N+ADJ+ADV                 –         88.498    93.481
      ADJ+ADV                   –         83.64     90.381
      ADJ                       –         81.654    89.191
      ADJ+ADV                   SEP       81.55     88.639
      ADV                       SEP       77.24     85.482
Finally, the Exp2.Case2 (N+V+ADJ+ADV, Neg: SEP) method obtained the highest accuracy and F1-measure, so it is chosen as the best method for calculating the total sentiment score, and Exp2.Case2 is chosen as the best method for building the lexicon.

C. Evaluation
Having determined the Experiment #2, Case #2 (N+V+ADJ+ADV) with Negation = SEP method as the most efficient for calculating the total review sentiment score, we evaluate the domain-based lexicon generated in Experiment #2, Case #2 against the general-based lexicon (SentiWordNet) using this method. We use 5-fold cross-validation, with the dataset divided into 80% for training (building the domain lexicon) and 20% for testing (calculating the review scores). Figure 2 shows the accuracy and F1-measure of both the domain lexicon and SentiWordNet using the Exp2.Case2 (N+V+ADJ+ADV, Neg: SEP) method; the domain lexicon outperforms SentiWordNet in both accuracy and F1-measure.
Fig. 2. Accuracy and F1-measure of 5-fold cross-validation (Fold1–Fold5 and average) for both the domain and the general lexicon.
5 Conclusion

In conclusion, this research studies the effects of various preprocessing methods on building a domain-based lexicon and on calculating the total review sentiment score for an unbalanced, big-sized dataset. Many preprocessing methods were examined; the lexicon generated in Experiment #2, Case #2 proved to be the best lexicon, and the Experiment #2, Case #2 (N+V+ADJ+ADV) with Negation = SEP method was selected as the best method for calculating the total review sentiment score. This in turn proves that sentiment words are not restricted to adjectives and adverbs; they can also be nouns or verbs. Additionally, negation words proved their positive effect on the SA process. Finally, we compared the best proposed domain lexicon with the general lexicon in calculating the total review sentiment score, and the results show that the proposed domain lexicon outperforms the general lexicon in both accuracy and F1-measure: using the domain lexicon, the accuracy is 9.5% higher and the F1-measure 6.1% higher than with SentiWordNet. This large difference proves that using the domain-based lexicon is more efficient than using the general lexicon. For future work, we plan to use the domain lexicon in aspect-level sentiment analysis to find the aspect sentiment scores inside a review, which is deeper than the total review sentiment score.

Acknowledgment. We acknowledge the support of the Organization for Women in Science for the Developing World (OWSD) and Sida (Swedish International Development Cooperation Agency).
References
1. AL-Ghuribi, S.M., Noah, S.A.M.: Multi-criteria review-based recommender system–the state of the art. IEEE Access 7(1), 169446–169468 (2019)
2. Duwairi, R., El-Orfali, M.: A study of the effects of preprocessing strategies on sentiment analysis for Arabic text. J. Inf. Sci. 40(4), 501–513 (2014)
3. Zin, H.M., et al.: The effects of pre-processing strategies in sentiment analysis of online movie reviews. In: AIP Conference Proceedings. AIP Publishing LLC (2017)
4. Pradana, A.W., Hayaty, M.: The effect of stemming and removal of stopwords on the accuracy of sentiment analysis on Indonesian-language texts. Kinetik: Game Technol. Inf. Syst. Comput. Netw. Comput. Electron. Control 4(4), 375–380 (2019)
5. Haddi, E., Liu, X., Shi, Y.: The role of text pre-processing in sentiment analysis. Procedia Comput. Sci. 17, 26–32 (2013)
6. Jianqiang, Z.: Pre-processing boosting Twitter sentiment analysis? In: 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity). IEEE (2015)
7. Krouska, A., Troussas, C., Virvou, M.: The effect of preprocessing techniques on Twitter sentiment analysis. In: 2016 7th International Conference on Information, Intelligence, Systems & Applications (IISA). IEEE (2016)
8. Labille, K., Gauch, S., Alfarhood, S.: Creating domain-specific sentiment lexicons via text mining. In: Proceedings of the Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM) (2017)
9. Labille, K., Alfarhood, S., Gauch, S.: Estimating sentiment via probability and information theory. In: KDIR (2016)
10. Farooq, U., et al.: Negation handling in sentiment analysis at sentence level. JCP 12(5), 470–478 (2017)
11. Thabit, K., AL-Ghuribi, S.M.: A new search algorithm for documents using blocks and words prefixes. Sci. Res. Essays 8(16), 640–648 (2013)
12. He, R., McAuley, J.: Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In: Proceedings of the 25th International Conference on World Wide Web (2016). http://jmcauley.ucsd.edu/data/amazon/links.html
Arabic Offline Character Recognition Model Using Non-dominated Rank Sorting Genetic Algorithm

Saad M. Darwish1(&), Osama F. Hassan2, and Khaled O. Elzoghaly2

1 Department of Information Technology, Institute of Graduate Studies and Research, Alexandria University, Alexandria, Egypt
[email protected]
2 Faculty of Science, Department of Mathematics, Damanhour University, Damanhur, Egypt
[email protected]
Abstract. In recent years, there has been intensive research on Arabic Optical Character Recognition (OCR), especially the recognition of scanned, offline, machine-printed documents. However, Arabic OCR results are still unsatisfactory, and the field remains an evolving research area. Exploring the best feature extraction techniques and selecting an appropriate classification algorithm lead to superior recognition accuracy and low computational overhead. This paper presents a new Arabic OCR approach that integrates the Extreme Learning Machine (ELM) and the Non-dominated Rank Sorting Genetic Algorithm (NRSGA) in a unified framework with the aim of enhancing recognition accuracy. ELM is adopted as a neural network classifier that has a short processing time and avoids many difficulties faced by gradient-based learning methods, such as learning epochs and local minima. NRSGA is utilized as a feature selection algorithm with better convergence and spread of solutions. NRSGA emphasizes ranking among the solutions of the same front along with an elite preservation mechanism, and ensuring diversity through the nearest-neighbor method reduces the run-time complexity using the simple principle of space-time trade-off. The experimental results reveal the efficiency of the proposed model and demonstrate that the feature selection approach increases the accuracy of the recognition process.

Keywords: Arabic OCR · Extreme Learning Machine · Feature selection · NRSGA
1 Introduction

OCR is the automatic recognition of characters from images, with many applications such as document recovery, car plate recognition, zip code recognition, and various banking and business applications. Generally, OCR is divided into online and offline character recognition systems [1]. Online OCR recognizes characters as they are entered and utilizes the order, speed, and direction of individual pen strokes to achieve a high level of accuracy in recognizing handwritten text. Offline OCR is complicated
because this type of recognition needs to overcome many complications, including similarities between different character forms, overlaps between characters, and interconnections between the surrounding characters. Although offline systems are less precise than online setups, they are widely used in specialized applications such as interpreting handwritten postal addresses on envelopes and reading currency amounts on bank checks. Furthermore, offline OCR saves the time and money of rewriting old documents in electronic format [2, 3]. Consequently, the obstacles facing offline OCR and the increasing need for OCR applications make offline OCR an exhilarating field of research. An OCR system aims to achieve a high recognition rate, overcome the poor quality of scanned images (especially in historical documents), and adapt to style and size variations within the same document. Compared with other languages, Arabic OCR is still developing because of the complex nature of Arabic word structure and syntax. Some of these complexities are as follows [2]: (1) the outline of each character depends on its place in the word, taking two to four shapes; (2) some characters have similar shapes but differ in the position and the number of dots, which can be written above or below the characters; (3) characters are written connected to each other, yet some characters cannot be connected to the following character, which causes a word to have many connected components, called Pieces of Arabic Words (PAWs). Moreover, special marks called diacritics, written above or below a character, are used to modify the character's accent. The OCR output depends on the text quality, the text image processing, and the different classification methods used to improve the detection rate. Generally, an OCR system comprises six stages: image acquisition (scanning), segmentation, preprocessing, feature extraction, classification, and post-processing [4]. The two main factors that affect the OCR recognition rate are (1) a set of representative features from word images and (2) an efficient classification algorithm [5]. The selection of a stable and representative collection of features is at the heart of OCR system design: this process captures the essential characteristics of a word and combines them in a feature vector while ignoring the unimportant ones. OCR classification techniques can be broadly grouped into three categories [6, 7]: heuristic (e.g., fuzzy), template matching (e.g., dynamic time warping), and learning-based methods (e.g., neural networks). These algorithms still do not achieve satisfactory results for Arabic OCR, as they do not generalize from training data well and are sensitive to common types of distortion. Currently, the Genetic Algorithm (GA) is considered one of the most powerful unbiased optimization techniques for sampling a large solution space and is used to find the most optimized solution for a given problem [8]. To deal with the problems facing multi-objective genetic algorithms, such as computational complexity, the need for specifying a sharing parameter, and non-elitism, the Non-dominated Sorting Genetic Algorithm (NSGA-II) is employed, which is based on Pareto dominance for measuring the quality of solutions during the search [9]. Recently, many studies have introduced variants of NSGA-II that improve the time complexity by managing the book-keeping in a better way than the basic algorithm.
One of these variants is the Non-dominated Rank Sorting Genetic Algorithm (NRSGA), which first classifies the population into fronts and then assigns ranks to the solutions within a front [10]. This technique ensures diversity with no extra niching parameter. Since diversity is parameterized implicitly in the fitness criteria, the convergence property is not affected while diversity is ensured.
Furthermore, to ensure diversity, distances between nearest neighbors are considered, which is fast and straightforward. However, some open problems remain regarding how to obtain an optimal Arabic OCR that overcomes the curious nature of Arabic characters, achieves a high recognition rate, and deals with many font styles; this research is motivated by all these challenges. The main contribution of this work is a new Arabic OCR model that focuses on printed, segmentation-free images of Arabic words, through both NRSGA and ELM. NRSGA is utilized in the feature selection process with two different objective functions. Since NRSGA decreases the computational complexity of ordinary genetic algorithms, it has the ability to find a better spread of solutions and better convergence near the true Pareto-optimal front. In addition, the suggested model exploits an efficient generalized Single hidden Layer Feed-forward Network (SLFN) algorithm, called the Extreme Learning Machine (ELM), as the classification algorithm. The model aims to reach the lowest recognition error, the shortest running time, and the simplest structure. The work presented in this paper extends previous work [11] to improve the selection of the best features in an appropriate time, and it treats the traditional NSGA-II imperfections for feature selection, such as the lack of uniform diversity and the absence of a lateral-diversity-preserving operator among the current best non-dominated solutions. The suggested model investigates the potential improvements in recognition accuracy from using NRSGA instead of NSGA-II for word feature selection within offline Arabic character recognition applications. The results show that the new variant is on par with the traditional one and hence can be considered in situations that warrant an alternative approach for validating results obtained by other methods, especially for improving the computation cost in the testing phase. The structure of the paper is organized as follows: a short survey of previous research is provided in Sect. 2; the proposed model is discussed in detail in Sect. 3; Sect. 4 gives experimental results demonstrating the proposed model's performance and evaluation; the paper concludes with final remarks on the study and future work in Sect. 5.
2 Related Work

Work in the Arabic OCR field has intensified in recent years, primarily because it is challenging to increase the recognition rate and decrease the computational cost without degrading one another. For instance, the authors in [12] built an Arabic OCR system using the Scale Invariant Feature Transform (SIFT) as features for letter classification, in conjunction with an online failure prediction method. The system scans every word with windows of increasing dimensions; segmentation points are set where the classifier achieves maximum confidence. To highlight the influence of image descriptors, the research in [13] concentrated on improving the feature extraction phase by selecting efficient feature subsets using different feature selection techniques that rank the 96 possible features based on their importance. The research showed that NSGA chooses the best feature subset relative to the other four methods, and that the SVM classifier gives the highest classification quality.
The concept of partial segmentation has been utilized in [14] for recognizing Arabic machine-printed texts using the Hausdorff distance. To determine the number of multi-size sliding windows for a given PAW shape, the transformed stroke width was used to estimate the size and font style. The method uses the Hausdorff distance to determine the resemblance of two images (character and sliding window). The system gave satisfying results, with a high recognition rate on the APTI and PATS-A01 databases; however, increasing the number of sliding windows in each image made the processing time-consuming. To tackle the problem of word segmentation, the authors in [15] defined each shape of an Arabic word as a specific class, without word segmentation. The features extracted for every word were twenty vertical sliding windows that include structural and geometrical representations of Arabic words. The last phase was classification, where a multi-class SVM was applied; the system was tested with various Arabic word datasets and obtained a recognition rate of 98.5%. Recently, the extreme learning machine has become a cutting-edge and promising approach in image classification. Researchers in [16] developed an expert system for identifying standard Brazilian vehicle license plates. Many classification algorithms were applied to identify numbers and letters; among them, ELM achieved the highest accuracy of plate character detection with the smallest standard deviation. In [11], the authors suggested a model that fuses ELM with NSGA-II to solve the Arabic OCR recognition problem. However, utilizing NSGA-II does not always find the best feature subset: if the number of solutions in the first front does not exceed the population size, diversity among the finally obtained set may not be adequately ensured. From the survey conducted, it can be inferred that current Arabic OCR methods fail to handle transformation factors, since they have a major limitation in selecting the optimal distinctive features from different words.
3 Proposed Model

The block diagram that summarizes the main components of the proposed Arabic OCR model is depicted in Fig. 1. The model utilizes NRSGA to select the optimal features and an ELM classifier for recognizing scanned, offline, machine-printed documents without a long training time. The model consists of two main phases: a training phase and a testing phase. The following subsections discuss the components of the model in detail, clarifying the objective of each step.

3.1 Image Acquisition

Although there are many popular Arabic databases, the proposed model uses well-researched printed databases that have good resolution and many font styles and sizes. The first is the PATS-A01 database, which consists of 2766 text-line images in eight fonts. The second is the APTI database, which contains 113,284 text images, 10 Arabic fonts, 10 font sizes, and 4 font styles [15, 17]. The samples vary in size, font type, orientation, and degree of noise. Since PATS-A01 images are lines of words, the lines were segmented manually to obtain separated word samples like the APTI samples, in order to unify the input for the preprocessing step.
Fig. 1. The proposed Arabic OCR Model by fusing ELM and Ranked NSGA-II.
3.2 Preprocessing

Preprocessing aims to produce a clear version of each image for the OCR system [13–15]. In this step, each word image undergoes five operations to prepare it for feature extraction: (a) transforming the image to grayscale and then to binary format, (b) removing noise from the image using an appropriate median filter, (c) removing all small objects by applying morphological open and close operations, (d) correcting the image if it is rotated, and (e) resizing the image to appropriate dimensions. A sketch of these operations is given below.
3.3 Segmentation
Since word segmentation is the primary source of errors in recognition, the suggested model avoids this step and uses pre-segmented images (segmentation-free words) [1]. However, images from the PATS-A01 database are lines of words; these will be segmented manually.

3.4 Feature Extraction and Selection
The major goal of the feature extraction stage is to maximize the recognition rate with the fewest features, stored in a feature vector. The underlying concept of this step is to extract features from word images that achieve a high degree of similarity between samples of the same class and a high degree of divergence between samples of other classes [6, 12]. As stated in [5], feature extraction methods based on second-order statistics have achieved higher levels of discrimination than the power-spectrum (transform-based) and structural methods; among these second-order statistics, image moments achieved the best results [13]. Consequently, the suggested model employs a set of fourteen features based on invariant moments, because they are translation- and scale-invariant. The feature vector contains the kurtosis and skewness of both the horizontal and vertical projections, the vertical and horizontal centers, the number of objects in the image, and the first seven invariant moments [5, 6, 12, 13]; an extraction sketch is given below.
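The sketch below assembles the fourteen described features using OpenCV's Hu moments and SciPy's projection statistics; it is a plausible reading of the feature list, not the authors' exact implementation.

import cv2
import numpy as np
from scipy.stats import kurtosis, skew

def extract_features(binary_word):
    # binary_word: 2-D uint8 array with foreground pixels set to 1
    h_proj = binary_word.sum(axis=1).astype(float)    # horizontal projection
    v_proj = binary_word.sum(axis=0).astype(float)    # vertical projection
    h_center = (np.arange(h_proj.size) * h_proj).sum() / max(h_proj.sum(), 1)
    v_center = (np.arange(v_proj.size) * v_proj).sum() / max(v_proj.sum(), 1)
    n_objects = cv2.connectedComponents(binary_word)[0] - 1   # minus background
    hu = cv2.HuMoments(cv2.moments(binary_word)).flatten()    # 7 invariant moments
    return np.concatenate(([skew(h_proj), kurtosis(h_proj),
                            skew(v_proj), kurtosis(v_proj),
                            h_center, v_center, n_objects], hu))   # 14 values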
Fig. 2. Flowchart of NRSGA.
As a general rule, the proposed model needs to extract the best features that optimize classification results and highlight the discrepancy among different classes. Therefore, NRSGA is utilized to select the best features and reduce the dimensionality of the training dataset. The basic concept of NRSGA is to classify the population into fronts first and then to assign ranks to the solutions in a front. The ranking is assigned with respect to all the solutions of the population, and each front is reclassified with this ranking. The distinguishing features of this algorithm are: (1) reclassifying the solutions of the same front based on ranks; (2) successfully avoiding sharing parameters by ensuring diversity among trade-off solutions using the nearest-neighbor method [10]. Algorithm 1 illustrates the main steps of NRSGA, and Fig. 2 graphically depicts its main components [18, 19]. Herein, we adopt two conflicting objectives: minimizing the number of genes (features) used in classification while maintaining acceptable classification accuracy, expressed as the testing error. In general, utilizing NRSGA for feature selection ensures diversity with no extra niching parameter. Traditional elitism does not allow an already found Pareto-optimal solution to be deleted. As diversity is parameterized implicitly in the fitness criteria, the convergence property is not affected while ensuring diversity. To ensure diversity, distances between nearest neighbors are considered, which is fast and straightforward.
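The two objectives and the nearest-neighbor diversity measure can be sketched as follows; the encoding (a boolean mask over the fourteen features) and the stand-in classifier are our assumptions (the paper evaluates with ELM), and the full ranked-front machinery of NRSGA is omitted.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier  # stand-in classifier

def objectives(mask, X_tr, y_tr, X_te, y_te):
    """Two conflicting objectives, both minimized: feature count and test error."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return (len(mask), 1.0)          # degenerate empty mask
    clf = KNeighborsClassifier().fit(X_tr[:, cols], y_tr)
    error = 1.0 - clf.score(X_te[:, cols], y_te)
    return (cols.size, error)

def nn_diversity(front):
    """Distance of each solution to its nearest neighbor in objective space;
    larger distances mark more isolated solutions, rewarded for diversity."""
    F = np.asarray(front, dtype=float)
    D = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    return D.min(axis=1)
```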
3.5 Classification Using Extreme Learning Machine
Classification is the OCR system's decision-making process; it uses the features extracted in the previous stage. The classification algorithm is trained with the training dataset and then fed with the testing dataset to recognize the different classes (each class is a word).
Fig. 3. ELM’s structure.
Achieving a high recognition rate requires a powerful classification technique that outperforms contemporary techniques in terms of speed, simplicity, and recognition rate. The proposed model utilizes ELM, a fast and efficient learning algorithm, defined as a generalized Single hidden Layer Feedforward Network (SLFN). ELM techniques rest on two fundamentals [20]: universal approximation capability with a random hidden layer, and various learning techniques with easy and fast implementations. Figure 3 shows the structure of the ELM. ELM aims to break the barriers between conventional artificial learning techniques and biological learning mechanisms, and represents a suite of machine learning techniques in which hidden neurons need not be tuned. Compared with traditional neural networks and support vector machines, ELM offers significant advantages such as fast learning speed, ease of implementation, and minimal human intervention. Due to its remarkable generalization performance and implementation efficiency, ELM has been applied in various applications. See [18, 21, 22] for more details.
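The idea of an untuned random hidden layer with closed-form output weights can be captured in a few lines; this generic sketch follows the standard ELM formulation (random input weights, pseudo-inverse solution), not the authors' MATLAB code, and the hidden-layer size is an arbitrary choice.

```python
import numpy as np

class ELM:
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def fit(self, X, y):                  # y: one-hot matrix (samples, classes)
        n_in = X.shape[1]
        self.W = self.rng.normal(size=(n_in, self.n_hidden))  # random, never tuned
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)  # hidden-layer output
        self.beta = np.linalg.pinv(H) @ y # closed-form output weights
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return (H @ self.beta).argmax(axis=1)
```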
4 Experimental Results
In this section, the accuracy of the proposed model was tested, and the results were compared with the results of previous systems on the same benchmark databases. The testbed dataset contains more than 102 images and more than 50 testing samples from PATS-A01 and APTI [15, 17]. The experiments test only four of the eight fonts in the PATS-A01 database, namely Arial, Naskh, Simplified, and Tahoma. The individual text lines of the PATS-A01 database were segmented manually to separate
Fig. 4. Arabic words samples.
them into words. The training classes were 22 different Arabic words in different sizes, orientations, noise degrees, and fonts. Figure 4 shows samples of Arabic words. The experiments were conducted on a laptop with an AMD quad-core 2 GHz processor, 4 GB of DDR3 RAM, and the Windows 8.1 operating system. The code was written in MATLAB, using MATLAB R2011b. Several criteria were used in the evaluation of the suggested model: training time, defined as the time spent training the ELM; testing time, the time spent predicting all testing data; and training/testing accuracy, the root mean square of correct classification. The first set of experiments compared the identification accuracy of the proposed model, which employs NRSGA to determine the optimal features, with the traditional version of the model without optimization (i.e., using 14 features from second-order statistics). Results for the previous model that utilizes NSGA-II [11] were included in the table to verify the difference between NRSGA and NSGA-II in terms of recognition accuracy and computational cost (time for training and testing). A set of features is extracted from each word image, forming a feature vector for each word. Each feature vector is then classified individually using an extreme learning classifier. The results shown in Table 1 reveal that using the five optimal features [f4, f6, f7, f8, f10] extracted by NRSGA with the classifier yields a further identification-rate improvement of about 0.2% over the same method using the six optimal features [f4, f6, f7, f8, f12, f14] extracted by NSGA-II, for both PATS-A01 and APTI. But this slight improvement came at the expense of increased training time compared with the NSGA-II-based recognition model, because NRSGA has an extra added complexity for reclassifying the solutions of the same front based on ranks. One possible explanation of this result is that NRSGA maintains diversity among the solutions by dynamically controlling the crowding distance. NRSGA alleviates most of the difficulties of non-dominated sorting and sharing evolutionary algorithms. The basic NSGA-II performs even worse, since its complexity is O(MN²), where M is the number of objectives and N is the size of the dataset [9]. Though the difference is negligible for small populations, there is a marked difference for population sizes of 1000 and more. This is significant, since such problems frequently demand large population sizes to yield reasonable results.
Table 1. The identification accuracy rates for the suggested model using NRSGA or NSGA-II.

Dataset                 Method                     Accuracy (%)  Testing time (s, avg)  Training time (s, avg)
PATS-A01 (650 samples)  With NRSGA                 98.92         14                     115
                        With NSGA-II               98.69         16                     98
                        Without feature selection  97.04         30                     60
APTI (550 samples)      With NRSGA                 95.76         11                     96
                        With NSGA-II               95.37         14                     73
                        Without feature selection  97.04         25                     15
The performance improvement comes from the correct identification of the word image, because NRSGA can extract optimal (discriminative) features with the help of the objective fitness function that incorporates the recognition error. APTI contains images with a small Arial font, in contrast with PATS-A01, which includes images with a big Arial font; so the accuracy for APTI decreases compared with PATS-A01. As expected, using only six features on average for each sample decreases the time required for identification in the test phase as compared with fourteen features (on average a 56% reduction in time). For the training phase, the NSGA-II module consumes more time for feature selection, about a 63% increase in time. The second set of experiments was performed to show how the recognition rate of the suggested model depends on the number of image samples per word, because if a word has more enrolled samples, the chance of a correct hit increases. The maximum allowed number of word images is 50 per class (for both PATS-A01 and APTI), with different operations applied to the images, such as rotation, scaling, and noise. In Table 2, as expected, the recognition rate increases as the number of samples grows, due to the increased coverage of each word's image variability. The accuracy rate grows by nearly 3% for every additional 10 samples in the dataset.

Table 2. Relationship between accuracy rate and the number of samples.

Test set                No. of samples  Accuracy (%)  Testing time (s)
PATS-A01 (650 samples)  10              80.98         3.12
                        20              88.48         5.51
                        40              90.36         10.82
                        50              98.69         24.74
APTI (550 samples)      10              80.02         1.09
                        20              87.41         1.92
                        40              90.33         3.78
                        50              98.03         8.66
In general, increasing the number of samples within each class does not contribute largely to improving accuracy up to 60 samples, since the suggested model relies on extracting characteristic features from the pattern word image, which do not vary much with font type and style. Combining all samples to train the proposed model increases accuracy up to 99%, owing to NRSGA's performance in choosing the best features that represent the word image in general. This increase comes at the cost of the time taken to train the model; however, this time is negligible compared to the time consumed in the testing phase. In the training phase, the optimal feature selection module takes the most time. One possible justification for the reduction in accuracy for the APTI dataset is that it contains images with a small Arial font, and resizing degrades the image quality. Our model faces some limitations due to overlapping fonts, such as the Diwani and Thuluth fonts, which significantly affect accuracy compared to the other fonts.
5 Conclusions and Discussions
Arabic offline OCR for printed text is a very challenging and open area of research. This paper developed an Arabic OCR for printed words based on a combination of the ELM classifier and NRSGA. Initially, the model used a fourteen-feature dataset. After applying NRSGA, the dataset was reduced to five features; the data was then fed into the ELM network, which is a fast and simple single hidden layer feed-forward network. ELM avoids the local minimum traps and long training times of ordinary neural networks, and the hidden layer of SLFNs need not be tuned. Moreover, NRSGA helps select the most defining features, which decreased the dataset's complexity by 57% and significantly improved the performance. The model achieves a high recognition accuracy of 98.87% for different samples in a short time. Future work includes utilizing more complex ELM networks, such as optimal weight-learning machines, to achieve promising results; these should be tested on Latin and Indian OCR in addition to Arabic.
References
1. Lawgali, A.: A survey on Arabic character recognition. Int. J. Signal Process. Image Process. Pattern Recogn. 8(2), 401–426 (2015)
2. Lorigo, L., Govindaraju, V.: Offline Arabic handwriting recognition: a survey. J. Pattern Anal. Mach. Intell. 28(5), 712–724 (2006)
3. Jumari, K., Ali, M.: A survey and comparative evaluation of selected off-line Arabic handwritten character recognition systems. Malays. J. Comput. Sci. 36(1), 1–18 (2012)
4. Bouazizi, I., Bouriss, F., Salih-Alj, Y.: Arabic reading machine for visually impaired people using TTS and OCR. In: 4th International Conference on Intelligent Systems and Modelling, Thailand, pp. 225–229 (2013)
5. Mohamad, M., Nasien, D., Hassan, H., Haron, H.: A review on feature extraction and feature selection for handwritten character recognition. Int. J. Adv. Comput. Sci. Appl. 6(2), 204–213 (2015)
6. Ismail, S., Abdullah, S.: Geometrical-matrix feature extraction for on-line handwritten characters recognition. J. Theor. Appl. Inf. Technol. 49(1), 1–8 (2013)
7. Bhavsar, H., Ganatra, A.: A comparative study of training algorithms for supervised machine learning. Int. J. Soft Comput. Eng. 2(4), 2231–2307 (2012)
8. Hassan, O., Gamal, A., Abdel-khalek, S.: Genetic algorithm and numerical methods for solving linear and nonlinear system of equations: a comparative study. Intell. Fuzzy Syst. J. 38(3), 2867–2872 (2020)
9. Golchha, A., Qureshi, G.: Non-dominated sorting genetic algorithm-II – a succinct survey. Int. J. Comput. Sci. Inf. Technol. 6(1), 252–255 (2015)
10. D'Souza, R., Sekaran, C., Kandasamy, A.: Improved NSGA-II based on a novel ranking scheme. J. Comput. 2(2), 91–95 (2010)
11. Darwish, S., El Nagar, S.: Arabic offline character recognition using the extreme learning machine algorithm. Int. J. Digit. Content Technol. Appl. 11(4), 1–14 (2017)
12. Stolyarenko, A., Dershowitz, N.: OCR for Arabic using SIFT descriptors with online failure prediction. J. Imaging 3(1), 1–10 (2011)
13. Abandah, G., Malas, T.: Feature selection for recognizing handwritten Arabic letters. Eng. Sci. J. 37(2), 1–20 (2010)
14. Saabni, R.: Efficient recognition of machine printed Arabic text using partial segmentation and Hausdorff distance. In: IEEE Conference on Soft Computing and Pattern Recognition, Tunis, pp. 284–289 (2014)
15. Al Tameemi, A., Zheng, L., Khalifa, M.: Off-line Arabic words classification using multi-set features. Inf. Technol. J. 10(9), 1754–1760 (2011)
16. Neto, E., Gomes, S., Filho, P., Albuquerque, V.: Brazilian vehicle identification using a new embedded plate recognition system. Measur. J. 70(1), 36–46 (2015)
17. Slimane, F., Ingold, R., Kanoun, S., Alimi, A., Hennebert, J.: A new Arabic printed text image database and evaluation protocols. In: IEEE International Conference on Document Analysis and Recognition, Spain, pp. 946–950 (2009)
18. Murugavel, A., Ramakrishnan, S.: An optimized extreme learning machine for epileptic seizure detection. Int. J. Comput. Sci. 41(4), 1–10 (2014)
19. Deb, K.: Multi-objective optimization using evolutionary algorithms: an introduction. J. Multi-objective Evol. Optim. Prod. Des. Manuf. 1(1), 3–34 (2011)
20. Biesiada, J., Duch, W., Kachel, A., Maczka, K.: Feature ranking methods based on information entropy with Parzen windows. In: International Conference on Research in Electro Technology and Applied Informatics, Indonesia, pp. 1–10 (2005)
21. Huang, B., Wang, H., Lan, Y.: Extreme learning machines: a survey. Int. J. Mach. Learn. Cybernet. 2(2), 107–122 (2011)
22. Huang, G., Zhu, Q., Siew, C.: Extreme learning machine: a new learning scheme of feedforward neural networks. In: IEEE International Joint Conference on Neural Networks, USA, pp. 985–990 (2004)
Sentiment Analysis of Hotel Reviews Using Machine Learning Techniques
Sarah Anis, Sally Saad, and Mostafa Aref
Faculty of Computer and Information Sciences, Ain-Shams University, Cairo, Egypt
[email protected]
Abstract. Sentiment analysis is the task of identifying opinions expressed in any form of text. With the widespread usage of social media in our daily lives, social media websites became a vital and major source of data about user reviews in various fields. The domain of tourism has extended its activity online over the most recent decade. In this paper, an approach is introduced that automatically performs sentiment detection using the Fuzzy C-means clustering algorithm and classifies hotel reviews provided by customers on one of the leading travel sites. Hotel reviews have been analyzed using various techniques like Naïve Bayes, K-Nearest Neighbor, Support Vector Machine, Logistic Regression, and Random Forest. An ensemble learning model that combines the five classifiers was also proposed, and results were compared.
Keywords: Sentiment analysis · Sentiment detection · Sentiment classification · Machine learning · Tourism
1 Introduction
Nowadays, people generally prefer to communicate and socialize on the web. On travel sites, users express their opinions and write reviews of their hotel experience. Taking advantage of these huge volumes of data is of great value to tourism associations and organizations, which aim to increase profitability and enhance or maintain customer satisfaction. Customer reviews provided as text are considered either subjective or objective. Sentences with subjective expressions include opinions, beliefs, personal feelings, and views, while objective expressions include facts, evidence, and measurable observations [1]. Most sentiment analysis approaches apply sentiment detection first, which differentiates between objective and subjective reviews, and then determine the sentiment polarity of subjective reviews, whether positive or negative. The Fuzzy C-means clustering algorithm was applied for sentiment detection to classify sentences as subjective or objective. Objective sentences are filtered out, and only subjective, meaningful sentences are retained. We used five different machine learning classifiers for sentiment classification, namely Naïve Bayes, K-Nearest Neighbor, Support Vector Machine, Logistic Regression, and Random Forest. Then we applied an ensemble learning model over the five classifiers to achieve better results. This paper is organized as follows. First, a summary of some related work is found in Sect. 2. Section 3 introduces the proposed
system and discusses the techniques used throughout this paper. Section 4 shows the final results and compares the performance of various techniques used. Finally, Sect. 5 presents the conclusion.
2 Related Work
The main goal of sentiment analysis is to identify the polarity of text, whether it is positive or negative. Recently, data available online related to tourism has been increasing exponentially [2]. Sentiment analysis can effectively aid decision making in tourism by improving the understanding of the tourist experience [3]. Many machine learning methods are used for sentiment classification of hotel reviews. Machine learning methods are categorized into supervised, semi-supervised, and unsupervised approaches. Neha Nandal et al. [4] utilized the Support Vector Machine to classify Amazon customer reviews, where aspect terms are identified first for each review before a polarity score is assigned to each review. Their outcomes showed that among the three kernels of the Support Vector Machine, the Radial Basis Function (RBF) kernel provided the best result. Vishal S. et al. [5] performed sentence-level sentiment analysis. They mainly focused on negation identification in online news articles. They used machine learning algorithms like Support Vector Machine and Naïve Bayes, and achieved accuracies of 96.46% and 94.16% for Support Vector Machine and Naïve Bayes, respectively. Aurangzeb Khan et al. [6] proposed a sentence-level sentiment analysis approach. They extracted subjective sentences and labeled them positive or negative based on their word-level features using the Naïve Bayes classifier; then they applied the Support Vector Machine for sentiment classification. Their proposed method achieves an average accuracy of 83%. Vishal Kharde et al. [7] showed that machine learning methods such as Support Vector Machine and Naïve Bayes have the highest accuracy in sentiment classification of Twitter data. Xia et al. [8] used an ensemble framework for sentiment classification that combined various feature sets, namely part-of-speech information and word relations, with three machine learning techniques: Naïve Bayes, Maximum Entropy, and Support Vector Machine. They found that Naïve Bayes worked better on feature sets of smaller size; by contrast, Maximum Entropy and Support Vector Machine were more effective for high-dimensional feature sets. They evaluated different ensemble approaches, such as the fixed combination, weighted combination, and meta-classifier combination, to improve the accuracy. Rehab M. Duwairi et al. [9] performed sentiment analysis on Arabic reviews using machine learning methods. Three classifiers were used: Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor. Their experimental results showed that the Support Vector Machine achieved the highest precision, while K-Nearest Neighbor achieved the highest recall. Anjuman Prabhat et al. [10] performed sentiment classification on Twitter reviews. They used Naïve Bayes and Logistic Regression for the classification of reviews; the results showed that Logistic Regression achieved better accuracy and precision than Naïve Bayes. Bhavitha et al. [11] compared the performance of Random Forest and Support Vector Machine on sentiment analysis; Random Forest obtained better accuracy but requires higher processing and training
time. Gayatri Khanvilkar et al. [12] employed both Random Forest and Support Vector Machine for sentiment classification. They claim that their proposed system helps improve sentiment analysis for product recommendation using multi-class classification. One of the main tasks in sentiment analysis is sentiment detection. Sentiment detection, or subjectivity detection, is the task of identifying subjective text. Samir Rustamov et al. [13] used a Fuzzy Control system and an Adaptive Neuro-Fuzzy Inference System for sentence-level subjectivity detection in movie reviews. They used informative features that improve the accuracy of the systems with no language-specific requirements. Iti Chaturvedi et al. [14] used both Bayesian networks and fuzzy recurrent neural networks for subjectivity detection: Bayesian networks are used to capture dependencies in high-dimensional data, and fuzzy recurrent neural networks are then used to model temporal features. They claim that their proposed model can deal with standard subjectivity detection problems and also proved its portability across languages.
3 Proposed System
Initially, the dataset used contains 38,932 labeled hotel reviews from the Kaggle website. This dataset provides reviews of a single hotel. Some pre-processing was done to clean and prepare the data for sentiment analysis (Fig. 1). Pre-processing of text involves eliminating irrelevant content from the text [15]. Online texts usually contain lots of noise and uninformative parts, like typos, bad grammar, URLs, stop words, and expressions. The main goal of data pre-processing is to reduce the noise in the text, which should help improve the performance of the classifier and speed up the classification process. One of the steps of pre-processing is the removal of stop words. Stop words such as "the", "a", "an", and "in" consume processing time without adding value, so they should be removed. Removal of punctuation is another important step in the process of data cleaning; for example, ".", ",", and "?" are important punctuation marks that should be retained, while others need to be removed. Moreover, to avoid word-sense ambiguity, an apostrophe lookup is required to convert any apostrophe into standard text. For example, "it's a very nice place" should be transformed into "it is a very nice place". Sometimes words are not in proper formats, so standardizing words is another important step in data cleaning. For feature extraction, word embedding was used to represent the words in each sentence of a review. Word embedding is a technique in which words or expressions are mapped to vectors of real numbers [16]. The word2vec technique [17] was employed for feature extraction. The corpus used in training the word2vec model was built from all the words available in the dataset. Word2vec is one of the most popular methods that use shallow neural networks. Word2vec can extract deep semantic features between words [17]; it computes continuous vector representations of words. The computed word vectors retain a huge amount of the syntactic and semantic regularities present in the language [18] and transform them into relation offsets in the resulting vector space. The number of features extracted for each word is 100.
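A minimal sketch of this feature-extraction step with gensim is shown below; `cleaned_reviews` stands for the pre-processed review texts from the previous step, and the tokenization shown is an assumption.

```python
from gensim.models import Word2Vec
import numpy as np

# corpus built from all words available in the dataset
tokenized_reviews = [review.lower().split() for review in cleaned_reviews]

# 100 features per word, as in the paper
w2v = Word2Vec(sentences=tokenized_reviews, vector_size=100, min_count=1)

def review_vector(tokens, model):
    """Sum of the word vectors of all words in the review."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.sum(vecs, axis=0) if vecs else np.zeros(model.vector_size)

X = np.vstack([review_vector(t, w2v) for t in tokenized_reviews])
# X is then normalized before subjectivity detection
```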
Fig. 1. The proposed system
Each review is composed of several sentences. After tokenizing sentences into words, the summation of all word vectors in the review is computed. Normalizing the features was a prior step to subjectivity detection. For subjectivity detection, the Fuzzy C-means clustering algorithm was used to classify data into positive, negative, and objective. The Fuzzy C-means clustering algorithm is very similar to the K-means algorithm, but instead of using hard clustering, where each data point can belong to only one cluster, Fuzzy C-means performs soft clustering, in which each data point can belong to multiple clusters to a certain degree by means of a membership function. For example, data points that are close to the center of a cluster will have a high degree of membership for that cluster, while data points that are far from the center of that cluster will have a low degree of membership. After clustering, objective sentences are discarded, and only sentences classified as positive or negative are retained for further classification. Five different machine learning techniques have been used to build our sentiment classification model: Naïve Bayes, K-Nearest Neighbor, Support Vector Machine, Logistic Regression, and Random Forest. An ensemble learning model combining the five classifiers was also applied to enhance the accuracy.
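For illustration, a from-scratch Fuzzy C-means routine over the normalized review vectors might look as follows, with c = 3 clusters (positive, negative, objective); the fuzzifier m = 2 and the random initialization are our assumptions.

```python
import numpy as np

def fuzzy_cmeans(X, c=3, m=2.0, n_iter=100, eps=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # distance of every point to every cluster center
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1) + 1e-12
        # standard FCM membership update: u_ik proportional to d_ik^(-2/(m-1))
        U_new = 1.0 / (dist ** (2 / (m - 1)))
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < eps:
            U = U_new
            break
        U = U_new
    return centers, U   # U[i, k] = degree to which review i belongs to cluster k
```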
3.1 Machine Learning Methods
Naïve Bayes is one of the simplest models; it is based on the Bayes theorem. It works efficiently with large datasets, as it is fast and accurate with a very low computational cost [19]. The Naïve Bayes classifier assumes that all the features are
independent [20], but this assumption is not always true in real-life situations. However, Naïve Bayes often works well in practice. Given a predictor X, P(C|X) is the posterior probability of class C, P(X|C) is the probability of X given the class, and P(C) and P(X) are the prior probabilities of C and X.

P(C|X) = P(X|C) P(C) / P(X)    (1)
K-Nearest Neighbor (KNN) is also a very simple and easy-to-implement method [20]. The KNN algorithm assumes that similar data points exist close to each other. The most commonly used method for calculating the distance between two points is the standard Euclidean distance; however, there are other ways of calculating distance, and the choice depends on the problem. The input consists of the K closest training samples in the feature space. To choose the right value of K, we run the KNN algorithm several times with different values of K and choose the K that returns the highest accuracy. Nearest-neighbor methods have been successful in many classification and regression problems, but the main disadvantage of KNN is that it becomes significantly slower as the volume of data increases. The Support Vector Machine (SVM) is one of the most effective and famous classification machine learning methods. SVM was developed from statistical learning theory [19]. Moreover, it is memory-efficient, as it uses a subset of training points in the prediction process. SVM works well with a clear margin of separation and with high-dimensional data. On the other hand, it works poorly with overlapping classes and is also sensitive to the type of kernel used. Logistic Regression is a statistical method that is used for binary classification. Logistic Regression measures the relationship between the target variable and the features by estimating probabilities using its underlying sigmoid function. The sigmoid function is an S-shaped curve that can take any real-valued number and map it into a value between 0 and 1; these values are then transformed into either 0 or 1 using a threshold classifier. Logistic Regression is well known for its efficiency: it does not require too many computational resources, it is highly interpretable, it does not require input features to be scaled, it does not require much tuning, and it is easy to implement. A disadvantage is that it cannot solve non-linear problems, since its decision surface is linear, and it is also vulnerable to overfitting. Random Forest consists of a large number of individual decision trees that operate as an ensemble. Each tree in the Random Forest returns a class prediction, and the Random Forest makes its decision based on the majority of votes. The ensemble learning technique of this algorithm reduces the overfitting problem of decision trees and improves the overall accuracy [21]. It is very stable and can handle missing values and non-linear parameters efficiently. Random Forest is also comparatively less impacted by noise. Its disadvantages are the long training time and high complexity, as it generates a lot of trees, which requires more computational power and resources than a simple decision tree.
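The five classifiers can be instantiated directly with scikit-learn, as in the hedged sketch below; `X_train`, `y_train`, `X_test`, and `y_test` are the normalized review vectors and labels assumed from the previous steps, and the hyper-parameters shown (including K = 17, reported later in the paper) are illustrative rather than the authors' exact settings.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

models = {
    "Naive Bayes": GaussianNB(),
    "K-Nearest Neighbor": KNeighborsClassifier(n_neighbors=17),
    "Support Vector Machine": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100),
}

for name, clf in models.items():
    clf.fit(X_train, y_train)                 # train on subjective reviews
    print(name, clf.score(X_test, y_test))    # held-out accuracy
```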
4 Results and Analysis
The dataset of hotel reviews from the Kaggle website contains 26,521 positive reviews and 12,411 negative reviews. After subjectivity detection, the total number of subjective reviews is 37,827. Sentiment classification performance was evaluated using precision, accuracy, recall, and F-score measures. Precision is the ratio of true positives to the total number of predicted positives, while recall is the ratio of true positives to the total number of true positives and false negatives. True positives (TP) are positive data points that are correctly classified as positive by the model; false negatives (FN) are positive data points that are misclassified as negative by the model; and false positives (FP) are negative data points misclassified as positive. F-score is the weighted average of precision and recall, taking both false positives and false negatives into account. According to those measures, a comparison between the applied machine learning methods was performed, and the results are presented in Table 1.

Precision = TP / (TP + FP)    (2)

Recall = TP / (TP + FN)    (3)

F-score = 2 (Recall × Precision) / (Recall + Precision)    (4)
Table 1. Evaluation of methods performance.

Method                  Accuracy  Precision  Recall  F-score
Naïve Bayes             0.718     0.727      0.957   0.655
K-Nearest Neighbor      0.838     0.836      0.956   0.827
Support Vector Machine  0.863     0.875      0.938   0.859
Logistic Regression     0.859     0.867      0.942   0.854
Random Forest           0.846     0.848      0.95    0.838

Fig. 2. Accuracy of various techniques
The results obtained show that the Support Vector Machine and Logistic Regression techniques achieved better accuracy than the other techniques. For K-Nearest Neighbor, there is no ideal value for the initial number of neighbors K; it is selected after testing and evaluation. The best results were achieved when K = 17. The Naïve Bayes classifier had the lowest accuracy compared to the other techniques (Fig. 2). One of the problems that could cause errors in the classification process is that a single review can have different or mixed opinions on different aspects, which makes the task of assigning an overall polarity to the review hard. To achieve better results, an ensemble model of all classifiers was taken into consideration. An ensemble learning model is commonly known to outperform single classifiers: ensemble models provide higher consistency and reduce errors. The results of the ensemble classifier are shown in Table 2. However, the ensemble model did not improve the accuracy in our case compared to the accuracy achieved by the Support Vector Machine.
Table 2. Evaluation of ensemble model.

Method                   Accuracy  Precision  Recall  F-score
Ensemble learning model  0.862     0.872      0.94    0.858
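One plausible realization of the ensemble is majority (hard) voting over the five fitted models from the earlier sketch; the paper does not state the exact combination rule, so this is an assumption.

```python
from sklearn.ensemble import VotingClassifier

# combine the five classifiers defined earlier via majority voting
ensemble = VotingClassifier(estimators=list(models.items()), voting="hard")
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))
```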
5 Conclusion
In this research, different techniques were investigated to classify hotel reviews. The algorithms applied were Naïve Bayes, K-Nearest Neighbor, Support Vector Machine, Random Forest, and Logistic Regression. The best results were obtained by the Support Vector Machine; on the other hand, the Naïve Bayes classifier had the lowest accuracy. The Support Vector Machine achieved 86.3% accuracy, Logistic Regression achieved 85.9%, and the Random Forest classifier's accuracy was 84.6%, while the K-Nearest Neighbor and Naïve Bayes classifier accuracies were 83.8% and 71.8%, respectively. An ensemble learning model was also created, combining those five classifiers to increase the accuracy. But after testing the model, the ensemble learning model achieved 86.2% accuracy, which means that the accuracy did not improve, and the Support Vector Machine is more efficient in this case.
References 1. Feldman, R.: Techniques and applications for sentiment analysis. Commun. ACM 56(4), 82–89 (2013) 2. Alaei, A., Becken, S., Stantic, B.: Sentiment analysis in tourism: capitalising on big data. J. Travel Res. 58(9), 175–191 (2017) 3. Valdivia, A., Luzón, M.V., Herrera, F.: Sentiment analysis in trip advisor. IEEE Intell. Syst. 32(4), 72–77 (2017)
4. Nandal, N., Tanwar, R., Pruthi, J.: Machine learning based aspect level sentiment analysis for Amazon products. Spat. Inf. Res. 1–7 (2020) 5. Shirsat, V.S., Jagdale, R.S., Deshmukh, S.N.: Sentence level sentiment identification and calculation from news articles using machine learning techniques. In: Iyer, B., Nalbalwar, S., Pathak, Nagendra Prasad (eds.) Computing, Communication and Signal Processing. Advances in Intelligent Systems and Computing, vol. 810, pp. 371–376. Springer, Singapore (2019) 6. Khan, A., Baharudin, B.B., Khairullah, K.: Sentence based sentiment classification from online customer reviews. In: 8th International Conference on Frontiers of Information Technology, Pakistan, Article no. 25, pp. 1–6 (2010) 7. Kharde, V.A., Sonawane, S.S.: Sentiment analysis of twitter data: a survey of techniques. Int. J. Comput. Appl. 139(11), 0975–8887 (2016) 8. Xia, R., Zong, C., Li, S.: Ensemble of feature sets and classification algorithms for sentiment classification. Inf. Sci. Int. J. 181(6), 1138–1152 (2011) 9. Duwairi, R.M., Qarqaz, I.: Arabic sentiment analysis using supervised classification. In: 2014 International Conference on Future Internet of Things and Cloud, Barcelona. pp. 579– 583. IEEE (2014) 10. Prabhat, A., Khullar, V.: Sentiment classification on big data using Naïve Bayes and logistic regression. In: 2017 International Conference on Computer Communication and Informatics (ICCCI 2017), Coimbatore, India, pp. 1–5. IEEE (2017) 11. Bhavitha, B.K., Rodrigues, A.P., Niranjan, N.C.: Comparative study of machine learning techniques in sentimental analysis. In: 2017 International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 216–221. IEEE (2017) 12. Khanvilkar, G., Vora, D.: Sentiment analysis for product recommendation using random forest. Int. J. Eng. Technol. 7(33), 87–89 (2018) 13. Rustamov, S., Clements, M.A.: Sentence-level subjectivity detection using neuro-fuzzy models. In: Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Atlanta, Georgia, pp. 108–114 (2013) 14. Chaturvedi, I., Ragusa, E., Gastaldo, P., Zunino, R., Cambria, E.: Bayesian network based extreme learning machine for subjectivity detection. J. Franklin Inst. 335(4), 1780–1797 (2017) 15. Haddi, E., Liu, X., Shi, Y.: The role of text pre-processing in sentiment analysis. Procedia Comput. Sci. 17, 26–32 (2013) 16. Ray, P., Chakrabarti, A.: A mixed approach of deep learning method and rule-based method to improve aspect level sentiment analysis. Appl. Comput. Inform. (2019). https://doi.org/10. 1016/j.aci.2019.02.002 17. Zhang, D., Xu, H., Su, Z., Xu, Y.: Chinese comments sentiment classification based on word2vec and SVMperf. Expert Syst. Appl. 42(4), 1857–1863 (2015) 18. Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Georgia, pp. 746–751 (2013) 19. Sahayak, V., Shete, V., Pathan, A.: Sentiment analysis of Twitter data. Int. J. Innovative Res. Adv. Eng. (IJIRAE) 2(1), 30–38 (2015) 20. Baid, P., Gupta, A., Chaplot, N.: Sentiment analysis of movie reviews using machine learning techniques. Int. J. Comput. Appl. 179(7), 45–49 (2017) 21. Dimitriadis, S.I., Liparas, D.: How random is the random forest? RF algorithm on the service of structural imaging biomarkers for AD: from ADNI database. Neural Regeneration Res. 
13(6), 962–970 (2018)
Blockchain and Cyber Physical System
Transparent Blockchain-Based Voting System: Guide to Massive Deployments
Aicha Fatrah1, Said El Kafhali1, Khaled Salah2, and Abdelkrim Haqiq1
1 Computer, Networks, Mobility and Modeling Laboratory: IR2M, Faculty of Sciences and Techniques, Hassan First University of Settat, 26000 Settat, Morocco {a.fatrah,said.elkafhali,abdelkrim.haqiq}@uhp.ac.ma
2 Electrical and Computer Engineering Department, Khalifa University of Science and Technology, Abu Dhabi, UAE [email protected]
Abstract. With the development of society and its people's democratic consciousness, voting, as a crucial channel of democracy, has to satisfy the high expectations of modern society; new technologies and techniques must be deployed for a better voting experience and to replace cost-inefficient and laborious traditional paper voting. Blockchain technology is currently disrupting every industry where security and data integrity are prioritized. In this paper, we leveraged this technology to propose a design and implementation of a blockchain-based electronic voting system for large-scale elections. The novelty of this paper, when compared to other state-of-the-art blockchain-based voting systems, is that it respects voters' privacy with full transparency for auditing and user-friendly terminals, which will boost people's confidence in the voting system and therefore increase the number of participants in the election.
Keywords: Blockchain · Smart contracts · Electronic voting · Zero-knowledge proof

1 Introduction
The democratic regime relies on elections to enable people to formally participate in decision making. Currently, there exist two major means of voting: paper-based voting and electronic voting. The paper voting system comes with some pros, mainly the ease of use, even for illiterate people, and secrecy, since the ballot is not bound to the voter in any way. But when examining the workflow of this system, a lot of issues can be detected, including integrity issues: since the system is run by people, its integrity relies on the trustworthiness of these people. Thus the system is vulnerable to corruption and human errors. There are also accessibility issues, since the location of the polling stations can be a struggle for people in remote and rural areas, people with disabilities, or citizens who might not be in the country.
Electronic voting is an online voting system that uses cryptographic techniques to ensure anonymity and security. Voters can use their electronic devices to cast their vote. Election results are automatically counted by the system. The entire voting workflow is interconnected, so compared to the traditional paper-based system it is more convenient for voters, efficient in terms of organization, and fast. But as promising as it seems, electronic voting also has the following disadvantages:
– Privacy issues: the voter's private information or even the vote choice can be leaked.
– Security issues: attackers can eavesdrop on transmitted data or even take control over the system and tamper with the result.
– Integrity issues: the centralized system is a black box; voters cannot verify whether their vote was counted or not. The centralization also exposes the system to Distributed Denial-of-Service attacks.
In the domain of electronic voting, Blockchain can be the missing puzzle piece for better e-voting. It can be used as a transparent, auditable, and unalterable database, which will effectively reduce the risks and enhance the performance of the overall voting system. This paper is an extension of our initial proof-of-concept version [1]. We made changes to the previous protocol: we removed the need for a token as a requirement to cast the vote, which reduced the complexity and cost of the overall system, and we changed the use of the zero-knowledge proof from proving the validity of the vote to proving that voters are eligible. In this system the validity of the vote is enforced by the voting contract, which only accepts valid choices from the list of predetermined candidates. The rest of the paper is organized as follows: Sect. 2 presents an overview of the electronic voting state of the art for both centralized and decentralized solutions. The key concepts of our blockchain-based voting system are explained in Sect. 3. Section 4 illustrates the general design of our scheme. Section 5 presents the implementation of our proposed system and the technological stack used. The evaluation of our system is presented in Sect. 6. Finally, Sect. 7 includes concluding remarks and future work.
2 Electronic Voting
Electronic voting evolved and started to replace traditional voting methods thanks to both the development of Internet technology and the progress made in modern cryptography. There exists a list of important requirements for a large-scale electronic voting system to be effective:
– Integrity: only eligible voters can participate, each eligible voter can only vote once, and votes cannot be altered or deleted from the system.
– Accessibility and availability: voters can remotely access the system to participate regardless of their physical location, at any time during the entire electoral period.
– Privacy: the voter's choice should always remain anonymous during the election and post-election periods.
– Transparency: the entire system should be auditable by the public, and voters can verify that their votes were cast and tallied.
– Security: the system should be immune to typical cyber-attacks.
– Affordability: the system has to be affordable to implement and maintain by the government; it should also be inexpensive.
– Scalability: the system can support large-scale elections.
2.1 Centralized Electronic Voting
General electronic voting systems are all based on cryptosystems developed by cryptographers. David Chaum [2] proposed the first electronic voting scheme, based on blind signatures, in 1981; the purpose was to hide the voter's identity by using public-key cryptography and to disconnect the voter from the corresponding ballot. Later on, more cryptographic protocols were proposed to improve electronic voting [3]. There exist electronic voting schemes based on blind signatures and ring signatures, electronic voting schemes based on homomorphic encryption, and electronic voting schemes based on hybrid networks. These schemes depend heavily on a trusted third party to decrypt and count votes, which puts the voters' privacy at risk of exposure. These systems are also not auditable and put full trust in the authorities managing the election.
2.2 Blockchain-Based Electronic Voting
The advent of Blockchain technology is expected to disrupt both modern electronic voting and traditional paper voting systems. Blockchain is the backbone of cryptocurrencies such as Bitcoin and Ether; it is a distributed, immutable, and transparent ledger on a peer-to-peer network. Its consensus algorithms, like proof of work in Bitcoin, solve the inconsistency problem in distributed systems. That is why the application of Blockchain in e-voting has gained the attention of researchers and even startups like Agora and Follow My Vote. The first proposition, by Zhao and Chan [4] in 2015, was based on the Bitcoin blockchain with a penalty and reward system for voters. Other schemes were later introduced, like Lee, James, Ejeta and Kim [5] in 2016 and Cruz et al. [6] in 2017, but these two schemes depend heavily on a trusted third party to manage the system. McCorry et al. [7] used the Ethereum blockchain, which added more business logic thanks to smart contracts; the voters' privacy is ensured with the use of zero-knowledge proofs, but unfortunately the voting scheme is only binary, which means voters can only vote yes or no, and it does not support multiple candidates. In the same year, researchers started experimenting with the Zcash blockchain because it offers anonymity of transactions; it is a blockchain based on zero-knowledge proofs and can be used to protect voters' privacy. P. Tarasov
and H. Tewari [8] proposed the first Zcash-based electronic voting system; although the system provides anonymity, it still lacks the logic capability offered by smart contracts in Ethereum.
3 Key Concepts of Blockchain-Based Voting System
3.1 Blockchain
In 2008, the Bitcoin whitepaper, "Bitcoin: A Peer-to-Peer Electronic Cash System" [9], was first published. The Bitcoin whitepaper, or as some might call it, the Satoshi whitepaper, described a decentralized peer-to-peer network that solves the double-spending problem of digital currencies without relying on a central authority. Transactions are verified by nodes in the network called miners, who continuously listen to new transactions coming into the network and race to solve a hard mathematical problem generated by the system. The work put into solving the problem requires immense computing power; the first to solve the problem gets the privilege of adding a new block to the blockchain and receives a bitcoin reward for the proof-of-work.
3.2 Ethereum Blockchain and Smart Contracts
Ethereum is a public blockchain-based platform that allows building decentralized applications [10]; it runs a modified version of the Satoshi Nakamoto consensus via transaction-based state transitions. It supports smart contract functionality, which adds a business logic layer: a smart contract is first written in a programming language such as Solidity and then compiled into bytecode that can be deployed on the Ethereum network and executed by the Ethereum Virtual Machine. Ethereum has two types of accounts. The first type is the Externally Owned Account (EOA), controlled by the user; the second type is the contract account, that is, a contract run by its code. Both types of accounts are represented by a 20-byte (160-bit) address, and both can store Ether (the Ethereum-generated token/cryptocurrency). Transactions between different accounts carry a gas cost, or fee, to encourage miners to include transactions or the bytecode of a smart contract in the Ethereum Blockchain. Gas is a metric aiming to standardize the fees of operations inside the network. There are three types of transactions inside the Ethereum Blockchain: first, fund transfers between externally owned accounts; second, the deployment of a contract into the network; and third, the execution of an already deployed one.
3.3 Zero-Knowledge Proof
The Zero Knowledge Proof (ZKP) was first introduced by Goldwasser, Micali and Rackoff back in 1989 [11]. It uses cryptographic primitives that allow proving that a statement is true about some secret data without actually revealing any other information about the secret beyond that statement. There exist two types of ZKP; interactive and non-interactive. Interactive ZKP (IZKP) requires the
prover and the verifier to interact continually to prove a statement, while the non-interactive ZKP (NIZKP) allows verifying a statement without having the two parties online, which makes NIZKP faster and more efficient. In our voting system we need a ZKP that a voter is permitted to vote, without revealing their identity or the choice they made. A Zero-Knowledge Set Membership Proof (ZKSMP) considers the problem of proving that a committed value belongs to some discrete set. More specifically, for our voting system, it is the problem of proving that a voter is within the list of eligible voters, so the voter can cast their vote anonymously. Zero-knowledge succinct non-interactive arguments of knowledge, or zkSNARKs [12], allow verifying the correctness of a computation without executing it; the content of the computation does not need to be known, only the ability to verify that the computation was executed correctly. "Succinct" in zkSNARKs indicates that the size of the proofs is very small even for complex computations, which is ideal for making the blockchain network more efficient. The use of zkSNARKs has great potential when combined with smart contracts: it increases privacy, efficiency, and scalability; any information can be verified without having to reveal it, with only a one-way short interaction between prover and verifier; and it can also address major scalability issues facing blockchains by moving complex calculations off-chain. In this paper we implemented zkSNARKs for our zero-knowledge proof verification, since they are fast and lightweight, using the ZoKrates [13] toolbox.
4 System Design and Overview
4.1 System Design
The system architecture, as presented in Fig. 1, has both on-chain and off-chain components; the main role of the off-chain components is to reduce storage and computation costs inside the Ethereum Virtual Machine. The users, who can be voters or the election administration, interact with the e-voting distributed application via a web interface developed in React, and via MetaMask (web3.js) to interact with the blockchain. The database is used to store the lists of eligible voters and candidates and is used to generate the smart contracts.

Fig. 1. System general architecture.
4.2 Workflow and Roles
The three main phases in our system (Fig. 2) are:
– Pre-voting phase: in this phase the voter needs to provide proof of identity to the election admin (the KYC step); we suppose that this phase is carried out via some KYC application and is secure enough to protect the voter's personal data. The administration has to verify the voters' identities to see if they are eligible to vote. If the voter's proof of identity is valid, the admin provides the voter with a secret phrase; this phrase has to be stored and secured because it is going to be used as a proof of knowledge allowing the voter to cast their vote. When the registration phase is over, the admin collects all the hash values
of the eligible voters to create the arithmetic circuit, with Zokrates, for the verify.sol smart contract. The admin also has to create the list of candidates for the election.sol contract.
– Voting phase: in this phase the voter has to provide the proof, and when the proof is validated via verify.sol, the vote can be cast. The voter has to change their Ethereum address at this stage to hide their identity; this is the only phase in which the voter has to use a different address, so that even the admin cannot know their vote choice. After casting the vote, voters get the id of the transaction, which allows them to audit their vote and make sure it was added to a block. The Rinkeby testnet allows exploration of the blocks via its interface [14].
– Post-voting phase: election.sol automatically tallies the votes and returns the election results.
Fig. 2. System workflow and roles.
5 Implementation
5.1 Stack of Technologies
To implement a proof of concept of our system, we used the following stack of technologies and tools: Truffle [15] to handle smart contracts and test them locally; Remix [17], the online IDE for the Solidity programming language; MetaMask [16], a wallet gateway that communicates with the blockchain via web3.js; the Rinkeby testnet [14] to test contracts in an Ethereum-like environment; Ganache [15], a local Ethereum network for deploying contracts, which comes with free test accounts holding fake Ether; and Zokrates [13], an Ethereum toolkit for zkSNARKs.
5.2 Generating the Proof
There exist three parts in a zkSNARK: a generator G, a prover P, and a verifier V. A third party has to run the generator G, giving a program c and a random number r as input parameters. The output is the proving key pk and the verification key vk:

(pk, vk) = G(c, r)    (1)
The generator then shares the proving key and verification key with the prover and verifier. The prover generates the proof by giving a publicly available input x, a witness w, and the proving key as input, which is expressed as:

prf = P(pk, x, w)    (2)
The proof is sent to the verifier, who can verify it using vk, x, and the prf, obtaining true as output if the proof is valid:

V(vk, x, prf) → true/false    (3)
So in our case, the prover is the voter and the verifier is the smart contract. The voter proves knowledge of the secret phrase of one of the voters without revealing to the smart contract any information about which voter it is. To generate the zero-knowledge set membership proof using zkSNARKs, we first generate the secret phrase; we used the randomstring JS library via the command line to generate an example of 64 alphabetic characters. We then compute the SHA-256 of the random phrase by first converting the phrase into a 512-bit binary string and dividing it into four parts of 128 bits each, so that it can be an input to Zokrates, whose maximum input length is 254 bits. Each part is converted into a base-10 decimal; the output is a big-integer representation of the secret phrase. The sha256packed Zokrates function, imported from 512bitPacked.zok, takes an array of four field elements as input and returns an array of two field elements, each of 128 bits. The voters need to prove that they know the preimage of the hash without revealing the secret phrase. The arithmetic circuit is created in the Zokrates high-level language; the circuit is built using the hash values as an alternative to OR-gate logic, which does not exist in Zokrates. For more voters, the same circuit is used by passing an array of voters' hashes.
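The packing described above can be illustrated (in Python rather than the randomstring JS tool the authors used) as follows; the phrase generator and the exact byte ordering are our assumptions.

```python
import hashlib
import secrets
import string

# a 64-character alphabetic secret phrase, i.e. 512 bits of ASCII
phrase = "".join(secrets.choice(string.ascii_letters) for _ in range(64))

raw = phrase.encode("ascii")                        # 64 bytes = 512 bits
# four 128-bit big integers: the private inputs to sha256packed
parts = [int.from_bytes(raw[i:i + 16], "big") for i in range(0, 64, 16)]

digest = hashlib.sha256(raw).digest()               # 32 bytes = 256 bits
h0 = int.from_bytes(digest[:16], "big")             # first 128-bit field element
h1 = int.from_bytes(digest[16:], "big")             # second 128-bit field element

print("private inputs:", parts)
print("public hash:   ", h0, h1)
```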
5.3 Smart Contracts
The candidate.sol contract stores candidate details; it has addCandidate, getNumberOfCandidates, and getCandidate functions (Fig. 3).
Fig. 3. Candidate.sol
The Election contract is responsible for the voting action, validating candidates, and finally tallying the total votes (Fig. 4).

Fig. 4. Election.sol
6 Evaluation
6.1 Gas Consumption
Smart contracts use the concept of gas, which is the fuel of the system; the sender of a transaction needs to pay a fee in Wei (i.e., Eth) for the computational work executed by the invoked smart contract. The more complex the computation executed by the smart contract, the more gas is needed. Setting a gas limit is a way to protect the system from code with infinite loops. The product of gasPrice and gas represents the maximum amount of Wei needed for the execution of a transaction, and it is used by the miners to prioritize transactions for inclusion into the Blockchain. We kept some components off-chain in order to reduce the gas consumption.

Table 1. Gas consumption by voters.

Transaction name    Gas      Cost (Ether)
Proof verification  1625259  0.001625259
Vote                28108    0.000028108
Total (T1)                   0.001653367

Table 2. Gas consumption by Admin.

Transaction name        Gas      Cost (Ether)
Add candidate list      39110    0.00003911
Sha256 contract deploy  1903793  0.001903793
Total (T2)                       0.001942903
Table 1 and Table 2 show an estimation of the gas consumption of the main transactions inside our system. The transactions with the highest consumption are the proof verification transaction, executed by voters, and the Sha256 contract deployment, executed by the Admin. The transactions made by the Admin are executed only once, in the pre-voting phase, while the transactions made by the voters are executed once per voter. The overall cost of an election with n voters can therefore be calculated as

n × T1 + T2    (4)

The total cost is still relatively expensive for a large-scale election, but the code provided is just a proof of concept and has to be further optimized before use in a real-life election.
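Plugging the totals from Tables 1 and 2 into Eq. (4) gives a quick cost estimate; the snippet below is only a back-of-the-envelope calculation based on the measured figures.

```python
# Ether costs taken from Tables 1 and 2 above
T1 = 0.001625259 + 0.000028108   # per-voter cost: proof verification + vote
T2 = 0.00003911 + 0.001903793    # one-off admin cost: candidate list + deploy

def election_cost(n):
    """Total Ether cost of an election with n voters, per Eq. (4)."""
    return n * T1 + T2

print(election_cost(1_000_000))  # roughly 1653.37 Ether for a million voters
```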
7 Conclusion
Despite the fact that the success of Blockchain technology has become conspicuous, a lot of people still do not fully understand it; this limits its ability to emerge and be exploited in fields other than cryptocurrencies. Another problem facing Blockchain technology is the binding of the digital and physical identities of its users: the technology cannot manage identities outside the Blockchain,
which calls for a third party to do the work of identity management. In this paper we presented a design and implementation of a blockchain-based electronic voting system; the main aim is to bring more transparency into the electoral system, protect voters' privacy, and allow anyone to audit the system. Hence, it will increase both the number of voters and their confidence.
Enhanced Technique for Detecting Active and Passive Black-Hole Attacks in MANET

Marwa M. Eid1 and Noha A. Hikal2

1 ECE-Department, Delta Higher Institute for Engineering & Technology, Mansoura, Egypt
[email protected]
2 Information Technology Department, Faculty of CIS, Mansoura University, Mansoura, Egypt
[email protected]
Abstract. MANETs are still in demand for further developments in terms of security and privacy. However, the lack of infrastructure, dynamic topology, and limited resources of MANETs pose an extra overhead in terms of attack detection. Recently, applying modified versions of the LEACH routing protocol to MANET has yielded great routing enhancements in preserving node vitality, load balancing, and reducing data loss. This paper introduces a newly developed active and passive black-hole attack detection technique for MANET. The proposed technique, based on weighting a group of selected node features using AdaBoost-SVM on top of the AOMDV-LEACH clustering technique, is a stable and strong classifier that can strengthen the weights of major features while suppressing the weights of the others. The proposed technique is examined and tested for detection accuracy and routing overhead. Results show up to 97% detection accuracy with superior execution time for different mobility conditions.

Keywords: AdaBoost algorithm · SVM · AOMDV · Black-hole attack · MANET
1 Introduction

Recently, the mobile ad hoc network (MANET) has gained great importance as a mobile network for extracting and exchanging critical information. It has a wide range of application areas, from mobile sensing applications up to, not least, military communications in battlefields. Since a node's battery drain could stop its function and cause a link break, recent approaches deploy low-energy adaptive clustering hierarchy (LEACH) as an energy-efficient MANET routing protocol [1]. Researchers have proved that applying LEACH in a MANET environment provides a long network lifetime and high link reliability under mobility. However, unlike in wired networks, a node's behavior is unpredictable and unknown during the data routing operation, which makes MANET more vulnerable to different attacks. Hence, MANET security is still an open issue, and much research proposes different defenses against many kinds of attacks such as [2]: snooping attacks, poisoning attacks, denial of service (DoS) and distributed DoS (DDoS), routing table overflow, and black-
hole attacks. In general, these attacks are categorized into four types [3]: sinking, spoofing, fabrication, and flushing. The most malicious node behavior is sinking, in which one or more nodes do not cooperate in the routing operation of the network and, moreover, drop the packets. Sinking behavior constitutes a DoS attack [3]. An intelligent routing algorithm must be able to decide the best route from source to destination and isolate the sinker node upon analyzing these features. The absence of an administrator and the capability of every node to act as a router increase the vulnerability of MANET. The contribution of this paper lies in exploiting the data aggregated by employing LEACH with the reactive on-demand multipath Ad hoc On-demand Multipath Distance Vector (AOMDV) [4] routing protocol to detect active and passive black-hole attacks in MANET. This protocol works as a hybrid routing technique that integrates the advantages of both reactive and proactive routing. It performs a single route discovery process and caches multiple paths; a new route discovery process occurs only when all these paths are broken. The route discovery process is done in a proactive manner, while path selection is done in a reactive manner. In this paper, a semi-parametric machine learning framework is proposed for weighting the data aggregated from cluster members to detect the black-hole attack, and hence to exclude the attacker nodes and delete paths through them. Machine learning based on semi-parametric analysis provides the high detection accuracy of machine learning while reducing the computational complexity. The proposed method performs a conditional proactive routing phase during the lifetime of the MANET to gather neighbor node information, and then applies semi-parametric machine learning to detect the malicious node in optimized time and with reasonable computational complexity. Finally, these results can be used in a reactive routing fashion to guide other nodes. The novelty lies in adjusting the feature weights in an iterative way, which helps greatly in reducing the computational complexity and detection time, and thus in preserving node vitality during mobility. In addition, the proposed framework has great flexibility in adjusting the threshold value to distinguish between malicious and benign sinker nodes. The remainder of this paper is organized as follows. The literature survey is presented in Sect. 2. In Sect. 3, the proposed technique is presented and explained in detail. Section 4 introduces the simulation results and discussions. Finally, the conclusions are drawn in the last section.
2 Related Works

One of the most significant challenges in designing the network is securing the links, especially through insecure mediums. However, due to the limited capacity of a node, previous studies have reported that conventional security routines with large computations and communication overheads are improper in MANETs [5]. Moreover, it has been observed that adding extra new metrics and performing minor changes in the structure and operation of routing protocols could increase the performance and security of real-time applications. Ektefa [6] analyzed classification-tree and typical support vector machine (SVM) methods to detect intrusions through a set of attributes such as per-attribute information gain, entropy functions, etc. Applying this algorithm gives
better detection accuracy, but the required computations, the workload of the network administrator, and the communication overhead to the system are still open problems. The authors in [7] provided a black-hole detection mechanism that develops a dynamic threshold to detect severe changes in the normal behavior of network transactions. Additionally, another solution [8] detects the black-hole attack by always treating the first route reply as the reply from the malicious node and deleting that transaction. Although this solution decreases data loss and increases throughput, it cannot distinguish between malicious and benign packet sinking. Yazhini and Devipriya [9] proposed the first modified AODV routing protocol supported by an SVM model to detect black-hole attacks. A density curve with respect to time was drawn to monitor the packets sent from a source to destination nodes with and without a black hole in a simple network composed of seven nodes. It was observed that low peaks in the density curve indicate the detection of one malicious node. However, further data collection is required to determine the black-hole node and to generate complete behavior proofs containing information from both data traffic and forwarding paths, with more evidence yielding a more accurate prediction result. Ardjani et al. [10] proposed an SVM enhanced by particle swarm optimization (PSO-SVM) to optimize the accuracy of SVM. Kaur and Gupta [11] adopted the idea of integrating the minimum and maximum variants of ant colony optimization with SVM (i.e., ACO-SVM), based on the AODV routing protocol, to detect only passive black-hole attacks in MANET. Additionally, the authors in [12] applied a genetic algorithm to identify packet dropping by passive black holes in an intrusion detection system. The authors in [13] proposed machine learning techniques for distinguishing normal and attacked behavior of a network. Furthermore, the author in [14] attempted to prevent the black-hole attack in MANET by applying a small modification to the AODV protocol, introducing a reliability-factor-based approach to detect fake RREPs. The value of this factor is checked to detect the attacker node, and this technique has yielded better results. Although many researchers have focused on detecting passive black holes, only limited solutions for detecting active black-hole attacks have been proposed. Thus, for a networking problem, a more effective machine learning model can be introduced with extra representative, unbiased data.
3 The Proposed Technique [Methods/Experimental]

Black-hole attacks are generally classified into two types: passive and active. Figure 1 shows an illustration of the passive and active black-hole attacks. The proposed technique is based on integrating the LEACH routing protocol, based on reactive on-demand multipath AOMDV [15], with machine learning algorithms to enhance the performance of detecting the two types of black-hole attacks. The detection process is based on testing an intelligently selected group of significant QoS features. These features are proven to be efficient indicators of active malicious behavior, compared with other features that could be noisy data distracting the accuracy of detection decisions. Moreover, deploying the AdaBoost weight adaptation algorithm to adapt each feature's weight during the learning process provides an efficient monitoring action for
detecting an active black hole. To collect these features, a group of cluster head nodes is selected periodically to work collaboratively in data aggregation and to build exchangeable routing reply tables of trustworthy nodes.
Fig. 1. Illustration of black hole attacks; (a) Passive blackhole (b) Active blackhole.
These cluster heads are selected periodically during the whole MANET lifetime based on the LEACH-AOMDV dynamic cluster head (CH) selection technique [16]. At each round, LEACH-AOMDV applies a random method to distribute the energy load among nodes. A CH is chosen upon having a higher energy level and a threshold probability value. AOMDV applies the basic AODV route discovery technique with an energy-economical perspective. It registers multiple paths from source to destination: one of them is the primary path and the others are alternatives. The primary path, as well as the alternative ones, is used to transmit packets, thus increasing network utilization. Multipath selection is based on pre-advertised hop counts; the protocol rejects all replies with hop counts equal to or larger than the advertised one, so the selected paths are the routes with lower hop counts. At each round during the whole MANET lifetime, clustering is done periodically through two stages: a cluster set-up stage and a steady-state stage. During the set-up stage the CH is selected, while in the steady-state stage data is sensed. The steady-state stage lifetime is much longer than the set-up stage to reduce energy consumption. The CH is selected every round based on a mathematical threshold formula that is a function of the total number of nodes within the cluster, the round number, and the node's CH probability (p). To equalize the nodes' energy consumption, the CH workload is distributed among all nodes during the whole lifetime of the MANET by rotating their roles, i.e., the CH of the first round cannot be repeated in the next 1/p rounds. The proposed technique works through two concatenated phases in each round: i) data aggregation, which simulates the process of the reactive protocol to collect
Enhanced Technique for Detecting Active and Passive Black-Hole Attacks
251
neighbor nodes' features, and ii) malicious node identification, which analyzes the collected data based on machine learning to build trustworthy routing tables, as done in proactive routing protocols. Moreover, the adaptive boosting (AdaBoost) algorithm is strongly recommended to generate stronger classifiers from a set of weak classifiers. The AdaBoost weight update algorithm [17] is applied to adjust the weights of each feature, and iterative learning is used to reduce the computational complexity. The AdaBoost algorithm plays an important role in strengthening or suppressing the weights of input features, which enhances the performance of the SVM learner. SVM is particularly chosen in the detection phase as a nonlinear machine learning algorithm characterized by being a stable, strong classifier that provides high detection accuracy. Figure 2 shows a flowchart of the principal processes of the proposed method. This method is repeated periodically during the whole MANET lifetime.
3.1 Phase 1: Set-up Phase
It is the first phase, in which the entire MANET spatial area is divided into a number of clusters and a corresponding cluster head (CH) is selected for each cluster. The residual energy plays an important role in selecting the CH in each round. Each node has the same probability (p) to be a CH in the first round; as the number of rounds increases, the probability of each node being selected again as a CH decreases, until all nodes are dead [18]. The CH election rule is sketched below.
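The threshold formula itself is not reproduced in the text; the sketch below assumes the classical LEACH threshold T(n) = p / (1 − p(r mod 1/p)), which matches the description above (a function of p and the round number r, with a node excluded for 1/p rounds after serving as CH):

```python
# Sketch of LEACH-style CH election; assumes the classical LEACH threshold.
import random

def ch_threshold(p, r):
    # T(n) = p / (1 - p * (r mod 1/p)) for nodes eligible in round r
    return p / (1.0 - p * (r % int(round(1 / p))))

def elect_cluster_head(p, r, was_ch_in_last_1_over_p_rounds):
    if was_ch_in_last_1_over_p_rounds:
        return False                     # rotation rule described above
    return random.random() < ch_threshold(p, r)

# Example: p = 0.1, round 7
print(elect_cluster_head(0.1, 7, was_ch_in_last_1_over_p_rounds=False))
```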
3.2 Phase 2: Data Aggregation
This phase starts occasionally at the CH of each suspicious cluster, based on the computed jitter value for that cluster. CMs are scattered in the deployment field, and the simulation takes the following parameters: packet delivery ratio, throughput, average number of hops, normalized routing load, number of packets lost, average data delivery, dropped packets, and energy consumption.

Step 1: The CH node nk (where k = 1, 2, …, K; K = total number of suspicious clusters) sends RREQ packets to its neighbors within the cluster, i.e., the CMs.

Step 2: When a neighbor node receives an RREQ packet, it responds with an RREP packet if it has the destination; otherwise, it multicasts the RREQ to its neighbors to continue the route-finding operation.

Step 3: Conventional RREQ and RREP packets contain information about the hop count; malicious neighbors reply with a modified hop count. A sinker node always advertises a smaller number of hops to flag itself as the shortest path.

Step 4: At each CH node nk, a route reply table (RREPT) is constructed containing essential features associated with each neighbor CM node ni (i = 1, 2, …, I; I = total number of CMs in each cluster). These features are:
i. Destination node identification number (D-ID).
ii. The initiation time (IT): the time at which the RREQ was sent.
iii. The waiting time (WT): the time at which the RREP was received; the difference between IT and WT represents the end-to-end delay.
iv. Next-hop node identification number (NH).
252
M. M. Eid and N. A. Hikal
v. The total hop count to reach the destination (HC).
vi. The packet delivery rate (PDR) history of each neighbor node.
vii. The residual energy of each node within the cluster (a sketch of one such table entry follows the list).
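A minimal sketch of one RREPT entry, with field names following features (i)–(vii) above; the class itself is illustrative and not part of the original implementation:

```python
# Illustrative data structure for one RREPT record at a cluster head.
from dataclasses import dataclass

@dataclass
class RREPTEntry:
    d_id: int               # (i)   destination node ID
    it: float               # (ii)  initiation time of the RREQ
    wt: float               # (iii) waiting time (RREP reception time)
    nh: int                 # (iv)  next-hop node ID
    hc: int                 # (v)   total hop count to destination
    pdr_history: float      # (vi)  packet delivery rate history
    residual_energy: float  # (vii) residual energy of the node

    @property
    def end_to_end_delay(self) -> float:
        # Difference between IT and WT, as described in feature (iii)
        return self.wt - self.it
```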
Fig. 2. Flowchart of the proposed technique
3.3 Phase 3: Malicious Node Identification
Once the RREPT is completed, a data analysis phase starts. This phase aims to analyze the selected RREP features mentioned above in order to reach a decision regarding the benign versus malicious behavior of nodes.
Step 1: For each feature $F_j$ ($j = 1, 2, \ldots, J$, with $J$ the total number of features), there are $N$ observations recorded over an interval $i$ ($i = 1, \ldots, T$) under normal conditions. Based on past records (i.e., previous RREPTs), the mean value $c_{F_j}$ and the standard deviation $d_{F_j}$ of each feature are computed as follows [19]:

$$c_{F_j} = \frac{1}{N}\sum_{i=1}^{N} F_{ji} \tag{1}$$

$$d_{F_j} = \sqrt{\frac{\sum_{i=1}^{N}\left(F_{ji} - c_{F_j}\right)^{2}}{N}} \tag{2}$$

Step 2: For each new entry, compute the normalized feature value $\bar{F}_{ij}$ and the Euclidean distance $d_j$, respectively, as:

$$\bar{F}_{ij} = \frac{F_{ij} - \min F_j}{\max F_j - \min F_j} \tag{3}$$

$$d_j = \sqrt{\sum_{i=1}^{N}\left(\bar{F}_{ji} - c_{F_j}\right)^{2}} \tag{4}$$

The above equations provide the base of parametric sampled data used by the machine learners.

Step 3: Apply the SVM classifier to the collected sampled pairs $(F_{ij}, \bar{F}_{ij})$. The learners work in a semi-parametric way, since the AdaBoost module for weight adjustment is employed. The initial feature weights all take the same value:

$$W_1(t) = \frac{1}{J} \tag{5}$$

The new weights $W_{t+1}$ are iteratively updated based on the SVM classifier's error value $e$ on the sampled pairs $(F_{ij}, \bar{F}_{ij})$ and the previous weight set $W_t$ as [19]:

$$W_{t+1} = \frac{W_t \exp\left(-\phi_t\, \bar{F}_{ij}\, h_t(F_{ij})\right)}{Z_t} \tag{6}$$

where

$$\phi_t = \frac{1}{2}\log\frac{1 - e_t}{e_t} \tag{7}$$

and

$$h_t = \arg\min_{h_j} e_j = \sum_{i=1}^{N} W_t(i) \tag{8}$$

Here $N$ denotes the total number of weights. The weight update iterations continue until the AdaBoost-SVM optimization problem is solved [20]:

$$\min_{W}\, u(W, \xi) = \frac{1}{2}\lVert W \rVert^{2} + C \sum_{i} \xi_i \tag{9}$$

where $u$ denotes the optimization function, subject to the sampled pairs $(F_{ij}, \bar{F}_{ij})$, $C$ is a regularization parameter, $i$ is the iteration number, and $\xi_i > 0$ is the $i$-th slack variable.
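As an illustration of this weighted AdaBoost-SVM stage, the sketch below uses scikit-learn with synthetic placeholder data; this is an assumption for illustration only (the paper's analysis is implemented in MATLAB, as noted in Sect. 4), and the parameter names follow recent scikit-learn releases:

```python
# Minimal AdaBoost-SVM sketch (cf. Eqs. 1-9); data are placeholders, not
# the paper's dataset.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

def min_max_normalize(F):
    # Eq. (3): per-feature min-max normalization of the RREPT features
    return (F - F.min(axis=0)) / (F.max(axis=0) - F.min(axis=0) + 1e-12)

# Rows = neighbor-node observations; columns = RREPT features such as
# end-to-end delay, hop count, PDR history, and residual energy.
X = np.random.rand(200, 7)
y = np.random.randint(0, 2, 200)   # 1 = malicious (sinker), 0 = benign

clf = AdaBoostClassifier(
    estimator=SVC(kernel="rbf"),   # SVM as the boosted component classifier
    n_estimators=50,
    algorithm="SAMME",             # SAMME re-weights samples, cf. Eqs. (6)-(8)
)
clf.fit(min_max_normalize(X), y)
print(clf.predict(min_max_normalize(X)[:5]))
```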
4 Experimental Results and Discussion

For testing the proposed technique, a MANET simulation environment is implemented using the NS-2 simulator ver. 2.35 on an Intel Core i3 processor at 2.40 GHz with 2 GB of RAM, running Ubuntu 12.04 Linux. The NS-2 simulator ver. 2.35 was used rather than NS-3 because it offers a more diverse set of MANET modules and to unify the implementation environment with similar research in this field [15, 21, 22]. The simulation was conducted over a 1500 m × 1500 m rectangular space with randomly distributed mobile nodes and a communication model with a constant bit rate (CBR) traffic source. The MANET parameters are shown in Table 1.
4.1 Security Analysis
In this paper, the simulations are conducted under different mobility-speed scenarios to assess and investigate the performance of the network with and without the effect of an attack. The node speeds are fixed to 5 m/s and 20 m/s, and the nodes move randomly in all directions. The four scenarios are two benign scenarios with node mobility of 5 m/s and 20 m/s, and two scenarios with the same conditions but exposed to an active sinking (black-hole) attack. Now, assume that node 4 and node 5, which belong to cluster 1 and cluster 2 respectively, are attacker nodes. A simple comparison between normal and abnormal network routing parameters shows a significant decrease in throughput, PDR, and jitter values, which is used as a first indication of the existence of abnormality within certain clusters, meaning that some clusters are marked as suspicious. Table 2 shows the numerical comparison among the different scenarios; the variance in performance stability in the abnormal cases can be noticed clearly when considering the jitter values. Clusters whose CH senses a performance abnormality should immediately start building a trustworthy RREPT for their neighbors and announce it.
Table 1. MANET simulation parameters.
Parameter                               Value
MAC protocol                            IEEE 802.11
Antenna model                           Omni-directional
Network scale                           1500 m × 1500 m
Simulation time                         10 s
MAC type                                802.11
Application traffic, routing protocol   CBR, AOMDV/LEACH
Packet size                             1000 bytes/packet
Data rate                               0.1 Mbps
Node velocity, pause time               5, 20 m/s, 1.0 s
No. of mobile nodes                     33 [3 clusters]
Observation parameters                  PDF, end-to-end delay, jitter, throughput
Table 2. Transaction abnormality performance indices
Scenario       Status     End-to-end (s)   Jitter
Scenario (1)   Normal     0.0187           0.0048
               Abnormal   0.0196           1.1732
Scenario (2)   Normal     0.0199           0.0065
               Abnormal   0.0242           1.3969

4.2 Detection Analysis
A MATLAB simulation program is used to perform the semi-parametric analysis and to classify the measured node performance parameters based on AdaBoost-SVM (Eqs. 1–9). Standard classification performance metrics are used: precision, recall, accuracy, and F-measure, as in [15]. The detection threshold value has a great influence on distinguishing between benign and attacker nodes, since decreasing the threshold value increases the false-positive rate, while increasing it leads to detection failures. Figure 3 shows the performance of different learners for the last scenario, which is considered the worst, as the threshold value increases. Here, the proposed method shows more detection stability compared with plain semi-parametric learning. The robustness of the proposed technique is also examined and tested against an increasing number of black-hole attacker nodes. The key indices used to test the MANET performance under the proposed technique are the PDR, the throughput value, the average end-to-end delay, and the average packet loss. Figure 4 introduces an
Fig. 3. Detection accuracy versus learner threshold value.
illustration of these key performance parameters against the increasing number of attacker nodes. The proposed algorithm clearly proves to work better than the other examined learners. Compared with other common algorithms in this field, Figs. 5 and 6 respectively compare the absolute error and the detection time values under the same attacks. Considering the obtained values, applying the conventional SVM [15] introduces an absolute error of 0.148 and an elapsed time of 10.62 s. By combining ant colony optimization with SVM (SVM+ACO) [23], the values are 0.122 and 6.99 s, respectively, while particle swarm optimization (SVM+PSO) [9] yields 0.094 and 2.19 s; this result confirms better classification accuracy but a higher execution time. The values in the case of applying the decision tree (C4.5) [24] are 0.111 and 2.48 s. Overall, the proposed technique reaches a higher detection rate with a low false alarm rate, while the clustering technique limits network overheads. Moreover, one of the major disadvantages of LEACH is that the cluster head can die for any reason, in which case the cluster becomes useless and the data gathered by this cluster's nodes would never reach its destination. Thus, in the future, another clustering protocol can be investigated and examined using the proposed method. Furthermore, to extend this work, multiple cooperative sinkhole nodes will be considered, and different attacks can be examined and tested using the proposed technique. Moreover, a more robust optimizer, such as the one proposed in [25], can be added in the feature selection stage.
Fig. 4. MANET performance against the increasing number of black-hole attacker nodes.
Fig. 5. The absolute error of the proposed algorithm compared with the common existing algorithms.
Fig. 6. Detection time of the proposed algorithm compared with the common existing algorithms.
5 Conclusions

Cluster-based routing has been confirmed to be more efficient in terms of extending network lifetime, load balancing, and robustness against different attacks. Although dealing with MANET poses an extra overhead in terms of attack detection, the proposed technique has shown efficient detection accuracy and superior detection time. The selected group of features and the adaptation of their weights based on the AdaBoost algorithm further reduce the time complexity of the SVM classifier, resulting in an accurate and fast detection technique. However, the proposed method mainly depends on the correctness of the data aggregated from RREP packets, so packet modification attacks are not considered in this paper. Moreover, the LEACH algorithm has the ability to detect a passive black-hole attack, since a CH that does not transmit data all the time is marked as a black hole and its probability of being chosen is highly reduced. However, it might be very hard to detect collaborative sinkholes that cooperate to send fake requests (i.e., RREQ and RREP).
References
1. Belavagi, M.C., Muniyal, B.: Performance evaluation of supervised machine learning algorithms for intrusion detection. Procedia Comput. Sci. 89, 117–123 (2016). https://doi.org/10.1016/j.procs.2016.06.016
2. Jangir, S.K., Hemrajani, N.: A comprehensive review and performance evaluation of detection techniques of black hole attack in MANET. J. Comput. Sci. 13, 537–547 (2017). https://doi.org/10.3844/jcssp.2017.537.547
3. Abdel-Fattah, F., Farhan, K.A., Al-Tarawneh, F.H., Altamimi, F.: Security challenges and attacks in dynamic mobile ad hoc networks MANETs. In: 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology, JEEIT 2019, pp. 28–33 (2019). https://doi.org/10.1109/JEEIT.2019.8717449
4. Sarika, S., Pravin, A., Vijayakumar, A., Selvamani, K.: Security issues in mobile ad hoc networks. Procedia Comput. Sci. 92, 329–335 (2016). https://doi.org/10.1016/j.procs.2016.07.363
5. Vimala, S., Khanaa, V., Nalini, C.: A study on supervised machine learning algorithm to improvise intrusion detection systems for mobile ad hoc networks. Cluster Comput. 22(2), 4065–4074 (2018). https://doi.org/10.1007/s10586-018-2686-x
6. Ektefa, M., Memar, S., Affendey, L.S.: Intrusion detection using data mining techniques. In: 2010 International Conference on Information Retrieval & Knowledge Management (CAMP), Shah Alam, Selangor, pp. 200–203 (2010)
7. Panos, C., Ntantogian, C., Malliaros, S., Xenakis, C.: Analyzing, quantifying, and detecting the blackhole attack in infrastructure-less networks. Comput. Netw. 113, 94–110 (2017). https://doi.org/10.1016/j.comnet.2016.12.006
8. Koujalagi, A.: Considerable detection of black hole attack and analyzing its performance on AODV routing protocol in MANET (mobile ad hoc network). Am. J. Comput. Sci. Inf. Technol. 06, 1–6 (2018). https://doi.org/10.21767/2349-3917.100025
9. Yazhini, S.P., Devipriya, R.: Support vector machine with improved particle swarm optimization model for intrusion detection. Int. J. Sci. Eng. Res. 7, 37–42 (2016)
10. Ardjani, F., Sadouni, K., Benyettou, M.: Optimization of SVM multiclass by particle swarm (PSO-SVM). In: 2010 2nd International Workshop on Database Technology and Applications, DBTA 2010, p. 3 (2010). https://doi.org/10.1109/DBTA.2010.5658994
11. Kaur, S., Gupta, A.: A novel technique to detect and prevent black hole attack in MANET. Int. J. Innov. Res. Sci. Eng. Technol. 3, 4261–4267 (2015). https://doi.org/10.15680/IJIRSET.2015.0406092
12. Elwahsh, H., Gamal, M., Salama, A.A., El-Henawy, I.M.: A novel approach for classifying MANETs attacks with a neutrosophic intelligent system based on genetic algorithm. Secur. Commun. Netw. 2018 (2018). https://doi.org/10.1155/2018/5828517
13. Nagalakshmi, T.J., Rajeswari, T.: Detecting packet dropping malicious nodes in MANET using SVM. Int. J. Pure Appl. Math. 119, 3945–3953 (2018). https://doi.org/10.5958/0976-5506.2018.00752.0
14. Gupta, P., Goel, P., Varshney, P., Tyagi, N.: Reliability factor based AODV protocol: prevention of black hole attack in MANET. In: Advances in Intelligent Systems and Computing, pp. 271–279. Springer (2019). https://doi.org/10.1007/978-981-13-2414-7_26
15. Shakya, P., Sharma, V., Saroliya, A.: Enhanced multipath LEACH protocol for increasing network life time and minimizing overhead in MANET. In: 2015 International Conference on Communication Networks, pp. 148–154. IEEE (2015). https://doi.org/10.1109/ICCN.2015.30
16. Chandel, J., Kaur, N.: Energy consumption optimization using clustering in mobile ad-hoc network. Int. J. Comput. Appl. 168, 11–16 (2017). https://doi.org/10.5120/ijca2017914405
17. Tu, C., Liu, H., Xu, B.: AdaBoost typical algorithm and its application research. In: 3rd International Conference on Mechanical, Electronic and Information Technology Engineering (ICMITE 2017), pp. 1–6 (2017). https://doi.org/10.1051/matecconf/201713900222
18. Almomani, I., Al-kasasbeh, B., Al-akhras, M.: WSN-DS: a dataset for intrusion detection systems in wireless sensor networks (2016)
19. Mazraeh, S., Ghanavati, M., Neysi, S.H.N.: Intrusion detection system with decision tree and combine method algorithm. Int. Acad. J. Sci. Eng. 6, 167–177 (2019)
20. Li, X., Wang, L., Sung, E.: AdaBoost with SVM-based component classifiers. Eng. Appl. Artif. Intell. 21, 785–795 (2008). https://doi.org/10.1016/j.engappai.2007.07.001
21. Anbarasan, M., Prakash, S., Anand, M., Antonidoss, A.: Improving performance in mobile ad hoc networks by reliable path selection routing using RPS-LEACH. Concurr. Comput. Pract. Exp. 31, e4984 (2019). https://doi.org/10.1002/cpe.4984
22. Arebi, P., Rishiwal, V., Verma, S., Bajpai, S.K.: Base route selection using LEACH low energy low cost in MANET (2016). https://www.semanticscholar.org/paper/Base-Route-Selection-Using-Leach-Low-Energy-Low-In-Arebi-Rishiwal/b88542600c90a97a7af0b6eb42f37d7920c2ecf1. Accessed 19 July 2019
23. Tyagi, S., Kumar, N.: A systematic review on clustering and routing techniques based upon LEACH protocol for wireless sensor networks. J. Netw. Comput. Appl. 36, 623–645 (2013). https://doi.org/10.1016/j.jnca.2012.12.001
24. Pavani, K., Damodaram, A.: Anomaly detection system for routing attacks in mobile ad hoc networks. Int. J. Netw. Secur. 6, 13–24 (2014)
25. El-Sayed, E.-K.M., Eid, M.M., Saber, M., Ibrahim, A.: MbGWO-SFS: modified binary grey wolf optimizer based on stochastic fractal search for feature selection. IEEE Access 8, 107635–107649 (2020). https://doi.org/10.1109/ACCESS.2020.3001151
A Secure Signature Scheme for IoT Blockchain Framework Based on Multimodal Biometrics

Yasmin A. Lotfy1 and Saad M. Darwish2

1 Faculty of Engineering, Department of Computers, Pharos University in Alexandria, Alexandria, Egypt
[email protected]
2 Department of Information Technology, Institute of Graduate Studies and Research, Alexandria University, Alexandria, Egypt
[email protected]
Abstract. Blockchain technology has received a lot of attention in recent years for its wide variety of applications in different fields. On the other hand, Internet of Things (IoT) technology is currently considered the new growth engine of the fourth industrial revolution. Current studies have developed techniques to overcome the limitations of IoT authentication for secure scalability in the Blockchain network by managing only the storage of authentication keys. However, this authentication method does not consider security when extending the IoT across the network; since the nature of IoT ensures connectivity with multiple objects in many places, security threats increase and can cause serious damage to assets. This raises many challenges in achieving equilibrium between security and scalability. Aiming to fill this gap, the work proposed in this paper adapts multimodal biometrics to extract a high-entropy private key for authentication, improving IoT network security. The suggested model also evaluates a security score for each IoT device using a whitelist through a Blockchain smart contract, to ensure that devices authenticate quickly and to limit infected machines. Experimental results prove that our model is existentially unforgeable against adaptively chosen message attacks; it also reduces the number of infected devices in the network by up to 49% compared to conventional schemes.

Keywords: Blockchain · IoT · Fuzzy identity signature · Multimodal biometrics · Feature fusion
1 Introduction

As life increasingly moves online, one of the challenges facing Internet users is executing transactions in an environment in which there is no trust among them. This unavoidably increases the need for a cost-efficient, secure data transfer environment. Blockchain is a peer-to-peer network that is cryptographically secure, immutable, and updatable only via agreement among peers [1]. It is a promising technology whereby peers can exchange values using transactions without requiring a central authority; it protects the user's privacy and prevents identity theft [2]. Digital signature verification protects a transaction in the blockchain by ensuring that its creator holds the right
private key [3]. Unfortunately, in several critical fields where highly strict verification is required, this technique does not ensure that the creator of a transaction is an authorized user: an intruder may steal the private key and produce illegal transactions. Permissioned Blockchain serves a wide range of industries, including banking, finance, insurance, healthcare, human resources, and digital music services [4]. Blockchain is based on asymmetric cryptography and a digital signature scheme among participants. For a Blockchain to identify and maintain proof of ownership of any digital asset, each user has a pair of public and private keys. Private keys are at considerable risk of exposure, which leads to a weak blockchain system [5]. One of the potential solutions to this issue is to use biometric data, such as fingerprint, face, and iris, as a private key [6]. Since biometric data is part of the body, it provides an efficient way to identify the user. However, despite the advantages of biometric authentication, biometric data is noisy and tends to vary each time it is captured, since two biometric scans produced from the same trait are rarely identical. Consequently, even if the parties use a mutual secret key generated from biometric data, conventional protocols do not guarantee correctness. The fuzzy private key is a suitable solution for the current issues in biometric authentication [7]. The fuzzy private key principle is implemented in a fuzzy identity-based signature scheme, which allows the generation of a signature using "fuzzy data" (i.e., noisy data such as a biometric) as a signing and verification key. The Internet of Things (IoT) offers a more responsive environment by allowing various devices to communicate and exchange information. IoT devices collect and analyze significant amounts of data, such as personal and confidential data from daily life. However, hacking and cyber-attacks hitting IoT devices grow each year, and since most IoT devices are low-power, low-performance devices, it is difficult to apply security methods implemented for traditional PCs to them; hence, they become vulnerable to cyber-attacks. To solve this problem, attention has been given to the incorporation of Blockchain technology and IoT [8]. Motivated by the above challenges, in this paper we introduce multimodal biometrics technology for authentication in the blockchain system to extend IoT devices securely. A multimodal biometric system decreases the potential for spoofing and helps overcome the weaknesses of unimodal biometric systems. The private key is created by fusing two biometric features to obtain the most unique, high-entropy key. Furthermore, regardless of whether the user is authenticated through the network, our model automatically evaluates the IoT device's security score using a whitelist via a smart contract and restricts the scalability of an infected device when the score is low. In addition, fuzzy matching is exploited to match the digital signature in the verification phase in order to handle noisy biometric features. The rest of this paper is organized as follows: Sect. 2 describes some recent related works. A detailed description of the proposed model is given in Sect. 3. In Sect. 4, the results and discussions on the dataset are given. Finally, the conclusion is presented in Sect. 5.
2 Related Work

Conventional security of blockchain private keys is mainly achieved in two ways: either by encrypting the keys or by developing hardware- or software-based wallets [9]. Both are unsuitable, as safety cannot be guaranteed, and all these wallets still must synchronize with the blockchain. To address the problems described above, in 2018 W. Dai [10] suggested a lightweight wallet based on TrustZone that can build a secure and stable code environment for applications demanding high security. This approach is more portable than a hardware wallet and more secure than a software wallet. In recent years, IoT has received much attention from researchers. For instance, in 2016 Samaniego, M. [11] introduced a collaborative system for automatic smart contracts between blockchain and IoT. This approach suggested a secure and scalable platform with non-centralized authority. The main goal is not only confirming that a correct device has generated a transaction, but also ensuring that a proper user has generated that transaction intentionally. Yet, checking the user's intention when generating blockchain transactions is still a challenging issue. In another work, in 2015 Balfanz, D. [12] used biometric information, such as fingerprints and irises, in secure hardware to activate the private key. By linking such an authentication method with blockchain, a secure blockchain system is ensured. However, it is necessary to carry a device with biometric information and to input that biometric information from that dedicated secure equipment. Since blockchain is still considered a new technology, there is room for making it more efficient and practical in real applications. According to the review above, past studies were primarily dedicated to: (1) devising different types of wallets, either offline or online, that store the private key and keep backups of the wallets; (2) creating new techniques for signing and verification by encrypting the private key, which still does not guarantee trusted secure authentication; (3) not addressing the problem of differentiating between a transaction issued by a legitimate user and one issued by someone who hacked the private key; and (4) not addressing the issues arising when an infected device extends across the network. To the best of our knowledge, little attention has been paid to merging blockchain and biometrics at the algorithm level to help improve IoT security and usability in a blockchain-based framework.
3 Proposed Methodology

This paper proposes a modified model that combines multimodal biometrics and blockchain technology in a unified framework based on a fuzzy identity-based signature, to secure IoT device authentication and to allow extension in the blockchain network. We take into consideration that every IoT device has security flaws and is subject to the installation of infected software. Therefore, we evaluate the security score of the devices by using a whitelist, which defines a list of verified software, and then restrict device extension beyond that list [13]. We apply a multimodal biometrics method based on feature-level fusion of both fingerprint and finger-vein to extract a unique biometric private key during the biometric key extraction phase.
During the transaction generation phase, the sender encrypts the data with his private biometric key, and the fuzzy biometric signature is created. The sender adds his biometric signature to a new blockchain transaction; this new block's transaction is then appended to the ledger awaiting approval or rejection. Finally, during the verification phase, strict verification is applied using the biometric public key from the previous transaction, the private biometric key, and the signature; if the verification is valid, the block is added to the public ledger of the blockchain and the data is transmitted through the network, otherwise the block is rejected. The main diagram of the suggested IoT blockchain model is depicted in Fig. 1. To create and submit a transaction to any user in the IoT network, the user has to go through five phases: evaluating the security score, the biometric key extraction phase, the registration phase, the transaction generation phase, and the verification phase.
Fig. 1. Block diagram of the proposed IoT blockchain authentication model
3.1 Fuzzy Identity Based Signature Scheme
Our proposed model relies on the Fuzzy Identity Based Signature (FIBS) [14] for generating and verifying blockchain transactions through IoT devices. Since a person can produce a different biometric key every time he initiates a transaction, conventional digital signature schemes are not suitable, as they require stable data to be used as a key. FIBS uses fuzzy data, such as fingerprint, iris, finger-vein, etc., as the cryptographic key. It allows a user with identity w to issue a signature that can be verified with identity w′ if and only if w and w′ are within a certain distance. Refer to [7] for the full steps of building FIBS.
3.2 Security Score Evaluation for IoT Devices Phase
IoT devices can be hacked due to various vulnerabilities, but the most common scenarios are malicious programs installed through the inattention of the user or by hacker attacks [15]. To resolve such issues, our suggested model evaluates the security score based on verification of the software installed on IoT devices using the whitelist, and records it into the blockchain via a smart contract. The whitelist, located in an agent embedded in a secure area inside the device, contains all the software installed on the IoT device. The security score evaluation is implemented through the following steps (a sketch of the hash check follows this list):
(1) IoT device manufacturers install the software on the device, provide the whitelist, and create smart contracts with the manufacturer's whitelist and the Initial Agent Hash Value (IAHV) of the agent embedded in the IoT device, recording them in the blockchain through the Whitelist Smart Contract (WSC).
(2) The device accesses the WSC recorded in the blockchain and compares the IAHV recorded in the block with the Device Agent Hash Value (DAHV) of the current whitelist installed on the device.
(3) If both match, the device is not infected or hacked, the security score is set high, and the list of software in the whitelist is transmitted to the device; otherwise, forgery is detected, the connection is restricted, and the manufacturer is alerted with a request for a whitelist update via the smart contract.
(4) A Scoring Smart Contract (SSC) is created that includes the device's unique data and the security status based on the agent software, and it is recorded into the blockchain.
(5) The SSC of the device can be queried when the device is extended to other networks to guarantee safe authentication [16].
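A minimal sketch of the hash comparison in steps (2)–(3); the function names and the view of the on-chain value are hypothetical:

```python
# Illustrative whitelist check (IAHV vs. DAHV), not the actual contract code.
import hashlib

def agent_hash(software_list):
    # Hash the sorted, concatenated whitelist entries, standing in for the
    # agent hash value (IAHV / DAHV) described above.
    return hashlib.sha256("\n".join(sorted(software_list)).encode()).hexdigest()

def evaluate_security_score(iahv_on_chain, device_software):
    dahv = agent_hash(device_software)
    if dahv == iahv_on_chain:
        return "high"        # device not infected: allow extension
    return "restricted"      # forgery detected: restrict and alert vendor
```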
3.3 Biometric Key Extraction Phase
During this phase, the biometric data is generated by extracting unique features from both the fingerprint and the finger vein. We use image enhancement techniques to improve contrast and brightness and to remove noise. Regions of interest (ROIs) are extracted, and the fingerprint vector and finger-vein vector are generated based on a unified Gabor filter framework. Both vectors are reduced in dimensionality by applying Principal Component Analysis (PCA) and concatenated to create the bio-private key. The reasons for fusing fingerprint and finger-vein characteristics are that (1) the finger vein and fingerprint are two physiological characteristics carried by one finger, and both have reliable properties in biometrics, and (2) fingerprints and finger veins are complementary in universality, accuracy, and security [17]. Both the pre-processing
(region of interest) and fusion steps reduce the key dimensions to lower the computational processing; see [17, 18] for more details. Since we work with a private permissioned blockchain, where all the participants have known identities, our proposed scheme is built over a biometric key infrastructure that includes a Biometric Certificate Authority (BCA) for confirming participant identities, in order to determine the exact permissions over resources and access to information that participants have in the blockchain network.
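The PCA-plus-concatenation fusion can be sketched as follows; Python with scikit-learn is an assumption for illustration (the paper implements this part in MATLAB), and the random matrices stand in for the Gabor ROI feature vectors:

```python
# Sketch of feature-level fusion: PCA reduction, then concatenation.
import numpy as np
from sklearn.decomposition import PCA

fingerprint_feats = np.random.rand(200, 512)   # placeholder Gabor features
fingervein_feats = np.random.rand(200, 512)

fp_reduced = PCA(n_components=64).fit_transform(fingerprint_feats)
fv_reduced = PCA(n_components=64).fit_transform(fingervein_feats)

# Concatenate the reduced vectors to obtain the bio-private key material
bio_key_vectors = np.hstack([fp_reduced, fv_reduced])
print(bio_key_vectors.shape)   # (200, 128)
```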
3.4 Registration Phase
In this phase, the user's biometric private key is generated from the user's biometric data w. The public key associated with the private key is also created and certified, then registered into the blockchain network. This phase consists of four steps: (1) confirming the user identity; (2) generating the public key and master key; (3) creating a private key associated with the user's biometric information; and (4) issuing a public key certificate. The Biometric Certificate Authority (BCA) issues a Public Key Certificate (PKC) by applying a digital signature to the public key together with a set of attributes related to the holder of the certificate, such as the user ID, expiration date, and other information; all of these attributes are encrypted by the BCA's private key so that tampering invalidates the certificate. The BCA then registers the PKC in the repository and publishes it to the network. See [19] for more details.
3.5 Transaction Generation Phase
In this phase, the sender generates a new blockchain transaction, which includes the receiver's PKC (Owner 3's PKC), the contents, and their hash value H. This phase consists of three steps: (1) creating the new blockchain transaction; (2) generating the fuzzy signature; and (3) issuing the signature into the Blockchain. The sender attaches his biometric signature to the new blockchain transaction, and the new block's transaction is appended to the ledger waiting to be confirmed or rejected. See [19] for more details regarding these steps.
3.6 Transaction Verification Phase
In this phase, we use a verification algorithm that takes as input the public parameters PP, an identity w′ such that |w ∩ w′| ≥ d, the hash of the message H, and the corresponding signature S. It returns a bit b, where b = 1 means that the signature is valid and b = 0 means it is invalid. The verification is done through a two-phase hierarchical process consisting of two steps: (1) checking the expiration date, and (2) calculating the signature verification result. If the equality holds, BVer = 1 and verification succeeds; otherwise, BVer = 0 and verification fails. The client application is then notified that the transaction has been immutably appended to the chain, together with a notification of whether the transaction was validated or invalidated [18, 19].
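The fuzzy acceptance condition |w ∩ w′| ≥ d can be illustrated with a tiny sketch, treating the two biometric identities as feature sets (purely illustrative, with hypothetical names):

```python
# Illustration of the fuzzy acceptance test used in verification.
def fuzzy_verify(w, w_prime, d):
    # Returns bit b: 1 if the identities overlap in at least d attributes.
    return 1 if len(set(w) & set(w_prime)) >= d else 0

print(fuzzy_verify({1, 4, 7, 9}, {1, 4, 7, 8}, d=3))   # 1: accepted
```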
4 Experimental Results

Currently, there is no open database containing fingerprint and finger-vein images of the same person. The ready-made fingerprint and finger-vein database selected is the SDUMLA-HMT database from Shandong University [20]; we assume that the same person has independent biometric traits. The fingerprint and finger-vein images were collected from 106 volunteers. We perform the evaluation on an IoT device with the following specifications: CPU: Intel Core i7-7500U processor at 2.7 GHz; memory: 8 GB; system type: 64-bit Enterprise operating system. The biometric part of key extraction has been implemented using MATLAB. The blockchain part is deployed on the open-source blockchain platform Hyperledger Fabric [21], a platform used to develop and implement cross-industry blockchain applications and systems. Moreover, we use the fuzzy identity-based signature as the digital signature algorithm during this evaluation. In our proposed model, we restrict the installation of infected software on the IoT device by automatically evaluating the device's security score using the whitelist and the smart contract and then recording it to the blockchain. Our proposed scheme ensures maximum scalability when the security score is high and restricts scalability when the security score is low. An adversary has two common attacks to obtain the secret information: attacking the private key, or forging the signature.
Table 1. Security level comparison of our proposed model and other signature schemes against some popular attacks
Signature scheme   Private key attack or leakage   Forgery of the digital signature
PKSS               Low                             Middle
FKSSU              Middle                          Middle-High
FIKSSM             High                            High
In the following, we analyze the security of our proposed scheme and confirm its effectiveness against common attacks on the blockchain system. We also compare the security level of three signature schemes: the conventional private-key-based signature scheme (PKSS), the fuzzy key signature scheme based on unimodal biometrics (FKSSU), and our proposed fuzzy identity key signature scheme based on multimodal biometrics (FIKSSM). Table 1 illustrates the security level comparison of the three signature schemes. In PKSS, the long-term private key is managed in the IoT device, a cloud server, or a wallet. Therefore, there is a high risk that an adversary steals the private key, forges a digital signature, and creates an illegal Blockchain transaction. Losing the private key means losing the ownership of the asset forever, as there is no recovery for the private key. This makes PKSS vulnerable to security breaches, and it does not provide sufficient security in today's connected and data-driven world. On the other hand, in FKSSU, the private key is not managed in the IoT device; the private key is derived from the user's biometric information. It is considered a secure scheme,
but since it is based on only one biometric trait, it has several problems in capturing clean data, including noisy data, inter-class variations, spoof attacks, and unacceptable error rates. Therefore, FKSSU is not completely secure against this attack. In our FIKSSM, the private key is likewise not managed by the IoT device, as in FKSSU. However, we use the user's fingerprint and finger vein to derive a unique private key. Multimodal biometrics increases the amount of data analyzed, helps achieve an accurate match in verification, and makes it exponentially more difficult for the adversary to spoof. Adding multiple modalities makes it difficult to find and use all the biometric data needed to spoof the algorithm. That makes our system more secure, more robust, and able to avoid spoofing attacks. Regarding the signature forgery attack, the threat is to produce an illegal blockchain transaction whose digital signature verification succeeds. This threat can be managed if we adopt a safe algorithm for generating key pairs. In PKSS and FKSSU, if a secure signature algorithm that is difficult to forge is used, these signature schemes are safe. In FIKSSM, we use a secure fuzzy identity signature algorithm that is proved secure under the EUF-CMA model (Existential Unforgeability against Adaptive Chosen Message Attacks) [7]. In the UF-FIBS-CMA model, the adversary's probability of producing a valid signature for any message under a new private key is negligible.
Fig. 2. Comparison results of vulnerability.
The second set of experiments aims to validate the robustness of the proposed model in terms of vulnerability, defined as the flaws or weaknesses in the IoT device that leave it exposed to threats; it is computed by dividing the number of infected devices by the number of all devices in the network. In Fig. 2, the horizontal axis represents simulation time, and the vertical axis represents the number of IoT devices infected due to connections with other peripheral malicious devices. In the PKSS model, the infected devices are linked directly to the IoT devices in the network, which makes the scheme vulnerable to a Distributed Denial of Service (DDoS) attack in which the
attacker uses several infected IoT devices to overwhelm a specific target node; 53% of the devices were infected due to their connection to infected devices. FKSSU is still not secure enough because it depends only on securing user authentication and verification using a unimodal biometric key, while ignoring the security score of the device when it extends to the network; this leaves the device vulnerable to malicious programs created by the user's carelessness or by hackers' attacks. On the other hand, in our proposed FIKSSM scheme, the number of devices connected to malicious devices is reduced to 4%, i.e., reduced by up to 49% compared to PKSS. As our scheme restricts connections depending on the security score and continuously verifies software via the agent and the whitelist, the number of infected devices continues to decrease thanks to the whitelist updates.
5 Conclusion

In this work, we introduce a new signature scheme that resolves security and scalability issues in IoT devices based on blockchain technology. The suggested model utilizes multimodal biometrics to securely extend the connection of IoT devices in the network and to guarantee that the person who generates the blockchain transaction is the correct asset owner and not a fraud. The proposed model improves security in two ways: first, by using multimodal biometrics as a private key for authentication and verification, applying the fuzzy identity signature to create a blockchain transaction; and second, by deploying whitelist software on IoT devices and recording all installed software to the blockchain via a smart contract. Our proposed model reduces the extension of infected devices by up to 49% compared with conventional schemes. Therefore, the proposed scheme achieves practically high performance in terms of scalability and security against spoofing and signature forgery. On the other hand, our proposed model is expected to outperform conventional models in latency and throughput only at the start. For future work, we plan to extend our proposed model to a real-world setting and examine its performance during implementation.
References

1. Bozic, N., Pujolle, G., Secci, S.: A tutorial on blockchain and applications to secure network control-planes. In: 3rd IEEE Smart Cloud Networks & Systems Conference, United Arab Emirates, pp. 1–8. IEEE (2016)
2. Cai, Y., Zhu, D.: Fraud detections for online businesses: a perspective from blockchain technology. Financ. Innov. 2(1), 1–10 (2016)
3. Zyskind, G., Nathan, O., Pentland, A.S.: Decentralizing privacy: using blockchain to protect personal data. In: Proceedings of the IEEE Security and Privacy Workshops, USA, pp. 180–184. IEEE (2015)
4. Min, X., Li, Q., Liu, L., Cui, L.: A permissioned blockchain framework for supporting instant transaction and dynamic block size. In: Proceedings of the IEEE Trustcom/BigDataSE/ISPA Conference, China, pp. 90–96. IEEE (2016)
5. Gervais, A., Karame, G., Capkun, V., Capkun, S.: Is Bitcoin a decentralized currency? IEEE Secur. Priv. 12(3), 54–60 (2014)
6. Murakami, T., Ohki, T., Takahashi, K.: Optimal sequential fusion for multibiometric cryptosystems. Inf. Fusion 32, 93–108 (2016)
7. Yang, P., Cao, Z., Dong, X.: Fuzzy identity based signature with applications to biometric authentication. Comput. Electr. Eng. 37(4), 532–534 (2017)
8. Fernández-Caramés, T., Fraga-Lamas, P.: A review on the use of blockchain for the Internet of Things. IEEE Access 6, 32979–33001 (2018)
9. Goldfeder, S., Bonneau, J., Kroll, J.A., Felten, E.W.: Securing Bitcoin wallets via threshold signatures. Master thesis, Princeton University, USA (2014)
10. Dai, W., Deng, J., Wang, Q., Cui, C., Zou, D., Jin, H.: SBLWT: a secure blockchain lightweight wallet based on TrustZone. IEEE Access 6, 40638–40648 (2018)
11. Samaniego, M., Deters, R.: Blockchain as a service for IoT. In: IEEE International Conference on Internet of Things (iThings), China, pp. 433–436. IEEE (2016)
12. Balfanz, D.: FIDO U2F implementation considerations. FIDO Alliance Proposed Standard, pp. 1–5 (2015)
13. Dery, S.: Using whitelisting to combat malware attacks at Fannie Mae. IEEE Secur. Priv. 11(4), 90–92 (2013)
14. Waters, B.: Efficient identity-based encryption without random oracles. Lecture Notes in Computer Science, vol. 3494, pp. 114–127, Berlin (2005)
15. Alaba, A., Othman, M., Hashem, T., Alotaibi, F.: Internet of Things security: a survey. J. Netw. Comput. Appl. 88, 10–28 (2017)
16. Christidis, K., Devetsikiotis, M.: Blockchains and smart contracts for the Internet of Things. IEEE Access 4, 2292–2303 (2016)
17. Ross, A., Govindarajan, R.: Feature level fusion of hand and face biometrics. In: Biometric Technology for Human Identification II, USA, vol. 5779, pp. 196–204 (2005)
18. Wang, Z., Tao, J.: A fast implementation of adaptive histogram equalization. In: 8th International Conference on Signal Processing, China, pp. 1–4. IEEE (2006)
19. Zhao, H., Bai, P., Peng, Y., Xu, R.: Efficient key management scheme for health blockchain. Trans. Intell. Technol. 3(2), 114–118 (2018)
20. Shandong University (SDUMLA). http://mla.sdu.edu.cn/info/1006/1195.htm
21. Cachin, C.: Architecture of the hyperledger blockchain fabric. In: Workshop on Distributed Crypto-Currencies and Consensus Ledgers, Zurich, vol. 310, p. 4 (2016)
An Evolutionary Biometric Authentication Model for Finger Vein Patterns

Saad M. Darwish(1) and Ahmed A. Ismail(2)

(1) Department of Information Technology, Institute of Graduate Studies and Research, Alexandria University, Alexandria, Egypt [email protected]
(2) Higher Institute for Tourism, Hotels and Computers, Seyouf, Alexandria, Egypt [email protected]
Abstract. Finger vein technology is a biometric system that utilizes the vein structure for recognition. Finger vein recognition has gained a great deal of attention because earlier biometric approaches suffered from significant pitfalls, including an inability to handle imbalanced collections and a failure to extract salient features from finger vein patterns. Such disadvantages have led to inconsistency in the optimization algorithm or have reduced its efficiency. The key objective of the research discussed in this paper is to examine the impact of the genetic algorithm on the selection of the optimal feature vector for the finger vein. This is done by incorporating multiple levels of control genes (a hierarchical genetic algorithm) to boost the variability of features inside the feature vector and to minimize the correlation among features. The boosted feature selection method yields an ideal feature vector that can handle large intra-class differences and limited inter-class similarities. The proposed model also reduces the dimension of the finger vein features to diminish duplication, but not at the expense of accuracy. The performance of the proposed model is studied through multiple tests. The findings indicate an overall accuracy increase of 6% relative to some state-of-the-art finger vein recognition systems in the literature.

Keywords: Biometrics · Finger vein recognition · Optimization · Hierarchical genetic algorithm
1 Introduction

Biometrics is seen as the basis of highly robust authentication systems, offering several advantages over traditional systems. The reliability of a biometric is a measure of the extent to which the feature or attribute is sensitive to considerable modification over time; a highly robust biometric does not change significantly over time [1]. A generalized biometric system is a functional combination of five main components, as shown in Fig. 1 [2]. The finger vein lies inside the human body, so it cannot be stolen or counterfeited. These advantages make this tool highly viable for application in commercial places, residences, or other private places [3]. However, the
texture information of the finger vein is limited, and pose variation of the finger may change the finger vein infrared image. These finger vein variations always produce a high intra-class distance between two images from one individual, and so degrade the matching performance, even for accurately segmented images [4].
Fig. 1. Components of biometric system and process flow diagram.
Recently, the incorporation of specialized Genetic Algorithms (GAs) into finger vein-based human identification to improve its performance has received a great deal of attention among researchers working in this field [5]. Unfortunately, the GA chromosome and the phenotype structure are assumed to be fixed or pre-defined (characterized by specific numbers of parameters/genes). Also, the GA cannot be used in real-time applications because the convergence time is unknown. The Hierarchical Genetic Algorithm (HGA) is an improved version of the standard GA that differs from it in the structure of chromosomes. The hierarchical structure consists of parameter and control genes: parameter genes exist at the lowest level, and control genes reside at the levels above them. HGA will search over a more extensive search space and converge to the right solution with a higher grade of accuracy [6].
1.1 Problem Statement and Contribution
The uniqueness, reliability, and tolerance to forgery are the finger vein's main features that make user authentication safe. To develop highly accurate personal identification systems, finger-vein patterns should be extracted accurately from the captured images. In general, choosing the best set of discriminative features to extract from input data is one of the most challenging aspects of designing a verification system. The ideal set of features for separating the data can be found using optimization algorithms. This research aims to implement an optimal finger vein identification model, using HGA to obtain the more discriminative descriptors that can significantly improve biometric system performance. In the suggested model, HGA has been employed instead of the traditional GA to evolve the standard structure of the GA chromosome through two levels of control genes that regulate the parametric genes. This evolved chromosome structure helps to increase the diversity of the genes, thus avoiding convergence to a single optimum (most favourable
solution). Employing the optimal feature selection process produces a narrow search space that can be used to locate a particular subject quickly within a small set of subjects instead of an exhaustive search over the complete database. The rest of the paper is organized as follows: Sect. 2 describes some of the state-of-the-art finger vein identification schemes. A detailed description of the proposed model is given in Sect. 3. In Sect. 4, the results and discussions are given. Finally, conclusions are drawn in Sect. 5.
2 Related Work

Existing studies on finger vein feature extraction can be divided into four main groups [5, 7–9]: (1) filtering or transforming methods; (2) methods that segment the pixels corresponding to the veins and refine or directly compare those pixels; (3) concise extraction methods; and (4) machine learning approaches, such as neural networks. In recent years, deep learning has received much attention from researchers [10–12]. For instance, in [10], the authors suggested a convolutional-neural-network-based finger vein recognition framework and analysed the network's strengths over four publicly accessible databases. The comprehensive set of experiments reported indicates that the recommended solution surpasses a 95% correct recognition rate on all four publicly accessible datasets. However, the neural network application faces some difficulties: hyper-parameter tuning is non-trivial, proper training needs a big dataset, the network is still a black box, and it is comparatively slow. Some pioneering work has been done recently by incorporating wavelet transformation to characterize the finger vein patterns explicitly. In [13], a new technique is introduced to extract vein patterns from near-infrared images; it uses directional wavelet transformation and eight-way neighbourhood methods to reduce the necessary computational expense and to conserve essential details from low-resolution images. However, greater complexity translates into more resources required to perform the computation - more memory, processor cycles, and time. Furthermore, the flexibility of DWTs is a two-edged sword - it is sometimes very difficult to choose which basis to use. The research presented in [14] delivers a new scheme for finger vein patterns that utilizes a block standard local binary pattern template and a block two-dimensional principal component analysis approach to minimize data redundancy efficiently. Next, a block multi-scale uniform local binary pattern operator based on an enhanced circular neighbourhood is used to extract the spatial texture properties of finger vein images effectively. See [15] for more details regarding current finger vein identification approaches. According to the review above, past studies were primarily devoted to: (1) developing various forms of feature extraction used to capture finger vein details (spatial/transformation features); (2) failing to resolve problems related to the selection of suitable features from the pools of derived features (most frequently depending on the methods used); and (3) depending, in order to obtain a compact feature vector, on dimensionality-reducing transformation matrices that involve comprehensive calculations when managing all extracted features. However, to the best of our knowledge
based on Google Scholar, no attention has been given to designing new optimal feature-vector techniques or to boosting their vector-matching efficiency.
3 Methodology

Biometric data cannot be measured directly; stable and distinctive features must first be extracted from sensor outputs. The feature selection problem is to take a set of candidate features and select the subset that performs best under some classification system [3, 16]. This proposal's motivation is the valorization of features that maximize the inter-class variation and minimize the intra-class variation of the signatures [16]. The key difficulty in dealing with features produced by finger vein tools is the heterogeneity of features within the same class due to several factors. The proposed finger-vein recognition algorithm consists of two phases: the enrolment (training) phase and the verification (testing) phase. Both stages start with finger-vein image pre-processing. After the pre-processing and the feature extraction step, the finger-vein template database is built for the enrolment stage. For the verification stage, the enrolled finger-vein image is matched with a similar pattern after its optimal features are extracted. Figure 2 shows the flowchart of the proposed model.
Fig. 2. Hierarchical genetic algorithm-based finger vein recognition model.
3.1 Finger Vein Data Acquisition
The database of finger vein images was collected from 123 persons (83 males and 40 females) at Universiti Sains Malaysia [17]. The subjects' ages ranged from 20 to 52 years. Every subject provided four fingers: left index, left middle, right index, and right middle, resulting in a total of 492 finger classes. The captured finger images provide two important features: the geometry and the vein pattern. The spatial and depth resolution of the collected finger images were 640 × 480 pixels and 256 grey levels, respectively.
3.2 Region of Interest Extraction and Pre-processing
Collecting finger vein samples can introduce translational and rotational changes across images picked up from the same finger or individual. Therefore, automated extraction of a region of interest (ROI) that can diminish intra-class deviations is highly desirable: the useful image area is the "Region of Interest," and the area without an effective pattern is discarded first since it only holds background data [16]. In this case, the model can be checked against benchmark finger vein datasets that already provide the ROI. The aim of finger vein image pre-processing is to enhance image features relevant to further processing: interesting details in the image are highlighted, and noise is removed. Finger images include noise along with rotational and translational variations; to eliminate these variations, the images are subjected to pre-processing steps that include image filtering, enhancement, and histogram equalization, as sketched below. See [18, 19] for more details.
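As a concrete illustration of these pre-processing steps, the sketch below applies median filtering and contrast-limited adaptive histogram equalization (CLAHE, in the spirit of the adaptive histogram equalization of [18]) using OpenCV. The kernel size, clip limit, tile grid, and the input file name are all assumptions, not values from the paper.

```python
# Hedged pre-processing sketch: noise filtering followed by contrast
# enhancement; parameters are illustrative assumptions.
import cv2

def preprocess(gray):
    smoothed = cv2.medianBlur(gray, 5)  # remove salt-and-pepper-style noise
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(smoothed)        # adaptive contrast enhancement

roi = cv2.imread("finger_vein_roi.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
enhanced = preprocess(roi)
```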
3.3 Feature Extraction
In machine learning and pattern recognition, feature extraction starts from a preliminary group of measured data and builds derived values (features) intended to be informative and non-redundant [3]. The feature extractor's key objective is to transform the finger vein images into a set of features that are similar for images of the same class and distinctive for images of different classes. From another point of view, feature extraction is a dimensionality reduction process, where an initial set of raw variables is reduced to more manageable groups (features) for processing, while still accurately and completely describing the original data set. Two types of features can be extracted from the images depending on the application: local and global features; see [19] for more details. A combination of global and local features enriches recognition accuracy at the cost of more computational overhead. In this research, the Gabor filter was employed to extract the features of the finger vein pattern. The most important advantage of Gabor filters is their invariance to rotation, scale, and translation. Furthermore, they are robust against photometric disturbances, such as illumination changes and image noise. The Gabor filter-based features are directly extracted from the grey-level images. Three types of information can be found by using Gabor filters: magnitude, phase, and orientation, which can be individually or
jointly applied in different systems. Readers looking for more information on how to extract the Gabor feature vector can refer to [20]; a brief sketch follows.
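A minimal sketch of what a Gabor filter bank over a grey-level ROI might look like is given below, using OpenCV's `getGaborKernel`. The filter parameters, the number of orientations, the mean/std pooling, and the input file name are illustrative assumptions; the paper's exact filter bank is the one described in [20].

```python
# Hedged Gabor feature-extraction sketch (not the paper's exact bank):
# magnitude responses at several orientations are pooled into a vector.
import cv2
import numpy as np

def gabor_features(gray, ksize=21, sigma=4.0, lambd=10.0, gamma=0.5, n_orient=8):
    feats = []
    for k in range(n_orient):
        theta = k * np.pi / n_orient  # filter orientation
        kern = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma, psi=0)
        resp = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kern)
        # Pool mean and std of the magnitude response per orientation.
        feats.extend([np.abs(resp).mean(), np.abs(resp).std()])
    return np.array(feats)

img = cv2.imread("finger_vein_roi.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
vec = gabor_features(img)  # 2 * n_orient feature values
```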
3.4 Feature Selection Based on Hierarchical Genetic Algorithm
Feature selection is a widely researched topic in machine learning. It has been found valuable in reducing complexity and computational cost while improving the accuracy of an identification problem [21]. Obtaining robust features in biometric applications is a hard task because features are influenced by inherent factors such as gender, while outside stimuli like varying illumination can cause further difficulties in feature extraction. The suggested model utilizes a hierarchical genetic algorithm to select the optimal features from the initially chosen set. The use of the HGA is particularly essential for the structural or topological as well as the parametric optimization [22]. Unlike the set-up of conventional GA optimization, where the chromosome structure is assumed to be fixed or pre-defined, HGA operates without these constraints. Its more complicated chromosomes may provide a good new way to solve the problem and have demonstrated better results on complex issues than the conventional GA [22].
Fig. 3. Hierarchical chromosome structure.
One of the main differences between HGA and GA is that HGA can dynamically vary its representation thanks to active and inactive genes, which means that phenotypes of different lengths are available within the same chromosome representation. Hence, HGA will search over a more extensive search space and converge to the right solution with a higher grade of accuracy. The chromosome structures of the conventional GA are assumed pre-defined or fixed, while the HGA works without these constraints: it utilizes multiple levels of control genes introduced hierarchically, as illustrated in Fig. 3 [23]. Since the chromosome structure of HGA is fixed even for different parameter lengths, no extra effort is required to reconfigure the usual genetic operations. Therefore, the standard methods of mutation and crossover may be applied independently to each level of genes, or even to the whole chromosome if it is homogeneous. However, genetic operations that affect the high-level genes can change which genes are active, leading to multiple changes in the lower-level genes. This is precisely why the HGA can not only obtain a good set of system parameters but also reach a minimized system topology [22]. Herein, an instance of
a GA-based feature selection optimization problem can be described formally as a four-tuple (R, Q, T, f) defined as [5, 6, 22, 23]:
• R is the solution space, a collection of n-dimensional Gabor feature vectors. Each bit is a gene that indicates the absence or presence of a feature inside the vector.
• Q is the set of probabilistic operators, such as crossover and mutation (for each level of genes). Crossover exchanges the parents' genes to produce one or two offspring that inherit genes from both parents, increasing the variety of individuals; a single-point crossover is employed here for its simplicity. The goal of mutation is to prevent falling into a locally optimal solution; a uniform mutation is employed for its simple implementation. The selection operator retains the best-fitting chromosome of one generation and selects a fixed number of parent chromosomes; tournament selection is the most popular selection technique in genetic algorithms due to its efficiency and simple implementation.
• T is the set of feasible solutions (new generation populations). Over these generations, the fittest chromosome comes to represent the finger vein vector with its salient elements; this vector specifies the optimal feature grouping according to the identification accuracy.
• f is the objective (fitness) function. The individual with higher fitness wins and joins the mating pool for the probabilistic operators. Herein, the fitness function is computed from the MSE value, which measures the difference between the input image and the matched image. A sketch of this hierarchical selection loop is given below.
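The hierarchical chromosome and the operators named above can be pictured as follows. This is an illustrative toy implementation, not the authors' code: the block size, population settings, rates, and the stand-in MSE function are all assumptions, and the fitness simply negates whatever matching-error function is supplied.

```python
# Toy sketch of the hierarchical chromosome (R, Q, T, f): control genes
# gate blocks of parameter genes; all constants are assumed.
import random

N_FEATURES, BLOCK = 64, 8            # parameter genes; genes per control gene
N_CTRL = N_FEATURES // BLOCK         # one control gene per block

def random_chromosome():
    ctrl = [random.randint(0, 1) for _ in range(N_CTRL)]        # high level
    params = [random.randint(0, 1) for _ in range(N_FEATURES)]  # low level
    return ctrl, params

def active_mask(chrom):
    ctrl, params = chrom
    # A parameter gene counts only if its controlling gene is switched on.
    return [p and ctrl[i // BLOCK] for i, p in enumerate(params)]

def fitness(chrom, eval_mse):
    return -eval_mse(active_mask(chrom))   # lower MSE -> higher fitness

def tournament(pop, fits, k=3):
    return pop[max(random.sample(range(len(pop)), k), key=lambda i: fits[i])]

def crossover(a, b):
    out = []
    for ga, gb in zip(a, b):  # single-point crossover per gene level
        cut = random.randrange(1, len(ga))
        out.append(ga[:cut] + gb[cut:])
    return tuple(out)

def mutate(chrom, rate=0.05):
    # Uniform bit-flip mutation applied at every level.
    return tuple([1 - g if random.random() < rate else g for g in level]
                 for level in chrom)

def evolve(eval_mse, pop_size=10, generations=10):
    pop = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(c, eval_mse) for c in pop]
        pop = [mutate(crossover(tournament(pop, fits), tournament(pop, fits)))
               for _ in range(pop_size)]
    return max(pop, key=lambda c: fitness(c, eval_mse))

# Toy stand-in for the MSE criterion: prefer roughly 20 active features.
best = evolve(lambda mask: abs(sum(mask) - 20) / N_FEATURES)
```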
3.5 Building Finger Vein Feature Set
In the training phase, once the hierarchical genetic procedure has selected an optimal feature set, the features are extracted for each training sample. They are stored in a database that contains each finger vein image, its feature vector, and the label identifying the user. In general, the proposed model's quality depends on the number of feature vectors stored per user, which is verified experimentally; increasing the number of feature vectors per user increases the required database storage. In this work, given the training data set, the suggested model builds the finger vein feature database. In the testing phase, given a query sample P_E from the test dataset, the feature extraction stage is applied to obtain F_PE. Then the distance between F_PE and each of the K class centers C_i (the feature vector for each user) is calculated as d_i = ||F_PE - C_i||, i = 1, 2, ..., K. The query sample is then allocated to the cluster with the smallest distance [24].
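The matching rule reduces to a nearest-class-center search; a brief sketch with toy data, assuming each user is represented by a single center as in the text, follows.

```python
# Sketch of the matching rule above: assign a probe to the user whose class
# center C_i minimizes d_i = ||F_PE - C_i|| (names follow the text).
import numpy as np

def identify(f_pe, centers):
    """centers: (K, d) array of per-user feature centers; f_pe: (d,) probe."""
    d = np.linalg.norm(centers - f_pe, axis=1)  # d_i = ||F_PE - C_i||, i = 1..K
    return int(np.argmin(d)), float(d.min())    # winning class and its distance

centers = np.random.rand(5, 16)                 # toy database: K = 5 users
probe = centers[2] + 0.01 * np.random.randn(16)
print(identify(probe, centers))                 # -> (2, small distance)
```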
4 Experimental Results

To verify the suggested model, many experiments were conducted on the benchmark Finger Vein USM (FV-USM) Database [17]. The database contains the finger vein information with the extracted ROI (region of interest). The database's images
were collected from 123 individuals (83 males and 40 females) who were staff and students of Universiti Sains Malaysia. The subjects' ages ranged from 20 to 52 years. Every subject provided four fingers: left index, left middle, right index, and right middle, resulting in a total of 492 finger classes. Each finger was captured six times in one session, and each individual participated in two sessions separated by more than two weeks. In the first session, a total of 2952 (123 × 4 × 6) images were collected; therefore, from two sessions, a total of 5904 images from 492 finger classes were obtained. The spatial and depth resolution of the captured finger images were 640 × 480 pixels and 256 grey levels, respectively. These experiments were run with the following configuration parameters: Generation Number (GN) = 10, Population Size (PS) = 10, Mutation Ratio = 0.3, Crossover Ratio = 0.8. The suggested system was implemented in MATLAB (2017) on a laptop with the following specifications: Processor: Intel(R) Core(TM) i7-7500U CPU @ 2.70 GHz (up to 2.90 GHz); installed memory (RAM): 8 GB; system type: 64-bit operating system, x64-based processor; Microsoft Windows 10 Enterprise as the operating system. The identification accuracies achieved by the state-of-the-art finger-vein-based biometric systems [10, 13, 14] discussed in the literature survey section are reported in Table 1, together with the performance obtained by the proposed HGA-based approach when using the same training and testing strategies. Herein, the Equal Error Rate (EER) was used for evaluation: the EER is the point where the false acceptance rate and the false rejection rate intersect, and a system with a lower EER is regarded as more accurate (a computation sketch follows Table 1). As can be seen from the reported accuracies, our HGA-based identification model achieves better results. As shown in Table 1, the CNN-based approach performs poorly when classifying finger vein images compared to the proposed model; the performance of deep learning finger vein identification methods can be enhanced by employing large datasets, so there is a need for a large finger vein image dataset.

Table 1. Performance evaluation of various finger vein classification techniques (using 3 training images per session)
Method Convolutional-neural- network-based finger vein recognition framework Directional wavelet transformation and eight-way neighbourhood methods Two-dimensional principal component analysis approach Features selection using HGA
EER 0.086 0.163 0.124 0.079
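The EER in Table 1 is obtained where the false acceptance and false rejection rates cross. A hedged sketch of one common way to compute it from genuine/impostor score samples (synthetic scores here; the paper does not give its scoring code) is shown below.

```python
# Illustrative EER computation: sweep a threshold over similarity scores
# and find where FAR and FRR intersect.
import numpy as np

def eer(genuine, impostor):
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejects
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2

gen = np.random.normal(0.8, 0.1, 1000)  # toy genuine similarity scores
imp = np.random.normal(0.4, 0.1, 1000)  # toy impostor similarity scores
print(eer(gen, imp))
```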
The second set of experiments verifies that the feature selection module enhances accuracy. Herein, the adaptive feature selection procedure is implemented to find the most significant features, which helps to reduce the total
assessment time with no loss of accuracy. The suggested model is applied as a selection method using both HGA and conventional GA. Tables 2 and 3 show the detailed confusion matrices for finger vein recognition based on HGA and GA, where the population size is set to 100 chromosomes and the maximum number of generations is set to 20 for a large search space. The crossover and mutation rates are set to 60% and 40%, respectively.
Table 2. Confusion matrix for HGA-based identification (average)

Prediction \ Actual | Within class | Between class
Within class        | 97.2%        | 2.8%
Between class       | 2.8%         | 97.2%
Table 3. Confusion matrix for GA-based identification (average)

Prediction \ Actual | Within class | Between class
Within class        | 91.3%        | 8.7%
Between class       | 8.7%         | 91.3%
As shown in Tables 2 and 3, the proposed HGA-based identification model achieves approximately 6% higher accuracy than the traditional GA method and reduces the false acceptance rate. One explanation for this result is that the HGA relies on active and inactive genes, so phenotypes of different lengths are available within the same chromosome representation. Hence, HGA searches over a larger space and converges to the right solution with a higher grade of accuracy, whereas the chromosome structures of the conventional GA are assumed pre-defined or fixed. Utilizing HGA forces the GA to maintain a heterogeneous population throughout the evolutionary process, thus avoiding convergence to a single optimum. In general, the feature selection problem has a multimodal character because multiple optimal solutions can be found in the search space [9].
5 Conclusions and Future Work

This study has provided an effective personal identification model based on the finger vein. The enhanced finger vein images were characterized by fusing local and global features obtained from the vein's Gabor transformation. Using the hierarchical genetic algorithm offers, in general, several benefits as a means to identify the optimal features for finger vein identification. The suggested model can exploit control genes to generate discriminative feature vectors from small datasets. Thus, the proposed model can operate on a small data set instead of
using a large data sample to create a significant feature vector, which in turn takes a considerable amount of time. Due to its relatively low computational complexity in an online process, the proposed model is well suited for mobile applications. Future research involves using fuzzy logic to improve representations of finger veins and minutiae extraction for matching.
References

1. Jaiswal, S.: Biometric: case study. J. Glob. Res. Comput. Sci. 2(10), 19–48 (2011)
2. Vishi, K., Yayilgan, S.: Multimodal biometric authentication using fingerprint and iris recognition in identity management. In: IEEE International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 334–341, China (2013)
3. Van, H., Thai, T.: Robust finger vein identification base on discriminant orientation feature. In: Seventh International Conference on Knowledge and Systems Engineering, pp. 348–353, Vietnam (2015)
4. Liu, Z., Yin, Y., Wang, H., Song, S., Li, Q.: Finger vein recognition with manifold learning. J. Netw. Comput. Appl. 33(3), 275–282 (2010)
5. Hani, M., Nambiar, V., Marsono, M.: GA-based parameter tuning in finger-vein biometric embedded systems for information security. In: IEEE International Conference on Communications, pp. 236–241, China (2012)
6. Qi, D., Zhang, S., Liu, M., Lei, Y.: An improved hierarchical genetic algorithm for collaborative optimization of manufacturing processes in metal structure manufacturing systems. Adv. Mech. Eng. 9(3), 1–10 (2017)
7. He, C., Li, Z., Chen, L., Peng, J.: Identification of finger vein using neural network recognition research based on PCA. In: IEEE International Conference on Cognitive Informatics & Cognitive Computing, pp. 456–460, UK (2017)
8. Kono, M., Ueki, H., Umemura, S.: Near-infrared finger vein patterns for personal identification. Appl. Opt. 41(35), 7429–7436 (2002)
9. Wu, J., Liu, C.: Finger-vein pattern identification using principal component analysis and the neural network technique. J. Expert Syst. Appl. 38(5), 5423–5427 (2011)
10. Das, R., Piciucco, E., Maiorana, E., Campisi, P.: Convolutional neural network for finger-vein-based biometric identification. IEEE Trans. Inf. Forensics Secur. 14(2), 360–373 (2019)
11. Liu, Y., Ling, J., Liu, Z., Shen, J., Gao, C.: Finger vein secure biometric template generation based on deep learning. Soft. Comput. 22(7), 2257–2265 (2017)
12. Jalilian, E., Uhl, A.: Improved CNN-segmentation-based finger vein recognition using automatically generated and fused training labels. In: Handbook of Vascular Biometrics, pp. 201–223. Springer, Cham (2020)
13. Chih-Hsien, H.: Improved finger-vein pattern method using wavelet-based for real-time personal identification system. J. Imaging Sci. Technol. 62(3), 304021–304028 (2018)
14. Hu, N., Ma, H., Zhan, T.: Finger vein biometric verification using block multi-scale uniform local binary pattern features and block two-directional two-dimension principal component analysis. Optik 208(1), 1–10 (2020)
15. Mohsin, A., Zaidan, A., Zaidan, B., Albahri, O., et al.: Finger vein biometrics: taxonomy analysis, open challenges, future directions, and recommended solution for decentralised network architectures. IEEE Access 8(8), 9821–9845 (2020)
16. Parthiban, K., Wahi, A., Sundaramurthy, S., Palanisamy, C.: Finger vein extraction and authentication based on gradient feature selection algorithm. In: IEEE International Conference on the Applications of Digital Information and Web Technologies, pp. 143–147, India (2014)
17. Ragan, R., Indu, M.: A novel finger vein feature extraction technique for authentication. In: IEEE International Conference on Emerging Research Areas: Magnetics, Machines and Drives, pp. 1–5, India (2014)
18. Yang, J., Zhang, X.: Feature-level fusion of fingerprint and finger-vein for personal identification. Pattern Recogn. Lett. 3(5), 623–628 (2012)
19. Iqbal, K., Odetayo, M., James, A.: Content-based image retrieval approach for biometric security using color, texture and shape features controlled by fuzzy heuristics. J. Comput. Syst. Sci. 78(1), 1258–1277 (2012)
20. Veluchamy, S., Karlmarx, L.: System for multimodal biometric recognition based on finger knuckle and finger vein using feature-level fusion and k-support vector machine classifier. IET Biomet. 6(3), 232–242 (2016)
21. Unnikrishnan, P.: Feature selection and classification approaches for biometric and biomedical applications. Ph.D. thesis, School of Electrical and Computer Engineering, RMIT University, Australia (2014)
22. Xiang, T., Man, K., Luk, K., Chan, C.: Design of multiband miniature handset antenna by MoM and HGA. Antennas Wirel. Propag. Lett. 5(1), 179–182 (2006)
23. Guenounou, O., Belmehdi, A., Dahhou, B.: Optimization of fuzzy controllers by neural networks and hierarchical genetic algorithms. In: Proceedings of the European Control Conference (ECC), pp. 196–203, Greece (2007)
24. Itqan, S., Syafeeza, A., Saad, N., Hamid, N., Saad, W.: A review of finger-vein biometrics identification approaches. Int. J. Sci. Technol. 9(32), 1–8 (2016)
A Deep Blockchain-Based Trusted Routing Scheme for Wireless Sensor Networks

Ibrahim A. Abd El-Moghith and Saad M. Darwish

Department of Information Technology, Institute of Graduate Studies and Research, Alexandria University, Alexandria, Egypt {ibrahim.abd.el.moghith,saad.darwish}@alexu.edu.eg
Abstract. Routing is one of the most important operations in Wireless Sensor Networks (WSNs), as it deals with data delivery to base stations. Routing attacks can easily destroy or significantly degrade the operation of WSNs, so a trustworthy routing scheme is essential to ensure routing protection and WSN efficiency. There is a range of studies on boosting trustworthiness between routing nodes, such as cryptographic schemes, trust protection, or centralized routing decisions. Nonetheless, most routing schemes are impractical in real cases because it is challenging to efficiently classify the untrusted actions of routing nodes, and there is still no effective way to prevent malicious node attacks. In view of these problems, this paper proposes a trusted routing scheme that fuses a deep-chain with Markov Decision Processes (MDPs) to improve routing security and efficiency for WSNs. The proposed model relies on a proof-of-authority mechanism inside the blockchain network to authenticate the sending node. The collection of validators needed for proofing is selected by a deep learning technique based on the characteristics of each node. In turn, MDPs are implemented to select the right next hop as a forwarding node that can transfer messages quickly and safely. The experimental results show that even in a routing environment with 50% malicious nodes, our routing scheme still shows good delay performance compared with other routing algorithms.

Keywords: Wireless Sensor Networks · Trusted routing · Blockchain · Deep-chain · Markov Decision Processes
1 Introduction

The multi-hop routing technique is one of the critical technologies of WSNs. Nevertheless, the distributed and dynamic features of WSNs make multi-hop routing vulnerable to various patterns of attack and thus seriously affect security. Classical secure routing schemes target specific malicious or selfish attacks; they are not suitable for multi-hop distributed WSNs, as they mainly rely on authentication mechanisms and encryption algorithms [1]. In certain routing algorithms, routing nodes cannot recognize the truth of the routing information released by other routing nodes. A malicious node may broadcast fake queue-length information to increase its probability of receiving packets, thereby affecting other routing nodes' scheduling. Current routing schemes find it hard to identify such malicious nodes, as
it is difficult to accurately distinguish the real-time change in routing information between two routing nodes [2].
Fig. 1. Black hole attack.
Fig. 2. Key elements of blockchain systems.
When a malicious node gets data packets from a neighbor node, it directly discards them and does not forward them to its next-hop neighbor. This creates a data "black hole" in the network that is hard for routing nodes to detect in WSNs (see Fig. 1) [3]. These malicious nodes may be external intrusion attackers or legitimate internal nodes compromised by outside attackers. Trust management has recently become a popular way of ensuring the safety of the routing network; this method lets a routing node effectively select relatively trustworthy routing links. On the other hand, its usage is limited, since the trust values of adjacent routing nodes can only be accessed by a single routing node, which does not fully suit distributed multi-hop WSNs. The blockchain is a trusted, decentralized, self-organized ledger system ideal for WSNs spread across multiple hops. In recent years, there has been a lot of research on blockchain technology for routing algorithms [4]. The
blockchain is a distributed database maintained by multiple nodes that fundamentally deals with trust and security issues. Figure 2 illustrates the main components of the blockchain. Most crucially, the consensus process is how the accounting nodes reach agreement on the validity of a transaction. Several standard consensus algorithms are discussed in [5], among them proof of authority (PoA), a Byzantine Fault Tolerant (BFT) consensus algorithm for permissioned and private blockchains. The algorithm relies on a collection of trustworthy entities (i.e., authorities) known as validators. Validators collect transactions from clients and create and add blocks on the chain. We must, therefore, pay particular attention to the choice of validators; a deep-learning model selects the set of validators based on the properties recorded for each node [6]. Recently, reinforcement learning has been used to overcome routing issues by allowing wireless nodes to observe and gather information from their local operating environment, learn, and make efficient routing decisions on the fly. A standard decision-making approach is to choose the best next hop depending on the current scenario. Many researchers in the literature have introduced MDPs as one of the most suitable decision-making methods for such a random dynamic process. In this case, each hop in the routing process can be considered a state, and at each hop a decision selects one of the best next hops. Then, following these sequential decisions, messages can be efficiently and safely transmitted to the destination [7]. This article provides a modified WSN trusted routing scheme to address the above-mentioned black hole problem. Distinct from other solutions of the same type, the model utilizes proof of authority inside the blockchain network to authenticate the node transmission phase. To accomplish this aim, a deep neural network is used to pick the salient nodes that serve as validators based on the nodes' characteristics. In addition, MDPs are employed for better routing decisions. The remainder of this paper is structured as follows: Sect. 2 provides a summary of current strategies for reliable routing in WSNs. Section 3 describes the proposed routing model. Section 4 offers several experimental results that evaluate the efficiency of the proposed model. Finally, we conclude the paper and lay out future plans in Sect. 5.
2 Related Work

In this section, we review several conventional trustworthy routing strategies for increasing route protection and reliability, then introduce related approaches that apply blockchain to routing schemes, and finally analyze current systems that implement MDPs to make the right judgment for message delivery. The authors in [8] suggested a lightweight low-energy adaptive clustering hierarchy used to detect suspicious behaviors between nodes. As stated in [9], many proposals were presented to provide a stable spatial routing algorithm for a wireless sensor network to classify an incident and transmit the details to the base station. The authors in [10] utilized hierarchical routing algorithms based on several factors, such as the distance between nodes and the base station, the nodes' distribution density, and the residual energy of nodes, to design a secure routing protocol. In [11], the author has
suggested a secure communication and routing architecture that builds security into the design of the routing protocol. A trusted public key management framework is introduced by De la Rocha et al. [12]; the solution replaced conventional public key infrastructures with a blockchain protocol, thereby removing central authentication and offering a decentralized inter-domain routing network. In [13], Li et al. developed a multi-link, concurrent, blockchain-based communications network. Nodes can be marked as malicious or non-malicious, depending on a multi-link integrated factor connectivity-tree algorithm and the behavioral characteristics of the blockchain-based data routing nodes. Ramezan et al. used smart contracts to build a blockchain-based contractual routing protocol for networks with untrusted nodes [14]. The key idea is that the source node verifies each hop's routing arrival through the smart contract, and nodes with malicious behavior are recorded; subsequent packets will no longer move through an identified malicious node. However, under the tokens algorithm a malicious node may falsely claim that packets were received, so security hazards remain. Recent advances in MDP solvers have made solutions feasible for large-scale structures and launched new future work in WSNs. For instance, the authors in [15] used MDPs to establish a transmission-power-level-controlled routing protocol for WSNs; the preferred power level is selected by identifying the optimal policy for the MDP configuration. In [16], the authors proposed a solution for managing the selection of nodes in event-detection applications using mobile nodes equipped with a directional antenna and a global positioning system; however, the approach fails to balance the energy usage of the next transmitting nodes. To summarize, most secure protocols offer protection against replay and routing-table poisoning attacks but lack an adequate defense against black hole attacks. Current blockchain-based routing protocols depend on the proof-of-work principle to authenticate transactions (packets), which incurs additional processing overhead. Unlike these protocols, the proposed scheme relies on proof of authority for authentication, which requires less computational time as it depends only on a few key nodes (validators).
3 The Proposed Secure Routing Scheme

This scheme aims primarily to construct a reliable, trustworthy routing for wireless sensor networks, using an integration of the deep chain and Markov decision-making to minimize computational overhead. The main diagram of the proposed scheme, shown in Fig. 3, consists of three phases: constructing a node data structure, electing validators through a deep learning model, and optimizing the next hop via MDPs. The following subsections discuss each phase in detail.
Fig. 3. The proposed trusted routing scheme
3.1 Step 1: Build Node Data Structure
In the beginning, all sensors run the same processes and have no role as validators or minions. Each sensor has a unique identifier (e.g., an address); the sensors are not anonymous. All packets have equal size in every transmission. Two forms of data transmission exist in wireless sensor networks: direct transmission and multi-hop data transfer (see [17] for more details); in this case, multi-hop data transmission is used. With symmetrical communication, every node in the WSN has the same initial energy, and the nodes remain static. The role of each node, initially unset, is assigned as either validator or minion during initialization. Each node in the network holds a data structure with various pieces of information on the node's properties, such as the selected role (validator or not), energy level, coverage, connectivity, and the number of its neighbors, as described in Fig. 4 with an example; see [17] for more details.
Status (S) | Action | Coverage | Connectivity | Energy level (E) | No. neighbors (ζ)
4          | 1      | 0        | 1            | 1                | 30

Fig. 4. Filled node's data structure
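One way to encode the per-node record of Fig. 4 is shown below. The field names mirror the figure, and the example values are the ones shown there; the type choices are assumptions for illustration.

```python
# Illustrative encoding of the per-node record sketched in Fig. 4.
from dataclasses import dataclass

@dataclass
class NodeRecord:
    status: int        # S: node status code
    action: int        # 1 = elected validator, 0 = minion (assumed encoding)
    coverage: int      # sensing-coverage flag
    connectivity: int  # connectivity flag
    energy_level: int  # E: residual-energy level
    n_neighbors: int   # ζ: number of one-hop neighbors

example = NodeRecord(status=4, action=1, coverage=0,
                     connectivity=1, energy_level=1, n_neighbors=30)
```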
3.2 Step 2: Validators Election Using Deep Neural Learning and Node Authentication
After the data structure has been established for each node, the features of these nodes are used to select the most prominent nodes, which serve as validators in the blockchain's proof-of-authority framework. The selection is based on a deep neural network. Deep learning approaches learn feature hierarchies in which higher-level features are built from lower-level ones. The activation potentials that the individual input measurements produce in the first hidden layer are used to pick the most suitable features, which give better classification than the initial high-dimensional characteristics. Herein, stacked RBMs (a Deep Belief Network) are employed; see [18] for more details. The suggested scheme uses the blockchain network - essentially a distributed ledger with tamper-resistant, decentralized, and traceable functionality - throughout the wireless sensor network to improve the trust and reliability of the routing information. Blockchain token transactions are used to record the related information of each node. The main framework is divided into two parts: the actual routing network and the blockchain network. In general, packets from the source terminal to the destination terminal are transmitted to a routing node Ri; this node then selects the next-hop routing node Rp via the routing policy obtained by the local learning model (MDP in our case). The local learning model continuously searches and collects status information from the blockchain network for appropriate network routing. Upon continuous transmission, the packets are sent to the target routing node Rt and then to the destination terminal. Every blockchain platform provides a consensus algorithm to ensure the fairness of blockchain transactions; in our blockchain network, we use the Proof of Authority (PoA) consensus algorithm, which can handle transactions more effectively. In our scenario, there are two types of entities in the PoA blockchain network: (1) the validators, pre-authenticated blockchain nodes with advanced authorization, whose tasks include smart contract execution, blockchain transaction verification, and block release on the blockchain. In this case, a deep neural network selects the validators; if a malicious validator occurs, it can attack at most one contiguous block, and the other validators' votes will remove it. (2) The minions, less-privileged nodes that cannot perform the verification work of validators in the PoA blockchain. Every routing node in our
system is also a minion: it has fewer blockchain privileges and a unique blockchain address. Minions may execute token contracts, activate other contract functions, and check transaction information on the blockchain. We use blockchain tokens throughout the network to represent the packets sent to the target nodes; the purpose of a token is to reflect the digital details of the related packet contained in the smart contract. Routing nodes initiate token contracts to generate tokens and map the packets' status information, and they perform token transfers with each other by way of the token contract. Token transfers cannot be arbitrarily altered by malicious nodes thanks to the consensus mechanism among the validator nodes; throughout its lifetime, the same token represents a packet exchanged through the routing nodes. A sketch of the validator-election step is given below.
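The validator-election step can be pictured as scoring each node's feature record and promoting the top scorers. The sketch below is a deliberately simplified stand-in: it uses a tiny randomly initialized two-layer network rather than the trained stacked-RBM model the paper describes, and the feature matrix, `k`, and layer width are assumptions.

```python
# Simplified validator election: score node records with a small network
# and promote the top-k scorers (the paper trains stacked RBMs instead).
import numpy as np

def elect_validators(node_features, k=3, hidden=8, seed=0):
    """node_features: (N, d) matrix built from records like Fig. 4."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(size=(node_features.shape[1], hidden))
    w2 = rng.normal(size=(hidden, 1))
    h = np.tanh(node_features @ w1)   # hidden-layer activations
    scores = (h @ w2).ravel()         # one suitability score per node
    return np.argsort(scores)[-k:]    # indices of elected validators

feats = np.random.rand(10, 6)   # toy records: 10 nodes x 6 fields
print(elect_validators(feats))  # e.g., the 3 highest-scoring node indices
```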
3.3 Step 3: Next Hop Selection Using MDPs
The key question in WSN routing is how best to choose the next hop at every step. As mentioned in the literature, the key factors in next-hop decision making are trust, congestion probability, and distance to the target; readers looking for more information on how to compute these factors can refer to [7]. Choosing the optimal next hop is a standard decision-making problem based on the current situation, and we implement MDPs to address it, as they are one of the better choices for a random dynamic system. Each hop on the route can be seen as a state, and at each hop a decision selects one of the best next hops. The decision at each stage relies on the current scenario, and the whole routing method is a chain of decisions. Because the number of hops from source to destination is finite, we use a finite Markov decision process to solve the problem: to find a sequence of better hops among the candidates, we use the optimal decision metrics in the routing process as the decision criterion, constructing a finite Markov decision control system. Since the wireless sensor network is a global network, centralized computation is unattractive for establishing a single path; every node is therefore responsible for measuring and deciding at each hop. Thereby, we treat the next-hop decision as a one-step decision-making process whose purpose is to optimize the reward for each move, as sketched below.
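The one-step decision can be sketched as a greedy reward maximization over neighbor candidates, with the reward combining the three factors named above. The weights and the neighbor attributes below are illustrative assumptions, not values from the paper.

```python
# One-step next-hop decision: maximize a reward combining trust,
# congestion probability, and distance to the sink (weights assumed).
def next_hop(neighbors, w_trust=0.5, w_cong=0.3, w_dist=0.2):
    """neighbors: list of dicts with 'id', 'trust', 'congestion', 'distance'."""
    def reward(n):
        return (w_trust * n["trust"]
                - w_cong * n["congestion"]
                - w_dist * n["distance"])
    return max(neighbors, key=reward)["id"]

candidates = [
    {"id": "R1", "trust": 0.9, "congestion": 0.2, "distance": 0.5},
    {"id": "R2", "trust": 0.6, "congestion": 0.1, "distance": 0.3},
]
print(next_hop(candidates))  # -> "R1"
```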
4 The Experimental Results

The goal of this set of experiments is to check the token transaction latency of the proposed scheme. In this scenario, three kinds of malicious nodes are allocated and expected to appear: (i) a malicious node releases falsely low (10% of the exact amount) queue-length information but still transmits packets to other routing nodes; (ii) a malicious node releases the correct queue-length information but does not send any packet to other routing nodes; (iii) a malicious node releases falsely low queue-length information and does not transmit any packet to other routing nodes. We use the transaction packaging period as an estimate of the average token transaction latency, which tracks the time span that miners spend on the token
transaction. We report the token transaction latency of the PoA and PoW blockchain systems as the arrival rate λ increases. The experimental results are shown in Fig. 5; we can see that the transaction latency is relatively stable and does not fluctuate much with the arrival rate λ. The average transaction latency of our PoA blockchain system was around 0.29 ms, while that of the PoW blockchain system was around 0.52 ms. The results show that our blockchain system based on the PoA consensus mechanism saves about 44% of the transaction latency. Such a token transaction delay is acceptable and has little impact on routing scheduling; it is practical and efficient to use our PoA blockchain system to collect and manage routing scheduling information.
Fig. 5. Average transaction latency for both PoA and PoW-based blockchain systems
The second set of experiments validates the efficiency of the suggested trusted routing scheme in terms of token transaction throughput. Figure 6 reveals that the token transaction throughput rises steadily as the rate of concurrent requests increases, and the curve progressively flattens as the throughput reaches its peak. The token transaction throughput of our blockchain system using the PoA consensus mechanism stabilizes at 3300 concurrent requests per second, whereas that of the classical blockchain system using the PoW consensus mechanism stabilizes at only around 1900 concurrent requests per second. We can see from the experimental results that the PoA-based scheme processes transactions more efficiently under high request concurrency, owing to the limited number of validators. It is suitable and valid to take the PoA algorithm as the consensus mechanism of the blockchain system; this PoA blockchain-based routing scheduling scheme can effectively cope with large volumes of concurrent requests in the routing environment.
Fig. 6. Token transaction throughput for both PoA and PoW-based blockchain systems
5 Conclusions and Future Work

In this paper, we proposed a trusted routing scheme that fuses a deep-chain with Markov decision processes to improve the routing network's performance. We use blockchain tokens to represent the routing packets, and each routing transaction is released to the blockchain network after confirmation by the validator nodes. Routing nodes can track dynamic and trustworthy routing information on the blockchain network because each routing transaction record is traceable and tamper-resistant. We also define the MDP model in depth for efficient discovery of the right route and for preventing routing links to hostile nodes. Our test results indicate that our scheme can effectively resist the attacks of hostile nodes and that the system latency is excellent. In the future, we plan to apply our approach to more routing scheduling algorithms besides the backpressure algorithm to verify its effectiveness and portability. We also plan to incorporate blockchain-based data validation technology.
References

1. Yang, J., He, S., Xu, Y., Chen, L., Ren, J.: A trusted routing scheme using blockchain and reinforcement learning for wireless sensor networks. Sensors 19(4), 1–19 (2019)
2. Jiao, Z., Zhang, B., Li, C., Mouftah, H.T.: Backpressure-based routing and scheduling protocols for wireless multihop networks: a survey. IEEE Wirel. Commun. 23(1), 102–110 (2016)
3. Ahmed, F., Ko, Y.: Mitigation of black hole attacks in routing protocol for low power and lossy networks. Secur. Commun. Netw. 9(18), 5143–5154 (2016)
4. Gomez-Arevalillo, A., Papadimitratos, P.: Blockchain-based public key infrastructure for inter-domain secure routing. In: International Workshop on Open Problems in Network Security, pp. 20–38, Italy (2017)
5. Bach, L.M., Mihaljevic, B., Zagar, M.: Comparative analysis of blockchain consensus algorithms. In: Proceedings of the IEEE International Convention on Information and Communication Technology, Electronics and Microelectronics, pp. 1545–1550, Croatia (2018)
6. Bogner, A., Chanson, M., Meeuw, A.: A decentralized sharing app running a smart contract on the Ethereum blockchain. In: Proceedings of the 6th International Conference on the Internet of Things, pp. 177–178, Germany (2016)
7. Wang, E., Nie, Z., Du, Z., Ye, Y.: MDPRP: Markov decision process based routing protocol for mobile WSNs. Commun. Comput. Inf. Sci. Book Series 698, 91–99 (2016)
8. Wang, Y., Ye, Z., Wan, P., Zhao, J.: A survey of dynamic spectrum allocation based on reinforcement learning algorithms in cognitive radio networks. Artif. Intell. Rev. 51(3), 493–506 (2018). https://doi.org/10.1007/s10462-018-9639-x
9. Arfat, Y., Shaikh, A.: A survey on secure routing protocols in wireless sensor networks. Int. J. Wirel. Microw. Technol. 6(3), 9–19 (2016)
10. Deepa, C., Latha, B.: HHCS: hybrid hierarchical cluster based secure routing protocol for wireless sensor networks. In: Proceedings of the IEEE International Conference on Information Communication and Embedded Systems, pp. 1–6, India (2014)
11. Khan, F.: Secure communication and routing architecture in wireless sensor networks. In: Proceedings of the IEEE 3rd Global Conference on Consumer Electronics, pp. 647–650, Japan (2014)
12. De la Rocha, A., Arevalillo, G., Papadimitratos, P.: Blockchain-based public key infrastructure for inter-domain secure routing. In: Proceedings of the International Workshop on Open Problems in Network Security, pp. 20–38, Italy (2017)
13. Li, J., Liang, G., Liu, T.: A novel multi-link integrated factor algorithm considering node trust degree for blockchain-based communication. KSII Trans. Internet Inf. Syst. 11(8), 3766–3788 (2017)
14. Ramezan, G., Leung, C.: A blockchain-based contractual routing protocol for the Internet of Things using smart contracts. Wirel. Commun. Mob. Comput. 2018, 1–14 (2018). Article ID 4029591
15. Rehan, W., Fischer, S., Rehan, M., Husain, M.: A comprehensive survey on multichannel routing in wireless sensor networks. J. Netw. Comput. Appl. 95, 1–25 (2017)
16. Kim, H.-Y.: An energy-efficient load balancing scheme to extend lifetime in wireless sensor networks. Clust. Comput. 19(1), 279–283 (2016)
17. Darwish, S., El-Dirini, M., Abd El-Moghith, I.: An adaptive cellular automata scheme for diagnosis of fault tolerance and connectivity preserving in wireless sensor networks. Alexandria Eng. J. 57(4), 4267–4275 (2018)
18. Wang, T., Wen, C.K., Wang, H., Gao, F., Jiang, T., Jin, S.: Deep learning for wireless physical layer: opportunities and challenges. China Commun. 14(11), 92–111 (2017)
A Survey of Using Blockchain Aspects in Information Centric Networks

Abdelrahman Abdellah1,3(B), Sherif M. Saif1, Hesham E. ElDeeb1, Emad Abd-Elrahman2, and Mohamed Taher3

1 Electronics Research Institute of Egypt, Cairo, Egypt
{abdosheham,sherif saif,eldeeb}@eri.sci.eg
2 National Telecommunication Institute, Cairo, Egypt
[email protected]
3 Computer and Systems Engineering Department, Ain Shams University, Cairo, Egypt
Abstract. Today's Internet architecture was built around a host-centered networking paradigm that served the needs of early Internet users. Since then, the use of the Internet has grown, with most people mainly interested in collecting large quantities of information independent of their physical location, and usually with immediate demands. Hence, the needs placed on the Internet have taken a new form and changed the web paradigm, and a stronger need for security and connectivity has arisen. This has driven researchers to think about a fundamental change to the architecture of the Web. In this respect, we review many research attempts that explored information centric networking (ICN) and how it can be coupled with blockchain technology to strengthen its security aspects and to develop an effective model for the creation of the future Internet.
Keywords: Information Centric Network · Blockchain · Internet infrastructure · Future Internet

1 Introduction
The concept of the information centric network (ICN) is a promising common approach across various prospective Internet research projects. The technology involves in-network caching, multi-sided communication via replication, and models of interaction that disassociate senders and receivers [1]. The objective is to provide a better-suited network infrastructure service that can be more resilient to disruption and failure. The ICN concept has appeared under different names such as Named-Objects, Named-Data, and Information Aware Network [2]. ICN is a shifting paradigm from a user-centric to a content-centric one [3]. Since the Internet is mainly used to share information rather than to connect pairs of end users, ICN intends to serve existing and anticipated requirements [3] better than the current Internet architecture,
whose networking size is expanding remarkably in terms of memory cost, power computations, and processing complexity [4]. Furthermore, the current traditional Internet architecture faces many problems regarding content availability, such as:

– content moved within the site;
– content moved to a different site, or the site changed domain;
– source temporarily unreachable (e.g., server overloaded);
– content permanently unavailable (e.g., company out of service).
In contrast, ICN adopts an in-network caching model and multicast mechanisms, naming information at the network layer to enable effective, real-time delivery of information to consumers. However, despite the fact that the ICN model is an up-and-coming solution that tackles many issues of the current Internet, ICN still faces many challenges that should be considered. One of these challenges is the possibility of tampering with the original data when the publisher registers its content in ICN nodes. Another challenge is that a malicious ICN node may refuse to forward data to other ICN nodes or users, causing additional delay in the network. To this end, with the development of blockchain technology, most ICN problems can be solved using the powerful security aspects of this technology. Under the blockchain paradigm, all executed transactions carry the ICN node behaviors [5] and are then committed to the global blockchain. Each blockchain ledger stores a copy of the data; thus, no ICN node can deny or refuse the transactions that have been committed to the blockchain. The blockchain can achieve global agreement on the whole sequence of contents, so an incompatible record/transaction will be rejected before it is confirmed. These non-repudiation and non-tampering characteristics of the blockchain guarantee the secure availability of contents in ICN. The rest of the paper is organized as follows. Section 2 presents related work and previous ICN implementations, followed by a comparison among them in terms of the applied security mode. Section 3 describes how the blockchain can be used as a solution in ICN, and Sect. 4 presents the role of blockchain in protecting the public key system of ICN. Finally, a brief summary and conclusion are provided in Sect. 5.
2 Related Work and Previous Implementations

2.1 Data Oriented Network Architecture (DONA)
DONA radically changes naming by substituting flat names for hierarchical URLs. Unlike URLs, which are tied to specific locations by their DNS portion, the flat names in Data Oriented Network Architecture (DONA) may remain permanent even if the data changes. This allows the caching and replication of information at the network layer, thus increasing the availability of information. DONA allows explicit requests to be served by copies other than the closest one. Furthermore, DONA uses the existing IP mechanisms to protect its nodes against security attacks [6]. In this architecture:
– The publisher sends a REGISTER message that contains the name of the object to its local Resolution Handler (RH), which maps between the content's name and its location, in order to store a pointer to the publisher.
– This registration is then propagated by the RH to other RHs.
– A user sends a FIND message to his local RH in order to find a content; the RH spreads this message to other RHs according to their routing policy until an appropriate registry entry is found. The request then follows the pointers created by the RHs to reach the publisher [6] (a minimal sketch of this flow is given below).
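To make the resolution flow concrete, the following is a minimal, hypothetical sketch of DONA-style REGISTER/FIND handling; the ResolutionHandler class, the names, and the topology are illustrative assumptions, not DONA's actual implementation.

```python
# Hypothetical sketch of DONA-style name resolution, assuming a tree of
# Resolution Handlers (RHs); class names and messages are illustrative.

class ResolutionHandler:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent          # next RH toward the tree root
        self.registry = {}            # flat name -> pointer to publisher

    def register(self, flat_name, publisher):
        """REGISTER: store a pointer and propagate it toward the root."""
        self.registry[flat_name] = publisher
        if self.parent is not None:
            self.parent.register(flat_name, publisher)

    def find(self, flat_name):
        """FIND: follow pointers; forward upward if no local entry exists."""
        if flat_name in self.registry:
            return self.registry[flat_name]
        if self.parent is not None:
            return self.parent.find(flat_name)
        return None                   # name not registered anywhere

# Usage: a publisher registers at its local RH, a subscriber FINDs elsewhere.
root = ResolutionHandler("root")
rh_a, rh_b = ResolutionHandler("A", root), ResolutionHandler("B", root)
rh_a.register("sha256:content-1", "publisher@rh_a")
print(rh_b.find("sha256:content-1"))  # resolved via the root RH
```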
2.2 Named Data Networking (NDN)
NDN intends to reconfigure the Internet protocol stack around the exchange of named data over a variety of networking technologies. Names in NDN are hierarchical and may be similar to URLs; every name element may be anything, including a human-readable string or a hash key [7,8]. As shown in the figure below (see Fig. 1), all messages are forwarded hop-by-hop by Content Routers (CRs), which map information names to the output interface(s) that should be used to forward INTEREST messages towards appropriate data sources. The Content Store (CS) serves as a local cache for information objects that have passed through the CR. The subscriber sends an INTEREST message that contains the name of the requested data object. When an information object that matches the requested name is found at a publisher node or in a CS, the INTEREST message is consumed and the information is provided in a DATA message. This message is forwarded back to the subscriber(s) in a hop-by-hop manner. On the other hand, in NDN the subscriber is responsible for trusting the owner of the public key that was used for signing [7,9].
Fig. 1. Named data networking (NDN)
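The hop-by-hop INTEREST/DATA exchange with CS caching can be illustrated with a short sketch; the classes and names below are illustrative assumptions, not the NDN codebase.

```python
# Hypothetical sketch of NDN hop-by-hop forwarding, assuming each Content
# Router (CR) keeps a Content Store (CS) cache; names are illustrative.

class ContentRouter:
    def __init__(self, next_hop=None):
        self.content_store = {}   # CS: name -> data cached from past DATA msgs
        self.next_hop = next_hop  # CR or publisher toward the data source

    def interest(self, name):
        """Forward an INTEREST; a matching CS entry answers it locally."""
        if name in self.content_store:
            return self.content_store[name]          # cache hit
        if self.next_hop is None:
            return None                              # no source reachable
        data = self.next_hop.interest(name)          # forward hop-by-hop
        if data is not None:
            self.content_store[name] = data          # cache the DATA on return
        return data

class Publisher:
    def __init__(self, objects):
        self.objects = objects

    def interest(self, name):
        return self.objects.get(name)

# Usage: the second request is served from the intermediate CR's cache.
pub = Publisher({"/video/frame1": b"..."})
edge = ContentRouter(next_hop=ContentRouter(next_hop=pub))
edge.interest("/video/frame1")   # travels to the publisher
edge.interest("/video/frame1")   # served from the CS
```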
2.3 Scalable and Adaptive Internet Solutions (SAIL)
The SAIL architecture supports many extended properties, such as searching for specific data objects by entering keywords [3]. SAIL is able to combine elements present in NDN and other approaches. Furthermore, SAIL can even operate in a hybrid mode and can be implemented over different routing
and forwarding technologies [10]. In SAIL, the publisher sends a PUBLISH message with its locator to the local Name Resolution System (NRS), which is used to map object names to locators that can be used to reach the corresponding information object; the publisher thereby makes an information object available. The local NRS sends a PUBLISH message to the global NRS [10]. The global NRS stores the mapping between the publisher and the local NRS, replacing any earlier such mapping. If a subscriber is interested in an information object, it sends a GET message to its local NRS, which consults the global NRS in order to return a locator for the object. Finally, the subscriber sends a GET message to the publisher using the returned locator, and the publisher responds with the information object in a DATA message. In line with that, the SAIL architecture relies on hash values in names, which allows self-certification of both the authority and the local part [11].
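A compact sketch of SAIL's two-level name resolution follows, assuming a local NRS that forwards PUBLISH mappings to a global NRS; the class and locator formats are illustrative simplifications, not SAIL's actual message layout.

```python
# Illustrative sketch of SAIL PUBLISH/GET resolution through local and
# global NRS instances; everything here is an assumption for illustration.

class NRS:
    def __init__(self, parent=None):
        self.locators, self.parent = {}, parent

    def publish(self, name, locator):
        self.locators[name] = locator          # replaces any earlier mapping
        if self.parent:                        # local NRS forwards to global
            self.parent.publish(name, locator)

    def get(self, name):
        if name in self.locators:
            return self.locators[name]
        return self.parent.get(name) if self.parent else None

global_nrs = NRS()
local_a, local_b = NRS(global_nrs), NRS(global_nrs)
local_a.publish("ni:sha-256;abc123", "udp://10.0.0.5")   # publisher side
locator = local_b.get("ni:sha-256;abc123")               # subscriber consults
print("GET object via", locator)                          # then fetch DATA
```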
2.4 Convergence
The CONVERGENCE architecture (see Fig. 2) has many similarities with NDN; in fact, its prototype has been implemented as an improvement of the NDN model [12]. Subscribers submit INTEREST messages to request an information object, which are forwarded hop-by-hop by the Border Nodes (BNs) to publishers or to Internal Nodes (INs) that provide caching (arrows 1–3 and 6). Publishers respond with DATA messages, which follow the opposite path (arrows 7–10). Moreover, CONVERGENCE adopts the NDN protection strategy per DATA message, and each DATA message is electronically signed [13].
Fig. 2. Convergence model
2.5 Mobility First
This project is funded by the United States initiative on future network infrastructure [14]. Mobility First offers comprehensive protocols for managing
mobility and cellular communications, as well as multicast. The structure of Mobility First is based on the separation of names from network addresses for all entities connected to the network (including data objects, devices, and services) [16]. A Globally Unique Identifier (GUID) is allocated to each network object in Mobility First via a global naming system that converts human-readable names into GUIDs. Every Mobility First device has to obtain GUIDs for its information objects and its services. In MobilityFirst, all interaction begins via a Global Name Resolution Service (GNRS), with GUIDs converted into network addresses through one or more phases. A publisher wishing to make information available requests a GUID from the naming system and then registers the GUID with its network address in the GNRS. A GUID is assigned to a collection of addresses of GNRS databases that are contacted using regular routing [15]. The subscriber sends a GET message to its local Content Router (CR) that includes the GUID of the requested object along with its own GUID for the response [14]. Furthermore, MobilityFirst proposes a decentralized trust model for name certification, where a GUID can be securely bound to its entity via cryptographic techniques.
2.6 General Comparison Among ICN Architectures in Terms of Security
The main focus of the proposed architectures is data integrity rather than dependence on the media and IP-based solutions. All ICN approaches pinpoint security as an essential problem, especially in the encryption mechanisms used or at the data-naming level of the existing Internet structure. In DONA [6], self-certifying techniques are used to verify that the received data matches the requested data, while the NDN and CONVERGENCE models force the consumers to accept the publisher's public key and its signature [9]. In the Mobility First architecture, GUIDs perform all tasks by using self-certifying hashes and other cryptographic techniques. However, PURSUIT is the only noted model that checks the incoming and outgoing packets in the forwarding nodes as well as in the destination nodes [17] (see Table 1). Many implementations of ICN rely on self-certifying names (see Table 1), which require the network nodes to test whether the name on a packet fits the data within the packet. This makes it hard for consumers to determine whether that information is what they needed, or who published, distributed, and removed it. In fact, ICN solutions depend on cryptographic keys and trusted parties to verify information names. Thus, the need for key management mechanisms is becoming vital; however, very few research papers have addressed this scope [1]. Therefore, blockchain technology has emerged to be used with ICN to tackle these drawbacks, because the formation of trust relationships and effective privacy protection among the different parties involved remains an open issue in ICN.
Table 1. The security modes used in ICN architectures

Architecture name | Signatures | Self-certifying | Packet level authentication
DONA              | No         | Yes             | No
NDN               | Yes        | No              | No
SAIL              | No         | Yes             | No
PURSUIT           | No         | No              | Yes
CONVERGENCE       | Yes        | No              | No
Mobility First    | Yes        | No              | No

3 Using Blockchain as a Solution in ICN
The blockchain is defined as a distributed and tamper-proof ledger that no centralized entity controls, but which can be shared and accessed by all members. Each record is called a block and can be appended to the existing blocks as long as the new block is approved by all ledgers in the network. Using complicated hash functions, blockchain enforces data integrity by preventing any alteration, deletion, or manipulation and by keeping invalid data from being recorded. It may be stated that blockchain fits ICN approaches naturally because of its decentralized nature and its strong security aspects. The integration of blockchain technology and ICN has become more popular recently, and this combination achieved a significant positive impact in many previous studies such as [18,19]. Throughout ICN, all nodes operate together to boost the delivery of content. However, these nodes are vulnerable to many difficulties related to protection and malicious behaviours, such as:

– Denial of Service (DoS) attacks: DoS attacks in ICN abuse the stateful forwarding plane, targeting either the intermediate ICN nodes or the publisher nodes [5].
– Hijacking: a malicious ICN node is able to declare, as a publisher, invalid paths to any content. Since content requests are routed along the declared invalid pathways in ICN, they will not be satisfied in the vicinity of the malicious node.
– Cache pollution: an adversary can repeatedly request less popular content in order to destroy the popularity-based caching in ICN.

For these purposes, the authors of [5] developed Blockchain-Based Information-Centric Networking (BICN). In this model, the blockchain carries all the transactions that record ICN node behaviours. This system efficiently feeds the behavioural reports of ICN nodes into the database to trace and track any fraudulent entities. By applying blockchain and taking account of its advantages in ICN, many security issues can be solved. For example:

– The hash of inappropriate content rarely appears in the blockchain, because each hash value is verified by blockchain miners.
– Jamming and hijacking attacks can be avoided. When an adversary acts as an authorised user to deliver unnecessary or harmful content, it first needs to send a request message to its local RH and broadcast this request to the parent RH. Both the local and parent RHs then upload the request message and the address of the subscriber to the blockchain (see Fig. 3). After that, the blockchain verifies the credibility of the address and the request message. If it is not legal, this unnecessary or malicious content request will be removed.
– The blockchain system is used to find the malicious ICN node behind hijacking attacks. When a malicious ICN node declares an invalid pathway for any content as a publisher, the blockchain can help to detect and delete the fake register message recorded in the blockchain.

Open issues in BICN:

– Privacy concerns: it has been proved that the anonymization of the transaction address still cannot guarantee the anonymity of the users, and some deliberate attacks can still pose threats.
– Compared with Bitcoin, the amount of transactions in BICN is far larger. Hence, the communication network is challenged by broadcasting transactions.

In summary, the main challenge of ICN caching is its distributed in-network nature. Data integrity can be assured by signing the contents from their producers or publishers and also by authenticating the objects interested in these contents (see ICN Research Challenges (2014): https://www.ietf.org/archive/id/draft-kutscher-icnrg-challenges-02.txt). Moreover, the in-network caching proposed by ICN increases the chance of attacks, because of the long-lived nature of content caching. Another issue is the presence of intermediate aware nodes between the publisher/producer and the consumers. Hence, designing a security model for the ICN architecture faces challenges such as: the model is not end to end (i.e., non-transparent), long-lived caches, and secure key management issues for generation, distribution, and refreshing. We consider that blockchain can overcome those challenges and provide an efficient, secure approach for ICN (a sketch of the behaviour-recording idea is given below).
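As a rough illustration of the behaviour-recording idea, the sketch below commits ICN node behaviours to a hash-linked chain so that a committed record cannot later be denied or altered; the transaction fields and names are illustrative assumptions, not the actual BICN format of [5].

```python
# Minimal sketch of committing ICN node behaviours to a tamper-evident chain,
# in the spirit of BICN [5]; all names and fields here are assumptions.
import hashlib, json, time

def block_hash(block):
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

class BehaviourLedger:
    def __init__(self):
        self.chain = [{"index": 0, "prev": "0" * 64, "tx": []}]  # genesis block

    def commit(self, node_id, behaviour):
        """Append a behaviour report (e.g. a register or request message)."""
        block = {
            "index": len(self.chain),
            "prev": block_hash(self.chain[-1]),  # link to the parent block
            "tx": [{"node": node_id, "behaviour": behaviour, "ts": time.time()}],
        }
        self.chain.append(block)

    def verify(self):
        """Altering any committed record breaks the hash link that follows it."""
        return all(self.chain[i]["prev"] == block_hash(self.chain[i - 1])
                   for i in range(1, len(self.chain)))

ledger = BehaviourLedger()
ledger.commit("icn-node-7", "REGISTER /news/item42")
ledger.commit("icn-node-9", "FIND /news/item42")
assert ledger.verify()
ledger.chain[1]["tx"][0]["behaviour"] = "REGISTER /fake/path"  # tampering
assert not ledger.verify()  # detected: the behaviour can no longer be denied
```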
4 The Role of Blockchain in Protecting the Public Key System in ICN
As discussed in Sect. 2, many ICN architectures such as NDN use signatures and a Public Key Infrastructure (PKI) based on Certificate Authorities (CAs) to ensure authentication and data integrity [9]. However, this kind of security mechanism suffers from serious issues. For instance, a CA can be compromised by a hacker to use an unauthorized public key and produce malicious data using a counterfeit certificate. In contrast, blockchain can present a significant solution to this problem by applying a decentralized public key management system. This method helps
many organizations to agree on the status of a shared public key database. To accomplish this aim, blockchain enables recording every digital transaction in a secure, transparent, and non-repudiable manner. The main concept is to establish a blockchain of public keys in each domain (e.g. /com, /net, /gov), maintained by many miners who validate the authenticity of public keys and create blocks containing the certified keys [20]. In other words, instead of relying on an individual CA to issue a certificate, as in traditional methods, blockchain allows multiple distributed entities to verify the security certificates by using the majority rule to reach consensus about the state of the issued public keys. If more than half of the miners (the validator nodes) reveal a positive result about the issued key, then this key is approved and its related transactions are recorded in the blockchain.
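The majority rule described above can be stated in a few lines; the sketch below assumes each miner exposes a verdict function, which is an illustrative simplification of on-chain validation rather than the scheme of [20].

```python
# A minimal sketch of majority-rule key approval, assuming each miner returns
# an independent verdict on a submitted public key; everything is illustrative.
from typing import Callable, List

def approve_key(key: bytes, miners: List[Callable[[bytes], bool]]) -> bool:
    """A key is approved only if more than half of the validators accept it."""
    votes = sum(1 for verdict in (m(key) for m in miners) if verdict)
    return votes > len(miners) / 2

# Usage: 3 of 5 hypothetical validators accept, so the key is recorded.
miners = [lambda k: True, lambda k: True, lambda k: True,
          lambda k: False, lambda k: False]
if approve_key(b"-----BEGIN PUBLIC KEY-----...", miners):
    print("key approved: related transactions may be recorded on-chain")
```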
Fig. 3. BICN
5 Conclusion
In this paper we have presented a study of the information-centric networking research area. We presented the challenges associated with using ICN, including problems related to the efficient and cost-effective delivery of content, the need for consistent and exclusive naming of data objects, and the significant problem of security, which has been addressed by several related works that used blockchain technology to enhance the security aspects of ICN. After
listing the challenges, we surveyed a set of common information-centric implementations that can be used as building blocks to construct an architecture that satisfies the criteria raised by these issues. We also outlined the characteristics that make blockchain technology a possible candidate for many applications. For each of the presented networking implementations, we described its infrastructure, addressed its regulation, and outlined conventional solutions to the needed service and its challenges. Finally, we presented some related works combining blockchain aspects with ICN approaches. Nonetheless, owing to a variety of problems, these technology implementations are still controversial. In future research projects, these problems should be overcome, and the various blockchain implementations should be verified in specific and real environments.
References

1. Nikos, F., Giannis, F., George, C.: Access control enforcement delegation for information-centric networking architectures. In: Proceedings of the Second Edition of the ICN Workshop on Information-Centric Networking, Helsinki, Finland, pp. 85–90 (2012)
2. Kutscher, D.: It's the network: towards better security and transport performance in 5G. In: IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), San Francisco, USA, pp. 656–661 (2016)
3. Ahlgren, B., Dannewitz, C., Imbrenda, C., Kutscher, D., Ohlman, B.: A survey of information-centric networking. IEEE Commun. Mag. 50(7), 26–36 (2012)
4. Corella, F.: User authentication with privacy and security. Unfunded Proposal to the NSF Secure and Trustworthy Cyberspace (SaTC) Program (2012)
5. Li, H., Wang, K., Miyazaki, T., Xu, C., Guo, S., Sun, Y.: Trust-enhanced content delivery in blockchain-based information-centric networking. IEEE Netw. 33(5), 183–189 (2019)
6. Koponen, T.: A data-oriented network architecture. Teknillinen korkeakoulu (2008)
7. Wu, T.Y., Lee, W.T., Duan, C.Y., Wu, Y.W.: Data lifetime enhancement for improving QoS in NDN. In: ANT/SEIT, pp. 69–76 (2014)
8. Content Centric Networking project. http://www.ccnx.org/
9. NSF Named Data Networking project. http://www.named-data.net/
10. FP7 SAIL project. http://www.sail-project.eu/
11. Xylomenos, G., et al.: A survey of information-centric networking research. IEEE Commun. Surv. Tutor. 16(2), 1024–1049 (2013)
12. FP7 CONVERGENCE project. http://www.ict-convergence.eu/
13. Salsano, S., Detti, A., Cancellieri, M., Pomposini, M., Blefari-Melazzi, N.: Transport-layer issues in information centric networks. In: Proceedings of the Second Edition of the ICN Workshop on Information-Centric Networking, Helsinki, Finland, pp. 19–24 (2012)
14. NSF Mobility First project. http://mobilityfirst.winlab.rutgers.edu/
15. Vu, T., Baid, A., Zhang, Y., Nguyen, T.D., Fukuyama, J., Martin, R.P., Raychaudhuri, D.: A shared hosting scheme for dynamic identifier to locator mappings in the global internet. In: IEEE 32nd International Conference on Distributed Computing Systems, pp. 698–707, Washington, DC, United States (2012)
16. Baid, A., Vu, T., Raychaudhuri, D.: Comparing alternative approaches for networking of named objects in the future internet. In: Proceedings of IEEE INFOCOM Workshops, pp. 298–303 (2012)
17. Lagutin, D.: Redesigning internet - the packet level authentication architecture. Licentiate's thesis, Helsinki University of Technology (2008)
18. Ortega, V., Bouchmal, F., Monserrat, J.F.: Trusted 5G vehicular networks: blockchains and content-centric networking. IEEE Veh. Technol. Mag. 13(2), 121–127 (2018)
19. Mori, S.: Secure caching scheme by using blockchain for information-centric network-based wireless sensor networks. J. Sig. Process. 22(3), 97–108 (2018)
20. Yang, K., Sunny, J.J., Wang, L.: Blockchain-based decentralized public key management for named data networking. In: The International Conference on Computer Communications and Networks, Hangzhou, China (2018)
Health Informatics and AI Against COVID-19
Real-Time Trajectory Control of Potential Drug Carrier Using Pantograph "Experimental Study"

Ramy Farag1(B), Ibrahim Badawy1, Fady Magdy1, Zakaria Mahmoud2, and Mohamed Sallam1

1 Helwan University, Cairo 11795, Egypt
[email protected]
2 Mechatronics Engineering Department, High Institute of Engineering, Giza, Egypt
Abstract. Microparticles have the potential to be used for many medical purposes inside the human body, such as drug delivery and other operations. In this paper, we present a teleoperation system that allows an operator to control the position of a microparticle by drawing a trajectory for the microparticle to follow over a distance using a 2-DOF haptic device in a real-time manner. The reference position is sent continuously from the haptic device to the local setup, and in order to achieve real time, the lowest rate of updating the set point and the lowest rate of updating the control signal are preset. The mechanism controlling the microparticle consists of four electromagnetic coils utilized as wireless actuators to remotely control the motion of the microparticle. To achieve closed-loop control, a microscopic camera is used to measure the actual position of the microparticle while it flows in the water. The results show that the operator can control the microparticle while achieving real-time system response. Moreover, an auto-tuned control system is deployed to guarantee position control with a maximum settling error of less than 8 µm in the step-response experiment, making the system a candidate for further evaluation inside microfluidic channels.

Keywords: Drug carrier · Object tracking · Image processing · Real-time system · Control · Auto-tuning

1 Introduction
Microparticles can be coated with drugs and injected inside the human body to work as drug-delivery robots or to perform microassembly operations [1]. Under the influence of magnetic fields, the particle can be positioned at the required coordinates, where the medicine needs to be delivered. Such a micro-manipulation process requires a precise micro-robotic system that allows the physician to remotely control the motion of the particle and at the
same time feel the interaction forces between the particle and the environment inside the human body. Several researchers have proposed different systems for the wireless micro-manipulation of magnetic particles, as in [1] and [2]. Kummer et al. [3] demonstrated a system with 8 coils to control a microrobot with a diameter of 500 µm. A haptic interface was presented in [4] to enable the physician to feel the interaction forces arising from the contact between the particle and a microbead without visual feedback. Sun et al. [5] developed a similar but autonomous system with visual servoing and precision position control.
Fig. 1. Our system, comprising 4 coils, microparticle and its reservoir, real-time embedded board and microscopic camera
However, the researchers who have worked on systems similar to this one have not emphasized the design details of their controllers or the exact procedure for selecting the controller gains [6–9]. The system's parameters are dynamic, which makes accurate modeling infeasible. Such uncertainty arises for many reasons; one is the presence of the microparticle in a water container. The microparticle should float on top of the water; however, over time it continues to submerge until it fully falls to the reservoir's bottom. The microparticle is also affected by its surroundings, such as the residual magnetism left in the coils. These conditions affect the system's parameters and increase the uncertainty as time passes. In this paper, we present our developed system (see Fig. 1 and Fig. 3), in which a microparticle with a diameter of 100 µm is controlled to follow a trajectory provided by the operator. We also developed the system to respond in a real-time manner; making a system work in real time makes it more reliable in critical conditions [16]. We also tackled the problem of selecting the controller's gains by deploying an auto-tuning optimization algorithm to select
the parameters. The system has two separate devices: a slave device that consists of 4 coils generating a magnetic field on the microparticle, and a 2-DOF haptic device that is used for controlling the trajectory of the microparticle. More details about the system are given in the next section. The microparticle's (Fig. 2) control algorithm is built on MyRio, which is a real-time embedded evaluation board.
Fig. 2. The paramagnetic particles used in the experiments. The particles have diameters 100 µm [6]
Our previous work on similar systems [18–21] includes deploying control schemes for controlling the systems' output and performance.
2 Experimental Setup
The system comprises a real-time embedded evaluation board with a real-time controller running on it, 4 coils and their driver to control the microparticle in two-dimensional space, and a pantograph robot.
Fig. 3. The system setup layout
2.1 Real-Time Controller
The MyRio embedded board has the capability to work in real time; thus, we could implement a real-time controller. The whole monitoring and control algorithm is divided into multiple tasks (layers) (see Fig. 4 and Fig. 5); some run on the PC connected to the MyRio board and others run on the MyRio board itself. This yields computational parallelism, which minimizes the overall computation time.
Fig. 4. Main layer of the control algorithm
Fig. 5. Pantograph’s inverse kinematic layer
The PID controller used is auto-tuned by running an optimization algorithm to calculate the gains of the controller, which minimizes the settling error, the rise time, and the settling time; a sketch of this idea is given below. Also, the minimum rate of updating the control signal and of calculating the new set-point position via the pantograph is set to 20 updates per second; thus, real-time control is achieved.
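The auto-tuning idea can be illustrated with a small sketch that searches PID gains to minimize the squared-error cost of Eq. (5) over a simulated step response; the first-order plant model and the search ranges are assumptions for illustration, not the actual microparticle dynamics or the paper's optimizer.

```python
# A minimal sketch of PID auto-tuning: search the gains that minimize the
# sum-of-squared-error cost F(P) = E^T E over a simulated step response.
# The plant dy/dt = -y + u and the coarse grid are illustrative assumptions.
import itertools

def step_cost(kp, ki, kd, dt=0.05, steps=100):
    """Simulate a unit step on a simple first-order plant under PID control."""
    y, integ, prev_err, cost = 0.0, 0.0, 1.0, 0.0
    for _ in range(steps):
        err = 1.0 - y                       # set point is 1.0
        integ += err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integ + kd * deriv
        y += dt * (-y + u)                  # assumed plant: dy/dt = -y + u
        prev_err = err
        cost += err * err                   # accumulates E^T E
    return cost

# Coarse grid search standing in for the paper's optimization algorithm.
grid = [0.5, 1.0, 2.0, 4.0, 8.0]
best = min(itertools.product(grid, grid, grid), key=lambda p: step_cost(*p))
print("tuned gains (kp, ki, kd):", best)
```

In practice a gradient-free optimizer (e.g. Nelder-Mead) would replace the grid search, but the principle is the same: each candidate gain set is scored by the accumulated squared tracking error.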
2.2 Pantograph Robot
The pantograph is a 4-link system with two encoders to determine the two main angles of its configuration (see Fig. 1). The pantograph system can be used as a two-degree-
of-freedom robot. However, in our system it is used to enable the operator to control the trajectory of the microparticle by manipulating its end-effector. The pantograph has 4 main angles; however, two of them are passive angles, which can be calculated from the other two angles through the holonomic constraints [17]:

$$f = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} = \begin{bmatrix} l_1\cos q_1 + l_2\cos q_2 - l_3\cos q_3 - l_4\cos q_4 - l_0 \\ l_1\sin q_1 + l_2\sin q_2 - l_3\sin q_3 - l_4\sin q_4 \end{bmatrix} = 0 \qquad (1)$$

The 2 passive angles are q2 and q3 (see Fig. 6); all the link lengths are known, as well as q1 and q4 from the pantograph's encoders. q2 and q3 can be calculated iteratively using the Newton-Raphson method as follows:

$$q_p^{i+1} = q_p^i - \left(\frac{\partial f}{\partial q_p}\right)^{-1} f \qquad (2)$$
Here p represents the angle index (2–3), and (1) is the holonomic constraint of the pantograph. However, the main use of the pantograph is to let the operator draw a trajectory for the microparticle to follow, so it is also required to obtain the x-y coordinates of the pantograph's end-effector. This can be done by two methods; the first uses the Newton-Raphson method on the holonomic constraints (1) together with the following equations to determine the x-y coordinates:

$$x - l_1\cos q_1 - l_2\cos q_2 = 0 \qquad (3)$$

$$y - l_1\sin q_1 - l_2\sin q_2 = 0 \qquad (4)$$
Fig. 6. Pantograph’s Kinematics [17]
The second method to obtain the x-y coordinates is the inverse kinematics method (see Fig. 5), which is preferred over the Newton-Raphson method, since the Newton-Raphson method is computationally more expensive and does not determine the exact solutions [17]. A sketch of the Newton-Raphson approach is given below.
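The following sketch solves Eq. (1) for the passive angles by the Newton-Raphson iteration of Eq. (2) and then recovers the end-effector position from Eqs. (3)-(4); the link lengths and encoder angles are illustrative values, not the actual pantograph dimensions.

```python
# Newton-Raphson solution of the pantograph's passive angles q2, q3 (Eqs. 1-2),
# followed by the end-effector x-y from Eqs. (3)-(4). Values are illustrative.
import numpy as np

l0, l1, l2, l3, l4 = 0.1, 0.15, 0.2, 0.2, 0.15   # assumed link lengths (m)

def constraints(q2, q3, q1, q4):
    """Holonomic constraints f = [f1, f2] of Eq. (1)."""
    f1 = l1*np.cos(q1) + l2*np.cos(q2) - l3*np.cos(q3) - l4*np.cos(q4) - l0
    f2 = l1*np.sin(q1) + l2*np.sin(q2) - l3*np.sin(q3) - l4*np.sin(q4)
    return np.array([f1, f2])

def passive_angles(q1, q4, q2=1.0, q3=2.0, tol=1e-10, it=50):
    """Newton-Raphson iteration q_p <- q_p - (df/dq_p)^-1 f of Eq. (2)."""
    for _ in range(it):
        f = constraints(q2, q3, q1, q4)
        if np.linalg.norm(f) < tol:
            break
        J = np.array([[-l2*np.sin(q2),  l3*np.sin(q3)],    # df/d[q2, q3]
                      [ l2*np.cos(q2), -l3*np.cos(q3)]])
        q2, q3 = np.array([q2, q3]) - np.linalg.solve(J, f)
    return q2, q3

q1, q4 = np.deg2rad(70), np.deg2rad(110)      # angles read from the encoders
q2, q3 = passive_angles(q1, q4)
x = l1*np.cos(q1) + l2*np.cos(q2)             # Eq. (3)
y = l1*np.sin(q1) + l2*np.sin(q2)             # Eq. (4)
print(f"end effector at ({x:.4f}, {y:.4f}) m")
```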
3 Results
Four experiments were conducted: the first two evaluate the system's response when it is given a trajectory to follow (see Fig. 7 and Fig. 8), and the second two evaluate the system's response to a single step input (see Fig. 9). One of the first two experiments and one of the second two were done with the PID controller gains "P1" (Fig. 7), which are calculated using the optimization algorithm to minimize the cost function (5), where E is a vector holding the error at every sampling time. The other two experiments were done with the initial PID controller gains "P2" (Fig. 8).

$$F(P_k) = E^T E \qquad (5)$$
Fig. 7. The microparticle’s response given a trajectory to follow with controller’s gains “P1 ”
The maximum error when the microparticle follows the operator's manipulation of the pantograph with the controller's parameters set to "P1" is approximately 120 µm, and the mean absolute error is 27 µm; the maximum error is approximately 541 µm when the controller's gains are set to "P2", with a mean absolute error of 111 µm.
Fig. 8. The microparticle’s response given a trajectory to follow with controller’s gains “P2 ”
Fig. 9. The step responses of the microparticle, one using the controller’s gains “P1 ” and the other using “P2 ”
The other two experiments evaluate the optimized parameters "P1" in comparison with the initial parameters "P2" when the microparticle is given a step input. The rise time and settling time of the microparticle given the "P1" gains are 2.92 ms and 29 ms, respectively, while the overshoot is 2.89%. On the other hand, the rise time of the microparticle given the "P2" gains is 5.03 ms, the settling time is more than 196 ms, and the overshoot is 9.41%.
4 Conclusion and Future Work
In this paper we propose the use of an optimization algorithm to compute the gains of the PID controller and its deployment in a real-time manner. We also investigated the system's response when the microparticle is given a single step input and when it is given an online trajectory to follow using a pantograph robot. The system's settling error when it is given a single point is less than 8 µm, which outperforms some of the previously published work mentioned above. The results also show that the operator can control the microparticle via the pantograph with micro-scale precision while achieving real-time system response. This performance makes the use of optimization algorithms to compute controller gains more recommendable than the approaches taken in the previously mentioned work, and also makes this system a candidate for further evaluation inside microfluidic channels.
References

1. Khalil, I., Brink, F., Sukas, O., Misra, S.: Microassembly using a cluster of paramagnetic microparticles. In: IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany (2013)
2. Abbott, J., Nagy, Z., Beyeler, F., Nelson, B.: Robotics in the small, part I: microbotics. IEEE Robot. Autom. Mag. 14(2), 92–103 (2007)
3. Kummer, M., Abbott, J., Kratochvil, B., Borer, R., Sengul, A., Nelson, B.: OctoMag: an electromagnetic system for 5-DOF wireless micromanipulation. IEEE Trans. Robot. 26(6), 1006–1017 (2010)
4. Lu, T., Pacoret, C., Heriban, D., Mohand-Ousaid, A., Regnier, S., Hayward, V.: KiloHertz bandwidth, dual-stage haptic device lets you. IEEE Trans. Haptics 10(3), 382–390 (2016)
5. Sun, Y., Nelson, J.: Biological cell injection using an autonomous microrobotic system. Int. J. Robot. Res. 21(10–11), 861–868 (2002)
6. Keuning, J., de Vries, J., Abelmann, L., Misra, S.: Image-based magnetic control of paramagnetic microparticles in water. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA (2011)
7. Khalil, I., Keuning, J., Abelmann, L., Misra, S.: Wireless magnetic-based control of paramagnetic microparticles. In: International Conference on Biomedical Robotics and Biomechatronics, Rome, Italy, June (2012)
8. Khalil, I., Metz, R., Reefman, B., Misra, S.: Magnetic-based minimum input motion control of paramagnetic microparticles in three-dimensional space. In: International Conference on Intelligent Robots and Systems, Tokyo, Japan (2013)
9. El-Gazzar, A., Al-Khouly, L., Klingner, A., Misra, S., Khalil, I.: Non-contact manipulation of microbeads via pushing and pulling using magnetically controlled clusters of paramagnetic microparticles. In: International Conference on Intelligent Robots and Systems, Hamburg, Germany, 2 October 2015
10. Du, X., Htet, K., Tan, K.: Development of a genetic-algorithm-based nonlinear model predictive control scheme on velocity and steering of autonomous vehicles. IEEE Trans. Ind. Electron. 63(11), 6970–6977 (2016)
11. Guazzelli, P., Pereira, W., et al.: Weighting factors optimization of predictive torque control of induction motor by multiobjective genetic algorithm. IEEE Trans. Power Electron. 34(7), 6628–6638 (2019)
12. Xu, F., Chen, H., Gong, X., Mei, Q.: Fast nonlinear model predictive control on FPGA using particle swarm optimization. IEEE Trans. Ind. Electron. 63(1), 310–321 (2016)
13. Smoczek, J., Szpytko, J.: Particle swarm optimization-based multivariable generalized predictive control for an overhead crane. IEEE/ASME Trans. Mech. 22(1), 258–268 (2017)
14. AliZamani, A., Tavakoli, S., Etedali, S.: Fractional order PID control design for semi-active control of smart base-isolated structures: a multi-objective cuckoo search approach. ISA Trans. 67, 222–232 (2017)
15. Bououden, S., Chadli, M., Karimi, H.: An ant colony optimization-based fuzzy predictive control approach for nonlinear processes. Inf. Sci. 299, 143–158 (2015)
16. Arridha, R., Sukaridhoto, S., Pramadihanto, D., Funabiki, N.: Classification extension based on IoT-big data analytic for smart environment monitoring and analytic in real-time system. Int. J. Space-Based Situated Comput. 7(2), 82–93 (2017)
17. Khalil, I., Abu Seif, M.: Modeling of a Pantograph Haptic Device. http://www.mnrlab.com/uploads/7/3/8/3/73833313/modeling-of-pantograph.pdf
18. Sallam, M., Ramadan, A., Fanni, M.: Position tracking for bilateral teleoperation system with varying time delay. In: 2013 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), Wollongong, pp. 1146–1151 (2013). https://doi.org/10.1109/AIM.2013.6584248
19. Rashad, S.A., Sallam, M., Bassiuny, A.B., Abdelghany, A.M.: Control of master slave system using optimal NPID and FOPID. In: 2019 IEEE 28th International Symposium on Industrial Electronics (ISIE), Vancouver, pp. 485–490 (2019). https://doi.org/10.1109/ISIE.2019.8781129
20. Rashad, S.A., Sallam, M., Bassiuny, A.B., Abdelghany, A.M.: Control of master slave robotics system using optimal control schemes. IOP Conf. Ser.: Mater. Sci. Eng. 610, 012056 (2019). https://doi.org/10.1088/1757-899X/610/1/012056
21. Sallam, M., Ramadan, A., Fanni, M., Abdellatif, M.: Stability verification for bilateral teleoperation system with variable time delay. Int. J. Mech. Mech. Eng. 5, 2477–2482 (2011)
Early Detection of COVID-19 Using a Non-contact Forehead Thermometer

Ahmed G. Ebeid1, Enas Selem2(&), and Sherine M. Abd El-kader3

1 Faculty of Engineering, Higher Technological Institute 10th of Ramadan, Cairo, Egypt
[email protected]
2 Faculty of Science, Suez Canal University, Ismailia, Egypt
[email protected]
3 Computers & Systems Department at Electronics Research Institute, Giza, Egypt
[email protected]
Abstract. In this paper a non-contact thermometer is designed. It calculates the temperature from the infrared radiation produced by the subject being measured. It can be used for industrial purposes as well as for medical purposes. For medical purposes it is used to estimate the core body temperature from the forehead temperature, depending on the ambient air temperature, which affects the heat transfer coefficient. Heat transfers by conduction from the body core to the forehead and by convection from the forehead to the ambient air. The overall heat transfer coefficient is determined empirically, based on studies that measured forehead temperature and core body temperature at various ambient air temperatures for hundreds of persons. The accuracy of the proposed thermometer is ±0.3 °C compared with the Rossmax HA-500 device, and it depends on the accuracy of those studies and their results under various conditions.

Keywords: Corona virus · Non-contact thermometer · Core body temperature · Forehead · Internet of Things (IoT) · Fifth Generation (5G) · Local manufacturing
1 Introduction

The sudden appearance of the Coronavirus caused global panic and occupied the minds of researchers around the world. Egypt is at the forefront of the countries that took all precautions and measures to face the Coronavirus crisis, and the Egyptian health system provided a model in dealing with the crisis professionally, in accordance with the instructions of the World Health Organization (WHO). It also encourages researchers in all fields to confront the Coronavirus: many teams are trying to produce respirators, others smart masks or non-contact thermometers. Actually, there is a great need for the non-contact thermometer [1, 2] for the early detection of coronavirus by rapidly discovering persons who suffer from a high temperature. It is used to remotely measure the core body
temperature; it can measure the core body temperature within a few seconds, a few centimeters away from the body [3], records approximately 30 readings, and gives an alarm at 38 °C. The prices of this kind of thermometer range from 100 to 200 dollars. IR thermometers come in two types, namely medical and industrial thermometers. Industrial thermometers [4] can be used as medical thermometers [5], but they give inaccurate readings [6]. Many companies in Egypt imported large quantities of thermometers for large amounts of money, and still there is a need for more. So, we designed and manufactured a non-contact IoT thermometer to help in the early detection of the coronavirus at a reasonable cost.

Generally, health care services suffer from enormous challenges such as the high cost of devices, the increasing number of patients, the wide spread of chronic diseases, and a lack of healthcare management resources. The utilization of IoT and 5G in medical services [7, 8] will solve a lot of these problems by introducing the following benefits: easy access to health care services by both patients and doctors, smooth incorporation with several technologies, analysis and processing of massive data, effective utilization of healthcare resources, real-time and remote monitoring established by joint healthcare services, real-time interaction between doctors and patients, and authorized health care services. All of these benefits of IoT and 5G services will improve the performance of healthcare applications by offering different services to hospitals, clinics, and nonclinical patient environments. Table 1 summarizes the difference between clinical and industrial thermometers [9, 10].

Table 1. Difference between clinical and industrial thermometers

Clinical thermometer | Industrial thermometer
Short range, as it is used for the human body | Broad range, as it is used for different substances
Used for the human body | Used for solid, liquid, and gas substances
Used in hospitals, homes, and airports | Used in laboratories and companies
Range from 35 °C to 42 °C | Range from −10 °C to 300 °C
Gives an alarm at 38 °C | No alarm
Measures core body temperature within 3 s, 10 cm away from the body | Measures the skin temperature, which differs by approximately 3 °C from the core body temperature
Price approximately 2000 L.E. | Price approximately 300 L.E.
In this paper we aim to design a locally manufactured non-contact thermometer to help in the early detection of coronavirus. The designed thermometer has two modes, namely clinical mode and industrial mode; in clinical mode the core body temperature is accurately estimated with an accuracy of ±0.3 °C, which depends on the accuracy of the studies underlying the Rossmax HA-500 and their results under various conditions. The remainder of the paper is arranged as follows. In Sect. 2, related research is surveyed; the method of our work is introduced in Sect. 3, whereas the connection between the IR thermometer and a WBAN network is presented in Sect. 4. In Sect. 5, results are analyzed. Eventually, the conclusions are outlined in Sect. 6.
2 Related Research

Measuring the core body temperature TC without surgery (non-invasively) is one of the most important research topics. Conventional techniques of estimating TC are usually not suitable for continual application during physical activity, because they are invasive or not accurate enough. The rectal measurement of TC, or measurement at the bottom of the esophagus (close to the heart), is annoying, particularly when using sensors connected via wired connections. Axillary, oral, and ear (tympanic membrane) temperatures are not accurate, especially during activities. An alternative method for measuring TC is the use of ingestible telemetric thermometers, but they are very expensive for daily use by a large number of persons. Hence, despite its great importance, the easy, cheap, and accurate measurement of TC is still a great challenge. Therefore, a lot of techniques have been investigated for noninvasive TC estimation. The noninvasive estimation of core body temperature has been validated in several experiments with varying assumptions. In each method, the experiment is done with a varying number of volunteers. The volunteers vary in characteristics such as age, weight, height, and body fat. They wear different clothes with different thermal and vapor resistance. The volunteers were put under different test scenarios, such as standing at rest or walking on a treadmill for different durations, and engaged in a different number of test sessions. The volunteers entered the test room under different room conditions such as snug [50% Relative Humidity (RH), 25 °C], hot-dry (20% RH, 40 °C), and hot-humid (70% RH, 35 °C), etc. TC was measured at different places such as the pectoralis, sternum, forehead, left scapula, left thigh, and left rib cage. The methods that estimate TC non-invasively vary in the factors on which TC depends, such as skin temperature, ambient temperature, heat flux, and heart rate. The estimated TC was compared to the observed TC, where the observed TC is a rectal temperature or one taken through a thermometer pill. In [11], the protocol estimates the core body temperature TC based on three factors: the skin temperature Ts, the ambient temperature Ta, and the Heat Flux (HF), or heat loss, which is the heat transferred per unit time per unit area from or to an object. Several linear regression techniques were presented to predict TC, the dependent variable, from the two independent variables HF and Ts. In [12], TC was estimated exactly as in [11], based on Ts and HF, but with the main difference that the Heart Rate (HR) is taken into consideration. In [13], the core temperature is estimated using Kalman Filtering (KF) with training and validation datasets composed of the data from test volunteers. The parameters of the KF model were estimated from the training dataset using linear regression of TC against Ts, HF, and HR. The KF model used to estimate TC consists of three elements: the state-transition matrix (A), the noise correlated with each state, and the observation matrix (C). The KF approach achieves a perfect assessment of core body temperature when two of the three inputs (Ts, HF, and HR) are available. In [14], a Kalman filter was used to fit the parameters of its model to each person and give real-time TC estimates. This model uses the activity (Ac) of the person, HR, and Ts, and also uses two environmental variables, Ta and Relative Humidity (RH), to estimate the person's TC in real time.
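To illustrate the Kalman-filter idea used in [13] and [14], the sketch below runs a scalar filter in which TC evolves with process noise and is observed indirectly through heart rate; the linear HR-to-temperature mapping and the noise variances are illustrative assumptions, not the fitted parameters of those studies.

```python
# Minimal scalar Kalman-filter sketch: T_c is the hidden state, heart rate the
# observation. Model HR ≈ c*T_c + bias is an assumed regression, as are the
# variances; a real system would fit these to training data as in [13].

def kf_core_temp(hr_series, tc0=37.0, p0=0.25,
                 q_var=0.01, r_var=4.0, a=1.0, c=20.0, bias=-680.0):
    tc, p, estimates = tc0, p0, []
    for hr in hr_series:
        # predict step (random-walk state transition)
        tc_pred, p_pred = a * tc, a * p * a + q_var
        # update step with the heart-rate observation
        k = p_pred * c / (c * p_pred * c + r_var)     # Kalman gain
        tc = tc_pred + k * (hr - (c * tc_pred + bias))
        p = (1 - k * c) * p_pred
        estimates.append(tc)
    return estimates

# Usage: rising heart rate drags the core-temperature estimate upward.
print(kf_core_temp([60, 70, 85, 100, 120])[-1])
```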
There are several methods of estimating core body temperature that have already been applied experimentally as patents [15, 16]. In [17], the patent introduces a
procedure for estimating the core body temperature that includes: determining the heat flux from a targeted surface area of the body, thereby giving the surface temperature, and estimating the core temperature of the body based on two factors (ambient temperature and surface temperature), the function including the skin heat loss to the environment. In [18, 19], the invention is a thermometer aimed at measuring the body cavity temperature utilizing infrared sensing methods. Infrared radiation released by tissue surfaces is gathered by an infrared lens and directed to an infrared sensor. This infrared sensor produces a signal voltage based on the temperature difference between the body tissues being spotted and the infrared sensor. To detect the correct tissue temperature, a supplementary sensor is utilized to determine the ambient temperature of the infrared sensor, and this ambient temperature is combined with the signal voltage. In [20], the invention presents an IR thermometer which introduces a method for measuring core body temperature based on contact temperature, ambient temperature, and humidity degree. In [21], the invention presents a non-contact forehead thermometer that measures the core body temperature using the thermal radiation of the forehead; the core body temperature is calculated as a function of Ta and Ts.
3 Method

The designed IR thermometer is a non-contact temperature-measuring device that detects the infrared radiation emitted from the surface of objects and converts it into a corresponding temperature reading. It enables the user to measure temperature quickly without touching the measured object. It can be used to find overheated equipment and electrical circuits, and it can also be adjusted for medical purposes to measure the core body temperature. A detailed description of the designed IR thermometer is given in the next subsections.
The Proposed Infrared Thermometer Component
The components of the designed Infrared Thermometer, as shown in Fig. 1 and 2 are: 1. 2. 3. 4. 5. 6. 7.
0.96” LCD 8-bit microcontroller IR temperature sensor Buzzer Trigger push button Power on/off switch Battery.
Fig. 1. IR Thermometer’s component
Fig. 2. The designed thermometer

3.2 IR Thermometer Working Theory
The designed IR thermometer can work as an industrial thermometer or a medical thermometer via a two-mode switch, namely industry mode and medical mode. In medical mode, an additional calculation is made to estimate the core body temperature, the range is reduced to 35-42 °C, the emissivity is changed to 0.98, and the accuracy is raised. In industry mode, the estimation of the core body temperature is skipped and the range is extended by changing the amplification of the signal. The IR thermometer works as follows. It records various readings, taken by measuring the radiation produced by the object. The greatest reading is chosen as a parameter in the equation containing the reading of the thermopile sensor, the reading of the Negative-Temperature Coefficient (NTC) sensor, which reads the ambient air temperature, and the emissivity of the surface. This equation is used to measure the actual temperature of the object. In the case of medical use, the measurement of the core body temperature is made; core body temperature can be estimated using different techniques and various factors. In the proposed IR thermometer, the core body temperature depends on three factors: the ambient air temperature measured by the on-board temperature sensor, the convection heat transfer from the skin to the ambient air, and the emissivity of the forehead skin. The proposed IR thermometer measures the temperature at 1 cm away from the object. This is because the field of view of the used sensor is very large, ranging from 80 to 100 mm. This distance can be increased using an optics system: convex lenses minimize the field of view, but they must be suitable for the wavelength of the radiation produced by the object. The wavelength of the sensor used in the proposed IR thermometer ranges from 5.5 to 14 µm, so the material of the convex lens must suit the same wavelength range as the used sensor. Germanium is used for medical use, but for industrial purposes a Fresnel lens is used. Determining the distance between the thermometer and a person depends on the type of the convex lens.
3.3 Core Body Temperature Estimation
As mentioned above, in the designed thermometer the core body temperature is calculated based on the ambient temperature, the skin temperature, and the emissivity of the skin. First, the skin temperature is calculated with the thermopile sensor. The thermopile voltage V_TP is determined by:

$$V_{TP} = S \cdot \varepsilon_{obj} \cdot (T_{obj}^4 - T_{sen}^4) \qquad (1)$$

where V_TP is the thermopile output voltage, S is the instrument factor, ε_obj is the emissivity of the object, T_sen is the ambient (sensor) temperature, and T_obj is the object temperature. Finally, the core body temperature is calculated as follows:

$$T_c = \frac{h}{q\,c}\,(T_s - T_a) + T_s \qquad (2)$$

where T_s is the skin temperature, T_a is the ambient temperature, c is the blood specific heat, h is the coefficient of the radiation view factor between the ambient and the skin tissue, and q is the blood flow per unit area.
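A short sketch of Eqs. (1)-(2) follows: it inverts Eq. (1) to recover the skin temperature from the thermopile voltage and then applies Eq. (2); the instrument factor S, the thermopile voltage, and the lumped coefficient h/(qc) are assumed calibration values, not the device's actual constants.

```python
# Sketch of Eqs. (1)-(2): object temperature from the thermopile voltage,
# then core body temperature. S and K = h/(q*c) are assumed calibrations.
S = 5.0e-14          # assumed instrument factor (V / K^4)
EMISSIVITY = 0.98    # human skin, as in the device specification
K = 0.33             # assumed lumped coefficient h / (q * c)

def object_temperature(v_tp, t_sen):
    """Invert Eq. (1): V_TP = S * eps * (T_obj^4 - T_sen^4); temps in kelvin."""
    return (v_tp / (S * EMISSIVITY) + t_sen**4) ** 0.25

def core_body_temperature(t_skin, t_ambient):
    """Eq. (2): T_c = (h/(q c)) * (T_s - T_a) + T_s; temps in Celsius."""
    return K * (t_skin - t_ambient) + t_skin

t_sen = 298.15                                 # NTC ambient reading: 25 C
t_skin_k = object_temperature(4.9e-5, t_sen)   # assumed thermopile voltage
t_skin_c = t_skin_k - 273.15                   # about 34 C forehead skin
print("core estimate: %.1f C" % core_body_temperature(t_skin_c, 25.0))
```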
The Designed Non-contact Thermometer
Operation: After turning on the power switch, the IR temperature sensor communicates with the microcontroller, after pressing the trigger push button the microcontroller reads the temperature from the sensor then send it to the LCD, and then the buzzer alerts the object reading temperature, the temperature will be shown on the LCD and lasts until the next push button trigger for new temperature reading. The designed thermometer work flow is shown in Fig. 3. Specification • • • • • •
Measurement range: −70 to 380 °C Accuracy: ±0.3 °C Resolution: 0.1 °C Emissivity: 0.98 (For Human Skin) Distance spot ratio: 1:2 9-volt battery.
320
A. G. Ebeid et al.
Start Measure forehead temperature Temperature is in range Yes
No
LCD print low
Yes
Body Yes temperature 38.2> No
Estimate core body temperature
Body temperature >37.8 No Turn on green led
Turn on red led
Turn on orange led
LCD print core body End Fig. 3. the designed thermometer work flow
4 Connection Between the IR Thermometer and a WBAN Network

The non-contact thermometer can be used as a wireless temperature sensor in a WBAN system [22] that gives an alarm when a person's temperature is 38 °C or more. This IoT non-contact thermometer will be connected to a mobile gateway, which is connected to a database in the medical server; persons whose temperature is around 37 °C will be marked with green, while assumed infected persons whose temperature exceeds 38 °C will be marked with red, as shown in Fig. 4 (a sketch of this marking rule is given below). This medical server will be located in the Ministry of Health to continually follow up on the temperature of a large number of people. In order to meet the 5G and IoT requirements [23, 24], the AESAS algorithm [25] will be used to increase the capacity and density of the network so as to serve a large number of devices effectively without any degradation in the quality of service, to ensure that priority is satisfied among all types of mobile gateways, to increase the overall throughput of the network by decreasing the data drop rate, and to decrease the delay or latency for delay-sensitive applications and services. The first part of the system has been completed by designing the non-contact thermometer; the second part will be the connection of all of these devices in a WBAN system [22] to ensure reliable and fast delivery of the medical data to the medical server.
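A minimal sketch of the server-side marking rule follows, assuming readings arrive from the mobile gateways as (person, temperature) pairs; the thresholds mirror the text, and the record format is an illustrative assumption.

```python
# Hypothetical server-side marking of readings received from mobile gateways;
# thresholds follow the text: red at 38 C or more, green otherwise.

def mark(temperature_c: float) -> str:
    """Color mark stored with each person's record in the medical server."""
    if temperature_c >= 38.0:
        return "red"     # assumed infected: raise an alarm and track
    return "green"       # normal temperature

readings = [("person-01", 36.8), ("person-02", 38.4)]
database = {pid: {"temp": t, "mark": mark(t)} for pid, t in readings}
print(database)   # person-02 is marked red for follow-up
```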
Fig. 4. Tracking of infected people with Coronavirus
5 Results

The studies made by measuring the forehead temperature and the core body temperature for many persons under various ambient air temperatures are difficult to obtain and highly costly, so we compare our results with another device (Rossmax HA-500), since its output represents these studies and its algorithm is based on them. By comparing the infrared thermometer estimation with the Rossmax HA-500 [26], as shown in Fig. 5, the maximum variation in estimating the core body temperature is found to be 0.23 °C, after calibrating both of them at the National Institute of Standards (NIS). The Rossmax HA-500 has been tested clinically in various large teaching hospitals according to the ASTM E1965-98:2009 regulatory standard, covering enough feverish and normal core body temperatures to satisfy clinical repeatability and accuracy requirements in comparison with reference oral temperature readings. The accuracy of the estimation depends on the accuracy of the studies and their results under various conditions. The overall accuracy of the medical forehead thermometer is the combination of the accuracy of measuring the forehead temperature with the thermopile sensor and the accuracy of the estimation of the core body temperature. Each manufacturer builds an algorithm for its forehead thermometer device either based on an estimating equation that calculates the core body temperature after measuring the ambient air temperature and the forehead temperature, or based on a look-up table saved beforehand in the EEPROM of the device, giving the core body temperature corresponding to each forehead temperature and ambient air temperature.
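The look-up-table variant can be sketched as bilinear interpolation over a small table indexed by ambient and forehead temperature; all table values below are illustrative, not any device's calibration data.

```python
# Hypothetical EEPROM-style look-up table: (ambient, forehead) -> core
# temperature, with bilinear interpolation between the stored grid points.
import bisect

AMBIENT = [20.0, 25.0, 30.0]                 # table axes (deg C)
FOREHEAD = [33.0, 34.0, 35.0, 36.0]
# CORE[i][j]: core temperature for AMBIENT[i], FOREHEAD[j] (illustrative)
CORE = [[36.9, 37.3, 37.7, 38.1],
        [36.7, 37.1, 37.5, 37.9],
        [36.5, 36.9, 37.3, 37.7]]

def lookup_core(t_amb, t_fh):
    """Bilinear interpolation between the four surrounding table entries."""
    i = min(max(bisect.bisect_right(AMBIENT, t_amb) - 1, 0), len(AMBIENT) - 2)
    j = min(max(bisect.bisect_right(FOREHEAD, t_fh) - 1, 0), len(FOREHEAD) - 2)
    u = (t_amb - AMBIENT[i]) / (AMBIENT[i + 1] - AMBIENT[i])
    v = (t_fh - FOREHEAD[j]) / (FOREHEAD[j + 1] - FOREHEAD[j])
    return ((1 - u) * (1 - v) * CORE[i][j] + u * (1 - v) * CORE[i + 1][j]
            + (1 - u) * v * CORE[i][j + 1] + u * v * CORE[i + 1][j + 1])

print("core: %.2f C" % lookup_core(23.0, 34.6))
```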
Fig. 5. Our designed thermometer estimation compared to Rossmax HA-500
6 Conclusions
In this paper, a locally manufactured, easy-to-use, low-price non-contact thermometer is presented. It can be used for industrial and medical purposes. An accuracy of 0.23 °C is achieved, which is considered excellent for medical use, with a certificate from the National Institute of Standards in Egypt. The non-contact thermometer can be used as part of a WBAN system for early, real-time detection and tracking of people suffering from a fever, who might be infected with the coronavirus. In future work, the non-contact thermometer will be experimentally tested with a WBAN system connected to a database on the medical server, tracking infected persons with a fever and marking them in red.
Acknowledgment. We would like to express our special thanks of gratitude to the National Institute of Standards (NIS) in Egypt for their support in the calibration process, as well as to Eng. Mohammed Ibrahim.
References
1. Sebban, E.: Infrared Noncontact Thermometer, US Patent 549,114, issued 21 August 2007
2. Yelderman, M., et al.: Noncontact Infrared Tympanic Thermometer, US Patent 5,159,936, issued 3 November 1992
3. Wenbin, C., Chiachung, C.: Evaluation of performance and uncertainty of infrared tympanic thermometers. Sensors 10(4), 3073–3089 (2010)
4. Cascetta, F.: An evaluation of the performance of an infrared tympanic thermometer. Measurement 16(4), 239–246 (1995)
5. Jang, C., Chou, L.: Infrared Thermometers Measured on Forehead Artery Area, US Patent 2003/0067958 A1, issued 10 April 2003
6. Teran, C.G., Torrez-Llanos, J., et al.: Clinical accuracy of a non-contact infrared skin thermometer in paediatric practice. Child Care Health Dev. 38(4), 471–476 (2012)
7. Dhanvijay, M.M., Patil, S.C.: Internet of Things: a survey of enabling technologies in healthcare and its applications. Comput. Netw. 153(22), 113–131 (2019)
8. Alam, M.M., Malik, H., Khan, M.I., Pardy, T., Kuusik, A., Le Moullec, Y.: A survey on the roles of communication technologies in IoT-based personalized healthcare applications. IEEE Access 6(4), 36611–36631 (2018)
9. ASTM Standard E1965: Standard specification for infrared thermometers for intermittent determination of patient temperature (2003)
10. Fraden, J.: Medical Thermometer for Determining Body Core Temperature, US Patent 7,785,266 B2, issued 31 August 2010
11. Xu, X., Karis, A.J., et al.: Relationship between core temperature, skin temperature and heat flux during exercise in heat. Eur. J. Appl. Physiol. 113, 2381–2389 (2013)
12. Welles, A.P., Xu, X., et al.: Estimation of core body temperature from skin temperature, heat flux, and heart rate using a Kalman filter. Comput. Biol. Med. 5(21), 1–6 (2018)
13. Eggenberger, P., MacRae, B.A., et al.: Prediction of core body temperature based on skin temperature, heat flux, and heart rate under different exercise and clothing conditions in the heat in young adult males. Front. Physiol. 10(9), 1–11 (2018)
14. Laxminarayan, S., Rakesh, V., et al.: Individualized estimation of human core body temperature using noninvasive measurements. J. Appl. Physiol. 124(6), 1387–1402 (2017)
15. Zou, S.: Thermometer, US Patent D837,668 S, issued 8 January 2019
16. Roth, J.: Contact and Non-Contact Thermometer, US Patent 2014/000346, issued 2 January 2014
17. Pompei, F.: Ambient and Perfusion Normalized Temperature Detector, EP 0 991 926 B1, issued 12 December 2005
18. Fraden, J.: Infrared Thermometer, US Patent 6,129,673, issued 10 October 2000
19. Fraden, J.: Infrared Thermometer, US Patent 6,129,673, issued 10 October 2000
20. Jones, M.N., Park, L.F., et al.: Infrared Thermometer, US Patent 0257469, issued 15 October 2009
21. Pompei, F.: Temporal Artery Temperature Detector, US Patent 6,292,685, issued 18 September 2001
22. Selem, E., Fatehy, M., Abd El-Kader, S.M., Nassar, H.: THE (Temperature Heterogeneity Energy) aware routing protocol for IoT health application. IEEE Access 7, 108957–108968 (2019)
23. Hussein, H.H., Abd El-Kader, S.M.: Enhancing signal to noise interference ratio for device to device technology in 5G applying mode selection technique. In: 2017 International Conference on Advanced Control Circuits Systems (ACCS) & 2017 International Conference on New Paradigms in Electronics & Information Technology (PEIT), Alexandria, pp. 187–192 (2017)
24. Salem, M.A., Tarrad, I.F., Youssef, M.I., Abd El-Kader, S.M.: An adaptive EDCA selfishness-aware scheme for dense WLANs in 5G networks. IEEE Access 8, 47034–47046 (2020)
25. Salem, M.A., Tarrad, I.F., Youssef, M.I., Abd El-Kader, S.M.: QoS categories activeness-aware adaptive EDCA algorithm for dense IoT networks. Int. J. Comput. Netw. Commun. 11(03), 67–83 (2019)
26. RossMax HA500 Thermometer Instruction Manual, Model: HA500. www.rossmax.com
The Mass Size Effect on the Breast Cancer Detection Using 2-Levels of Evaluation
Ghada Hamed(&), Mohammed Abd El-Rahman Marey, Safaa El-Sayed Amin, and Mohamed Fahmy Tolba
Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt [email protected]
Abstract. Breast cancer is one of the most dangerous cancers, and with the tremendous increase in the number of mammograms taken daily, computer-aided diagnosis systems play an important role in fast and accurate prediction. In this paper, we propose three phases to detect and classify breast tumors. The first is data preparation, converting DICOM files to images without losing data; the images are then divided into mammograms with large and small masses, representing the input to the second, model-training phase. The third phase is model evaluation through two testing levels: the first checks for large masses and the second checks for small masses, to output the detection results for both. The two testing levels using the trained small- and large-masses models overcome the recent YOLO-based detection work and the model trained on combined sizes, achieving an overall accuracy of 89.5%.
Keywords: Breast cancer detection · Digital mammograms classification · You Only Look Once · Computer aided diagnosis systems
1 Introduction
Breast cancer comes after skin cancer as one of the most common and leading causes of mortality among people, and especially women, in the whole world [1]. Mammography is the process of screening the breast using a small amount of X-rays to generate mammograms that will contain breast cancer signs if they exist. It is one of the most common tools used to screen for breast cancer. However, due to the tremendous increase in the number of mammograms taken daily, the process becomes very hard for doctors and consumes a lot of time, which makes it prone to errors in the decision and diagnosis process [2–4]. So, Computer-Aided Diagnosis (CAD) systems play an important role as a second decision next to the doctor's decision [5, 6]. Current research mainly works on deploying Convolutional Neural Networks (CNNs) to develop CADs, since CNNs are able during training to extract features representing the various contexts of images without feature engineering, which has a great impact on detection performance. Also, it has been proved that the use of CNNs overcomes the drawbacks of conventional mass detection models [7–11]. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 324–335, 2021. https://doi.org/10.1007/978-3-030-58669-0_30
The goal of this paper is to detect the masses existing in mammograms and classify them with high accuracy. The masses in mammograms have no fixed size, nor do they fall within a narrow range of sizes: they may be very small, for example 10 pixels wide and 10 pixels high, or large, for example 900 pixels wide and 800 pixels high. To cover this wide range, we utilize the You Only Look Once (YOLO) model to detect masses by first training YOLO using large masses only, then training it using small masses only, and finally combining the resulting detected large and small masses. Each time, YOLO is trained with different configurations and image properties to fit the training objective. The paper is organized as follows. Some of the recent literature is reviewed in Sect. 2, followed by a discussion of the used datasets in Sect. 3 and the proposed method in Sect. 4. Section 5 lists the used evaluation metrics and how they are calculated. Then, Sect. 6 presents the conducted experiments and their results, comparing against the YOLO-based state-of-the-art work to show the contribution of the proposed approach. Finally, the conclusion is discussed in Sect. 7.
2 Related Work
There is a lot of research on breast cancer detection proving the importance of CADs in taking decisions regarding breast cancer diagnosis [12–15] and [16]. For example, in [13] the impact of CAD development on the early detection of breast cancer is proved in comparison with late cancer discovery. In [14], the authors proved the great help of CAD systems in diagnosing chest radiography. In [15] and [16], the authors showed the improvement in radiologists' performance when diagnosing breast tumors using CADs. In [17], a multi-scale belief network is used to detect the masses' positions in the mammograms of INbreast, obtaining a sensitivity of 85%. In [18], a CNN-based model is constructed to merge the low- and high-level deep features extracted from two different CNN layers for training, reaching 96.7% classification accuracy. In [19], a deep CNN is trained to detect masses existing in mammograms with a sensitivity of 89.9% using the transfer learning advantage. In [20], all images are first preprocessed by removing the pectoral muscles of the given mammograms and extracting the fibro-glandular tissue. Then, all preprocessed mammograms are divided into overlapped parts to be trained using RCNN. Their testing detection and classification accuracies are 72% and 77%, respectively. In [21], a residual network is used to classify the mammograms of screening populations, obtaining AUCs of 0.886 and 0.765 for the malignant and benign masses. In [22], the GoogLeNet and AlexNet models are used to classify tumors existing in the breast, obtaining an Area Under Curve (AUC) of 0.88 for GoogLeNet and 0.83 for AlexNet using Film Mammography dataset number 3 (BCDR-F03). In [23], the detection performance for the existing masses is obtained very near to the human evaluation by using feature-based classification, where the radiologists' average AUC is 0.814 and the system AUC is 0.840.
3 Datasets
In our experiments, we used two public and commonly used datasets. The annotations of both datasets, with the masses' regions of interest, are set using Breast Imaging Reporting and Data System (BI-RADS) descriptors. In our work, we assigned the BI-RADS categories 2 and 3 to the benign cases and the categories 4, 5, and 6 to the malignant cases. Both datasets also contain Craniocaudal (CC) and Mediolateral Oblique (MLO) views for their mammograms. The two datasets are:
INbreast dataset. The INbreast dataset [24] consists of Full-Field Digital Mammograms (FFDM) which can be requested from here. We selected all the mammograms containing biopsies in the INbreast dataset, which are 107 mammograms with 116 masses. The mammograms contain both small and large masses, as shown in Fig. 1(a) & (b).
CBIS-DDSM dataset. CBIS-DDSM [25] is an updated version of the Digital Database for Screening Mammography (DDSM): since DDSM annotations only indicate general, imprecise locations of lesions, segmentation algorithms were applied by many researchers to DDSM to obtain accurate feature extraction in CBIS-DDSM. In our experiments, we worked on the 891 cases with large and small masses, as shown in Fig. 1(c) & (d). The dataset is available online for use from here.
Fig. 1. Examples from the most widely used public datasets of breast mammograms. (a) INbreast Mammogram Example 1 with SMALL mass; (b) INbreast Mammogram Example 2 with LARGE mass; (c) CBIS-DDSM Mammogram Example 1 with SMALL mass; (d) CBIS-DDSM Mammogram Example 2 with LARGE mass
4 Methods
In this paper, the deep learning You Only Look Once (YOLO) model is used to detect the masses existing in the breast and classify them. YOLO is selected since it does not need to go through the image by dividing it into regions of interest to detect the included objects' bounding boxes and classify them, as done by RCNN and Faster RCNN; YOLO looks at the image in one shot [26]. Besides that, the traditional, commonly used CNN-based networks like AlexNet, GoogLeNet, and RCNN achieve good detection results, but they are slow at predicting mammograms in a real-life
application. So, since YOLO examines the image in one shot, it yields good results and faster detection at the same time, as proved later in the experimental results section. YOLO has three versions, YOLO-V1 [26], YOLO-V2 [27], and YOLO-V3 [28], such that each version brings an update that leads to better results. So, YOLO-V3 is used in our experiments; it is the deepest model, composed of 106 layers of convolutional layers, max-pooling layers, activation functions, and 3×3 and 1×1 filters, with skip connections like ResNet. We exploited YOLO-V3's advantage of working on the image at different scales, which leads to very good results in the case of large masses [28]. The training process proceeds with the help of the anchor boxes, which are a set of 9 pairs of width and height for the objects that can be detected by YOLO. In our case, the masses existing in the breast don't fall within a narrow range of sizes; they may be very small, for example in INbreast the smallest mass is of size (W = 12, H = 8) in the case with ID 51049107. On the other side, there are very large masses; for example, the largest mass is of size (W = 163, H = 110) in the case with ID 24065530. There are obviously large differences in the masses' sizes, which makes it very difficult to cluster all the masses of the training set into 9 anchors, and if this is done, a large range of sizes will be missed between every 2 anchors. So, to overcome the variance of mass sizes existing in mammograms, we train YOLO-V3 two times, once on large masses with 9 large anchor boxes and once on small masses with 9 small anchor boxes. The full workflow is given in Fig. 2 and is composed of 4 blocks: the preprocessing, training, testing, and evaluation phases. In the first, preprocessing phase, the DICOM images of both INbreast and CBIS-DDSM are converted to image formats and then scaled to 8-bit images instead of 16-bit. All the extracted mammograms are resized to 448×448 to be trained with YOLO. The INbreast annotations are given in XML files, so they are read and placed in a separate annotation text file for each case with the class type, xmin, ymin, mass width, and mass height. For CBIS-DDSM, the annotations are given in the form of DICOM files that contain the regions of interest. So, we process each DICOM image by converting it to an image and then extracting from it the ROI coordinates in the same format as for INbreast. The mammograms of both datasets are then split into mammograms with large masses and others with small masses. This is accomplished based on the area of the mass: if the mass area is less than or equal to 100,000, then the mammogram is considered one of the small cases; otherwise, it is considered one of the large cases. The second, training phase is composed of training YOLO-V3 using 80% of the large cases and training it another time with 80% of the small cases. Then, the testing phase comes to test the model with the remaining 20% of the large and small cases to compute the model's detection performance. Finally, the last, evaluation phase uses new mammograms different from those used in the training and testing phases. The objective of this phase is to simulate this approach as if applied in real life. So, new mammograms are evaluated using the model weights generated from training on the mammograms with large masses to detect any large masses. Then, the given new mammograms are passed through another
Fig. 2. The proposed approach phases to detect the breast SMALL and LARGE masses
level of evaluation using the model weights generated from training on the mammograms with small masses to detect any small masses if they exist. For each detected object, if any, the mass coordinates are extracted together with the class probability.
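The area-based split of the preprocessing phase can be illustrated with a few lines of code. This is a minimal sketch under the stated 100,000-pixel threshold; the annotation format follows the text, and the large-mass example values are hypothetical.

```python
# Minimal sketch of the size-based split described above. The annotation
# format (class xmin ymin width height per line) follows the text; the case
# IDs reuse examples from the paper, but the large-mass box is made up.
AREA_THRESHOLD = 100_000  # pixels^2, the threshold stated in the paper

def mass_area(annotation_line):
    _cls, _xmin, _ymin, w, h = annotation_line.split()
    return int(w) * int(h)

def split_cases(annotations):
    small, large = [], []
    for case_id, lines in annotations.items():
        # a mammogram joins the large set if any of its masses exceeds the threshold
        if any(mass_area(line) > AREA_THRESHOLD for line in lines):
            large.append(case_id)
        else:
            small.append(case_id)
    return small, large

small, large = split_cases({
    "51049107": ["0 210 340 12 8"],    # smallest INbreast mass (12 x 8)
    "20588562": ["1 95 120 700 520"],  # hypothetical large malignant mass
})
print(small, large)  # ['51049107'] ['20588562']
```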
5 Evaluation Metrics
To evaluate the detection accuracy of the masses existing in the mammograms of the testing set, and their classification accuracy, we used the following metrics: 1. Intersection Over Union (IOU): It measures the correctness of the predicted bounding box by calculating the ratio of the intersection area between the bounding box of the mass ground truth and the bounding box of the predicted mass divided by their union area. IOU in our experiments is used to accept a detected mass region of interest if and only if its value equals or exceeds 50% with respect to its ground truth coordinates. 2. The confusion matrix: A matrix used to evaluate the classification performance of a binary classifier using the True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). 3. Precision: It measures the model's positive predictions out of all that should be positive and is calculated as follows:
Precision = TP / (TP + FP)    (1)

4. Recall (Sensitivity): It is known as the true positive rate and is calculated as follows:

Recall (Sensitivity) = TP / (TP + FN)    (2)
5. Average Precision (AP) & Mean Average Precision (mAP): The AP combines precision and recall by calculating the area under the precision-recall curve, while the Mean Average Precision is the mean of the APs calculated for all the classes.
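As an illustration of metrics 1, 3, and 4, the following sketch computes the IoU of two boxes in the (xmin, ymin, width, height) format used by the annotations, together with precision and recall. It is our own minimal example, not the authors' evaluation code.

```python
# Minimal sketch of the metrics above. Boxes are (xmin, ymin, w, h),
# matching the annotation format used in this paper.
def iou(box_a, box_b):
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(ax, bx)
    iy = max(ay, by)
    ix2 = min(ax + aw, bx + bw)
    iy2 = min(ay + ah, by + bh)
    inter = max(0, ix2 - ix) * max(0, iy2 - iy)
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# A prediction counts as a detection only when IoU >= 0.5 (Sect. 5)
print(iou((10, 10, 100, 80), (30, 20, 100, 80)) >= 0.5)  # True
```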
6 Experimental Results
In this section, the conducted experiments of the proposed CAD system and their results are presented. All are executed on an Intel Core (TM) i7-9700K desktop processor, 8 cores up to 4.9 GHz Turbo, 300 series, with 16 GB RAM and a GIGABYTE GeForce RTX 2080 Ti. The development environments used to preprocess and extract the ground truth annotations from the INbreast and CBIS-DDSM datasets are Matlab and Python 3.7. To compile and train YOLO, we used the C++ programming language on the Ubuntu 14.04 operating system. In all experiments, we used the INbreast and CBIS-DDSM datasets in different combinations to carry out different trials for performance evaluation. Both are divided into 80% for the training set and 20% for the testing set. The experiments are done after setting the following configurations for training:
– All the mammograms are divided into 13×13 grid cells (N).
– Number of anchor boxes used during training = 9.
– The number of classes (C) is 2, which are benign & malignant, and the number of coordinates (coords) predicted for each box is 4.
– Mammograms are resized to 448×448, i.e., the model input.
– Number of training iterations = 4000 iterations.
– Resizing augmentation is enabled during training.
– Learning Rate (LR) = 0.001; other values were tried, and 0.001 is the value that leads to the best results without overfitting.
– The steps at which the LR is changed during training are 2500 and 3000.
– The scales used to change the LR during training are 0.1 and 0.1.
The above configuration results in an output tensor of prediction (ToP) of size 13 × 13 × (k · (1 + 4 + 2)), i.e., N × N × (k · (1 + coords + C)).
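The ToP size can be reproduced directly from these settings. YOLO-V3 predicts 3 boxes per grid cell at each scale, so k = 3 is assumed below for illustration; the paper leaves k implicit.

```python
# Reproducing the output-tensor size N x N x (k * (1 + coords + C)) from
# the configuration above; k = 3 anchors per cell is an assumption.
N, coords, C, k = 13, 4, 2, 3
depth = k * (1 + coords + C)
print((N, N, depth))  # (13, 13, 21)
```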
6.1 Experiment I - Large & Small Masses Detection using YOLO-V3
The objective of this experiment is to study the effect of applying the training step on large and small masses together, as shown in Table 1. This experiment is conducted to be able to compare our 2-level detection approach with [3], since they used the same approach followed in this experiment: detecting masses in mammograms using a YOLO model trained on mammograms with different sizes of masses. So, 80% of both the INbreast and CBIS-DDSM datasets is used for training, which results in 315 mammograms of benign and malignant classes, and the remaining 20%, 83 mammograms, is used to test the model. As shown in Table 1, the achieved mAP is 71.47%. When we check the mammograms that were not detected at all, for example the INbreast mammograms with IDs 20586908 and 20586934, they are not detected due to the small size of their masses, as shown in Fig. 3(a) & (b); their mass areas are 11,286 and 30,456, respectively. On the other side, most of the detected mammograms are those that contain large masses, such as the mammogram with ID 20588562 shown in Fig. 3(c), which contains a malignant mass of area 357,984. Table 1. Results of mass detection & classification using YOLO-V3 on mammograms with LARGE and SMALL sizes (large variances in mass sizes), where B denotes the benign class, M denotes the malignant class, AP is the average precision, and mAP is the mean average precision
Testing Set
B: TPFP
M: TPFP
B AP
CBISDDSM INbreast
CBISDDSM INbreast
23-7
34-14
67.85% 75.10% 73.00% 66.00% 71.47%
6.2
M AP
Prec.
Recall MAP@IOU = 0.5
Experiment II - Small Masses Detection using YOLO-V3
The objective of this second experiment is to study the effect of training the model using small masses only, which gives the results in Table 2. Many trials are conducted using different combinations of the datasets in the training and testing sets, as follows: 1. Trial 1: Trained with 94 mammograms and tested with 24 mammograms from CBIS-DDSM. 2. Trial 2: Trained with 156 mammograms from CBIS-DDSM and INbreast and tested with 24 mammograms from INbreast. The best mAP obtained in the case of training YOLO-V3 on mammograms with small masses is 67.98%, when the model is trained with CBIS-DDSM and tested on the same dataset. The second trial scores less than the first one since most of the samples in the training set are from CBIS-DDSM, which has different features from those of INbreast, taking into consideration that the main difference is that INbreast is FFDM while CBIS-DDSM consists of screened (digitized film) mammograms, which are not originally digital.
Fig. 3. Detected large masses versus the undetected small masses. (a) The undetected INbreast mammogram with ID 20586908 with SMALL mass; (b) The undetected INbreast mammogram with ID 20586934 with SMALL mass; (c) The detected INbreast mammogram with ID 20588562 with LARGE mass

Table 2. Results of mass detection & classification using SMALL sized masses only

Trial | B: TP-FP | M: TP-FP | B AP | M AP | Precision | Recall | mAP@IOU=0.5
Trial 1 | 7-7 | 10-2 | 57.75% | 78.21% | 65.00% | 71.00% | 67.98%
Trial 2 | 4-3 | 13-2 | 45.54% | 80.86% | 77.00% | 65.00% | 63.20%
6.3 Experiment III - Large Masses Detection using YOLO-V3
This experiment's objective is to study the effect of training using large masses only, obtaining the results in Table 3. Many trials are conducted using different combinations of the datasets in the training and testing sets, as follows: 1. Trial 1: Trained with 136 mammograms and tested with 40 mammograms from CBIS-DDSM. 2. Trial 2: Trained with 163 mammograms and tested with 80 mammograms, both from CBIS-DDSM and INbreast. 3. Trial 3: Trained with 85 mammograms and tested with 12 mammograms from INbreast. 4. Trial 4: Trained with 203 mammograms from CBIS-DDSM and INbreast and tested with 40 mammograms from INbreast.
Table 3. Results of mass detection & classification using LARGE sized masses only

Trial | B: TP-FP | M: TP-FP | B AP | M AP | Precision | Recall | mAP@IOU=0.5
Trial 1 | 10-5 | 14-8 | 66.73% | 72.42% | 65.00% | 59.00% | 69.58%
Trial 2 | 17-10 | 48-13 | 64.48% | 77.93% | 74.00% | 78.00% | 71.20%
Trial 3 | 3-1 | 9-0 | 99.20% | 80.00% | 92.00% | 92.00% | 90.00%
Trial 4 | 4-1 | 33-1 | 90.29% | 88.73% | 95.00% | 88.00% | 89.51%
In this experiment, the obtained mAP reached 89.5% when training and testing used mammograms with large masses. This is the best result among the conducted experiments. The main reason behind this is that the mass sizes are large (large here is computed relative to the full size of the mammogram), which makes these large masses more obvious and richer in features to be trained on, and consequently yields better detection results in testing.

6.4 Comparative Study Between the Previous 3 Experiments
The performance of the model trained with large and small masses together in Experiment I is compared against the other two models, as shown in Table 4. Here, completely new mammograms from CBIS-DDSM are used for testing. The evaluation process is done as follows: 1. Select a new mammogram (M) with large or small masses. 2. Test the selected mammogram with the model trained in Experiment 1 to get the Combined Model Result. 3. Test the selected mammogram with the models trained in Experiments 2 and 3 through 2 parallel paths, as follows: – Path 1: Test M with the model trained with small masses in Experiment 2 to get the Small Model Result. – Path 2: Test M with the model trained with large masses in Experiment 3 to get the Large Model Result. 4. Union the results obtained from path 1 and path 2 to get the detection results obtained by both trained models, i.e., Small Model Result ∪ Large Model Result. 5. Check the resultant masses from path 1 and path 2 that intersect and, if any exist, keep the masses with the greater confidence score. As shown in Table 4, in the case of checking mammograms with small masses, the most accurate detection and classification results are obtained from the model trained on mammograms with small masses in Experiment 2 (Small Model Result). From the Small Model Result of the given mammograms, the model is able to detect all existing masses and classify them correctly. In the case of the large masses, when we compared the three models' performance, the model trained on large masses detected all the masses correctly, and only one case out of the given 4 mammograms is classified wrongly; this can be treated by leaving the classification task to classifiers other than YOLO. So, to get accurate results we can depend on the LARGE-masses & SMALL-masses trained models, which yields better results than [3], which used YOLO-V1 for detection after training it using the same approach applied in the combined model.
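Steps 4 and 5 amount to a union of the two detection lists with a confidence-based tie-break on overlapping boxes. The sketch below is a minimal interpretation of that rule; the tuple layout, the 0.5 overlap threshold, and the function names are our assumptions.

```python
# Sketch of steps 4-5 above: union the small-model and large-model
# detections and, where two boxes overlap, keep the one with the higher
# confidence. Detections are (xmin, ymin, w, h, class, confidence).
def overlaps(a, b, thr=0.5):
    ax, ay, aw, ah, *_ = a
    bx, by, bw, bh, *_ = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter) > thr if inter else False

def merge(small_result, large_result):
    merged = list(small_result)
    for det in large_result:
        clashes = [m for m in merged if overlaps(m, det)]
        if not clashes:
            merged.append(det)                     # plain union
        elif all(det[5] > m[5] for m in clashes):  # keep higher confidence
            merged = [m for m in merged if m not in clashes] + [det]
    return merged

print(merge([(10, 10, 40, 30, "M", 0.96)], [(12, 12, 44, 28, "M", 0.72)]))
# [(10, 10, 40, 30, 'M', 0.96)]
```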
Table 4. Results of mass detection & classification using YOLO-V3 on NEW mammograms using the COMBINED trained model versus the LARGE & SMALL trained models together (GT: the Ground Truth, B: benign class, M: malignant class)

Mammogram ID | Size category | GT class | Combined model result | Small model result | Large model result
Test P_00131_LEFT_CC | Small | B | B: 99% | B: 97% | Not Detected
Test P_01101_LEFT_CC | Small | B | B: 30% | B: 99% | B: 83%
Test P_00576_LEFT_CC | Small | M | M: 33% | M: 96% | Not Detected
Test P_00347_LEFT_CC | Small | M | M: 57% | M: 95% | B: 83%
Test P_01365_LEFT_CC | Large | B | M: 93% | Not Detected | B: 82%
Test P_01595_LEFT_CC | Large | B | M: 53% | Not Detected | M: 99%
Test P_00296_LEFT_CC | Large | M | M: 80% | Not Detected | M: 72%
Test P_00758_LEFT_CC | Large | M | Not Detected | Not Detected | M: 92%
7 Conclusion
In this paper, we utilized the YOLO model to develop a new detection methodology by passing mammograms through two paths of testing. In our work, we used two of the commonly used public datasets, INbreast and CBIS-DDSM. All the mammograms selected from both datasets are divided by considering every mammogram with masses of area less than or equal to 100,000 as a small case and otherwise as a large case. Then, the same YOLO model is trained on each set separately, followed by two parallel paths of testing for evaluating new mammograms. The first path tests the new mammogram using the model trained on mammograms with large masses, while the second path tests the same mammogram using the model trained on mammograms with small masses. This results in an mAP of 89.51%, compared with the 71.47% detection accuracy of the model trained on a large range of mass sizes (large and small). Also, by implementing the proposed idea, the mass type classification performance is improved compared with the recent YOLO-based breast mass detection.
References
1. Boyle, P., Levin, B., et al.: World Cancer Report 2008. IARC Press, International Agency for Research on Cancer, Lyon (2008)
2. Al-antari, M.A., Al-masni, M.A., Park, S.U., Park, J.H., Metwally, M.K., Kadah, Y.M., Han, S.M., Kim, T.S.: An automatic computer-aided diagnosis system for breast cancer in digital mammograms via deep belief network. J. Med. Biol. Eng. 38(3), 443–456 (2017)
3. Al-masni, M., Al-antari, M.A., Park, J.M., Gi, G., Kim, T., Rivera, P., Valarezo, E., Han, S.M., Kim, T.S.: Detection and classification of the breast abnormalities in digital mammograms via regional convolutional neural network. In: 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2017), Jeju Island, South Korea, pp. 1230–1236 (2017)
4. Al-masni, M.A., Al-antari, M., Park, J.M., Gi, G., Kim, T.Y.K., Rivera, P., Valarezo, E., Choi, M.T., Han, S.M., Kim, T.S.: Simultaneous detection and classification of breast masses in digital mammograms via a deep learning YOLO-based CAD system. Comput. Meth. Prog. Biomed. 157, 85–94 (2018)
5. Al-antari, M.A., Al-masni, M.A., Park, S.U., Park, J.H., Kadah, Y.M., Han, S.M., Kim, T.S.: Automatic computer-aided diagnosis of breast cancer in digital mammograms via deep belief network. In: Global Conference on Engineering and Applied Science (GCEAS), Japan, pp. 1306–1314 (2016)
6. Al-antari, M.A., Al-masni, M.A., Kadah, Y.M.: Hybrid model of computer-aided breast cancer diagnosis from digital mammograms. J. Sci. Eng. 04(2), 114–126 (2017)
7. Wang, Y., Tao, D., Gao, X., Li, X., Wang, B.: Mammographic mass segmentation: embedding multiple features in vector-valued level set in ambiguous regions. Pattern Recognit. 44(9), 1903–1915 (2011)
8. Rahmati, P., Adler, A., Hamarneh, G.: Mammography segmentation with maximum likelihood active contours. Med. Image Anal. 16(9), 1167–1186 (2012)
9. Domínguez, A.R., Nandi, A.: Toward breast cancer diagnosis based on automated segmentation of masses in mammograms. Pattern Recognit. 42(6), 1138–1148 (2009)
10. Qiu, Y., Yan, S., Gundreddy, R.R., Wang, Y., Cheng, S., Liu, H., Zheng, B.: A new approach to develop computer-aided diagnosis scheme of breast mass classification using deep learning technology. J. X-Ray Sci. Technol. 25(5), 751–763 (2017)
11. Hamed, G., Marey, M.A.E.R., Amin, S.E.S., Tolba, M.F.: Deep learning in breast cancer detection and classification. In: Joint European-US Workshop on Applications of Invariance in Computer Vision, pp. 322–333. Springer, Cham (2020)
12. Hamed, G., Marey, M., Amin, S.E.S., Tolba, M.F.: A proposed model for denoising breast mammogram images. In: 2018 13th International Conference on Computer Engineering and Systems (ICCES), pp. 652–657. IEEE, December 2018
13. Doi, K.: Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Comput. Med. Imaging Graph. 31(4), 198–211 (2007)
14. Van Ginneken, B., ter Haar Romeny, B.M., Viergever, M.: Computer-aided diagnosis in chest radiography: a survey. IEEE Trans. Med. Imaging 20(12), 1228–1241 (2001)
15. Jiang, Y., Nishikawa, R.M., Schmidt, R.A., et al.: Improving breast cancer diagnosis with computer-aided diagnosis. Acad. Radiol. 6(1), 22–33 (1999)
16. Chan, H.-P., Doi, K., Vybrony, C.J., et al.: Improvement in radiologists' detection of clustered microcalcifications on mammograms: the potential of computer-aided diagnosis. Invest. Radiol. 25(10), 1102–1110 (1990)
17. Dhungel, N., Carneiro, G., Bradley, A.P.: Automated mass detection from mammograms using deep learning and random forest. In: International Conference on Digital Image Computing: Techniques and Applications (DICTA) (2015). https://doi.org/10.1109/dicta.2015.7371234
18. Jiao, Z., Gao, X., Wang, Y., Li, J.: A deep feature based framework for breast masses classification. Neurocomputing 197, 221–231 (2016)
19. Suzuki, S., Zhang, X., Homma, N., Ichiji, K., Sugita, N., Kawasumi, Y., Ishibashi, T., Yoshizawa, M.: Mass detection using deep convolutional neural network for mammographic computer-aided diagnosis. In: Proceedings of the SICE Annual Conference 2016, Tsukuba, Japan, pp. 1382–1386 (2016)
20. Akselrod-Ballin, A., Karlinsky, L., Alpert, S., Hasoul, S., Ben-Ari, R., Barkan, E.: A region based convolutional network for tumor detection and classification in breast mammography, pp. 197–205. Springer, Cham (2016)
21. Wu, N., Phang, J., Park, J., Shen, Y., Huang, Z., Zorin, M., Jastrzębski, S., et al.: Deep neural networks improve radiologists' performance in breast cancer screening. IEEE Trans. Med. Imaging 39(4), 1184–1194 (2019)
22. Jiang, F.: Breast mass lesion classification in mammograms by transfer learning. In: ICBCB'17, Hong Kong, pp. 59–62 (2017). https://doi.org/10.1145/3035012.3035022
23. Rodriguez-Ruiz, A., Lång, K., Gubern-Mérida, A., Broeders, M., Gennaro, G., Clauser, P., Helbich, T.H., et al.: Stand-alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. JNCI: J. Natl. Cancer Inst. 111(9), 916–922 (2019)
24. Moreira, I., Amaral, I., Domingues, I., Cardoso, A., Cardoso, M., Cardoso, J.: INbreast: toward a full-field digital mammographic database. Acad. Radiol. 19(2), 236–248 (2012)
25. Lee, R.S., Gimenez, F., Hoogi, A., Miyake, K.K., Gorovoy, M., Rubin, D.L.: A curated mammography dataset for use in computer-aided detection and diagnosis research. Sci. Data 4, 170–177 (2017)
26. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
27. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)
28. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
An Integrated IoT System to Control the Spread of COVID-19 in Egypt
Aya Hossam1(&), Ahmed Magdy2, Ahmed Fawzy3, and Shriene M. Abd El-Kader4
1 Electrical Engineering Department, Faculty of Engineering (Shoubra), Benha University, Benha, Egypt [email protected]
2 Electrical Engineering Department, Faculty of Engineering, Suez Canal University, Ismailia, Egypt
3 Nanotechnology Central Lab, Electronics Research Institute (ERI), Cairo, Egypt
4 Computers and Systems Department, Electronics Research Institute, Cairo, Egypt
Abstract. Coronavirus disease 2019 (COVID-19) is one of the most dangerous respiratory illnesses of the last one hundred years. Its danger stems from its ability to spread quickly between people. This paper proposes a smart, practical solution to help the Egyptian government track and control the spread of COVID-19. We suggest an integrated system that can ingest big data from different sources using Micro-Electro-Mechanical System (MEMS) IR sensors and display the results in an interactive map, or dashboard, of Egypt. The proposed system consists of three subsystems: the Embedded Microcontroller (EM), Internet of Things (IoT), and Artificial Intelligence (AI) subsystems. The EM subsystem includes an accurate temperature-measuring device using IR sensors and other detection components. It can be used at the entrances of places like universities, schools, and subways to screen and check the temperature of people from a distance within seconds and gather data about suspected cases. Then, the IoT subsystem transmits the collected data from individuals, such as temperature, ID, age, gender, location, phone number, etc., to the specific places and organizations. Finally, software based on AI analysis is applied to produce statistics and forecast how and to what extent the virus will spread. Due to the important role of Geographic Information Systems (GIS) and interactive maps, or dashboards, in tracking COVID-19, this paper introduces an advanced dashboard of Egypt. This dashboard locates and tallies confirmed infections, fatalities, and recoveries and presents the statistical results of the AI model.
Keywords: COVID-19 · Embedded Microcontroller (EM) · MEMS · Internet of Things (IoT) · Artificial intelligence (AI)
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 336–346, 2021. https://doi.org/10.1007/978-3-030-58669-0_31
1 Introduction
The novel coronavirus has spread from China to several countries around the world, so it was announced as a global pandemic by the World Health Organization (WHO) on 12 March 2020 [1–3]. Since that time, all the infected countries are still searching for an effective and practical solution to tackle the problems arising from COVID-19 [4, 5]. At the time of writing this paper, June 24, 2020, the total number of confirmed COVID-19 cases reported to WHO reached 9,154,232, including around 473,650 deaths from the disease. For Egypt, there are 56,809 confirmed cases, with around 2,278 deaths recorded from the disease [6]. Researchers in science and engineering are attempting to suggest new models and systems that can help the community fight COVID-19 [7–9]. Any system or model for monitoring COVID-19-infected people can help contain the spread of the virus and alert people or health workers to the expected spread rate. The latest improvements in the fields of information and communication technologies (ICTs) [10, 11], the Internet of Things (IoT) [12–14], and artificial intelligence (AI) [13, 15] can help researchers build systems and models that help stop the spread of COVID-19. These technologies can be used to handle the big amounts of data taken from public health surveillance, real-time epidemic outbreak monitoring, trend nowcasting/forecasting, and regular situation briefing and updating from governmental institutions and organisms [16]. AI technology can help fight and contain the coronavirus through many applications, such as population screening, tracking how and where infection breaks out, and notifications of when to seek medical help [13]. Screening the population helps identify who is potentially ill, which is necessary for containing COVID-19. Due to that, a prediction of where the virus might appear next can be obtained, helping the government to take an effective response [17]. IoT technology can provide the community with real-time tracking and live updates in various online databases, like interactive maps of countries or dashboards [18]. Such applications assist communities in predicting which populations are most susceptible to the negative impacts of the coronavirus spread and help alert individuals in real time to expected infection regions, so those regions can be avoided. In this paper, a new integrated system is proposed that can ingest big data from different sources using IR sensors and display the results in an interactive map of Egypt (see Fig. 1). This system consists of two parts: hardware and software. The hardware part includes the Embedded Microcontroller (EM) and IoT subsystems. The goal of the proposed hardware is to manufacture a device based on MEMS IR sensors that monitors the temperature of people in the proximity and quickly determines whether they may have a fever, as one of the symptoms of the coronavirus. On the other hand, the software part includes the AI models used to compute statistics on the data collected by the hardware part. The AI part also helps in introducing an advanced interactive map that locates and tallies confirmed infections, fatalities, and recoveries, with graphs declaring the virus spread over time. The proposed system will help the competent authority in the Egyptian government to identify or suspect coronavirus cases, thereby reducing the spread of the virus.
Fig. 1. The flowchart of the whole proposed integrated system: people are assessed at checkpoints; cases with no indicators of possible COVID-19 infection are treated as normal, while cases with indicators (high temperature) have their data collected (name, age, gender, ID, other symptoms, travel history, contact with confirmed cases) and are advised on home isolation and on contacting 105 for Egyptian Ministry help as a "patient under investigation"; the IoT subsystem forwards the collected data to the AI subsystem for statistics, registration of patients in the database, tracking and monitoring of patients all over Egypt, a reporting/monitoring app, and the Egypt dashboard that tracks COVID-19 and shows the output curve of the AI model.
2 The Proposed System
This section describes in detail the proposed integrated system, based on IoT, to fight the spread of COVID-19 in Egypt. This integrated system has two parts: hardware and software. The hardware part includes the EM and IoT subsystems, while the software part includes the AI software-based models. Firstly, the EM subsystem includes an accurate temperature-measuring device using MEMS IR sensors, a microcontroller, a digital screen with online contact, and other detection components. This proposed device can be used in places including universities, schools, airports, and subway and railway stations. The advantage of this hardware device is that it can screen people from a distance and, within minutes, test hundreds of individuals for fever.
Secondly, the IoT subsystem transmits the collected data from individuals, such as temperature, ID, age, gender, location, phone number, etc., to the specific places for analysis. Then, after collecting the required data, software based on AI analysis is applied to produce statistics and forecast how and to what extent the virus will spread, using a set of features and pre-determined parameters. Finally, an advanced interactive map of Egypt, denoted the 3AS digital dashboard, has been introduced to locate confirmed infections, fatalities, and recoveries. This service allows Geographic Information System (GIS) users to consume and display disparate data inputs without central hosting or processing, to ease data sharing and speed up information aggregation. The 3AS dashboard helps to predict the virus spread over time and regions. Fig. 1 shows the flowchart of the whole proposed integrated system to control the spread of COVID-19.

2.1 Hardware Part
This section explains the hardware part of the proposed system in detail. This paper proposes a new hardware device based on MEMS IR sensors to check the temperature of people at the entrances of various places. This part includes the EM and IoT subsystems. The Embedded Microcontroller Subsystem is considered one of the most important modern systems worldwide. In the proposed system, the EM subsystem has an indispensable function and represents the link between the system's brain and its other parts. The basic component responsible for temperature detection is the MEMS IR sensor. Infrared (IR) radiation is emitted by any object with a temperature above absolute zero. Its band of frequencies ranges from 3×10^11 to 4×10^14 Hz, with wavelengths of 0.75 to 1000 µm. Thus, IR radiation is defined as the electromagnetic waves radiated between the visible light and microwave ranges. Some important factors affect both the IR spectrum and the energy density, namely the object type, surface shape, surface temperature, and other factors. The Planck radiation formula [19] represents the relationship between IR radiation and temperature; the ideal IR-radiating object is denoted a "black body". In the proposed EM system, the MEMS IR sensor helps in the detection of COVID-19 as the basic element that detects people's body temperature. It consists of an LED and a photodiode. The principle of work is based on the emission of light from the IR LED and on the photodiode, which senses the IR radiation. When IR radiation falls on the photodiode, the photodiode resistance changes with the intensity of the radiation, and the voltage drop across the photodiode changes accordingly. A voltage comparator is used to measure the voltage drop and produce the output accordingly, as shown in Fig. 2(a, b) [20]. The positioning of the LED and photodiode falls into two techniques: direct and indirect. In the direct method, the IR LED and photodiode are in line of sight. In the case of indirect incidence, which is used in the proposed thermal gun device, both the IR LED and the photodiode are placed in parallel (side by side), facing in the same direction. In that position, as shown in Fig. 2(d), when an object is kept in front of the IR pair, the IR light is reflected by the object and absorbed by the photodiode. For the proposed thermal gun device, a sensor is needed that senses the temperature of the human body (not the ambient temperature) without any direct contact with the person.
One of the working principles can be summarized as follows: there are two different thermoelectric materials, A and B. When IR radiation is gathered by the absorber, as shown in Fig. 2(c), the thermocouple junction warms up, and the temperature difference between the hot junction and the cold junction stabilizes. The Seebeck effect generates a voltage between the open ends as follows:

Vout = (αA − αB) ΔT    (1)

where αA and αB are the Seebeck coefficients of thermoelectric materials A and B, respectively. A thermopile is a series-connected array of thermocouples. Thus, the voltage generated by the thermopile IR detector is directly proportional to the number of thermocouples N:

Vout = N (αA − αB) ΔT = (αA − αB) ΔTtotal    (2)

where ΔTtotal is the sum of the temperature differences in the thermocouples.
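A worked instance of Eq. (2) makes the linear scaling with N concrete. The Seebeck coefficients and temperature difference below are illustrative values only, not parameters of the sensor used in this paper.

```python
# Worked instance of Eq. (2): the thermopile output grows linearly with
# the number of thermocouples N. All numbers below are assumed.
alpha_a = 200e-6   # V/K, thermoelectric material A (assumed)
alpha_b = -200e-6  # V/K, thermoelectric material B (assumed)
n_couples = 60
delta_t = 0.5      # K, hot-junction vs cold-junction difference

v_out = n_couples * (alpha_a - alpha_b) * delta_t
print(f"{v_out * 1e3:.1f} mV")  # 12.0 mV
```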
Fig. 2. Components and working principle of IR sensor (a) Example of MEMS IR sensor [20], (b) Main components of MEMS IR sensor, (c) MEMS thermoelectric IR sensor, and (d) Indirect incidence.
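For reference, reading a thermopile array such as the Omron D6T [20] from a host processor is a short I²C transaction. The sketch below follows the publicly documented D6T-44L read sequence (command 0x4C, 35-byte response) using the smbus2 library; the bus number, sensor model, and byte layout are assumptions, and the code runs only on hardware with an I²C bus.

```python
from smbus2 import SMBus, i2c_msg

D6T_ADDR = 0x0a  # 7-bit I2C address of the Omron D6T (per datasheet)
D6T_CMD = 0x4c   # command byte that triggers a measurement read

with SMBus(1) as bus:  # bus 1 is typical on a Raspberry Pi (assumed)
    write = i2c_msg.write(D6T_ADDR, [D6T_CMD])
    read = i2c_msg.read(D6T_ADDR, 35)  # PTAT + 16 pixels + PEC (4x4 model)
    bus.i2c_rdwr(write, read)
    raw = list(read)
    # temperatures are little-endian 16-bit values in units of 0.1 deg C
    ptat = (raw[0] | (raw[1] << 8)) / 10.0
    pixels = [(raw[2 + 2 * i] | (raw[3 + 2 * i] << 8)) / 10.0 for i in range(16)]
    print(ptat, max(pixels))
```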
The IoT System is an important field that helps capture real-time data at scale. These data are used by the AI and data analysis systems to understand healthcare trends, model risk associations, and predict outcomes. IoT can help develop a real-time tracking map for following cases of COVID-19 across Egypt. Due to that, this paper uses IoT in the proposed system to transmit the data collected from different checkpoints, in real time, to the specific places and organizations for
analysis. There are many possible design solutions for the IoT subsystem; IoT also helps in updating the numbers of the proposed 3AS dashboard every day. This paper introduces two of these design solutions, which can be used in the proposed system. According to the place and the number of people to be monitored, the suitable one of these two models can be chosen. Fig. 3 presents the components of the two design solutions of the IoT subsystem, and Table 1 provides a comparison between these two models of the proposed IoT system.
Fig. 3. The components of the two design solutions of IoT subsystem.
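One simple realization of the real-time transmission described above is to publish each checkpoint reading to an MQTT broker that the medical server subscribes to. The broker address, topic, and payload fields below are hypothetical, and the client construction follows the paho-mqtt 1.x API.

```python
# Hypothetical checkpoint-to-server link over MQTT; not part of the
# published design. Broker, topic, and payload fields are made up.
import json
import paho.mqtt.client as mqtt

client = mqtt.Client()  # paho-mqtt 1.x style constructor
client.connect("broker.example.org", 1883)

reading = {
    "checkpoint": "cairo-metro-01",
    "person_id": "EG-0001",
    "temp_c": 38.4,
    "lat": 30.06, "lon": 31.25,
}
client.publish("covid19/checkpoints/readings", json.dumps(reading), qos=1)
client.disconnect()
```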
Table 1. A comparison between the Raspberry Pi and Smart Phone (ESP) models of the proposed hardware part.

 | Raspberry Pi Model | Smart Phone Model
Principle of working | The Raspberry Pi is used as the main brain (processor) of the embedded system to perform any analysis on the EM system. This model uses a GSM and GPS module to link the Raspberry Pi with the IoT and AI subsystems. | The smart phone is the main means of data entry in this model and makes the connection between the IoT subsystem and the EM system. This model uses an ESP module for the direct connection between the sensors and the IoT and AI subsystems.
Advantages | Low cost; DSI display port for connecting a Raspberry Pi touchscreen display; Micro SD port for loading the operating system and storing data. | High GUI performance; reliable system; more secure; supports smartphone services such as Firebase.
Disadvantages | Low GUI performance; low reliability. | High cost.
2.2 Software Part: AI Model
The software part includes the AI models, which play a vital role in analyzing the data collected by the proposed hardware part. Since the coronavirus outbreak, researchers have scrambled to use AI models and other data analytics tools to explain COVID-19 infections and to predict and monitor the virus spread. This can help the government manage and limit the socio-economic impacts. In this paper, the proposed system uses an AI model to track and predict the manner in which the COVID-19 disease spreads, not only over time but also over areas. After collecting the required data from the hardware part, software based on AI analysis is applied to produce statistics and forecast how and to what extent the virus will spread, given a set of pre-determined parameters and characteristics, as shown in Fig. 4.
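As a toy version of such a forecast, the sketch below fits a logistic growth curve to a cumulative case series and extrapolates it. The series is synthetic; the real system would fit whatever features and parameters the collected data provide.

```python
# Minimal forecast sketch: fit a logistic curve to cumulative cases.
# The data are synthetic and the parameters are illustrative only.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, cap, rate, midpoint):
    return cap / (1.0 + np.exp(-rate * (t - midpoint)))

days = np.arange(30)
cases = logistic(days, 50_000, 0.25, 18) + np.random.default_rng(0).normal(0, 300, 30)

params, _ = curve_fit(logistic, days, cases, p0=(60_000, 0.2, 15))
print("predicted day-45 cumulative cases:", int(logistic(45, *params)))
```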
Fig. 4. The operation of AI in the proposed integrated system.
This paper also introduces an advanced AI interactive map (or dashboard) that locates and tallies confirmed infections, fatalities, and recoveries in Egypt. The role of the AI model in this dashboard is that it can estimate and divide regions into groups of no-risk, moderate-risk, and high-risk regions. The identified high-risk areas can then be quarantined earlier to help the government reduce the spread of the coronavirus. This web service allows GIS users to display different data inputs without central processing, which can ease data sharing and speed up information aggregation. The proposed dashboard also presents an "outbreaks near you" feature, or alert message, that informs individual users about nearby infected areas based on their current location as obtained from their web browser/smartphone [6, 7]. The proposed interactive map helps users to detect the areas that should be put under quarantine. Communication through the proposed dashboard introduces accessible information to
Egyptian people around the country to protect themselves and their communities. This type of tool improves data transparency and helps authorities disseminate information.
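The no/moderate/high risk grouping can be expressed as a simple binning of per-region counts, as in the sketch below; the counts and cut-offs are illustrative only and do not come from the paper.

```python
# Sketch of the dashboard's risk grouping: bin regions by active cases.
# The counts and thresholds below are made up for illustration.
ACTIVE_CASES = {"Cairo": 812, "Alexandria": 265, "Aswan": 9}

def risk_level(active: int) -> str:
    if active >= 500:
        return "high risk"      # candidate for early quarantine
    if active >= 50:
        return "moderate risk"
    return "no risk"

for region, active in ACTIVE_CASES.items():
    print(region, "->", risk_level(active))
```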
3 Results
Geographic Information Systems (GIS) and interactive maps, or dashboards, are considered important and critical tools in tracking, monitoring, and combating COVID-19. In response to that, this paper develops the first interactive dashboard in Egypt to visualize and track the daily reported cases of COVID-19 in real time. This dashboard, called the 3AS dashboard after the first letters of the authors' names, declares the location and number of confirmed COVID-19 cases, deaths, and recoveries in Egypt. It also helps researchers, scientists, and public health authorities to run searches and use AI models to produce statistics. All the data collected and displayed are made available; they are taken from the Egyptian Health Ministry, WHO reports, and Google Sheets about COVID-19. The reported data displayed on the developed 3AS dashboard align with the daily WHO situation reports for Egypt (see Fig. 5). Furthermore, the 3AS dashboard is particularly effective at capturing data about infected cases, deaths, etc., of COVID-19 in newly infected regions all over Egypt. The developed 3AS dashboard provides much information related to COVID-19, such as the detailed reported data of each governorate in Egypt, hotline links to the Egyptian Health Ministry, the international WHO link for COVID-19, and the daily/accumulated statistical results of the AI model (see Fig. 6). Also, newly confirmed cases can use the 3AS dashboard to find the nearest hospital that has empty intensive care beds.
Fig. 5. The developed 3AS dashboard of Egypt.
Fig. 6. The developed 3AS dashboard showing the Alexandria map and a detailed report.
4 Conclusion
This paper introduces an integrated IoT system to control COVID-19. The proposed system consists of two parts: hardware and software. The hardware part includes the EM and IoT subsystems, while the software part includes the AI software-based models. The hardware part was designed to help screen people's temperature and to provide the AI models with the collected data for computing statistics. MEMS IR sensors in the EM system were used to get more accurate and faster temperature readings of people. Also, this paper introduces the two best models for the IoT subsystem that can be used to transmit data for processing and analysis. This paper developed the first interactive dashboard, named the 3AS dashboard, specialized for Egypt. The 3AS dashboard declares the location and number of confirmed COVID-19 cases, deaths, and recoveries in Egypt. It provides the community with much information related to COVID-19, such as the detailed reported data in Egypt, hotline links to the Egyptian Health Ministry, the international WHO link for COVID-19, the locations of hospitals that have empty intensive care beds, and the daily/accumulated statistical results of the AI model. With respect to further improvement, ongoing developments illustrate that important enhancements to the proposed integrated system, as well as the design and implementation of the proposed EM system, can be expected in the near future. These enhancements will have a great effect on the proposed 3AS dashboard for controlling the spread of COVID-19 in Egypt.
References
1. Brüssow, H.: The novel coronavirus – a snapshot of current knowledge. Microb. Biotechnol. 13, 607–612 (2020). https://doi.org/10.1111/1751-7915.13557
2. Zhu, N., Zhang, D., Wang, W., et al.: A novel coronavirus from patients with pneumonia in China, 2019. N. Engl. J. Med. 382, 727–733 (2020). https://doi.org/10.1056/NEJMoa2001017
3. WHO/Europe - Coronavirus disease (COVID-19) outbreak. http://www.euro.who.int/en/health-topics/health-emergencies/coronavirus-covid-19. Accessed 30 May 2020
4. Huang, C., Wang, Y., Li, X., et al.: Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395, 497–506 (2020). https://doi.org/10.1016/S0140-6736(20)30183-5
5. Rapid AI Development Cycle for the Coronavirus (COVID-19) Pandemic: Initial Results for Automated Detection & Patient Monitoring using Deep Learning CT Image Analysis. https://arxiv.org/abs/2003.05037. Accessed 24 June 2020
6. Coronavirus COVID-19 (2019-nCoV). https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6. Accessed 30 May 2020
7. COVID-19: China's Resilient Digital & Technologies - Accenture. https://www.accenture.com/cn-en/insights/strategy/coronavirus-china-covid-19-digital-technology-learnings. Accessed 24 June 2020
8. Abnormal respiratory patterns classifier may contribute to large-scale screening of people infected with COVID-19 in an accurate and unobtrusive manner. https://arxiv.org/abs/2002.05534. Accessed 24 June 2020
9. Ting, D.S.W., Carin, L., Dzau, V., Wong, T.Y.: Digital technology and COVID-19. Nat. Med. 26, 459–461 (2020)
10. New Report: How Korea Used ICT to Flatten the COVID-19 Curve. https://www.ictworks.org/korea-used-ict-flatten-covid-19-curve/#.XuJ990UzbIU. Accessed 11 June 2020
11. Okereafor, K., Adebola, O.: The role of ICT in curtailing the global spread of the coronavirus disease (2020). https://doi.org/10.13140/RG.2.2.35613.87526
12. Singh, R.P., Javaid, M., Haleem, A., Suman, R.: Internet of Things (IoT) applications to fight against COVID-19 pandemic. Diabetes Metab. Syndr. 14, 521–524 (2020). https://doi.org/10.1016/j.dsx.2020.04.041
13. Vaishya, R., Javaid, M., Khan, I.H., Haleem, A.: Artificial Intelligence (AI) applications for COVID-19 pandemic. Diabetes Metab. Syndr. Clin. Res. Rev. 14, 337–339 (2020). https://doi.org/10.1016/j.dsx.2020.04.012
14. Selem, E., Fatehy, M., El-Kader, S.M.A., Nassar, H.: THE (Temperature Heterogeneity Energy) aware routing protocol for IoT health application. IEEE Access 7, 108957–108968 (2019). https://doi.org/10.1109/ACCESS.2019.2931868
15. Hu, Z., Ge, Q., Li, S., et al.: Artificial Intelligence Forecasting of COVID-19 in China (2020). https://arxiv.org/abs/2002.07112
16. Selem, E., Fatehy, M., El-Kader, S.M.A.: E-Health applications over 5G networks: challenges and state of the art. In: ACCS/PEIT 2019 - 2019 6th International Conference on Advanced Control Circuits and Systems and 2019 5th International Conference on New Paradigms in Electronics and Information Technology, pp. 111–118. Institute of Electrical and Electronics Engineers Inc. (2019)
346
A. Hossam et al.
17. Ahmed, E.M., Hady, A.A., El-Kader, S.M.A., et al.: Localization methods for Internet of Things: current and future trends. In: ACCS/PEIT 2019 - 2019 6th International Conference on Advanced Control Circuits and Systems and 2019 5th International Conference on New Paradigms in Electronics and Information Technology. Institute of Electrical and Electronics Engineers Inc., pp 119–125 (2019) 18. Kamel Boulos, M.N., Geraghty, E.M.: Geographical tracking and mapping of coronavirus disease COVID-19/severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epidemic and associated events around the world: how 21st century GIS technologies are supporting the global fight against outbreaks and epidemics. Int. J. Health Geogr. 19, 8 (2020). https://doi.org/10.1186/s12942-020-00202-8 19. Xu, D., Wang, Y., Xiong, B., Li, T.: MEMS-based thermoelectric infrared sensors: a review. Front. Mech. Eng. 12(4), 557–566 (2017). https://doi.org/10.1007/s11465-017-0441-2 20. D6T MEMS Thermal Sensors—OMRON - Americas. https://www.components.omron.com/ product-detail?partNumber=D6T. Accessed 21 June 2020
Healthcare Informatics Challenges: A Medical Diagnosis Using Multi Agent Coordination-Based Model for Managing the Conflicts in Decisions
Sally Elghamrawy1,2(&)
1 Computer Engineering Department, MISR Higher Institute for Engineering and Technology, Mansoura, Egypt [email protected], [email protected]
2 Scientific Research Group in Egypt (SRGE), Mansoura, Egypt
Abstract. Healthcare informatics mainly concerns the management of patient medical information using different information technologies. Automated medical diagnosis is one of the main challenging tasks in the healthcare informatics field, due to diverse clinical considerations and the conflicting diagnoses that might occur. To this end, a Multi Agent Coordination-based Model (MACM) is presented in this paper to manage conflicts in decisions that might occur during the diagnosing process. In MACM, coordination between different agents is applied in the form of competition and negotiation processes. A Bidding Contract Competition Module (BCCM) is proposed to handle the bidding and contracting between agents. In addition, an Adaptive Bidding Protocol (ABP) is proposed to manage the bidding and selecting phases in BCCM. The performance of the proposed BCCM module is evaluated using a number of experiments. The results obtained show better performance when compared to different multi agent systems. Keywords: Healthcare informatics · Multi-Agent System (MAS) · Medical diagnosing · Agent competition · Agent negotiation · Bidding protocols
1 Introduction
Healthcare Informatics is a multidisciplinary area [1, 2] that combines medical, social and computer sciences. Healthcare informatics employs information technologies to extract knowledge from medical data and to manage healthcare information. Automated medical diagnosis is a primary challenge in healthcare informatics: healthcare workers need automated medical diagnosis that saves their time and effort. In this sense, many researchers have provided innovative diagnosing models for various diseases (e.g., breast cancer [3], COVID-19 [4, 21] and Alzheimer's [5]). The automated diagnosing models help clinicians reach diagnostic decisions using medical decision support systems. But there are cases where two or more diagnosing decisions conflict with each other, causing uncertainty in the data provided and undermining the accuracy of the decision-making process. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 347–357, 2021. https://doi.org/10.1007/978-3-030-58669-0_32
Multi-Agent System (MAS) [6] is the best choice for resolving such conflicts, due to its nature, which is inspired by the coordination and communication skills of human structural models working in team mode. Coordination within a MAS is essential for managing the inner dependencies among agents [7, 8]. Furthermore, a MAS can represent different perspectives in a medical situation: for example, during the COVID-19 pandemic [9], senior medical specialists may remotely monitor and diagnose patients while being located far from a care centre staffed by junior doctors with less experience. This situation may lead to conflicts in decisions. In this paper, a Multi Agent Coordination-based Model (MACM) is proposed to manage the conflicting decisions that might occur during the diagnosing process. The proposed coordination-based model simulates the interaction between two agent types, Senior (S-Agents) and Junior (J-Agents) medical agents, during the diagnosis of a patient's case. To ensure the success of agents' actions and interactions in MACM, their coordination, including competition and cooperation, becomes imperative. The negotiation module in MACM is used to resolve the conflicts that might occur when agents try to gain more reward in being assigned a specific task or goal. As a result, this paper's main goal is to develop a coordination module that provides agents with the ability to solve their conflicts and reach a compromise. The paper is organized as follows: Sect. 2 presents recent work on developing MASs in the healthcare informatics field. Sect. 3 presents medical diagnosis using the proposed Multi Agent Coordination-Based Model (MACM). In Sect. 4, a Bidding Contract Competition Model (BCCM) in the Competition Module is proposed, considering the bidding and contracting between agents and showing the main contributions that can help in the development of MACM; in addition, an Adaptive Bidding Protocol (ABP) is proposed to manage the bidding and selecting phases in BCCM. The performance of BCCM is evaluated in Sect. 5 using experimental evaluation. Finally, Sect. 6 concludes the paper's main contributions and proposes topics for future research.
2 Related Work
Multi-Agent Systems (MAS) are extensively considered in the healthcare informatics [11] field, for example in medical diagnosis [12–15], patient scheduling [16–18], and medical decision support systems [19]. The authors in [12] proposed a distributed architecture for diagnosis using artificial immune systems (AIS). The architecture used four common agents in their MAS: a cure provider agent, a grouped diagnosis agent, a diagnosis agent and B-cell agents; a fuzzy inference system is presented for automated diagnosis. A real distributed medical diagnosing application is presented in [13] using a mixture of possibilistic logic and argument theory; a query dialogue method is presented to detect ambiguous and inconsistent data. An agent-based modelling framework is proposed in [14] for analysing Alzheimer's MRI using image segmentation and processing; the framework depends on cooperation and negotiation among different agents. The patient scheduling challenge is one of the recent research areas in which many researchers have made efforts to present models that reduce patients' waiting times. Gao et al. [16] presented a decentralized MAS model for services scheduling. The model presented a negotiation process between agents using
game-theoretic approaches. This research modelled patient selection as one type of agent and scheduling priorities as another, using a contract net protocol. In addition, the authors in [18] presented two agent negotiation models to automatically schedule patients' meetings using a counterproposal approach: the first model's main goal is to propose new slots for the meeting, while the second attempts to ensure that the patient attends the particular slots. Jemal et al. [19] proposed a decision support system based on MAS using intuitionistic fuzzy logic to allow its deployment in healthcare spots using cloud and mobile computing technologies. These researchers developed many agent-based negotiation models for the healthcare domain; however, limited effort has been devoted to how the agents are governed in a cooperative or competitive manner. In this context, this paper proposes a multi agent coordination-based model that manages the interaction between agents using competition, negotiation and cooperation modules. The authors in [20] also addressed the problem of negotiating agents operating in a MAS by applying an optimization algorithm: an integration of a machine learning technique with a negotiation optimization algorithm is presented to analyze intelligent supply chain management. Finally, a number of researchers [22–24] presented solutions for the communication between different types of agents using ontology mapping systems, in order to provide semantic interoperability between agents.
3 The Proposed Multi Agent Coordination-Based Model (MACM) During Medical Diagnosing
The main goal of MACM is to propose a multi agent system for coordination between two different medical opinions during medical diagnosis. The first opinion is that of the senior medical specialists, who may be remotely monitoring and diagnosing patients while located far from the care centre; this opinion is simulated in MACM as Senior-agents (S-Agents). The second opinion is that of the trainee (junior) doctors with less experience, who have enough knowledge about the case of the patient but have difficulty reaching a diagnosing decision without consulting the senior; this opinion is simulated in MACM as Junior-agents (J-Agents). This situation may lead to conflicts in decisions. In this context, a Multi Agent Coordination-based Model (MACM) is proposed to manage the conflicting decisions that might occur during the diagnosing process. MACM simulates the interaction between Senior-agents (S-Agents) and Junior-agents (J-Agents) during the diagnosis of a patient's case. MACM is responsible for providing agents with the ability to coordinate their interactions with different agents; this coordination can take the form of cooperation or competition with other agents. Figure 1 shows the coordination module and its interaction with other modules. S-Agents and J-Agents are associated with a set of task-specific agents. According to the user's needs, these agents cooperate with each other to achieve the user's required task. The coordination module attempts to allocate diagnoses to other agents and synthesizes the results from these agents to generate an overall output. S-Agent and J-Agent are defined as the two main agents in MACM using the Java Agent Development Framework (JADE). The medical knowledge of patients' cases is stored and used by S-Agents, while J-Agents use the raw data in the patient case database that
contains signs and personal data of the patients. When J-Agents need to cooperate with S-Agents to reach a specific diagnosis, a dialogue is initiated between them.
Fig. 1. The Multi Agent Coordination based Model (MACM) for managing medical diagnosis
4 The Proposed Bidding Contract Competition Module (BCCM)
The Agent Competition module in MACM is activated if the coordination module of an agent receives information indicating that this agent will compete with other agents to win a specific diagnosis when there is a conflict in decision between the S-Agent and J-Agent. To win a specific diagnosis, the agents in competition need to propose a bid, and this bid is then revised. After revising the proposed bids, the diagnosis is assigned to the pair of agents with the highest bid. As a result, there must be contract messages between these agents to control and manage the interaction between them. To manage this sequence of competition between agents, a Bidding Contract Competition Model (BCCM) is proposed, as shown in Fig. 2. BCCM's main goal is to allow agents to compete with each other in order to give the right diagnosis decision based on the result of the proposed bidding protocol, and to enable agents to dynamically generate contracts with other agents. The J-Agents are the agents that announce that they need help with the diagnosis task; they create a task/plan repository to provide information about the diagnosing process. The S-Agents are responsible for responding to these announcements by proposing a bid and then performing the delegated diagnosis. A competition is held among different S-Agents in BCCM using the proposed bidding protocol, each trying to win the right to take a specific task. As shown in Fig. 2, BCCM consists of four main phases: (1) The Broadcasting Phase is used to connect the J-Agents' needs with the S-Agents' offers. It consists of three basic modules: The Task
Fig. 2. The Bidding Contract Competition Model (BCCM)
Announcer is used to make each J-Agent announce the task or sub-task that needs to be diagnosed; the J-Agent broadcasts its information based on a pre-defined diagnosis repository. The Blackboard Checker: each J-Agent posts its needs for executing a task or sub-task on the blackboard space, and the S-Agents fetch the queries coming from the J-Agents using this module. (2) The Bidding Phase is used to handle the bids proposed by S-Agents to be allocated a specific task, depending on the capabilities and behaviour of the S-Agents. It consists of four basic modules: The Agent Bidder Creator Module is used to collect information about the requested diagnosis broadcast; all registered S-Agents then create bids to start the competition for allocating the diagnosing task. The Capabilities Evaluator Module is used to evaluate the capabilities tuple of each S-Agent for performing the announced diagnosis. The Behaviours Evaluator Module: each S-Agent also evaluates its behaviours when performing the broadcast task or sub-task [10]. The Bid Formulator Module: after each S-Agent evaluates its capabilities and behaviours for performing the broadcast task/sub-task, it formulates a corresponding bid based on the bidding protocol, which is in turn sent to the selection phase. (3) The Selection Phase is used to revise the bids proposed by each S-Agent and then choose and modify the bid after one bid iteration. It consists of three main modules: The Bid/Agent Association Module associates each bid with its desired task/sub-task and with its formulating S-Agent and stores it in the bid/agent library. The Bid Comparison Module: the J-Agent uses this module to compare the bids delivered by the S-Agents. The Agent Selector Module is used to select the pair of S-Agents with the maximum bids for executing a specific diagnosing task. (4) The Contracting Phase is used to control the interaction between the competing S-Agents by generating contract messages between them. It consists of three main modules: the Result Evaluator Module, used to evaluate the performance; the Contract Message Creator Module, used to generate the contract between the selected pair of S-Agents; and the Agent Pair/Contact Association, which collects the output (result) of the BCCM phases by associating each contract message with its corresponding pair of S-Agents and storing it in the agent pair/contact library. A sketch of this announce–bid–select–contract cycle is given below.
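To make the interplay of the four phases concrete, the following is a minimal Python sketch of one announce–bid–select–contract cycle. It is illustrative only: the actual implementation uses JADE, and all names here (Task, SAgent, run_bccm_round) as well as the capability-coverage bid are our own stand-ins for the evaluator modules described above.

```python
import random
from dataclasses import dataclass

@dataclass
class Task:
    """A diagnosis task announced by a J-Agent on the blackboard."""
    task_id: int
    demands: dict  # e.g. required capability levels per factor

@dataclass
class SAgent:
    name: str
    capabilities: dict

    def formulate_bid(self, task: Task) -> float:
        # Stand-in for the Capabilities/Behaviours Evaluator modules:
        # score how well this agent's capabilities cover the demands.
        return min(self.capabilities.get(k, 0.0) / v
                   for k, v in task.demands.items())

def run_bccm_round(task: Task, s_agents: list) -> dict:
    """One announce -> bid -> select -> contract cycle."""
    # Bidding phase: every registered S-Agent formulates a bid.
    bids = {agent: agent.formulate_bid(task) for agent in s_agents}
    # Selecting phase: pick the PAIR of S-Agents with the highest bids.
    winners = sorted(bids, key=bids.get, reverse=True)[:2]
    # Contracting phase: generate a contract message for the winning pair.
    return {"task": task.task_id,
            "pair": [w.name for w in winners],
            "bids": [round(bids[w], 3) for w in winners]}

if __name__ == "__main__":
    random.seed(1)
    agents = [SAgent(f"S{i}", {"ct": random.random(), "xray": random.random()})
              for i in range(6)]
    print(run_bccm_round(Task(task_id=1, demands={"ct": 0.5, "xray": 0.5}), agents))
```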
4.1 The Proposed Adaptive Bidding Protocol (ABP)
A bid represents an offer to execute a specific diagnosis based on its announcement. In the BCCM context, the bid is defined as an indicator of the capability of an S-Agent to perform a specific diagnosis. Generally, agents' bids in any MAS are influenced by different factors depending on the auction situation in which they compete. Thus, an Adaptive Bidding Protocol (ABP) is proposed for the bidding and selecting phases of the BCCM, as shown in Fig. 3.
Fig. 3. The ABP protocol used in the bidding phase of BCCM
In the bidding phase, ABP is used to generate the S-Agent's bid in terms of its capabilities and behaviours, and in terms of the desired diagnosis's demands, in the auction of the task allocation problem. In the selection phase, ABP is used to determine which factor has the deepest impact in selecting the pair of S-Agents with the highest bid. J-Agents announce the diagnoses that need to be allocated on the blackboard; the S-Agents then check the blackboard for the announced tasks. The ABP protocol is then used by the S-Agents in the bidding phase of BCCM, as shown in Fig. 3, to generate the bids. This protocol mainly focuses on the factors that describe each S-Agent's capability and the factors that describe the tasks. A mapping is used
after each S-Agent checks the announced task's demands and checks its own capabilities. From this mapping, U1 is obtained, which represents the factors that describe the S-Agent's capabilities and behaviours and the announced task's demands.
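The exact formula behind U1 is not given here, but Fig. 3 indicates that each candidate assignment receives a strength and a certainty value and that the protocol takes Min(str, cer) as the bid. A hedged sketch of that step, with the strength and certainty computations as illustrative placeholders rather than the paper's rules:

```python
def abp_bid(capabilities: dict, demands: dict) -> float:
    """Sketch of the ABP bidding step: map the task's demands onto the
    S-Agent's capabilities, derive a (strength, certainty) pair for the
    assignment, and return min(strength, certainty) as the bid (Fig. 3).

    The strength/certainty definitions below are illustrative placeholders,
    not the paper's exact rules."""
    covered = [k for k in demands if capabilities.get(k, 0.0) >= demands[k]]
    if not covered:
        return 0.0  # the agent cannot satisfy the demands: no bid
    # Strength: average margin by which the agent exceeds the demands.
    strength = sum(capabilities[k] - demands[k] for k in covered) / len(covered)
    # Certainty: fraction of the demanded factors the agent can satisfy.
    certainty = len(covered) / len(demands)
    return min(strength, certainty)

print(abp_bid({"ct": 0.9, "xray": 0.7}, {"ct": 0.5, "xray": 0.6}))  # -> 0.25
```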
5 The Experimental Evaluation
A number of experiments are used to validate the performance of the proposed competition module in BCCM. Simply put, the problem discussed here is an optimization problem of agent competition for the diagnosing process. The objective is to maximize the utility, to reduce the cost by shortening the processing time for allocating the diagnosis to the competing agents, and to reduce the failure rate. In each experiment, the agents' bids are generated with random utility. For each bid, the sum of all the agents' utilities is calculated to find the pair of bids that maximizes the total utility for the agents; a sketch of this setup follows. There are three experiments in this stage. The number of agents ranges from 100 to 600, the number of diagnosis tasks from 1 to 15, and each agent is limited to 10 bids. The code was implemented using .NET technology (Visual Studio .NET 2019) and run on an Intel Core i5-8250U processor with 8 GB of RAM. Experiment One: In this experiment, the optimal utility of the proposed BCCM is measured with different numbers of agents. Different numbers of agents per task are taken: 100, 200, 300, 400, 500 and 600. The results of implementing BCCM are shown in Fig. 4.
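As a point of reference, a minimal Python reconstruction of this setup (the original code was .NET); it only mirrors the stated protocol — random bid utilities, at most 10 bids per agent, and selection of the two best bids per diagnosis task:

```python
import random

def simulate_task(n_agents: int, max_bids: int = 10) -> float:
    """Allocate one diagnosis task: each agent submits up to `max_bids`
    bids with random utilities; the pair of distinct agents whose best
    bids maximize the summed utility wins. Returns the achieved fraction
    of the theoretical optimum (2.0, since utilities are in [0, 1])."""
    best_bid = [max(random.random() for _ in range(max_bids))
                for _ in range(n_agents)]
    top_two = sorted(best_bid, reverse=True)[:2]
    return sum(top_two) / 2.0

random.seed(42)
for n in (100, 200, 300, 400, 500, 600):
    ratios = [simulate_task(n) for _ in range(15)]  # 15 diagnosis tasks
    print(n, "agents -> mean optimality", round(sum(ratios) / len(ratios), 3))
```

Run as-is, this reproduces the qualitative trend reported below: the more agents compete per task, the closer the selected pair gets to the optimal utility.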
Fig. 4. The optimal utility of the proposed BCCM
The figure shows that as the number of agents per task increases, a better optimality rate is obtained. In addition, the proposed BCCM achieves much higher (near-optimal) results when compared with different recent models, SCM [20] and DSS [16]: it can secure over 98% of the optimal utility, as shown in Fig. 5.
Fig. 5. The optimal utility of BCCM compared to recent models DSS [16] and SCM [20]
This means that the utility of BCCM increases as the number of diagnoses increases, due to the rewards that the diagnosis grants to the S-Agents attempting to allocate or bid for winning the task; this leads to increased utility for these agents. Experiment Two: In this experiment, the cost incurred by BCCM is measured. Figure 6 shows the computation time needed by BCCM, DSS [16] and AIS [12] with 100 agents. In the proposed BCCM, as expected, the computational cost grows rapidly as the size of the contract constraints (the problem size) increases. The cost of BCCM is slightly higher than that of DSS [16] and AIS [12]; however, BCCM gives the highest utility and lowest failure rate when compared with them.
Fig. 6. The CPU time for BCCM compared to recent models DSS [16] and AIS [12]
In BCCM, the agents have the benefit of generating the bids by themselves, finding the winning combinations of S-Agents and assigning contracts to each pair of winning S-Agents; this prevents any conflict of interest that might arise between mediators and also guarantees the autonomy and independence of the agents. Experiment Three: The failure rate of BCCM is measured and compared to the rates of the recent models DSS [16] and AIS [12], as shown in Fig. 7. A lower failure rate results from BCCM when compared with DSS and AIS. Note that DSS depends on a mediator for bid generation, and AIS has a limited number of bids, so an agent can cover only a narrow portion of its utility space with its own bids. As a result, there is a risk of not finding an overlap between the bids from the negotiating agents, which increases the failure rate.
Fig. 7. The failure rate of BCCM compared to recent models DSS [16] and AIS [12]
6 Conclusions and Future Work
The agent coordination module in MACM is responsible for giving agents the ability to coordinate with other agents. This coordination can take the form of cooperation or competition with other agents: the cooperation module in MACM is used for coordination among collaborative agents, while the competition module is used for coordination among selfish or competitive agents. The negotiation module in MACM is used to resolve conflicts that might occur when agents compete to be assigned a specific task. The main contribution of this paper is reflected in simulating, in MACM, the behaviour of two types of medical decisions (senior and junior agents) during medical diagnosis. These agents may have conflicting decisions during the diagnosis; for this reason, a Bidding Contract Competition Module (BCCM) is proposed to handle the bidding and contracting between agents using an Adaptive Bidding Protocol (ABP). Finally, a number of experiments were performed to validate the effectiveness of the proposed modules and their associated algorithms, through comparative studies between the results obtained from these modules and those obtained from recent models. The preliminary
results demonstrated the efficiency of BCCM and its associated algorithms. As future work, we intend to handle the unknown dependencies and relations between the available agents, since these dependencies lead to a complex cooperation process.
References 1. O’Donoghue, J., Herbert, J.: Data management within mhealth environments: patient sensors, mobile devices, and databases. J. Data Inf. Qual. 4, 5:1–5:20 (2012). https://doi.org/ 10.1145/2378016.2378021 2. Hassan, M.K., El Desouky, A.I., Elghamrawy, S.M., Sarhan, A.M.: Big data challenges and opportunities in healthcare informatics and smart hospitals. In: Security in Smart Cities: Models, Applications, and Challenges 2019, pp. 3–26. Springer, Cham (2019) 3. Almurshidi, S.H., Abu-Naser, S.S.: Expert System for Diagnosing Breast Cancer. Al-Azhar University, Gaza, Palestine (2018) 4. ELGhamrawy, S.M.: Diagnosis and prediction model for COVID19 patients response to treatment based on convolutional neural networks and whale optimization algorithm using CT images. medRxiv, 1 January 2020 5. Shaju, S., Davis, D., Reshma, K.R.: A survey on computer aided techniques for diagnosing Alzheimer disease. In: 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT), 18 March 2016, pp. 1–4. IEEE (2016) 6. Ferber, J.: Multi-Agent Systems: An Introduction to Distributed Artificial Intelligence. Addison-Wesley, Reading (1999) 7. Tweedale, J., Ichalkaranje, N., Sioutis, C., Jarvis, B., Consoli, A., Phillips-Wren, G.: Innovations in multi-agent systems. J. Netw. Comput. Appl. 30(3), 1089–1115 (2007) 8. Bosse, T., Jonker, C.M., Van der Meij, L., Sharpanskykh, A., Treur, J.: Specification and verification of dynamics in agent models. Int. J. Coop. Inf. Syst. 18(01), 167–193 (2009) 9. Gorbalenya, A.E.: Severe acute respiratory syndrome-related coronavirus–the species and it viruses, a statement of the coronavirus study group. BioRxiv (2020) 10. El-Ghamrawy, S.M., Eldesouky, A.I.: An agent decision support module based on granular rough model. Int. J. Inf. Technol. Decis. Making 11(04), 793–820 (2012) 11. Li, M., Huang, F.: Formal describing the organizations in the pervasive healthcare information system: multi-agent system perspective. In: ICARM 2016 – 2016 International Conference on Advanced Robotics and Mechatronics, pp. 524–529 (2016). https://doi.org/ 10.1109/icarm.2016.7606975 12. Rocha, D., Lima-Monteiro, P., Parreira-Rocha, M., Barata, J.: Artificial immune systems based multi-agent architecture to perform distributed diagnosis. J. Intell. Manuf. 30(4), 2025–2037 (2019). https://doi.org/10.1007/s10845-017-1370-y 13. Yan, C., Lindgren, H., Nieves, J.C.: A dialogue-based approach for dealing with uncertain and conflicting information in medical diagnosis. Auton. Agents Multi-Agent Syst. 32(6), 861–885 (2018) 14. Allioui, H., Sadgal, M., El Faziki, A.: Alzheimer detection based on multi-agent systems: an intelligent image processing environment. In: International Conference on Advanced Intelligent Systems for Sustainable Development, pp. 314–326. Springer, Cham, July 2018 15. Nachabe, L., El Hassan, B., Taleb, J.: Semantic multi agent architecture for chronic disease monitoring and management. In: International Conference on Emerging Internetworking, Data & Web Technologies, pp. 284–294. Springer, Cham, February 2019
16. Gao, J., Wong, T., Wang, C.: Coordinating patient preferences through automated negotiation: a multiagent systems model for diagnostic services scheduling. Adv. Eng. Inform. 42, 100934 (2019) 17. Ahmadi-Javid, A., Jalali, Z., Klassen, K.J.: Outpatient appointment systems in healthcare: a review of optimization studies. Eur. J. Oper. Res. 258(1), 3–34 (2017) 18. Rodrigues Pires De Mello, R., Angelo Gelaim, T., Azambuja Silveira, R.: Negotiation strategies in multi-Agent systems for meeting scheduling. In: Proceeding of 2018 44th Latin American Computing Conference. CLEI 2018, pp. 242–250 (2018). https://doi.org/10.1109/ clei.2018.00037 19. Jemal, H., Kechaou, Z., Ben Ayed, M.: Multi-agent based intuitionistic fuzzy logic healthcare decision support system. J. Intell. Fuzzy Syst. 37(2), 2697–2712 (2019). https:// doi.org/10.3233/jifs-182926 20. Chen, C., Xu, C.: A negotiation optimization strategy of collaborative procurement with supply chain based on multi-agent system. Math. Probl. Eng. 2018 (2018). https://doi.org/10. 1155/2018/4653648 21. Khalifa, N.E.M., Taha, M.H.N., Hassanien, A.E., Elghamrawy, S.: Detection of coronavirus (COVID-19) associated pneumonia based on generative adversarial networks and a finetuned deep transfer learning model using chest X-ray dataset. arXiv preprint (2020). arXiv: 2004.01184 22. Calvaresi, D., Schumacher, M., Calbimonte, J.P.: Agent-based modeling for ontology-driven analysis of patient trajectories. J. Med. Syst. 44(9), 1–11 (2020) 23. El-Ghamrawy, S.M., El-Desouky, A.I.: Distributed multi-agent communication system based on dynamic ontology mapping. International J. Commun. Netw. Distrib. Syst. 10(1), 1–24 (2013) 24. Elghamrawy, S.M., Eldesouky, A.I., Saleh, A.I.: Implementing a dynamic ontology mapping approach in multiplatform communication module for distributed multi-agent system. Int. J. Innovative Comput. Inf. Control 8(7), 4547–4564 (2012)
Protection of Patients' Data Privacy by Tamper Detection and Localization in Watermarked Medical Images
Alaa H. ElSaadawy1(&), Ahmed S. ELSayed1, M. N. Al-Berry1, and Mohamed Roushdy2
1 Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt [email protected]
2 Faculty of Computers and Information Technology, Future University in Egypt, New Cairo City, Egypt
Abstract. Sharing medical documents among different specialists in different hospitals has become popular as a result of modern communication technologies. Accordingly, protecting patients' data and authenticity against any unauthorized access or modification is a must. Watermarking is one of the solutions for protecting patients' information against signal processing or geometric attacks. This paper uses a tamper detection and localization watermarking technique. It embeds a Quick Response (QR) code generated from the patient's information into a medical image. In the extraction step, it can detect whether the watermarked image has been attacked or not. The proposed approach detects signal processing and geometric attacks and localizes the tampering resulting from text addition, content removal and copy-and-paste attacks. Keywords: Tamper detection · Tamper localization · Watermark · QR code · Medical imaging
1 Introduction
The use of shared medical images in services like telemedicine, telediagnosis, and teleconsultation has been facilitated by the availability of computer networks. Sharing patient information among specialists in different hospitals is a must to understand diseases and avoid misdiagnosis [1–3]. One of the available approaches to protect medical images transferred through the internet against corruption or unauthorized access is watermarking [4]. Hiding the patient's data in the medical image without distorting the image during transmission is essential to ensure the confidentiality of the transmitted data, and recovering the hidden data and the original medical image without errors is the priority in Electronic Patient Record (EPR) data hiding [5, 6]. Since any modification of medical images may lead to misdiagnosis, authenticity, which ensures that the source is valid and belongs to the right patient, and integrity control, which checks that the image has not been tampered with, are the major purposes of medical image watermarking [7–9]. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 358–369, 2021. https://doi.org/10.1007/978-3-030-58669-0_33
Watermarking techniques can be categorized using three different criteria, each of which splits the methods into different categories: the working domain, human perception, and reversibility. From the working-domain perspective, watermarking techniques can be classified into transform-domain and spatial-domain techniques. In the spatial domain, the values of the pixels are modified directly to change the colour and intensity of pixels, i.e., information is added by modifying the pixel values [10]. Least Significant Bit (LSB), additive watermarking and text mapping are examples of spatial-domain techniques; spatial-domain techniques are simple and have low computational complexity [11]. Working in the transform domain is more complex than the spatial domain, but it is more robust against attacks [12]. It depends on transforming the medical image into another domain before embedding the watermark. This can be done using the Discrete Wavelet Transform (DWT) [13], the Discrete Cosine Transform (DCT) [14] or Singular Value Decomposition (SVD) [15]. With respect to reversibility, watermarking techniques can be divided into reversible and irreversible techniques. Reversible techniques ensure the recovery of both the medical image and the embedded bits without any distortion [16], while with irreversible techniques we can recover the embedded bits only [17]. On the other hand, watermarking techniques can be classified based on human perception into two classes: visible watermarks, like logos, and invisible watermarks that can be used in authentication and integrity applications [17, 18]. Invisible watermarking methods can be divided into three groups: robust, fragile, and semi-fragile. For copyright protection, robust watermarking techniques have been used, as they are robust against multiple attacks [17]. Fragile techniques are used mainly in authentication, as they are sensitive to any linear or nonlinear modification [19, 20]. Finally, semi-fragile methods are used for fuzzy authentication, as they combine the advantages of both robust and fragile techniques [21, 22]. Attacks are one of the most common challenges for watermarking techniques. The two common classes of attack are signal processing attacks (such as image compression, added noise and various filters) and geometric attacks (such as rotation, translation and scaling) [23]. In this paper, a fragile, spatial-domain watermarking technique is used to detect and localize tampering due to different attacks. The rest of the paper is organized as follows: Sect. 2 presents the literature review, Sect. 3 explains the proposed technique, Sect. 4 shows the experimental results and, finally, Sect. 5 contains the conclusions and future work.
2 Literature Review
Authentication, integrity and data hiding are priorities in watermarking techniques [24]. Protecting transmitted medical documents against attacks and detecting any tampering with the medical images have become the common objectives of watermarking techniques. A brief literature review of watermarking techniques for medical images is presented in this section. Y. AL-Nabhani et al. [25] developed a blind, invisible and robust watermarking technique against some signal processing attacks, such as Gaussian noise and the median filter, and against JPEG compression and geometric attacks like rotation and cropping.
The proposed technique used the wavelet domain to embed the watermark: in embedding, the watermark was inserted in the middle-frequency coefficient block of three DWT levels, while in extraction a Probabilistic Neural Network (PNN) was used. The technique was able to extract the watermark in all cases but, depending on the attack, the quality of some extracted watermarks was poor. A. Sharma et al. [26] evaluated their proposed method on Magnetic Resonance Imaging (MRI), Computed Tomography (CT) scan and ultrasound images [27] of size 512 × 512 with a watermark of size 256 × 256. The method was evaluated against salt & pepper noise, Gaussian noise, low-pass filtering, histogram equalization and speckle noise among signal processing attacks, and rotation, JPEG compression and cropping among geometric attacks. Their method decomposed the medical image into a Region of Interest (ROI) and a Non-Region of Interest (NROI) using a second-level DWT, then embedded the hashed watermark image in the ROI and encrypted the EPR and embedded it in the NROI; Normalized Correlation (NC) and Bit Error Rate (BER) were used for evaluating their results. In [12], L. Laouamer et al. presented a tamper detection and localization approach against attacks such as compression, added noise, rotation, cropping and median filtering. Peak Signal-to-Noise Ratio (PSNR) was used to measure the robustness of the approach on eight grayscale images of size 255 × 255 with a watermark of size 85 × 85. The presented approach was semi-blind: it detects the tampered blocks by extracting the attacked watermark and comparing it with the original one. In the spatial domain, a robust watermarking technique was presented by M. E. Moghaddam et al. [28]. Their approach was based on changing the least significant colour for the 5 × 5 neighbours of a certain location, which was selected using the Imperialist Competitive Algorithm (ICA). PSNR was used to evaluate the results; the PSNR changed after applying some attacks, especially JPEG compression, which indicates that the extracted watermark is far from the original. A blind watermarking technique was proposed by R. Thanki et al. [4]. The proposed scheme was robust against geometric and signal processing attacks. After applying a Discrete Cosine Transform (DCT) to the high-frequency (HF) blocks of size 8 × 8 pixels, a White Gaussian Noise (WGN) sequence was used to modify the mid-band frequency DCT coefficients of each DCT block; the correlation properties of the WGN sequence were then used for watermark extraction. Tamper detection and localization approaches were developed by S. Gull et al. [24]. The proposed approaches are robust against multiple signal processing and geometric attacks and detect tampering caused by text addition, copy-and-paste and content removal attacks. The approach applies an LSB algorithm to blocks of the image of size 4 × 4 pixels and was tested on medical images of size 256 × 256 using PSNR and BER. After analyzing the literature, we noticed that some techniques are robust against signal processing and geometric attacks, while others provide tamper detection and localization for attacks like text addition, content removal and copy-and-paste. All the discussed techniques used greyscale images as the watermark.
In [29], we presented a medical image watermarking technique that generates a QR code containing the patient's data and embeds it in the medical image. In this paper, we
extend this work to tamper detection and localization: we detect signal processing and geometric attacks, and localize tampering caused by text addition, content removal and copy-and-paste attacks [24].
3 Proposed Tamper Detection Watermarking Technique
3.1 Embedding Technique
The method used here [29], based on [24], splits the input medical image into blocks (B) of size 4 × 4 pixels, as shown in Fig. 1. It then sets the last two least significant bits to zero and computes the mean (M) of the block, which is embedded into the upper half of the block (B). In addition, the proposed method takes the QR code as a watermark, encrypts it by XORing it with the mean (M), and then embeds it in the lower half of the block (B). A sketch of this step follows Fig. 1.
Fig. 1. Embedding technique
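A hedged NumPy sketch of this embedding step (4 × 4 blocks, two LSBs cleared, block mean in the upper half, XOR-encrypted QR bit in the lower half). How exactly the 8-bit mean is spread over the eight upper-half pixels, and which bit of the mean keys the XOR, are not fully specified, so the layout below is an assumption:

```python
import numpy as np

def embed(image: np.ndarray, qr_bits: np.ndarray) -> np.ndarray:
    """Embed binary QR bits into an 8-bit grayscale image, one bit per
    4x4 block, following Sect. 3.1 (bit layout assumed, see above)."""
    out = image.copy()
    h, w = image.shape
    bits = qr_bits.astype(np.uint8).ravel()
    idx = 0
    for r in range(0, h - 3, 4):
        for c in range(0, w - 3, 4):
            if idx >= bits.size:
                return out
            block = out[r:r + 4, c:c + 4] & 0xFC       # clear the two LSBs
            mean = int(block.mean()) & 0xFF             # block mean M
            # Upper half (8 pixels): one bit of M in each pixel's LSB.
            for i in range(8):
                block[i // 4, i % 4] |= (mean >> i) & 1
            # Lower half: QR bit encrypted by XOR with (a bit of) the mean.
            block[2:, :] |= bits[idx] ^ (mean & 1)
            out[r:r + 4, c:c + 4] = block
            idx += 1
    return out

rng = np.random.default_rng(0)
host = rng.integers(0, 256, (64, 64), dtype=np.uint8)   # toy 64x64 host image
qr = rng.integers(0, 2, (16, 16))                       # toy 16x16 QR bits
watermarked = embed(host, qr)
```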
3.2 Extraction Technique
The main target of the proposed extraction technique is to detect whether the encrypted medical image has been attacked and to extract the QR code. As shown in Fig. 2, the proposed method splits the encrypted medical image into 4 × 4-pixel blocks (B), then computes the mean (M) of each block (B) after setting the two least significant bits to zero. The QR code bit is extracted from the lower half of the block (B) and decrypted by XORing it with the mean (M). To apply tamper detection, the original mean (Mb) is extracted from the upper half of the block (B) and XORed with the computed mean (M). If the result is zero, the encrypted medical image has not been attacked; otherwise, there is an attack on the image.

Fig. 2. Extraction technique
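A matching extraction and tamper-check sketch, under the same assumed layout and continuing the variables (`watermarked`, `qr`) from the embedding sketch above:

```python
import numpy as np

def extract_and_check(marked: np.ndarray, n_bits: int):
    """Recover the QR bits and a per-block tamper flag from an image
    watermarked by the `embed` sketch above (same assumed layout)."""
    h, w = marked.shape
    bits, tampered = [], []
    for r in range(0, h - 3, 4):
        for c in range(0, w - 3, 4):
            if len(bits) >= n_bits:
                break
            block = marked[r:r + 4, c:c + 4]
            mean = int((block & 0xFC).mean()) & 0xFF   # recomputed mean M
            # Reassemble the stored mean Mb from the upper-half LSBs.
            stored = 0
            for i in range(8):
                stored |= (int(block[i // 4, i % 4]) & 1) << i
            tampered.append(stored != mean)            # Mb XOR M != 0 -> attack
            enc = int(block[2, 0]) & 1                 # encrypted QR bit
            bits.append(enc ^ (mean & 1))              # decrypt with the mean
    return np.array(bits), np.array(tampered)

# Round trip with the embedding sketch: bits match and no block is flagged.
recovered, flags = extract_and_check(watermarked, qr.size)
assert (recovered == qr.ravel()).all() and not flags.any()
```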
3.3 Attacks and Tampers
The proposed method has been evaluated against multiple signal processing attacks (Gaussian noise, salt and pepper noise, median filter, histogram equalization, sharpening and low-pass filter, JPEG), geometric attacks (resize, rotation and crop) and tamper localization attacks (content removal, copy and paste, and text addition); a sketch of these attack simulations is given after this list. a. Signal attacks: For the salt-and-pepper attack, we used the Matlab function with a noise density of 0.05. The median filter attack uses the default 3-by-3 neighbourhood. In the Gaussian noise attack, we add white noise of mean zero and variance 0.01 to the image.
b. Geometric attacks: In the rotation attack, we rotate the image by an angle of 30° clockwise, while in the resize attack the image is resized up to double the size of the original image. For the crop attack, we used a fixed-size 50 × 50-pixel block to be cropped from the image. c. Tamper localization: In content removal, we remove a block of size ~50–150 pixels. In copy and paste, we take a part of 50 pixels and paste it in another position in the image. Finally, in text addition we add a word with only four characters ("text") with a font size of ~30–60.
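The attacks themselves were applied with Matlab functions; for reference, an equivalent NumPy/SciPy sketch of the three signal attacks, with the parameter values taken from the text and the helper names being ours:

```python
import numpy as np
from scipy.ndimage import median_filter

def salt_and_pepper(img: np.ndarray, density: float = 0.05) -> np.ndarray:
    """Flip a `density` fraction of pixels to 0 or 255 (imnoise-style)."""
    out = img.copy()
    mask = np.random.rand(*img.shape) < density
    out[mask] = np.random.choice([0, 255], size=int(mask.sum()))
    return out

def gaussian_noise(img: np.ndarray, var: float = 0.01) -> np.ndarray:
    """Add zero-mean white noise of variance 0.01 (image scaled to [0,1])."""
    noisy = img / 255.0 + np.random.normal(0.0, np.sqrt(var), img.shape)
    return (np.clip(noisy, 0, 1) * 255).astype(np.uint8)

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
attacked = {
    "salt_pepper": salt_and_pepper(img),
    "median_3x3": median_filter(img, size=3),   # default 3-by-3 neighbourhood
    "gaussian": gaussian_noise(img),
}
```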
4 Results and Discussion
The proposed method has been tested on 138 grey medical images taken from the OPENi [30] medical image database, as shown in Fig. 3, with the QR code generated from patients' data. It has been tested on different sizes of medical images and QR codes, as follows: 64 × 64, 128 × 128, 256 × 256, 512 × 512, 1024 × 1024, 2048 × 2048 and 4096 × 4096 for the medical images, and 16 × 16, 32 × 32, 64 × 64, 128 × 128, 256 × 256, 512 × 512 and 1024 × 1024 for the QR codes.
Fig. 3. Sample of dataset
We have evaluated the presented scheme for tamper detection using the Bit Error Rate (BER). The BER is the percentage of bits with an error relative to the total number of bits [31], as shown in Eq. 1:

BER = NE / NB (1)
where NE is the number of bits with an error and NB is the total number of bits. When the BER is not equal to zero, it indicates that the image has been attacked; the computation is shown below.
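Eq. 1 in code form (a direct translation; the zero test is the tamper flag):

```python
import numpy as np

def bit_error_rate(original: np.ndarray, extracted: np.ndarray) -> float:
    """BER = NE / NB: erroneous bits over total bits (Eq. 1)."""
    assert original.shape == extracted.shape
    n_errors = int(np.count_nonzero(original != extracted))
    return n_errors / original.size

# A non-zero BER signals that the watermarked image was attacked.
a = np.array([0, 1, 1, 0, 1, 0, 0, 1])
b = np.array([0, 1, 0, 0, 1, 0, 1, 1])
print(bit_error_rate(a, b))  # 2 errors / 8 bits = 0.25
```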
4.1 Tamper Detection
The results of tamper detection for attacks tested on the sample image in Fig. 4 are presented in Table 1 and Table 2; the values represent the percentage of tampered blocks in the attacked image. The QR codes extracted after different attacks on the sample image with various sizes are presented in Fig. 5. Analyzing the QR codes extracted after attacks shows that a visually acceptable QR code cannot be extracted, so we tested the acceptability of the extracted QR codes using the BER. The average BER of each attack over the 138 medical images is recorded in Table 3 and Table 4. Analyzing the results, we found that the BER for most of the attacks exceeds 80%, which indicates that the extracted watermark differs substantially from the original watermark. For the crop attack, the BER decreases as the image size increases, because we crop the same block size for all image sizes. Regarding sharpening and the low-pass filter, we apply the same filter to all sizes, so it affects small images more than large ones. Finally, the salt-and-pepper BER is roughly the same for all sizes, because we add the same ratio of salt and pepper, so it affects all of them with the same percentage; since the added ratio is small (0.05), the BER for this attack is the smallest, around 50%. For the cropping attack, the proposed method was tested using a fixed cropping block size of 100 × 100 pixels, so the extracted watermark is the same as the original one except for the cropped block; as the image size increases, the BER for the cropping attack decreases.
Fig. 4. Tested medical image
Fig. 5. Results of attacks on image (a) Signal attacks (b) Geometric attacks
Table 1. Tamper Detection results for Signal attacks on a sample image

Size of QR code | Gaussian noise | Salt & pepper noise | Median filter | Histogram equalization | Sharpening | Low pass filter | JPEG
16 × 16 | 0.996 | 0.511 | 0.921 | 1.0 | 0.953 | 1.0 | 0.996
32 × 32 | 0.996 | 0.52 | 0.873 | 0.999 | 0.896 | 0.993 | 0.966
64 × 64 | 0.993 | 0.523 | 0.874 | 0.998 | 0.879 | 0.997 | 0.947
128 × 128 | 0.993 | 0.533 | 0.898 | 0.999 | 0.859 | 0.989 | 0.938
256 × 256 | 0.994 | 0.53 | 0.91 | 0.999 | 0.846 | 0.995 | 0.935
512 × 512 | 0.994 | 0.527 | 0.911 | 0.999 | 0.836 | 0.996 | 0.93
1024 × 1024 | 0.994 | 0.528 | 0.908 | 0.999 | 0.815 | 0.996 | 0.927
Table 2. Tamper Detection results for Geometric attacks on a sample image

Size of QR code | Resize | Rotation (30°) | Crop
16 × 16 | 0.99 | 0.882 | 0.027
32 × 32 | 0.971 | 0.854 | 0.004
64 × 64 | 0.967 | 0.846 | 0.004
128 × 128 | 0.965 | 0.843 | 0.001
256 × 256 | 0.963 | 0.84 | 0.001
512 × 512 | 0.961 | 0.839 | 0.0003
1024 × 1024 | 0.961 | 0.839 | 0.0002
Table 3. Average BER results for Signal Attack on 138 medical images

Size of QR code | Gaussian noise | Salt & pepper noise | Median filter | Histogram equalization | Sharpening | Low pass filter | JPEG
16 × 16 | 0.997 | 0.523 | 0.896 | 0.997 | 0.913 | 0.97 | 0.996
32 × 32 | 0.997 | 0.528 | 0.841 | 0.997 | 0.872 | 0.939 | 0.992
64 × 64 | 0.997 | 0.529 | 0.825 | 0.998 | 0.835 | 0.909 | 0.987
128 × 128 | 0.997 | 0.529 | 0.816 | 0.999 | 0.804 | 0.888 | 0.979
256 × 256 | 0.997 | 0.529 | 0.811 | 0.999 | 0.78 | 0.877 | 0.973
512 × 512 | 0.997 | 0.529 | 0.809 | 1 | 0.764 | 0.87 | 0.973
1024 × 1024 | 0.997 | 0.529 | 0.804 | 0.999 | 0.738 | 0.867 | 0.973
Table 4. Average BER results for Geometric Attack on 138 medical images

Size of QR code | Resize | Rotation (30°) | Crop
16 × 16 | 0.981 | 0.997 | 0.66
32 × 32 | 0.972 | 0.997 | 0.628
64 × 64 | 0.968 | 0.995 | 0.148
128 × 128 | 0.963 | 0.991 | 0.037
256 × 256 | 0.96 | 0.987 | 0.01
512 × 512 | 0.958 | 0.986 | 0.002
1024 × 1024 | 0.957 | 0.986 | 0.001

4.2 Tamper Localization
Our scheme has been tested for tamper localization in addition to the previous attacks: it has been tested against copy-and-paste, text addition and content removal. As shown in Fig. 6, the proposed scheme can detect tampering even in a small region. In text addition, we added the word "Text" in two fixed places in the watermarked medical image, and it was detected. Moreover, for the copy-and-paste attack, we copied a part of the watermarked medical image into another position in the same image, and again the technique detected the added region. Finally, in content removal, we removed a small block from the watermarked medical image and refilled this block with random colours in the range of the surrounding blocks; the scheme was also able to detect the removed region. A sketch of such a localization check follows Fig. 6. The average BERs of copy-and-paste, text addition and content removal over the 138 medical images are recorded in Table 5. Analyzing the results, we found that the BER for every tamper over all images is greater than zero, which indicates that, in all cases and attacks, the tamper is detected regardless of its size and location.
Fig. 6. Results of tamper localization on sample images
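A small end-to-end sketch of how such a tamper can be simulated and localized with the block-mean check. It reuses the 4 × 4 layout assumed in Sect. 3 and is illustrative, not the authors' code; note that the check is purely block-local, so the pasted patch is deliberately placed off the block grid:

```python
import numpy as np

def write_means(img: np.ndarray) -> np.ndarray:
    """The part of the embedding relevant to localization: clear the two
    LSBs and store each 4x4 block's mean in its upper-half pixel LSBs."""
    out = img & 0xFC
    for r in range(0, img.shape[0] - 3, 4):
        for c in range(0, img.shape[1] - 3, 4):
            block = out[r:r + 4, c:c + 4]
            mean = int(block.mean()) & 0xFF
            for i in range(8):
                block[i // 4, i % 4] |= (mean >> i) & 1
    return out

def tamper_map(marked: np.ndarray) -> np.ndarray:
    """Flag each block whose recomputed mean disagrees with the stored one."""
    h, w = marked.shape
    tmap = np.zeros((h // 4, w // 4), dtype=bool)
    for r in range(0, h - 3, 4):
        for c in range(0, w - 3, 4):
            block = marked[r:r + 4, c:c + 4]
            mean = int((block & 0xFC).mean()) & 0xFF
            stored = sum(((int(block[i // 4, i % 4]) & 1) << i) for i in range(8))
            tmap[r // 4, c // 4] = stored != mean
    return tmap

rng = np.random.default_rng(1)
wm = write_means(rng.integers(0, 256, (64, 64), dtype=np.uint8))
wm[10:26, 10:26] = wm[40:56, 40:56]   # copy-and-paste, off the block grid
print(np.argwhere(tamper_map(wm)))    # coordinates of the flagged blocks
```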
Table 5. Average BER results for tamper localization attacks on 138 medical images

Size of QR code | Copy and paste | Text addition | Content removal
16 × 16 | 0.0623 | 0.659 | 0.125
32 × 32 | 0.062 | 0.61 | 0.046
64 × 64 | 0.0623 | 0.3511 | 0.056
128 × 128 | 0.062 | 0.038 | 0.056
256 × 256 | 0.061 | 0.06 | 0.018
512 × 512 | 0.06 | 0.015 | 0.0045
1024 × 1024 | 0.059 | 0.0073 | 0.0011
5 Conclusion
In this paper, a tamper detection and localization watermarking technique is proposed. The proposed method uses an EPR-generated QR code as the watermark image and a greyscale medical image as the host image. The host image is split into blocks of size 4 × 4 pixels; the mean of each block is then computed after setting the two least significant bits to zero and embedded in the upper half of the block, while the watermark pixel is encrypted and embedded in the lower half of the block. For extraction of the QR code, the watermarked medical image is divided into 4 × 4-pixel blocks and the mean is computed, as in the embedding technique; the watermark is then extracted from the lower half of the block and the mean from the upper half. We can detect that the medical image has been attacked when the computed mean and the extracted one are not the same. The proposed method was tested on 138 medical images of various sizes against geometric and signal processing attacks using the BER as a measure. Tamper localization was tested on the 138 medical images using text addition, content removal and copy-and-paste attacks. The BER results indicate that the difference between the extracted watermark and the original one is approximately above 80% for geometric and signal attacks, except for the crop and salt-and-pepper attacks. For the crop attack, the block size to be cropped is small relative to the watermark size. For low-pass filtering and sharpening, the BER value decreases as the size of the QR code increases, because we apply the same filter to all sizes, so the filter's effect decreases as the size increases. For salt and pepper, the percentage of noise added to the image is about 0.05 of the size of the medical image. The proposed method can detect all the tested attacks on various sizes of medical images and QR codes, and tampering is detected and can be localized regardless of its size and location. In the future, we aim to propose a technique that not only detects geometric attacks but also protects the transmitted image against them. This can be done by changing the domain of the transmitted image to the frequency domain.
References 1. Mousavi, S.M., Naghsh, A., Abu-Bakar, S.A.R.: Watermarking techniques used in medical images: a survey. J. Digit. Imaging 27(6), 714–729 (2014). https://doi.org/10.1007/s10278014-9700-5 2. Kuang, L.Q., Zhang, Y., Han, X.: A Medical image authentication system based on reversible digital watermarking. In: 2009 1st International Conference on Information Science and Engineering (ICISE), pp. 1047–1050 (2009) 3. Bhatnagar, G., Jonathan, W.U.Q.M.: Biometrics inspired watermarking based on a fractional dual tree complex wavelet transform. Future Gener. Comput. Syst. 29(1), 182–195 (2013) 4. Thanki, R., Borra, S., Dwivedi, V., Borisagar, K.: An efficient medical image watermarking scheme based on FDCuT–DCT. Eng. Sci. Technol. Int. J. 20(4), 1366–1379 (2017) 5. Navas, K.A., Thampy, S.A., Sasikumar, M.: EPR hiding in medical images for telemedicine. Int. J. Electron. Commun. Eng. 2(2), 223–226 (2008) 6. Munch, H., Engelmann, U., Schroter, A., Meinzer, H.P.: The integration of medical images with the electronic patient record and their webbased distribution. Acad. Radiol. 11(6), 661– 668 (2004) 7. Rahman, A.U., Sultan, K., Musleh, D., Aldhafferi, N., Alqahtani, A., Mahmud, M.: Robust and fragile medical image watermarking: a joint venture of coding and chaos theories. J. Healthcare Eng. (2018) 8. Jabade, V.S., Gengaje, S.R.: Literature review of wavelet. Int. J. Comput. Appl. 31(1), 28– 35 (2011) 9. Adnan, W.W., Hitam, S., Abdul-Karim, S., Tamjis, M.R.: A review of image watermarking. In: Proceedings of Student Conference on Research and Development, Swedan (2003) 10. Chandrakar, N., Bagga, J.: Performance comparison of digital image watermarking techniques: a survey. Int. J. Comput. Appl. Technol. Res. 2(2), 126–130 (2013) 11. Saqib, M., Naaz, S.: Spatial and frequency domain digital image watermarking techniques for copyright protection. Int. J. Eng. Sci. Technol. (IJEST) 9(6), 691–699 (2017) 12. Laouamer, L., AlShaikh, M., Nana, L., Pascu, A.C.: Robust watermarking scheme and tamper detection based on threshold versus intensity. J. Innov. Digit. Ecosyst. 2(1–2), 1–12 (2015) 13. Ahmad, A., Sinha, G.R., Kashyap, N.: 3-level DWT image watermarking against frequency and geometrical attacks. Int. J. Comput. Netw. Inf. Secur. 6(12), 58 (2014) 14. Zengzhen, M.: Image quality assessment in multiband DCT domain based on SSIM. Optik Int. J. Light Electron Opt. 125(12), 6470–6473 (2014) 15. Benhocine, A., Laouamer, L., Nana, L., Pascu, A.C.: New images watermarking scheme based on singular value decomposition. J. Inf. Hiding Multimed. Signal Process. 4(1), 9–18 (2013) 16. Kaur, M., Kaur, R.: Reversible watermarking of medical images authentication and recovery-a survey. Inf. Oper. Manage. 3(1), 241–244 (2012) 17. Mousavi, S.M., Naghsh, A., Abu-Bakar, S.A.R.: Watermarking techniques used in medical images: a survey. Digit. Imaging 27(6), 714–729 (2014) 18. Mohanty, S.P., Ramakrishnan, K.R.: A dual watermarking technique for images. In: Proceedings of the 7th ACM International Multimedia, pp. 49–51 (1999) 19. Alomari, R.S., Al-aer, A.: A fragile watermarking algorithm for content authentication. Int. J. Comput. Inf. Sci. 2(1), 27–37 (2004) 20. Zhao, Y.: Dual domain semi-fragile watermarking for image authentication (2003) 21. Yu, X., Wang, C., Zhou, X.: Review on semi-fragile watermarking algorithms for content authentication of digital images. Future Internet 56(9), 1–17 (2017)
22. Lin, E.T., Podilchuk, C.I., Delp III, E.J.: Detection of image alterations using semifragile watermarks. In: Proceedings of the SPIE—Security and Watermarking of Multimedia Contents II, USA (2000) 23. Hosny, K.M., Darwish, M.M., Li, K., Salah, A.: Parallel multi-Core CPU and GPU for fast and robust medical image watermarking. In: IEEE Access (2018) 24. Gull, S., Loan, N.A., Parah, S.A., Sheikh, J.A., Bhat, G.M.: An efficient watermarking technique for tamper detection and localization of medical images. J. Ambient Intell. Humaniz. Comput. 11(5), 1799–1808 (2018) 25. Yahya, A.N., Jalab, H.A., Wahid, A., Noor, R.M.: Robust watermarking algorithm for digital images using discrete wavelet and probabilistic neural network. J. King Saud Univ. – Comput. Inf. Sci. 27(4), 393–401 (2015) 26. Sharma, A., Singh, A.K., Ghrera, S.P.: Robust and secure multiple watermarking for medical images. Wireless Pers. Commun. 92(4), 1611–1624 (2018) 27. Zhang, L., Zhou, P.P.: Localized affine transform resistant watermarking in region-ofinterest. Telecommun. Syst. 44(3), 205–220 (2010) 28. Moghaddam, M.E., Nemati, N.: A robust color image watermarking technique using modified imperialist competitive algorithm. Forensic Sci. Int. 233(1), 193–200 (2013) 29. ElSaadawy, A.H., ELSayed, A.S., Al-Berry, M.N., Roushdy, M.: Reversible watermarking for protecting patient’s data privacy using an EPR-generated QR code. In: AICV (2020) 30. OPENi Medica Image dtabase. https://openi.nlm.nih.gov/. Accessed 10 Sep 2019 31. Shimonski, R.J., Eaton, W., Khan, U., Gordienko, Y.: Exploring the sniffer pro interface. In: Sniffer Pro Network Optimization and Troubleshooting Handbook, pp. 105–158 (2002)
Breast Cancer Classification from Histopathological Images with Separable Convolutional Neural Network and Parametric Rectified Linear Unit
Heba Gaber1(&), Hatem Mohamed2(&), and Mina Ibrahim3(&)
1 Department of Information Systems, Faculty of Computers and Information, Menoufia University, Menoufia, Egypt [email protected]
2 Department of Information Systems, Faculty of Computers and Information, Menoufia University, Alexandria, Egypt [email protected]
3 Department of Information Systems, Faculty of Computers and Information, Menoufia University, Cairo, Egypt [email protected]
Abstract. The convolutional neural network has achieved great success in the classification of medical imaging, including breast cancer classification. Breast cancer is one of the most dangerous cancers impacting women all over the world. In this paper, we propose a deep learning framework. This framework includes the proposed pre-processing phase and the proposed separable convolutional neural network (SCNN) model. Our pre-processing uses patch extraction and data augmentation to enrich the training set and improve the performance. The SCNN model uses separable convolution and the parametric rectified linear unit (PRELU) as an activation function. The SCNN shows superior performance and is faster than pre-trained neural network models. The SCNN approach is evaluated using the BACH2018 dataset [1]. We test the performance using 40 random images. The framework achieves accuracy between 97.5% and 100%; the best accuracy is 100% for both multi-class and binary classification. The framework provides superior classification performance compared to existing approaches. Keywords: Deep learning · SCNN · PRELU · Convolutional Neural Network (CNN)
1 Introduction
Breast cancer is one of the most difficult and dangerous diseases that a person can face. Women are the most affected by breast cancer. According to a 2020 study by the American Cancer Society (ACS), in the USA the estimated number of deaths of women due to breast cancer is near 42,170; about 276,480 new cases of invasive breast cancer will be diagnosed in women, and about 48,530 new cases of carcinoma in situ will be diagnosed [2]. Early and correct diagnosis helps in the © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 370–382, 2021. https://doi.org/10.1007/978-3-030-58669-0_34
process of treatment and reduces the number of deaths. Pathologists play a very important role in the diagnosis process, which is done manually; the manual process may lead to errors in diagnosis. It is also a stressful process for pathologists that consumes a lot of time and depends on the accuracy and clarity of the image [3]. Computer-aided detection (CAD) systems have been used to overcome the misdiagnosis problem. We present a framework that increases the performance of breast cancer classification and reduces the time wasted in the diagnosis process. This framework depends on the convolutional neural network. Our model is based on a separable convolutional neural network, which uses fewer multiplications than traditional convolution and is therefore faster [4]. The framework is evaluated using the BACH2018 dataset [1]. Accuracy, specificity, sensitivity, precision, and F1-score were used as evaluation metrics. This paper is organized as follows: Sect. 2 illustrates the related works; Sect. 3 discusses the proposed methods; Sect. 4 illustrates the materials we use; Sect. 5 presents the results and discussion; Sect. 6 discusses the analysis and comparison of results; and Sect. 7 presents the conclusion and future work.
2 Related Work
The traditional machine learning approaches, including the support vector machine, principal component analysis, and random forest, encountered major shortcomings in image classification. Therefore, researchers have tended to use deep learning. Nowadays, deep learning approaches provide superior performance in the classification of medical imaging with different modalities. Bayramoglu et al. [5] proposed a CNN model with different sizes of convolution kernels: 7 × 7, 5 × 5, and 3 × 3. They evaluated their model on the BreakHis dataset, performed patient-level classification, and reported 83.25% accuracy for binary classification. In another study, Spanhol et al. [6] proposed a model similar to AlexNet with different fusion techniques for image- and patient-level classification of breast cancer. This study, which classifies the BreakHis dataset, reported 90% and 85.6% accuracy for image- and patient-level classification respectively. Besides, Araujo et al. [7] proposed a CNN-based approach to classify the BC Classification Challenge 2015 dataset. Araujo's model achieved approximately 77.8% accuracy when classifying four classes, and 83.3% accuracy for the binary case. Also, Chennamsetty et al. [8] presented a multi-class classification of breast cancer from histopathological images using an ensemble of pre-trained neural networks (ResNet-101, DenseNet-161) on the BACH 2018 dataset. They tested the proposed ensemble using 40 randomly chosen images and achieved an accuracy of 97.5%; when classifying 100 images provided by the organizers, the proposed scheme achieved an accuracy of 87%. Chennamsetty's ensemble won first place in the ICIAR2018 Grand Challenge on Breast Cancer Histology Images. Also, Kwok [9] proposed a method using a pre-trained model (Inception-ResNet-v2) for the multi-class classification of breast cancer using the BACH 2018 dataset. Different data augmentation methods and patches were employed to improve the accuracy of the method. In Kwok's study, the accuracy on the 100 test images provided by the organizers was 87%. The framework won first place in the
ICIAR2018 Grand Challenge. In 2019, Alom et al. [10] proposed the Inception Recurrent Residual Convolutional Neural Network (IRRCNN) model for breast cancer classification. The IRRCNN model combines the strengths of the Inception network (Inception-v4), the residual network (ResNet), and the recurrent convolutional neural network (RCNN). The model was tested on the BC Classification Challenge 2015 dataset and achieved 99.05% and 98.59% testing accuracy for the binary and multi-class cases respectively. In this paper, we present the proposed framework that we used to classify the BACH 2018 dataset [1]. Our framework is divided into two parts, as shown in Fig. 1. Part 1 includes the pre-processing stage and the proposed SCNN model. In Part 2, the trained model is used to predict the test set. Finally, we evaluate the performance of our framework using accuracy, precision, recall, F1-score, and the confusion matrix [12]. The pre-processing stage is divided into two parallel processes, patch extraction and data augmentation, after which images are resized as shown in Fig. 1.
Fig. 1. Overview of the framework.
3 Methods
3.1 Patch Extraction
This process was performed to enrich our training set with samples and improve the performance of the SCNN model. A sequential patch extraction was used: the input image was cropped into patches, with the patch size depending on the original size of the images in the dataset. For example, in BACH 2018, patches were cropped from each image using a patch size of 1495 × 1495 pixels and a stride of 100 pixels [9]. The 400 histological images were cropped into 4000 patches, 10 patches for each sample. Figure 2 shows the patch extraction process.
Fig. 2. Process of patch extraction applied on BACH 2018.
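As an illustration, a minimal Python (NumPy) sketch of sequential patch extraction is given below; the patch size and stride are the values quoted above, while the exact number of patches per image depends on the image dimensions and cropping scheme, which the paper does not fully specify.

```python
import numpy as np

def extract_patches(image, patch_size=1495, stride=100):
    """Sequentially crop square patches from an (H, W, C) image array."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, stride):
        for left in range(0, w - patch_size + 1, stride):
            patches.append(image[top:top + patch_size,
                                 left:left + patch_size])
    return patches
```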
3.2 Data Augmentation
Data augmentation is used to increase the size of the training set, using information only from the training set, and to reduce overfitting on the test set. We applied different data augmentation techniques, including vertical flipping, horizontal flipping, combined (vertical and horizontal) flipping, image cropping, Gaussian blur with random sigma between 0 and 0.5, additive Gaussian noise, average pooling, and affine transformations applied to each image (scale = 2.5, translate percent = 0.04, and rotate = 15%) [8, 10]. The augmented samples generated from this process were selected randomly from the ten different data augmentation techniques. In the BACH 2018 dataset, the number of samples in the training set was 320. The total number of augmented samples generated was 3200; finally, 1280 samples were selected randomly.
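A hedged sketch of this augmentation stage using the imgaug library is shown below; the crop and noise magnitudes are placeholder assumptions, since the paper only partially states them, and the list covers the techniques named above.

```python
import numpy as np
import imgaug.augmenters as iaa

# One augmenter per technique named above; values not stated in the
# paper (crop percent, noise scale) are placeholder assumptions.
AUGMENTERS = [
    iaa.Flipud(1.0),                                     # vertical flip
    iaa.Fliplr(1.0),                                     # horizontal flip
    iaa.Sequential([iaa.Flipud(1.0), iaa.Fliplr(1.0)]),  # both flips
    iaa.Crop(percent=0.1),                               # crop (assumed percent)
    iaa.GaussianBlur(sigma=(0.0, 0.5)),                  # random-sigma blur
    iaa.AdditiveGaussianNoise(scale=0.05 * 255),         # noise (assumed scale)
    iaa.AveragePooling(2),                               # average pooling
    iaa.Affine(scale=2.5, translate_percent=0.04, rotate=15),
]

def augment_once(image, rng=np.random.default_rng()):
    """Apply one randomly chosen augmentation to an image array."""
    return AUGMENTERS[rng.integers(len(AUGMENTERS))](image=image)
```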
3.3 Resize Process
All samples generated by the data augmentation process and the patch extraction process, together with the original training set, were resized. In the BACH2018 dataset, the samples were resized to 299 × 299 pixels.
3.4 Proposed Separable Convolutional Neural Network (SCNN) Model
After applying the pre-processing, the training set is ready to pass through the model, as illustrated in Fig. 1. The proposed model is a convolutional neural network that depends on the separable convolution (SeparableConv2D) unit and the parametric rectified linear unit (PRELU) as an activation function. The separable convolution uses depthwise spatial convolution [4], which is a depthwise convolution
followed by a pointwise convolution that mixes the resulting output channels. The separable convolution is faster than traditional convolution because it uses one-dimensional filters and fewer multiplication operations, and it gives better performance. The PRELU unit is a type of non-linear activation function; it was tested in the study [11] and proved more qualified than the other activation functions. The main advantage of the SCNN model is that it provides better performance with fewer network parameters when compared to pre-trained neural networks. The structure of the SCNN model is as follows. The model starts with a fully connected layer that receives the input image with size 48 × 48 × 3; then three blocks are applied, as illustrated in the next paragraph. After the third block, two fully connected layers are applied, each followed by a dropout layer with a 50% dropout rate. Finally, a fully connected output layer with the softmax activation function is used for multi-class classification (a sigmoid function for binary classification). This model contains 39 layers with 4,844,132 network parameters. Table 1 lists the 39 layers of the model along with the input, output, and parameter count of each layer. The model consists of three blocks, each containing one or more sub-blocks. Each sub-block consists of a SeparableConv2D layer, an activation layer with the PRELU activation function, and a batch normalization layer. The sub-blocks are followed by a max-pooling layer with a pooling size of 2 × 2 and a dropout layer with a 25% dropout rate, as shown in Fig. 3. The first block has one sub-block; the parameters of its SeparableConv2D layer were 32 filters, a 3 × 3 kernel, and same padding. The second block has two sub-blocks; the SeparableConv2D layers in both sub-blocks use 64 filters, a 3 × 3 kernel, and same padding. The third block has three sub-blocks; the SeparableConv2D layers in the three sub-blocks use 128 filters, a 3 × 3 kernel, and same padding.
Fig. 3. Diagram of SCNN model for two blocks.
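A minimal Keras sketch following the layer order of Table 1 might look as follows; it is an illustration assembled from the description above, not the authors' released code, and small details (e.g., PReLU parameterization) may differ from their exact implementation.

```python
from tensorflow.keras import layers, models

def build_scnn(input_shape=(48, 48, 3), n_classes=4):
    """SCNN sketch per Table 1; softmax output for the multi-class case
    (a sigmoid output would be used for the binary variant)."""
    def sub_block(x, filters):
        x = layers.SeparableConv2D(filters, 3, padding='same')(x)
        x = layers.PReLU()(x)
        return layers.BatchNormalization()(x)

    inp = layers.Input(shape=input_shape)
    x = layers.Dense(32)(inp)              # dense_1, applied channel-wise
    x = layers.PReLU()(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.25)(x)
    # Three blocks with 1, 2, and 3 sub-blocks respectively.
    for filters, n_sub in [(32, 1), (64, 2), (128, 3)]:
        for _ in range(n_sub):
            x = sub_block(x, filters)
        x = layers.MaxPooling2D(2)(x)
        x = layers.Dropout(0.25)(x)
    x = layers.Dense(512)(x)               # dense_2
    x = layers.PReLU()(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256)(x)               # dense_3
    x = layers.PReLU()(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    out = layers.Dense(n_classes, activation='softmax')(x)
    return models.Model(inp, out)
```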
Table 1. The 39 layers of the proposed SCNN model.

#   Layer name                                 Input           Output          Parameters
1   dense_1 (Dense)                            (48, 48, 32)    (48, 48, 32)    128
2   activation_1 (Activation = PRELU)          (48, 48, 32)    (48, 48, 32)    0
3   batch_normalization_1                      (48, 48, 32)    (48, 48, 32)    128
4   dropout_1 (Dropout = 0.25)                 (48, 48, 32)    (48, 48, 32)    0
5   separable_conv2d_1 (filters = 32)          (48, 48, 32)    (48, 48, 32)    1344
6   activation_2 (Activation = PRELU)          (48, 48, 32)    (48, 48, 32)    0
7   batch_normalization_2                      (48, 48, 32)    (48, 48, 32)    128
8   max_pooling2d_1 (Pooling_size = (2,2))     (48, 48, 32)    (24, 24, 32)    0
9   dropout_2 (Dropout = 0.25)                 (24, 24, 32)    (24, 24, 32)    0
10  separable_conv2d_2 (filters = 64)          (24, 24, 32)    (24, 24, 64)    2400
11  activation_3 (Activation = PRELU)          (24, 24, 64)    (24, 24, 64)    0
12  batch_normalization_3                      (24, 24, 64)    (24, 24, 64)    256
13  separable_conv2d_3 (filters = 64)          (24, 24, 64)    (24, 24, 64)    4736
14  activation_4 (Activation = PRELU)          (24, 24, 64)    (24, 24, 64)    0
15  batch_normalization_4                      (24, 24, 64)    (24, 24, 64)    256
16  max_pooling2d_2 (Pooling_size = (2,2))     (24, 24, 64)    (12, 12, 64)    0
17  dropout_3 (Dropout = 0.25)                 (12, 12, 64)    (12, 12, 64)    0
18  separable_conv2d_4 (filters = 128)         (12, 12, 64)    (12, 12, 128)   8896
19  activation_5 (Activation = PRELU)          (12, 12, 128)   (12, 12, 128)   0
20  batch_normalization_5                      (12, 12, 128)   (12, 12, 128)   512
21  separable_conv2d_5 (filters = 128)         (12, 12, 128)   (12, 12, 128)   17664
22  activation_6 (Activation = PRELU)          (12, 12, 128)   (12, 12, 128)   0
23  batch_normalization_6                      (12, 12, 128)   (12, 12, 128)   512
24  separable_conv2d_6 (filters = 128)         (12, 12, 128)   (12, 12, 128)   17664
25  activation_7 (Activation = PRELU)          (12, 12, 128)   (12, 12, 128)   0
26  batch_normalization_7                      (12, 12, 128)   (12, 12, 128)   512
27  max_pooling2d_3 (Pooling_size = (2,2))     (12, 12, 128)   (6, 6, 128)     0
28  dropout_4 (Dropout = 0.25)                 (6, 6, 128)     (6, 6, 128)     0
29  dense_2 (Dense)                            (6, 6, 128)     (6, 6, 512)     66048
30  activation_8 (Activation = PRELU)          (6, 6, 512)     (6, 6, 512)     0
31  batch_normalization_8                      (6, 6, 512)     (6, 6, 512)     2048
32  dropout_5 (Dropout = 0.5)                  (6, 6, 512)     (6, 6, 512)     0
33  flatten_1 (Flatten)                        (6, 6, 512)     (18432)         0
34  dense_3 (Dense)                            (18432)         (256)           4718848
35  activation_9 (Activation = PRELU)          (256)           (256)           0
36  batch_normalization_9                      (256)           (256)           1024
37  dropout_6 (Dropout = 0.5)                  (256)           (256)           0
38  dense_4 (Dense)                            (256)           (4) multi-class 1028
39  activation_10 (Activation = softmax)       (4)             (4)             0
3.5 Training Methodology
Initially, we split the dataset randomly into 80% for the training set, 10% for the validation set (used to monitor progress and tune the model), and 10% for the testing set [8]. We used this partition ratio, following recent studies, so that we can compare our results with theirs easily. Therefore, the training set contains 320 samples, the validation set 40 samples, and the test set 40 samples. We prepared five pre-processing methodologies, shown in Table 2. These methodologies are only data augmentation, only 10 patches, only 14 patches, data augmentation with 10 patches,
Table 2. The pre-processing methodologies.

Methodology                            Number of samples           Training                  Total
Data augmentation                      1280 (selected randomly)    320 (selected randomly)   1600
14 patches                             14 * 400 = 5600             320                       5920
10 patches                             10 * 400 = 4000             320                       4320
Data augmentation and 14 patches       1280 + 5600 = 6880          320                       7200
Data augmentation and 10 patches       1280 + 4000 = 5280          320                       5600
and data augmentation with 14 patches. The output of each of the five methodologies is added to the training set, so we obtain five different training sets, and we trained the model with each of them. We used a machine with two GPUs (NVIDIA GeForce GTX 1060 Ti). Back propagation is performed with the Adagrad optimization function with a constant learning rate of 0.05; this learning rate was chosen because it suits the optimization function and the batch size. We also used a batch size of 64, depending on the number of samples in the training set, to help the model be more stable. All of these parameters depend on each other; with them, the model is very stable and reaches optimal performance within 40 epochs. Besides, the objective function is categorical cross-entropy for multi-class classification and binary cross-entropy for binary classification.
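As an illustration of this training configuration, a minimal Keras sketch is given below, assuming the build_scnn function from the earlier sketch; x_train/y_train and x_val/y_val are hypothetical pre-processed arrays.

```python
import tensorflow as tf

model = build_scnn()  # from the earlier sketch
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.05),
              loss='categorical_crossentropy',  # binary_crossentropy for 2 classes
              metrics=['accuracy'])
# x_train/y_train and x_val/y_val are hypothetical pre-processed arrays.
model.fit(x_train, y_train, batch_size=64, epochs=40,
          validation_data=(x_val, y_val))
```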
3.6 Evaluation Metrics
To evaluate the proposed framework, we used accuracy (1), precision (2), recall or sensitivity (3), specificity (4), F1-score (5), and the confusion matrix as evaluation metrics [12]. TP: true positive, FP: false positive, FN: false negative, TN: true negative. Accuracy is the number of correctly identified predictions for each class divided by the total number of samples. These metrics are calculated using the following formulas:

ACCURACY = (TP + TN) / (TP + TN + FP + FN)    (1)

PRECISION = TP / (TP + FP)    (2)

RECALL = TP / (TP + FN)    (3)

SPECIFICITY = TN / (TN + FP)    (4)

F1-score is a measure of test accuracy that uses both precision and recall to compute the score; it is a good metric when the test data is imbalanced. F1-score is calculated using the following formula (5):

F1-SCORE = 2 × ((PRECISION × RECALL) / (PRECISION + RECALL))    (5)
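A small self-contained sketch computing metrics (1)-(5) from per-class counts might look like this:

```python
def classification_metrics(tp, tn, fp, fn):
    """Metrics (1)-(5) from true/false positive/negative counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f1
```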
4 Materials
The Breast Cancer Classification Challenge 2018 dataset (BACH 2018) was made available as part of the ICIAR-2018 grand challenge [1]. It consists of high-resolution (2048 × 1536) pathology images, which are annotated H&E-stained images for breast cancer classification, released in 2018. The dataset contains 400 images equally distributed across four classes (100 samples per class). The four classes are: normal tissue, a non-cancerous sample; benign, unusual but non-cancerous growths of the breast; in situ carcinoma, a non-invasive cancer in which abnormal cells have been found in the lining of the breast milk duct; and invasive carcinoma, in which the abnormal cancer cells that began forming in the milk ducts have spread beyond the ducts into other parts of the breast tissue. For binary classification, the normal and benign classes are combined into a non-carcinoma class, and the in situ and invasive classes are combined into a carcinoma class. BACH 2018 is an update of the Bioimaging 2015 dataset classified in Araujo's study [7]. Sample images displaying the four classes of BACH 2018 are shown in Fig. 4.
Fig. 4. Sample images of the four classes: normal tissue, benign lesion, in situ carcinoma, invasive carcinoma.
5 Results and Discussion
5.1 The Model Performance Evaluation
After training the model on the five training sets prepared with the five different methodologies, we evaluate the model optimization using the accuracy and the loss function. The loss function is calculated on the training and validation sets, and its interpretation is based on how well the model is doing on these two sets: it is the sum of the errors made for each sample in the training or validation set, and the loss value indicates how poorly or well the model behaves after each iteration. Over 40 iterations (epochs) on the training and validation sets, the experimental results show that the highest accuracy with the lowest loss (dark blue line) occurred when the training set was prepared with the 10 patches plus data augmentation methodology, as shown in Figs. 5 and 6, while the lowest accuracy with the highest loss (green line) occurred when the training set was prepared with data augmentation only.
Fig. 5. Training and validation accuracy with five methodologies of pre-processing through 40 epochs. Aug means data augmentation and patch means patch extraction.
Fig. 6. Training and validation loss with five methodologies of pre-processing through 40 epochs. Aug means data augmentation and patch means patch extraction.
After training the model on the five training sets prepared using the five methodologies shown in Table 3, we used the trained models to predict 40 randomly chosen images from the four classes. The experimental results show that the highest accuracy was achieved with methodology (5), although its training set is smaller than that of methodology (4), while the lowest test accuracy occurred with methodology (1), in which the training set was constructed with data augmentation only. For the binary classification of the carcinoma and non-carcinoma classes, methodology (5) also wins.

Table 3. Prediction accuracy when using 40 samples of four classes.

Model + methodology                                  Total number of training samples   Test accuracy
SCNN model + data augmentation (1)                   1600                               68%
SCNN model + 14 patches (2)                          5920                               90%
SCNN model + 10 patches (3)                          4320                               93%
SCNN model + 14 patches + data augmentation (4)      7200                               95%
SCNN model + 10 patches + data augmentation (5)      5600                               100%
5.2 Evaluation of Methodology (5) Using Evaluation Metrics
For the multi-class case, we generate a confusion matrix for 40 random samples with an imbalanced distribution over the four classes (benign = 11, in situ = 9, invasive = 6, and normal = 14); we use the imbalanced distribution to prove the quality of our results, but we also checked a balanced distribution, which generates the same result. Based on the confusion matrix shown in Table 4, we calculate precision, recall, F1-score, accuracy, and specificity, achieving 100% for all, as illustrated in Table 6. For the binary case, we generate a confusion matrix for 40 random samples with an imbalanced distribution over the two classes (carcinoma = 19 and non-carcinoma = 21). Based on the confusion matrix shown in Table 5, we calculate precision, recall, F1-score, accuracy, and specificity, achieving 100% for all, as illustrated in Table 7.

Table 4. Confusion matrix of 4 classes (rows: truth; columns: prediction).

Truth \ Prediction   Benign   In situ   Invasive   Normal
Benign               11       0         0          0
In situ              0        9         0          0
Invasive             0        0         6          0
Normal               0        0         0          14

Table 5. Confusion matrix of 2 classes (rows: truth; columns: prediction).

Truth \ Prediction   Carcinoma   Non-carcinoma
Carcinoma            19          0
Non-carcinoma        0           21
Table 6. Test performance of four classes.

Class name   # of images   Precision   Recall   F1-score   Accuracy   Specificity
Benign       11            100%        100%     100%       100%       100%
In situ      9             100%        100%     100%       100%       100%
Invasive     6             100%        100%     100%       100%       100%
Normal       14            100%        100%     100%       100%       100%
Table 7. Test performance of 2 classes.

Class name      # of images   Precision   Recall   F1-score   Accuracy   Specificity
Carcinoma       19            100%        100%     100%       100%       100%
Non-carcinoma   21            100%        100%     100%       100%       100%
6 Analysis and Comparisons of Results
We compare against the previous studies that classify the BACH2018 dataset. As illustrated in Tables 9 and 10, the studies (Golatkar [13], Rakhlin [14], Chennamsetty [8], Kwok [9]) proposed pre-trained neural network models and achieved accuracies of 93%, 93.8%, 97.5%, and 98% respectively for binary classification, and 85%, 87.2%, 97.5%, and 98% for multi-class classification. The studies (Araújo [7], Alom [10]) classify the Bioimaging 2015 dataset: the first proposed CNN + SVM and achieved 83.3% and 77.8% for the binary and multi-class cases, and the second proposed the IRRCNN model with data augmentation and achieved 99.09% and 98.59% for the binary and multi-class cases. Our proposed model with 10 patches and data augmentation achieved 100% on all evaluation metrics for both binary and multi-class classification. Therefore, our method shows a significant improvement over the state-of-the-art performance for both binary and multi-class breast cancer recognition. The computation time for this experiment is given in Table 8; as shown in the table, methodology (5) reported better results in less time than methodology (4).

Table 8. Computational time per sample for the breast cancer classification experiments.

Dataset    Model                                              Number of samples   Epochs   Time (m)
BACH2018   SCNN + Methodology 5 (10 patches + augmentation)   5600                40       —
BACH2018   SCNN + Methodology 4 (14 patches + augmentation)   7200                40       —
R0(x) = 0
Rk(x) = 0
J0(x) ≥ 0                                    (10)
|K| = 0
Ip.dst ≥ 0,  p = 1, 2, …, n

For |K| = 0 in formula (10), write it as f(x) = 0; for Ip.dst ≥ 0 in formula (10), write it as J1(x) ≥ 0. In order to transform the constrained problem into an unconstrained one, this paper gives the objective function shown below:

D0 = D + (1/dk)·Σk=1..N1 [min(0, J0(x))]² + (1/fk)·Σk=1..N1 [min(0, J1(x))]² + (1/l)·[f(x)]²    (11)

Among them, dk, fk, and l are penalty factors; dk is set to 0.5, fk is set to 0.5, and l is set to 0.8.
5 Algorithm Optimization
5.1 Basic PSO
PSO is an algorithm that relies on swarm intelligence to search randomly. In the iterative calculation, each particle updates its own speed and position by constantly updating Pbest and gbest, so as to find the best position of the particle. Particle position and velocity are updated as in formula (12):

v_i^(k+1) = w·v_i^k + c1·r1·(Pbest,i − x_i^k) + c2·r2·(gbest − x_i^k)
x_i^(k+1) = x_i^k + v_i^(k+1)                                          (12)
In formula (12), w is the inertia weight, c1 and c2 are acceleration factors, and k is the current iteration number.
5.2 Improved PSO
5.2.1 Simplified PSO
Based on the basic PSO and in combination with reference [7], this paper proposes a mean simplified PSO that removes the speed term from the algorithm; its position update formula is shown in Eq. (13):

X_i^(t+1) = w·X_i^t + c1·r1·((pbest + gbest)/2 − X_i^t) + c2·r2·((pbest + gbest)/2 − X_i^t)    (13)
This improvement uses the mean of the particle's own best and the global best positions to move the particle toward the current optimum, so the particle can find the global optimal position faster, effectively avoiding premature convergence of the algorithm.

5.2.2 Acceleration Factor
In PSO, the acceleration factors reflect the information exchange between particles. In this paper, referring to the literature [8], a larger value of c1 and a smaller value of c2 are set in the early stage of the search, and the value of c1 is reduced while the value of c2 is increased in the later stage, so that more particles learn from the global optimum while fewer particles learn from the local optimum. Therefore, this article proposes the acceleration factor change formulas (14):

c1(k) = c1ini − (c1ini − c1fin)·(k/Tmax)
c2(k) = c2ini + (c2fin − c2ini)·(k/Tmax)    (14)
In formula (14), c1ini and c1fin represent the initial and final values of the acceleration factor c1, c2ini and c2fin represent the initial and final values of the acceleration factor c2, and k is the current number of iterations.
5.2.3 Dynamic Inertia Weight
The inertia weight is a key parameter for balancing the local and global search of the algorithm. This paper proposes a strategy to dynamically change the inertia weight; its change formula can be expressed as follows:

w = wmin + (wmax − wmin)·cos(π·t/(2·Tmax)) + r·betarnd(a, b)    (15)
Among them, r is the inertia adjustment factor, and betarnd generates random numbers following the beta distribution. The third term uses the beta distribution to adjust the overall value distribution of the inertia weight, and the inertia adjustment factor before betarnd controls the deviation degree of w, so as to further improve the search accuracy of the algorithm.
5.3 Improved PSO Calculation Flow
According to the above analysis, corresponding to the closest critical initial running state, the algorithm flow is as follows (a code sketch is given after the list):
Step 1: Set the initial fault branch and calculate the data after power flow.
Step 2: Read in the system parameters, and set the algorithm parameters and objective function.
Step 3: Use the nodal injection power vector of the power grid as the particles and initialize the particle population.
Step 4: Calculate the fitness value to find the individual optimal value and the global optimal value.
Step 5: Calculate the acceleration factor and inertia weight by formulas (13)-(15), and update the position of each particle in time.
Step 6: Determine whether the stop condition is satisfied; if it is, exit the program, otherwise continue iterating until the stop condition is satisfied.
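A minimal Python sketch of this improved PSO is given below, assuming a minimization objective; the beta-distribution parameters a and b, the inertia adjustment factor r, and the particle initialization range are assumptions, since the paper does not state them, and the fitness argument stands in for the objective function (11).

```python
import numpy as np

def improved_pso(fitness, dim, n_particles=40, t_max=200,
                 c1_ini=2.0, c1_fin=0.5, c2_ini=0.5, c2_fin=2.0,
                 w_min=0.1, w_max=0.9, r_adj=0.1, a=2.0, b=2.0, seed=0):
    """Mean simplified PSO with dynamic acceleration factors (14) and
    beta-distributed dynamic inertia weight (15); velocity-free position
    update per Eq. (13). r_adj, a, b and the search range are assumed."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))   # assumed search range
    p_best = x.copy()
    p_val = np.array([fitness(p) for p in x])
    g_best = p_best[p_val.argmin()].copy()
    for t in range(t_max):
        c1 = c1_ini - (c1_ini - c1_fin) * t / t_max          # Eq. (14)
        c2 = c2_ini + (c2_fin - c2_ini) * t / t_max
        w = (w_min + (w_max - w_min) * np.cos(np.pi * t / (2 * t_max))
             + r_adj * rng.beta(a, b))                       # Eq. (15)
        mean_best = (p_best + g_best) / 2.0                  # Eq. (13)
        r1, r2 = rng.random((2, n_particles, dim))
        x = w * x + c1 * r1 * (mean_best - x) + c2 * r2 * (mean_best - x)
        vals = np.array([fitness(p) for p in x])
        improved = vals < p_val
        p_best[improved], p_val[improved] = x[improved], vals[improved]
        g_best = p_best[p_val.argmin()].copy()
    return g_best, p_val.min()
```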
6 Example Analysis
This paper takes the IEEE-14 node system as an example to further explain the proposed algorithm, described in detail below. In the algorithm, the parameters of each component of the system and the calculated D value are expressed in per-unit values, with the reference capacity taken as 100 MVA. The improved PSO and the basic PSO are tested by MATLAB simulation. In this test, the specific parameters are set as follows: the particle population size is 40 and the maximum number of iterations is Tmax = 200. In the basic PSO, the acceleration factors are c1 = c2 = 2 and the inertia weight w decreases linearly from 0.9 to 0.1; in the improved PSO, the acceleration factors are c1ini = 2, c1fin = 0.5, c2ini = 0.5, c2fin = 2, and the inertia weight bounds are wmin = 0.1, wmax = 0.9. Assume that the initial failure is the branch between node 3 and node 4, that is, branch L42. To facilitate analysis and comparison, it is assumed in this paper that each branch of the IEEE-14 system is equipped with current-type backup protection, and the protection current setting value is 3 kA.
Figure 1 shows the result of D-value obtained by PSO, with 200 iterations. The abscissa represents the number of iterations, and the ordinate represents the distance. It can be seen from the figure that the shortest power distance D corresponding to the optimal value is 3.2266.
Fig. 1. Power shortest distance D of basic PSO.
Figure 2 shows the result of D-value obtained by the improved PSO, with 200 iterations. It can be seen from the figure that the shortest power distance D corresponding to the optimal value is 2.3642.
Fig. 2. Power shortest distance D of improved PSO.
Figure 3 shows the results of the optimal fitness obtained by PSO, with 200 iterations. The abscissa in the figure represents the iteration times, and the ordinate represents the fitness. We can see from the figure that the best fitness is 0.0663.
Fig. 3. Optimal fitness of basic PSO.
Figure 4 shows the results of the optimal fitness obtained by the improved PSO, with 200 iterations. We can see from the figure that the best fitness is 0.0557.
Fig. 4. Optimal fitness of improved PSO.
The comparison results show that, in calculating the shortest distance between the current state and the critical state of the system, the improved PSO reduces the value from 3.2266 to 2.3642, and in calculating the optimal fitness it reduces the value from 0.0663 to 0.0557. Repeatedly calculating the D value and the optimal fitness of the system with the improved PSO, the D value converges to about 2.364 and the optimal fitness converges to about 0.063. This shows that the improved PSO raises the optimization level of PSO, enlarges the search range, and improves the calculation accuracy. Therefore, the improved algorithm can find the closest critical point triggering the cascading fault more quickly, which provides a reference for preventing cascading trip accidents.
7 Conclusion
In recent years, although the power system has continued to improve its stability, power failure accidents still happen frequently, mainly because of cascading trips. This article starts from the manifestation of the early stage of the cascading trip; on the basis of the action mechanism of relay protection, distance protection is used as the overload protection of the branch, and an optimization model is established to find the critical point of the power grid. The variable to be optimized is the injected power of the nodes corresponding to the running state at the cascading trip boundary. The improved PSO is applied to solve the model: on the basis of the simplified PSO, a simplified mean PSO with dynamic adjustment of the acceleration factors and inertia weight is given. By modifying the individual optimal position and the global optimal position in the position update formula, adding the beta-distribution dynamic adjustment to the inertia weight, and introducing dynamic acceleration factors, the method not only increases the diversity of the population but also gives the algorithm good global convergence ability. The simulation results show that the improved PSO achieves better optimization results in all target directions. Under the same number of iterations, the improved PSO has faster convergence speed, better stability, and higher calculation accuracy than PSO, and can quickly and accurately find the critical point triggering a cascading fault, which provides a reference for further research on preventing cascading trips.
Acknowledgment. This research was financially supported by the Scientific Research Development Foundation of Fujian University of Technology under grant GY-Z17149, and the Scientific and Technological Research Project of Fuzhou under grant GY-Z18058.
References
1. Gan, D.-Q., Jiang-Yi, H., Han, Z.-X.: Thinking on some international blackouts in 2003. Power Syst. Autom. 28(3), 1–5 (2004)
2. Huang, X.-Q., Huang, Y., Liu, H., et al.: Analysis of the importance of the root causes of power production accidents based on the dynamic weight Delphi method. Electr. Technol. 18(3), 89–93 (2017)
3. Deng, H.Q., Lin, X.Y., Wu, P.P., Li, Q.B., Li, C.G.: A method of power network security analysis considering cascading trip. In: Pan, J.S., Lin, J.W., Liang, Y., Chu, S.C. (eds.) Genetic and Evolutionary Computing. ICGEC 2019. Advances in Intelligent Systems and Computing, vol. 1107. Springer, Singapore (2020)
4. Zhu, J.-W.: Power System Analysis. China Power Press, Beijing (1995)
5. Deng, H.-Q., Li, C.-G., Yang, B.-L., Alaini, E., Ikramullah, K., Yan, R.: A Method of Calculating the Safety Margin of the Power Network Considering Cascading Trip Events. Springer (2020)
6. Deng, H.Q., Wu, P.P., Lin, X.Y., Lin, Q.B., Li, C.G.: A method to prevent cascading trip in power network based on nodal power. In: Pan, J.S., Lin, J.W., Liang, Y., Chu, S.C. (eds.) Genetic and Evolutionary Computing. ICGEC 2019. Advances in Intelligent Systems and Computing, vol. 1107. Springer, Singapore (2020)
7. Huang, Y., Lu, H.-Y., Xu, K.-B., Shen, G.-Q.: Simplified mean particle swarm optimization algorithm with dynamic adjustment of inertia weight. Microcomput. Syst. 39(12), 2590–2595 (2018)
8. Teng, Z.-J., Lv, J.-L., Guo, L.-W., Wang, Z.-X., Xu, H., Yuan, L.-H.: Particle swarm optimization algorithm based on dynamic acceleration factor. Microelectr. Comput. 34(12), 125–129 (2017)
Study of PSO Optimized BP Neural Network and Smith Predictor for MOCVD Temperature Control in 7 nm 5G Chip Process
Kuo-Chi Chang1,2,7(&), Yu-Wen Zhou1,2, Hsiao-Chuan Wang3, Yuh-Chung Lin1,2, Kai-Chun Chu4, Tsui-Lien Hsu5, and Jeng-Shyang Pan6
1 School of Information Science and Engineering, Fujian University of Technology, Fuzhou, China [email protected]
2 Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China
3 Institute of Environmental Engineering, National Taiwan University, Taipei, Taiwan
4 Department of Business Management, Fujian University of Technology, Fuzhou, China
5 Institute of Construction Engineering and Management, National Central University, Taoyuan, Taiwan
6 College of Computer Science and Engineering, Shandong University of Science and Technology, Shandong, China
7 College of Mechanical and Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan
Abstract. The key industries of information technology in the semiconductor integrated circuit industry will play an important role after 2020. GaN is used in third-generation advanced semiconductor 7 nm 5G chips, and MOCVD is a key technology for preparing high-quality communication semiconductor crystals. This study proposes a PID controller based on PSO and a BP neural network algorithm to improve the temperature control ability of MOCVD. The research results show that the proposed PSO-BP neural network intelligent PID controller with Smith predictor has better dynamic performance: the output is stable between 0 and 1 from 150 to 500 s, with no oscillation, no overshoot, and a short adjustment time, achieving ideal control.
Keywords: MOCVD · 5G chip process · BP neural network · Smith predictor · Intelligent control · PSO algorithm
1 Introduction
The key industries of information technology in the semiconductor integrated circuit industry will play an important role after 2020, especially in 5G mobile communications. With the rapid development of the third generation of semiconductor materials, MOCVD process technology has a profound influence on the development of batteries,
lighting, communications, and other fields. It is also key technical equipment for 5G communication chips. At present, the investment scale of semiconductor factories is huge; in the future the key commercial scale will be 7 nm and below (Fig. 1) [1–3], production costs are expensive, and high precision of thin-film process control is required. Table 1 summarizes the latest application trends of 7 nm chips.
Fig. 1. Semiconductor key size trend in 2020.
Table 1. Summarizes the latest application trends of 7 nm chips.
GaN is used in third-generation advanced semiconductor 5G chips, and MOCVD is a key technology for preparing high-quality communication semiconductor crystals. However, temperature is one of the most common process parameters in industrial control of chip processes. Temperature control is also an important parameter of the MOCVD process system (Fig. 2), and it directly affects the quality of film material growth [4–6]. The temperature of the reaction chamber must be accurately controlled throughout the entire material growth process to ensure that the material quality meets the required characteristics.
Fig. 2. The architecture of MOCVD system in 7 nm FAB
With the progress of science and technology, many intelligent algorithm control methods have emerged. However, PID control is still widely used in fields such as electromechanics, petroleum, chemical industry, thermal engineering, and metallurgy, especially in low-level industrial production process control. This is because an advanced intelligent algorithm cannot be used directly as a controller with 100% stability in industrial control. PID control not only has the advantages of a simple algorithm principle, stability and reliability, easy implementation, strong adaptability, and good robustness to model parameter perturbation, but also has a clear physical meaning and is easy to understand. Therefore, an intelligent PID controller based on the BP neural network algorithm is adopted, exploiting the self-adaptive and self-learning ability of the BP neural network; PSO has the characteristic of fast convergence speed. At present, MOCVD process control in engineering applications still adopts traditional control methods; some theoretical research, including fuzzy control, has made progress, and its correctness and feasibility have been verified by simulation experiments. However, the control effect has not reached very precise control, and there is still room for optimization and further research. The PSO optimization algorithm is mainly used to solve the problem that the BP neural network itself is uncertain and easily falls into a local optimal solution, so as to improve the
control performance. Aiming at the shortcomings of the neural network, namely its large uncertainty and tendency to fall into local optima, the PSO algorithm is used to optimize the weights, improving the ability of the neural network to adjust the PID parameters adaptively, and a Smith predictor is added. Smith predictive control is a typical compensation control scheme, and it improves the control effect for controlled objects with large time delays. This is an important contribution of this research to the intelligent temperature control of current advanced equipment.
2 PID Controller Design of BP Neural Network Combined with PSO
Figure 3 shows the structure of the PSO-BP-PID controller. The main purpose is to optimize the BP neural network algorithm through PSO and then modify the initial weights to adjust the process parameters. We expect this to improve the self-learning ability and convergence speed of the BP neural network for the entire PID controller. In this study, the PID control finally outputs the three values P, I, and D, by which the conventional PID controller can improve the closed-loop control performance of the controlled object, an important function. In the structure diagram of the PSO-BP-PID controller, rin represents the set working temperature value, yout represents the actual output temperature value, and e represents the temperature error [7–9].
Fig. 3. Schematic diagram of the PID controller using PSO to optimize the BP neural network.
The steps used by the PSO algorithm to optimize the BP neural network in this study are as follows:
Step (1): Encode the connection weights of all neurons in the BP network structure, so that each individual becomes a real-number code string. The BP neural network in this study takes the set value, error, and actual value as inputs and the PID parameters as outputs, with a hidden layer of 5 neurons, forming a 3-5-3 structure, so the particle vector dimension is 30.
Step (2): Initialize the particle swarm. In this study, the population size is set to 30; c1 and c2 are taken as 2; w is taken as 0.8; the maximum number of iterations is 500.
Set the maximum and minimum velocities of the particles as vmin and vmax, and generate the initial velocities randomly in that interval.
Step (3): Map the formed particles to the BP neural network to form the neural network weights, and then construct the fitness function shown below (1):

f = (1/2)·[r(k + 1) − y(k + 1)]²    (1)
Here r(k + 1) is the expected value at moment k + 1 and y(k + 1) is the actual value at moment k + 1. By calculating the particle fitness values, each particle's fitness value and the overall optimal value are obtained.
Step (4): Update the position of the particles and the speed at each iteration, and then check whether the speed of the updated particles is within the set range.
Step (5): Use the test algorithm to confirm whether the number of iterations has reached the set value, or whether the system output has met the objective function. If either condition holds, the iteration terminates and the optimal solution is generated; the optimal initial weights w_ij(2)(0) and w_li(3)(0) are thus optimized by PSO for the BP neural network. Otherwise, return to Step (3) and continue iterating. We use the weights optimized by the PSO algorithm as the initial weights of the BP neural network and then send them to the training network. The flow chart of the PSO-BP neural network is shown in Fig. 4.
Fig. 4. Flow chart of this study of PSO-BP network.
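A minimal sketch of the particle-to-weights mapping and the fitness of Eq. (1) might look as follows; the hidden activation and the simulate helper (which would run one control step with the PID gains produced by the particle-encoded network and return y(k + 1)) are hypothetical, since the paper does not specify them.

```python
import numpy as np

def particle_to_weights(p):
    """Map a 30-dim particle onto the 3-5-3 network (weights only,
    matching the 30-dimensional particle encoding)."""
    return p[:15].reshape(3, 5), p[15:].reshape(5, 3)

def pid_gains(p, inputs):
    """Forward pass producing the three PID parameters Kp, Ki, Kd."""
    w_hidden, w_out = particle_to_weights(p)
    h = np.tanh(inputs @ w_hidden)   # assumed hidden activation
    return h @ w_out

def fitness(p, simulate, r_next):
    """Eq. (1): f = 0.5 * (r(k+1) - y(k+1))^2; simulate is hypothetical."""
    return 0.5 * (r_next - simulate(p)) ** 2
```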
The PID control structure of the Smith predictor based on the PSO-BP algorithm, shown in Fig. 5, can be obtained by combining the three methods described above.
Fig. 5. PID control structure diagram of the Smith predictor based on the PSO algorithm BP.
3 Discussion of Experimental Results
The temperature control system model of MOCVD can be obtained by the flying curve method. The gain is 3.2 (°C/0.1 V), the dead time of the controlled object is 150 s, and the inertia time constant of the controlled process is 200 s, so the object model is as shown in formula (2) [10–12]:

G(s) = 3.2·e^(−150s) / (200s + 1)    (2)
In this study, we used the MATLAB Simulink toolbox to simulate the MOCVD temperature control system. The simulation system is built from a step signal module, the incremental PID controller, the transfer function of the temperature control system, and an oscilloscope. Figure 6 shows the simulation structure of the PID control with Smith predictor combined with the PSO-BP neural network; the PSO-BP algorithm is programmed in the S-function module of the diagram.
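The paper's simulations use MATLAB/Simulink; purely as an illustration, the open-loop step response of the plant in Eq. (2) can also be sketched with the Python control package, approximating the 150 s dead time by a Pade rational function:

```python
import numpy as np
import control
import matplotlib.pyplot as plt

G = control.tf([3.2], [200.0, 1.0])    # first-order lag of Eq. (2)
num, den = control.pade(150.0, 5)      # 5th-order Pade delay approximation
plant = control.series(G, control.tf(num, den))

t = np.linspace(0, 1500, 2000)
t, y = control.step_response(plant, T=t)
plt.plot(t, y)
plt.xlabel('time (s)')
plt.ylabel('output')
plt.show()
```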
Fig. 6. Simulation structure of the PSO-BP neural network PID controller with Smith predictor.
The temperature control of MOCVD is simulated with the Simulink toolbox of MATLAB. The incremental PID and Smith predictor parameters proposed in this study are tuned according to the system's own characteristics, with the PID parameters adjusted continuously by trial and error; finally, the simulation control results are obtained. In the PSO-optimized BP neural network PID controller with Smith predictor, the PID parameters are adjusted online and the optimal control output is produced through the self-adaptive and self-learning ability.
Fig. 7. MOCVD temperature control response simulation curve.
Through trial and error with conventional PID control, a relatively good curve is obtained; the values of the PID parameters, limited to the 0–1 range, are 1, 0.08, and 0.1 respectively. The two simulation results of temperature control in Fig. 7 show that, after the same limit is imposed on the PID parameter values, the system using the incremental PID control method with Smith predictor exhibits overshoot, with values between 0.5 and 1.7, an oscillating response curve, a long settling time, and a poor control effect. The results show that the method proposed in this study, the PSO-BP neural network intelligent PID controller with Smith predictor, has better dynamic performance: the output is stable between 0 and 1 from 150 to 500 s, with no oscillation, no overshoot, and a short adjustment time, achieving ideal control. The BP neural network adaptive PID controller has the advantages of stable control output and convenient adjustment; however, a traditional BP neural network converges slowly and easily falls into local optima, while the controlled object exhibits pure lag and large inertia. The BP neural network controller optimized by the PSO algorithm can therefore significantly improve the MOCVD temperature control results and the overshoot of the system step response. Hence it plays a good guiding role in improving the control accuracy of the MOCVD industrial process, which is of great significance for the development of the semiconductor industry, especially for the current 5G chip process market.
4 Conclusion and Suggestion
The key industries of information technology in the semiconductor integrated circuit industry will play an important role after 2020. GaN is used in third-generation advanced semiconductor 5G chips, and MOCVD is a key technology for preparing high-quality communication semiconductor crystals. This study proposes a PID controller based on PSO and a BP neural network algorithm to improve the temperature control ability of MOCVD. The research results show that the proposed PSO-BP neural network intelligent PID controller with Smith predictor has better dynamic performance: the output is stable between 0 and 1 from 150 to 500 s, with no oscillation, no overshoot, and a short adjustment time, achieving ideal control. In the future, one direction is to find a better optimization algorithm to further improve the initialization time and control performance of the neural network weights; the other is to build a simulation environment of the actual control and apply the mathematical control method to it to obtain actual control simulation results.
References
1. Chang, K.-C., Lin, Y.-C., Chu, K.-C.: Application of edge computing technology in the security industry. Front. Soc. Sci. Technol. 1(10), 130–134 (2019). https://doi.org/10.25236/FSST.2019.011016
2. Zhou, Y.W., et al.: Study on IoT and big data analysis of furnace process exhaust gas leakage. In: Pan, J.S., Li, J., Tsai, P.W., Jain, L. (eds.) Advances in Intelligent Information Hiding and Multimedia Signal Processing. Smart Innovation, Systems and Technologies, vol. 156. Springer, Singapore (2020)
3. Lu, C.-C., Chang, K.-C., Chen, C.-Y.: Study of high-tech process furnace using inherently safer design strategies (IV). The advanced thin film manufacturing process design and adjustment. J. Loss Prev. Process Ind. 43, 280–291 (2016)
4. Chang, K.-C., Chu, K.-C., Wang, H.-C., Lin, Y.-C., Pan, J.-S.: Agent-based middleware framework using distributed CPS for improving resource utilization in smart city. Future Gener. Comput. Syst. 108, 445–453 (2020). https://doi.org/10.1016/j.future.2020.03.006
5. Chang, K.C., Chu, K.C., Wang, H.C., Lin, Y.C., Pan, J.S.: Energy saving technology of 5G base station based on internet of things collaborative control. IEEE Access 8, 32935–32946 (2020)
6. Amesimenu, D.K., et al.: Home appliances control using android and arduino via bluetooth and GSM control. In: Hassanien, A.E., Azar, A., Gaber, T., Oliva, D., Tolba, F. (eds.) Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2020). AICV 2020. Advances in Intelligent Systems and Computing, vol. 1153. Springer, Cham (2020)
7. Stadel, O., Schmidt, J., Liekefett, M., Wahl, G., Gorbenko, O.Y., Kaul, A.R.: MOCVD techniques for the production of coated conductors. IEEE Trans. Appl. Supercond. 13(2), 2528–2531 (2003)
8. Li, C.H., Xu, S.X., Xie, Y., Zhao, J.: The application of PSO-BP neural network PID controller in variable frequency speed regulation system. Appl. Mech. Mater. 599–601, 1090–1093 (2014). https://doi.org/10.4028/www.scientific.net/amm.599-601.1090
9. Chu, K.C., Horng, D.J., Chang, K.C.: Numerical optimization of the energy consumption for wireless sensor networks based on an improved ant colony algorithm. IEEE Access 7, 105562–105571 (2019)
10. Mickevičius, J., Dobrovolskas, D., Steponavičius, T., Malinauskas, T., Kolenda, M., Kadys, A., Tamulaitis, G.: Engineering of InN epilayers by repeated deposition of ultrathin layers in pulsed MOCVD growth. Appl. Surf. Sci. 427, 1027–1032 (2014). https://doi.org/10.1016/j.apsusc.2017.09.074
11. Lu, C.-C., Chang, K.-C., Chen, C.-Y.: Study of high-tech process furnace using inherently safer design strategies (III) advanced thin film process and reduction of power consumption control. J. Loss Prev. Process Ind. 43, 280–291 (2015)
12. Chang, K.C., Pan, J.S., Chu, K.C., Horng, D.J., Jing, H.: Study on information and integrated of MES big data and semiconductor process furnace automation. In: Pan, J.S., Lin, J.W., Sui, B., Tseng, S.P. (eds.) Genetic and Evolutionary Computing. ICGEC 2018. Advances in Intelligent Systems and Computing, vol. 834. Springer, Singapore (2019)
Study of the Intelligent Algorithm of Hilbert-Huang Transform in Advanced Power System
Cheng Zhang1, Jia-Jing Liu1, Kuo-Chi Chang1,2,6(&), Hsiao-Chuan Wang3, Yuh-Chung Lin1,2, Kai-Chun Chu4, and Tsui-Lien Hsu5
1 School of Information Science and Engineering, Fujian University of Technology, Fuzhou, China [email protected]
2 Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China
3 Institute of Environmental Engineering, National Taiwan University, Taipei, Taiwan
4 Department of Business Management, Fujian University of Technology, Fuzhou, China
5 Institute of Construction Engineering and Management, National Central University, Taoyuan, Taiwan
6 College of Mechanical and Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan
Abstract. With the rapid increase in population and electricity consumption, the power grid has long formed a large-scale interconnection of systems. The power system is a complex multi-dimensional dynamic system, and traditional system parameter processing methods have gradually shown their limitations, affecting the stability and reliability analysis of the system. This study introduces a popular time-frequency analysis methodology for nonlinear non-stationary signals, the Hilbert-Huang transform (HHT) algorithm, and then summarizes the applications of the HHT method to low frequency oscillation of advanced power systems, power quality detection, and harmonic analysis, combining the research achievements of domestic and foreign scholars in recent years. This study also discusses the interpolation function and endpoint effect of the HHT method in practice and further research on its application in advanced power systems.
Keywords: Advanced power system · Hilbert-Huang transform algorithm · Low frequency oscillation · Power quality detection · Harmonic analysis
1 Introduction
Interconnection of power systems in China has become a prominent feature in recent years. For nonlinear non-stationary power systems, the Fourier transform is a previous data processing method that mainly deals with linear and stationary signals, so it has some constraints. It can merely be used to analyze linear and stationary signals, because it requires that the analyzed system must be linear and
the data must be under periodic or stationary conditions; if this is not the case, the resulting spectrum will not have much practical significance. There have also been a number of signal processing methods in history that are either linearly bound or stationarity bound, so that nonlinear non-stationary signals cannot be analyzed completely. Norden E. Huang of NASA proposed the Hilbert-Huang transform, a new signal processing method, in 1998; it is considered one of the most important applied mathematical methods ever devised at NASA [1–3]. This method has been widely used in the power system and has been further studied by many scholars. As a new time-frequency analysis method, the HHT algorithm breaks the limitation of the Fourier transform, solves the problem of instantaneous frequency identification, and is self-adaptive [4]. The method is suitable for time-frequency analysis of nonlinear non-stationary signals; it can calculate the instantaneous frequency and amplitude of the signal and thus contributes to the theoretical basis of power system parameter analysis. This paper reviews the research results of HHT theory for low frequency oscillation of advanced power systems, power quality detection, and power system harmonic analysis, and discusses further research on its application in advanced power systems.
2 Methodology of Hilbert-Huang Transform
HHT is a new data analysis method, mainly composed of Hilbert spectrum analysis and empirical mode decomposition (EMD). The most important feature of this method is the introduction of intrinsic mode functions (IMF) to capture local signal characteristics. The main process of the HHT method for nonlinear non-stationary signals is as follows: first, the signal to be analyzed is decomposed into IMF components by the EMD algorithm; then the Hilbert transform is applied to each IMF component to obtain the instantaneous frequency and amplitude of the signal. The key part of EMD decomposition is a step-by-step sifting process. Assume that the signal to be decomposed is s(t).
Step (1): First find all the maxima and minima of the data, then use spline interpolation to fit the upper envelope V1(t) and the lower envelope V2(t), and then calculate their average as in Eq. (1):

m = (v1(t) + v2(t)) / 2    (1)
Step (2): Define h = s(t) − m. h is the first IMF when it meets the two requirements of an IMF, namely (A) the number of extreme points and zero crossings over the whole data segment are equal or differ by at most 1, and (B) at any point, the mean of the envelope formed by the local maxima and the envelope formed by the local minima is zero. If h does not meet these requirements, it is regarded as the new s(t) and the previous actions are repeated until h meets the IMF requirements [5].
Step (3): The decomposition stops when the residual component is monotonic or small enough to be regarded as measurement error; at this time the signal has been
decomposed into several intrinsic modal signals ci and a residual component r; the decomposition of s(t) is shown in Eq. (2):

s(t) = Σi=1..n ci + r    (2)
In principle, the standard deviation SD between two successive sifting results is used as the criterion for an intrinsic mode function; it is calculated as shown in Eq. (3):

SD = Σt=0..T [hk−1(t) − hk(t)]² / hk−1(t)²    (3)
In general, the smaller the value of SD, the better the linearity and stationarity of the resulting intrinsic mode function. Much practice shows that the decomposition effect of EMD is best when SD is between 0.2 and 0.3. The detailed flow chart of the HHT method is shown in Fig. 1.
Fig. 1. Flow chart of calculation steps of HHT method.
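To make the sifting procedure concrete, a minimal Python sketch of one sifting step and the SD criterion of Eq. (3) is given below; it uses cubic-spline envelopes and ignores the endpoint handling discussed later in this paper, so it is an illustration rather than a full EMD implementation.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(s):
    """One sifting step: h = s(t) - m, with m the mean of the
    cubic-spline envelopes of the maxima and minima (Eq. (1))."""
    t = np.arange(len(s))
    maxima = argrelextrema(s, np.greater)[0]
    minima = argrelextrema(s, np.less)[0]
    upper = CubicSpline(maxima, s[maxima])(t)
    lower = CubicSpline(minima, s[minima])(t)
    return s - (upper + lower) / 2.0

def sd_criterion(h_prev, h, eps=1e-12):
    """SD of Eq. (3); sifting is usually stopped when SD is 0.2-0.3."""
    return np.sum((h_prev - h) ** 2 / (h_prev ** 2 + eps))
```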
3 Application of Hilbert-Huang Transform Method
3.1 Advanced Power System Low Frequency Oscillation Applications
As the scale of interconnected large power grids continues to expand, with ever closer interconnection and the continuous operation of high-capacity, long-distance transmission lines, the probability of low frequency oscillation is greatly increased, which seriously affects the reliability and safety of advanced power systems. Low frequency oscillation of a power system refers to the phenomenon that, after the system is disturbed, the relative swaying between the rotors of synchronous generators operating in parallel causes the power of the system to oscillate, at frequencies from 0.2 to 0.5 Hz. For a long time, the low frequency oscillation problem has been studied through the linearization of small signal stability analysis, with certain results; but non-linearity is a typical feature of power systems, and as the system scale grows and becomes more complex, the shortcomings of the linearization method are exposed. With the development of nonlinear research, many scholars have introduced bifurcation and chaos theory into the study of low frequency oscillation; although such methods can solve some problems that the previous linearization methods could not, they still have limitations on the size of the system and the order of the equations [6–8]. The HHT algorithm has been applied to analyze the dynamic oscillation modes of low frequency oscillation of advanced power systems and to extract transient information about system faults; Prony's algorithm and the wavelet transform algorithm were used to analyze examples in the 2-zone 4-machine system and the EPRI-36 node system respectively, proving that the EMD method has high resolution, can effectively process short data records, and overcomes the difficulty of selecting the wavelet basis in the wavelet transform. In one reference, before empirical mode decomposition, the approximate range of each modal frequency is obtained using the Fourier transform, and the signal is then filtered according to the density of the obtained modal frequencies [9, 10]; this method can accurately identify the characteristic parameters of low frequency oscillation. Many scholars combine the HHT algorithm with the traditional low frequency oscillation analysis method, Prony's method. For example, in one reference, EMD is used to decompose the signal, and signals are then distinguished from noise by the fact that there is strong correlation among signal components and only weak correlation among noise components; on this basis the signal is de-noised and reconstructed, and finally the reconstructed signal is analyzed by Prony's method, which mitigates Prony's sensitivity to noise. In references [11, 12], after the signal is decomposed by EMD, the signal energy obtained from each IMF is used to distinguish noise from signal patterns, and the IMF component with a large energy weight is the dominant
Study of the Intelligent Algorithm of Hilbert-Huang Transform
581
mode of oscillation. In short, the HHT method application for low frequency oscillation of advanced power system has become a trend. A large number of experiments have proved that this method can achieve more effective and accurate results in the field of dealing with nonlinear signal in the power system, and extract the characteristics of the oscillation mode. 3.2
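The Fourier pre-filtering step of [9, 10] can be sketched as follows (a hedged illustration: the band edges are assumed to be chosen by inspecting the FFT spectrum, and the function name is ours):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def fourier_prefilter(s, fs, bands):
    """Locate modal frequencies with an FFT, then band-pass the signal
    around each band so that EMD sees one dominant mode per component."""
    freqs = np.fft.rfftfreq(len(s), d=1.0 / fs)
    spectrum = np.abs(np.fft.rfft(s))    # inspected to choose 'bands'
    components = []
    for lo, hi in bands:                 # bands: list of (low Hz, high Hz)
        b, a = butter(4, [lo, hi], btype="band", fs=fs)
        components.append(filtfilt(b, a, s))
    return freqs, spectrum, components
```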
3.2 Power Quality Detection
Power supply quality is often degraded by the transient effects of switching operations, power system faults, and harmonic distortion. For transient phenomena in power systems, many scholars at home and abroad have used wavelet techniques [13, 14]. The HHT analysis method can effectively analyze abnormal power quality disturbance signals. It decomposes the signal into several IMFs via EMD, and these intrinsic mode functions contain the local characteristics of the signal. The instantaneous amplitude and frequency of the first IMF are then calculated, from which the time and magnitude of the disturbance mutation can be obtained. Past research developed an electric energy detection system based directly on the HHT method, combining software and hardware to provide a convenient detection platform for workers. However, when the HHT method is used to detect a voltage discontinuity signal at an inflection point, only the onset time of the disturbance can be detected; detection fails at the termination time. The solution to this problem is to superimpose a harmonic signal on the original discontinuous voltage signal. Adding a harmonic, such as the third harmonic, is intended to increase the number of local extremum points, so that the amplitude at disturbance termination can also be detected. In applying the HHT method, the problem of mode aliasing can also appear: an interference signal occupies the position of the original physical process curve, so that the IMFs no longer reflect the real physical process [15, 16]. A known high-frequency signal can be superimposed on the original signal to extract the transient oscillation signal; after decomposing the combined signal with EMD, the known high-frequency signal is subtracted from the first IMF component to obtain IMF', which is then analyzed with the Hilbert transform. In addition, the EMD method cannot accurately screen out all components of a harmonic signal when the fundamental carries too much of the energy. Before using EMD, a Fourier transform can be performed on the signal to determine the approximate frequency range of each mode, after which low-pass or other filters are applied to the specified frequency bands, so that each component is isolated for HHT analysis [17, 18]. For power quality detection, the HHT method plays an extremely important role: it can check in real time whether the power quality in the power system is good or bad, laying an important foundation for subsequent quality improvement and helping to ensure the safe operation of the power supply system. This method is still being studied in this field.
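The instantaneous amplitude and frequency used above for disturbance detection can be obtained from an IMF as in the following sketch (illustrative only; scipy's hilbert returns the analytic signal):

```python
import numpy as np
from scipy.signal import hilbert

def inst_amp_freq(imf, fs):
    """Instantaneous amplitude and frequency of one IMF via the Hilbert
    transform; an abrupt jump in either marks a disturbance onset."""
    z = hilbert(imf)                          # analytic signal
    amp = np.abs(z)                           # instantaneous amplitude
    phase = np.unwrap(np.angle(z))            # unwrapped phase
    freq = np.diff(phase) * fs / (2 * np.pi)  # instantaneous frequency (Hz)
    return amp, freq
```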
3.3 Harmonic Analysis of Advanced Power System
Nowadays, the power supply system is often severely polluted by harmonics, and the analysis and suppression of power system harmonics is important and very meaningful work. Harmonic detection is the foundation and main basis of this work: only accurate, real-time detection of harmonics can prepare for subsequent suppression. Since the early 1980s, China has paid close attention to and carried out research on harmonic control, and has developed several power quality indexes to limit the allowable harmonic levels of the public power grid. The active power filter (APF) is a power electronic device that can dynamically suppress harmonics and improve power quality, but as power quality requirements continue to rise, scholars at home and abroad continue to explore new methods of harmonic detection for advanced power systems. HHT has been introduced into harmonic detection in power systems and can extract harmonic signals of any frequency. However, the envelope and endpoint problems of the HHT method in practical applications affect the accuracy of harmonic detection: the cubic spline interpolation used in the method exhibits overshoot and undershoot in practice, and because the boundary endpoints are not all extreme points, the endpoint "flying wing" (end divergence) phenomenon appears. Many scholars have produced results on these issues. The envelope problem is improved by using Hermite interpolation instead of the original cubic spline interpolation, and the point-symmetric extension method is used to mitigate the endpoint flying wing. The endpoint effect has been addressed further by a method combining artificial neural networks with point-symmetric extension. Furthermore, pre-filtering the signal improves the aliasing problem, which further improves the accuracy of harmonic analysis. Since HHT cannot effectively distinguish harmonics of similar frequencies, an iterative HHT method is often used to identify the stationary components inside the signal and improve detection accuracy [19, 20]. Overall, the HHT method has a huge impact on harmonic detection in power supply systems and is also the basis for harmonic suppression. The problems affecting harmonic detection accuracy are not limited to the improvements summarized above; they remain subjects of continuous research and improvement by scholars.
4 Research Discussions and Prospects for Future Study
In recent years, the HHT method has become a hot topic among scholars at home and abroad, and its application in power systems has attracted much attention and achieved good results. It has been proved that this method has broad application prospects in power systems; however, it still has some problems to be solved, as discussed below.
4.1 Interpolation Function Problem
In the process of EMD decomposition, fitting the envelope curve is a very important step, so the selection of the interpolation function is extremely important. The original method uses cubic spline interpolation, but it exhibits overshoot and undershoot, which affect the accuracy of the algorithm. At present, many scholars have proposed other interpolation functions; various improved methods are shown in Table 1 below.
Table 1. Several interpolation function methods.
Scholar name    Interpolation function method      Published date
R. He           Mobility Model                     2018
C. Zheng        Ultra-low frequency oscillation    2018
Guang Xiaolei   Piecewise cubic Hermite            2011
Jin Tao         Subsection power function          2005
Hu Jinsong      High-order spline interpolation    2003
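The benefit of the piecewise cubic Hermite entry in Table 1 can be illustrated by comparing it with cubic spline interpolation on a toy set of envelope knots (our example data, not taken from the cited works):

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

# Envelope knots with a sharp step: a natural cubic spline swings beyond
# the data (overshoot/undershoot), while piecewise cubic Hermite (PCHIP)
# interpolation is shape-preserving between the knots.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 0.1, 0.1, 2.0, 2.1])
xs = np.linspace(0.0, 4.0, 401)
spline = CubicSpline(x, y)(xs)
pchip = PchipInterpolator(x, y)(xs)
print("spline range:", spline.min(), spline.max())  # exceeds [0.0, 2.1]
print("pchip range :", pchip.min(), pchip.max())    # stays within the data
```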
Although these methods improve the accuracy and running time of the algorithm, the problem deserves further study.
4.2 Endpoint Effect Problem
When the envelope fitting curve is used to obtain the maxima and minima, the flying wing phenomenon at the ends of the curve affects the accuracy of the HHT analysis, mainly because the boundary endpoints are not all extreme points. To solve this problem, the boundary points must be extended using an extension method. Many scholars have put forward extension methods and proved their effectiveness; Table 2 lists several extension methods that improve the end effect problem.
Table 2. Several extension methods.
Scholar name   Extension method                                      Published date
Vergura, S.    HHT and wavelet analysis                              2018
Ucar, F.       Fast extreme learning machine                         2018
Qi Quanquan    Method of removing endpoints                          2011
Wang Ting      Minimum similarity distance compared with the ends    2009
Su Yuxiang     Artificial neural networks + mirror extension         2008
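As a small illustration of the mirror extension entry in Table 2, a signal can be reflected about its endpoints before envelope fitting (a sketch with an illustrative function name; the extension length n is application-dependent):

```python
import numpy as np

def mirror_extend(s, n):
    """Mirror-extend a signal by n samples at each end, so that the
    envelope splines have support beyond the boundaries and the
    'flying wing' divergence is pushed outside the region of interest."""
    left = s[1:n + 1][::-1]      # reflection of the first n interior samples
    right = s[-n - 1:-1][::-1]   # reflection of the last n interior samples
    return np.concatenate([left, s, right])

# After EMD on the extended signal, the n added samples at each end
# are discarded before further analysis.
```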
These methods have been proven to effectively improve end effects and are widely used. Because of the importance of this problem, it still needs further study.
4.3 Other Applications of Power Systems
The design of low frequency oscillation controllers based on the HHT methodology for advanced power systems can be further studied, as can improving the precision of low-frequency oscillation modal parameter extraction under a strong noise background. In addition, the application of HHT to other aspects of power systems, such as fault diagnosis and voltage security analysis, deserves further study.
5 Conclusion and Suggestion
With the expansion and increasing complexity of the power system, the HHT method is well suited to the development and stable operation of advanced power systems. This study introduces the principle of the HHT method and summarizes its application in advanced power systems in three directions, low frequency oscillation, power quality detection, and harmonic detection and analysis, combining the research results of domestic and foreign scholars on the application of this method. The intelligent method can extract the dynamic modes and oscillation information of low-frequency oscillation of the power system and prepare for the next step of oscillation suppression. Moreover, it can effectively analyze non-stationary power quality disturbance signals and monitor harmonic signals in real time to improve power quality. In general, applying this intelligent algorithm to the traditional power system will greatly improve the reliability of the power grid and reduce the losses caused by grid insecurity. Finally, further research directions for this method are discussed. Since research on the HHT method started relatively late, the shortcomings of this intelligent method in practical application stated in the previous section need further exploration and practice.
Acknowledgment. Project supported by the Natural Science Foundation of Fujian Province, China (Grant No. 2015J01630) and the Fujian University of Technology Research Fund Project (GYZ18060).
References
1. He, R., Ai, B., Stüber, G.L., Zhong, Z.: Mobility model-based non-stationary mobile-to-mobile channel modeling. IEEE Trans. Wirel. Commun. 17(7), 4388–4400 (2018)
2. Zhang, J., Tan, X., Zheng, P.: Non-destructive detection of wire rope discontinuities from residual magnetic field images using the Hilbert-Huang transform and compressed sensing. Sensors 17, 608 (2017)
3. Chang, K.C., Chu, K.C., Wang, H.C., Lin, Y.C., Pan, J.S.: Energy saving technology of 5G base station based on internet of things collaborative control. IEEE Access 8, 32935–32946 (2020)
4. Huang, N.E., Shen, Z., Long, S.R., et al.: The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. A 454(1971), 903–995 (1998)
5. Chen, D., Lin, J., Li, Y.: Modified complementary ensemble empirical mode decomposition and intrinsic mode functions evaluation index for high-speed train gearbox fault diagnosis. J. Sound Vib. 424, 192–207 (2018). https://doi.org/10.1016/j.jsv.2018.03.018
6. Zheng, C., et al.: Analysis and control to the ultra-low frequency oscillation in southwest power grid of China: a case study. In: 2018 Chinese Control and Decision Conference (CCDC), Shenyang, pp. 5721–5724 (2018)
7. Jiang, K., Zhang, C., Ge, X.: Low-frequency oscillation analysis of the train-grid system based on an improved forbidden-region criterion. IEEE Trans. Ind. Appl. 54(5), 5064–5073 (2018)
8. Chu, K.C., Horng, D.J., Chang, K.C.: Numerical optimization of the energy consumption for wireless sensor networks based on an improved ant colony algorithm. IEEE Access 7, 105562–105571 (2019)
9. Wang, Y., Dong, R.: Improved low frequency oscillation analysis based on multi-signal power system. Control Eng. China 26(07), 1335–1340 (2019)
10. Lu, C.-C., Chang, K.-C., Chen, C.-Y.: Study of high-tech process furnace using inherently safer design strategies (IV). The advanced thin film manufacturing process design and adjustment. J. Loss Prev. Process Ind. 43, 280–291 (2016)
11. Lijie, Z.: Application of Prony algorithm based on EMD for identifying model parameters of low-frequency oscillations. Power Syst. Protect. Control 37(23), 9–14+19 (2009)
12. Ucar, F., Alcin, O.F., Dandil, B., Ata, F.: Power quality event detection using a fast extreme learning machine. Energies 11, 145 (2018)
13. Lu, C.-C., Chang, K.-C., Chen, C.-Y.: Study of high-tech process furnace using inherently safer design strategies (III) advanced thin film process and reduction of power consumption control. J. Loss Prev. Process Ind. 43, 280–291 (2015)
14. Sahani, M., Dash, P.K.: Automatic power quality events recognition based on Hilbert Huang transform and weighted bidirectional extreme learning machine. IEEE Trans. Ind. Inf. 14(9), 3849–3858 (2018)
15. Vergura, S., Carpentieri, M.: Phase coherence index, HHT and wavelet analysis to extract features from active and passive distribution networks. Appl. Sci. 8, 71 (2018)
16. Zhao, J., Ma, N., Hou, H., Zhang, J., Ma, Y., Shi, W.: A fault section location method for small current grounding system based on HHT. In: 2018 China International Conference on Electricity Distribution (CICED), Tianjin, pp. 1769–1773 (2018)
17. Li, K., Tian, J., Li, C., Liu, M., Yang, C., Zhang, G.: The detection of low frequency oscillation based on the Hilbert-Huang transform method. In: 2018 China International Conference on Electricity Distribution (CICED), Tianjin, pp. 1376–1379 (2018)
18. Shi, Z.M., Liu, L., Peng, M., Liu, C.C., Tao, F.J., Liu, C.S.: Non-destructive testing of full-length bonded rock bolts based on HHT signal analysis. J. Appl. Geophys. 151, 47–65 (2018). https://doi.org/10.1016/j.jappgeo.2018.02.001
19. Kabalci, Y., Kockanat, S., Kabalci, E.: A modified ABC algorithm approach for power system harmonic estimation problems. Electr. Power Syst. Res. 154, 160–173 (2018). https://doi.org/10.1016/j.epsr.2017.08.019
20. Bečirović, V., Pavić, I., Filipović-Grčić, B.: Sensitivity analysis of method for harmonic state estimation in the power system. Electr. Power Syst. Res. 154, 515–527 (2018). https://doi.org/10.1016/j.epsr.2017.07.029
Study of Reduction of Inrush Current on a DC Series Motor with a Low-Cost Soft Start System for Advanced Process Tools
Governor David Kwabena Amesimenu1,2, Kuo-Chi Chang1,2,6(&), Tien-Wen Sung1,2, Hsiao-Chuan Wang3, Gilbert Shyirambere1,2, Kai-Chun Chu4, and Tsui-Lien Hsu5
1 School of Information Science and Engineering, Fujian University of Technology, Fuzhou, China [email protected]
2 Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China
3 Institute of Environmental Engineering, National Taiwan University, Taipei, Taiwan
4 Department of Business Management, Fujian University of Technology, Fuzhou, China
5 Institute of Construction Engineering and Management, National Central University, Taoyuan, Taiwan
6 College of Mechanical and Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan
Abstract. High armature current is the main limitation when starting a DC series motor. This initial current can have dangerous effects on the motor, such as damage to the windings and reduction of the machine's lifetime. There are two basic methods of starting a DC motor: the resistor starting method and the soft start using solid-state devices. In this project, the solid-state method was adopted to start the motor smoothly and lessen the inrush current. The system uses an Arduino Atmega328p-PU microcontroller to send pulses to the motor driver, which regulates the running of the motor. Focusing on the DC series motor, power electronic converters, and microcontrollers, the system was modeled and simulated in Proteus ISIS Software. The results recorded from the soft starter system were compared with a direct online starting system, considering the starting current and voltage of the motor with an open loop control system. The soft starter system was able to decrease the direct online current value from 0.996 A to 0.449 A, which represents approximately a 28.1% reduction of the starting current, protecting the motor.
Keywords: DC series motor · Arduino Atmega328p-PU · Low-cost soft start system · Advanced process tools · Inrush current
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 586–597, 2021. https://doi.org/10.1007/978-3-030-58669-0_53
1 Introduction
Almost every mechanical motion seen around us is produced by an electric motor. Electric motors are devices that convert electrical energy into mechanical energy. Based on the power supply, there are generally two categories of electric motors: DC and AC motors. This project provides a concise account of the soft starting of DC motors. DC motors consist of two main parts: the stator, the stationary part carrying the field windings, and the rotor, the rotating part containing the armature windings. In a permanent magnet DC motor, however, the stator is made up of one or more permanent magnet pieces. Based on the construction of DC motors and the mode of excitation, there are five major types: permanent magnet, compound, shunt, series, and separately excited DC motors [1]. The permanent magnet motor uses a magnet to supply the field flux and has better speed regulation and better starting ability; its drawback is the limited load it can drive, so it is usually used in low-power applications. In a series motor, the field winding is connected in series with the armature winding. The series motor develops a high starting torque, and its speed varies widely between full load and no load. In a shunt DC motor, the field winding is connected in parallel with the armature winding, and the motor operates with better speed regulation. The field winding can be connected to the same source as the armature winding or separately excited; an advantage of a separately excited shunt field is that a variable speed drive can establish independent control of the field and the armature [1]. In compound motors, a field winding is connected in series with the armature together with a separately excited shunt field; the series field can cause difficulties in variable speed drive applications [1]. In a separately excited motor, the voltage source for the armature circuit is separate from the field excitation circuit, that is, the armature circuit and the field circuit have different supply sources. Although DC motors are self-starting, one major drawback is that a DC motor draws a high current during starting, which can damage the armature windings. The high initial armature current is due to the absence of back emf at starting [2]. In view of this, the high starting armature current needs to be limited using a starter.
1.1 Review of Related Work
Soft starting the DC motor prevents high inrush current and also prevents the armature windings of the motor from burning. Researchers have implemented various starting methods to curb the high inrush current into the armature circuit when the motor is started. The resistor starting method and the solid-state starting method [3] are the two major methods implemented in the starting of DC motors. The methods used by some researchers, together with their outcomes, merits, and demerits, are discussed below. Resistors reduce the high inrush current into the armature windings by being connected in series with them.
There are different types of resistor starting methods depending on the type of DC motor [4], including the 2-point, 3-point, and 4-point starters [4]. A 2-point starter is connected in series with the armature circuit to protect the motor from high inrush current during starting [4]. The two-point starter is used only for series motors and has a no-load release coil [2], which stops the motor when it is started without a load. Since the two-point starter is limited to series motors, a three-point starter is used in shunt or compound wound DC motors [4]. The three-point starter is preferred for armature speed control of DC motors; it cannot be used where a wide range of speed control by shunt field control is required [3]. The four-point starter is preferable for field speed control of DC motors; a drawback of this starter is that it fails when there is an open circuit in the field windings. A common application of rectifier circuits is to convert an AC voltage input to a DC voltage output [5]. There are two main types of rectifiers: half-wave and full-wave. A half-wave rectifier conducts during only one half-cycle of the input AC voltage, which introduces harmonics in the output current; such harmonics are undesirable for a DC load and lead to increased power losses, poor power factor, and interference [6]. In [3], three conventional means of controlling and monitoring the armature current level when starting a DC motor are presented: a gradually decreasing tapped resistance between the supply voltage and the motor armature circuit, a chopper circuit between the supply voltage and the armature circuit, and a variable DC voltage source. In the first method, a resistance is connected in series with the armature circuit and is slowly removed as the motor builds up speed and generates its own emf; steady-state analysis was used to calculate the time of movement from one tap to the next [7, 8]. From the results obtained, the peak armature current did not exceed twice its rated value. Secondly, a step-up converter was used to control the armature current, with a hysteresis controller biasing the controlled switch of the chopper circuit [9]. Finally, a variable DC voltage source was used to control the armature current indirectly: the source voltage was minimal at start-up, and a controlled full-wave rectifier increased it gradually as the back emf built up bit by bit. The peak armature current had minimal ripple and did not exceed its rated value. In conclusion, the variable DC voltage source method controlled the peak armature current best, with minimal current ripple in the armature windings [8]. The significance of adding a starter circuit to different kinds of DC motors is explained in [10], where a soft starting technique is introduced as a means of controlling how the motor is started. The paper surveys the different types of starters used in industry and also explains why motors burn out when starters are not used.
When a motor is not in motion, no back emf is generated in the armature, so a large current would be drawn through the relatively small armature resistance if full voltage were applied across the stationary armature [10]. This large current is capable of damaging the windings, brushes, and commutator; to mitigate this, a resistance is connected in series with the armature winding during the starting period only [11].
When the motor reaches an appreciable speed and generates an emf that can regulate the motor speed, the resistance is gradually cut out. In conclusion, DC motors need an external starter for starting; once the motor has gained sufficient speed, the starter can be cut off. A starter essentially ramps the supply from zero voltage up to the rated voltage, reducing the inrush current and keeping the starting current at a safe level until the rated speed and torque of the motor are achieved. Two methods of starting a PV-fed DC motor, a resistor start and hysteresis control of the armature current, are presented in [2]. The hysteresis control method adds a chopper circuit with a hysteresis controller to the armature circuit to restrict the high armature current, keeping the armature current between two preset thresholds by switching a MOSFET between the set values [2]. A comparative study between the resistor starter and the chopper circuit with hysteresis controller was conducted; according to that paper's simulation results, the chopper circuit with hysteresis controller reduces the initial armature current and avoids the energy wastage present in the conventional resistor start method, while the resistor starting method presents a shorter settling time. The starting of DC motors using an SCR module is presented in [12]. The SCR module consists of a bridge rectifier designed with two thyristors [12], two diodes, and a firing circuit [13], fed from a single-phase supply. The rectifier gives a variable DC output voltage that is fed to the armature terminals. A laboratory test of a 220 V, 5 hp DC shunt motor indicated a starting current of approximately 12 A at no load; using the SCR module, the starting current was reduced to 2 A.
1.2 Study Aims
The purpose of this project is to design a soft starting circuit that limits the high current entering the armature circuit of a DC motor during starting, in order to protect the machine from damage. The objectives of this project are to:
(1) Review existing literature on starting methods of DC motors.
(2) Model a DC motor soft starter for both DC and AC sources in Proteus ISIS software.
(3) Study the performance of the DC motor soft starter and give recommendations.
(4) Compare the direct-online starting method and the soft start.
2 System Design and Production
The soft starter circuit comprises a microcontroller, the driver circuit, a relay, and a rectifier. Two power sources are used, AC and DC; the AC power is converted to DC by the rectifier circuit. The rectified AC source is prioritized as the main supply to the DC motor, with the DC source serving as back-up to the AC source.
The relay in the circuit switches between the rectified AC source and the DC source. The Arduino Atmega328p-PU microcontroller is the controller used in this proposed system; it generates and sends pulses that control the switching action of the MOSFETs in the driver circuit. The pulses generated by the microcontroller determine the duty cycle of the chopper circuit in the driver, which in turn determines the chopper output. The driver circuit receives signals from the microcontroller and feeds DC power to the DC motor in varying amounts, thereby indirectly controlling the motor speed and torque. A real-time clock (RTC) records the date and starting time of the motor, while the display shows the recorded time. The soft starter controls the supply voltage to the DC motor to protect the motor windings from burning and damage. The system is executed through modeling and simulation using Proteus ISIS software. The block diagram of the soft starter system is shown in Fig. 1 [14–18].
Fig. 1. Block diagram of the soft starter system.
2.1 AC Power Supply
The circuit of the AC power supply is represented in Fig. 2. The AC power supply consists of a transformer, an uncontrolled single-phase bridge rectifier, a voltage regulator, and filtering capacitors. The transformer steps the supply AC voltage down from 240 V to 12 V RMS. An AC-DC single-phase full-wave uncontrolled bridge rectifier converts the alternating voltage to a direct voltage to be applied to the DC motor. The bridge consists of four diodes that pass the positive half of the wave and invert the negative half of the sine wave to create a pulsating DC output. Two filtering capacitors, C1 and C2, of 100 µF and 470 µF respectively, smooth the pulsating DC output. The voltage regulator provides a steady 12 V supply to the motor driver circuit.
Fig. 2. Schematic diagram of the AC power supply circuit.
2.2 DC Power Circuit
Figure 3 shows the DC power source circuit, which mainly consists of a DC power source, a switch, and a diode. The DC power source provides a steady 12 V supply to the motor. The diode enforces unidirectional current flow, preventing reverse flow of power to the source.
Fig. 3. Direct current power source.
2.3 Liquid Crystal Display (LCD)
The liquid crystal display shows the real time at which the motor starts, recorded by the real time clock (RTC), the PWM level of the controller, and the PWM level expressed as a percentage. A 16 × 2 LCD displays 16 characters per line on two lines. A variable resistor is connected to the LCD to adjust the contrast of the screen. The diagram of the liquid crystal display is shown in Fig. 4.
Fig. 4. Liquid Crystal Display (LCD).
2.4 Microcontroller
Figure 5 shows the Arduino Atmega328p-PU microcontroller unit in the circuit, a self-contained system with a processor and other peripherals. It has an 8-bit processor core that executes the code written onto it and operates between 1.8–5.5 V. It is the intelligent unit of the system and is programmed to take inputs from the device it controls and to retain control by sending signals to different parts of the device. The controller has six PWM output channels, which drive the power electronic switches of the starter.
Fig. 5. Arduino Atmega328p-PU Microcontroller.
2.5 Real Time Clock (RTC)
The DS1302 real time clock used in the system communicates with the microcontroller over the I2C interface using just two pins, and its operating voltage is between 3.3–5 V. It keeps track of the current time, so the date and time do not have to be set every time the motor is started. The RTC has a crystal oscillator, uses little power, and is very cheap. Figure 6 shows a diagram of the real time clock.
Fig. 6. Real Time Clock (RTC).
2.6 Motor Driver
The circuit in the motor driver links the microcontroller to the DC series motor. The motor driver circuit in this project has two main functions: it drives the motor in either the forward or backward direction, and it controls the voltage supplied to the DC motor through a converter. The circuit is made of an H-bridge converter with four MOSFETs whose gates are triggered by signals from the microcontroller. The motor driver receives signals from the microcontroller and transmits the corresponding signal to the motor to control its running speed and direction [9]. The relationship between the input and output of the rectifier is given by Eqs. (1) and (2):

VDC = 0.9 VRMS (1)

VDC = 2Vmax/π (2)
where VDC is the output voltage, VRMS is the RMS input voltage, and Vmax is the peak value of a half cycle. The switching of the MOSFETs gives a variable converter output voltage to the DC series motor. Figure 7 shows the motor driver used in this system.
Fig. 7. Motor driver.
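As a quick numerical check of Eqs. (1) and (2) for the 12 V RMS transformer secondary described in Sect. 2.1 (an illustrative calculation that ignores diode drops):

```python
import math

V_RMS = 12.0                  # transformer secondary voltage (RMS)
V_max = math.sqrt(2) * V_RMS  # peak value of a half cycle, ~16.97 V
print(0.9 * V_RMS)            # Eq. (1): ~10.8 V
print(2 * V_max / math.pi)    # Eq. (2): ~10.8 V, consistent with Eq. (1)
```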
2.7 Relay
A relay is an electromechanical switching device that allows or interrupts the flow of current in a circuit by opening and closing its contacts. NO stands for normally open, meaning that while the relay is not energized the contacts are open and no current flows; NC (normally closed) is the opposite. The states change when current is applied to the coil. In this project, a relay is therefore used to select between the two power supply sources for the motor. The AC source is set as primary and is connected to the normally open (NO) contact; when the rectified AC current passes through the coil of the relay, the coil is energized and changes the state of the contacts. Figure 8 shows a diagram of the relay.
Fig. 8. Relay.
3 Experimental Results and Discussion
This section discusses the starting and running behavior of the soft starter system for the DC series motor compared with direct online starting, based on modeling the system in Proteus ISIS Software. Figure 9 shows the wiring diagram of the whole soft starter operation in this study.
Fig. 9. Wiring diagram of the soft starter system.
With direct online starting in simulation, the startup current increases sharply to about 0.996 A at a maximum speed of 498 rpm at full voltage; after the motor attained constant speed, a steady-state current of 0.4978 A was recorded. This result shows how high an inrush current the motor receives before the current decays with time to its steady state. With the soft starter, by contrast, the startup current the DC motor receives is smaller and grows gradually with time: the motor started at 0.78 A from zero voltage, increasing gradually with the duty cycle of the converter, which represents about a 21.3% decrease in starting current during simulation. In the Proteus ISIS simulation of the direct online starting system, the DC motor starts with a very high inrush current that later falls as the motor speed rises to its rated value.
Fig. 10. Current-speed graph of direct-online and soft starter.
The soft starter system, on the other hand, starts the DC motor at a lower current that increases gradually as the motor speed rises to its rated value. The current-speed characteristics of the direct-online starter and the soft starter systems are shown in Fig. 10.
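The qualitative behavior in Fig. 10 can be reproduced with a minimal start-up simulation of a series motor under a ramped duty cycle (a sketch only, not the Proteus model; the values of R, L, K, J, and B are assumed for illustration, with the back emf and torque of the series machine taken proportional to the armature current and its square):

```python
import numpy as np

R, L = 12.0, 0.05           # armature+field resistance (ohm), inductance (H)
K, J, B = 0.05, 1e-4, 1e-5  # machine constant, inertia, viscous friction
V_SUPPLY, DT, T_END = 12.0, 1e-4, 2.0

def peak_start_current(ramp_time):
    """Euler-integrate the motor; ramp_time = 0 means direct-online start."""
    i = w = t = 0.0
    i_peak = 0.0
    while t < T_END:
        duty = 1.0 if ramp_time == 0 else min(1.0, t / ramp_time)
        v = duty * V_SUPPLY               # average chopper output voltage
        di = (v - R * i - K * i * w) / L  # series motor: back emf ~ K*i*w
        dw = (K * i * i - B * w) / J      # torque ~ K*i^2
        i, w, t = i + di * DT, w + dw * DT, t + DT
        i_peak = max(i_peak, abs(i))
    return i_peak

print("direct-online peak current (A):", round(peak_start_current(0.0), 3))
print("soft start, 1 s ramp, peak (A):", round(peak_start_current(1.0), 3))
```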
4 Conclusion and Recommendation
The soft starter system was designed to reduce the starting voltage of the DC motor, which in turn also limits the current at the start of the motor. The system comprises an Arduino Atmega328p-PU microcontroller that sends PWM signals to the driver circuit, and the driver circuit drives the motor according to the signals received. The soft starter circuit was designed and simulated in Proteus ISIS Software, and the results attained were compared with the direct-online starter system. The direct-online starter system recorded a starting current of 0.942 A against 0.716 A for the soft starter system, and a current of 0.449 A was recorded at maximum speed. From the results attained, it can be concluded that, compared with the direct-online starting method, the soft starter system ramps the supply from zero voltage to the rated voltage, reduces the inrush current, and keeps the starting current at a safe level until the rated speed and torque of the motor are achieved. Smooth stopping of the motor remained a challenge; an improved method of stopping the motor can be employed in the future to make the project more efficient.
References 1. Shrivastava, K., et al.: A review on types of DC motors and the necessity of starter for its speed regulation. Int. J. Adv. Res. (2016) 2. Sudhakar, T., et al.: Soft start methods of DC motor fed by a PV source. Int. J. Appl. Eng. Res. 10(66) (2015). ISSN 0973-4562 3. Taleb, M.: A paper on Matlab/Simulink models for typical soft starting means for a DC motor. Int. J. Electr. Comput. Sci. IJECS-IJENS 11(2) (2011) 4. Koya, S.A.: DC Motor Control: Starting of a DC Motor 5. Theraja, B., Theraja, A.: A Textbook of Electrical Technology Volume I Basic Electrical Engineering in S.I. System of Units. S. Chand & Company, New Delhi (2005) 6. Rashid, M.H.: Power Electronics Handbook. Academic Press, San Diego (2001) 7. Sen, P.C.: Principles of Electric Machines and Power Electronics. Wiley, New York (1989) 8. Matlab Software Version 6.5, The Math Works Inc. (2002) 9. Chen, H.: Value Modeling of Hysteresis Current Control in Power Electronics (2015) 10. Kumar, R.: 3- Coil starter use for starting D.C. motor. Int. J. Sci. Res. Eng. Technol. (IJSRET), March (2015) 11. Theraja, B., Theraja, A.: Electrical Technology, AC & DC Machines Volume II. S. Chand & Company, New Delhi (2005) 12. Holonyak, N.: The silicon P-N-P-N switch and controlled rectifier (thyristor). IEEE Trans. Power Electron. 16(1), 8–16 (2001)
13. In: IEEE 7th Power Engineering and Optimization Conference (PEOCO), 22 July (2013)
14. Atmel 8-bit AVR Microcontrollers ATmega328P datasheet summary
15. USB Radio Clock, Meinberg. Accessed 20 Oct 2017
16. Chang, K.C., Chu, K.C., Wang, H.C., Lin, Y.C., Pan, J.S.: Agent-based middleware framework using distributed CPS for improving resource utilization in smart city. Future Gener. Comput. Syst. 108, 445–453 (2020). https://doi.org/10.1016/j.future.2020.03.006
17. Chang, K.C., Chu, K.C., Wang, H.C., Lin, Y.C., Pan, J.S.: Energy saving technology of 5G base station based on internet of things collaborative control. IEEE Access 8, 32935–32946 (2020)
18. Chu, K.C., Horng, D.J., Chang, K.C.: Numerical optimization of the energy consumption for wireless sensor networks based on an improved ant colony algorithm. IEEE Access 7, 105562–105571 (2019)
Co-design in Bird Scaring Drone Systems: Potentials and Challenges in Agriculture
Moammar Dayoub1(B), Rhoda J. Birech2, Mohammad-Hashem Haghbayan1, Simon Angombe2, and Erkki Sutinen1
1 University of Turku (UTU), 20500 Turku, Finland [email protected]
2 University of Namibia, Windhoek, Namibia
Abstract. In Namibia, agricultural production depends on small-scale farms, and one of the main threats to these farms is attack by birds. Various traditional methods have been used to control these pest birds, such as chemicals, fires, traps, hiring people to scare the birds, and different agricultural modifications. The main problem with such methods is that they are expensive, many are harmful to the environment, or they demand extra human resources. In this paper, we investigate the potential and challenges of using a swarm of drones as an intelligence, surveillance, and reconnaissance (ISR) system for scaring a specific type of bird, the weaver bird Quelea quelea, away from pearl millet crops. The idea is a co-design methodology for the swarm control system, involving technology developers and end-users. To scare the birds away from the field, the drone produces a predator-like distress sound, which is extremely threatening and terrifying for most bird species. The empirical results show that the aforementioned technology has great potential to increase food security and sustainability in Africa.
Keywords: Swarm of drones · Bird control · Precision agriculture · Quelea quelea · Pearl millet
1 Introduction
The Red-billed Quelea (Quelea quelea) bird is the most important pest in agriculture affecting cereal crops in Africa. The damage caused to small grain crops in the semi-arid zones of Africa was estimated at 79.4 million US$ per annum in the year 2011 [1]. At the same time, birds are an important component of agro-ecosystems. The birds feed on rodents, insects, and other arthropods, hence balancing the ecosystem [2]. They depend on agriculture for food in the form of grains, seeds, and fruits.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 598–607, 2021. https://doi.org/10.1007/978-3-030-58669-0_54
Various techniques have been used to control the Quelea, including scaring them away, trapping and catching birds for food, harvesting eggs, disturbing or destroying the nests, burning the birds while roosting, and poisoning with organophosphate avicides. All these methods disrupt the birds, which finally die or vacate the area and migrate elsewhere, resulting in an imbalanced ecosystem and a threat to biodiversity [3]. Killing the birds is not a proper solution to the bird problem, as attempted mass killings of Sturnus vulgaris in Europe and Quelea birds in Africa have been disapproved by international conventions [2]. Most small-holder farmers in Africa have no access to costly and sophisticated equipment and materials to control birds, such as aircraft, boom-sprayers, chemical avicides, firebombs, and dynamite, and have instead relied on traditional methods, which are largely effective and environmentally friendly but time-consuming, tedious, and limited in scale of application [4,5]. The unmanned aerial system (UAS) is a technological innovation with great potential to improve the management efficiency of agriculture and natural resource systems such as crops, livestock, fisheries, and forests [6]. The most important factors limiting pearl millet production in Namibia are an unfavorable climate and erratic weather patterns, the widespread use of traditional farming practices, limited farm sizes, and Red-billed quelea birds (Quelea quelea lathamii) [7]. The quelea is a small weaver bird native and endemic to sub-Saharan Africa whose main food is the seed of annual grasses; it attacks small-grained cereal food crops in the absence of such grasses [8]. The cost of protecting crops from the Quelea is very high for a country like Namibia. We are looking for safe, cheap, and effective ways to protect crop yields for farmers, ensure food security, and raise the level of livelihood in these areas.
2 The Main Challenges of Agriculture in Namibia
Namibia is the driest country in Sub-Saharan Africa. It is classified as an arid and semi-arid country, with 55% of the land being arid, receiving less than 300 mm of rain per year, and 40% semi-arid, receiving between 300–500 mm per year. The main crops grown in Namibia include pearl millet, maize, sorghum, wheat, grapes, and date palm. Pearl millet, locally known as Mahangu, is the most widely grown crop (by land area) among smallholder communal farmers under rainfed conditions [9]. In communal farming areas, in particular the drier North Central Namibia (Ohangwena, Oshikoto, Oshana, Omusati; 300 mm–450 mm of rainfall), pearl millet is grown almost exclusively [10]. Pearl millet is the most important food security cereal crop in Northern Namibia, occupying 80% of cropped land, followed by maize and then sorghum. Sorghum occupies almost the same land area as maize and is planted almost exclusively by communal farmers. Pearl millet accounts for approximately 40% of cereal grain intake and 24% of total calorie intake by Namibian consumers [11].
It is therefore the principal source of food security for the majority of the country's rural population and forms a crucial part of the national diet [12]. Local cultivars of pearl millet are predominantly planted due to their hard kernels, which store well, and their strong stalks, which do not lodge easily and have good food value. Pearl millet matures in 120–150 days, giving yields that are the lowest among the cultivated cereals (250–300 kg/ha) [13]. Pearl millet grain is nutritionally better than other cereals such as maize, rice, and sorghum due to its high protein content [14]. It is also a valuable animal feed, with higher protein content and a better-balanced amino acid profile than most cereals, so that less protein concentrate is required in a pearl millet-based animal feed ration [15].
3 Economic Losses Caused by Queleas
The Quelea is considered the greatest biological limitation to cereal production in Africa and has been called the most numerous and destructive bird on earth [16]. The estimated adult population of Quelea in Africa is at least 1.5 billion, causing agricultural losses in excess of 50 million US$ annually according to FAO. A colony of Quelea in Namibia was estimated to number 4.8 million adults and 4.8 million fledglings, and to consume approximately 13 tonnes of insects and 800–1200 tonnes of grass and cereal seeds during its breeding cycle [17,18]. One Quelea bird consumes grain equal to about half its body weight (10 g) per day, meaning that 10 tonnes of crops can be consumed daily by a swarm of one million birds [18,19]. Crop losses can range from 50 to 100% depending on the extent and duration of the invasion; [20] estimates that 2 million Quelea birds can destroy 50 tonnes of rice crop in a day, valued at 600,000 USD. Despite losses on this phenomenal scale, little research is currently being conducted on the Quelea. The most damage is inflicted if the birds attack a crop when its seeds are at the milk stage. In cereals, the milk stage is the period when a milky white substance begins to accumulate in the developing grains. The bird is also destructive at the dough stage and less destructive at the grain maturity and harvesting stages. In Northern Namibia, the period from the milk stage to the harvesting stage lasts 2–3 months, and this long duration of protracted control is a huge cost to the farmer.
4 Control of Queleas
4.1 Modelling and Early Warning
The dependence of Quelea breeding on rainfall allows invasions to be predicted from rainfall occurrence. Models of invasion occurrence at a location have been built on these rainfall patterns [21].
4.2 Application of Chemical Avicide
Aerial application of fenthion, an organophosphate pesticide, has been used extensively against birds in crop farms. It is highly poisonous to birds and slightly poisonous to mammals. The chemical is applied at night in the nesting area [22].
4.3 Fire-Bombs
Another way of controlling Quelea birds is blowing them up with firebombs or dynamite as they concentrate to roost. Explosives made of diesel and petrol mixtures are set and detonated to create fires that kill the birds and their breeding colonies [23].
4.4 Traditional Methods
Bird scaring is done using scarecrows and by chasing and shouting at the birds to scare them away. This practice is mostly done by children throwing stones, clapping hands, and beating metal objects [24]. However, the birds soon become habituated to the scarecrows and noise and ignore them. The birds get into the fields as early as possible, and by the time farmers wake up, the birds are already in the fields feeding. Sorghum farmers in Namibia and Botswana guard sorghum and millet fields daily from early morning until evening during the months of March to May. They ignite leaves and throw stones to drive the birds away. All other household duties are temporarily put aside and delayed during that period. Farmers need to guard the fields, otherwise the yield will be completely lost [25].
4.5 Killing the Birds and Use as Food
The Quelea bird is eaten by people in many parts of Africa [1,26]. Both the adult birds and the chicks can be harvested and used for human consumption at subsistence level or sold in the market. The harvesting of birds and chicks is done communally every night until the surviving birds flee from the area [25]. Various means of trapping Quelea birds have been used in many parts of Africa [22]; one example is spreading a fishing net near the roost to trap the swarming birds returning to roost. Trapped birds are collected and used as food.
4.6 Changing Agricultural Practices
Planting as early as possible enables the crop to pass through the most vulnerable stage, the grain-milk stage, before the birds arrive. Other cultural practices include disturbing the nests and eggs, which causes the colony to vacate the nests and move to another place [1]. In Zimbabwe, the threat of Quelea birds to small grain crops over the years has led farmers to change from small cereals to maize. The maize may be attacked by wild pigs, monkeys, and baboons, but controlling those animals by scaring is easier than controlling Quelea birds [27]. Unfortunately, in Namibia the crop alternatives are limited, and farmers are left with the option of pearl millet, which is adapted to the harsh climate.
5 Using the Drones for Pest Bird Control
Drones, as an ISR system, serve farmers by protecting the crops from pests as well as supporting the timely, efficient, and optimized use of inputs such as soil amendments, fertilizers, seeds, and water. This process leads to an increase in yields and a reduction in the overall cost of farm operations.
Nowadays, the Unmanned Aerial Vehicle (UAV) is a low-cost option in sensing technology and data analysis systems [28]. The farmer can use a UAV to scare birds away from orchards or crops to avoid yield losses. The UAV carries a loudspeaker broadcasting distress signals, and the drone can be designed to imitate a huge predator bird. Research showed that a UAV can deter pest birds within a 50 m radius centered on the UAV, confirming that one UAV is capable of protecting a farm smaller than 25 ha; this implies that a swarm of UAVs can be used to protect larger farms [18]. The nuisance and trouble caused by the Quelea bird are best understood by the farmer, so it is important to apply the co-design approach in the development of a robust UAV solution. Co-design is a technology production method that involves the end-users in the design of the technology, in this case the drones. This approach engages farmers and other stakeholders in the technology development process. Stakeholder contribution and insight are valuable in guiding the design process and bring about successful outcomes and sustainability [29,30]; see Fig. 1.
Fig. 1. System diagram of the proposed swarm drone bird scaring system: drones with camera and sound units, a ground camera, and a ground-based bird detection software system.
5.1 Methodology: Design of Swarm Drones
The system comprises a sensor on the ground, which monitors the space and determines when to fly the drones, thus protecting the crop from incoming birds. A drone has a multi-dimensional appearance with visible movement; it is equipped with a convincing sound device and imitates the flight and sound of a natural predator, which scares the pest birds.
This is an effective way to protect a large area from quelea birds without inflicting any harm on them. We use alarm sounds released by the drone to repel any bird approaching it. In this study, we use different alarms to evaluate the effectiveness of broadcast distress calls on the birds and to measure the effectiveness of control. The drone will be deployed to protect agricultural fields from birds and will be used during some agricultural practices. The main benefit of drones is their ability to reduce the damage to the farm as well as to the birds. The study will also evaluate the response of quelea birds to drones flying at different altitudes, assessing the effectiveness of alerts when drones fly within 30 m above ground level (AGL) and at lower altitudes.
5.2 Technical Setup and Analysis Results
In this section, we show the technical setup used to control and steer the swarm, together with some results on the amount of energy consumed by each drone. The weight of each drone is around 8 kg, including the sound producer and other peripherals. The design covers two parts: the control part and the mission part. The control part comprises all the processes that control the drones and provide collision avoidance within the swarm, while the mission part performs the mission itself, i.e., scaring the birds. For the control system, we use the non-linear technique from [31] for take-off and from [32] for landing the drones. The main strategy is to run near-optimal algorithms on each drone to save battery energy as much as possible and to suppress drone vibration so that object (bird) detection and tracking are done appropriately. To guarantee suitable collision avoidance, we use the technique explained in [33]. To cover the whole field, a simultaneous localization and mapping (SLAM) technique is applied [34]. GPS is used to perform SLAM, since the drone operates in open areas: a GPS module is mounted on each drone, and the location of each drone is reported via GPS. SLAM can also be performed with techniques other than GPS, such as visual odometry [34], for semi-closed areas (e.g., under trees) and special weather conditions; this is future work for this paper. For the mission part, object detection and tracking are performed by a convolutional neural network (CNN); the main technique is explained in [35]. The field is taken to be 4 hectares in a square shape, i.e., 4 × 10^4 m^2. Three drones are used as a swarm to cover the area; each drone keeps a 10 m distance from the other drones in a line formation, and the central drone is the leader [31,36]. For an 8 kg drone, the energy to climb to 30 m, neglecting air friction, is 8 × 10 × 30 = 2400 J. If air friction is negligible, the elapsed time for one total sweep of the field is 4 × 10^4/(40v), where v is the speed of the drone and 40 m is the effective width covered per sweep line. In our experiment, the speed of each drone is 50 km/h, i.e., 13.8 m/s. The overall energy consumption of the swarm is 3 × 4 × 10^4/(40 × 13.8) × 2400 J ≈ 1 MJ. This shows that our drones, each with two standard 12 V, 2200 mAh batteries and one recharge break, can sweep the whole field.
Note that we consider air friction negligible and exclude the energy of drone acceleration and deceleration.
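A back-of-envelope check of this budget, using only the figures stated above and neglecting air friction as in the text (an illustrative sketch, not flight-test data):

```python
# Energy budget for the three-drone sweep, using the figures quoted above.
AREA = 4e4                   # field area (m^2): 4 ha, square
SWEEP_WIDTH = 40.0           # effective width covered per sweep line (m)
SPEED = 13.8                 # drone speed (m/s), i.e. 50 km/h
N_DRONES = 3
CLIMB_J = 8.0 * 10.0 * 30.0  # 2400 J to lift an 8 kg drone to 30 m

sweep_time = AREA / (SWEEP_WIDTH * SPEED)  # ~72 s for one full sweep
battery_j = 2 * 12.0 * 2.2 * 3600.0        # two 12 V, 2200 mAh packs (J)
print(f"sweep time per drone:    {sweep_time:.0f} s")
print(f"climb energy per drone:  {CLIMB_J:.0f} J")
print(f"stored energy per drone: {battery_j / 1e3:.0f} kJ")
print(f"swarm energy per charge: {N_DRONES * battery_j / 1e6:.2f} MJ")
# roughly 0.57 MJ per charge, so with one recharge break the swarm has
# on the order of 1 MJ available, consistent with the estimate above
```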
5.3 Co-designing the Drones
In order to ensure a functional design of the drone-based solution to the bird problem, we will follow the co-design approach. Co-design in technology development involves the engagement of stakeholders, particularly during the design process. It enables early-stage alterations and guarantees delivery of a usable product while ensuring that the technology matches the needs of the end-users. By using co-design, we can link the design with the needs of the community in Namibia, and the end-user is credited with creating a home-grown solution for crop protection against a notorious bird pest [37]. Co-design involves a set of users representing diverse stakeholders throughout the design process, from ideation to the final product. The co-design team will consist of the following stakeholders: farmers, ICT specialists, agronomists, extension staff, ecologists, local authorities, etc. For the technical design of the drones, we will make use of a team of MSc students in software engineering and ICT, supervised by our support team at the University of Turku in Finland. Aspects of sustainability will be catered for from the onset of the project: during the stakeholder interface, pertinent questions on economy, environment, energy, culture, and ethics are addressed and factored into the drone design.
Economy and business: How can the design and running costs be covered? How can the technology remain low-cost and still be profitable to users?
Environment: How can environmentally friendly materials best be used, and how may they be recycled?
Energy: Can the gadget be charged by solar power?
Culture: What cultural considerations should be integrated, from the perspective of subsistence farmers, women, and youth?
Ethics: What are the general principles of ownership, use, and Intellectual Property Rights (IPRs)?
5.4 Comparison of the Cost of Different Quelea Bird Control Methods
Chemical control: the rate of application of fenthion (in the form of Queletox®) on millet is on average 2 kg/ha (1.5–2.4 kg/ha) [6]. At a price of 105 $ per kg of fenthion, it costs 210 $ to purchase the 2 kg of chemical needed to control Quelea on one hectare. Bird scaring: the other option is to hire people to scare the birds (3 persons × 60 days × 5 $ per day), which costs 900 $ per hectare. Our drone costs 400 $ per piece and can cover at least 2 ha. A swarm of drones (3 drones, for instance) can cover a larger area and serve the farm for several years. The cost of buying the drone is recovered in the first year, and the technology should work for at least 5 years. Besides that, the environment is preserved when the birds are scared away instead of being killed, no risk is posed to the food chain, and the technology uses less energy than other traditional methods.
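The comparison reduces to a few lines of arithmetic (figures as quoted above; the five-year service life is the estimate from the text):

```python
YEARS = 5                     # assumed drone service life, from the text
chemical = 2 * 105            # 2 kg/ha of fenthion at 105 $/kg -> 210 $/ha
scaring = 3 * 60 * 5          # 3 persons x 60 days x 5 $/day   -> 900 $
drone_year1 = 400 / 2         # one 400 $ drone covers ~2 ha    -> 200 $/ha
drone_amortized = 400 / (2 * YEARS)  # ~40 $/ha per year over 5 years
print(chemical, scaring, drone_year1, drone_amortized)
```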
6 Conclusion, Regulations, and Future Work
To reduce the damage of crops by pests, appropriate management practices are required and smart solutions should be deployed on the farm. The amount of damage will vary depending on the extent to which smart solutions are applied in the field. A drone has obvious design and sound advantages, combined with its ability to fly quickly and randomly across an area, which means it is quickly recognized by birds as a threat. We will exploit sound in this smart bird-control technology, so that when the birds hear the various distress signals they feel threatened and escape from the area. The digital audio circuit will be designed with a specific bandwidth and sufficient memory to generate alarms repeatedly. The frequency of broadcasts will vary depending on altitude and time of day [38].
There are various challenges affecting the implementation of UAVs in agricultural systems. Regarding the regulatory requirements, all drones flying in Namibia require certification and permissions. A UAV must be kept below a height of 120 m from the ground and below the top of any man-made object, e.g., a building or tower. The drone will not fly at night, when the birds are inactive; during the day, the drone can recognize the birds clearly. A drone is not allowed to fly near airports or other restricted airspace. Using drones requires the permission of landowners. The drone should be kept within the line of sight of the operator, which means it cannot be flown behind obstacles or through cloud or fog [39]. Quality software plays a vital role in the applicability of drone technology [6].
In the future, the map of the area can be loaded into the drone controller, which is then set up with boundary areas and preset points to create flight paths for the drone; a sketch of such a sweep-path generator is shown below. The best flight paths to use and the height to fly the drone depend on the bird population, where the birds are living, and the time of day the drone is operated. Various flight routes can also be set up and used at different times. The drone has long flight times, allowing it to stay in the air for half an hour to one hour (depending on the model). When the drone returns from its flight, all that is required is to replace the battery before sending the drone out to fly again. In an agricultural rural setting, the stakeholder knowledge fed into the technology during the co-design process produces an ultimate ICT-based 'African drone' embedded with intelligent systems, becoming a bird-control best practice and a means of increasing production, preserving resources, improving income, and achieving food security via the digitization of African agriculture.
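As referenced above, a hypothetical illustration of generating such a flight path, assuming a rectangular field and the 40 m sweep-line width used in Sect. 5.2; the function name and waypoint format are ours, not the controller's actual interface.

```python
def lawnmower_waypoints(width_m, height_m, lane_m=40.0):
    """Generate a back-and-forth (lawnmower) sweep over a rectangular field.

    lane_m matches the 40 m sweep line assumed earlier; waypoints are
    (x, y) tuples in meters, alternating the sweep direction per lane.
    """
    waypoints, x, downward = [], lane_m / 2, False
    while x < width_m:
        ys = (height_m, 0.0) if downward else (0.0, height_m)
        waypoints += [(x, ys[0]), (x, ys[1])]
        downward = not downward
        x += lane_m
    return waypoints

# Example: a 200 m x 200 m (4 ha) field -> 5 sweep lanes, 10 waypoints.
print(lawnmower_waypoints(200.0, 200.0))
```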
References

1. Jaeger, M., Elliott, C.C.: Quelea as a Resource, pp. 327–338. Oxford University Press, Oxford (1989)
2. Dhindsa, M.S., Saini, H.K.: Agricultural ornithology: an Indian perspective. J. Biosci. 19, 391–402 (1994)
3. Kale, M., Balfors, B., Mörtberg, U., Bhattacharya, P., Chakane, S.: Damage to agricultural yield due to farmland birds, present repelling techniques and its impacts: an insight from the Indian perspective. J. Agric. Technol. 1, 49–62 (2012)
4. Elliot, C.: The New Humanitarian: Quelea, Africa's most hated bird (2009). https://www.thenewhumanitarian.org/news/2009/08/19
5. Anderson, A., Lindell, C.A., Moxcey, K.M., Siemer, W.F., Linz, G.M., Curtis, P.D., Carroll, J.E., Burrows, C.L., Boulanger, J.R., Steensma, K.M., Shwiff, S.A.: Bird damage to select fruit crops: the cost of damage and the benefits of control in five states. Crop Prot. 52, 103–109 (2013)
6. Sylvester, G.: E-agriculture in Action: Drones for Agriculture, p. 112. FAO (2018)
7. FAO and WFP: Crop, Livestock and Food Security Assessment Mission to Namibia (2009). http://www.fao.org/3/ak334e/ak334e00.htm
8. Craig, A.F.J.K.: Quelea quelea. In: Birds of Africa, vol. 7 (2004)
9. Matanyaire, C.M.: Pearl Millet Production System(s) in the Communal Areas of Northern Namibia: Priority Research Foci Arising from a Diagnostic Study. Technical report (1996)
10. NAMIBIA: Country Report on the State of Plant Genetic Resources for Food and Agriculture, Namibia. Technical report (2008)
11. Rohrbach, D.D., Lechner, W.R., Ipinge, S.A., Monyo, E.S.: Impact from Investments in Crop Breeding: The Case of Okashana 1 in Namibia. International Crops Research Institute for the Semi-Arid Tropics, Technical report (1999)
12. Shifiona, T.K., Dongyang, W., Zhiguan, H.: Analysis of Namibian main grain crops annual production, consumption and trade: maize and pearl millet. J. Agric. Sci. 8(3), 70–77 (2016)
13. Singh, G.: Krishikosh Viewer, Krishikosh (2003). https://krishikosh.egranth.ac.in/displaybitstream?handle=1/5810014632
14. Roden, P., Abraha, N., Debessai, M., Ghebreselassie, M., Beraki, H., Kohler, T.: Farmers' appraisal of pearl millet varieties in Eritrea. SLM Eritrea Report 8, Technical report (2007)
15. Macauley, H.: Background paper, Cereal Crops: Rice, Maize, Millet, Sorghum, Wheat. ICRISAT, Technical report (2015)
16. Lagrange, M.: Innovative approaches in the control of quelea, Quelea quelea lathamii. In: Proceedings of the Thirteenth Vertebrate Pest Conference, p. 63, Zimbabwe (1988)
17. Pimentel, D.: Encyclopedia of Pest Management. CRC Press (2007). www.crcpress.com
18. Wang, Z., Griffin, A.S., Lucas, A., Wong, K.C.: Psychological warfare in vineyard: using drones and bird psychology to control bird damage to wine grapes. Crop Prot. 120, 163–170 (2019)
19. Elliott, C.C.H.: The pest status of the quelea. In: Quelea quelea: Africa's Bird Pest, pp. 17–34. Oxford University Press, Oxford (1989)
20. Oduntan, O.O., Shotuyo, A.L., Akinyemi, A.F., Soaga, J.A.: Human-wildlife conflict: a view on red-billed Quelea quelea. Int. J. Mol. Evol. Biodivers. 5, 1–4 (2015)
21. Cheke, R.A., Veen, J.F., Jones, P.J.: Forecasting suitable breeding conditions for the red-billed quelea Quelea quelea in Southern Africa. J. Appl. Ecol. 44(3), 523–533 (2007)
22. Cheke, R.A., Sidatt, M.E.H.: A review of alternatives to fenthion for quelea bird control, pp. 15–23, February 2019
23. Cheke, R.A.: Crop Protection Programme, Quelea birds in Southern Africa: protocols for environmental assessment of control and models for breeding forecasts, R8314. Natural Resources Institute, University of Greenwich at Medway, Technical report (2003)
24. National Research Council: Lost Crops of Africa. National Academies Press, February 1996
25. CABI: Quelea quelea (weaver bird). https://www.cabi.org/isc/datasheet/66441
26. Mullié, W.C.: Traditional capture of red-billed quelea Quelea quelea in the Lake Chad Basin and its possible role in reducing damage levels in cereals. Ostrich 71(1–2), 15–20 (2000)
27. Mathew, A.: The feasibility of small grains as an adoptive strategy to climate change. Russ. J. Agric. Socio-Econ. Sci. 41(5), 40–55 (2015)
28. Norasma, C.Y.N., Fadzilah, M.A., Roslin, N.A., Zanariah, Z.W.N., Tarmidi, Z., Candra, F.S.: Unmanned aerial vehicle applications in agriculture. In: IOP Conference Series: Materials Science and Engineering, vol. 506, p. 012063 (2019)
29. Wojciechowska, A., Hamidi, F., Lucero, A., Cauchard, J.R.: Chasing lions: co-designing human-drone interaction in Sub-Saharan Africa. CoRR abs/2005.02022 (2020). https://arxiv.org/abs/2005.02022
30. Myllynpää, V., Misaki, E., Apiola, M., Helminen, J., Dayoub, M., Westerlund, T., Sutinen, E.: Towards holistic mobile climate services for farmers in Tambuu, Tanzania. In: Nielsen, P., Kimaro, H.C. (eds.) Information and Communication Technologies for Development. Strengthening Southern-Driven Cooperation as a Catalyst for ICT4D, pp. 508–519. Springer International Publishing, Cham (2019)
31. Tahir, A., Böling, J.M., Haghbayan, M.H., Plosila, J.: Navigation system for landing a swarm of autonomous drones on a movable surface. In: Proceedings of the 34th International ECMS Conference on Modelling and Simulation, ECMS 2020, Wildau, Germany, 9–12 June 2020, pp. 168–174. European Council for Modeling and Simulation (2020)
32. Tahir, A., Böling, J.M., Haghbayan, M.H., Plosila, J.: Comparison of linear and nonlinear methods for distributed control of a hierarchical formation of UAVs. IEEE Access 8, 95667–95680 (2020)
33. Yasin, J.N., Haghbayan, M.H., Heikkonen, J., Tenhunen, H., Plosila, J.: Formation maintenance and collision avoidance in a swarm of drones. In: ISCSIC 2019: 3rd International Symposium on Computer Science and Intelligent Control, Amsterdam, The Netherlands, 25–27 September 2019, pp. 1:1–1:6. ACM (2019)
34. Mohamed, S.A.S., Haghbayan, M.H., Westerlund, T., Heikkonen, J., Tenhunen, H., Plosila, J.: A survey on odometry for autonomous navigation systems. IEEE Access 7, 97466–97486 (2019)
35. Rabah, M., Rohan, A., Haghbayan, M.H., Plosila, J., Kim, S.: Heterogeneous parallelization for object detection and tracking in UAVs. IEEE Access 8, 42784–42793 (2020)
36. Yasin, J.N., Haghbayan, M.H., Heikkonen, J., Tenhunen, H., Plosila, J.: Unmanned aerial vehicles (UAVs): collision avoidance systems and approaches. IEEE Access 8, 105139–105155 (2020)
37. Dayoub, M., Helminen, J., Myllynpää, V., Pope, N., Apiola, M., Westerlund, T., Sutinen, E.: Prospects for climate services for sustainable agriculture in Tanzania. In: Tsounis, N., Vlachvei, A. (eds.) Advances in Time Series Data Methods in Applied Economic Research, pp. 523–532. Springer International Publishing, Cham (2018)
38. Berge, A., Delwiche, M., Gorenzel, W.P., Salmon, T.: Bird control in vineyards using alarm and distress calls. Am. J. Enol. Vitic. 58(1), 135–143 (2007)
39. Maintrac Group: Drones for bird control - how it works (2019). https://www.maintracgroup.com/blogs/news/drones-for-birdcontrol-how-they-work
Proposed Localization Scenario for Autonomous Vehicles in GPS Denied Environment

Hanan H. Hussein1, Mohamed Hanafy Radwan2, and Sherine M. Abd El-Kader1

1 Computers and Systems Department, Electronics Research Institute, Giza, Egypt
{hananhussein,sherine}@eri.sci.eg
2 Valeo Inter-Branch and Automotive Software Corporate, Cairo, Egypt
[email protected]
Abstract. The development of Advanced Driver Assistance Systems (ADAS) has increased significantly in recent decades. One of the important topics related to ADAS is the autonomous parking system. Such a system faces many challenges in obtaining an accurate estimate of the node (vehicle) position in indoor scenarios, especially in the absence of GPS. Therefore, an alternative localization system with precise positioning is mandatory. This paper addresses different indoor localization techniques with their advantages and disadvantages. Furthermore, several localization technologies are studied, such as Ultra-WideBand (UWB), WiFi, Bluetooth, and Radio Frequency Identification Device (RFID). The paper compares these technologies, highlighting their coverage range, localization accuracy, applied techniques, advantages, and disadvantages. The key contribution of this paper is a proposed scenario for an underground garage. Finally, a localization system based on UWB Time Difference of Arrival (TDoA) is suggested, deployed on an IEEE 802.15.4a UWB-PHY standard hardware platform with several Anchor Nodes (ANs), enabling communication and localization with significant accuracy.

Keywords: Indoor localization · Autonomous vehicle · UWB
1 Introduction

In the last couple of years, localization [1] has become one of the most interesting topics in industry, especially in the vehicular technology era. Localization systems have become a key factor for many deployed applications, such as Advanced Driver Assistance Systems (ADAS), Intelligent Transportation Systems (ITS), military applications, automation in precision farming, and the Internet of Things (IoT) [2]. In all these applications, accurate positioning is mandatory. In an autonomous parking system, vehicles drive autonomously to their parking slots inside an underground garage [3]. Such a system needs a special wireless network so that vehicles can communicate with sensors in the garage. In addition, the localization system requires high positioning precision (i.e., a few centimeters). Nevertheless, in GPS-denied environments such as underground parking garages or indoor places, the Global Navigation Satellite System (GNSS) does not support localization.
Thus, different local positioning systems have been proposed recently [4] to solve this problem with high accuracy. In the autonomous parking scenario, the main problems are:
• Navigating a vehicle autonomously inside the underground parking garage.
• The navigation process should achieve localization accuracy in the range of centimeters, with high flexibility, robustness, and resolution.
• All localization estimation and management of the vehicle's movement should be done by the vehicle itself.
• The distribution model of the communication network inside the garage should be optimized in order to cover the whole garage area.
In this paper, we discuss different techniques and technologies that support indoor localization systems. Besides, the main motivation and the proposed solution to support the autonomous parking scenario are presented. The remainder of the paper is organized as follows: Sect. 2 and Sect. 3 present localization techniques and technologies, respectively. A network scenario for the underground parking garage application is suggested in Sect. 4. Finally, the conclusion and future work are given in Sect. 5.
2 Localization Techniques

In this section, several techniques that are widely applied for localization are discussed.

2.1 Received Signal Strength Indicator (RSSI)
The Received Signal Strength Indicator (RSSI) technique is considered the simplest and most commonly applied technique for indoor localization [5]. The RSSI is the received power level measured at the Rx in dBm. The actual distance d between a pair (i.e., Tx and Rx) can be predicted using RSSI, where d is inversely related to the RSSI level:

RSSI = −10 · n · log10(d) + A,   (1)

where n is the path-loss exponent (n equals 2 in free space, while it equals 4 in indoor places) and A is the reference RSSI level, i.e., the RSSI measured at a reference distance from the Tx. Note that RSSI-based localization needs at least three reference points (trilateration), or N points in general. For example, RSSI is measured at the vehicle to estimate d between the vehicle and each trilateration reference point. Finally, simple geometry can be applied to predict the vehicle's location relative to the reference points.
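A minimal sketch of inverting Eq. (1) for the distance; the reference level a_ref = −40 dBm is a placeholder value, and n = 4 follows the indoor figure given above.

```python
def rssi_to_distance(rssi_dbm, a_ref=-40.0, n=4.0):
    """Invert Eq. (1), RSSI = -10*n*log10(d) + A, for the distance d.

    a_ref is the reference RSSI A (a placeholder here) and n = 4 is the
    indoor path-loss exponent used in the text.
    """
    return 10 ** ((a_ref - rssi_dbm) / (10.0 * n))

# Three or more such distances to known reference points allow a
# trilateration fix of the vehicle's position.
print(rssi_to_distance(-70.0))  # ~5.6 m under these assumptions
```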
2.2 Channel State Information (CSI)
Channel State Information (CSI) refers to the known channel characteristics of a wireless link. This information describes how a signal propagates from Tx to Rx and captures the combined effects of scattering, fading, and power decay with distance. CSI usually delivers higher localization accuracy than RSSI, as CSI captures both the channel amplitude response and the channel phase response at different frequencies [6]. However, CSI is a complex technique. It can be formulated in polar form as

H(f) = |H(f)| · e^(jφ(f)),   (2)

where |H(f)| is the channel amplitude response at frequency f, while φ(f) is the channel phase response. Currently, several IEEE 802.11 Network Interface Controller (NIC) cards can offer subcarrier-level channel measurements for OFDM, which can be translated into richer multipath information, more stable measurements, and higher localization accuracy [6].
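A small illustration, assuming the NIC reports CSI as one complex value per OFDM subcarrier, of splitting H(f) into the amplitude and phase responses of Eq. (2):

```python
import numpy as np

def csi_features(h):
    """Split a complex CSI vector H(f) into amplitude |H(f)| and
    unwrapped phase phi(f), one entry per OFDM subcarrier (Eq. 2)."""
    h = np.asarray(h, dtype=complex)
    return np.abs(h), np.unwrap(np.angle(h))
```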
2.3 Fingerprinting Technique
The fingerprinting technique is employed for indoor localization systems. Detecting a location with it typically demands a prior survey of the surrounding environment to acquire that environment's fingerprints [7]. Primarily, RSSI measurements are collected during an offline phase. Once the system is deployed, online measurements are matched against the offline measurements to estimate the vehicle's location. Numerous algorithms have been developed to match the online measurements with the offline ones, such as probabilistic methods, Artificial Neural Networks, K-Nearest Neighbor (KNN), and Support Vector Machines (SVM).
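A minimal sketch of the online matching step using KNN, one of the algorithms listed above; the radio-map arrays are assumed to come from the offline survey.

```python
import numpy as np

def knn_locate(online, fp_vectors, fp_positions, k=3):
    """Match one online RSSI vector against the offline radio map and
    return the centroid of the k nearest fingerprints (k-NN matching).

    fp_vectors: (N, M) offline RSSI vectors; fp_positions: (N, 2) positions.
    """
    dists = np.linalg.norm(np.asarray(fp_vectors) - np.asarray(online), axis=1)
    nearest = np.argsort(dists)[:k]
    return np.asarray(fp_positions)[nearest].mean(axis=0)
```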
2.4 Angle of Arrival (AoA)
The Angle of Arrival (AoA) technique is widely applied at the Rx. It relies on antenna arrays [8]: the differences in arrival time of the signal at the individual elements of the antenna array are used to estimate the incident angle at the Rx, as shown in Fig. 1.

Fig. 1. AoA technique [8]
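A toy 2-D illustration of turning two AoA bearings into a position fix by intersecting the bearing lines; real AoA systems estimate the angles from the inter-element time differences described above, which this sketch takes as given.

```python
import numpy as np

def aoa_fix(p1, theta1, p2, theta2):
    """Estimate a 2-D source position from bearings theta1, theta2
    (radians, measured from the x-axis) observed at receivers p1, p2."""
    d1 = np.array([np.cos(theta1), np.sin(theta1)])
    d2 = np.array([np.cos(theta2), np.sin(theta2)])
    # Solve p1 + a*d1 = p2 + b*d2 for the ray parameters a, b.
    a, _ = np.linalg.solve(np.column_stack([d1, -d2]),
                           np.asarray(p2, float) - np.asarray(p1, float))
    return np.asarray(p1, float) + a * d1
```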
2.5 Time of Flight (ToF)
Time of Arrival (ToA), or Time of Flight (ToF), uses the signal propagation characteristics to determine d [9]: the speed of light (c = 3 × 10^8 m/s) is multiplied by the ToF value to estimate d. Unfortunately, ToF demands synchronization between Tx and Rx. In several situations, timestamps are transmitted with the signal to eliminate the synchronization issue. Let t1 be the time at which data is transmitted from the Tx and t2 the time measured at the Rx on reception, so that t2 = t1 + tp, where tp is the signal propagation time [9]. Hence, the measured distance Dij between Txi and Rxj can be formulated as
Dij = (t2 − t1) · v,   (3)

where v is the propagation speed of the signal (c for radio signals).
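A one-line rendering of Eq. (3), assuming synchronized (or timestamp-corrected) clocks:

```python
C = 3e8  # propagation speed of a radio signal, m/s

def tof_distance(t1, t2, v=C):
    """Eq. (3): Dij = (t2 - t1) * v, valid when the Tx and Rx clocks
    are synchronized or timestamps are exchanged."""
    return (t2 - t1) * v
```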
2.6 Time Difference of Arrival (TDoA)
Time Difference of Arrival (TDoA) is similar to ToF. The difference between the two techniques is that TDoA depends on the difference in signal propagation times from several Tx, and TDoA is usually measured at the Rx [1]. The actual distance Dij is obtained as

Dij = c · TDij,   (4)

where TDij is the measured TDoA and c is the speed of light. Dij can be expressed as

LD(i, j) = √((Xj − x)² + (Yj − y)² + (Zj − z)²) − √((Xi − x)² + (Yi − y)² + (Zi − z)²),   (5)
where (Xi, Yi, Zi) are the coordinates of Tx/reference node i and (x, y, z) are the coordinates of the Rx/user.
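A sketch, not the paper's implementation, of solving Eqs. (4)-(5) numerically: given reference-node coordinates and TDoA measurements relative to a common anchor, a nonlinear least-squares step recovers the user position.

```python
import numpy as np
from scipy.optimize import least_squares

def tdoa_locate(anchors, tdoas, c=3e8, x0=None):
    """Estimate the user position from TDoA measurements (Eqs. 4-5).

    anchors: (N, 3) reference-node coordinates; tdoas[i-1] = TD(i, 0),
    the arrival-time difference of anchor i relative to anchor 0.
    """
    anchors = np.asarray(anchors, float)

    def residuals(p):
        d = np.linalg.norm(anchors - p, axis=1)   # distances to each anchor
        return (d[1:] - d[0]) - c * np.asarray(tdoas)

    x0 = anchors.mean(axis=0) if x0 is None else x0
    return least_squares(residuals, x0).x
```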
Table 1 offers a comparison between the aforementioned indoor localization techniques and addresses their main advantages and drawbacks.

Table 1. Main advantages and drawbacks of indoor localization techniques
• RSSI. Advantages: simple implementation, low cost, can be exploited with various technologies. Drawbacks: affected by fading and environmental noise, poor accuracy, fingerprinting is needed.
• CSI. Advantages: immune to noise and fading. Drawbacks: complexity.
• Fingerprinting. Advantages: simple implementation. Drawbacks: any minor variation in the environment requires new fingerprints.
• AoA. Advantages: high positioning precision, no fingerprinting is needed. Drawbacks: complex implementation, directional antennas and complex algorithms are needed, performance degrades as distance increases.
• ToF. Advantages: high positioning precision, no fingerprinting is needed. Drawbacks: time synchronization between Tx and Rx is needed; in some scenarios, additional antennas and time stamps are needed at Tx and Rx; LOS is required for high accuracy.
• TDoA. Advantages: high positioning precision, no fingerprinting is needed. Drawbacks: clock synchronization is needed, large BW is needed, time stamps might be needed.

3 Technologies for Localization

This section introduces various communication technologies, such as WiFi, Bluetooth, and Ultra-WideBand (UWB), which are exploited by localization techniques to improve indoor localization accuracy.

3.1 WiFi

WiFi commonly operates in the ISM (Industrial, Scientific, and Medical) band. It follows the IEEE 802.11ah standard (categorized for IoT services) with a coverage range of about 1 km [10]. This technology is present in laptops, smartphones, and other portable user devices; thus, WiFi is considered one of the most popular technologies exploited for indoor localization [11]. Additionally, WiFi access points can serve as reference points for localization-system calculations, i.e., they can be used without any additional infrastructure. The aforementioned techniques, RSSI, CSI, ToF, AoA, and any combination of them, can be employed to build a WiFi-based localization system.

3.2 Bluetooth
The IEEE 802.15.1 standard, or Bluetooth technology, is applied to connect various fixed or mobile nodes over a limited distance. Various indoor localization techniques, such as RSSI, ToF, and AoA, can be based on Bluetooth technology. According to several studies [12], RSSI is the most widely applied Bluetooth positioning technique due to its simplicity. Unfortunately, the main drawback of applying the RSSI technique to Bluetooth technology is its limited accuracy in localizing nodes (vehicles). Nevertheless, Bluetooth in its original form can be exploited for positioning thanks to its coverage range and its low transmit power and energy consumption. Two common Bluetooth-based protocols applied for indoor localization are iBeacon (by Apple Inc.) and Eddystone (by Google Inc.) [13].

3.3 Radio Frequency Identification Device (RFID)
Radio Frequency Identification Device (RFID) technology is employed to transmit and store data via electromagnetic transmission from a Tx to any RF-compatible circuit [14].
RFID systems are categorized into two main types: active RFID and passive RFID.
Active RFID: Active RFIDs usually operate in the UHF (Ultra High Frequency) and microwave bands and have an attached power source. Their IDs are transmitted periodically, and their coverage range extends to hundreds of meters from the RFID reader. These active devices can be exploited for object tracking and indoor localization because of their coverage range, low cost, and simplicity of implementation. Nevertheless, active RFID technology suffers from limited localization accuracy, and it is not available on most portable user devices.
Passive RFID: Unlike active RFIDs, passive RFIDs have a short coverage range (1–2 m) and do not require batteries. These devices are smaller, lighter, and cheaper than the active type. Passive RFIDs operate in the low, high, UHF, and microwave bands. They are considered an alternative to bar codes, specifically in non-line-of-sight environments. However, they are not efficient for localization due to their limited coverage range. These passive devices can be applied to proximity-based services by exploiting brute-force mechanisms, but additional complex modifications are still needed, such as transmitting an ID that can be used to identify the device.

3.4 Ultra-WideBand (UWB)
In UWB, ultra-short pulses (with sub-nanosecond duration) are transmitted over a large bandwidth (>500 MHz), in the frequency range from 3.1 to 10.6 GHz, with a low duty cycle [15]. This technology is exploited for localization by propagating a radio signal from the Tx to reference nodes at known locations. ToF, TDoA, AoA, and RSSI, or hybrids of them, are the most commonly applied techniques based on UWB technology [16]. Table 2 compares the different wireless technologies applied to support indoor localization in terms of coverage range, accuracy, applied localization techniques, advantages, and disadvantages. As stated in Table 2, UWB has numerous advantages, such as low power consumption, large bandwidth, low cost, high data rate, high localization accuracy, robustness against environmental variations, and immunity to noise and fading thanks to its very short pulses. In addition, the UWB signal can penetrate a variety of materials and supports simultaneous data transmission and localization. All these characteristics make UWB a strong candidate for vehicle localization. Nowadays, integrated UWB radio communication chips implementing the IEEE 802.15.4a standard [17] have become available on the market, offering the ability to implement UWB technology efficiently in any application scenario. Unlike RSSI, the accuracy of the ToF and TDoA techniques can be improved by increasing either the SNR or the effective bandwidth. One limitation of ToF, as stated before, is the need for time synchronization among all nodes. Time synchronization is also vital in TDoA, but it is much easier in this case, as it is necessary only among the reference nodes: because the offset time is the same for each of the ToF measurements, it cancels out when their differences are taken.
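A toy numerical check of the offset-cancellation argument above: a constant receiver clock offset biases every ToA equally and drops out of the TDoA differences. The anchor and tag coordinates are illustrative values, not from the paper.

```python
import numpy as np

anchors = np.array([[0.0, 0.0], [30.0, 0.0], [0.0, 30.0]])  # known positions
tag = np.array([12.0, 7.0])                                  # unknown position
c = 3e8
offset_s = 1e-6                                              # unknown clock offset

toa = np.linalg.norm(anchors - tag, axis=1) / c + offset_s   # biased ToAs
tdoa = toa[1:] - toa[0]                                      # offset-free differences
true_diff = (np.linalg.norm(anchors[1:] - tag, axis=1)
             - np.linalg.norm(anchors[0] - tag)) / c
assert np.allclose(tdoa, true_diff)                          # offset has cancelled
```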
Table 2. Comparison among various indoor localization technologies
• WiFi. Coverage range: 150 m. Accuracy: 10–15 m. Localization techniques: RSSI, CSI, ToF, and AoA.
• Bluetooth. Coverage range: 70–100 m.