English Pages 1313 [802] Year 2023
Engineering Cyber-Physical Systems and Critical Infrastructures 1
D. Jude Hemanth Utku Kose Junzo Watada Bogdan Patrut
Smart Applications with Advanced Machine Learning and Human-Centred Problem Design
Engineering Cyber-Physical Systems and Critical Infrastructures Volume 1
Series Editor Fatos Xhafa, Departament de Ciències de la Computació, Technical University of Catalonia, Barcelona, Spain
The aim of this book series is to present state-of-the-art studies, research, best engineering practices, real-world applications, and real-world case studies on the risks, security, and reliability of critical infrastructure systems and Cyber-Physical Systems. Volumes in the series will cover modelling, analysis, frameworks, and digital twin simulations of risks, failures, and vulnerabilities of cyber critical infrastructures, and will provide ICT approaches to ensure protection and avoid disruption of vital fields in the everyday life of citizens, such as the economy, utility supply networks, telecommunications, and transport. The intertwining of the cyber and real nature of critical infrastructures will be analyzed, and the challenges of risk, security, and reliability of critical infrastructure systems will be revealed. Computational intelligence, provided by sensing and processing across the whole spectrum of Cloud-to-thing continuum technologies, will be the basis for real-time detection of risks, threats, and anomalies in cyber critical infrastructures and will prompt human and automated protection actions. Finally, studies and recommendations for policy makers, managers, local and governmental administrations, and global international organizations will be sought.
D. Jude Hemanth, Department of ECE, Karunya University, Coimbatore, Tamil Nadu, India
Utku Kose, Department of Computer Engineering, Faculty of Engineering, Suleyman Demirel University, Isparta, Turkey
Junzo Watada, Graduate School of Information, Waseda University, Shinjuku, Japan
Bogdan Patrut, Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iasi, Iasi, Romania
ISSN 2731-5002
ISSN 2731-5010 (electronic)
Engineering Cyber-Physical Systems and Critical Infrastructures
ISBN 978-3-031-09752-2
ISBN 978-3-031-09753-9 (eBook)
https://doi.org/10.1007/978-3-031-09753-9

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
General Committees
Honorary Chairs
Prof. Dr. İlker Hüseyin Çarıkçı (Rector of Süleyman Demirel University, Turkey)
Prof. Dr. İbrahim Diler (Rector of Applied Sciences University of Isparta, Turkey)

General Chair
Prof. Dr. Tuncay Yiğit (Süleyman Demirel University, Turkey)

Conference Chairs
Prof. Dr. Çetin Elmas (Azerbaijan Technical University, Azerbaijan)
Prof. Dr. Cemal Yılmaz (Mingachevir State University, Azerbaijan)
Assoc. Prof. Dr. Hasan Hüseyin Sayan (Gazi University, Turkey)
Assoc. Prof. Dr. İsmail Serkan Üncü (Isparta Applied Sciences University, Turkey)
Assoc. Prof. Dr. Utku Köse (Süleyman Demirel University, Turkey)
Assist. Prof. Dr. Mevlüt Ersoy (Süleyman Demirel University, Turkey)

Organizing Committee
Prof. Dr. Mehmet Gürdal (Süleyman Demirel University, Turkey)
Prof. Dr. Anar Adiloğlu (Süleyman Demirel University, Turkey)
Prof. Dr. Şemsettin Kılınçarslan (Süleyman Demirel University, Turkey)
Prof. Dr. Kemal Polat (Bolu Abant İzzet Baysal University, Turkey)
Prof. Dr. Okan Bingöl (Applied Sciences University of Isparta, Turkey)
Prof. Dr. Cemal Yılmaz (Gazi University, Turkey)
Prof. Dr. Ercan Nurcan Yılmaz (Mingachevir State University, Azerbaijan)
Prof. Dr. Yusuf Sönmez (Azerbaijan Technical University, Azerbaijan)
Prof. Dr. Hamdi Tolga Kahraman (Karadeniz Technical University, Turkey)
Prof. Dr. M. Ali Akcayol (Gazi University, Turkey)
Prof. Dr. Jude Hemanth (Karunya University, India)
Prof. Dr. Uğur Güvenç (Düzce University, Turkey)
Assoc. Prof. Dr. Asım Sinan Yüksel (Süleyman Demirel University, Turkey)
Assoc. Prof. Dr. Akram M. Zeki (International Islamic University Malaysia, Malaysia)
Assoc. Prof. Dr. Bogdan Patrut (Alexandru Ioan Cuza University of Iasi, Romania)
Assoc. Prof. Dr. Halil İbrahim Koruca (Süleyman Demirel University, Turkey)
Assoc. Prof. Dr. Erdal Aydemir (Süleyman Demirel University, Turkey)
Assoc. Prof. Dr. Ali Hakan Işık (Mehmet Akif Ersoy University, Turkey)
Assoc. Prof. Dr. Muhammed Maruf Öztürk (Süleyman Demirel University, Turkey)
Assoc. Prof. Dr. Osman Özkaraca (Muğla Sıtkı Koçman University, Turkey)
Assist. Prof. Dr. Bekir Aksoy (Isparta Applied Sciences University, Turkey)
Assist. Prof. Dr. Mehmet Kayakuş (Akdeniz University, Turkey)
Assist. Prof. Dr. Gürcan Çetin (Muğla Sıtkı Koçman University, Turkey)
Assist. Prof. Dr. Murat İnce (Süleyman Demirel University, Turkey)
Assist. Prof. Dr. Gül Fatma Türker (Süleyman Demirel University, Turkey)
Assist. Prof. Dr. Ferdi Saraç (Süleyman Demirel University, Turkey)
Assist. Prof. Dr. Cevriye Altıntaş (Isparta Applied Sciences University, Turkey)
Lect. Dr. Hamit Armağan (Süleyman Demirel University, Turkey)

Secretary and Social Media
Lect. Çilem Koçak (Isparta Applied Sciences University, Turkey)

Accommodation and Registration/Venue Desk
Lect. Dr. Recep Çolak (Isparta Applied Sciences University, Turkey)
Lect. Cem Deniz Kumral (Isparta Applied Sciences University, Turkey)

Travel/Transportation
Lect. Dr. Hamit Armağan (Süleyman Demirel University, Turkey)
Web/Design/Conference Session
Lect. Ali Topal (Isparta Applied Sciences University, Turkey)
Scientific Committee
General Chair
Prof. Dr. Tuncay Yiğit (Süleyman Demirel University, Turkey)

Committee Chair
Assoc. Prof. Dr. Utku Köse (Süleyman Demirel University, Turkey)

Prof. Dr. Ahmet Bedri Özer (Fırat University, Turkey)
Prof. Dr. Akram M. Zeki (Malaysia International Islamic University, Malaysia)
Prof. Dr. Ali Keçebaş (Muğla Sıtkı Koçman University, Turkey)
Prof. Dr. Ali Öztürk (Düzce University, Turkey)
Prof. Dr. Anar Adiloğlu (Süleyman Demirel University, Turkey)
Prof. Dr. Aslanbek Nazıev (Ryazan State University, Russia)
Prof. Dr. Arif Özkan (Kocaeli University, Turkey)
Prof. Dr. Aydın Çetin (Gazi University, Turkey)
Prof. Dr. Ayhan Erdem (Gazi University, Turkey)
Prof. Dr. Cemal Yılmaz (Mingachevir State University, Azerbaijan)
Prof. Dr. Çetin Elmas (Azerbaijan Technical University, Vice Rector, Azerbaijan)
Prof. Dr. Daniela Elena Popescu (University of Oradea, Romania)
Prof. Dr. Eduardo Vasconcelos (Goias State University, Brazil)
Prof. Dr. Ekrem Savaş (Rector of Uşak University, Turkey)
Prof. Dr. Ender Ozcan (Nottingham University, England)
Prof. Dr. Ercan Nurcan Yılmaz (Mingachevir State University, Azerbaijan)
Prof. Dr. Erdal Kılıç (Samsun 19 May University, Turkey)
Prof. Dr. Eşref Adalı (İstanbul Technical University, Turkey)
Prof. Dr. Gültekin Özdemir (Süleyman Demirel University, Turkey)
Prof. Dr. Hüseyin Demir (Samsun 19 May University, Turkey)
Prof. Dr. Hüseyin Merdan (TOBB Economy and Technology University, Turkey)
Prof. Dr. Hüseyin Şeker (Birmingham City University, England)
Prof. Dr. Igbal Babayev (Azerbaijan Technical University, Azerbaijan)
Prof. Dr. Igor Lıtvınchev (Nuevo Leon State University, Mexico)
Prof. Dr. İbrahim Üçgül (Süleyman Demirel University, Turkey)
Prof. Dr. İbrahim Yücedağ (Düzce University, Turkey)
Prof. Dr. İlhan Koşalay (Ankara University, Turkey)
Prof. Dr. Jose Antonio Marmolejo (Panamerican University, Mexico)
Prof. Dr. Jude Hemanth (Karunya University, India)
Prof. Dr. Junzo Watada (Universiti Teknologi PETRONAS, Malaysia)
Prof. Dr. Kemal Polat (Bolu Abant İzzet Baysal University, Turkey)
Prof. Dr. Marat Akhmet (Middle East Technical University, Turkey)
Prof. Dr. Marwan Bıkdash (North Carolina Agricultural and Technical State University, USA)
Prof. Dr. Mehmet Ali Akçayol (Gazi University, Turkey)
Prof. Dr. Mehmet Gürdal (Süleyman Demirel University, Turkey)
Prof. Dr. Mehmet Karaköse (Fırat University, Turkey)
Prof. Dr. Mehmet Sıraç Özerdem (Dicle University, Turkey)
Prof. Dr. Melih Günay (Akdeniz University, Turkey)
Prof. Dr. Muharrem Tolga Sakallı (Trakya University, Turkey)
Prof. Dr. Murat Kale (Düzce University, Turkey)
Prof. Dr. Mostafa Maslouhi (IbnTofail University, Morocco)
Prof. Dr. Mustafa Alkan (Gazi University, Turkey)
Prof. Dr. Nihat Öztürk (Gazi University, Turkey)
Prof. Dr. Norita Md Norwawi (Universiti Sains Islam Malaysia, Malaysia)
Prof. Dr. Nuri Özalp (Ankara University, Turkey)
Prof. Dr. Nurali Yusufbeyli (Azerbaijan Technical University, Azerbaijan)
Prof. Dr. Okan Bingöl (Applied Sciences University of Isparta, Turkey)
Prof. Dr. Oktay Duman (TOBB Economy and Technology University, Turkey)
Prof. Dr. Ömer Akın (TOBB Economy and Technology University, Turkey)
Prof. Dr. Ömer Faruk Bay (Gazi University, Turkey)
Prof. Dr. Recep Demirci (Gazi University, Turkey)
Prof. Dr. Resul Kara (Düzce University, Turkey)
Prof. Dr. Reşat Selbaş (Applied Sciences University of Isparta, Turkey)
Prof. Dr. Sabri Koçer (Necmettin Erbakan University, Turkey)
Prof. Dr. Sadık Ülker (European University of Lefke, Cyprus)
Prof. Dr. Sergey Bushuyev (Kyiv National University, Ukraine)
Prof. Dr. Sezai Tokat (Pamukkale University, Turkey)
Prof. Dr. Turan Erman Erkan (Atılım University, Turkey)
Prof. Dr. Yusuf Öner (Pamukkale University, Turkey)
Assoc. Prof. Dr. Abdulkadir Karacı (Kastamonu University, Turkey)
Assoc. Prof. Dr. Aida Mustafayeva (Mingachevir State University, Azerbaijan)
Assoc. Prof. Dr. Ahmet Cüneyd Tantuğ (İstanbul Technical University, Turkey)
Assoc. Prof. Dr. Alexandrina Mirela Pater (University of Oradea, Romania)
Assoc. Prof. Dr. Ali Hakan Işık (Mehmet Akif Ersoy University, Turkey)
Assoc. Prof. Dr. Almaz Aliyeva (Mingachevir State University, Azerbaijan)
Assoc. Prof. Dr. Devrim Akgün (Sakarya University, Turkey)
Assoc. Prof. Dr. Ercan Buluş (Namık Kemal University, Turkey)
Assoc. Prof. Dr. Erdal Aydemir (Süleyman Demirel University, Turkey)
Assoc. Prof. Dr. Ezgi Ülker (European University of Lefke, Cyprus)
Assoc. Prof. Dr. Gamze Yüksel (Muğla Sıtkı Koçman University, Turkey)
Assoc. Prof. Dr. Hasan Hüseyin Sayan (Gazi University, Turkey)
Assoc. Prof. Dr. İsmail Serkan Üncü (Applied Sciences University of Isparta, Turkey)
Assoc. Prof. Dr. J. Anitha (Karunya University, India)
Assoc. Prof. Dr. M. Kenan Döşoğlu (Düzce University, Turkey)
Assoc. Prof. Dr. Mahir İsmayılov (Mingachevir State University, Azerbaijan)
Assoc. Prof. Dr. Muhammed Hanefi Calp (Karadeniz Technical University, Turkey)
Assoc. Prof. Dr. Nevin Güler Dincer (Muğla Sıtkı Koçman University, Turkey)
Assoc. Prof. Dr. Osman Özkaraca (Muğla Sıtkı Koçman University, Turkey)
Assoc. Prof. Dr. Özgür Aktunç (St. Mary’s University, USA)
Assoc. Prof. Dr. Parvana Safarova (Mingachevir State University, Azerbaijan)
Assoc. Prof. Dr. Ramazan Şenol (Applied Sciences University of Isparta, Turkey)
Assoc. Prof. Dr. Rayiha Agayeva (Mingachevir State University, Azerbaijan)
Assoc. Prof. Dr. Ridha Derrouıche (EM Strasbourg Business School, France)
Assoc. Prof. Dr. Sabuhi Gahramanov (Mingachevir State University, Azerbaijan)
Assoc. Prof. Dr. Samia Chehbı Gamoura (Strasbourg University, France)
Assoc. Prof. Dr. Sedat Akleylek (Samsun 19 May University, Turkey)
Assoc. Prof. Dr. Selami Kesler (Pamukkale University, Turkey)
Assoc. Prof. Dr. Selim Köroğlu (Pamukkale University, Turkey)
Assoc. Prof. Dr. Serdar Biroğul (Düzce University, Turkey)
Assoc. Prof. Dr. Serdar Demir (Muğla Sıtkı Koçman University, Turkey)
Assoc. Prof. Dr. Serhat Duman (Düzce University, Turkey)
Assoc. Prof. Dr. Serkan Ballı (Muğla Sıtkı Koçman University, Turkey)
Assoc. Prof. Dr. Tarana Yusibova (Mingachevir State University, Azerbaijan)
Assoc. Prof. Dr. Tiberiu Socacıu (Stefan cel Mare University of Suceava, Romania)
Assoc. Prof. Dr. Tolga Ovatman (İstanbul Technical University, Turkey)
Assoc. Prof. Dr. Ümit Deniz Uluşar (Akdeniz University, Turkey)
Assoc. Prof. Dr. Ülker Aşurova (Mingachevir State University, Azerbaijan)
Assist. Prof. Dr. Ali Şentürk (Applied Sciences University of Isparta, Turkey)
Assist. Prof. Dr. Arif Koyun (Süleyman Demirel University, Turkey)
Assist. Prof. Dr. Barış Akgün (Koç University, Turkey)
Assist. Prof. Dr. Bekir Aksoy (Isparta Applied Sciences University, Turkey)
Assist. Prof. Dr. Deepak Gupta (Maharaja Agrasen Institute of Technology, India)
Assist. Prof. Dr. Dmytro Zubov (The University of Information Science and Technology St. Paul the Apostle, Macedonia)
Assist. Prof. Dr. Enis Karaarslan (Muğla Sıtkı Koçman University, Turkey)
Assist. Prof. Dr. Esin Yavuz (Süleyman Demirel University, Turkey)
Assist. Prof. Dr. Fatih Gökçe (Süleyman Demirel University, Turkey)
Assist. Prof. Dr. Ferdi Saraç (Süleyman Demirel University, Turkey)
Assist. Prof. Dr. Gül Fatma Türker (Süleyman Demirel University, Turkey)
Assist. Prof. Dr. Gür Emre Güraksın (Afyon Kocatepe University, Turkey)
Assist. Prof. Dr. Iulian Furdu (Vasile Alecsandri University of Bacau, Romania)
Assist. Prof. Dr. Mehmet Kayakuş (Akdeniz University, Turkey)
Assist. Prof. Dr. Mehmet Onur Olgun (Süleyman Demirel University, Turkey)
Assist. Prof. Dr. Mustafa Nuri Ural (Gümüşhane University, Turkey)
Assist. Prof. Dr. Okan Oral (Akdeniz University, Turkey)
Assist. Prof. Dr. Osman Palancı (Süleyman Demirel University, Turkey)
Assist. Prof. Dr. Paniel Reyes Cardenas (Popular Autonomous University of the State of Puebla, Mexico)
Assist. Prof. Dr. Remzi Inan (Applied Sciences University of Isparta, Turkey)
Assist. Prof. Dr. S. T. Veena (Kamaraj Engineering and Technology University, India)
Assist. Prof. Dr. Serdar Biroğul (Düzce University, Turkey)
Assist. Prof. Dr. Serdar Çiftçi (Harran University, Turkey)
Assist. Prof. Dr. Ufuk Özkaya (Süleyman Demirel University, Turkey)
Assist. Prof. Dr. Veli Çapalı (Uşak University, Turkey)
Assist. Prof. Dr. Vishal Kumar (Bipin Tripathi Kumaon Institute of Technology, India)
Lect. Dr. Anand Nayyar (Duy Tan University, Vietnam)
Lect. Dr. Bogdan Patrut (Alexandru Ioan Cuza University of Iasi, Romania)
Lect. Dr. Simona Elena Varlan (Vasile Alecsandri University of Bacau, Romania)
Dr. Ashok Prajapatı (FANUC America Corp., USA)
Dr. Katarzyna Rutczyńska-Wdowiak (Kielce University of Technology, Poland)
Dr. Mustafa Küçükali (Information and Communication Technologies Authority)
Dr. Nabi Ibadov (Warsaw University of Technology, Poland)
Dr. Özkan Ünsal (Süleyman Demirel University, Turkey)
Tim Jaques (International Project Management Association, USA)
Keynote Speakers
ICAIAME 2021 Keynote Speakers
1. Prof. Dr. Çetin Elmas (Azerbaijan Technical University, Azerbaijan) “Artificial Intelligence in Project Management”
2. Prof. Dr. Marat Akhmet (Middle East Technical University, Turkey) “Domain structured dynamics and chaos”
3. Prof. Dr. Hüseyin Seker (University of Birmingham, England) “The Power of Data and The Things It Empowers: What have we learned in the age of Covid19”
4. Prof. Dr. Jude Hemanth (Karunya Institute of Technology and Sciences, India) “Trends in medical decision support systems using AI techniques”
5. Assoc. Prof. Dr. Ender Özcan (University of Nottingham, England) “Topology Optimisation of Acoustic Porous Materials Using Heuristics and Metaheuristics”
6. Dr. Mladen Vukomanovi (Vice President, International Project Management Agency—IPMA, Croatia) “PM initiatives in Smart City technologies in the context of IPMA”
7. Zümrüt Müftüoğlu (Türkiye Cumhuriyeti Cumhurbaşkanlığı Dijital Dönüşüm Ofisi, Turkey) “Ulusal Yapay Zeka Stratejisi 2021–2025” (National Artificial Intelligence Strategy 2021–2025)
8. Timoty Jaques (International Project Management Agency—IPMA, USA) “Innovation and Project Management: Meeting the Practical Demands of Digitization”
Foreword
Engineering is an important technological perspective that humans use to understand nature and to build effective solutions to problems. Throughout history, humankind has tried to find better solutions to the problems it has faced. Eventually, this transformed societies and the world into a global platform for information processing, sharing, and communication. Because of the need for immediate communication, for access to the most recent information, and for practical tools that use resources efficiently, engineering-based solution designs have become essential for the well-being of humanity. In particular, Mathematics and Modern Logic led engineering efforts to be divided into specific fields, and the human mind found its own flexible way in engineering. Nowadays, humanity needs to find a balance between the advantages and disadvantages of intensive technology use. Thanks to the revolutionary appearance of technologies such as electronics, computers, and the Internet, a new era of rapid advancement has been under way since the 2000s. Among all these technological tools, Artificial Intelligence plays a main role, especially in designing engineering-related solutions and in running effective solutions directly. But since Artificial Intelligence relies on the adaptive use of mathematics to process data, the number of issues to be considered in designing human-compatible smart systems increases day by day. Even in the most advanced smart system, it is possible to encounter bias caused by hidden problems in large amounts of data. This may be worse if we consider potential human errors in the data, as the target data is often affected by human actions and knowledge transfer. So, engineering efforts nowadays need to focus on human compatibility and keep the human at the centre in order to build responsible smart systems.
It is a pleasure for me to write the Foreword for this book, titled Smart Applications with Advanced Machine Learning and Human-Centred Problem Design. As the title suggests, the book is the result of efforts to build up knowledge on the latest advancements in smart tools developed to deal with human-centred problems. The book consists of research works on engineering problems in the medical, electronics, and industrial fields. The coverage includes remarkable solutions such as Machine/Deep Learning, image processing, optimization, and specific software- and hardware-oriented developments. The book is a collection of selected papers from the International Conference on Artificial Intelligence and Applied Mathematics in Engineering (ICAIAME 2021), so it offers a nice variety of engineering-related works in smart developments as well as Applied Mathematics. Keeping the human as the target, all papers provide a good view of the very recent literature. I believe this book will be a very useful source for everybody, including scientists (researchers, academicians), experts, professionals, and degree students. I would like to thank the editors, Dr. Hemanth, Dr. Kose, Dr. Watada, and Dr. Patrut, for their valuable efforts to realize such a timely reference work. I also thank all the authors for their contributions. With my kind regards.

Dr. Jafar Alzubi
Al-Balqa Applied University
Salt, Jordan
Preface
As technological advancements gain momentum, human-side control over tools becomes an ever more important requirement. In the twenty-first century, humankind has access to many advanced tools that make daily tasks fast and practical. However, some of these advanced technologies still require human control and careful analysis of data-oriented issues. In this context, scientific developments have a critical role in shaping a future in which technology and humanity are at peace. This need seems even more critical as the world faces massive problems such as pandemics and climate change. In the face of these issues, Artificial Intelligence has become a powerful tool for developing smart systems that can help users with important real-world problems. On the other hand, it is still critical to consider human-compatible design in these smart systems. So, optimum design and development need to be applied widely. Titled Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, this book is a collection of papers presented at the International Conference on Artificial Intelligence and Applied Mathematics in Engineering (ICAIAME 2021), held in Antalya, Turkey. Thanks to strong international contributions, the event included reports on many recent research outcomes in Artificial Intelligence and Applied Mathematics across different engineering fields. As Artificial Intelligence relies on a mathematical background to process real-world data, different aspects of Mathematics (such as Applied Mathematics) need to be considered as a supportive component of smart tools. Additionally, smart systems built on Machine Learning and Deep Learning approaches need to be developed carefully for critical engineering problems in vital fields.
So, the book has been organized carefully to include different research aspects of smart applications within engineering fields while considering human aspects. In the organization of the book, independent reviewers as well as the scientific committee of ICAIAME 2021 provided critical views to improve the quality of all papers and to cover different aspects of research, with more focus on human-centred fields. All findings from the included papers have the potential to improve understanding of the latest literature, trigger future works, and motivate especially early-career researchers to find their own ways in the future smart world. As the editors of the book, we would like to thank all participants who took part in ICAIAME 2021. Special thanks also go to all keynote speakers and session chairs, who shared their valuable scientific and academic views on the latest research outcomes. We are also grateful to all the event staff, who made great efforts to make the event a successful platform for international knowledge sharing and potential future collaborations. With our best wishes.

D. Jude Hemanth, Coimbatore, India
Utku Kose, Isparta, Turkey, http://www.utkukose.com
Junzo Watada, Shinjuku, Japan
Bogdan Patrut, Iasi, Romania
Acknowledgements
As the editors, we would like to thank Lect. Çilem Koçak (Isparta University of Applied Sciences, Turkey) for her valuable efforts on pre-organization of the book content, and the Springer team for their great support to publish the book.
Contents
1
2
3
Implementation of Basic Math Processing Skills with Neural Arithmetic Expressions in One and Two Stage Numbers . . . . . . . . . . Remzi Gürfidan, Mevlüt Ersoy, D. Jude Hemanth, and Elmira Israfilova 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1.1 Neural Arithmetic Expressions and Logic Units . . . . . . . 1.1.2 Long Short-Term Memory Algorithm . . . . . . . . . . . . . . . . 1.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3 Proposed Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4 Experimental Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . An Example Application for Early Diagnosis of Retinal Diseases Using Deep Learning Methods . . . . . . . . . . . . . . . . . . . . . . . . . Bekir Aksoy, Fatmanur Ate¸s, Osamah Khaled Musleh Salman, Hamit Arma˘gan, Emre Soyaltin, and Ender Özcan 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Material and Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.1 Material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Research Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . Autonomous Parking with Continuous Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mehmet Ertekin and Mehmet Önder Efe 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.1 Deep Q Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1
1 2 3 4 5 6 10 10 11
11 12 12 16 18 20 21 22 25 25 26 26 xix
xx
Contents
3.2.2 Deep Deterministic Policy Gradient Algorithm . . . . . . . . 3.2.3 Twin Delayed Temporal Difference Algorithm . . . . . . . . 3.2.4 Soft Actor Critic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 3.2.5 Hindsight Experience Replay Algorithm . . . . . . . . . . . . . 3.2.6 Parking Environment Simulation Model . . . . . . . . . . . . . . 3.3 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
5
6
Design and Manufacturing of a 3 DOF Robot with Additive Manufacturing Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A. Burak Ke¸skekçi, Hilmi Cenk Bayrakçi, and Ercan Nurcan Yilmaz 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Material and Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4 Findings and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Real-Time Mask Detection Based on Artificial Intelligence Using Renewable Energy System Unmanned Aerial Vehicle . . . . . . . Bekir Aksoy, Mehmet Yücel, Re¸sat Selba¸s, Merdan Özkahraman, Çetin Elmas, and Almaz Aliyeva 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Related Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Material and Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.1 Material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.2 Material and Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Research Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
Investigation of Effect of Wrapping Length on the Flexural Properties of Wooden Material in Reinforcement with Aramid FRP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Semsettin ¸ Kilinçarslan, Yasemin Sim¸ ¸ sek Türker, and Nabi Ibadov 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 Material and Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
28 29 30 31 33 35 36 37 39
39 40 40 41 43 44 45 47
47 49 50 50 54 57 59 59
61 61 62 65 66 67
Contents
7 Deep Learning-Based Air Defense System for Unmanned Aerial Vehicles
  Bekir Aksoy, Mustafa Melikşah Özmen, Muzaffer Eylence, Seyit Ahmet İnan, and Kamala Eyyubova
  7.1 Introduction
  7.2 Material and Method
    7.2.1 Material
    7.2.2 Method
  7.3 Research Findings
    7.3.1 MobileNetV2 Training Results
    7.3.2 Xception Training Results
    7.3.3 InceptionV3 Training Results
  7.4 Results
  References
8 Strategic Framework for ANFIS and BIM Use on Risk Management at Natural Gas Pipeline Project
  İsmail Altunhan, Mehmet Sakin, Ümran Kaya, and M. Fatih AK
  8.1 Introduction
  8.2 Literature
  8.3 Materials and Methods
    8.3.1 Artificial Neural Networks (ANN)
    8.3.2 Structure of Artificial Neural Network
    8.3.3 Fuzzy Inference System
    8.3.4 Adaptive Neuro-Fuzzy Inference System - ANFIS
    8.3.5 What is the Building Information Modelling (BIM)
    8.3.6 Methods
  8.4 Results
  8.5 Conclusion
  References
9 Predicting Ethereum Price with Machine Learning Algorithms
  Mehmet Birhan and Ömür Tosun
  9.1 Introduction
  9.2 Related Works
  9.3 Method and Material
    9.3.1 Used Methods
    9.3.2 Data Collecting
    9.3.3 Method
  9.4 Discussion and Results
  9.5 Conclusions and Future Work
  References
10 Data Mining Approachs for Machine Failures: Real Case Study
  Ümran Kaya
  10.1 Introduction
  10.2 Literature
  10.3 Methods
    10.3.1 Re-processing the Data
    10.3.2 Methods
  10.4 Results
  10.5 Conclusion
  References
11 Classification of People Both Wearing Medical Mask and Safety Helmet
  Emel Soylu and Tuncay Soylu
  11.1 Introduction
  11.2 Materials and Methods
    11.2.1 Dataset
    11.2.2 Method
    11.2.3 Single Deep Neural Network
    11.2.4 Double Deep Neural Network
  11.3 Conclusions and Future Work
  References
12 Anonymization Methods for Privacy-Preserving Data Publishing
  Burak Cem Kara and Can Eyupoglu
  12.1 Introduction
  12.2 Big Data Definition
  12.3 Data Anonymization
    12.3.1 Protection Methods with Anonymization
    12.3.2 Anonymization and Protection Models
  12.4 Literature Review
  12.5 Comparison of Existing Studies
  12.6 Conclusion
  References
13 Improving Accuracy of Document Image Classification Through Soft Voting Ensemble
  Semih Sevim, Sevinç İlhan Omurca, and Ekin Ekinci
  13.1 Introduction
  13.2 Related Works
  13.3 Methodology
    13.3.1 Document Image Classification
    13.3.2 Image Pre-processing
    13.3.3 Convolutional Neural Network
    13.3.4 Soft Voting
  13.4 Experiments and Result
    13.4.1 Dataset
    13.4.2 Evaluation Metrics
    13.4.3 Experiments
  13.5 Conclusion
  References
14 Improved Performance of Adaptive UKF SLAM with Scaling Parameter
  Kübra Yalçin, Serhat Karaçam, and Tuğba Selcen Navruz
  14.1 Introduction
  14.2 Adaptive UKF SLAM
  14.3 Simulation Results and Discussions
  14.4 Conclusion and Suggestions
  References
15 An Adaptive EKF Algorithm with Adaptation of Noise Statistic Based on MLE, EM and ICE
  Serhat Karaçam, Kübra Yalçin, and Tuğba Selcen Navruz
  15.1 Introduction
  15.2 Methods
    15.2.1 Extended Kalman Filter (EKF)
    15.2.2 Unscented Kalman Filter (UKF)
    15.2.3 Adaptive Extended Kalman Filter (AEKF)
    15.2.4 Data Association
    15.2.5 AEKF-SLAM Algorithm
  15.3 Simulation Results and Discussion
  15.4 Conclusions and Future Work
  References
16 Artificial Intelligence Based Detection of Estrus in Animals Using Pedometer Data
  Ali Hakan Işık, Seyit Hasoğlu, Ömer Can Eskicioğlu, and Edin Dolicanin
  16.1 Introduction
  16.2 Related Works
  16.3 Method and Material
    16.3.1 Architectural Design
    16.3.2 Devices
    16.3.3 Electronic Circuit Design
    16.3.4 Proposed Algorithms
  16.4 Discussion and Result
  16.5 Conclusions and Future Work
  References
17 Enhancing Lexicon Based Sentiment Analysis Using n-gram Approach
  Hassan Abdirahman Farah and Arzu Gorgulu Kakisim
  17.1 Introduction
  17.2 Sentiment Lexicons
    17.2.1 Vader
    17.2.2 TextBlob
    17.2.3 Afinn
    17.2.4 SentiWordNet
  17.3 Proposed Framework
    17.3.1 Pre-processing Step
    17.3.2 N-gram Extraction
    17.3.3 Feature Space Construction
  17.4 Experimental Results
  17.5 Conclusion
  References
18 A Comparison of Word Embedding Models for Turkish
  Ahmet Tuğrul Bayrak, Musa Berat Bahadir, Güven Yücetürk, İsmail Utku Sayan, Melike Demirdağ, and Sare Melek Yalçinkaya
  18.1 Introduction
  18.2 Data and Data Preprocessing Steps
  18.3 Method
    18.3.1 Embedding Models
    18.3.2 Classification Model
  18.4 Experiments
  18.5 Conclusion
  References
19 The Unfairness of Collaborative Filtering Algorithms’ Bias Towards Blockbuster Items
  Emre Yalcin
  19.1 Introduction
  19.2 Related Works
  19.3 Description of Blockbuster Items
  19.4 Blockbuster Bias in User Profiles
    19.4.1 The Propensities of Users for Blockbuster Items
    19.4.2 Profile Size and Blockbuster Bias
  19.5 Different User Groups in Terms of Inclination for Blockbuster
  19.6 Algorithmic Propagation of Blockbuster Bias
    19.6.1 Blockbuster Bias in Recommendations for Different User Groups
  19.7 Conclusion and Future Work
  References
20 Improved Gradient-Based Optimizer with Dynamic Fitness Distance Balance for Global Optimization Problems
  Durdane Ayşe Taşci, Hamdi Tolga Kahraman, Mehmet Kati, and Cemal Yilmaz
  20.1 Introduction
  20.2 Related Works
    20.2.1 GBO
    20.2.2 Dynamic Fitness-Distance Balance (dFDB)
    20.2.3 Improved GBO with Dynamic Fitness Distance Balance
  20.3 Experimental Study
    20.3.1 Settings
    20.3.2 Benchmark Problems
    20.3.3 Constrained Engineering Design Problems
  20.4 Analyze Results
    20.4.1 Statistical Analysis Results
    20.4.2 Convergence Analysis Results
    20.4.3 Results for Engineering Design Problems
  20.5 Conclusions and Future Work
  References
21 TR-SUM: An Automatic Text Summarization Tool for Turkish
  Yiğit Yüksel and Yalçın Çebi
  21.1 Introduction
  21.2 Literature Review
    21.2.1 Related Studies in Turkish
    21.2.2 Datasets in Turkish
  21.3 TR-SUM: A Text Summarization Tool for Turkish
    21.3.1 General Overview of “TR-SUM: A Text Summarization Tool for Turkish”
    21.3.2 TR-NEWS-SUM Dataset
    21.3.3 Data Pre-processing
    21.3.4 The Proposed Neural Network Models for Turkish Text Summarization
  21.4 Discussion and Results
  21.5 Conclusion and Future Work
  References
22 Automatic and Semi-automatic Bladder Volume Detection in Ultrasound Images
  Uğur Can Kavuncubaşi, Görkem Tek, Kayra Acar, Burak Ertosun, and Mehmet Feyzi Akşahin
  22.1 Introduction
  22.2 Related Works
  22.3 Method and Material
    22.3.1 Data Set
    22.3.2 Method
  22.4 Discussion and Results
  22.5 Conclusions and Future Work
  References
23 Effects of Variable UAV Speed on Optimization of Travelling Salesman Problem with Drone (TSP-D)
  Enes Cengiz, Cemal Yilmaz, Hamdi Tolga Kahraman, and Çağri Suiçmez
  23.1 Introduction
  23.2 Problem Definition
  23.3 Methodology
    23.3.1 Truck-Drone Algorithm Approach
  23.4 Experimental Studies
    23.4.1 Settings
    23.4.2 Experimental Studies and Results
  23.5 Discussions and Conclusion
  References
24 Improved Phasor Particle Swarm Optimization with Fitness Distance Balance for Optimal Power Flow Problem of Hybrid AC/DC Power Grids
  Serhat Duman, Hamdi Tolga Kahraman, Busra Korkmaz, Huseyin Bakir, Ugur Guvenc, and Cemal Yilmaz
  24.1 Introduction
  24.2 Mathematical Formulation of Optimal Power Flow Problem of Hybrid AC/DC Power Grids
    24.2.1 State and Control Variables
    24.2.2 Constraints
    24.2.3 Objective Functions
  24.3 Method
    24.3.1 Fitness-Distance Balance Method
    24.3.2 Overview of Phasor Particle Swarm Optimization (PPSO) Algorithm
    24.3.3 Proposed FDBPPSO Algorithm
  24.4 Experimental Settings
  24.5 Results and Analysis
    24.5.1 Determining the Best FDBPPSO Variant on CEC 2020 Test Suite
    24.5.2 Application of the Proposed FDBPPSO Method for Optimal Power Flow Problem of Hybrid AC/DC Power Grids
  24.6 Conclusions
  References
25 Development of an FDB-Based Chimp Optimization Algorithm for Global Optimization and Determination of the Power System Stabilizer Parameters
  Huseyin Bakir, Hamdi Tolga Kahraman, Seyithan Temel, Serhat Duman, Ugur Guvenc, and Yusuf Sonmez
  25.1 Introduction
  25.2 Mathematical Formulation of Power System Stabilizer Parameters Optimization
    25.2.1 Power System Model with PSS Structure
    25.2.2 Objective Functions and Constraints
  25.3 Method
    25.3.1 Fitness-Distance Balance Selection Method
  25.4 Overview of Chimp Optimization Algorithm
    25.4.1 Proposed FDBChOA Algorithm
  25.5 Experimental Settings
  25.6 Results and Analysis
    25.6.1 Determining the Best FDBPPSO Variant on CEC 2020 Benchmark Test Suite
    25.6.2 Application of the Proposed FDB-Based Chimp Optimization Algorithm for Power System Stabilizer Parameters Optimization
  25.7 Conclusions
  References
26 Deep Learning-Based Prediction Model of Fruit Growth Dynamics in Apple
  Hamit Armağan, Ersin Atay, Xavier Crété, Pierre-Eric Lauri, Mevlüt Ersoy, and Okan Oral
  26.1 Introduction
  26.2 Materials and Methods
  26.3 Results and Discussion
  References
27 Prediction of Hepatitis C Disease with Different Machine Learning and Data Mining Technique
  Çağrı Suiçmez, Cemal Yılmaz, Hamdi Tolga Kahraman, Enes Cengiz, and Alihan Suiçmez
  27.1 Introduction
  27.2 Materials and Method
    27.2.1 Dataset Introduction
    27.2.2 Data Mining Process
    27.2.3 Machine Learning Methods
  27.3 Experimental Results
    27.3.1 Evaluation Metrics
    27.3.2 Results and Findings
  27.4 Conclusions and Future Work
  References
28 Prediction of Development Types from Release Notes for Automatic Versioning of OSS Projects
  Abdulkadir Şeker, Saliha Yeşilyurt, İsmail Can Ardahan, and Berfin Çınar
  28.1 Introduction
  28.2 Related Works
  28.3 Method and Material
    28.3.1 Dataset
    28.3.2 Pre-processes
    28.3.3 Methods
    28.3.4 Model Evaluation
  28.4 Results
  28.5 Discussion
  References
29 Design Optimization of Induction Motor with FDB-Based Archimedes Optimization Algorithm for High Power Fan and Pump Applications
  Burak Yenipinar, Ayşegül Şahin, Yusuf Sönmez, Cemal Yilmaz, and Hamdi Tolga Kahraman
  29.1 Introduction
  29.2 Mathematical Formulation of Optimization Problem
  29.3 Method
    29.3.1 Archimedes Optimization Algorithm
    29.3.2 Archimedes Optimization Algorithm (AOA) with Fitness Distance Balance
  29.4 Experimental Settings
  29.5 Results and Analysis
    29.5.1 Determining the Best FDB-AOA Method on Benchmark Problems
    29.5.2 Application of the Proposed FDB-AOA Method for Design Optimization of Induction Motor
  29.6 Conclusions
  References
30 Collecting Health Information with LoRa Technology
  Zinnet Duygu Akşehir, Sedat Akleylek, Erdal Kiliç, Burçe Şirin, and Korhan Cengiz
  30.1 Introduction
  30.2 Related Works
  30.3 Material and Method
    30.3.1 LoRa Communication
    30.3.2 Node Part
    30.3.3 Server Part
    30.3.4 Client Part
    30.3.5 Mobile Application
    30.3.6 Wearable Module Hardware
  30.4 Discussion and Results
  30.5 Conclusions and Future Work
  References
31 A New Hybrid Method for Indoor Positioning
  Zinnet Duygu Akşehir, Sedat Akleylek, Erdal Kılıç, Ceyda Aksaç, and Ali Ghaffari
  31.1 Introduction
  31.2 Related Works
  31.3 Material and Method
    31.3.1 System Architecture
    31.3.2 Indoor Positioning
  31.4 Discussion and Results
  31.5 Conclusions and Future Work
  References
32 On the Android Malware Detection System Based on Deep Learning
  Durmuş Özkan Şahin, Bilge Kağan Yazar, Sedat Akleylek, Erdal Kiliç, and Debasis Giri
  32.1 Introduction
    32.1.1 Previous Works
    32.1.2 Motivation and Contribution
    32.1.3 Organization
  32.2 Experimental Settings
    32.2.1 Used Datasets
    32.2.2 Performance Measure
  32.3 Methodologies
    32.3.1 Static Analysis
    32.3.2 Converting Static Properties to Images
    32.3.3 Deep Learning Techniques
  32.4 Results and Discussions
    32.4.1 Results with Malgenome-215 Dataset
    32.4.2 Results with the Drebin-215 Dataset
  32.5 Conclusion and Future Works
  References
33 Poisson Stability in Inertial Neural Networks
  Marat Akhmet, Madina Tleubergenova, Roza Seilova, and Akylbek Zhamanshin
  33.1 Introduction
  33.2 Main Result
  33.3 Numerical Example
  References
441 442 444 444 445 448 449 450 453
453 454 456 456 457 457 457 458 458 459 460 462 462 463 464 465 467
467 468 473 475
Contents
34 Poisson Stable Dynamics of Hopfield-Type Neural Networks with Generalized Piecewise Constant Argument
Marat Akhmet, Duygu Aruğaslan Çinçin, Madina Tleubergenova, and Zakhira Nugayeva
34.1 Introduction
34.2 Preliminaries
34.3 Main Result
34.4 Example
34.5 Conclusions
References
35 A Business Workflow for Clustering and Decision Making Systems in Tax Audit Industry: A Case Study
Ipek Aktaş, Tolgay Kaya, and Mehmet S. Aktaş
35.1 Introduction
35.2 Literature Review
35.2.1 Fundamental Concepts
35.3 Methodology
35.3.1 Clustering Algorithm Module Utilizing Container Based Virtualization
35.3.2 Rule Based Decision Making Module Utilizing Container Based Virtualization
35.4 Conclusion and Future Work
References
36 Mask R-CNN Approach for Egg Segmentation and Egg Fertility Classification
Kerim Kürşat Çevik, Hasan Erdinç Koçer, Mustafa Boğa, and Salih Mervan Taş
36.1 Introduction
36.2 Literature Review
36.3 Method and Material
36.4 Implementation
36.5 Results
36.6 Discussion
References
37 Optimizing the Hedging Rules for the Dam Reservoir Operations by Meta-Heuristic Algorithms
Umut Okkan, Zeynep Beril Ersoy, and Okan Fistikoglu
37.1 Introduction
37.2 Study Region and Data
37.3 Methodology
37.3.1 Hedging Models Used
37.3.2 Model Calibrations
37.4 Results
37.5 Conclusion
References
38 Next Word Prediction with Deep Learning Models
Abdullah Atçılı, Osman Özkaraca, Güncel Sarıman, and Bogdan Patrut
38.1 Introduction
38.2 Related Works
38.3 Method and Material
38.3.1 Dataset
38.3.2 NLP with Deep Learning
38.3.3 Modeling
38.4 Discussion and Results
38.4.1 RNN-GRU Model
38.4.2 LSTM Model
38.4.3 Human Experiments
38.5 Conclusions and Future Work
References
39 Cooperative Multi-agent Reinforcement Learning for Autonomous Cars Passing on Narrow Road
Mustafa Şehriyaroğlu and Yakup Genç
39.1 Introduction
39.2 Related Works
39.3 Method
39.3.1 Problem Definition
39.3.2 Reward Function
39.3.3 Agent Networks
39.3.4 Curriculum Learning
39.4 Experiments
39.5 Conclusions and Future Work
References
40 Oscillations in Recurrent Neural Networks with Structured and Variable Impulses
Marat U. Akhmet, Gülbahar Erim, and Madina Tleubergenova
40.1 Introduction
40.1.1 The Structure of the Model
40.1.2 Basic Conditions of the Research
40.2 Almost Periodic Solutions
40.3 Periodic Solutions
40.4 Conclusion
References
41 Topic Modeling Analysis of Tweets on the Twitter Hashtags with LDA and Creating a New Dataset
Çilem Koçak, Tuncay Yiğit, J. Anitha, and Aida Mustafayeva
41.1 Introduction
41.2 Literature
41.2.1 The Problems Posed by Tweeting
41.2.2 Studies Conducted with Artificial Intelligence Conducted on Twitter in the Literature
41.2.3 Studies Conducted with Artificial Intelligence Conducted on Twitter in the Literature
41.2.4 The Process of Natural Language Processing
41.3 Material-Method
41.3.1 Data Collection
41.3.2 Data Processing
41.3.3 Creating a DATASET
41.3.4 Analysis and Classification
41.3.5 Topical Modeling
41.4 Research Findings
41.4.1 Bi-gram, Tri-gram
41.5 Conclusion and Suggestions
References
42 Hopfield-Type Neural Networks with Poincaré Chaos
Marat Akhmet, Duygu Aruğaslan Çinçin, Madina Tleubergenova, Roza Seilova, and Zakhira Nugayeva
42.1 Introduction
42.2 Preliminaries
42.3 Main Result
42.4 Examples
References
43 Face Expression Recognition Using Deep Learning and Cloud Computing Services
Hilal Hazel Cumhuriyet, Volkan Uslan, Ersin Yavaş, and Huseyin Seker
43.1 Introduction
43.2 Related Works
43.3 Method and Material
43.3.1 Deep Learning
43.3.2 Convolutional Neural Networks
43.3.3 Cloud Computing Services
43.4 Results and Discussion
43.5 Conclusions and Future Work
References
44 Common AI-Based Methods Used in Blood Glucose Estimation with PPG Signals
Ömer Pektaş and Murat Köseoğlu
44.1 Introduction
44.2 AI-Based Non-invasive BGL Methods
44.2.1 Pulse Based Cepstral Coefficients
44.2.2 Support Vector Machine (SVM)
44.2.3 Decision Tree (DT)
44.2.4 Random Forest Regression (RFR)
44.2.5 K-Nearest Neighbor (KNN)
44.2.6 Artificial Neural Network (ANN)
44.2.7 Naïve Bayes (NB)
44.3 Conclusion and Suggestions
References
45 Capturing Reward Functions for Autonomous Driving: Smooth Feedbacks, Random Explorations and Explanation-Based Learning
M. Cemil Güney and Yakup Genç
45.1 Introduction
45.2 Problem Formulation
45.3 Method
45.4 Experiments
45.4.1 The Environment
45.4.2 The Interface
45.4.3 Evaluation
45.5 Conclusion
References
46 Unpredictable Solutions of a Scalar Differential Equation with Generalized Piecewise Constant Argument of Retarded and Advanced Type
Marat Akhmet, Duygu Aruğaslan Çinçin, Zakhira Nugayeva, and Madina Tleubergenova
46.1 Introduction
46.2 Preliminaries
46.3 Results on Unpredictable Solutions
46.4 Example with a Numerical Simulation
46.5 Conclusion
References
47 Classification of Naval Ships with Deep Learning
Onurhan Çelik and Aydın Çetin
47.1 Introduction
47.2 Related Works
47.3 Dataset
47.4 Classification
47.5 Conclusions
References
48 Investigation of Mass-Spring Systems Subject to Generalized Piecewise Constant Forces
Marat Akhmet, Duygu Aruğaslan Çinçin, Zekeriya Özkan, and Madina Tleubergenova
48.1 Introduction and Preliminaries
48.2 Dynamics of Mass-Spring Systems Subject to Generalized Piecewise Constant Forces
48.2.1 Undamped Spring-Mass System
48.2.2 Damped Spring-Mass System
48.3 Conclusion
References
49 Classification of High Resolution Melting Curves Using Recurrence Quantification Analysis and Data Mining Algorithms
Fatma Ozge Ozkok and Mete Celik
49.1 Introduction
49.2 Materials and Methods
49.2.1 Dataset
49.2.2 Methods
49.2.3 Proposed Method
49.3 Experimental Results
49.4 Conclusions
References
50 Machine Learning Based Cigarette Butt Detection Using YOLO Framework
Hasan Ender Yazici and Taner Danişman
50.1 Introduction
50.2 Related Works
50.3 Method and Material
50.3.1 Deep Learning
50.3.2 Convolutional Neural Network (CNN)
50.3.3 You Only Look Once (YOLO)
50.3.4 Dataset
50.4 Results and Discussion
50.5 Conclusions and Future Work
References
51 Securing and Processing Biometric Data with Homomorphic Encryption for Cloud Computing
Abdulrahim Mohamed Ibrahim and Alper Ozpinar
51.1 Introduction
51.2 Related Works
51.3 Method and Material
51.3.1 Biometric Identification
51.3.2 Homomorphic Encryption
51.3.3 Overview of SEAL and TENSEAL
51.4 Proposed Methodology and Algorithm
51.4.1 Experimental Dataset
51.5 Discussion and Results
51.6 Conclusions and Future Work
References
52 Automatic Transferring Data from the Signed Attendance Papers to the Digital Spreadsheets
Sefa Çetinkol, Ali Şentürk, and Yusuf Sönmez
52.1 Introduction
52.2 Computer Vision Methods
52.2.1 Canny Edge Detection
52.2.2 Morphological Transformations
52.2.3 Shape Skeleton
52.3 Convolutional Neural Networks
52.3.1 Convolution Layer
52.3.2 Rectified Linear Unit (ReLU) Layer
52.3.3 Max-Pooling Layer
52.3.4 Fully Connected Layer
52.4 Proposed Methods
52.5 Results
52.6 Conclusion
References
53 Boarding Pattern Classification with Time Series Clustering
Kamer Özgün, Baris Doruk Başaran, Melih Günay, and Joseph Ledet
53.1 Introduction
53.2 Methodology
53.3 Results and Discussion
53.4 Conclusion
References
54 Shipment Consolidation Practice Using Matlog and Large-Scale Data Sets
Michael G. Kay, Kenan Karagul, Yusuf Sahin, and Erdal Aydemir
54.1 Introduction
54.2 Background
54.3 Shipment Consolidation Problem
54.3.1 TL Transport Charge
54.3.2 Total Logistics Cost
54.4 Methodology
54.5 Computational Experiments
54.5.1 The Second Phase: Determination of Consolidated Shipments and the Shipment Routes
54.5.2 Computational Experiments with Large-Scale Data Sets
54.6 Conclusion
54.7 Appendix 1 Some of the Solution Graphics for Aegean Town Data Sets
54.8 Appendix 2 Some of the Solution Graphics for Turkey Data Sets
References
55 The Imminent but Slow Revolution of Artificial Intelligence in Soft Sciences: Focus on Management Science
Samia Chehbi Gamoura, Halil İbrahim Koruca, and Ceren Arslan Kazan
55.1 Introduction
55.2 Background and Related Works
55.2.1 Artificial Intelligence
55.2.2 Soft Sciences
55.3 Problem Position and Research Gap
55.4 Proposed Approach and Methodology
55.4.1 Investigating the Contrasted Investments on AI Research in Management Science: Visual Analysis
55.4.2 Investigating AI Research in the Sub-fields of Management Science: Quantitative Analysis
55.4.3 Investigating AI Impacts on Management Research: Qualitative Analysis
55.5 Discussion and Outcomes
55.6 Conclusion, Limitations, and Perspectives
References
56 Multi-criteria Decision-Making for Supplier Selection Using Performance Metrics and AHP Software. A Literature Review
Elisa Marlen Torres-Sanchez, Jania Astrid Saucedo-Martinez, Jose Antonio Marmolejo-Saucedo, and Roman Rodriguez-Aguilar
56.1 Introduction
56.2 Methodology
56.2.1 Literature Review
56.3 Results
56.3.1 Criteria for Selecting a Supplier
56.3.2 Tools for the Selecting Process
56.4 Conclusions
References
57 PID Controller and Intelligent Control for Renewable Energy Systems
Pedro Domínguez Alva, Jose Antonio Marmolejo Saucedo, and Roman Rodriguez-Aguilar
57.1 Introduction
57.2 Brief History of PID Controllers and Their Operation
57.3 Application of the PID Controller
57.4 Importance of Renewable Energy Systems
57.4.1 Methods of Obtaining Renewable Energies
57.5 How Can We Link PID Controllers with Renewable Energy Systems? and How Would This Benefit Us?
57.5.1 Solar Energy
57.5.2 Hydroelectric Power
57.5.3 Wind Energy
57.6 Conclusion and Suggestions
References
58 Machine Learning Applications in the Supply Chain, a Literature Review
Walter Rosenberg-Vitorica, Tomas Eloy Salais-Fierro, Jose Antonio Marmolejo-Saucedo, and Roman Rodriguez-Aguilar
58.1 Introduction
58.2 Background
58.3 Methodology
58.4 Review Findings
58.5 Conclusions
References
59 Machine Learning Applications for Demand Driven in Supply Chain: Literature Review
Eric Octavio Mayoral Garinian, Tomas Eloy Salais Fierro, José Antonio Marmolejo Saucedo, and Roman Rodriguez Aguilar
59.1 Introduction
59.2 Literature Review (LR)
59.2.1 Problem
59.2.2 Demand Driven and Its Role in the Supply Chain and Operations Management
59.2.3 Machine Learning (ML), Tools, Techniques, and Technologies
59.2.4 Implementation and Results of Machine Learning Cases and Applications
59.3 Methodology for the Literature Review
59.3.1 LR Searching Phase 1
59.3.2 LR Selecting Phase 2
59.3.3 LR Analyzing Phase 3
59.4 Conclusions and Future Research
59.4.1 Declaration of Competing Interests
References
60 Dynamic Data-Driven Failure Mode Effects Analysis (FMEA) and Fault Prediction with Real-Time Condition Monitoring in Manufacturing 4.0
Ceren Arslan Kazan, Halil İbrahim Koruca, and Samia Chehbi Gamoura
60.1 Introduction
60.2 Related Work
60.3 Method and Material
60.3.1 Failure Mode Effects Analysis (FMEA) Method
60.3.2 Programmable Logic Controller (PLC)
60.3.3 Kitchen Equipments Manufacturing Company
60.3.4 System Integration: PLC, ERP, C# with WinProLadder
60.4 Discussion and Results
60.5 Conclusions and Future Work
References
About the Conference
Web: http://www.icaiame.com e-mail: [email protected]
ICAIAME 2021 meets the latest requirements of the Academic Promoting Programme applied in Turkey. The 3rd International Conference on Artificial Intelligence and Applied Mathematics in Engineering (ICAIAME 2021) will be held on 1–3 October 2021 in Belek, Antalya (Turkey), the pearl of the Mediterranean, a heavenly corner of Turkey, and the fourth most visited city in the world. The main theme of the conference, which will be held at the Innvista Hotel over three days with international participation, is solutions of Artificial Intelligence and Applied Mathematics in engineering applications. The languages of ICAIAME 2021 are English and Turkish. As in 2019 and 2020, the English full-text studies accepted and presented at the conference are planned to be published in the Springer Lecture Notes on Data Engineering and Communications Technologies series.
Scope/Topics
Conference scope/topics (including but not limited to) in engineering problems:
• Machine Learning Applications
• Deep Learning Applications
• Intelligent Optimization Solutions
• Robotics/Softrobotics and Control Applications
• Hybrid System-Based Solutions
• Algorithm Design for Intelligent Solutions
• Image/Signal Processing Supported Intelligent Solutions
• Data Processing-Oriented Intelligent Solutions
• Prediction and Diagnosis Applications
• Linear Algebra and Applications
• Numerical Analysis
• Differential Equations and Applications
• Probability and Statistics
• Operations Research and Optimization
• Discrete Mathematics and Control
• Nonlinear Dynamical Systems and Chaos
• General Engineering Applications
• General Topology
• Number Theory
• Algebra
• Analysis
• Applied Mathematics and Approximation Theory
• Mathematical Modelling and Optimization
• Graph Theory
• Kinematics
Conference Posters
Chapter 1
Implementation of Basic Math Processing Skills with Neural Arithmetic Expressions in One and Two Stage Numbers
Remzi Gürfidan, Mevlüt Ersoy, D. Jude Hemanth, and Elmira Israfilova
R. Gürfidan (✉)
Department of Computer Technologies, Isparta University of Applied Sciences, Isparta, Turkey
e-mail: [email protected]

M. Ersoy
Department of Computer Engineering, Süleyman Demirel University, Isparta, Turkey
e-mail: [email protected]

D. Jude Hemanth
Karunya Institute of Technology and Sciences, Coimbatore, India
e-mail: [email protected]

E. Israfilova
Department of Information Technologies, Mingachevir State University, Mingachevir, Azerbaijan
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_1

1.1 Introduction

Studies on artificial intelligence (AI) technologies started in the 1950s and have continued to the present day, and AI is now a very popular field of study. The basis of artificial intelligence lies in imitating the human brain and giving machines the ability to learn and make decisions. With the Turing Test he developed, Alan Turing studied whether machines and computers can have the ability to think. The aim of these studies is to establish whether it is logically possible to say that a machine can think [1]. Artificial intelligence technology has opened the way for machines to imitate human intelligence, even if only partially. As the concept deepens, artificial intelligence can be divided into narrow artificial intelligence and general artificial intelligence. Narrow artificial intelligence can be described as a model trained on a specific subject to perform better than, or close to, a human [2]. Narrow AI has no reasoning or understanding in the background. It establishes a cause-effect relationship by focusing only on certain features. For this reason, narrow artificial intelligence has many disadvantages. The two most important are that a large amount of data is needed for correct training, and that a model developed for one problem cannot be used in a different problem area. General artificial intelligence, on the other hand, refers to models that can establish cause-effect relationships in a way much more similar to human intelligence and that have logic and comprehension abilities across different problems. The transition from narrow to general artificial intelligence applications is accelerating thanks to study areas such as natural language processing, computer vision, and neural arithmetic expressions.

In this study, the calculation of neural arithmetic expressions with a deep learning algorithm is investigated. The target operations are restricted to addition, subtraction, multiplication, division, and exponentiation, and the values studied are one-digit and two-digit numbers. Addition, subtraction, multiplication, division, and exponentiation were performed on one-digit numbers. Addition, subtraction, multiplication, and division were performed on two-digit numbers. The aim of the study is to have an artificial intelligence model solve the mathematical operations that form the basis of more advanced mathematical operations.
1.1.1 Neural Arithmetic Expressions and Logic Units

The nonlinear activation functions of the hidden layers in the artificial neural network (ANN) architecture are the main reason why the estimation of numerical operation results fails. This is because activation functions introduce an abstract, non-linear link when the network explores the relationship between model inputs and labels. In studies carried out with this method, learning and estimating the numerical values in the training set does not go beyond memorization. To overcome this difficulty, instead of soft activation functions such as PReLU and ELU, the use of hard and precise-valued activation functions such as Tanh or Sigmoid will yield more successful results [3]. The development of the neural accumulator (NAC) formed the basis of the neural arithmetic logic unit (NALU) model. NAC is a simple but effective neural network model that can learn addition and subtraction, that is, linear functions. The block diagram of the network structure of the subtraction process is shown in Fig. 1.1. When the W1 weight is set to 1, the W2 weight to −1, and the W3 weight to 0, the difference between the X1 and X2 input values is obtained. The block diagram of the network structure of the aggregation (addition) operation is shown in Fig. 1.2. When the W1, W2, and W3 coefficients are all set to 1, the sum of the input values is obtained. As these ANN layers show, the network can learn to estimate simple arithmetic functions such as addition and subtraction by limiting the weight parameters to −1, 0, and 1 [3].
Fig. 1.1 Block diagram of the network structure of the subtraction process
Fig. 1.2 Block diagram of the network structure of the aggregation operation
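The weight settings described for Figs. 1.1 and 1.2 can be checked with a few lines of NumPy. The following is a minimal sketch under stated assumptions, not the authors' implementation: the input values are illustrative, the function names are hypothetical, and the tanh/sigmoid parameterization of the weights follows the NAC formulation in [3] rather than anything given in this chapter.

```python
import numpy as np


def nac_weights(w_hat, m_hat):
    """Effective NAC weights W = tanh(w_hat) * sigmoid(m_hat), as in [3].

    Saturated parameters push each weight toward -1, 0, or 1.
    """
    return np.tanh(w_hat) * (1.0 / (1.0 + np.exp(-m_hat)))


def nac_forward(x, weights):
    """Accumulate the inputs linearly: no bias and no activation function."""
    return float(np.dot(weights, x))


# Illustrative inputs X1, X2, X3 (assumed values).
x = np.array([7.0, 3.0, 5.0])

# W1 = 1, W2 = -1, W3 = 0 yields the difference X1 - X2.
difference = nac_forward(x, np.array([1.0, -1.0, 0.0]))

# W1 = W2 = W3 = 1 yields the sum of all inputs.
total = nac_forward(x, np.array([1.0, 1.0, 1.0]))

# Strongly saturated parameters reproduce the subtraction weights approximately.
w_learned = nac_weights(np.array([10.0, -10.0, 10.0]),
                        np.array([10.0, 10.0, -10.0]))

print(difference, total, np.round(w_learned, 3))
```

Because the layer has no bias and no nonlinearity on its output, the same weights keep producing exact sums and differences for inputs outside the range seen in training, which is the motivation given in [3].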
1.1.2 Long Short-Term Memory Algorithm

The architecture of the LSTM algorithm contains three gates (input, forget, and output), a block input, a single cell, an output activation function, and peephole connections. The output of the block is recurrently connected back to the block input and to all of its gates. Although LSTM architectures give successful results in speech and text processing, they are generally preferred for classification processes [4, 5]. The block diagram of the LSTM algorithm is shown in Fig. 1.3. The formulas for calculating the input gate, forget gate, output gate, new memory cell, and last memory cell of the LSTM algorithm are given in Eqs. 1.1–1.5 [6, 7], and the loss value is given in Eq. 1.6. Here x_t is the input at step t, h_{t−1} is the previous output, W and V are weight matrices, b are bias vectors, and σ is the sigmoid function.

\[ i_t = \sigma\left(W_i x_t + V_i h_{t-1} + b_i\right) \quad \text{(Input Gate)} \tag{1.1} \]

\[ f_t = \sigma\left(W_f x_t + V_f h_{t-1} + b_f\right) \quad \text{(Forget Gate)} \tag{1.2} \]

\[ o_t = \sigma\left(W_o x_t + V_o h_{t-1} + b_o\right) \quad \text{(Output Gate)} \tag{1.3} \]

\[ \tilde{c}_t = \tanh\left(W_c x_t + V_c h_{t-1} + b_c\right) \quad \text{(New Memory Cell)} \tag{1.4} \]

\[ c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c x_t + V_c h_{t-1} + b_c\right) \quad \text{(Last Memory Cell)} \tag{1.5} \]

\[ h_t = o_t \odot \tanh\left(c_t\right) \]

\[ \text{Loss} = \frac{1}{n} \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 \quad \text{(Loss value)} \tag{1.6} \]

Fig. 1.3 LSTM network structure [6]
The LSTM algorithm is used in this study because it can remember earlier data in the data set and has forget gates. The main reason behind our preference is that the next input between nodes depends on the previous output. For addition, subtraction, division, multiplication, and exponentiation, there is a mathematical interaction between two numbers. The forget gates eliminate the negative effects on learning by discarding incorrect links. We expect this feature to increase the success of the developed model.
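To make the roles of the gates concrete, the following is a minimal NumPy sketch of a single LSTM time step in the spirit of Eqs. 1.1–1.5, together with the mean squared error loss of Eq. 1.6. It assumes the standard LSTM form in which each V matrix multiplies the previous output h_{t−1}, and it omits peephole connections. The dimensions, the random weights, and the names `lstm_step`, `mse_loss`, and `params` are illustrative assumptions, not the authors' code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps names such as 'W_i' to weight arrays."""
    i_t = sigmoid(p["W_i"] @ x_t + p["V_i"] @ h_prev + p["b_i"])      # input gate
    f_t = sigmoid(p["W_f"] @ x_t + p["V_f"] @ h_prev + p["b_f"])      # forget gate
    o_t = sigmoid(p["W_o"] @ x_t + p["V_o"] @ h_prev + p["b_o"])      # output gate
    c_tilde = np.tanh(p["W_c"] @ x_t + p["V_c"] @ h_prev + p["b_c"])  # new memory cell
    c_t = f_t * c_prev + i_t * c_tilde                                # last memory cell
    h_t = o_t * np.tanh(c_t)                                          # block output
    return h_t, c_t


def mse_loss(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))


# Illustrative sizes and random parameters (assumed, not from the chapter).
rng = np.random.default_rng(0)
n_in, n_hidden = 4, 3
params = {}
for gate in "ifoc":
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
    params[f"V_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    params[f"b_{gate}"] = np.zeros(n_hidden)

# Run five time steps; the output of each step feeds the next one.
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x_t, h, c, params)

print(h.shape, mse_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

The forget gate `f_t` scales the previous cell state, so values near zero discard earlier information, which is the behaviour the paragraph above relies on.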
1.2 Related Work

An examination of similar studies to date shows that calculating complex expressions involving addition, subtraction, multiplication, division, and bracketing is still a difficult problem. In one study, arithmetic expression calculation was treated as a hierarchical reinforcement learning problem. Specifically, success was improved by using the Multi-Level Hierarchical Reinforcement Learning (MHRL) framework to decompose a complex arithmetic operation into a few simple operations [8].
Another study starts from the observation that neural networks lack an inductive bias for arithmetic operations, which leaves them without the basic logic needed to make predictions on tasks such as addition, subtraction, and multiplication. That study proposes two new neural network components: the Neural Addition Unit (NAU), which can learn exact addition and subtraction, and the Neural Multiplication Unit (NMU), which can multiply subsets of a vector. Careful initialization, restricting the parameter space, and regularizing for sparsity were found to be important when optimizing the NAU and NMU. Compared to previous neural units, the proposed NAU and NMU converge more consistently, have fewer parameters, learn faster, converge for larger hidden dimensions, and obtain sparse and meaningful weights [9]. Zaremba et al. preferred reinforcement learning to teach a model to multiply single-digit numbers and to add multi-digit numbers [10]. Saxton et al. evaluate mathematical problem solving on sequential questions and answers in a free-form textual input–output format. They presented a comprehensive analysis of models including the LSTM algorithm and the associative memory core, also known as the associative recurrent neural network. The study concludes that the recurrent associative neural network needs to be modified in order to match the success of the LSTM algorithm [11]. The Neural GPU shows very successful results in arithmetic learning [12]. Price et al. [13] and Freivalds and Liepins [14] improved the Neural GPU to perform multi-digit multiplication with curriculum learning. However, no successful attempt to learn division or expression calculation has yet been made [8].
1.3 Proposed Method

The aim of the study is for a trained artificial intelligence model to successfully predict the results of basic mathematical operations. This was attempted with neural arithmetic expressions. The target mathematical operations are restricted to addition, subtraction, multiplication, division, and exponentiation, and the data samples are one-digit and two-digit numbers. The data set used for LSTM training is generated on the fly and randomly. The general template of the code prepared for this process is shown in Table 1.1, and the flowchart of the developed model is shown in Fig. 1.4. Before training starts, the number of digits to be processed must be chosen. If single-digit numbers are selected, 1000 training samples are produced as shown in Table 1.1 and training is repeated 100 times. If two-digit numbers are selected, 50,000 training samples are produced according to Table 1.1 and training is repeated 80 times.
Table 1.1 Generating data for LSTM model training

    Inputs = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

    function CreateDataset(num_examples):
        # one row per example: (time steps, features)
        x_train = np.zeros((num_examples, max_time_steps, num_features))
        y_train = np.zeros((num_examples, max_time_steps, num_features))
        for i in range(num_examples):
            e, l = Generate_data()            # expression and its label
            x, y = vectorize_example(e, l)    # encode both as arrays
            x_train[i] = x
            y_train[i] = y
        return x_train, y_train

    function Generate_data():
        first_num = np.random.randint(min_val, max_val)
        second_num = np.random.randint(min_val, max_val)
        operation_type = np.random.randint(min_val, max_val)
        # operation_type selects one of the operators
        if (operation_type):
            return result = first_num {+, -, *, /, ^} second_num
Fig. 1.4 Proposed model algorithm
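The template in Table 1.1 leaves the encoding step open. The sketch below is one possible runnable version under stated assumptions: the character-level one-hot encoding, the vocabulary `VOCAB`, the sequence length `MAX_TIME_STEPS`, the rounding of division results to two decimals, and the replacement of a zero divisor by 1 are all illustrative choices that the chapter does not specify.

```python
import numpy as np

# Assumed character vocabulary; the trailing space is used for padding.
VOCAB = "0123456789+-*/^. "
CHAR_TO_IDX = {ch: i for i, ch in enumerate(VOCAB)}
MAX_TIME_STEPS = 10  # long enough for "9^9" -> "387420489"

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: round(a / b, 2),
    "^": lambda a, b: a ** b,
}


def generate_data(rng, min_val, max_val, ops):
    """Return one random (expression, label) pair as strings."""
    op = ops[rng.integers(len(ops))]
    a = int(rng.integers(min_val, max_val + 1))
    b = int(rng.integers(min_val, max_val + 1))
    if op == "/" and b == 0:
        b = 1  # assumption: avoid division by zero
    return f"{a}{op}{b}", str(OPS[op](a, b))


def one_hot(text):
    """Encode a string as a (MAX_TIME_STEPS, len(VOCAB)) one-hot matrix."""
    encoded = np.zeros((MAX_TIME_STEPS, len(VOCAB)))
    for t, ch in enumerate(text.ljust(MAX_TIME_STEPS)):
        encoded[t, CHAR_TO_IDX[ch]] = 1.0
    return encoded


def vectorize_example(expression, label):
    return one_hot(expression), one_hot(label)


def create_dataset(num_examples, min_val=0, max_val=9, ops="+-*/^", seed=0):
    rng = np.random.default_rng(seed)
    x_train = np.zeros((num_examples, MAX_TIME_STEPS, len(VOCAB)))
    y_train = np.zeros_like(x_train)
    for i in range(num_examples):
        e, l = generate_data(rng, min_val, max_val, ops)
        x_train[i], y_train[i] = vectorize_example(e, l)
    return x_train, y_train


# One-digit setting: 1000 samples, five operations.
x_train, y_train = create_dataset(1000)
# Two-digit setting (smaller sample here): four operations, no exponentiation.
x_two, y_two = create_dataset(200, min_val=9, max_val=99, ops="+-*/")
print(x_train.shape, y_train.shape, x_two.shape)
```

Exponentiation is left out of the two-digit call because its results would not fit in the assumed sequence length, which mirrors the chapter's restriction of two-digit numbers to four operations.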
1.4 Experimental Findings

In this section, the learning samples were first restricted to the range 0–9 and five different types of operations were performed. Because the number of data samples is small, the result space to be reached is limited. This allowed the model to learn quickly and to reach a very high accuracy. The data set required to learn one-digit operations was created from 1000 rows and trained for 100 repetitions. The success rate of the operations performed on one-digit numbers was calculated as 99.9%. The loss values of the mathematical operations performed on one-digit numbers as a function of the number of training repetitions are shown in Fig. 1.5, and the corresponding accuracy values are shown in Fig. 1.6.

Fig. 1.5 One-digit numbers loss values graphs

Next, the learning samples were limited to 9–99 and four different types of operations were performed. The data set required to learn two-digit operations was created from 50,000 rows and trained for 80 repetitions. The success rate in operations performed with two-digit numbers was calculated as 95.6%. The larger number of data samples compared to the one-digit case expanded the result space to be reached, which decreased the learning speed and the accuracy of the model. The loss values of the mathematical operations performed on two-digit numbers as a function of the number of training repetitions are shown in Fig. 1.7, and the corresponding accuracy values are shown in Fig. 1.8.

The constraints on the learning parameter range are very important. The graphs show that as the parameter range grows, learning becomes less stable and its success decreases. A similar problem is reported by Madsen and Johansen [9], as discussed in the related work. For this reason, the model was trained with the parameter ranges limited according to the number of digits in the mathematical operations. Table 1.2 shows test results for randomly selected mathematical operations on randomly chosen numbers.
Fig. 1.6 One-digit numbers accuracy values graphs
Fig. 1.7 Two-digit numbers loss values graphs
Fig. 1.8 Two-digit numbers accuracy values graphs

Table 1.2 Some sample operations and results from the test results

Test row   Input     Output   Prediction
1          24/87     .28      .28
2          49 * 78   3822     3822
3          44 − 51   −7       −7
4          95/86     1.1      1.1
5          42 − 82   −40      −40
6          25/60     .42      .42
7          70 + 65   135      135
8          52 + 81   133      133
9          18 − 92   −74      −74
10         66 + 32   98       98
11         51/63     .81      .81
12         72/62     1.16     1.16
13         65/9      7.22     7.22
14         36/18     2        2
15         49 * 84   4116     4116
16         89 + 34   123      123
17         40/15     2.67     22
18         57 − 14   43       43
19         36/7      5.14     5.14
20         55/4      13.75    13.75
21         45 − 8    37       37
1.5 Conclusions

In this study, experimental results were obtained by using neural arithmetic expressions to teach the developed model addition, subtraction, multiplication, division, and exponentiation on one-digit numbers, and addition, subtraction, multiplication, and division on two-digit numbers. The data set required for learning is generated on the fly and randomly. The data set for one-digit operations was created from 1000 samples and trained for 100 repetitions. The data set for two-digit operations was created from 50,000 samples and trained for 80 repetitions. A success rate of 99.9% was achieved in operations performed with one-digit numbers, and 95.6% in operations performed with two-digit numbers. During the study, the desired success could not be achieved for exponentiation on two-digit numbers or for basic mathematical operations on three-digit numbers. In the next study, the aim is to train the model with different algorithms so that it can perform exponentiation on two-digit numbers and basic operations on three-digit numbers.
References

1. Moor J (ed) (2003) The Turing test: the elusive standard of artificial intelligence, vol 30. Springer Science & Business Media
2. Pennachin C, Goertzel B (2007) Contemporary approaches to artificial general intelligence. In: Artificial general intelligence. Springer, Berlin, Heidelberg, pp 1–30
3. Trask A, Hill F, Reed S, Rae J, Dyer C, Blunsom P (2018) Neural arithmetic logic units. arXiv preprint arXiv:1808.00508
4. Zhou C, Sun C, Liu Z, Lau F (2015) A C-LSTM neural network for text classification. arXiv preprint arXiv:1511.08630
5. Graves A, Schmidhuber J (2005) Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw 18(5–6):602–610
6. Yuan X, Li L, Wang Y (2019) Nonlinear dynamic soft sensor modeling with supervised long short-term memory network. IEEE Trans Industr Inf 16(5):3168–3176
7. Tong W, Li L, Zhou X, Hamilton A, Zhang K (2019) Deep learning PM 2.5 concentrations with bidirectional LSTM RNN. Air Qual Atmos Health 12(4):411–423
8. Chen K, Dong Y, Qiu X, Chen Z (2018) Neural arithmetic expression calculator. arXiv preprint arXiv:1809.08590
9. Madsen A, Johansen AR (2020) Neural arithmetic units. arXiv preprint arXiv:2001.05016
10. Zaremba W, Mikolov T, Joulin A, Fergus R (2016) Learning simple algorithms from examples. In: ICML
11. Saxton D, Grefenstette E, Hill F, Kohli P (2019) Analysing mathematical reasoning abilities of neural models. arXiv preprint arXiv:1904.01557
12. Kaiser Ł, Sutskever I (2015) Neural GPUs learn algorithms. arXiv preprint arXiv:1511.08228
13. Price E, Zaremba W, Sutskever I (2016) Extensions and limitations of the neural GPU. CoRR, abs/1611.00736
14. Freivalds K, Liepins R (2017) Improving the neural GPU architecture for algorithm learning. arXiv preprint arXiv:1702.08727
Chapter 2
An Example Application for Early Diagnosis of Retinal Diseases Using Deep Learning Methods
Bekir Aksoy, Fatmanur Ateş, Osamah Khaled Musleh Salman, Hamit Armağan, Emre Soyaltin, and Ender Özcan
B. Aksoy (✉) · O. K. M. Salman · E. Soyaltin
Faculty of Technology, Mechatronics Engineering, Isparta University of Applied Sciences, 32200 Isparta, Turkey
e-mail: [email protected]

O. K. M. Salman
e-mail: [email protected]

F. Ateş
Faculty of Technology, Electrical and Electronics Engineering, Isparta University of Applied Sciences, 32200 Isparta, Turkey
e-mail: [email protected]

H. Armağan
Süleyman Demirel University, Süleyman Demirel University Rectorate, 32200 Isparta, Turkey
e-mail: [email protected]

E. Özcan
Computer Science and Operational Research with the COL Lab, University of Nottingham, Nottingham, UK
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_2

2.1 Introduction

The retina is a layer of the eye that contains light- and color-sensitive visual cells and nerve fibers. It allows light to be transmitted to the brain as an image, and thus allows objects to be seen. One of the essential areas of the retina is the macula, which helps the eye see in detail. The macula can be damaged by diseases such as age-related macular degeneration (AMD) or diabetic macular edema (DME) [1, 2]. AMD is divided into two subgroups, dry type and wet type. Dry-type AMD is caused by the accumulation of extracellular fluid under the retinal pigment epithelium and the increase of these deposits, called drusen, over time. Wet AMD is caused by the growth of abnormal, blood-leaking vessels under the retina and by new vessel formation, namely choroidal neovascularization (CNV) [2, 3]. Diabetic macular edema (DME) results from the accumulation of extracellular fluids that disrupt the structure of the retinal layer through increased permeability of the blood vessels, and it is the main cause of decreased vision in people with diabetes [2, 4]. Since vision loss occurs in the advanced stages of these diseases, early diagnosis and initiation of treatment are very important for preventing it [1].

Optical coherence tomography (OCT) is one of the essential imaging methods used in the early diagnosis of diseases of the macula and is widely used around the world. It was first developed by Huang et al. in 1991 [5, 6]. OCT makes it possible to measure the thickness of the retinal nerve fiber layer and to obtain high-resolution tomographic cross-section images [7]. It is widely used in the diagnosis of retinal diseases such as AMD and DME [1, 8]. Technological innovations are used in imaging the retina, and different methods are used to diagnose retinal diseases from the images. Among these, diagnosis by applying artificial intelligence methods to OCT images has gained popularity. In addition to methods that require preprocessing and feature extraction, such as support vector machines [9, 10] and the random forest algorithm [9, 11], deep learning methods are also used [12].

In this study, the aim is to diagnose these diseases of the macula from normal retinal images and retinal images of drusen, DME, and CNV taken from the open-access website Kaggle, using the deep learning architectures ResNet-152, HitNet, Efficient-B0, and Efficient-B7 [13, 14]. In addition, the data set was applied to an ensemble (community) model created under the name RDV-Net as an alternative to these architectures. RDV-Net was first tested on the MNIST and CIFAR-10 datasets and then applied to the dataset of retinal images. The remainder of the article is organized as follows. The second section gives the materials and methods used in diagnosing disease from retinal images. The third section presents the research findings. The fourth section contains the discussion. The fifth section gives the results.
2.2 Material and Method

2.2.1 Material

The materials used in the study are described under the following headings.
Fig. 2.1 OCT retinal images a DME b CNV c drusen d normal
2.2.1.1 Dataset

In the study, retinal OCT images taken from an open-source website were used as the dataset [13, 14]. The images were obtained from retrospective cohorts of adult patients between July 1, 2013, and March 1, 2017, at different centers such as the Shiley Eye Institute of the University of California San Diego and the California Retinal Research Foundation. The data set consists of four classes (CNV, DME, drusen, and normal retinal images) and is divided into three parts: training, prediction, and testing. The training data consists of 37,205 CNV, 11,348 DME, 8616 drusen, and 26,315 normal retinal images. The prediction data consists of 8 CNV, 8 DME, 8 drusen, and 8 normal retinal images. The test data consists of 242 CNV, 242 DME, 242 drusen, and 242 normal retinal images. An example retinal image of each class from the data set is given in Fig. 2.1.
2.2.1.2 Deep Learning Architectures
Deep learning architectures ResNet-152, HitNet, Efficient-B0, Efficient-B7, and RDV-Net were used in the study.
ResNet (Residual Nets)

ResNet is a deep learning architecture proposed by He et al. in 2015. The model was used to classify the ImageNet dataset and achieved a test error rate as low as 3.57%. This result earned first place in the ILSVRC 2015 and COCO 2015 competitions [15]. An example of the connection used in the ResNet architecture is given in Fig. 2.2.
HitNet

HitNet is a model built from capsule networks by Deliege et al. in 2018 [16]. Capsule networks (CapsNet) are a deep learning model presented as an alternative for solving the object detection problems of CNN architectures [17]. The HitNet architecture is based on CapsNet and includes a Hit-or-Miss (HoM) layer built on CapsNet using feature maps [16]. Each vector in this layer corresponds to a single class, and the HoM capsule vector closest to the target gives the predicted class [16]. The basic structure of the HitNet architecture developed by Deliege et al. is shown in Fig. 2.3.
Fig. 2.2 Example part of ResNet connection structure
Fig. 2.3 HitNet architecture
Fig. 2.4 The scaling structure proposed by Tan and Le a basic b width scaling c depth scaling d resolution scaling e composite scaling
EfficientNet

The EfficientNet architecture, proposed by Tan and Le, is a deep learning architecture put forward with the idea that scaling is essential when increasing the number of layers. Instead of the manual scaling used to increase model performance in deep learning architectures such as ResNet, this architecture proposes a new method that scales the depth, width, and resolution dimensions uniformly. The mathematical expressions for the depth, width, and resolution of the EfficientNet deep learning model are given in Eq. 2.1 [18].

\[
\begin{aligned}
\text{Depth: } & d = \alpha^{\phi} \\
\text{Width: } & \omega = \beta^{\phi} \\
\text{Resolution: } & r = \gamma^{\phi} \\
& \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2 \\
& \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
\end{aligned}
\tag{2.1}
\]

Here, α, β, and γ are constants and φ is a user-defined coefficient. The EfficientNet scaling structure is given in Fig. 2.4 [18].
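As a worked illustration of Eq. 2.1, the short sketch below computes the depth, width, and resolution multipliers for a few values of φ. The constants α = 1.2, β = 1.1, and γ = 1.15 are the values reported by Tan and Le for the EfficientNet baseline [18]. They are not stated in this chapter and are used here only as an assumption, and the function name is hypothetical.

```python
# Constants reported by Tan and Le [18]; assumed here, not given in the chapter.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Return the (depth, width, resolution) multipliers for coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi


# Constraint of Eq. 2.1: alpha * beta^2 * gamma^2 should be close to 2,
# so the computational cost roughly doubles each time phi increases by one.
cost_factor = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.3f}, width x{w:.3f}, resolution x{r:.3f}")

print(f"alpha * beta^2 * gamma^2 = {cost_factor:.5f}")
```

With these constants the product is about 1.92, which satisfies the approximate constraint in Eq. 2.1.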
DenseNet

The DenseNet architecture was developed by Huang et al. in order to use the feature information in the image more effectively. While traditional network structures have L connections for L layers, the DenseNet network structure has L(L + 1)/2 connections for L layers. Here, each layer is feed-forward connected to the other layers. Thanks to this structure, each layer has information from the preceding layers. The DenseNet model can have 121, 169, 201, or 264 layers. The connection structure of the DenseNet model is shown in Fig. 2.5 [19].

Fig. 2.5 DenseNet architecture structure [19]
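The connection count L(L + 1)/2 can be verified by enumerating the links directly. In the sketch below, which is only an illustration with a hypothetical function name, index 0 stands for the input of a dense block and indices 1 to L for its layers, and every layer receives a connection from all earlier indices.

```python
def dense_connections(num_layers):
    """List all (source, target) links in a densely connected block.

    Index 0 is the block input; indices 1..num_layers are the layers.
    Each layer is fed by the input and by every preceding layer.
    """
    return [(src, dst)
            for dst in range(1, num_layers + 1)
            for src in range(dst)]


L = 5
links = dense_connections(L)

# A traditional chain has one incoming link per layer, so L links in total.
chain_links = [(dst - 1, dst) for dst in range(1, L + 1)]

print(len(chain_links), len(links), L * (L + 1) // 2)
```

For L = 5 this gives 5 links for the plain chain and 15 for the dense block.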
RDV-Net

In this study, an ensemble (community) model named RDV-Net was created. In the model, the dataset is divided into sub-datasets. The first sub-dataset was trained with the ResNet-152 model, the second with DenseNet-121, and the third with the VGG-19 deep learning architecture. When test data are to be classified with this ensemble model, each image is assigned the class chosen by the majority of the three trained models. The RDV-Net structure is given in Fig. 2.6.
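The majority decision of the three models can be sketched as follows. This is a minimal illustration, not the authors' code: the predicted labels are made up, and the rule for the case in which all three models disagree (falling back to the first model) is an assumption, since the chapter does not say how such ties are resolved.

```python
import numpy as np

CLASSES = ["CNV", "DME", "Drusen", "Normal"]


def majority_vote(predictions):
    """Combine class-index predictions of shape (n_models, n_samples)."""
    predictions = np.asarray(predictions)
    voted = []
    for sample_votes in predictions.T:
        counts = np.bincount(sample_votes, minlength=len(CLASSES))
        winner = int(np.argmax(counts))
        if counts[winner] == 1:
            # Assumption: if every model votes differently, trust the first model.
            winner = int(sample_votes[0])
        voted.append(winner)
    return np.array(voted)


# Hypothetical predictions of the three sub-models for five test images.
resnet_152 = [0, 1, 2, 3, 2]
densenet_121 = [0, 1, 0, 3, 1]
vgg_19 = [0, 3, 2, 3, 0]

ensemble = majority_vote([resnet_152, densenet_121, vgg_19])
print([CLASSES[i] for i in ensemble])
```

With three voters and four classes a three-way disagreement is possible, so any implementation of this scheme needs some explicit tie rule.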
Fig. 2.6 RDV-Net architectural structure

2.2.2 Method

The workflow diagram of the study is given in Fig. 2.7. In the first stage, the training, prediction, and test data of the retina were obtained from the open-access website. In the second stage, data augmentation was applied using methods such as image enlargement, reduction, and rotation to eliminate the over-learning (overfitting) problem. In the third stage of the study, the data set was classified by training the ResNet-152, HitNet, Efficient-B0, Efficient-B7, and RDV-Net deep learning architectures. The models obtained were evaluated using the test and prediction data sets.
Fig. 2.7 Classification process flow chart
2.3 Research Findings

The results obtained from the ResNet-152, HitNet, Efficient-B0, Efficient-B7, and RDV-Net deep learning architectures on OCT retinal images are given in Table 2.1. When the table is examined, the prediction (validation) accuracy of every architecture is above 90%. Among the architectures, RDV-Net has the highest test accuracy on the dataset, 99.59%.

Table 2.1 Estimation and test results of the architectures used

Dataset      ResNet-152   HitNet    Efficient-B0   Efficient-B7   RDV-Net
Validation   100%         93.75%    93.75%         93.75%         99.98%
Test         99.17%       94.21%    95.55%         85.53%         99.59%

In the next phase of the study, the confusion matrix results of the ResNet-152, HitNet, Efficient-B0, Efficient-B7, and RDV-Net deep learning architectures are given. The results for ResNet-152 are in Table 2.2, for HitNet in Table 2.3, for Efficient-B0 in Table 2.4, for Efficient-B7 in Table 2.5, and for RDV-Net in Table 2.6. In each table, rows are the original results and columns are the estimated results.

Table 2.2 Test results of ResNet-152 deep learning architecture

Original \ Estimated   CNV   DME   Drusen   Normal
CNV                    242   0     0        0
DME                    2     240   0        0
Drusen                 6     0     236      0
Normal                 0     0     0        242

Table 2.3 Test results of HitNet deep learning architecture

Original \ Estimated   CNV   DME   Drusen   Normal
CNV                    229   4     4        6
DME                    5     226   5        6
Drusen                 12    3     222      4
Normal                 2     2     3        235

Table 2.4 Test results of Efficient-B0 deep learning architecture

Original \ Estimated   CNV   DME   Drusen   Normal
CNV                    238   4     0        0
DME                    1     238   0        3
Drusen                 16    2     210      14
Normal                 0     2     1        239

Table 2.5 Test results of Efficient-B7 deep learning architecture

Original \ Estimated   CNV   DME   Drusen   Normal
CNV                    241   1     0        0
DME                    6     214   2        20
Drusen                 47    0     131      64
Normal                 0     0     0        242

Table 2.6 Test results of RDV-Net deep learning architecture

Original \ Estimated   CNV   DME   Drusen   Normal
CNV                    242   0     0        0
DME                    0     241   0        1
Drusen                 3     0     239      0
Normal                 0     0     0        242

When the tables are examined, the ResNet-152 architecture correctly classified all 242 CNV images. For DME, 240 images were classified correctly and 2 incorrectly. For drusen, 236 images were classified correctly and 6 incorrectly. Finally, all 242 normal retinal images without disease were classified correctly.

With the HitNet architecture, 229 CNV images were classified correctly and 14 incorrectly. For DME, 226 images were classified correctly and 16 incorrectly. For drusen, 222 images were classified correctly and 19 incorrectly. Of the normal retinal images without disease, 235 were classified correctly and 7 incorrectly.

With the deep learning model created using the Efficient-B0 architecture, 238 of the 242 CNV images and 238 of the 242 DME images were classified correctly, and 4 images in each class were classified incorrectly. Of the 242 retinal images with drusen, 210 were classified correctly and 32 were misclassified. Of the normal retinal images, 239 were classified correctly and 3 incorrectly.

In the classification using the Efficient-B7 architecture, the classification accuracies for drusen and DME images were low. Of the CNV images, 241 were classified correctly and 1 incorrectly. Of the DME images, 214 were classified correctly and 28 incorrectly. Only 131 of the 242 drusen images were classified as drusen, and the remaining 111 images were assigned to other classes. All 242 normal images were classified correctly.

Finally, RDV-Net, which was created as an ensemble (community) model, correctly classified all 242 CNV images. For DME, 241 images were classified correctly and 1 incorrectly. For drusen, 239 images were classified correctly and 3 incorrectly. All normal retinal images without disease were classified correctly.
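The test accuracies in Table 2.1 follow from the confusion matrices. As a minimal sketch, the NumPy snippet below recomputes the overall accuracy and the per-class recall from the values of Tables 2.2 and 2.6. The helper names are illustrative and not taken from the study.

```python
import numpy as np

CLASSES = ["CNV", "DME", "Drusen", "Normal"]

# Rows: original class, columns: estimated class (Tables 2.2 and 2.6).
resnet_152 = np.array([[242, 0, 0, 0],
                       [2, 240, 0, 0],
                       [6, 0, 236, 0],
                       [0, 0, 0, 242]])
rdv_net = np.array([[242, 0, 0, 0],
                    [0, 241, 0, 1],
                    [3, 0, 239, 0],
                    [0, 0, 0, 242]])


def accuracy(cm):
    """Share of all test images that lie on the diagonal."""
    return np.trace(cm) / cm.sum()


def per_class_recall(cm):
    """Share of each original class that was estimated correctly."""
    return np.diag(cm) / cm.sum(axis=1)


for name, cm in [("ResNet-152", resnet_152), ("RDV-Net", rdv_net)]:
    print(f"{name}: accuracy = {100 * accuracy(cm):.2f}%")
    for cls, recall in zip(CLASSES, per_class_recall(cm)):
        print(f"  {cls}: recall = {100 * recall:.2f}%")
```

The computed accuracies, 960/968 for ResNet-152 and 964/968 for RDV-Net, round to the 99.17% and 99.59% reported in the test row of Table 2.1.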
2.4 Discussion
In the literature, many deep learning applications have yielded successful results in diagnosing retinal diseases from OCT images. Kamran et al. classified retinal OCT images with 99.8% accuracy using their CNN-based OpticNet-71 model [20]. Li et al. used the VGG-16 architecture with weights pre-trained on ImageNet for retinal disease diagnosis; applying transfer learning, they achieved 98.2% classification accuracy [21]. Kharisudin et al. tuned the hyperparameters of a CNN they designed and obtained a highest accuracy of 99.2% [22]. Li et al. classified retinal images with 97.9% accuracy using Multi-ResNet50 [23]. Islam et al. applied transfer learning to 11 deep learning architectures (AlexNet, GoogLeNet, ResNet-18, ResNet-50, VGGNet-16, VGGNet-19, ResNet-101, Inception-v3, Inception-ResNet-v2, DenseNet-201, and SqueezeNet). Their best result, 97% accuracy, was obtained with the DenseNet architecture [24]. Alqudah achieved 97.1% classification accuracy on retinal images with a CNN model of his own design [25]. Sertkaya et al. used the LeNet, AlexNet, and VGG16 deep learning architectures to classify four classes of retinal images. They reported VGG16 as the most accurate architecture, with a rate of 93.01% [26]. Tasnim et al. classified retinal diseases using Xception, ResNet50, MobileNetV2, and a CNN. For CNN, Xception, ResNet50, and MobileNetV2, respectively, the reported training accuracies are 0.9000, 0.9300, 0.8900, and 0.9300, and the test accuracies are 0.9800, 0.9907, 0.9700, and 0.9917 [27]. Tsuji et al., noting that the pooling layers used in CNN-based models cause loss of information, diagnosed retinal disease using capsule networks instead of CNN-based models. They reported a classification accuracy of 99.6% with capsule networks [28].
Table 2.7 compares the academic studies discussed above with the results obtained in this study. As seen in the table, classification results were obtained with high accuracy.
Table 2.7 Deep learning studies with retinal images

Authors                  Method                        Classification accuracy (%)
Kamran et al. [20]       OpticNet-71                   99.8
Li et al. [21]           Transfer learning + VGG-16    98.6
Kharisudin et al. [22]   CNN                           99.2
Li et al. [23]           Multi-ResNet50                97.9
Islam et al. [24]        AlexNet                       86.0
                         VGG-16                        94.6
                         VGG-19                        95.4
                         SqueezeNet                    93.5
                         GoogleNet                     93.7
                         Inception-v3                  95.5
                         DenseNet-201                  97.0
                         ResNet-18                     95.0
                         ResNet-50                     94.3
                         ResNet-101                    93.1
                         Inception + ResNet-v2         90.9
Alqudah [25]             CNN                           97.1
Sertkaya et al. [26]     AlexNet                       95.28
                         VGG-16                        93.01
                         LeNet                         93.76
                         AlexNet (Dropout)             94.72
Tasnim et al. [27]       CNN                           98.0
                         Xception                      99.07
                         ResNet-50                     97.00
                         MobileNet-v2                  97.17
Tsuji et al. [28]        CapsNet                       99.6
This study               ResNet-152                    99.17
                         HitNet                        94.21
                         Efficient-B0                  95.55
                         Efficient-B7                  85.53
                         RDV-Net                       99.59
2.5 Results
Today, advances in technology and medicine are very important for the early diagnosis of diseases. Artificial intelligence methods are among the most frequently used methods for this purpose. The artificial intelligence techniques used aim to reduce human-induced errors in the diagnosis of diseases. Eye diseases are one of the areas where early diagnosis is important. The study classified disease-free normal retinal images and retinal images with CNV, DME, and drusen, all taken from an open-access website, using the ResNet152, HitNet, Efficient-B0, Efficient-B7, and RDV-Net deep learning architectures. The following results were obtained.
• First, the Efficient-B7 architecture was used as a model and trained with normal and diseased retinal images. When the model was tested, 85.53% accuracy was observed.
• Second, the HitNet architecture was used as a model. When this model, also trained with retinal images, was tested, the classification accuracy was 94.21%.
• The third deep learning architecture, the Efficient-B0 model, was trained with normal and diseased images. Its classification accuracy was 95.55%.
• The fourth deep learning architecture was ResNet-152. The model trained with normal and diseased retinal images yielded 99.17% test classification accuracy.
• Finally, the ensemble model named RDV-Net was created, and the test classification accuracy increased. According to the test results, the model trained with normal and diseased retinal images reached 99.59% classification accuracy.
The test accuracies of all five architectures used in the study are over 80%, indicating that these architectures are successful in detecting eye diseases.
Acknowledgements We want to thank everyone who made the Retinal OCT Images (optical coherence tomography) open-source data used in the study available on the website (Kaggle).
References
1. He X, Fang L, Rabbani H, Chen X, Liu Z (2020) Retinal optical coherence tomography image classification with label smoothing generative adversarial network. Neurocomputing
2. Das V, Dandapat S, Bora PK (2019) Multi-scale deep feature fusion for automated classification of macular pathologies from OCT images. Biomed Sig Process Control 54:101605
3. Toptan M, Satici A, Sağlik A (2018) Yaşa Bağli Maküla Dejenerasyonunun Yaş Tipinde Intravitreal Ranibizumab Enjeksiyonun Etkinliğinin Araştirilmasi. J Harran Univ Med Fac 15(3)
4. Erçalik Y, Türkseven Kumral E, İmamoğlu S (2018) Ranibizumab Tedavisine Yetersiz Yanit Veren Diyabetik Makula Ödeminde Aflibercept Tedavisi Erken Dönem Sonuçlar. Retina-Vitreus/J Retina-Vitreous 27(1)
5. Huang D, Swanson EA, Lin CP, Schuman JS, Stinson WG, Chang W, Hee MR, Flotte T, Gregory K, Puliafito CA et al (1991) Optical coherence tomography. Science 254:1178–1181
6. Mumcuoglu T, Erdurman C, Durukan AH (2008) Optik Koherens Tomografi Prensipleri Ve Uygulamadaki Yenilikler. Turk J Ophthalmol 38:168–175
7. Dikkaya F, Özkök A, Delil Ş (2018) Parkinson Hastaliğinda Retina Sinir Lifi Tabakasi Ve Makula Kalinliğinin Değerlendirilmesi. Dicle Tip Dergisi 45(3):335–340
8. Fang L, Jin Y, Huang L, Guo S, Zhao G, Chen X (2019) Iterative fusion convolutional neural networks for classification of optical coherence tomography images. J Vis Commun Image Represent 59:327–333
9. Alsaih K, Lemaitre G, Rastgoo M, Massich J, Sidibé D, Meriaudeau F (2017) Machine learning techniques for diabetic macular edema (DME) classification on SD-OCT images. Biomed Eng Online 16(1):68
10. Cavaliere C, Vilades E, Alonso-Rodríguez M, Rodrigo MJ, Pablo LE, Miguel JM, Garcia-Martin E et al (2019) Computer-aided diagnosis of multiple sclerosis using a support vector machine and optical coherence tomography features. Sensors 19(23):5323
11. Hussain MA, Bhuiyan AD, Luu C, Theodore Smith RH, Guymer R, Ishikawa H, Ramamohanarao K et al (2018) Classification of healthy and diseased retina using SD-OCT imaging and random forest algorithm. PLoS ONE 13(6):e0198281
12. Rasti R, Rabbani H, Mehridehnavi A, Hajizadeh F (2017) Macular OCT classification using a multi-scale convolutional neural network ensemble. IEEE Trans Med Imaging 37(4):1024–1034
13. Retinal OCT images (optical coherence tomography). Available at: https://www.kaggle.com/paultimothymooney/kermany2018. Last accessed: 31 July 2020
14. Detect retina damage from OCT images. Available at: https://www.kaggle.com/paultimothymooney/detect-retina-damage-from-oct-images. Last accessed: 21 Oct 2020
15. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
16. Deliege A, Cioppa A, Van Droogenbroeck M (2018) HitNet: a neural network with capsules embedded in a hit-or-miss layer, extended with hybrid data augmentation and ghost capsules. arXiv preprint arXiv:1806.06519
17. Beşer F, Kizrak MA, Bolat B, Yildirim T (May 2018) Recognition of sign language using capsule networks. In: 2018 26th signal processing and communications applications conference (SIU). IEEE, pp 1–4
18. Tan M, Le QV (2019) EfficientNet: rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946
19. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708
20. Amit Kamran S, Saha S, Shihab Sabbir A, Tavakkoli A (2019) Optic-Net: a novel convolutional neural network for diagnosis of retinal diseases from optical tomography images. arXiv, arXiv1910
21. Li F, Chen H, Liu Z, Zhang X, Wu Z (2019) Fully automated detection of retinal disorders by image-based deep learning. Graefe's Arch Clin Exp Ophthalmol 257(3):495–505
22. Kharisudin I, Az-Zahra MF, Winarti ER, Waluya SB (June 2020) Deep convolutional neural networks for the detection of macular diseases from optical coherence tomography images. J Phys Conf Ser 1567(2):022076 (IOP Publishing)
23. Li F, Chen H, Liu Z, Zhang XD, Jiang MS, Wu ZZ, Zhou KQ (2019) Deep learning-based automated detection of retinal diseases using optical coherence tomography images. Biomed Opt Express 10(12):6204–6226
24. Islam KT, Wijewickrema S, O'Leary S (June 2019) Identifying diabetic retinopathy from OCT images using deep transfer learning with artificial neural networks. In: 2019 IEEE 32nd international symposium on computer-based medical systems (CBMS). IEEE, pp 281–286
25. Alqudah AM (2020) AOCT-NET: a convolutional network automated classification of multiclass retinal diseases using spectral-domain optical coherence tomography images. Med Biol Eng Comput 58(1):41–53
26. Sertkaya ME, Ergen B, Togacar M (June 2019) Diagnosis of eye retinal diseases based on convolutional neural networks using optical coherence images. In: 2019 23rd international conference electronics. IEEE, pp 1–5
27. Tasnim N, Hasan M, Islam I (2019) Comparisonal study of deep learning approaches on retinal OCT image. arXiv preprint arXiv:1912.07783
28. Tsuji T, Hirose Y, Fujimori K, Hirose T, Oyama A, Saikawa Y, Kotoku JI et al (2020) Classification of optical coherence tomography images using a capsule network. BMC Ophthalmol 20(1):1–9
Chapter 3
Autonomous Parking with Continuous Reinforcement Learning Mehmet Ertekin and Mehmet Önder Efe
3.1 Introduction
In the age we live in, both passenger transportation and freight transportation are of great importance. Parking a vehicle once it reaches its destination is a challenging task for both humans and automatic parking systems. Artificial intelligence-based methods are used for this task where traditional control methods are insufficient. The neurons in the biological brain and the way they communicate with each other inspired researchers to design structures that can learn on their own. Mathematical models that imitate the human brain are called Artificial Neural Networks (ANNs). The strategy development capability of an ANN allows it to be used in the design of control systems for problems such as vehicle parking. If the output of a learning system is also part of its input, the approach is called reinforcement learning. Some of the current reinforcement learning algorithms are Twin Delayed Deep Deterministic Policy Gradient (TD3), Deep Deterministic Policy Gradient (DDPG), and Soft Actor Critic (SAC). Hindsight Experience Replay (HER) is a wrapper algorithm that, when used with reinforcement learning algorithms, improves performance by also learning from unsuccessful attempts. In this work, an artificial intelligence control system was designed with HER-supported reinforcement learning methods for a vehicle model whose throttle and steering commands are controlled in a continuous action space within the designed working environment. The designed control system drives the vehicle and enables it to park at the target point.
M. Ertekin (B) · M. Önder Efe
Department of Computer Engineering, Hacettepe University, Ankara, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_3
3.2 Related Works
3.2.1 Deep Q Networks
Chris Watkins introduced Q-learning in 1989 [1]; its convergence was later proved by Watkins and Dayan in 1992 [2]. The idea behind Q-learning is to store the value of every state-action pair in a table, called the Q table, and to update that table according to the Bellman equation and temporal difference learning [3]. The result is a model-free reinforcement learning algorithm.

\[
Q^{new}(s_t, a_t) \leftarrow \underbrace{Q(s_t, a_t)}_{\text{old value}} + \underbrace{\alpha}_{\text{learning rate}} \cdot \overbrace{\Big( \underbrace{\underbrace{r_t}_{\text{reward}} + \underbrace{\gamma}_{\text{discount factor}} \cdot \underbrace{\max_{a} Q(s_{t+1}, a)}_{\text{estimate of optimal future value}}}_{\text{new value (temporal difference target)}} - \underbrace{Q(s_t, a_t)}_{\text{old value}} \Big)}^{\text{temporal difference}} \tag{3.1}
\]
Q-learning stores the expected reward for each state-action pair; these stored quantities are the Q values. Q values are initialized arbitrarily by the programmer. At every time step the Q table is updated by the rule given in (3.1) [4]. The parameter α in this expression is the step size, or learning rate, which determines the importance of recently acquired experience. As α approaches zero, the algorithm learns slowly and sticks with old experience; as α approaches 1, the algorithm disregards old information and fits the newest information as closely as possible. Similarly, the discount factor γ determines the importance of expected future rewards. As γ approaches 0, the agent considers only immediate rewards, which makes it "myopic"; as γ approaches 1, the agent optimizes the long-term expected reward.
Algorithm 1: Q-Learning
  Initialize Q table for all state-action pairs
  for each episode do
    repeat
      Choose action A using the policy derived from Q
      Apply action A, observe reward R and next state S'
      Update Q table according to Eq. (3.1)
      Assign S' to S
    until end of episode
  end for
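The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's implementation: the five-state chain environment, the epsilon-greedy exploration rule, and all parameter values are assumptions chosen only to make the update rule (3.1) concrete.

```python
import random


def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    Action 1 moves right, action 0 moves left. Reaching the last state
    yields reward 1 and ends the episode; every other step yields 0.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]  # Q table, initialized to zero
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap so that an episode always terminates
            if rng.random() < epsilon:  # explore
                a = rng.randrange(n_actions)
            else:  # exploit, breaking ties at random
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            # Temporal difference target and update, as in Eq. (3.1)
            target = r + (0.0 if done else gamma * max(q[s_next]))
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


if __name__ == "__main__":
    for state, values in enumerate(q_learning()):
        print(state, [round(v, 3) for v in values])
```

After training, the value of moving right should exceed the value of moving left in every non-terminal state, which is the behaviour the update rule is expected to produce on this toy problem.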
One of the main problems with Q-learning is the curse of dimensionality. When the state space is high dimensional, the Q table no longer scales and the required computing power grows exponentially. Mnih et al. found a method to solve the curse of dimensionality problem in Q-learning [4]. The idea behind their algorithm is to use an ANN to estimate future rewards instead of storing expected values in a Q table. In 2013 they took the raw pixel outputs of Atari games and successfully applied their algorithm to several games; this method is called DQN [5]. They addressed the curse of dimensionality by using convolutional layers as the first layers of the ANN. They also demonstrated the experience replay buffer mechanism. DQN imitates human behavior by allowing the agent to explore the environment and gather information. During training the agent collects data and puts it into the replay buffer. The DQN architecture concept is illustrated in Fig. 3.1. DQN uses two separate Q networks: a local Q network and a target Q network. These two networks, with weights θ and θ′, are used to calculate the prediction value and the target value, respectively. During training, the target network weights are updated periodically by copying the weights of the local Q network. Freezing the target network for a while and then updating its weights from the local Q network makes the learning process more stable. Applying the replay buffer mechanism on top of this makes the learning process even more stable [6].
Fig. 3.1 DQN algorithm architecture
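The two stabilizing mechanisms described above, the replay buffer and the periodic copy of the local weights into the target network, can be sketched without any deep learning library. The class and function names below are hypothetical, and a plain dictionary of lists stands in for the network weights.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size buffer that discards the oldest transitions first."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling breaks the correlation between consecutive steps.
        return rng.sample(list(self.data), batch_size)

    def __len__(self):
        return len(self.data)


def sync_target(local_weights):
    """Return an independent copy of the local weights for the target network."""
    return copy.deepcopy(local_weights)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=3)
    for step in range(5):
        buffer.add((step, "action", 0.0, step + 1))  # (state, action, reward, next state)
    print(len(buffer), buffer.sample(2, random.Random(0)))

    local = {"layer1": [0.1, 0.2]}
    target = sync_target(local)   # done every N training steps in DQN
    local["layer1"][0] = 0.5      # later local updates do not affect the frozen target
    print(target)
```

In a real DQN the copy would be triggered every fixed number of gradient steps; the interval itself is a hyperparameter that the text does not specify.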
3.2.2 Deep Deterministic Policy Gradient Algorithm
DQN can solve complex reinforcement learning tasks with high-dimensional observation spaces, but only those with limited, discrete action spaces. For continuous actions it fails because of the curse of dimensionality. Lillicrap et al. introduced DDPG to apply reinforcement learning to continuous control tasks [7]. DDPG uses two separate networks, an actor and a critic. The actor is responsible for determining the best actions by adjusting its weight parameters θ. The critic is responsible for evaluating the actions generated by the actor network. The actor deterministically approximates the optimal policy, meaning that it generates the best possible action for any given state. The actor network of DDPG uses a policy-based learning model and tries to estimate the optimal policy directly by maximizing rewards through gradient ascent. The critic network, on the other hand, uses a value-based learning model to estimate the quality of state-action pairs.
3.2.3 Twin Delayed Deep Deterministic Policy Gradient Algorithm
Fujimoto et al. claim that the Q function of DDPG commonly overestimates values, which causes poor policy updates and significant bias. The TD3 algorithm was proposed to solve this problem and to improve the performance of DDPG. First, the single Q function of DDPG is replaced by two Q functions, which is the origin of the word "twin" in the name. The minimum of the two Q values is used to form the targets. This clipped double Q-learning mechanism helps to avoid additional overestimation relative to the standard mechanism. Taking the minimum also yields low-variance value estimates with stable learning targets, which makes policy updates safer.
Algorithm 3:
Second, TD3 updates the Q networks more frequently than the policy network and the target networks, which is the origin of the word "delayed" in the algorithm name. The authors of TD3 recommend one policy update for every two Q-function updates. Finally, TD3 adds noise to the target action. This noise prevents the deterministic policy from overfitting to narrow peaks in the value estimate and smooths out Q.
Algorithm 4:
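The three TD3 ingredients described in this section (the minimum over two target critics, delayed policy updates, and clipped noise on the target action) can be illustrated with a small sketch. It is not the authors' code: the critics are passed in as plain Python callables, the action bounds of -1 and 1 are an assumption, and the default noise values simply mirror the TD3 settings that the chapter lists in its experiment table.

```python
import random


def td3_target(reward, done, next_state, next_action, q1_target, q2_target,
               gamma=0.95, noise_std=0.2, noise_clip=0.5,
               act_low=-1.0, act_high=1.0, rng=None):
    """Compute the TD3 learning target for one transition."""
    rng = rng or random.Random()
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    action = max(act_low, min(act_high, next_action + noise))
    # Clipped double Q-learning: use the smaller of the two target critics.
    q_min = min(q1_target(next_state, action), q2_target(next_state, action))
    return reward + gamma * (1.0 - float(done)) * q_min


def policy_update_steps(n_critic_updates, policy_delay=2):
    """Return the critic-update indices (1-based) at which the actor is updated."""
    return [step for step in range(1, n_critic_updates + 1)
            if step % policy_delay == 0]


if __name__ == "__main__":
    q1 = lambda s, a: 1.0  # hypothetical critic outputs
    q2 = lambda s, a: 2.0
    print(td3_target(0.5, False, None, 0.0, q1, q2, rng=random.Random(0)))
    print(policy_update_steps(6))
```

With a policy delay of 2, the actor and the target networks are refreshed on every second critic update, which matches the recommendation quoted above.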
3.2.4 Soft Actor Critic Algorithm
Haarnoja et al. announced the SAC algorithm in 2018, with contributions from UC Berkeley and Google [8]. They developed SAC to be a sample-efficient, off-policy, model-free reinforcement learning algorithm that is insensitive to hyperparameters.

\[
J(\pi) = \mathbb{E}_{\pi}\left[ \sum_{t} r(s_t, a_t) - \alpha \log \pi(a_t \mid s_t) \right] \tag{3.2}
\]
The SAC algorithm uses the maximum entropy reinforcement learning framework, with the objective function given in Eq. (3.2). Here the expectation is taken over trajectories generated by the policy and the real dynamics of the system. The optimal policy of the SAC algorithm maximizes the expected return together with the expected entropy. The temperature parameter α controls the trade-off between the return and the entropy of the policy. The authors showed in a technical report [9] that the temperature parameter α can be learned automatically instead of being treated as a hyperparameter. Algorithm 5 shows the SAC algorithm in detail [10].
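The objective in Eq. (3.2) can be made concrete with a small numerical sketch that evaluates the entropy-augmented return of sampled trajectories. The function names and the example numbers are hypothetical; an actual SAC implementation optimizes this quantity with neural networks rather than evaluating it on fixed data.

```python
def soft_return(rewards, log_probs, alpha=0.2):
    """Entropy-augmented return of one trajectory: sum of r_t - alpha * log pi(a_t | s_t)."""
    return sum(r - alpha * lp for r, lp in zip(rewards, log_probs))


def estimate_objective(trajectories, alpha=0.2):
    """Monte Carlo estimate of J(pi) from (rewards, log_probs) pairs sampled with pi."""
    returns = [soft_return(r, lp, alpha) for r, lp in trajectories]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    # Two hypothetical trajectories; log-probabilities are negative, so actions
    # that the policy considers less likely earn a larger entropy bonus.
    sampled = [([1.0, 1.0], [-1.0, -2.0]), ([0.0, 2.0], [-0.5, -0.5])]
    print(estimate_objective(sampled, alpha=0.5))
```

Setting alpha to zero recovers the ordinary sum of rewards, which shows how the temperature weights the entropy term against the return.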
3.2.5 Hindsight Experience Replay Algorithm
Sparse rewards are one of the biggest challenges for reinforcement learning systems. In such environments the agent receives a meaningful reward only when it reaches the goal state. Andrychowicz et al. [11] presented the HER algorithm to improve performance on sparse-reward and multi-goal reinforcement learning tasks. HER can be combined with off-policy reinforcement learning algorithms to improve sample efficiency. The idea behind HER is to store each experienced episode in the replay buffer not only with the original goal but also with a subset of other goals. HER is independent of the initial distribution of environment states. Algorithm 6 shows the HER algorithm in detail.
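The relabeling idea can be sketched as follows. This is an illustrative sketch rather than the chapter's implementation: the transition layout, the sparse reward function, and the choice of sampling substitute goals from later steps of the same episode (often called the "future" strategy) are assumptions made for the example.

```python
import random


def sparse_reward(achieved_goal, goal):
    """Hypothetical sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved_goal == goal else -1.0


def her_relabel(episode, k=4, reward_fn=sparse_reward, rng=None):
    """Store each transition with its original goal and with k substitute goals.

    `episode` is a list of dicts with keys "obs", "action", "achieved" (the goal
    actually reached after the action) and "goal" (the goal originally pursued).
    Substitute goals are drawn from goals achieved at the same or a later step.
    """
    rng = rng or random.Random()
    buffer = []
    for t, tr in enumerate(episode):
        # Original transition, usually unsuccessful under a sparse reward.
        buffer.append((tr["obs"], tr["action"],
                       reward_fn(tr["achieved"], tr["goal"]), tr["goal"]))
        for _ in range(k):
            new_goal = episode[rng.randrange(t, len(episode))]["achieved"]
            buffer.append((tr["obs"], tr["action"],
                           reward_fn(tr["achieved"], new_goal), new_goal))
    return buffer


if __name__ == "__main__":
    episode = [{"obs": i, "action": "a", "achieved": i + 1, "goal": 10} for i in range(3)]
    for item in her_relabel(episode, k=2, rng=random.Random(0)):
        print(item)
```

Even though the original goal (10 in the example) is never reached, some relabeled copies carry a reward of 0, so the agent still receives a learning signal from a failed episode.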
Algorithm 5:
3.2.6 Parking Environment Simulation Model
Creating and using environment simulation models is one of the most important tasks for reinforcement learning researchers. Leurent et al. [12] open sourced their environments on GitHub. In this work, their parking environment is used (Fig. 3.2). The parking environment uses the kinematic bicycle model [13], which is given in Eq. (3.3). In this model, (x, y) is the vehicle position, v is the speed, ψ is the heading angle, a is the acceleration command, β is the slip angle at the center of gravity, and δ is the steering angle.

Fig. 3.2 Parking environment

\[
\begin{aligned}
\dot{x} &= v \cos(\psi + \beta) \\
\dot{y} &= v \sin(\psi + \beta) \\
\dot{v} &= a \\
\dot{\psi} &= \frac{v}{l} \sin\beta \\
\beta &= \tan^{-1}\left(\tfrac{1}{2} \tan\delta\right)
\end{aligned}
\tag{3.3}
\]
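To show how Eq. (3.3) advances the vehicle state in a simulation, the sketch below applies one explicit Euler integration step. The integration scheme, the time step, and the value of l (taken here as a vehicle length of 5 m) are assumptions for illustration; the simulator used in the chapter may integrate the model differently.

```python
import math


def bicycle_step(state, a, delta, l=5.0, dt=0.1):
    """Advance (x, y, v, psi) by one Euler step of the kinematic bicycle model.

    a is the acceleration command and delta the steering angle in radians.
    l and dt are assumed values, not taken from the chapter.
    """
    x, y, v, psi = state
    beta = math.atan(0.5 * math.tan(delta))  # slip angle at the center of gravity
    x_new = x + v * math.cos(psi + beta) * dt
    y_new = y + v * math.sin(psi + beta) * dt
    v_new = v + a * dt
    psi_new = psi + (v / l) * math.sin(beta) * dt
    return (x_new, y_new, v_new, psi_new)


if __name__ == "__main__":
    state = (0.0, 0.0, 1.0, 0.0)
    for _ in range(10):
        state = bicycle_step(state, a=0.5, delta=0.1)
    print(state)
```

With zero steering and zero acceleration the model reduces to straight-line motion at constant speed, which is a convenient sanity check.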
The parking environment implements a low-level controller on top of the vehicle to track a given target speed and trajectory. The longitudinal controller is a proportional controller for the speed, given in Eq. (3.4). The lateral controller, on the other hand, is a proportional-derivative controller for the heading, combined with some nonlinearities that invert the kinematic model. Equation (3.5) shows the lateral position controller and Eq. (3.6) shows the lateral heading controller, where a is the acceleration, v is the velocity, v_r is the reference velocity, K_p is the controller gain, Δ_lat is the lateral position of the vehicle, v_{lat,r} is the lateral velocity command, Δψ_r is the heading variation, ψ_L is the lane heading, ψ_r is the target heading, and δ is the front wheel angle control.

\[
a = K_p (v_r - v) \tag{3.4}
\]

\[
\begin{aligned}
v_{lat,r} &= -K_{p,lat}\, \Delta_{lat} \\
\Delta\psi_r &= \sin^{-1}\left(\frac{v_{lat,r}}{v}\right)
\end{aligned}
\tag{3.5}
\]

\[
\begin{aligned}
\psi_r &= \psi_L + \Delta\psi_r \\
\dot{\psi}_r &= K_{p,\psi} (\psi_r - \psi) \\
\delta &= \sin^{-1}\left(\frac{l}{2v}\, \dot{\psi}_r\right)
\end{aligned}
\tag{3.6}
\]

\[
R(s, a) = -\left\| s - s_g \right\|_{W,p}^{p} - b \cdot \text{collision} \tag{3.7}
\]

The reward function of the parking task is given in Eq. (3.7), where the state is \(s = [x, y, v_x, v_y, \cos\psi, \sin\psi]\), the goal is \(s_g = [x_g, y_g, 0, 0, \cos\psi_g, \sin\psi_g]\), and \(\|z\|_{W,p} = \left(\sum_i |W_i z_i|^p\right)^{1/p}\). In order to have a narrower spike of rewards at the goal, the p-norm is preferred over the Euclidean norm.
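A direct transcription of the reward in Eq. (3.7) is sketched below. The weight vector W, the exponent p = 0.5, and the collision penalty b = 5 are illustrative assumptions; the chapter does not list the values it used, and the function names are hypothetical.

```python
def weighted_p_norm(z, weights, p):
    """Weighted p-norm: (sum_i |W_i * z_i|^p)^(1/p)."""
    return sum(abs(w * zi) ** p for w, zi in zip(weights, z)) ** (1.0 / p)


def parking_reward(state, goal, collided, weights, p=0.5, b=5.0):
    """Reward of Eq. (3.7): negative p-th power of the weighted p-norm, minus a collision penalty.

    state and goal are [x, y, vx, vy, cos(psi), sin(psi)] vectors.
    weights, p and b are assumed values chosen only for this example.
    """
    diff = [s - g for s, g in zip(state, goal)]
    distance = weighted_p_norm(diff, weights, p) ** p
    return -distance - b * (1.0 if collided else 0.0)


if __name__ == "__main__":
    w = [1.0, 1.0, 0.0, 0.0, 0.1, 0.1]        # hypothetical weights
    goal = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
    near = [0.01, 0.0, 0.0, 0.0, 1.0, 0.0]
    far = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
    print(parking_reward(near, goal, False, w), parking_reward(far, goal, False, w))
```

With p below 1 the reward rises steeply only very close to the goal, which is the narrow spike the text refers to.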
3.3 Experiments and Results
In this work, the TD3, DDPG, and SAC algorithms are compared using the HER wrapper and the parking environment. The algorithm implementations are taken from the open-source Stable Baselines3 library [14]. Table 3.1 lists the settings used in the experiments. Performance metrics are visualized with the open-source TensorBoard module of the TensorFlow Python library [15]. Figure 3.3 shows the experimental results. Shorter episodes are better than longer ones. Higher rewards indicate better performance. A success rate of 1 means that every episode ended successfully.
Table 3.1 Experiment settings

Parameter                         Value
Maximum episode length            100
Buffer size                       1,000,000
Learning rate (η)                 0.001
Gamma (γ)                         0.95
Batch size                        256
Network architecture              [256, 256, 256]
Tau (τ)                           0.005
Policy delay for TD3              2
Target policy noise for TD3       0.2
Target noise clip for TD3         0.5
Target update interval for SAC    1
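The settings in Table 3.1 can be collected in a small configuration structure so that all three algorithms share the common values. The sketch below deliberately avoids importing any library. The key names follow what I understand to be the constructor argument names of the Stable Baselines3 TD3 and SAC classes, and that mapping from table rows to keyword names is an assumption that should be checked against the installed version.

```python
# Settings shared by TD3, DDPG and SAC (values from Table 3.1).
COMMON_SETTINGS = {
    "buffer_size": 1_000_000,
    "learning_rate": 1e-3,
    "gamma": 0.95,
    "batch_size": 256,
    "tau": 0.005,
    "policy_kwargs": {"net_arch": [256, 256, 256]},
}

# Algorithm-specific settings (values from Table 3.1).
ALGORITHM_SETTINGS = {
    "TD3": {"policy_delay": 2, "target_policy_noise": 0.2, "target_noise_clip": 0.5},
    "SAC": {"target_update_interval": 1},
    "DDPG": {},
}

# Applied to the environment or the HER wrapper rather than to the algorithm.
MAX_EPISODE_LENGTH = 100


def settings_for(algorithm):
    """Merge the common settings with the settings of one algorithm."""
    return {**COMMON_SETTINGS, **ALGORITHM_SETTINGS[algorithm]}


if __name__ == "__main__":
    for name in ("TD3", "DDPG", "SAC"):
        print(name, settings_for(name))
```

Keeping the shared values in one place makes it easier to verify that the comparison between the three algorithms uses identical buffer, batch, and network settings.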
Fig. 3.3 Experiment results
3.4 Conclusions and Future Work
In this work, reinforcement learning algorithms are compared in a simulation environment. The experimental results show that the SAC algorithm performs better overall than TD3 and DDPG. The proposers of the TD3 algorithm claim that TD3 improves considerably on DDPG, but for the continuous car parking problem studied in the experiments of this work, TD3 performs worse than DDPG.
Future work can use more realistic simulation environments that rely on realistic sensor inputs to localize the vehicle through dynamic mapping. Static obstacles such as trees and dynamic obstacles such as pedestrians can be added to study obstacle avoidance. Lastly, the approach can be applied to real-world vehicles.
References
1. Watkins CJCH (1989) Learning from delayed rewards
2. Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8:279–292. https://doi.org/10.1007/BF00992698
3. Sutton RS (n.d.) Learning to predict by the methods of temporal differences
4. Q-learning - Wikipedia (n.d.). https://en.wikipedia.org/wiki/Q-learning. Accessed 8 August 2021
5. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. http://arxiv.org/abs/1312.5602. Accessed 30 March 2021
6. Buchholz M (2019) Deep reinforcement learning. Introduction. Deep Q network (DQN) algorithm. https://medium.com/@markus.x.buchholz/deep-reinforcement-learning-introduction-deep-q-network-dqn-algorithm-fb74bf4d6862. Accessed 16 May 2021
7. Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2016) Continuous control with deep reinforcement learning. In: 4th international conference on learning representations, ICLR 2016 - Conference track proceedings. https://goo.gl/J4PIAz. Accessed 30 March 2021
8. Haarnoja T, Zhou A, Abbeel P, Levine S (2019) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor
9. Haarnoja T, Zhou A, Hartikainen K, Tucker G, Ha S, Tan J, Kumar V, Zhu H, Gupta A, Abbeel P, Levine S (2019) Soft actor-critic algorithms and applications
10. Soft actor-critic - Spinning Up documentation (n.d.). https://spinningup.openai.com/en/latest/algorithms/sac.html. Accessed 8 August 2021
11. Andrychowicz M, Wolski F, Ray A, Schneider J, Fong R, Welinder P, McGrew B, Tobin J, Abbeel P, Zaremba W (2017) Hindsight experience replay. In: Advances in neural information processing systems, December 2017, pp 5049–5059. http://arxiv.org/abs/1707.01495. Accessed 28 March 2021
12. Leurent E (2018) An environment for autonomous driving decision-making. GitHub. https://github.com/eleurent/highway-env. Accessed 28 March 2021
13. Polack P, Altché F, d'Andréa-Novel B, de La Fortelle A (2017) The kinematic bicycle model: a consistent model for planning feasible trajectories for autonomous vehicles. In: 2017 IEEE intelligent vehicles symposium (IV), IEEE, pp 812–818
14. Raffin A, Hill A, Ernestus M, Gleave A, Kanervisto A, Dormann N (2019) Stable Baselines3, GitHub Repository
15. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X, Research G (2015) TensorFlow: large-scale machine learning on heterogeneous distributed systems. www.tensorflow.org
Chapter 4
Design and Manufacturing of a 3 DOF Robot with Additive Manufacturing Methods A. Burak Keşkekçi, Hilmi Cenk Bayrakçi, and Ercan Nurcan Yilmaz
4.1 Introduction
Additive manufacturing is a method that has been in use since the 1980s [1]. The RepRap project, which started as an initiative of the University of Bath in 2005 and expanded over time with the participation of various collaborators, paved the way for inexpensive and easily developable additive manufacturing devices [2]. Additive manufacturing covers many different sub-methods and materials [3]. FDM (Fused Deposition Modeling) is the most common method and was used in this study [4]. Robot arms appear in many different areas [5]; manufacturing, healthcare, military defense, and the space and aerospace industries are the most common examples. In this study, additive manufacturing methods and robotics were brought together: a robot arm with three degrees of freedom was designed, and the necessary parts were produced with the FDM method. The programs needed for the robot arm to run its internal controls and to execute commands received from outside were written and shared on open-source platforms. The aim of this study is to create a training set that is easily accessible to individuals training in robotic control systems.
A. Burak Keşkekçi (B) · H. Cenk Bayrakçi
Department of Mechatronics Engineering, Faculty of Technology, Isparta University of Applied Sciences, Isparta, Turkey
E. Nurcan Yilmaz
Mingachevir State University, Mingachevir, Azerbaijan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_4
4.2 Material and Method
The study comprises mechanical, electronic, and software components. All of the software applications used are open source.
4.2.1 Material
4.2.1.1 Mechanical Design
In the study, a SCARA-type design with two rotational joints and one prismatic joint was preferred. The most important feature of the SCARA configuration is that its kinematic control and mechanical design are both of moderate difficulty, so the mechanical design and the software control are balanced at a medium difficulty level. The designed parts were exported in .stl format and manufactured with an FDM device, also known as a 3D printer. The literature review showed that the same method was used in similar applications [6]. The main reason for using the FDM method is that it provides the strength required for the parts and is relatively the cheapest method. The robot arm, after completion of its mechanical design, production, and assembly, can be seen in Fig. 4.1.
4.2.1.2 Electronic Components
In the study, an Arduino Mega 2560 microcontroller board, a RAMPS expansion board, A4988 motor drivers, NEMA 17 stepper motors, and mechanical limit switches were used. Figure 4.2 shows the electronic control components assembled together.

Fig. 4.1 Assembled robot arm
Fig. 4.2 Microcontroller control unit and other electronic components
4.2.1.3 Software Components
Internal Control Software The internal control software allows the robot arm to interpret and execute the received commands. The software splits the commands it receives over USB into sub-commands, identifies the task to be performed, performs the inverse kinematic calculations, and triggers the actuators accordingly. Figure 4.3 shows the flow diagram of the internal control program algorithm.
External Control Software The external control software provides the communication between the robot arm and the computer. A C-based open-source programming language was used for coding. It has sections for previewing commands before they are sent to the robot and for moving the robot to its starting point. Figure 4.4 shows the interface window of the external control program.
4.3 Method
The forward kinematic analysis of the robot arm gives the position and orientation of the end effector with respect to the starting point of the robot, as shown in Eq. 4.1 [7].
Fig. 4.3 Internal Control Program Algorithm
Fig. 4.4 External control program interface window
\[
{}^{0}_{4}T =
\begin{bmatrix}
C_{12} & -S_{12} & 0 & C_{12} l_2 + C_1 l_1 \\
S_{12} & C_{12} & 0 & S_{12} l_2 + S_1 l_1 \\
0 & 0 & 1 & d \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{4.1}
\]
From this point of view, the inverse kinematic analysis yields the joint variable equations given in Eqs. 4.2 and 4.3 [8].

\[
\theta_2 = \operatorname{Atan2}\left( \pm\sqrt{1 - \left( \frac{p_x^2 + p_y^2 - l_1^2 - l_2^2}{2 l_1 l_2} \right)^{2}},\; \frac{p_x^2 + p_y^2 - l_1^2 - l_2^2}{2 l_1 l_2} \right) \tag{4.2}
\]

\[
\theta_1 = \operatorname{Atan2}(p_y, p_x) \pm \operatorname{Atan2}\left( \sqrt{p_x^2 + p_y^2 - (l_2 C\theta_2 + l_1)^2},\; l_2 C\theta_2 + l_1 \right) \tag{4.3}
\]
The solution set obtained with these equations includes four different results, two of which are real and two imaginary. One of the real solutions makes the robot arm work in a "right-handed" arrangement and the other in a "left-handed" arrangement. In the study, the right-handed solution was preferred and the mechanical design was arranged accordingly [9].
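Equations 4.1 to 4.3 can be checked numerically with a short round-trip sketch: compute the end-effector position from the joint angles, then recover the angles from the position. The link lengths and the `elbow` sign parameter are hypothetical; the sketch does not claim which sign corresponds to the right-handed arrangement of the built robot. The sign of the second term in Eq. 4.3 is chosen to match the sign of the square root in Eq. 4.2, since the square root in Eq. 4.3 equals the magnitude of l2 sin θ2.

```python
import math


def forward_kinematics(theta1, theta2, d, l1, l2):
    """Position column of Eq. 4.1: end-effector (px, py, pz) for the given joint values."""
    px = l2 * math.cos(theta1 + theta2) + l1 * math.cos(theta1)
    py = l2 * math.sin(theta1 + theta2) + l1 * math.sin(theta1)
    return px, py, d


def inverse_kinematics(px, py, l1, l2, elbow=1):
    """Joint angles from Eqs. 4.2 and 4.3; elbow = +1 or -1 selects one of the two real solutions."""
    c2 = (px ** 2 + py ** 2 - l1 ** 2 - l2 ** 2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0 + 1e-9:
        raise ValueError("target is outside the reachable workspace")
    c2 = max(-1.0, min(1.0, c2))  # guard against rounding slightly past 1
    theta2 = math.atan2(elbow * math.sqrt(1.0 - c2 * c2), c2)
    theta1 = math.atan2(py, px) - math.atan2(l2 * math.sin(theta2), l1 + l2 * c2)
    return theta1, theta2


if __name__ == "__main__":
    l1, l2 = 0.20, 0.15  # hypothetical link lengths in metres
    px, py, pz = forward_kinematics(0.4, 0.9, 0.05, l1, l2)
    print(inverse_kinematics(px, py, l1, l2, elbow=1))
    print(inverse_kinematics(px, py, l1, l2, elbow=-1))
```

Both sign choices reach the same point; they differ only in whether the second link bends to one side of the first link or the other, which is what the text calls the right-handed and left-handed arrangements.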
4.4 Findings and Discussion
In serial robot arms, small errors that arise for various reasons accumulate along the arm and may prevent the end effector from reaching the desired position [10]. Therefore, the mechanical structure should not allow such errors. It was observed that the belt and pulley systems used in the study must be sufficiently tensioned [11], so the design was revised to ensure belt tension. For similar reasons, the robot must accurately determine its zero position [12], so the switches that detect the starting position must be precise and correctly positioned. At this point, an improvement could be made by using optical limit switches instead of mechanical limit switches. The G-codes required for the robot arm to follow various trajectories were created as point-to-point trajectories with 0.1 mm intervals. The created trajectories are shown in Fig. 4.5, and the drawings produced by the robot arm in response to these codes can be seen in Fig. 4.6.
Fig. 4.5 Trajectories created for the robot arm to follow
Fig. 4.6 Trajectories followed by the robot arm
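The point-to-point G-code generation described above can be sketched as follows. This is a minimal illustration, not the authors' generator: the function name is hypothetical, and it is an assumption that the robot firmware accepts standard linear-move commands of the form `G1 X.. Y..` in Cartesian coordinates. A straight segment is divided into moves no longer than the 0.1 mm interval mentioned in the text.

```python
import math


def line_to_gcode(start, end, step_mm=0.1):
    """Split a straight segment into G1 moves of at most step_mm each."""
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    # The small epsilon keeps floating-point noise from adding an extra step.
    n_steps = max(1, math.ceil(length / step_mm - 1e-9))
    commands = []
    for i in range(1, n_steps + 1):
        t = i / n_steps
        x = x0 + t * (x1 - x0)
        y = y0 + t * (y1 - y0)
        commands.append(f"G1 X{x:.3f} Y{y:.3f}")
    return commands


# A 1 mm segment becomes ten 0.1 mm moves.
moves = line_to_gcode((0.0, 0.0), (1.0, 0.0))
```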
4.5 Conclusion
In this study, a low-cost robot arm was designed for people and institutions that provide applications and training in the fields of robotic control and microcontrollers. In the tests, the repeatability was measured as 0.4 mm. This value can be reduced with mechanical and software improvements. Development continues on open-source platforms: the mechanical designs and the internal and external control software are open to anyone who wants to participate in the project. In this study, the robot arm operates in open loop, so error correction is not possible. Feedback could be provided to the system with image processing or encoders.
Although parallel robots are more complicated to control, they should not be ignored because they have a higher load capacity [13]. Designing a parallel robot arm is therefore an option to be considered in future studies.
References
1. Wong KV, Hernandez A (2012) A review of additive manufacturing. ISRN Mech Eng 2012:1–10. https://doi.org/10.5402/2012/208760
2. Jones R, Haufe P, Sells E, Iravani P, Olliver V, Palmer C, Bowyer A (2011) Reprap - the replicating rapid prototyper. Robotica 29(1 SPEC. ISSUE):177–191. https://doi.org/10.1017/S026357471000069X
3. Ngo TD, Kashani A, Imbalzano G, Nguyen KTQ, Hui D (2018) Additive manufacturing (3D printing): a review of materials, methods, applications and challenges. Compos B 143:172–196. https://doi.org/10.1016/j.compositesb.2018.02.012
4. Aksoy B, Ghazal Z, Şenol R, Ersoy M (2020) Ses ve Metin Olarak Girilen İşaret Dili Hareketlerinin Robot Kol Tarafından Gerçekleştirilmesi 8:220–232. https://doi.org/10.29130/dubited.593405
5. Almurib HAF, Al-qrimli HF, Kumar N (2011) A review of application industrial robotic design
6. Özsoy K, Aksoy B, Yücel M (2020) Design and manufacture of continuous automatic 3D printing device with conveyor system by image processing technology. Erzincan Üniversitesi Fen Bilimleri Enstitüsü Dergisi 13(2):392–403. https://doi.org/10.18185/erzifbed.666424
7. Bingül Z (2017) Robot Kinematiği. Umuttepe yayınları, Umuttepe Basımevi
8. Ait L, Murray R (n.d.) Kinematics of SCARA robots
9. Mousavi A, Akbarzadeh A, Shariatee M, Alimardani S (2015) Repeatability analysis of a SCARA robot with planetary gearbox. In: 2015 3rd RSI international conference on robotics and mechatronics (ICROM), pp 640–644. https://doi.org/10.1109/ICRoM.2015.7367858
10. Xuan J, Xu S (2014) Review on kinematics calibration technology of serial robots. 15(8):1759–1774. https://doi.org/10.1007/s12541-014-0528-1
11. Whitesell J (1985) Design of belt-tensioner systems for dynamic stability, May 2015. https://doi.org/10.1115/1.3269258
12. Zhang X, Xie X (2019) A new robot body calibration method of SCARA based on machine vision. 1:3077–3081
13. Metodij KI, Engineering-skopje FM (2014) Comparison of the characteristics
Chapter 5
Real-Time Mask Detection Based on Artificial Intelligence Using Renewable Energy System Unmanned Aerial Vehicle
Bekir Aksoy, Mehmet Yücel, Reşat Selbaş, Merdan Özkahraman, Çetin Elmas, and Almaz Aliyeva
B. Aksoy (✉) · M. Yücel · M. Özkahraman
Faculty of Technology, Department of Mechatronics Engineering, Isparta University of Applied Sciences, Isparta, Turkey
e-mail: [email protected]
R. Selbaş
Faculty of Technology, Department of Mechanical Engineering, Isparta University of Applied Sciences, Isparta, Turkey
Ç. Elmas
Faculty of Information and Telecommunication Technologies, Azerbaijan Technical University, Baku, Azerbaijan
A. Aliyeva
Department of Information Technologies, Mingachevir State University, Mingachevir, Azerbaijan

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_5

5.1 Introduction
The COVID-19 virus emerged in Wuhan, in the Hubei province of China, in 2019 and quickly became a global pandemic. The World Health Organization (WHO) declared the COVID-19 pandemic on March 11, 2020 [1]. COVID-19 is a highly contagious virus that causes upper and lower respiratory tract infections, with a high viral load in respiratory and salivary secretions. The most common route of transmission is the tiny droplets that an infected person expels from the nose and mouth while coughing or breathing [2]. In droplet-borne diseases, the particles are too heavy to stay suspended in the air and can travel 2–3 m when a person coughs or sneezes. The best way to prevent direct transmission of this type of virus is to keep a distance. Most droplet-borne diseases are transmitted through contact: after touching particles on surfaces, people carry them into the body by touching the eyes, mouth, nose and face with their hands [3].
The most effective ways to reduce the spread of COVID-19 are hand washing, respiratory hygiene and social distancing, which are known as basic precautions. Nationwide public health decisions that limit interaction between people can slow the spread of the virus [4]. For this reason, countries have taken measures such as "use of masks, curfews, closure of workplaces and protection of social distance" as part of these basic precautions [5]. To prevent the spread of COVID-19 in Turkey, the Ministry of Health's Coronavirus Scientific Committee made medical face masks compulsory as of April 4, 2020 in closed and social areas and in workplaces where people work together. However, human mobility increases as air temperatures rise, and many people are unaware of how contagious the virus is, do not know how to put on and take off masks correctly, or refrain from wearing them [6]. This increases the risk of infected people infecting healthy people. Because the rules are not followed, the virus spreads rapidly and the number of cases increases day by day. This uncontrolled increase causes loss of life and serious damage to the country's economy [7]. The obligation to wear a mask, one of the measures used to minimize this damage, is enforced by designated personnel, who carry out inspections in closed and open social areas [8]. These personnel are not always effective at detecting people without masks. In some applications an officer carries out the inspections through surveillance cameras in closed areas, but this still requires a person to analyze the camera footage.
The large number of images to be analyzed makes it difficult for one person to follow all of them at the same time. Deep learning algorithms used in computer vision applications can provide a solution to these problems [9]. Deep learning is an artificial intelligence method that uses multi-layer artificial neural networks in areas such as natural language processing, object recognition, and voice recognition. Unlike traditional machine learning methods, deep learning learns from data obtained from videos, images, texts and sounds instead of learning with coded rules [10]. As one of the most important branches of artificial intelligence, deep learning is used in image processing applications, military and security, production, machine vision, surveillance and many other sectors [11]. With deep learning, scientists try to minimize both the error and the amount of human intervention. Deep learning is a sub-branch of machine learning, but their areas of use differ. When an error occurs in machine learning, a software developer must intervene and make adjustments, whereas in deep learning the algorithms themselves determine the accuracy of a prediction. In deep learning architectures, the data set is collected on the computer and analysis is performed using these data [12, 13]. Images obtained from video can be analyzed using deep learning algorithms.
Unmanned aerial vehicles (drones) equipped with cameras are frequently used for image collection. An unmanned aerial vehicle is an aircraft that does not require an on-board pilot and can be remotely controlled or fly autonomously. Unmanned aerial vehicles are frequently preferred in military and civilian areas today, thanks to their high maneuverability, their vertical take-off and landing, and their ability to operate in both open and closed spaces. In military areas, they are used for target detection and destruction, regional security and reconnaissance; in civilian areas, they are used in applications such as mapping, spraying, fire detection and aerial photography [14]. One important problem with unmanned aerial vehicles is that energy consumption increases and flight times shorten under intense working conditions. To increase flight time and efficiency, solar panels were used in this study. Renewable energy is defined as energy that is replenished within nature's own cycle and is therefore still available the next day. Natural resources such as hydraulic (water), solar, wind and geothermal energy are types of renewable energy [15]. The demand for alternative or clean energy is increasing day by day, since statistics suggest that oil reserves will be exhausted within about 50 years. Turkey's location in a geography very rich in solar energy is expanding the use of solar panels [15, 16]. Solar panels are semiconductor devices that convert solar energy directly into electrical energy when sunlight falls on them. When the semiconductor material absorbs solar energy, electrons break away from the atoms to which they are bound and begin to move freely in the material, producing an electric current. Solar panels are photovoltaic diodes with no moving parts that generate a voltage at their terminals in proportion to the sunlight falling on them [17, 18].
In this study, a new system was developed using a Wi-Fi camera and solar panels mounted on an unmanned aerial vehicle. The unmanned aerial vehicle was deployed in the designated area, and the data set was created by transferring the images to the computer in real time. The uninterrupted data transfer of the Wi-Fi camera and the continuous flight of the unmanned aerial vehicle drain the batteries faster, which shortens the flight time. For this reason, the energy required for the motherboard, camera and GPS is supplied by the solar panels instead of the battery, which keeps the flight time at its maximum. The data set was used to train the Inception V3, ResNet152 V2 and MobileNet V2 deep learning architectures. The architecture that gave the most accurate result was built into a user interface program that classifies people as wearing or not wearing a mask and shows the result to the user.
5.2 Related Studies
The literature related to the study was reviewed. Gürler et al. designed a solar-powered unmanned aerial vehicle with a one-meter wingspan that can fly for a long time in the conditions of Trabzon. The vehicle uses energy efficiently by utilizing solar cells. In the design, the solar panels were placed on the wings, so the wing surface was kept as wide as possible. The XFLR5 program, which is used for small unmanned aerial vehicles, was used in that study [19]. Candan et al. developed a solar-powered, low-altitude, medium-range unmanned aerial vehicle made of composite material, with automatic flight stability and control, that can transmit real-time images from the region. The aircraft is intended to meet needs in areas such as reconnaissance, observation and communication. It weighs 15 kg, with a wing area of 2 m² and a wingspan of 5 m. Solar panels with 17% efficiency are placed on the wings. Because of the low efficiency of the panels and because only half of the wings could be covered, solar energy supplied one third of the energy required for flight rather than all of it [16]. Uyanık developed a model that checks for masks using image processing techniques. The aim was to obtain the highest possible accuracy, and the model reached an accuracy of 0.9 after repeated training [20]. Akgül et al. used deep learning methods to examine whether people wear face masks in indoor and outdoor areas. The study was based on deep learning architectures using a Convolutional Neural Network (CNN). The model was trained and tested with the Simulated Masked Face Dataset (SMFD), and an accuracy of 97.09% was obtained [9].
5.3 Material and Method
The material part of this section describes the unmanned aerial vehicle (drone) and the deep learning architectures used in the study. The method part gives detailed information about how the study was carried out.
5.3.1 Material
This section covers three topics: the unmanned aerial vehicle (drone), the solar panels used as its energy source, and the deep learning models. Three deep learning models are examined: the MobileNet V2, ResNet152 V2 and Inception V3 architectures.
5.3.1.1 Unmanned Aerial Vehicle
In this study, an unmanned aerial vehicle (drone) capable of patrolling and collecting images autonomously was used. The unmanned aerial vehicle consists of a GPS, a Wi-Fi camera, a main board and four DC motors. Its technical specifications are given in Table 5.1.

Table 5.1 Technical specifications of unmanned aerial vehicle
Specification                            Value
Procedures for special authorizations    Not required
Battery                                  500 mAh LiPo
Flight time                              9–10 min
Camera                                   720p HD
Dimensions                               27 × 19.5 × 5 cm
Weight                                   150–200 g
5.3.1.2 Solar Panel
Structures that convert sunlight into electrical energy are called solar panels. Each solar panel used in the experimental application produces a voltage of 4.2 V and a current of 100 mA. A power board was designed so that the voltage from the solar panels can be used in the unmanned aerial vehicle without problems. The power board circuit consists of a diode, a regulator and resistors, as shown in Fig. 5.1. Solar panels produce direct current, and the current and voltage they deliver can be changed by connecting the panels in series or in parallel. Connecting panels in series gives the sum of their voltages at the same current; connecting them in parallel gives the sum of their currents at the same voltage [21]. To obtain more current at constant voltage, three solar panels were connected in parallel in this study, as seen in Fig. 5.2.
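The series and parallel rules above can be checked with a small Python sketch. It treats the panels as ideal, identical sources, which is a simplifying assumption; the function names are hypothetical. Currents are kept in milliamperes so that the sums stay exact.

```python
def series(panels):
    """Series string: voltages add, current is limited by the weakest panel."""
    voltage = sum(v for v, _ in panels)
    current_ma = min(i for _, i in panels)
    return voltage, current_ma


def parallel(panels):
    """Parallel bank: currents add, voltage stays at the common panel voltage."""
    voltage = min(v for v, _ in panels)
    current_ma = sum(i for _, i in panels)
    return voltage, current_ma


# One panel as described in the text: 4.2 V and 100 mA.
panel = (4.2, 100)

bank_voltage, bank_current_ma = parallel([panel] * 3)
```

For three such panels in parallel the sketch gives 4.2 V and 300 mA, whereas a series string of the same panels would give about 12.6 V at 100 mA.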
Fig. 5.1 Diagram of power board circuit designed for solar panel
Fig. 5.2 Schematic of wiring solar panels
5.3.1.3 MobileNet V2
The MobileNet architecture is a convolutional neural network (CNN) architecture developed by Google. MobileNetV1 was developed for mobile and low-cost applications; it is easy to use in deep networks and reduces network cost and size [22, 23]. MobileNetV2 builds on V1 and removes the non-linearities in the narrow layers. With the MobileNetV2 architecture, tasks such as classification, object recognition and segmentation can be carried out easily. The model is built from depthwise separable filters and combination stages: a depthwise convolution filters each input separately, and a 1 × 1 convolution then combines the filtered features to create a new layer. The MobileNetV2 architecture uses batch normalization and ReLU activation functions. Its input has a maximum resolution of 224 × 224 pixels, and its last layer uses a Softmax function for classification [24, 25]. The schematic of the MobileNetV2 architecture is given in Fig. 5.3 [26].
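The saving obtained from separating the filtering and combination steps can be illustrated by counting weights. The sketch below is only an illustration: the function names and the layer sizes are assumed values that are not taken from the study, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1 x 1 convolution."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that combines the channels
    return depthwise + pointwise


# Illustrative layer: 3 x 3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_params(3, 32, 64)         # 18432 weights
separable = depthwise_separable_params(3, 32, 64)  # 2336 weights
```

For this assumed layer the separable form needs roughly one eighth of the weights, which is the kind of reduction that makes the architecture attractive for mobile and low-cost hardware.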
Fig. 5.3 Diagram of the MobileNetV2 architecture
5.3.1.4 ResNet152 V2
ResNet is a neural network model developed by He et al. in 2015 to facilitate the training of deeper networks [27]. What most distinguishes ResNet from other models is its residual blocks, in which shortcut connections add a block's input to its output before feeding the next layers. The ResNet50 architecture is created by replacing each 2-layer block in a 34-layer network with a 3-layer bottleneck block. This 50-layer network requires 3.8 billion FLOPs. The ResNet101 and ResNet152 architectures contain more layers than ResNet50 [28, 29]; the ResNet152 architecture requires 11.3 billion FLOPs. The schematic of the ResNet152 V2 architecture is shown in Fig. 5.4 [30].
Fig. 5.4 Schematic of the ResNet152 V2 architecture
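The residual idea can be shown with a minimal pure-Python sketch. It is a simplification under stated assumptions: small dense layers stand in for the convolutions of a real ResNet block, batch normalization is omitted, and all names and weights are hypothetical. The point it illustrates is that the block output is the learned mapping F(x) plus the unchanged input x.

```python
def relu(vector):
    """Element-wise ReLU."""
    return [max(0.0, value) for value in vector]


def linear(weights, vector):
    """Matrix-vector product; stands in for a convolution in this sketch."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x) with F(x) = W2 * relu(W1 * x)."""
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])


# With all-zero weights F(x) is zero, so the block passes relu(x) through.
zeros = [[0.0] * 3 for _ in range(3)]
passthrough = residual_block([1.0, -2.0, 3.0], zeros, zeros)
```

Because the shortcut carries the input forward even when the learned part contributes nothing, stacking many such blocks does not prevent the signal from reaching deeper layers.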
5.3.1.5 Inception V3
Inception V3 is a convolutional neural network architecture used for image classification. It works by emulating the multi-layered process by which humans recognize images. In this architecture, depth and width can be increased without a corresponding increase in computational cost. Some connections between layers in other networks were considered to carry ineffective and redundant information, and Inception V3 was designed to overcome this. It uses an inception module with 22 layers in its parallel processing workflow and places classifiers in the middle layers to increase the discriminative capacity of the lower layers. 1 × 1 convolution layers are used to reduce the computational cost and the number of parameters; they also make the network deeper. After the convolution layers, non-linearity is introduced with the ReLU function. Because fully connected layers contain too many parameters, they are replaced with a pooling layer, which reduces the number of parameters and the processing load. The schematic of the Inception V3 architecture is shown in Fig. 5.5 [31].
Fig. 5.5 Schematic of the Inception V3 architecture
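How a 1 × 1 convolution reduces the number of parameters can be illustrated by counting weights for a larger convolution with and without a preceding 1 × 1 reduction. The channel counts below are assumed values chosen only for illustration, the function names are hypothetical, and biases are ignored.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in a kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def direct_5x5(c_in, c_out):
    """A 5 x 5 convolution applied directly to all input channels."""
    return conv_params(5, c_in, c_out)


def reduced_5x5(c_in, c_mid, c_out):
    """A 1 x 1 convolution down to c_mid channels, then the 5 x 5 convolution."""
    return conv_params(1, c_in, c_mid) + conv_params(5, c_mid, c_out)


# Illustrative branch: 192 input channels, reduced to 16, then 32 output channels.
direct = direct_5x5(192, 32)        # 153600 weights
reduced = reduced_5x5(192, 16, 32)  # 15872 weights
```

In this assumed configuration the extra 1 × 1 layer makes the branch one layer deeper while cutting the weight count by about a factor of ten.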
5.3.2 Method
This section of the study has three parts: mask detection by unmanned aerial vehicle, classification of face images, and the design of the user interface that shows the results. These topics are detailed below.
5.3.2.1 Mask Detection with Unmanned Aerial Vehicle
The unmanned aerial vehicle used in the study consists of a motherboard, a GPS, a Wi-Fi camera and a remote controller. It can be controlled via the remote and can also fly autonomously within specified coordinates. While performing its mission, the unmanned aerial vehicle transmits image data to the ground unit in real time. To ensure continuous data flow and uninterrupted flight, three solar panels and a power board were integrated into the unmanned aerial vehicle. According to the measurements, the three solar panels gave an output of 4.2 V and 300 mA. This energy supplied the main board, GPS and camera of the unmanned aerial vehicle, while the energy required for the motors was provided by the internal LiPo batteries. To collect human face images in social areas, the operating altitude of the unmanned aerial vehicle is determined first. In this study, the altitude was set to 7 m, and face images taken from this height were transferred to the ground unit with the Wi-Fi camera. The transferred face images were classified as masked or unmasked with deep learning architectures. On the interface, masked faces are marked with a green frame and unmasked faces with a red frame. The general structure of the developed system is given in Fig. 5.6.
Fig. 5.6 Developed unmanned aerial vehicle (drone) system
5.3.2.2 Classification of Face Images
In the second stage of the study, a data set of human face images was prepared. Masked and unmasked video recordings were collected from 16 volunteers. The recordings were split into frames, giving a total of 3891 images. On a computer in the ground unit, the images were filtered with the Haar cascade method and cropped so that only the faces remained. The images were labeled as masked or unmasked, and the resulting data set was used to train the ResNet152 V2, MobileNet V2 and Inception V3 deep learning architectures. The workflow diagram of the architectures is given in Fig. 5.7. The data set was split into 80% training data and 20% test data. The deep learning architectures were evaluated using the confusion matrix and accuracy as performance criteria, and the images were classified as masked or unmasked.
Fig. 5.7 The workflow diagram of the architectures
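The 80/20 split described above can be sketched in plain Python. The function name, the fixed seed and the use of integer indices in place of image files are assumptions for illustration; the study does not state which tool performed the split. Rounding the test share up is also an assumption, chosen here because it reproduces the 779 test images reported in the findings.

```python
import math
import random


def split_dataset(items, test_fraction=0.2, seed=0):
    """Shuffle the items and split them into (train, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split repeatable
    n_test = math.ceil(len(items) * test_fraction)
    return items[n_test:], items[:n_test]


# 3891 image indices stand in for the face images of the study.
train, test = split_dataset(range(3891))
```

With 3891 images this yields 3112 training images and 779 test images.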
5.3.2.3 User Interface (UI) Design
A user interface (UI) was needed so that users could view the program output after the images were collected in real time and classified with a deep learning architecture. The UI displays the camera images in real time and assigns each person to one of two classes, masked or unmasked. In addition, telemetry data such as the position, altitude and battery percentage of the unmanned aerial vehicle are displayed on the UI. The UI is shown in Fig. 5.8.
Fig. 5.8 Visual of user interface (UI)
5.4 Research Findings
In the study, the images obtained from the unmanned aerial vehicle with a Wi-Fi camera were classified using the MobileNet V2, ResNet152 V2 and Inception V3 deep learning architectures. For each architecture, 779 images belonging to two classes were set aside as the test data set. MobileNet V2 predicted 777 of the test images correctly and two incorrectly, giving an accuracy of 99.74%. The confusion matrix of the MobileNet V2 architecture is given in Fig. 5.9. ResNet152 V2 predicted 758 of the test images correctly and 21 incorrectly, and an accuracy of 96.02% was reported. The confusion matrix of the ResNet152 V2 architecture is given in Fig. 5.10. Inception V3 predicted 772 of the test images correctly and 7 incorrectly, giving an accuracy of 99.10%. The confusion matrix of the Inception V3 architecture is given in Fig. 5.11. The accuracy rates obtained with the deep learning architectures are given in Table 5.2. All three architectures achieved more than 95% accuracy on the data set used in the study. Among them, MobileNet V2 was the most successful, with an accuracy of 99.74%.
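The accuracy figures follow from the counts of correct predictions. The sketch below recomputes them, assuming accuracy is the share of correctly classified test images; the function name is hypothetical. The MobileNet V2 and Inception V3 counts reproduce the reported 99.74% and 99.10%. For ResNet152 V2, the counts as printed (758 of 779) evaluate to about 97.30%, which differs from the 96.02% reported in the text and in Table 5.2; the sketch does not attempt to resolve that difference.

```python
def accuracy_percent(correct, total):
    """Accuracy in percent: correctly classified images over all test images."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Correct predictions on the 779 test images, as stated in the text.
correct_counts = {"MobileNet V2": 777, "Inception V3": 772}

for name, correct in correct_counts.items():
    print(f"{name}: {accuracy_percent(correct, 779):.2f}%")
```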
Fig. 5.9 The confusion matrix obtained with the MobileNet V2 architecture
Fig. 5.10 The confusion matrix obtained with the ResNet152 V2 architecture
Fig. 5.11 The confusion matrix obtained with the Inception V3 architecture

Table 5.2 The accuracy rates of the deep learning architectures
Deep learning architecture    Accuracy (%)
MobileNet V2                  99.74
ResNet152 V2                  96.02
Inception V3                  99.10
5.5 Conclusion
The COVID-19 pandemic, which emerged in Wuhan, in the Hubei province of China, in 2019, affected the whole world. One of the important measures for limiting the pandemic is the use of masks. In this study, real-time mask detection based on artificial intelligence was carried out using an unmanned aerial vehicle with a renewable energy system. Facial images taken from a patrolling unmanned aerial vehicle were classified with deep learning architectures, and successful results were obtained. First, the unmanned aerial vehicle was designed and solar panels were integrated into the system to benefit from renewable energy. With the three solar panels, the flight time of the unmanned aerial vehicle increased by 35–42%. This increase shows that renewable energy sources can be used efficiently in unmanned aerial vehicles. Second, the 3891 images obtained from the system were classified with the MobileNet V2, ResNet152 V2 and Inception V3 architectures, and the results were discussed. An accuracy of 99.74% was achieved with MobileNet V2, 96.02% with ResNet152 V2 and 99.10% with Inception V3. The best result on the 779 test images was therefore obtained with MobileNet V2. The MobileNet V2 architecture was integrated into a user interface, the operation of the system was tested, and whether people were wearing masks was detected in real time. In future studies, the aim is to develop similar unmanned aerial vehicle applications to detect fires, find people wanted by the security forces, and identify improperly parked vehicles in traffic.
Ethical Approval Ethics committee permission to take the images from 16 people was obtained from the Isparta University of Applied Sciences Scientific Research and Publication Ethics Institution, with decision number 65/02.
References
1. https://covid19.saglik.gov.tr/TR-66494/pandemi.html
2. Alicilar HE, Meltem ÇÖL (2020) Yeni Koronavirüs Salgını: Korunmada Etkili Yaklaşımlar
3. Filiz T (2020) Covid-19 Pandemi Sürecinde Yetişkinler Arasında Yüz Maskesi Kullanma Pratiği ve Tekniği Üzerine Değerlendirme ve Öneriler. Halk Sağlığı Hemşireliği Dergisi 2(2):52–56
4. Şermet Kaya Ş (2020) Yeni Koronavirüs Enfeksiyonu'nun (COVID-19) Yönetiminde Toplum Temelli Önleyici Girişimler Ve Önemi. J Int Soc Res 13(75)
5. https://www.cdc.gov/Coronavirus/2019-Ncov/Community/Community-Mitigation.Html
6. Eikenberry SE, Mancuso M, Iboi E, Phan T, Eikenberry K, Kuang Y, Gumel AB (2020) To mask or not to mask: modeling the potential for face mask use by the general public to curtail the COVID-19 pandemic. Infectious Disease Model 5:293–308
7. McKibbin W, Fernando R (2020) COVID-19'un ekonomik etkisi. COVID-19 Zamanında Ekonomi 45:10.1162
8. https://www.icisleri.gov.tr/koronavirus-denetimlerinde-yeni-donem
9. Akgül İ, Kaya V, Baran A (2021) Koronavirüse Karşı Yüz Maskesi Tespitinin Derin Öğrenme Yöntemleri Kullanılarak İncelenmesi
10. Yilmaz ÖÜA, Yayin K (2021) Derin Öğrenme. Kodlab Yayin Dağitim Yazilim Ltd. Şti
11. Özsoy K, Aksoy B (2021) Real-time data analysis with artificial intelligence in parts manufactured by FDM printer using image processing method. J Testing Evaluat 50(1)
12. Aalami N (2020) Derin öğrenme yöntemlerini kullanarak görüntülerin analizi. Eskişehir Türk Dünyası Uygulama ve Araştırma Merkezi Bilişim Dergisi 1(1):17–20
13. Coşkun H, Yiğit T (2018) Kalp Seslerinin Sınıflandırılmasında Yapay Zeka Uygulamaları. Biyomedikal Mühendislik Sorunlarini Çözmek Için Doğadan Ilhamlanan Akilli Teknikler 146–183
14. Ertin OB (2016) Sabit kanatlı bir insansız hava aracı için otopilot sistemi geliştirmede döngüde donanım tabanlı yaklaşım (Master's thesis)
15. Bozkurt AU (2008) Yenilenebilir enerji kaynaklarının enerji verimliliği açısından değerlendirilmesi (Doctoral dissertation, DEÜ Sosyal Bilimleri Enstitüsü)
16. Candan K, Çiçek MB, Saygı YE, Kaynak Ü (2021) Güneş Enerjili Insansiz Hava Araci Geliştirilmesi
17. Boz OH (2011) Günümüzün alternatif enerji kaynağı: Fotovoltaik güneş pilleri (Master's thesis, Balıkesir Üniversitesi Fen Bilimleri Enstitüsü)
18. Bingöl O, Özkaya B, Paçaci S (2017) Comparison of fuzzy logic and perturb&observe control in maximum power point tracking for photovoltaic system using buck converter. Mugla J Sci Technol 1:51–57
19. Gürler S, Kabak A, Uç MA, Yildiz F (2017) Makina Mühendisliği Bölümü
20. Uyanık O (2021) Pandemi ile mücadele görüntü işleme teknikleri yardımıyla maske kontrolü
21. Nakir İ (2007) Fotovoltaik güneş panellerinde GTS ve MGTS kullanarak verimliliğin arttırılması
22. Baydilli YY (2021) Polen Taşıyan Bal Arılarının MobileNetV2 Mimarisi ile Sınıflandırılması. Avrupa Bilim ve Teknoloji Dergisi 21:527–533
23. Singh B, Toshniwal D, Allur SK (2019) Shunt connection: an intelligent skipping of contiguous blocks for optimizing MobileNet-V2. Neural Netw 118:192–203
24. Toğaçar M, Cömert Z, Ergen B (2021) Intelligent skin cancer detection applying autoencoder, MobileNetV2 and spiking neural networks. Chaos, Solitons Fractals 144:110714
25. Gavai NR, Jakhade YA, Tribhuvan SA, Bhattad R (2017) MobileNets for flower classification using TensorFlow. In: 2017 international conference on big data, IoT and data science (BID), December, IEEE, pp 154–158
26. Nagrath P, Jain R, Madan A, Arora R, Kataria P, Hemanth J (2021) SSDMNV2: a real time DNN-based face mask detection system using single shot multibox detector and MobileNetV2. Sustain Cities Soc 66:102692
27. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
28. McNeely-White D, Beveridge JR, Draper BA (2020) Inception and ResNet features are (almost) equivalent. Cogn Syst Res 59:312–318
29. Tan Z (2019) Derin öğrenme yardımıyla araç sınıflandırma (Master's thesis, Fırat Üniversitesi, Fen Bilimleri Enstitüsü)
30. Suresh AJ, Visumathi J (2020) Inception ResNet deep transfer learning model for human action recognition using LSTM. Materials Today: Proc
31. Dong N, Zhao L, Wu CH, Chang JF (2020) Inception v3 based cervical cell classification combined with artificially extracted features. Appl Soft Comput 93:106311
Chapter 6
Investigation of Effect of Wrapping Length on the Flexural Properties of Wooden Material in Reinforcement with Aramid FRP
Şemsettin Kilinçarslan, Yasemin Şimşek Türker, and Nabi Ibadov

6.1 Introduction

Sustainability is a concept that has long been used in the management of forest resources [1–3]. The concept, understood as managing resources without degrading their qualities, originated in the forestry sector and was later adapted to other sectors. Wood is an organic building material that is used extensively in buildings and whose raw material source is forests. Even in a structure whose entire load-bearing system is concrete or steel, wooden materials (roof, concrete formwork, cladding, stairs, furniture, doors, windows, etc.) cannot be avoided [4]. The production of wooden structures generates very little waste, and that waste can be used in other applications [5–7]. Wood is an environmentally friendly material whose production consumes little energy and causes little air and water pollution, and it retains this quality after its useful life [8]. At the end of its life, even if it is not recycled, it can be returned to nature without harming the environment or used as a biofuel.

Ş. Kilinçarslan (B) · Y. Şimşek Türker Civil Engineering Department, Engineering Faculty, Suleyman Demirel University, Isparta, Turkey
N. Ibadov Faculty of Civil Engineering, Warsaw University of Technology, Armii Ludowej 16, 00-637 Warsaw, Poland
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_6
Because FRP materials have a low risk of corrosion and superior mechanical properties, historical texture can be carried into the future with minimal intervention and maximum durability [9]. FRP is a good solution for repairing damage in historical wooden buildings, especially in wooden joints. In some structures, damaged beam elements can be removed and replaced with solid or laminated wooden beams reinforced with FRP [10]. Pupsys et al. [11] investigated oak beams with dimensions of 145 × 145 × 2450 mm strengthened with glass fiber reinforced polymer sheets and determined that the reinforcement improved the bending properties of the beams. Yashida et al. [12] examined laminated rubber wood (H. brasiliensis) beams reinforced with carbon and glass fiber reinforced polymers placed between different layers and reported that the reinforcement improved the bending properties of the laminated beams. Muratoğlu [13] investigated Eastern beech (Fagus orientalis L.) and Scots pine (Pinus sylvestris) beams strengthened with carbon fiber reinforced polymers and determined that the reinforced samples had 108.66% higher bending strength. Kılınçarslan and Şimşek Türker [7] examined glulam column-beam joints strengthened with carbon FRP and reported that the load carrying capacity, energy consumption, and stiffness of the strengthened specimens increased. They also found that reinforcement with carbon-based FRP fabric increased the strength and durability of the column-beam joints. Wang et al. [14] investigated the bending properties of solid fir (Pseudotsuga menziesii Mirb.) beams reinforced with various fiber-reinforced polymer fabric composites (flax, basalt, E-glass FRP, hybrid FRP) and determined that the fiber-reinforced polymers improved the bending properties of the wood.
This study investigates the effect of fiber reinforced polymer material and reinforcement length on the bending properties of Iroko wood.
6.2 Material and Method

This study uses Iroko (C. excelsa) wood, a species widely used in the production of wood composites, especially for structural purposes. The Iroko beam samples were supplied by Nasreddin Forest Products (Naswood) Ltd. in the Antalya region. The beams were manufactured from smooth, knot-free, flawless timber with dimensions of 20 × 20 × 360 mm. The samples were kept in a conditioning cabinet at a temperature of 20 ± 2 °C until their moisture content reached approximately 12 ± 0.5%, after which the wrapping process was started. Aramid-based fiber reinforced polymer was used for reinforcement, and the wrapping was applied in two layers. The properties of the aramid-based FRP material are given in Table 6.1. To determine the effect of reinforcement length, reinforcements of 50 mm (I-FRP-50), 150 mm (I-FRP-150), and 300 mm (I-FRP-300) were applied. A total of four sample series were tested: the three wrapped series and the unwrapped reference (I-R) beams (Table 6.2).
Table 6.1 Technical properties of aramid based FRP fabric

| Material properties | Aramid |
|---|---|
| Weight (g/m²) | 300 |
| Modulus of elasticity (GPa) | 100 |
| Tensile strength (N/mm²) | 3300 |
| Width (mm) | 0.170 |
Table 6.2 Characteristics of the tested samples

| Sample no | Sample code | Winding length (mm) | Strengthening status |
|---|---|---|---|
| 1 | I-R | – | – |
| 2 | I-FRP-50 | 50 | + |
| 3 | I-FRP-150 | 150 | + |
| 4 | I-FRP-300 | 300 | + |
Roll priming is performed to form a thin film layer (0.1–0.2 mm) with MasterBrace® P 3500, an epoxy-based primer developed for the MasterBrace® FRP system. After priming, MasterBrace® SAT 4500, an epoxy adhesive developed for the MasterBrace® FRP fibrous polymer system, is applied to the primed surfaces with a roller to a thickness of 1 mm. The wooden beams were wrapped with FRP composites as U-shaped reinforcement in three regions of the beam. Figures 6.1, 6.2, and 6.3 show schematic views of the reinforcement with 50, 150, and 300 mm long FRP fabric, respectively.

Fig. 6.1 50 mm long reinforcement application
Fig. 6.2 150 mm long reinforcement application
Fig. 6.3 300 mm long reinforcement application
After the epoxy adhesive is applied, fibrous polymer fabrics cut to size are immediately stretched in the direction of their fibers and adhered to the surface. The fabric is then pressed with a roller in the fiber direction so that the epoxy is absorbed into the fabric and no gap remains between the fabric and the surface. After the first layer is completed, the same operations are repeated to apply the second layer, which completes the wrapping process. The wrapped beams are stored for 1 week before the bending test. The flexural strength tests were performed on 20 × 20 × 360 mm specimens prepared according to TS 2474 (2005). The bending tester has a load cell capacity of 50 kN, and loading is applied at a constant speed of 6 mm/min until the specimen ruptures. The span between the support points is 300 mm. The device records the force at the moment of rupture (Pmax), from which the flexural strength and modulus of elasticity are determined.
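The chapter does not print the formulas used to convert the measured force into flexural strength and modulus of elasticity. As a hedged illustration, the sketch below assumes the usual three-point (center-load) bending relations for a rectangular section, σ = 3·Pmax·L / (2·b·h²) and E = ΔP·L³ / (4·b·h³·Δf). The function names and the load and deflection readings are hypothetical and are not data from the study; only the specimen geometry comes from the text.

```python
def flexural_strength(p_max_n, span_mm, width_mm, height_mm):
    """Three-point bending strength in N/mm^2 (assumed formula: 3PL / 2bh^2)."""
    return 3.0 * p_max_n * span_mm / (2.0 * width_mm * height_mm ** 2)


def modulus_of_elasticity(delta_p_n, delta_f_mm, span_mm, width_mm, height_mm):
    """Bending modulus in N/mm^2 from a load increment delta_p_n and the
    matching mid-span deflection increment delta_f_mm in the elastic range
    (assumed formula: dP * L^3 / (4 * b * h^3 * df))."""
    return delta_p_n * span_mm ** 3 / (4.0 * width_mm * height_mm ** 3 * delta_f_mm)


# Geometry from the text: 20 x 20 mm cross-section, 300 mm support span.
SPAN, WIDTH, HEIGHT = 300.0, 20.0, 20.0

# Hypothetical readings for illustration only, not values from the study.
strength = flexural_strength(2000.0, SPAN, WIDTH, HEIGHT)
modulus = modulus_of_elasticity(1000.0, 4.0, SPAN, WIDTH, HEIGHT)

print(f"flexural strength: {strength:.1f} N/mm^2")
print(f"modulus of elasticity: {modulus:.1f} N/mm^2")
```

If the standard prescribes a different loading arrangement (for example four-point bending), only the two formulas would need to change.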
6.3 Results

In this study, Iroko samples were reinforced with aramid-based FRP fabrics in lengths of 50, 150, and 300 mm. Reference and reinforced specimens were subjected to the flexural test. The load-displacement graphs of the tested beams are given in Fig. 6.4. Figure 6.4 shows that the maximum load carrying capacity and displacement of the Iroko beams reinforced with aramid-based FRP fabric are higher than those of the reference samples. Compared with the reference beams, the beams reinforced over a length of 300 mm showed a 35.34% increase in maximum displacement and a 37.33% increase in maximum load carrying capacity. The beams reinforced over 150 mm showed a 22.50% increase in maximum displacement and a 29.10% increase in maximum load carrying capacity. The smallest increases in load carrying capacity (8.92%) and displacement (4.24%) occurred in the samples reinforced over 50 mm.

Fig. 6.4 Load-displacement graph of samples (x-axis: displacement in mm; y-axis: load in N; series: I-R, I-FRP-50, I-FRP-150, I-FRP-300)

The flexural strength and modulus of elasticity of the samples are given in Figs. 6.5 and 6.6. Both values increase when the beams are reinforced with aramid-based polymer fabrics. Compared with the reference beams, the beams reinforced over 300 mm showed a 21.44% increase in flexural strength and a 17.03% increase in modulus of elasticity. The beams reinforced over 150 mm showed a 14.12% increase in flexural strength and a 15.36% increase in modulus of elasticity. The smallest increases in flexural strength (1.77%) and modulus of elasticity (10.96%) were observed in the samples reinforced over 50 mm.

Fig. 6.5 Flexural strength of the samples (y-axis: flexural strength in N/mm²; categories: I-R, I-FRP-50, I-FRP-150, I-FRP-300)

Fig. 6.6 Modulus of elasticity of the samples (y-axis: modulus of elasticity in N/mm²; categories: I-R, I-FRP-50, I-FRP-150, I-FRP-300)
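The percentages reported in this section are relative changes with respect to the reference (I-R) series. A minimal sketch of that arithmetic follows; the function name and the two input values are hypothetical, since the chapter reports only the resulting percentages and not the underlying means.

```python
def percent_increase(reference, reinforced):
    """Relative change of a reinforced value with respect to the reference, in percent."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return (reinforced - reference) / reference * 100.0


# Hypothetical mean values chosen only to illustrate the calculation.
reference_strength = 100.0
reinforced_strength = 121.44

print(f"{percent_increase(reference_strength, reinforced_strength):.2f} %")
```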
6.4 Conclusions

Wooden materials can deteriorate over time and lose their function because of environmental conditions and various pests (fungi, insects, etc.), so wood may lose its durability. In recent years, reinforcement with fiber reinforced polymers has been one of the methods used to increase the strength of these materials. In this study, aramid-based fiber reinforced polymers were applied in different reinforcement lengths, and the effects on the flexural strength, modulus of elasticity, and load carrying capacity of the wood were investigated. All three properties increased as the FRP length increased. The flexural strength and modulus of elasticity of the beams reinforced with 150 and 300 mm FRP lengths were close to each other, and both lengths were more effective than the 50 mm reinforcement. If the expected load is known from calculations, the results of this study can be used to choose the reinforcement length and reinforce the beams accordingly. In this way, material, labor, and time can be saved by wrapping a certain length instead of the whole beam. FRP fabric application is a solution that can be used to strengthen wood systems where high load carrying capacity is desired.

FRP applications extend the useful life of constructions. Building a new structure requires large amounts of energy and a significant initial investment, so strengthening and reusing an old structure is more sustainable than demolishing it and building a new one. Using FRP materials to extend the service life of an old structure is therefore a great advantage for the sustainable development of the construction industry. Before FRP materials were available, a lot of energy was consumed in producing the metal materials, such as aluminum and steel, that were used instead. The low density of FRP materials minimizes the need for heavy equipment, so fuel consumption and harmful emissions are lower during production, transportation, and application. When all these factors are considered over the life cycle of the entire structural system, using FRP composites in construction applications can promote sustainability in many ways.
References

1. Sahin HT, Arslan MB, Korkut S, Sahin C (2011) Colour changes of heat-treated woods of red-bud maple, European hophornbeam and oak. Color Res Appl 36(6):462–466
2. Sahin CK, Onay B (2020) Alternative wood species for playgrounds wood from fruit trees. Wood Res 65(1):149–160
3. Sahin C, Topay M, Var AA (2020) A study on some wood species for landscape applications: surface color, hardness and roughness changes at outdoor conditions. Wood Res 65(3):395–404
4. Uz A (2020) Housing production and housing building materials in the axis of sustainable development. Doctoral Thesis, Ankara University, Ankara
5. Kilincarslan S, Simsek Turker Y (2020) Evaluation in terms of sustainability of wood materials reinforced with FRP. J Tech Sci 10(1):22–30
6. Winandy JE (1994) Wood properties. Encycl Agric Sci 4:549–561
7. Kilincarslan S, Turker YS (2021) Experimental investigation of the rotational behaviour of glulam column-beam joints reinforced with fiber reinforced polymer composites. Compos Struct 262
8. Kettunen PO (2006) Wood: structure and properties. Trans Tech Publication, Uetikon-Zuerich
9. Bastianini F, Corradi M, Borri A, di Tommaso A (2005) Retrofit and monitoring of an historical building using "Smart" CFRP with embedded fibre optic Brillouin sensors. Constr Build Mater 19(7):525–535
10. Alsheghri A, Akgül T (2019) Investigation of used of carbon fiber reinforced polymer sheets in joints of wooden structures. Acad Platform J Eng Sci 7(3):406–413
11. Pupsys T, Corradi M, Borri A, Amess L (2017) Bending reinforcement of full-scale timber beams with mechanically attached GFRP composite plates. Department of Mechanical & Construction Engineering, Northumbria University, July 2017, United Kingdom. https://doi.org/10.4028/KEM.747.212
12. Yashida N, Praveen N, Mohammed A, Muhammed A (2016) Flexural stiffness and strength enhancement of horizontally glued laminated wood beams with GFRP and CFRP composite sheets. Department of Civil Engineering, College of Engineering Trivandrum, Thiruvananthapuram, Feb 2016, Kerala, India
13. Muratoğlu A (2011) Reinforcement of wood building components with carbon fiber reforced polymers (CFRP) in restoration: Karabük University, pp 1219–1240
14. Wang B, Bachtiar EV, Yan L, Kasal B, Fiore V (2019) Flax, basalt, E-glass FRP and their hybrid FRP strengthened wood beams: an experimental study. Polymers 11(8):1255. https://doi.org/10.3390/polym11081255
Chapter 7
Deep Learning-Based Air Defense System for Unmanned Aerial Vehicles
Bekir Aksoy, Mustafa Melikşah Özmen, Muzaffer Eylence, Seyit Ahmet İnan, and Kamala Eyyubova

7.1 Introduction

Unmanned aerial vehicle (UAV) systems were first developed as a product of military technology. As their benefits spread to the civilian field, their purposes and forms of use have diversified significantly. UAV systems now reach large audiences as a product marketed in the global economy and can undertake many tasks that make human life easier, from engineering applications [1, 2] to cargo transportation [3]. The fact that large numbers of people can use UAV systems for hobby purposes [4], aviation [5], and photography [6] on very small budgets shows how prevalent and accessible they have become, as well as their market success. As with many technological innovations in human history, this ease of access has brought a significant increase both in the use of UAV systems for the benefit of humanity and in their abuse by those who wish to threaten it. UAV systems have become indispensable weapons for terrorist organizations because they allow cheap, low-risk, high-impact actions that create surprise and help offset the asymmetry of power in their struggle against legal authorities that are superior to them.

B. Aksoy (B) · M. Melikşah Özmen · M. Eylence Faculty of Technology Mechatronics Engineering, Isparta University of Applied Sciences, Isparta 32200, Turkey
S. Ahmet İnan Suleyman Demirel University Rectorate, Isparta 32200, Turkey
K. Eyyubova Department of Information Technologies, Mingachevir State University, Mingachevir, Azerbaijan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_7
UAV systems give terrorist organizations access to airspace that is under the control of states as the legitimate political authority, and attacks against military targets and civilian populations have made them a security problem. The solutions proposed so far for this undeniable problem rely mainly on passive methods, and the fact that terrorist organizations keep achieving the results they want with the same methods and techniques shows that these solutions do not provide the desired level of security. For this reason, it would be beneficial to change perspective and develop alternative solutions to UAV-based security risks, whose severity varies over time. This would make it possible to reach more effective systems or to raise the security level by integrating the solutions that are developed. The literature contains studies on examining, detecting, and preventing the malicious use of UAVs. Jiang et al. created an anti-UAV system that follows the position and trajectory of UAVs [8]. Seidaliyeva et al. performed real-time drone detection, drawing attention to the illegal use of drones as their number increases. Their method first detects the object and then classifies it as bird, background, or drone. Their CNN-based model achieved a precision of 0.701, a sensitivity of 0.788, and an F1 score of 0.742 [9]. Unlu et al. proposed a drone detection and tracking system that first separates the moving object from the background, then calculates the general Fourier descriptor, and classifies the object as a bird or a drone with a neural network. They achieved a classification accuracy of up to 85.3% [10]. Unlu et al. also proposed a drone detection and tracking system mounted on a rotating turret. They observed that even tiny objects were detected with very low error rates by the lightweight version of the YOLOv3 architecture they used [11]. Lee et al. argued that drone-related accidents will increase as drones become widespread and that observers and drones should be made aware of an approaching drone. They designed a drone detection system that achieved 89% accuracy with a deep CNN [12].

The study aims to design a system that detects foreign or threatening UAV systems using image processing and deep learning and destroys them autonomously with a ground-based air defense system. It investigates whether such a system can eliminate the air defense vulnerability that arises because radar systems cannot easily detect UAVs, owing to their small radar footprints.
7.2 Material and Method

7.2.1 Material

In the study, a deep learning-based UAV destruction weapon was developed to defend against UAV attacks. A unique dataset of rotary-wing and fixed-wing UAVs was created for the weapon system. The dataset was used to train three deep learning architectures: MobileNetV2, InceptionV3, and Xception. The trained models were compared using performance evaluation criteria, and the model with the highest accuracy rate was used in the UAV destruction weapon.

7.2.1.1 Technical Specifications of UAVs Used for Data Set
Two types of UAVs, fixed-wing and rotary-wing, were used in the study. Fixed-wing UAVs fly using the pressure difference that the air stream creates over the wing during forward movement [13, 14]. They have high speed and low energy usage [15]. Rotary-wing UAVs, on the other hand, have vertical flight capability and can hover in the air, but they have high energy consumption [16]. Tables 7.1 and 7.2 give the technical specifications of the fixed-wing and rotary-wing UAVs used in the study, and Fig. 7.1 shows the rotary-wing UAV.

Table 7.1 Fixed wing UAV technical specifications

| Property | Value |
|---|---|
| Wingspan | 1718 mm |
| UAV length | 1100 mm |
| Flight weight | 2500–3000 g |
| Wing surface | 60 dm² |
| Engine | 3720 kV brushless engine |
| ESC | 70 A |
| Battery | 14.8 V 4 S 80,000 mAh |
| Thrust force | 3000 g |

Table 7.2 Rotary wing UAV technical specifications

| Property | Value |
|---|---|
| UAV dimensions | 30 cm × 30 cm × 5.5 cm |
| Flight distance | 150–200 m |
| Flight weight | 150–200 g |
| Engine | 45,000 rpm DC coreless engine |
| Battery | 3.7 V 1 S 1800 mAh |
| Flight speed | 20–30 km/h |

Fig. 7.1 Rotary wing UAV

7.2.1.2 Technical Specifications of the Designed UAV Destruction Weapon

The UAV destruction weapon used in the study is a semi-automatic electric gun that launches rubber bullets at a rate of fire of 12 rounds per minute. It is mounted on two servo motors and can move through 180° on the X and Y axes. A CMOS USB camera with HD resolution is mounted on the gun. The gun has a 6 V battery, and a 500 rpm DC motor accelerates the bullets. An Arduino Uno R3 microcontroller board is used as the motion and control board of the gun. The UAV destruction weapon is shown in Fig. 7.2.

Fig. 7.2 UAV destruction weapon
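The chapter does not describe how a detection in the camera image is converted into servo commands. The sketch below is one plausible approach under stated assumptions, not the authors' control code: the frame size (1280 × 720), the camera field of view (60° × 34°), the sign conventions, and the function names are all hypothetical. Because the camera sits on the gun, the target's offset from the image center is treated as a correction to the current pan and tilt angles, clamped to the 0–180° servo range mentioned in the text.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def pixel_to_servo_angles(cx, cy, frame_w=1280, frame_h=720,
                          fov_h_deg=60.0, fov_v_deg=34.0,
                          pan_deg=90.0, tilt_deg=90.0):
    """Return new (pan, tilt) servo angles that turn the camera axis toward
    the detected target center (cx, cy), given the current servo angles.
    All default values are illustrative assumptions."""
    # Offset of the target from the image center as a fraction of the frame.
    dx = (cx - frame_w / 2) / frame_w
    dy = (cy - frame_h / 2) / frame_h
    # Linear approximation: a fraction of the frame maps to the same
    # fraction of the field of view.
    new_pan = pan_deg + dx * fov_h_deg
    new_tilt = tilt_deg - dy * fov_v_deg  # image y grows downward
    return clamp(new_pan, 0.0, 180.0), clamp(new_tilt, 0.0, 180.0)


# A target at the image center needs no correction.
print(pixel_to_servo_angles(640, 360))
# A target at the right edge asks for half the horizontal field of view.
print(pixel_to_servo_angles(1280, 360))
```

On real hardware the resulting angles would be sent to the microcontroller, for example over a serial link; that part is omitted here because the chapter gives no details about the interface.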
7.2.1.3 Creating the Dataset
The dataset created in the study contains two classes: fixed-wing and rotary-wing UAVs. The images of the rotary-wing UAV class were obtained by photographing the UAV at different times of day, in different weather conditions, and under different lighting. The fixed-wing images were taken from the work of Özmen [17]. The dataset contains 1352 images of the rotary-wing UAV class and 1462 images of the fixed-wing UAV class. 80% of the images were used as training data and 20% were reserved as test data. Sample images from the dataset are given in Fig. 7.3.
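The 80/20 split can be sketched as follows. The class sizes come from the text; the per-class (stratified) splitting, the fixed random seed, the file names, and the function name are assumptions made for illustration, since the chapter does not say how the split was performed.

```python
import random


def split_class(items, test_fraction=0.2, seed=42):
    """Shuffle one class and return (train_items, test_items)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names; only the class sizes are taken from the text.
rotary = [f"rotary_{i:04d}.jpg" for i in range(1352)]
fixed = [f"fixed_{i:04d}.jpg" for i in range(1462)]

rotary_train, rotary_test = split_class(rotary)
fixed_train, fixed_test = split_class(fixed)

train_set = rotary_train + fixed_train
test_set = rotary_test + fixed_test

print(f"train: {len(train_set)} images, test: {len(test_set)} images")
```

Splitting each class separately keeps the class proportions of the training and test sets close to those of the full dataset.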
Fig. 7.3 Dataset of rotary wing (drone), fixed wing (UAV) classes
7.2.1.4 Deep Learning Architectures
Three architectures were used in the study: MobileNetV2, InceptionV3, and Xception. Brief information about each is given below.

MobileNetV2

MobileNetV2 builds on the MobileNetV1 architecture, which was designed for classification and image processing on devices with low system resources [18, 19]. By adopting the skip-connection technique of the ResNet architecture, MobileNetV2 achieves faster computation [20]. Figure 7.4 shows the architecture of the MobileNetV2 model.

Fig. 7.4 Architecture of the MobileNetV2 model [21]

InceptionV3

The InceptionV3 model, developed by Szegedy et al. in 2015, takes a 299 × 299 × 3 image as input [22]. It reaches a top-5 accuracy of 93.7% on the ImageNet dataset and has close to 24 million parameters [22, 23]. Figure 7.5 shows the architecture of the InceptionV3 model.

Fig. 7.5 Architecture of the InceptionV3 model [24]

Xception

The Xception architecture was created by developing and extending the InceptionV3 architecture [25]. It emerged by following the 1 × 1 convolution layer of the InceptionV3 modules with n × n convolutions applied separately to each channel, a structure known as depthwise separable convolution [26]. The architecture of the Xception model is shown in Fig. 7.6.
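Both MobileNetV2 and Xception rely on depthwise separable convolutions, which is a large part of why they are lighter than architectures built from standard convolutions. The sketch below compares the weight counts of the two layer types. The layer sizes (3 × 3 kernel, 32 input channels, 64 output channels) are illustrative and are not taken from either architecture, and bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights of a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights of a depthwise separable convolution (no bias):
    one kernel x kernel filter per input channel, then a 1 x 1 pointwise
    convolution that mixes the channels."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Illustrative layer size, not a layer from MobileNetV2 or Xception.
k, c_in, c_out = 3, 32, 64
standard = standard_conv_params(k, c_in, c_out)
separable = separable_conv_params(k, c_in, c_out)

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"reduction: {standard / separable:.1f}x")
```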
7.2.1.5 Performance Evaluation Criteria
The accuracy, precision, sensitivity (recall), and F1 score metrics, all derived from the confusion matrix, were used to compare the performance of the trained models and to determine the model that gives the most accurate results.
Fig. 7.6 Architecture of the Xception model [27]
Fig. 7.7 Example confusion matrix
Confusion Matrix

The confusion matrix is the most valuable evaluation measure used in deep learning networks [28], and the other evaluation criteria are derived from it. An example confusion matrix is given in Fig. 7.7. As shown in Fig. 7.7, TP is the positively labeled data that is estimated correctly, FN is the positively labeled data that is estimated incorrectly, FP is the negatively labeled data that is estimated incorrectly, and TN is the negatively labeled data that is classified correctly [29]. The accuracy value is the percentage of correctly classified data [30]. The formula for the accuracy value is shown in Eq. (7.1).
$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{7.1}$$
The precision value shows how many of the data predicted as positive are actually positive [31]. The formula for the precision value is shown in Eq. (7.2).

$$\text{Precision} = \frac{TP}{TP + FP} \tag{7.2}$$
The recall value shows how many of the data that are actually positive are predicted as positive [31]. The formula for the recall value is shown in Eq. (7.3).

$$\text{Recall} = \frac{TP}{TP + FN} \tag{7.3}$$
The F1 score value is the harmonic mean of the precision and recall values of the trained model [30]. The formula for the F1 score value is shown in Eq. (7.4).

$$\text{F1 Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{7.4}$$
7.2.2 Method

The workflow diagram of the study is given in Fig. 7.8. Images of fixed-wing and rotary-wing UAVs were collected and preprocessed. During preprocessing, images that did not show a UAV, were blurry, or had unrecognizable content were deleted, and the dataset was made suitable for the deep learning architectures. A total of 651 fixed-wing UAVs and 536 rotary-wing UAVs were obtained.
7.3 Research Findings

The 2814 images used in the study were used to train the MobileNetV2, Xception, and InceptionV3 deep learning models. The training and performance evaluation results of the models are given below.
Fig. 7.8 Workflow diagram
7.3.1 MobileNetV2 Training Results

Training the MobileNetV2 deep learning model yielded an accuracy rate of 96%. The training phase lasted 10 min 24 s. Figure 7.9 shows the loss/accuracy graph of the MobileNetV2 model during the training phase. Table 7.3 gives the accuracy, precision, recall, and F1 score results of the MobileNetV2 model, and Fig. 7.10 shows its confusion matrix.
Fig. 7.9 MobileNetV2 model training phase loss/accuracy graph
Table 7.3 MobileNetV2 training results

| Class | Precision | Recall | F1 score |
|---|---|---|---|
| UAV | 0.98 | 0.93 | 0.96 |
| Drone | 0.95 | 0.98 | 0.97 |
| Accuracy | | | 0.96 |
Fig. 7.10 MobileNetV2 deep learning model confusion matrix
In the confusion matrix in Fig. 7.10, label 0 denotes the rotary-wing (drone) class and label 1 denotes the fixed-wing (UAV) class.
7.3.2 Xception Training Results

Training the Xception deep learning model yielded an accuracy rate of 95%. The training phase lasted 20 min 51 s. Figure 7.11 shows the loss/accuracy graph of the Xception model during the training phase. Table 7.4 gives the accuracy, precision, recall, and F1 score results of the Xception model, and Fig. 7.12 shows its confusion matrix. In the confusion matrix in Fig. 7.12, label 0 denotes the rotary-wing (drone) class and label 1 denotes the fixed-wing (UAV) class.
Fig. 7.11 Xception model training phase loss/accuracy graph

Table 7.4 Xception training results

| Class | Precision | Recall | F1 score |
|---|---|---|---|
| UAV | 0.93 | 0.96 | 0.95 |
| Drone | 0.96 | 0.94 | 0.95 |
| Accuracy | | | 0.95 |

Fig. 7.12 Xception deep learning model confusion matrix
Fig. 7.13 Inception model training phase loss/accuracy graph
Table 7.5 Inception training results

| Class | Precision | Recall | F1 score |
|---|---|---|---|
| UAV | 0.91 | 0.95 | 0.93 |
| Drone | 0.95 | 0.92 | 0.94 |
| Accuracy | | | 0.93 |
7.3.3 InceptionV3 Training Results

Training the InceptionV3 deep learning model yielded an accuracy rate of 93%. The training phase lasted 16 min 35 s. Figure 7.13 shows the loss/accuracy graph of the InceptionV3 model during the training phase. Table 7.5 gives the accuracy, precision, recall, and F1 score results of the InceptionV3 model, and Fig. 7.14 shows its confusion matrix. In the confusion matrix in Fig. 7.14, label 0 denotes the rotary-wing (drone) class and label 1 denotes the fixed-wing (UAV) class.
Fig. 7.14 Inception deep learning model confusion matrix

7.4 Results

The study aimed to develop an air defense system that can detect and destroy intruding elements by applying deep learning-based image processing to designated regions, in response to the problem of protecting airspace, whose importance grows daily as the risk and threat environment diversifies. To this end, the MobileNetV2, Xception, and InceptionV3 deep learning models were used and their outputs were compared. The MobileNetV2 model, which had the highest accuracy rate and the shortest response time, was found to be suitable for real-time image processing.
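The comparison can be reproduced from the figures reported in Sects. 7.3.1 to 7.3.3. In the sketch below the accuracy rates and training durations are taken from the text, while the selection rule (highest accuracy first, shorter training time as the tie-breaker) and the variable names are assumptions made for illustration.

```python
# Accuracy rates and training durations as reported in the chapter.
results = {
    "MobileNetV2": {"accuracy": 0.96, "train_seconds": 10 * 60 + 24},
    "Xception": {"accuracy": 0.95, "train_seconds": 20 * 60 + 51},
    "InceptionV3": {"accuracy": 0.93, "train_seconds": 16 * 60 + 35},
}

# Assumed rule: prefer higher accuracy, then shorter training time.
best_model = max(
    results,
    key=lambda name: (results[name]["accuracy"], -results[name]["train_seconds"]),
)

for name, r in results.items():
    print(f"{name}: accuracy={r['accuracy']:.2f}, training={r['train_seconds']} s")
print(f"selected model: {best_model}")
```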
References

1. Alptekin A, Yakar M (2020) Heyelan bölgesinin İHA kullanarak modellenmesi. Türkiye İnsansız Hava Araçları Dergisi 2(1):17–21
2. Ziba HE, Yilmaz HM (2019) Karayolu projeleri için İHA ile şeritvari harita üretimi. Türkiye İnsansız Hava Araçları Dergisi 1(1):23–32
3. Belge E, Altan A, Hacioğlu R UYARLAMALI BULANIK MANTIK DENETLEYİCİ TABANLI İNSANSIZ HAVA ARACI (İHA)'NIN ROTA TAKİBİ VE FAYDALI YÜK TAŞIMA PERFORMANSI. Mühendislik Bilimleri ve Tasarım Dergisi 9(1):116–125
4. Yetiş H, Güngör Z, Karaköse M Araç-İHA İşbirliği ile Kargo Teslimatları İçin Ortak Rota Optimizasyonu. Fırat Üniversitesi Fen Bilimleri Dergisi 33(2):135–144
5. Akan S, Bayram İ, Çam Y, Kacar H, Aydin ÖGDM, Ödevi B İNSANSIZ HAVA ARAÇLARININ SİVİL HAVACILIKTA KULLANIMI
6. Yakar M, Toprak AS, Ulvi A, Uysal M (2015) KONYA BEYŞEHİR BEZARİYE HANININ (BEDESTEN) İHA İLE FOTOGRAMETRİK TEKNİK KULLANILARAK ÜÇ BOYUTLU MODELLENMESİ. Türkiye Harita Bilimsel ve Teknik Kurultayı, 25–28 Mart 2015, Ankara
7. Sezer Çoban TO (2017) İNSANSIZ HAVA ARAÇLARININ HUKUKİ VE ETİK BOYUTU. BİLDİRİ KİTABI 71
8. Jiang N, Wang K, Peng X, Yu X, Wang Q, Xing J, Han Z et al (2021) Anti-UAV: a significant multi-modal benchmark for UAV tracking. arXiv preprint arXiv:2101.08466
82
7 Deep Learning-Based Air Defense System for Unmanned Aerial …
9. Seidaliyeva U, Akhmetov D, Ilipbayeva L, Matson ET (2020) Real-time and accurate drone detection in a video with a static background. Sensors 20(14):3856. 10. Unlu E, Zenou E, Rivière N Using shape descriptors for UAV detection. In Electronic imaging, Burlingam, CA, USA 10. Unlu E, Zenou E, Rivière N Using shape descriptors for UAV detection. In Electronic imaging, Burlingam, CA, USA 11. Unlu E, Zenou E, Riviere N, Dupouy PE (2019) Deep learning-based strategies for the detection and tracking of drones using several cameras. IPSJ Trans Comput Vis Appl 11(1):1–13 12. Lee D, La WG, Kim H (2018, October) Drone detection and identification system using artificial intelligence. In: 2018 international conference on information and communication technology convergence (ICTC). IEEE, pp 1131–1133 13. Güçlü A, Kurtulu¸s DF, Arikan KB SAB˙IT VE DÖNER KANATLI HAVA ARACININ YÖNEL˙IM D˙INAM˙IKLERN˙IN H˙IBR˙IT DENET˙IM˙I 14. Euston M, Coote P, Mahony R, Kim J, Hamel T (2008, September) A complimentary filter for attitude estimation of a fixed-wing UAV. In: 2008 IEEE/RSJ international conference on intelligent robots and systems. IEEE, pp 340–345 15. Kohno S, Uchiyama K (2014, May) Design of robust controller of fixed-wing UAV for transition flight. In: 2014 international conference on unmanned aircraft systems (ICUAS). IEEE, pp 1111–1116 16. Çakıcı F, Yavrucuk ˙I, Shabanı R, Leblebicio˘glu MK (2016) Görev amaçlı döner kanat ˙IHA tasarımı 17. Özmen MM (2021) Derin ö˘grenmeye dayalı insansız hava araçlarına özgü dost-dü¸sman tanımlama sistemi için örnek bir uygulama 18. To˘gaçar M, Cömert Z, Ergen B (2021) Intelligent skin cancer detection applying autoencoder, MobileNetV2, and spiking neural networks. Chaos, Solitons Fractals 144:110714 19. Liu J, Wang X (2020) Early recognition of tomato gray leaf spot disease based on the MobileNetv2-YOLOv3 model. Plant Methods 16:1–16 20. Baydilli YY (2021) Polen Ta¸sıyan Bal Arılarının MobileNetV2 Mimarisi ile Sınıflandırılması. 
Avrupa Bilim ve Teknoloji Dergisi (21):527–533 21. Nagrath P, Jain R, Madan A, Arora R, Kataria P, Hemanth J (2021) SSDMNV2: a realtime DNN-based face mask detection system using a single shot multibox detector and MobileNetV2. Sustain Cities Soc 66:102692 22. Güldemir NH, Alkan A Derin Ö˘grenme ile Optik Koherens Tomografi Görüntülerinin Sınıflandırılması. Fırat Üniversitesi Mühendislik Bilimleri Dergisi 33(2):607–615 23. Bozkurt F Derin Ö˘grenme Tekniklerini Kullanarak Akci˘ger X-Ray Görüntülerinden COVID-19 Tespiti. Avrupa Bilim ve Teknoloji Dergisi (24):149–156 24. https://cloud.google.com/tpu/docs/inception-v3-advanced 25. Dandil E, Serin Z (2020) Derin Sinir A˘gları Kullanarak Histopatolojik Görüntülerde Meme Kanseri Tespiti. Avrupa Bilim ve Teknoloji Dergisi 451–463. 26. Serin Z (2020) Meme kanserinin histopatolojik görüntüler üzerinde derin sinir a˘gları kullanılarak bilgisayar destekli otomatik tespiti. Master’s thesis, Bilecik Seyh ¸ Edebali Üniversitesi, Fen Bilimleri Enstitüsü 26. Serin Z (2020) Meme kanserinin histopatolojik görüntüler üzerinde derin sinir a˘gları kullanılarak bilgisayar destekli otomatik tespiti. Master’s thesis, Bilecik Seyh ¸ Edebali Üniversitesi, Fen Bilimleri Enstitüsü 27. Chen B, Ju X, Xiao B, Ding W, Zheng Y, de Albuquerque VHC (2021) Locally GAN-generated face detection based on an improved Xception. Inf Sci 572:16–28 28. Çinar A, Yildirim M (2020) Detection of tumors on brain MRI images using the hybrid convolutional neural network architecture. Medical hypotheses, 139:109684. 29. Patro VM, Patra MR (2014) Augmenting weighted average with confusion matrix to enhance classification accuracy. Trans Mach Learn Artif Intell 2(4), 77–91 29. Patro VM, Patra MR (2014) Augmenting weighted average with confusion matrix to enhance classification accuracy. Trans Mach Learn Artif Intell 2(4), 77–91 30. 
Chicco D, Jurman G (2020) The Matthews correlation coefficient (MCC) advantages over the F1 score and accuracy in binary classification evaluation. BMC Gen 21(1):1–13
References
83
31. Alan MA (2012) Veri madencili˘gi ve lisansüstü ö˘grenci verileri üzerine bir uygulama. Dumlupınar Üniversitesi Sosyal Bilimler Dergisi (33)
Chapter 8
Strategic Framework for ANFIS and BIM Use on Risk Management at Natural Gas Pipeline Project
İsmail Altunhan, Mehmet Sakin, Ümran Kaya, and M. Fatih AK
İ. Altunhan · M. Sakin
Civil Engineering Department, Hasan Kalyoncu University, Gaziantep, Turkey

Ü. Kaya (corresponding author) · M. Fatih AK
Industrial Engineering Department, Antalya Bilim University, Antalya, Turkey

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_8

8.1 Introduction

In today's rapidly changing, developing and digitalizing world, increasing uncertainties and risks play an active role in the decision mechanisms of companies. In this environment, companies need to understand and apply risk management well in order to determine their future strategies and achieve their goals. For a project to reach its goals, risks and their possible consequences must be seen early and accurately, their effects evaluated, and the necessary measures taken and followed up through effective project planning and control systems, so that all these risks are reduced and kept under control. A risk management capability is needed that reduces the potentially harmful consequences that hinder project success. Keeping the risk level in construction projects under control and at a minimum, taking the right precautions against different risks, and preventing risks from being reflected in project performance measures by reducing or eliminating their effects can solve some problems of the oil and natural gas industry and contribute to its development.

In addition to the risks created by uncertainties, the systematic evaluation of risks in the oil and natural gas industry, which has many risk factors due to its dynamic structure, and the determination of strategies and targets through participation in decision processes are of vital importance for companies operating in this sector. Project risk management has yet to be studied extensively, and few studies have been conducted in the oil and gas sector; this review therefore provides risk factors for oil and gas projects. Although risk management is a very important issue for companies operating in the oil and natural gas industry, it has not yet gained clarity and prevalence in our country. Considering this situation, the subject of this thesis study was determined as "Strategic Framework for ANFIS and BIM Use on Risk Management at Natural Gas Pipeline Project". The framework can also be adapted to the oil and gas industry in general. Within this scope, the aims of the thesis are as follows:

• Defining the concept of risk in general and risks in oil and gas projects,
• Addressing new methods and methodologies used in risk management and analysis,
• Determining the risk management approaches in the oil and natural gas sector, examining the current situation and making forward-looking predictions and visualizations.

For these purposes, the literature research method and the case study method were used in the thesis. The literature research served to define the risks seen in projects and to explain the stages and methods of risk management. The case study method was used to study the risks seen in the TANAP project with an artificial intelligence-based application and to make up-to-date determinations on risk management and assessment. In this study, risk management and its reflection in projects are examined first; then a risk management model built with an adaptive network-based fuzzy inference system (ANFIS) approach for the oil and natural gas sector is presented.

The struggle with risk stems from the nature of people and of the businesses or projects they establish. People and organizations that feel uncertain as a result of perceiving risk react to risks with the motive of getting rid of the fear and doubt they fall into. The drive to remove fear and doubt by eliminating uncertainty, and the desire to know the final outcome of each bad scenario, have led to the development of risk avoidance methods and the implementation of new solutions. Following this logic, ANFIS was used for risk grading in risk management and analysis, and the results were then transferred to BIM 3D-4D with visual and linguistic expressions.

Gadd et al. [1] state that the scale of a risk can generally be evaluated by two main risk parameters, Risk Probability (RL) and Risk Severity (RS). In this study, two further parameters taken from the reference articles are defined in addition to these two. Based on assessing the severity of the risk over the resulting four parameters, the risk parameters defined according to PMI are revised in light of expert opinion, and their accuracy is confirmed by applying ANFIS. Measuring these factors with traditional methods is very difficult and, for large-scale projects, raises the margin of error considerably.
8.2 Literature

Because of the requirements of large-scale projects, risk management depends on the opinions of individuals or of a dedicated risk expert team. The fuzzy inference technique and its derivatives are very useful for solving complex problems and systems that are not well defined. Tah and Carr [2] applied fuzzy logic in the risk assessment processes of construction projects, evaluating the project management, risk management and cost/time performance parameters of large-scale projects using fuzzy set logic. Dikmen et al. [3] studied a fuzzy methodology for risk assessment in international construction projects. Their methodology uses influence diagrams to build the risk model and a fuzzy risk assessment approach to estimate risk ratings based on cost overrun. Lee and Lin [4] introduced a new method for fuzzy risk assessment that works directly with fuzzy numbers instead of linguistic variables when assessing risks.

Artificial intelligence techniques can have broad applications in risk management. Neural networks and fuzzy modeling have been integrated into advanced systems with good results. The most basic feature that distinguishes fuzzy inference methodology from other risk assessment methods is its use of comparative learning ability in response to risk. The first study on neural networks in the risk literature was by McKim [5], who used a neural network to identify risk. Wang and Elhag [6] presented an application in which back-propagation neural networks were developed to model bridge risk scores and risk categories. Wang and Elhag [7] developed an adaptive neuro-fuzzy inference system (ANFIS) for bridge risk assessment. Wenxi and Danyang [8] used a fuzzy neural network for the risk assessment of road companies in China. Their model aimed to simplify decision making and the search for an optimal solution for managers.

The remainder of the chapter is structured as follows: Sect. 8.2 reviews the literature, Sect. 8.3 describes the methodology, Sect. 8.4 presents the results obtained with the proposed methodology, and Sect. 8.5 concludes the study. The studies reviewed are listed in Tables 8.1, 8.2 and 8.3.
8.3 Materials and Methods

This section describes how the raw data received from the company were reprocessed and in which formats they were converted into fuzzy numbers. Editing and processing the data ensures that no problems occur in practice. The machine learning methods applied, implemented in MATLAB or Python, are then explained. The research methodology followed the CIFE "Horseshoe" method (Kunz and Fischer [9]) for transitional research (Fig. 8.1). In brief, Fuzzy Set Theory and Artificial Intelligence methods are frequently used to reduce uncertainty and to turn human evaluations into meaningful results.
Table 8.1 Literature research: AI and RM

| Year | Author(s) | Application | Method(s) | Project |
|---|---|---|---|---|
| 2010 | Jin | Risk management | NFDSS | PPP infrastructure projects |
| 2011 | Lin and Jianping | Risk assessment | FANP | New campus construction |
| 2011 | Eybpoosh et al. | Risk management | SEM | Construction projects |
| 2012 | Fang and Marle | Complexity, risk network | DSS and MCS | Construction projects |
| 2012 | Abdelgawad and Fayek | Risk management | FMEA and FST | Construction projects |
| 2012 | Fouladgar et al. | Risk evaluation | FTOPSIS | Tunneling project |
| 2012 | Fang et al. | Risk analysis | Network theory | Engineering project |
| 2013 | Khakzad et al. | Quantitative risk analysis | OBN | Offshore drilling operations |
| 2013 | Kardes et al. | Complexity and risk management | IRMA | Mega-projects |
| 2013 | Cárdenas et al. | Risk modeling | BBNs | Construction projects |
| 2013 | Kuo and Lu | Risk assessment | FMCDM | Metropolitan construction projects |
In the thesis, an approach based on the Adaptive Network-Based Fuzzy Inference System (ANFIS) method was developed for risk management and assessment. The system was then transferred to 3D images with color codes and linguistic expressions, and risk maps were created. The purpose of applying a neuro-fuzzy technique in risk management and assessment is not only to exploit the learning ability of neural networks but also to offer a quantitative treatment of human knowledge and causal mechanisms. The aim is to combine fuzzy logic and artificial neural networks, making use of features such as nonlinear mapping from numerical data, learning ability and parallel operation, and to make the risks under study more understandable by combining machine learning with visualization.
8.3.1 Artificial Neural Networks (ANN)

Artificial neural networks (ANNs) are algorithms developed to reproduce, without external assistance, abilities characteristic of the human brain, such as deriving, creating and discovering new information through learning. The human brain can be thought of as a perfect computer that runs very fast. Artificial neural networks were developed with inspiration from the biological nervous system. The performance of biological neural networks is too high to be underestimated, and they are capable of processing complex events. Artificial neural networks aim to bring this ability to computers. It is therefore useful to touch briefly on the biological nervous system (Fig. 8.2).

Table 8.2 Literature research: BIM and RM

| Year | Author(s) | Functionality | Benefits for risk management |
|---|---|---|---|
| 2008 | Hartmann et al. | 3D visualization | Facilitating early risk identification |
| 2008 | Hartmann et al. | 4D construction scheduling/planning | Facilitating early risk identification and risk |
| 2008 | Hartmann et al. | 5D cost estimation or cash flow modelling | Planning, controlling and managing budget and cost reasonably |
| 2011 | Eastman et al. | Construction progress tracking | Improving management level for time and budget |
| 2014 | Chen and Luo | Quality control | Improving construction quality |
| 2008 | Sacks and Barak | Structural analysis | Improving structural safety |
| 2011 | Hardin | Risk scenario planning | Reducing personnel safety hazards |
| 2014 | Volk et al. | Operation and maintenance (O&M), facility management (FM) | Improving management level and reducing risks |
| 2012 | Laakso and Kiviniemi | Interoperability | Reducing information loss of data exchange |
| 2011 | Dossick and Neff | Collaboration and communication facilitation | Facilitating early risk identification and risk communication |
Table 8.3 Literature research: studies that extend risk modelling with a third dimension beyond probability and effect, and their parameters

| Year | Authors | Third risk modelling parameter |
|---|---|---|
| 1989 | Charette | Predictability |
| 1996 | Williams | |
| 2003 | Jannadi and Almishari | Exposure |
| 2006 | Cervone | Risk Discrimination |
| 2007 | Aven et al. | Manageability |
| 2007 | Dikmen et al. | |
| 2007 | Cagno et al. | Controllability |
| 2007 | Zeng et al. | Factor Index |
| 2007 | Zhang | Project Precision |
| 2012 | Vidal and Marle | |
| 2008 | Han et al. | Meaningfulness |

Fig. 8.1 The CIFE Horseshoe method of transitional research

Fig. 8.2 Biological nerve cell (a) and artificial neuron (b) [10]
Fig. 8.3 Structure of multilayer artificial neural network
8.3.2 Structure of Artificial Neural Network

Artificial nerve cells (neurons) come together to form an artificial neural network. In general, the cells are arranged in three layers: an input layer, one or more hidden (intermediate) layers, and an output layer (Fig. 8.3).
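To make the layered structure concrete, the following is a minimal sketch of a forward pass through a network with one input layer, one hidden layer and one output layer. The layer sizes, the weights and the sigmoid activation are illustrative assumptions and are not values from this study.

```python
import math


def sigmoid(z):
    """Logistic activation: maps any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Evaluate one fully connected layer: sigmoid(w . x + b) per neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def network_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Propagate the inputs through the hidden layer, then the output layer."""
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    return layer_forward(hidden, output_w, output_b)


# Hypothetical 2-3-1 network: 2 inputs, 3 hidden neurons, 1 output neuron.
HIDDEN_W = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
HIDDEN_B = [0.0, 0.1, -0.1]
OUTPUT_W = [[0.7, -0.5, 0.2]]
OUTPUT_B = [0.05]

prediction = network_forward([0.6, 0.9], HIDDEN_W, HIDDEN_B, OUTPUT_W, OUTPUT_B)
print(prediction)
```

In a trained network the weights and biases would be learned from data; here they are fixed only to show how a signal flows from layer to layer.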
8.3.3 Fuzzy Inference System

The first step of fuzzy logic modeling is to define the problem and to create membership functions by selecting appropriate parameters. Next, a rule base containing the solution of the problem is created from the relevant parameters and the fuzzy subsets. In the third stage, an inference method (such as max-min or max-product), developed by induction or deduction, is selected for evaluating these rules [11]. In the last step, the defuzzification method, that is, the conversion of the fuzzy value to a crisp number (centroid, weighted average, etc.), is determined (Fig. 8.4).
Fig. 8.4 The fuzzy inference system
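The four stages of a fuzzy inference system (membership functions, rule base, inference and defuzzification) can be sketched as follows. This is a minimal illustration that assumes triangular membership functions, max-min inference and centroid defuzzification; the 0 to 10 scales, the set boundaries and the three rules are hypothetical and are not taken from the project data.

```python
def trimf(x, a, b, c):
    """Triangular membership function with feet a and c and peak b."""
    if x <= a or x >= c:
        # Shoulder sets (a == b or b == c) keep full membership at the edge.
        return 1.0 if x == b else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets shared by both inputs and the output (scale 0..10).
SETS = {
    "low": (0.0, 0.0, 5.0),
    "medium": (0.0, 5.0, 10.0),
    "high": (5.0, 10.0, 10.0),
}

# Hypothetical rule base: IF probability is P AND severity is S THEN risk is R.
RULES = [
    ("low", "low", "low"),
    ("medium", "medium", "medium"),
    ("high", "high", "high"),
]


def infer_risk(probability, severity, steps=100):
    """Max-min inference followed by centroid defuzzification."""
    # Fuzzification and rule firing strengths (AND is modelled with min).
    strengths = [
        (min(trimf(probability, *SETS[p]), trimf(severity, *SETS[s])), out)
        for p, s, out in RULES
    ]
    numerator = 0.0
    denominator = 0.0
    for i in range(steps + 1):
        y = 10.0 * i / steps
        # Clip each output set at its rule strength, aggregate with max.
        mu = max(min(w, trimf(y, *SETS[out])) for w, out in strengths)
        numerator += y * mu
        denominator += mu
    if denominator == 0.0:
        raise ValueError("no rule fired for this input")
    return numerator / denominator


print(infer_risk(7.0, 8.0))
```

A fuller rule base would cover every combination of input sets; the guard against an empty aggregate is there because this three-rule example does not.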
8.3.4 Adaptive Neuro-Fuzzy Inference System (ANFIS)

Various methods have been applied to increase the efficiency of fuzzy systems. One of them is the ANFIS technique, in which system identification is carried out with a fuzzy model that operates within an adaptive network structure. The main purpose of neuro-adaptive learning techniques is to provide a "learning" procedure by which fuzzy modeling methods can build the system from a data set. Thanks to its adaptive network structure, the fuzzy model used in the identification phase of the system can continually update itself and learn by using both environmental information about the system and the system's input and output data. Basically, the ANFIS structure can be described as a network representation of Sugeno-type fuzzy systems with neural learning capability. The network consists of nodes arranged in layers, each performing a specific function, and exploits the learning capability of this combination [12]. In a fuzzy inference system, the membership functions can be chosen arbitrarily and depend entirely on the user's choice, although the forms of membership function taken from the reference articles deserve particular attention. In some models, however, the shape and structure of the membership function cannot easily be recognized by looking at the data sets.

Advantages of neuro-fuzzy systems:

• Learning ability
• Linguistic expressiveness of imprecise inputs and system outputs
• Adaptability
• Ability to process information simultaneously
ANFIS can assign all possible rules according to the structure created for the problem under consideration, or it allows the rules to be assigned by an expert with the help of data. Because ANFIS can create rules itself or allow them to be created, it can benefit from expert opinions. For this reason, in many estimation problems it can yield better results in terms of the mean squared error criterion, since it allows artificial neural networks to draw on expert opinions (Fig. 8.5).

Fig. 8.5 a Sugeno fuzzy model with two rules, b Typical ANFIS architecture [13]
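As an illustration of the network in Fig. 8.5, the sketch below evaluates a first-order Sugeno model with two inputs and two rules, layer by layer, in the way ANFIS is commonly described. It assumes two-sided Gaussian membership functions in the style of MATLAB's gauss2mf (the type reported in Sect. 8.4), with parameters in the order sig1, c1, sig2, c2. All parameter values are hypothetical, and only the forward pass is shown, not the hybrid training.

```python
import math


def gauss2mf(x, sig1, c1, sig2, c2):
    """Two-sided Gaussian: left curve below c1, right curve above c2."""
    left = math.exp(-((x - c1) ** 2) / (2.0 * sig1 ** 2)) if x < c1 else 1.0
    right = math.exp(-((x - c2) ** 2) / (2.0 * sig2 ** 2)) if x > c2 else 1.0
    return left * right


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input, first-order Sugeno ANFIS.

    premise:    one (params_for_x, params_for_y) pair per rule
    consequent: one (p, q, r) triple per rule, so that f = p*x + q*y + r
    """
    # Layers 1 and 2: membership degrees and rule firing strengths (product).
    firing = [gauss2mf(x, *mf_x) * gauss2mf(y, *mf_y) for mf_x, mf_y in premise]
    # Layer 3: normalise the firing strengths.
    total = sum(firing)
    normalized = [w / total for w in firing]
    # Layer 4: weight each rule's linear consequent.
    rule_outputs = [
        w * (p * x + q * y + r) for w, (p, q, r) in zip(normalized, consequent)
    ]
    # Layer 5: sum the rule outputs into the crisp result.
    return sum(rule_outputs)


# Hypothetical parameters for two rules.
PREMISE = [
    ((1.0, 1.0, 1.0, 3.0), (1.0, 1.0, 1.0, 3.0)),  # rule 1: x and y "low"
    ((1.0, 7.0, 1.0, 9.0), (1.0, 7.0, 1.0, 9.0)),  # rule 2: x and y "high"
]
CONSEQUENT = [(1.0, 1.0, 0.0), (0.0, 0.0, 4.0)]

print(anfis_forward(5.0, 5.0, PREMISE, CONSEQUENT))
```

During training, a hybrid algorithm would adjust the premise parameters and the consequent coefficients; the sketch keeps them fixed to show only how the five layers combine.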
8.3.5 What is Building Information Modelling (BIM)?

Building Information Modeling (BIM), one of the leading developments in the architecture, engineering and construction sectors in recent years, combines traditional computer-aided design with a digital project database that holds all the data defining the building. All data generated during the design, construction and post-construction operation processes are managed in it. BIM is also a three-dimensional representation that includes all these data.
BIM is a process for creating and managing all information about a built asset throughout the life cycle of a project. The Building Information Model, a key part of this process, is the digital description of every aspect of the built asset. The model is based on information that is collected and updated collaboratively at the key stages of a project. It is widely accepted that BIM will provide significant improvements in design processes and facilitate collaboration.
8.3.5.1 Benefits of BIM

Eastman et al. [14] group the benefits of BIM into four categories: pre-construction benefits, design benefits, construction and manufacturing/operation benefits, and post-construction benefits. The construction and manufacturing benefits are as follows:

• The construction process can be simulated day by day, so that possible problems can be identified and solved,
• Automatic updating of the parametric model helps to react quickly to design or field problems,
• With the right design model and material requirements, BIM facilitates better implementation of lean construction practices,
• Accurate quantities and specifications facilitate the procurement process.
8.3.6 Methods

Applying the ANFIS method requires a data set of inputs and outputs. The model, which is built according to the number and type of membership functions selected, is trained using a hybrid learning algorithm. With the ANFIS Editor (Fig. 8.6) in the MATLAB Fuzzy Logic Toolbox, the available input and output set can be loaded, the model can be trained, and its effectiveness can then be tested. First, decision makers should determine the criteria that affect the risks, prioritize them, and produce an output that measures the risk score against these criteria. In the literature, studies applying ANFIS use at most five inputs. Once the effective inputs have been identified, the available data set should be divided into training and test data. At this stage, 105 records were taken, of which 90 were used as training data and 15 as test data. ANFIS offers various membership function types for the available input set. Each candidate membership function is applied to the training data set separately, and the function type with the smallest error is selected for training the model. Another important point is that the number of parameters to be trained, which depends on the number of membership functions selected for each input, should not exceed the size of the training data set. Otherwise, the system reports an error and the model cannot be trained effectively. It is also useful to bear this constraint in mind when dividing the available data set into training and test data.

Fig. 8.6 ANFIS edit structure

Once the type and number of membership functions have been selected, the model is trained on the available data using the hybrid learning algorithm. After the model has been trained for the specified number of epochs, its effectiveness can be tested on the test data. Figure 8.7 shows an example ANFIS model with 5 inputs and 3 membership functions per input. As a result, the following algorithm emerges for applying ANFIS to the risk assessment process in risk management:

Step 1: The data set is created by determining the risk score criteria and the corresponding output, based on the opinions of the decision makers and experts.
Step 2: If the number of inputs is large, the inputs that most affect the output are identified so that the ANFIS model can be applied effectively.
Step 3: The available data set is divided into training and test data.
Fig. 8.7 ANFIS model structure
Step 4: Among the membership function types in the MATLAB Fuzzy Logic Toolbox, the type that gives the lowest error on the training data is selected for the available input set. When choosing the number of membership functions, care is taken that the data set is larger than the number of parameters to be trained.
Step 5: The model defined by the selected membership function type and number is trained.
Step 6: The effectiveness of the trained model is measured using the available test data.
Step 7: Color levels are defined using the risk score evaluation matrix used in the project.
Step 8: Using the OmniClass classification, the color codes are defined for the related risk items with the help of Dynamo and transferred to the navigation program in 3D. Finally, the risk map image is created.

8.3.6.1 BIM Integration with Dynamo: 3D Risk Map and Linguistic Terms
Risk analysis results are calculated directly and different colors are automatically assigned to the BIM model. Two options are explored for linking tasks in ANFIS with the BIM elements that should eventually receive the risk color: the OmniClass approach, and an approach that creates an extra parameter in the BIM model to which the Task ID number can be added (Fig. 8.8). A Dynamo script is created to link the color codes generated by the ANFIS analysis with the BIM model. Dynamo is a visual programming plugin for Revit that can automatically manipulate different aspects of a BIM model and read data from it. In this thesis, risk colors are used to convey the additional information from ANFIS, which is added as colors and linguistic expressions to the corresponding elements in Revit (Fig. 8.9).

Fig. 8.8 Update BIM model with colors

Fig. 8.9 Example of a risk map (without linguistic terms; source: Melegos Lilia L)
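The linking step can be pictured as a lookup from task identifier to risk score, color and linguistic term. The sketch below is plain Python rather than Dynamo or Revit API code; the score thresholds, RGB values, term names and task IDs are hypothetical placeholders standing in for the project's own risk score evaluation matrix.

```python
# Hypothetical risk levels: (upper bound of score, linguistic term, RGB color).
RISK_LEVELS = [
    (5.0, "Low", (0, 176, 80)),
    (10.0, "Medium", (255, 192, 0)),
    (15.0, "High", (255, 102, 0)),
    (float("inf"), "Very high", (255, 0, 0)),
]


def classify_risk(score):
    """Return the linguistic term and color of the first level covering score."""
    if score < 0:
        raise ValueError("risk score must be non-negative")
    for upper_bound, term, rgb in RISK_LEVELS:
        if score <= upper_bound:
            return term, rgb


def build_risk_map(task_scores):
    """Attach a term and a color to every task ID, ready to hand to a BIM tool."""
    risk_map = {}
    for task_id, score in task_scores.items():
        term, rgb = classify_risk(score)
        risk_map[task_id] = {"score": score, "term": term, "color": rgb}
    return risk_map


# Hypothetical model output: task ID -> predicted risk score.
TASK_SCORES = {"T-001": 3.2, "T-002": 12.0, "T-003": 18.4}

risk_map = build_risk_map(TASK_SCORES)
for task_id, entry in risk_map.items():
    print(task_id, entry["term"], entry["color"])
```

In the workflow described here, a table of this kind would be read by the Dynamo script, which matches each task ID (or OmniClass code) to its BIM elements and overrides their display color.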
8.4 Results

Using risk records created from real data in the risk assessment of natural gas and pipeline projects, the gauss2mf membership function was chosen to solve the model because it gave the lowest RMSE. The model, determined and developed at the outset, consists of 4 inputs, 1 output and 221 rules. The results were obtained and normalized. The training data set consists of 90 records and the test data set of 15 records. The process was then visualized with the help of BIM-based programs, completing the thesis.

Risk assessment and management is one of the most important parts of a project. In the related literature, risk assessment is treated as a multi-criteria decision making problem that deals with risk value optimization and includes many criteria. In the thesis, a risk assessment model was obtained by considering the studies in the literature and expert opinions. On the data set used in our oil and natural gas risk assessment study, the learning capability and adaptive network-based structure of ANFIS resulted in lower error than classical methods. The proposed risk assessment model can also be used in risk assessment worldwide, and the risk elements can then be visualized in 4D. In stations and pipelines, which are among the most important parts of the petroleum and natural gas industry, the risk coefficient must be reduced to the lowest possible level. The model consists of four inputs with 5, 5, 3 and 3 membership functions, respectively, and the output membership functions are determined by the four parameters applied in the project. Training such a structure can take a long time when a large data set is used. To compare the neuro-fuzzy approach presented here with classical risk assessment in risk management, its RMSE and MAPE error values were compared with those of multiple regression on the same data set. These values are given in Table 8.4.

Risk management is a process that covers all stages of a product, from the idea stage to its delivery to the customer. It must be sustained through quick decisions and actions. It is a systematic structure in which risks are identified, the risks that need to be resolved first are determined, and strategies and plans are developed and implemented to deal with them; put differently, it is a discipline that aims to reduce uncertainties and their negative effects to a more acceptable level. To pass through this process with the least loss, firms should not ignore technological support, which is a necessity of the age. It is essential for companies to integrate AI and BIM fully into their organization and governance.

Table 8.4 Error indicators

| Method | Training RMSE | Training MAPE | Training R2 | Test RMSE | Test MAPE | Test R2 |
|---|---|---|---|---|---|---|
| ANFIS | 0.99 | 0.09 | 0.9097 | 1.43 | 0.11 | 0.8647 |
| Multiple regression | 1.28 | 0.17 | 0.8628 | 2.34 | 0.13 | 0.7145 |
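The three indicators in Table 8.4 can be computed as in the sketch below. The definitions are the common textbook ones (MAPE as a fraction rather than a percentage, R2 as one minus the ratio of the residual to the total sum of squares); the chapter does not state its exact formulas, so this is an assumption, and the sample values are invented for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (actual values non-zero)."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented risk scores, for illustration only.
ACTUAL = [2.0, 4.0, 6.0, 8.0]
PREDICTED = [2.5, 3.5, 6.5, 7.5]

print(rmse(ACTUAL, PREDICTED), mape(ACTUAL, PREDICTED), r_squared(ACTUAL, PREDICTED))
```

Evaluating the same three functions on the training and test predictions of both models would reproduce the layout of Table 8.4, provided the study used these definitions.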
8.5 Conclusion

Risk management in companies operating in the international oil and natural gas sector generally follows the decisions of people with many years of experience in the sector. Decisions that should follow from a sound and detailed preliminary study, and be taken within the framework of a prepared report, are instead made in a very short time on the basis of the predictions of experienced people, and the decision mechanism works accordingly. The thesis has therefore tried to present a consistent, effective and fast solution to the risk management problem, which is difficult to handle in projects. In view of the suggestions above, it is believed that the use of ANFIS and similar artificial intelligence and information modeling based applications for the risk management problem can be increased and diversified.

Research contributions: The purpose of creating this model is to provide companies operating in the international arena with a risk management tool. The model is prepared by re-evaluating the verbally expressed parameters of the risk management process, which are determined from the opinions of people who have worked in the sector for many years, with the help of artificial intelligence-based ANFIS and building information modeling-based software. As a result, companies working in the international arena will be able, when making risk management decisions, to evaluate country factors, risk factors affecting the construction of the project, contract factors, factors affecting the firm's earnings and so on together.

Contribution to academic works: The applied model produces a risk score with four inputs and one output. In academic studies, different model combinations can be established by changing the number of inputs and outputs. Risk models with more than one output can be established to guide future studies. CANFIS, an ANFIS-based method developed in this direction and suited to multiple-output structures, can be applied to the problem. In addition, a Mamdani-type ANFIS module can be applied instead of the Sugeno type used in the model. Instead of Revit and Dynamo, which are used here for visualization, other software such as Bentley or Rhino can be used. Risks can also be visualized using GIS coordinates.
Chapter 9
Predicting Ethereum Price with Machine Learning Algorithms Mehmet Birhan and Ömür Tosun
9.1 Introduction
Blockchain technology has recently grown in popularity, is of interest to public and private institutions and, according to some researchers, has the potential to surpass the internet [1]. With this technology, people anywhere in the world do not need a third party for the verification and security required in money, service or product transactions. Blockchain was first described in the paper "Bitcoin: A peer-to-peer electronic cash system", published in 2008 under the name Satoshi Nakamoto by an author or group of authors whose identity is still unknown. Bitcoin is a cryptocurrency designed as an alternative means of payment that uses blockchain technology and is independent of any central bank or central government unit [2]. After Bitcoin, the first cryptocurrency, a redesign of this blockchain technology led to the Ethereum blockchain, on which payments can be made and sent with a cryptocurrency called "ether", just as with Bitcoin [3]. The Ethereum blockchain aims to be faster than the Bitcoin blockchain by using a different and faster verification technique. Ethereum was announced by Vitalik Buterin and his team at a Bitcoin conference in 2014, and the network went live in 2015. Unlike Bitcoin, it supports a structure called the smart contract [4]. It is therefore a preferred technology because it allows the creation of alternative cryptocurrencies, and it has the advantage that its block creation time is shorter than Bitcoin's [5].

M. Birhan (B) · Ö. Tosun
Akdeniz University, Antalya, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_9
With the redesign of smart contracts and the blockchain structure, many projects have emerged, and as the technology has become popular, the number of cryptocurrencies and of exchanges on which they are traded has grown considerably. According to Atlan et al., there were 4930 cryptocurrencies on 13.09.2019, the total market volume was approximately $196.9 billion, and 66% of this market belonged to Bitcoin [6]. On 01.08.2021, there were 11,122 cryptocurrencies, the total market volume was approximately $1.646 trillion, and approximately 47.3% of the market belonged to Bitcoin [7]. Because cryptocurrencies are highly open to manipulation, both their returns and their costs are expected to be high. Many cryptocurrencies traded in the markets are subject to the manipulation known as "Pump and Dump", which can bring investors large returns as well as large losses. In new and shallow markets where speculators are abundant, sudden drops caused by the manipulation of one cryptocurrency spread with a snowball effect [8]. Sudden, large jumps in one of the major cryptocurrencies can therefore influence other cryptocurrencies. Consequently, when investing in any cryptocurrency, considering how it is affected by other cryptocurrencies can reduce risk. Large returns and losses also mean large risk, so it is rational for investors to minimize these risks and to choose the investment instrument with the lowest risk. The more information an investor has about a cryptocurrency, the less risk the investment carries; in this sense, knowledge and risk are inversely related.
A review of the literature shows that in recent years many studies have been carried out on cryptocurrencies using econometric, statistical and artificial intelligence techniques, and that artificial intelligence techniques give more accurate results than the classical methods. Most of these studies focus on Bitcoin, and there are few studies on the other major cryptocurrencies. This study is distinguished by predicting the Ethereum price from current data with a wide range of variables using artificial intelligence algorithms.
9.2 Related Works
Hitam and Ismail used machine learning algorithms to predict the prices of several cryptocurrencies, including Ethereum, and reported that the best model for Ethereum's price prediction was the SVM method, with 95.5% accuracy [9]. Derbentsev et al. examined the closing prices of Bitcoin, Ethereum and Ripple according to price trends and proposed a short-term (5–30 days) forecast model based on the BART (Binary Autoregressive Tree) method. They also reported that this model gave better results than the ARIMA-ARFIMA model [10]. Hitam et al. proposed a price prediction model using market data for Bitcoin, Ethereum, Nem, Ripple, Stellar and Litecoin. They obtained an accuracy of 95.5% with the SVM algorithm for Ethereum and, after optimizing the parameters of the algorithm, an accuracy of 97% with the optimized SVM. They also reported that the optimized SVM model performed best on Ethereum [11]. Atlan et al. used machine learning algorithms to predict the prices of Bitcoin, Ethereum and Ripple. They reported that the LSTM model gave the best results for Ethereum, with the ANN and ANFIS methods very close behind [6]. Chowdhury et al. applied ensemble learning, which combines several machine learning techniques, to cryptocurrencies including Ethereum, and reported that it was the best method in their study, reaching an accuracy of 92.4% [12]. Livieris et al. built prediction models for Bitcoin, Ethereum and Ripple using a CNN-LSTM model. For Ethereum price estimation, they reported accuracy rates of approximately 51.5, 48.9 and 50.9% for the models considered [13]. In a 2020 study, Sel used machine learning to examine the relationship between movements in the cryptocurrency market and the fluctuation of gold prices during the pandemic, and found that cryptocurrencies with a market volume of at least 0.5% have an effect on gold prices [14]. Yavuz et al. reported that the prices of Bitcoin and Ethereum are strongly related statistically and achieved a success rate close to 99% with the ANN model they built for price prediction [15]. Akyıldırım et al. built a prediction model for the 12 cryptocurrencies with the largest market value during the research period, including Ethereum, using data sampled at 15, 30 and 60 min and daily intervals. SVM, Random Forest, ANN, Logistic Regression and ARIMA techniques were used for the prediction model.
According to their results, the best estimation model for daily data was the one built with the Random Forest algorithm, and the findings indicated that short-term forecasts perform better [16]. Kim et al. modeled the Ethereum price with ANN and SVM techniques, using macroeconomic variables, some blockchain variables of Ethereum and the prices of some other cryptocurrencies. According to their results, blockchain information plays an important role in price forecasting in addition to the macroeconomic variables [17]. Zhang et al. found that the WAMC prediction model gave more accurate results for Ethereum than other machine learning algorithms [18].
9.3 Method and Material
9.3.1 Used Methods
9.3.1.1 Overview of Lasso Regression (LR)
Predictions made with classical regression analysis on large data sets sometimes do not perform as well as desired. The Lasso Regression method, one of the penalized regression methods, offers a solution in this situation [19]. Its penalty term is the sum of the absolute values of the feature coefficients, that is, it uses the "Manhattan distance" (L1 norm) as the penalty criterion [20]. It imposes a constraint that forces the sum of the absolute values of the regression coefficients to be less than a certain constant λ [21]. As λ increases, the results approach those obtained by the least squares method, while as λ decreases, shrinkage comes into play and can increase the prediction performance [22].
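The shrinkage behaviour described above can be illustrated with a minimal sketch in plain Python. It assumes the penalized form of the Lasso, in which a penalty weight (called `alpha` here) multiplies the sum of absolute coefficients, so a larger `alpha` plays the role of a tighter bound λ. The function names and the toy data are hypothetical and are not taken from the study.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within [-threshold, threshold] becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1 / (2n)) * sum((y - X b)^2) + alpha * sum(|b_j|), without intercept."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # feature j times the residual that ignores feature j
            norm = 0.0  # squared length of feature j
            for i in range(n):
                other = sum(X[i][k] * coef[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - other)
                norm += X[i][j] ** 2
            coef[j] = soft_threshold(rho / n, alpha) / (norm / n)
    return coef


# Hypothetical toy data: y depends only on the first feature (y = 2 * x1).
X = [[1.0, 0.1], [2.0, -0.1], [3.0, 0.2], [4.0, -0.2]]
y = [2.0, 4.0, 6.0, 8.0]

for alpha in (0.0, 1.0, 100.0):
    print(alpha, lasso_coordinate_descent(X, y, alpha))
```

With `alpha = 0` the sketch reduces to ordinary least squares; as `alpha` grows, the coefficients shrink and eventually become exactly zero, which is the variable-selection effect of the L1 penalty.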
9.3.1.2 Overview of XGBoost Algorithm
The XGBoost (Extreme Gradient Boosting) algorithm is one of the successful machine learning methods developed in recent years. It has been accepted as an important tool in supervised learning because it controls overfitting and thus provides better performance [23–25]. In supervised learning, the output data corresponding to the input data defined in the system are supplied so that the model can be trained [26]. The system builds the predictive model from the training data, and the error of the resulting model on the problem under study can be measured. XGBoost can therefore be described as a supervised learning algorithm that uses boosting to produce accurate models [25]. The algorithm is based on the optimization of an objective function, and its most important feature is that it is scalable [27].
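The boosting idea behind this family of algorithms can be sketched in a few lines of plain Python. The example below is a simplified gradient boosting loop for squared error with one-split trees (stumps) on a single feature; it is not XGBoost itself, which adds regularization, second-order gradient information and many engineering optimizations. All names and the toy series are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals of a single feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # candidate split points
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_estimators=50, learning_rate=0.1):
    """Each new stump is fitted to the errors left by the model built so far."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def predict_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy series, not data from the study.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
model = fit_boosting(xs, ys)
baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mean_squared_error(ys, [predict_boosting(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

The number of estimators, the learning rate and the tree depth reported later in the parameter tables control the same trade-off as `n_estimators` and `learning_rate` in this sketch: many small corrective steps instead of one large model.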
9.3.1.3 Overview of Artificial Neural Network (ANN)
Artificial neural networks (ANN) work on a principle similar to biological nervous systems and can be applied in many areas such as classification, optimization, prediction and pattern recognition [28]. An artificial neural network is a mathematical model of interconnected artificial neurons and is based on the following basic assumptions [28]:
• Information processing takes place in neurons.
• Signals are transmitted through connections between neurons.
• Each connection between neurons has a weight value.
• The net output of each neuron is obtained by passing its net input through an activation function.
Apart from these basic assumptions, and unlike statistical methods, ANNs do not require assumptions such as normality or a particular functional form [29]. The schematic representation of the artificial nerve cell is given in Fig. 9.1.
Fig. 9.1 Artificial nerve cell [28]
An artificial neural network is formed by grouping artificial nerve cells into interconnected layers. Between the input layer and the output layer there are one or more hidden layers. Each hidden layer receives its signals from the neurons in the previous layer; in other words, each neuron sends the output of its own function to the neurons of the next layer. The schematic representation of the artificial neural network is given in Fig. 9.2.
Fig. 9.2 Artificial neural network [30]
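The assumptions listed above (weighted connections, a net input and an activation function) can be made concrete with a minimal forward-pass sketch in plain Python. The weights, biases and layer sizes are hypothetical values chosen for illustration; they are not the trained networks of this study, and no training step is shown.

```python
def relu(z):
    """Rectified linear activation: negative net inputs are set to zero."""
    return max(0.0, z)


def identity(z):
    """Linear activation, a common choice for a regression output neuron."""
    return z


def neuron_output(inputs, weights, bias, activation):
    # Net input: weighted sum of the incoming signals plus a bias term.
    net_input = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Net output: the net input passed through the activation function.
    return activation(net_input)


def forward(inputs, layers):
    """Propagate a signal layer by layer; each layer feeds the next one."""
    signal = inputs
    for weight_rows, biases, activation in layers:
        signal = [neuron_output(signal, w, b, activation)
                  for w, b in zip(weight_rows, biases)]
    return signal


# Hypothetical network: 2 inputs -> 2 hidden neurons (relu) -> 1 output neuron.
network = [
    ([[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0], relu),
    ([[1.0, 0.5]], [0.25], identity),
]
print(forward([1.0, 2.0], network))
```

In this example the first hidden neuron receives a negative net input and is switched off by the activation function, so only the second hidden neuron contributes to the output.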
9.3.1.4 Overview of Random Forest (RF)
Random Forest is an algorithm based on decision trees. Instead of splitting each node using the best split among all variables, it uses the best split within a randomly selected subset of variables [31, 32]. Each newly formed tree is given a weight that is inversely proportional to its internal error. That is, a tree with a low internal error receives a large weight, and these weights are used in the voting phase for class estimation [33]. The schematic representation of the random forest algorithm is given in Fig. 9.3.
Fig. 9.3 Random forest diagram [34]
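The weighting and voting step described above can be sketched as follows. This is an illustration of the scheme as stated in the text, with hypothetical error values and labels; common library implementations of random forests typically give every tree an equal vote (or average the tree outputs for regression), so the weighted variant here should be read as an assumption.

```python
from collections import defaultdict


def tree_weights(internal_errors, eps=1e-9):
    """Weights inversely proportional to each tree's internal error, normalised to 1."""
    raw = [1.0 / (error + eps) for error in internal_errors]
    total = sum(raw)
    return [value / total for value in raw]


def weighted_vote(tree_predictions, weights):
    """Return the class label with the largest total weight."""
    scores = defaultdict(float)
    for label, weight in zip(tree_predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Hypothetical forest of three trees: one accurate tree and two weak trees.
errors = [0.10, 0.40, 0.40]
predictions = ["up", "down", "down"]
weights = tree_weights(errors)
print(weights, weighted_vote(predictions, weights))
```

Here the single low-error tree outweighs the two high-error trees, which is the intended effect of the inverse-error weighting.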
9.3.2 Data Collecting
The data used in the research were obtained from the website www.investing.com, which is freely accessible and also contains many economic data series. To select the data and the variables to be studied, the 9 cryptocurrencies with the highest market value and at least 3 years of daily price data were identified on coinmarketcap.com as of May 5, 2021. The basic information of these cryptocurrencies is shown in Table 9.1. Using their data between 01.01.2018 and 05.05.2021, a data file was prepared to predict Ethereum's price for the following day. To improve predictive power, not only the daily closing price but also the opening price, the highest and lowest prices and the daily trading volume of the selected cryptocurrencies were included. Based on the effect of gold prices on the Bitcoin price reported by Sel [14] and the statistical relationship between Bitcoin and Ethereum prices reported by Yavuz et al. [15], gold prices, an economic factor, were also included in the analysis. Gold prices are not available for weekends; these missing values were filled in by linear interpolation in SPSS before the analysis.
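The study performed the weekend gap filling in SPSS; the same idea can be sketched in plain Python as below. The function name and the price values are hypothetical, and the sketch assumes equally spaced daily observations with missing days marked as `None`.

```python
def interpolate_missing(values):
    """Fill None gaps by linear interpolation between the nearest known neighbours.

    Gaps at the very start or end of the series are left unchanged,
    because there is no second neighbour to interpolate from.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical gold prices: Friday, Saturday (missing), Sunday (missing), Monday.
gold = [1770.0, None, None, 1776.0]
print(interpolate_missing(gold))
```

Each missing day is placed on the straight line between the last known price before the gap and the first known price after it.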
Table 9.1 Cryptocurrencies used in research
Cryptocurrency        Price ($)    Market value (million $)
Bitcoin (BTC)         57441.30     1073873.01
Ethereum (ETH)        3522.76      407777.08
Binance Coin (BNB)    650.99       99986.49
Dogecoin (DOGE)       0.66         85147.17
Ripple (XRP)          1.61         73305.33
Tether US (USDT)      1.00         52831.05
Cardano (ADA)         1.48         47140.66
Bitcoin Cash (BCH)    1.45         27249.05
Litecoin (LTC)        355.96       23766.34
9.3.3 Method
Using the data of the selected cryptocurrencies, machine learning algorithms were applied to predict the price of Ethereum, which has become more popular in recent years and is seen as the strongest cryptocurrency competing with Bitcoin. Several model variants were examined. In Model 1, all of the collected cryptocurrency data are used. For the second approach, correlation values were calculated and used to filter the variables, and three different models, named Model 2a, Model 2b and Model 2c, were built from the selected variables to estimate the Ethereum price. The variables of Model 1 are the daily, opening, highest and lowest prices and the daily trading volume of the cryptocurrencies given in Table 9.1, together with the daily, opening, highest and lowest gold prices in US dollars. The variables used for Model 2a, Model 2b and Model 2c are given in Table 9.2.

Table 9.2 Variables used for Model 2a, Model 2b and Model 2c
Model       Dependent variable ($)    Independent variables ($)
Model 2a    Ethereum price            Bitcoin price; Dogecoin lowest price
Model 2b    Ethereum price            Cardano price; Dogecoin lowest price
Model 2c    Ethereum price            Tether daily volume; Litecoin price; Dogecoin lowest price

For each method, the data were divided into 70% training and 30% test data. The optimal parameters of the techniques for Model 1, Model 2a, Model 2b and Model 2c are given in Tables 9.3, 9.4, 9.5 and 9.6, respectively. The machine learning algorithms were implemented in Python through the Anaconda Navigator Jupyter Notebook interface, and performance was measured for each method. The coefficient of determination R² was used as the performance indicator.

Table 9.3 Parameters used for Model 1
Method     Parameter            Value
LR         Alpha                10
ANN        Activation           relu
           Hidden layer size    (3,4)
           Solver               adam
XGBoost    Estimators           250
           Learning rate        0.01
           Max depth            20
RF         max_depth            35
           n_estimators         55
           min_samples_leaf     25
           min_samples_split    50

Table 9.4 Parameters used for Model 2a
Method     Parameter            Value
LR         Alpha                7
ANN        Activation           relu
           Hidden layer size    (12)
           Solver               lbfgs
XGBoost    Estimators           250
           Learning rate        0.01
           Max depth            50
RF         max_depth            10
           n_estimators         50
           min_samples_leaf     15
           min_samples_split    20

Table 9.5 Parameters used for Model 2b
Method     Parameter            Value
LR         Alpha                40
ANN        Activation           relu
           Hidden layer size    (12)
           Solver               lbfgs
XGBoost    Estimators           200
           Learning rate        0.01
           Max depth            50
RF         max_depth            100
           n_estimators         6
           min_samples_leaf     5
           min_samples_split    20
Table 9.6 Parameters used for Model 2c
Method     Parameter            Value
LR         Alpha                145
ANN        Activation           relu
           Hidden layer size    (8)
           Solver               lbfgs
XGBoost    Estimators           280
           Learning rate        0.01
           Max depth            50
RF         max_depth            100
           n_estimators         100
           min_samples_leaf     5
           min_samples_split    20
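The 70–30% division of the data and the coefficient of determination used as the performance indicator in Sect. 9.3.3 can be sketched in plain Python as follows. The text does not state whether the split was random or chronological; the sketch assumes a chronological split, which is a common choice for time series, and all names and numbers are hypothetical.

```python
def split_train_test(rows, train_fraction=0.7):
    """Chronological split (an assumption): the first 70% of rows train, the rest test."""
    cut = int(round(len(rows) * train_fraction))
    return rows[:cut], rows[cut:]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily rows (only the row index is kept here for brevity).
train_rows, test_rows = split_train_test(list(range(10)))
print(len(train_rows), len(test_rows))

# A perfect prediction scores 1; always predicting the mean scores 0.
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

Computing R² separately on the training rows and on the held-out test rows gives the two values reported for each method in the result tables.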
9.4 Discussion and Results
The results of the analysis for Model 1 are given in Table 9.7. Almost every method gives similar results, and the prediction performance is satisfactory. With all variables analyzed together, the R² for Ethereum's closing price on the training data is approximately 99% for LR, ANN and XGBoost and approximately 96% for RF. On the test data, which the models have never seen, R² is about 99% for LR and XGBoost; this value reflects how well the prediction model generalizes to real data. Although ANN and RF generalize less well than the others, they cannot be called unsuccessful.

Table 9.7 Analysis results regarding Model 1
Model 1     LR        ANN       XGBoost    RF
R² Train    0.9930    0.9937    0.9983     0.9610
R² Test     0.9930    0.9872    0.9917     0.9702

The results of the analysis for Model 2a are given in Table 9.8. Performance is worse than in Model 1, and LR stands out as the method with the worst performance. Two techniques perform best for Model 2a: XGBoost and RF. On the training data, R² is approximately 97% for XGBoost and 95% for RF, whereas on the test data it is about 96% for XGBoost and 97% for RF. Neither XGBoost nor RF can therefore be said to be clearly superior to the other in prediction success.
Table 9.8 Analysis results regarding Model 2a
Model 2a    LR        ANN       XGBoost    RF
R² Train    0.8610    0.9248    0.9742     0.9569
R² Test     0.8700    0.9367    0.9615     0.9728
Table 9.9 Analysis results regarding Model 2b
Model 2b    LR        ANN       XGBoost    RF
R² Train    0.8960    0.9546    0.9474     0.9732
R² Test     0.8960    0.9514    0.9439     0.9793
The results of the analysis for Model 2b are given in Table 9.9. Performance is again worse than in Model 1, and LR stands out as the method with the worst performance. RF is the technique with the best performance for Model 2b; the generalization success of the prediction model built with RF was calculated as approximately 97%.

The results of the analysis for Model 2c are given in Table 9.10. Performance is worse than in Model 1, and LR again performs worst. As in Model 2a, two techniques perform best: XGBoost and RF. On the training data, R² is approximately 98% for XGBoost and 97% for RF, whereas on the test data it is about 97% for XGBoost and 98% for RF. For this model too, neither XGBoost nor RF can be said to be clearly superior to the other in prediction success.

Table 9.10 Analysis results regarding Model 2c
Model 2c    LR        ANN       XGBoost    RF
R² Train    0.8100    0.9519    0.9851     0.9759
R² Test     0.7980    0.9580    0.9740     0.9808
9.5 Conclusions and Future Work
This study aimed to find the most accurate method for predicting the Ethereum price for the following day. Lasso Regression, Artificial Neural Networks, XGBoost and Random Forest algorithms were used, and the calculations were carried out in Python. Each technique was applied to the different models created, and the coefficient of determination (R²) was used to measure prediction performance. The parameters of each method for each model were determined experimentally. Across all the models created in this study, the ANN, XGBoost and RF techniques give similar and effective results. In addition, correlation-based filtering, which reduces the number of variables included in the analysis, causes a decrease in the performance of the LR method. How the performance of the methods changes as the data set of related cryptocurrencies is expanded could be investigated in future studies. A comparative analysis could also be made by creating new prediction models with different machine learning techniques or hybrid methods.
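The correlation-based filtering mentioned above can be sketched in plain Python. The chapter does not state the threshold or the exact selection rule that was used, so the cut-off of 0.9, the function names and the toy columns below are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def filter_by_correlation(features, target, threshold):
    """Keep the feature names whose absolute correlation with the target reaches the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]


# Hypothetical columns: one moves with the target, the other does not.
target = [1.0, 2.0, 3.0, 4.0]
features = {
    "tracks_target": [2.0, 4.0, 6.0, 8.0],
    "noise": [1.0, -1.0, 1.0, -1.0],
}
print(filter_by_correlation(features, target, threshold=0.9))
```

A filter of this kind keeps the model small, but, as the results for LR suggest, it can also discard variables that carry useful information when combined with others.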
References
1. Sultan K, Ruhi U, Lakhani R (2018) Conceptualizing blockchains: characteristics & applications. arXiv Prepr. arXiv:1806.03693
2. Tanrıverdi M, Uysal M, Üstündağ MT (2019) Blokzinciri Teknolojisi Nedir? Ne Değildir?: Alanyazın İncelemesi. Bilişim Teknolojileri Dergisi 12(3):203–217
3. Çarkacıoğlu A (2016) Kripto-para bitcoin. Sermaye piyasası kurulu araştırma dairesi araştırma raporu
4. Wood G (2014) Ethereum: a secure decentralised generalised transaction ledger. Ethereum project yellow paper 151(2014):1–32
5. Mendi AF (2021) Blokzincir Uygulamaları ve Gelecek Öngörüleri. GSI J Serie C: Adv Inf Sci Technol 4(1):76–88
6. Atlan F, Pençe I, Çeşmeli MŞ (2020) Online Fiyat Tahmin Modeli online price forecasting model using artificial intelligence for cryptocurrencies as bitcoin, Ethereum and Ripple. In: 2020 28th signal processing and communications applications conference (SIU). IEEE, pp 1–4
7. https://coinmarketcap.com/
8. Güleç TC, Aktaş H (2019) Kriptopara Birimleri Piyasasinda Pump & Dump Manipulasyonlarinin İki Aşamali Analizi. Atatürk Üniversitesi İktisadi ve İdari Bilimler Dergisi 33(3):919–932
9. Hitam NA, Ismail AR (2018) Comparative performance of machine learning algorithms for cryptocurrency forecasting. Ind J Electr Eng Comput Sci 11(3):1121–1128
10. Derbentsev V, Datsenko N, Stepanenko O, Bezkorovainyi V (2019) Forecasting cryptocurrency prices time series using machine learning approach. In: SHS web of conferences, vol 65. EDP Sciences, p 02001
11. Hitam NA, Ismail AR, Saeed F (2019) An optimized support vector machine (SVM) based on particle swarm optimization (PSO) for cryptocurrency forecasting. Procedia Comput Sci 163:427–433
12. Chowdhury R, Rahman MA, Rahman MS, Mahdy MRC (2020) An approach to predict and forecast the price of constituents and index of cryptocurrency using machine learning. Phys A: Stat Mech Appl 551:124569
13. Livieris IE, Kiriakidou N, Stavroyiannis S, Pintelas P (2021) An advanced CNN-LSTM model for cryptocurrency forecasting. Electronics 10(3):287
14. Sel A (2020) Pandemi Sürecinde Altın Fiyatları ile Kripto Para İlişkisinin Makine Öğrenme Metotları ile İncelenmesi. İstatistik ve Uygulamalı Bilimler Dergisi 1(2):85–98
15. Yavuz U, Özen Ü, Taş K, Çağlar B (2020) Yapay Sinir Ağları ile Blockchain Verilerine Dayalı Bitcoin Fiyat Tahmini. J Inf Syst Manage Res 2(1):1–9. Retrieved from https://dergipark.org.tr/en/pub/jismar/issue/55710/656814
16. Akyildirim E, Goncu A, Sensoy A (2021) Prediction of cryptocurrency returns using machine learning. Ann Oper Res 297(1):3–36
17. Kim HM, Bock GW, Lee G (2021) Predicting Ethereum prices with machine learning based on Blockchain information. Expert Syst Appl 184:115480
18. Zhang Z, Dai HN, Zhou J, Mondal SK, García MM, Wang H (2021) Forecasting cryptocurrency price using convolutional neural networks with weighted and attentive memory channels. Expert Syst Appl 115378
19. Elasan S, Keskin S, Arı E (2016) İlişkili bileşen regresyonu: DNA hasarını belirleme modeli üzerinde uygulanması. Türkiye Klinikleri Biyoistatistik Dergisi 8(1):45–52. https://doi.org/10.5336/biostatic.2015-48311
20. Çınaroğlu S (2017) Sağlık Harcamasinin Tahmininde Makine Öğrenmesi Regresyon Yöntemlerinin Karşilaştirilmasi. Uludağ Univ J Fac Eng 22(2):179–200
21. Ranstam J, Cook JA (2018) LASSO regression. J Brit Surg 105(10):1348–1348
22. Jaggi M (2013) An equivalence between the lasso and support vector machines. Regularization, optimization, kernels, and support vector machines, pp 1–26
23. Carmona P, Climent F, Momparler A (2019) Predicting failure in the U.S. banking sector: an extreme gradient boosting approach. Int Rev Econ & Finan 61:304–323
24. Ma X, Sha J, Wang D, Yu Y, Yang Q, Niu X (2018) Study on a prediction of P2P network loan default based on the machine learning LightGBM and XGboost algorithms according to different high dimensional data cleaning. Electron Commer Res Appl 31:24–39
25. Mitchell R, Frank E (2017) Accelerating the XGBoost algorithm using GPU computing. Peer J Comput Sci 3:127–164
26. Öztemel E (2012) Yapay sinir ağları. Papatya
27. Zheng H, Yuan J, Chen L (2017) Short-term load forecasting using EMD-LSTM neural networks with a Xgboost algorithm for feature importance evaluation. Energies 10(8):1168–1188
28. Hamzaçebi Ç (2011) Yapay Sinir Ağları: Tahmin Amaçlı Kullanımı MATLAB ve Neurosolutions Uygulamalı. Ekin Basım Yayın Dağıtım
29. Kaynar O, Taştan S, Demirkoparan F (2011) Yapay Sinir Ağları ile Doğalgaz Tüketim Tahmini. Atatürk Üniversitesi İktisadi ve İdari Bilimler Dergisi 25
30. Newman G, Lee J, Berke P (2016) Using the land transformation model to forecast vacant land. J Land Use Sci 11(4):450–475
31. Altınsoy F, Yüksel AS (2019) Uzaktan Eğitimde Öğrencilerin Başarı Notlarının Makine Öğrenmesi Algoritmalarıyla Tahmini. In: 2019 International conference on artificial intelligence and applied mathematics in engineering, Manavgat, Antalya, Turkey
32. Akar Ö, Güngör O (2012) Classification of multispectral images using random forest algorithm. J Geodesy Geoinf 1(2):105–112
33. Ustalı NK, Tosun N, Tosun Ö (2021) Makine Öğrenmesi Teknikleri ile Hisse Senedi Fiyat Tahmini. Eskişehir Osmangazi Üniversitesi İktisadi ve İdari Bilimler Dergisi 16(1):1–16
34. https://corporatefinanceinstitute.com/resources/knowledge/other/random-forest/
Chapter 10
Data Mining Approachs for Machine Failures: Real Case Study Ümran Kaya
10.1 Introduction
Corrugated cardboard is produced by gluing a wave (fluted) paper between two plain papers of different properties and structures. Its components can have many different features, so the product range is wide. Because corrugated cardboard is made from different types of paper, many paper combinations are possible during production, and the number of combinations grows with where and how the cardboard will be used and with customer requests. During production, the machines are observed to stop for long periods because of errors, and some of these errors may be caused by the paper combination used. It is very important for the company to know which paper types cause the fewest errors in the production of the desired product, because machine stops lead to serious costs. These include incomplete production due to downtime, loss of reputation due to delayed orders, machine repair costs and overtime wages paid to complete orders on time. The manufacturer wants to analyze which paper types in a combination cause more failures in order to minimize these costs. In this study, several machine learning methods are used to determine the relationship between the paper types used in cardboard production and downtime. The methods are compared using the correlation coefficients of their results to find which works better for this problem. The data were prepared in three different ways so that the methods can make better predictions and produce more meaningful results. Production data collected in real time contain valuable information that can be integrated into forecast systems to improve decision making and increase efficiency [1]. The data are split into a training set and a test set. If the machine failures turn out to be caused by the raw material used, the company will be able to determine which raw material types to use and to estimate stopping times from the paper combination used during production. If this downtime forecast is taken into account in production planning, the variability of the delivery dates that the company gives to its customers will decrease. With this forecast, planning and information, the company will be able to minimize its costs. Section 10.2 presents machine learning studies on machine failures. Section 10.3 gives general information about how the data are processed and which methods are applied. Section 10.4 compares the results of the machine learning methods on the data. The last section interprets the results of the whole study.

Ü. Kaya (B)
Industrial Engineering Department, Antalya Bilim University, Antalya, Turkey

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_10
10.2 Literature
Many studies aim to prevent the rising costs caused by failures in production. This section discusses those that detect machine failures. Shen et al. [2] used rough set theory to diagnose diesel engines based on the type of fault. Maki and Teranishi [3] developed an intelligent system at Hitachi that performs fault detection in three separate steps, and with it they created a decision support mechanism for engineers. Kusiak and Kurasek [4] detected soldering errors in printed circuit board production with decision trees. Fountain, Dietterich and Sudykya [5] detected errors in integrated circuit (IC) products produced by Hewlett Packard. Romanowski and Nagi [6] used clustering and simulation on data from old designs, together with the decision tree method, to determine the bill of materials in new product design. Lee and Ng [7] set up an online technical support system for computer troubleshooting, for which they developed a hybrid case-based reasoning system. Dengiz et al. [8] predicted failures in ceramics with a two-stage data analysis: in the first stage they detected faulty ceramics with image processing, and in the second stage they processed the faulty product information to detect errors in large amounts of data. Chien et al. [9] set up rules with the KNN algorithm to detect and predict semiconductor manufacturing errors. Kerdprasop and Kerdprasop [10] predicted errors based on more than 500 features detected with sensors in automatic semiconductor production, using decision tree, naive Bayes, logistic regression and instance-based machine learning methods. Verdier and Ferreira [11] used two different methods for error detection in semiconductor production: the adaptive Mahalanobis distance and the k-nearest neighbor rule. Maiorana and Mongioj [12] used data mining techniques to estimate the materials needed for engine revision after rally competitions in Italy. Ison et al. [13] used a decision tree model to determine machine failures, the category they belong to, and the measures to take for each failure. Liao et al. [14] tried to determine the source of errors with decision trees and association rules, using previous datasets from production targeted at zero errors. Chen et al. [15] estimated errors and customer feedback in system-level testing using outlier-based clustering. Bastos et al. [16] planned maintenance by predicting machine failures with decision trees. Djelloul et al. [17] reported failures with data mining techniques to minimize errors in production systems. More recently, Noshi et al. [18] predicted errors that occur during drilling and cutting operations in production, using logistic regression, decision trees and hierarchical clustering. The studies performed to detect and prevent production errors with data mining are listed in Table 10.1.
10.3 Methods
This section describes how the raw data received from the company was reprocessed and in which format it was used. Editing and processing the data helps to avoid problems when the methods are applied. The applied machine learning methods are then explained.
10.3.1 Re-processing the Data
In the original data, the number of machine failures is given for each raw material combination (Fig. 10.1). Before processing, the data was organized into three different types so that more meaningful results could be drawn when applying the classification and association rule methods. For example, when applying linear regression, the machine failure data is tested both in its original form and in grouped form. There are 29 raw material paper types in the original data. For each final product, the paper types used are expressed in binary (0/1) form: each product is represented by a vector of 29 cells, where a cell is 1 if that paper type is used and 0 if it is not. The machine failure counts were handled in two different ways. In data type-1, the counts were divided into 7 groups by a clustering method; in data type-2, the counts were used in their original form. As a result of clustering, machine stops are grouped as shown in Table 10.2. Organizing the data according to this grouping gives data type-1 (Fig. 10.2). Arranging the data according to the original failure counts gives data type-2 (Fig. 10.3).
Data type-1 and data type-2 can only be used in classification algorithms. A different data type must be used for association rule extraction. In data type-3, raw materials are written in their original form, not in the form of a vector (Fig. 10.4).

Table 10.1 Literature research
| Author/year | Topic | Subject detail | Method |
|---|---|---|---|
| Shen et al. [2] | Fault detection in diesel engines | Rules for error detection according to the type of failure in diesel engines have been issued | Rough set theory |
| Maki and Teranishi [3] | Intelligent system used in error detection | Developed a smart system for online data analysis with data mining techniques at Hitachi. The approach has three steps: feature extraction, combined search, and presentation. It shows engineers the focus points for error detection | Data mining techniques |
| Kusiak and Kurasek [4] | Soldering error detection on printed circuit board (PCB) | Rules for error detection based on soldering error data in PCB | Decision tree algorithms |
| Fountain et al. [5] | Integrated circuit fault detection | Based on the historical test data obtained from integrated circuit (IC) products produced by Hewlett Packard, the time of testing has been determined | Greedy value-of-information computation |
| Romanowski and Nagi [6] | Determination of BOM | Simulation was made using the data of old designs to determine the BOM in the new product design | Clustering and classification (decision tree) method |
| Lee and Ng [7] | Computer error detection system | Online technical support system has been established for computer fault detection | Hybrid case-based reasoning system (Hy-case) has been developed |
| Dengiz et al. [8] | Fault detection in ceramic production | Predicts errors in ceramics with two-stage data analysis. In the first stage, it detects faulty ceramics with image processing, and in the second stage, it detects errors in large amounts of data by processing the faulty product information | Image processing |
| Chien et al. [9] | Error detection in semiconductor production | A rule has been created with KNN in order to detect and predict semiconductor manufacturing errors | K-nearest neighbor rule |
| Kerdprasop and Kerdprasop [10] | Error detection in semiconductor manufacturing | Product error estimation according to more than 500 features detected by sensors in automatic semiconductor production | Decision tree, naive Bayes, logistic regression, instance based |
| Verdier and Ferreira [11] | Error detection in semiconductor production | Two different methods were used for error detection in semiconductor manufacturing | Adaptive Mahalanobis distance and k-nearest neighbor rule |
| Maiorana and Mongioj [12] | Bill of materials estimation | Some data mining techniques were used to estimate materials required for engine revision after rally competitions in Italy | Clustering algorithms |
| Ison et al. [13] | Fault detection and classification in production | Data mining techniques were used to determine the machine malfunctions, which category they are in, and what measures should be taken for each malfunction | Classification tree model |
| Liao et al. [14] | Data mining in zero-error generation | Processing and prediction of errors from previous data sets in production targeted to be made with zero errors | Decision tree, association rule |
| Chen et al. [15] | System level test error detection | Error testing and customer feedback estimation was made in testing system levels | Clustering based outlier analysis |
| Bastos et al. [16] | Machine failure prediction | Maintenance planning is made by estimating machine faults | Supervised learning, decision tree |
| Djelloul [17] | Fault detection in production | Malfunctions were communicated with data mining techniques to minimize errors in production systems | A new classification approach based on hybrid neural network technique |
| Noshi [18] | Outer body error prediction | Prediction of errors that will occur during drilling and cutting operations made during production | Logistic regression, decision trees, hierarchical clustering |
Fig. 10.1 Original data
Quality Type (Raw Material Types): BK140 NC140 KR140 NC140 KR140 KR100 FL100 FL100 FL100 TL100 BTL125 FL100 FL85 FL100 KM115 BTL125 FL100 FL100 FL100 TL100 KR140 NC140 KR140 NC140 KR140 KR140 ANSY175 FL85 ANSY175 KR140 BKR200 FL140 KR140 BTL125 FL90 TL100 BKR180 NC160 KR140 NC160 KR175
Unplanned Stop (quantity): 4 1 5 0 2 3 1 4 9

Table 10.2 Grouped machine stops
| Unplanned stop (quantity) | Group number | Group |
|---|---|---|
| 0 | 1 | A |
| 1 | 2 | B |
| 2 | 3 | C |
| 3–6 | 4 | D |
| 7–13 | 5 | E |
| 14–24 | 6 | F |
| 25–100 | 7 | G |

Fig. 10.2 Grouped data (data type-1)
Fig. 10.3 Ungrouped data (data type-2)
Fig. 10.4 Data prepared for the association rule (data type-3)
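The encoding described in this section can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the short vocabulary of paper codes are assumptions made for the example (the study uses 29 paper types), while the group boundaries follow Table 10.2.

```python
# Illustrative sketch only. PAPER_TYPES is a small hypothetical subset;
# the study uses 29 raw material paper types.
PAPER_TYPES = ["BK140", "NC140", "KR140", "FL100", "TL100"]

# Upper bound of each stop-count group, taken from Table 10.2.
GROUP_BOUNDS = [(0, "A"), (1, "B"), (2, "C"), (6, "D"),
                (13, "E"), (24, "F"), (100, "G")]


def encode_combination(papers, vocabulary=PAPER_TYPES):
    """Return a 0/1 vector marking which paper types a product uses."""
    used = set(papers)
    return [1 if paper in used else 0 for paper in vocabulary]


def stop_group(stops):
    """Map an unplanned stop count to its group letter (data type-1)."""
    if stops < 0:
        raise ValueError("stop count cannot be negative")
    for upper, letter in GROUP_BOUNDS:
        if stops <= upper:
            return letter
    raise ValueError("stop count outside the range covered by Table 10.2")


# One product: its paper combination and its unplanned stop count.
row = encode_combination(["BK140", "NC140", "KR140", "NC140", "KR140"])
print(row, stop_group(4))
```

Data type-2 would keep the raw stop count as the target instead of calling `stop_group`.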
10.3.2 Methods
Data type-1 and data type-2 can be used in the classification methods, and each method was applied to both. The algorithms were trained on 208 different product types of the company from 2018. The trained model was then tested on 30 different product types from 2017. WEKA 3.8 was used for the application. Only the statistical analysis results are given, without the detailed outputs of the program.
Table 10.3 Applied methods
| Method | Data type |
|---|---|
| M5P | Data type-1 |
| M5P | Data type-2 |
| Linear regression | Data type-1 |
| Linear regression | Data type-2 |
| SMO regression | Data type-1 |
| SMO regression | Data type-2 |
| M5Rule | Data type-1 |
| M5Rule | Data type-2 |
| Random tree | Data type-1 |
| Random tree | Data type-2 |
| Multi layer perceptron | Data type-1 |
| Multi layer perceptron | Data type-2 |
| Apriori | Data type-3 |

Method types listed in the table: regression, decision trees, association rule.
In practice, seven methods (Table 10.3) were used. Six of them are prediction methods, each tried with both data type-1 and data type-2; the seventh, Apriori, is an association rule method applied to data type-3. The aim is to find out which method works best with which data type.
10.3.2.1 M5P Algorithm
Put simply, M5P is a binary regression tree model whose leaf nodes are linear regression functions, so it can predict continuous numerical attributes. To construct the tree model, M5P uses a splitting criterion called Standard Deviation Reduction (SDR), given in Eq. 10.1:

\[ SDR = sd(T) - \sum_{i=1}^{n} \frac{|T_i|}{|T|} \, sd(T_i) \tag{10.1} \]

Here T is the set of instances that reach a node, T_1, …, T_n are the subsets produced by a candidate split, and sd is the standard deviation of the target values. The next step in developing the tree model involves pruning the tree and substituting subtrees with linear regression functions. The result is the final model: a tree-like structure with linear regression models at its leaves.
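As a small worked example of the SDR criterion, the sketch below computes the reduction for one candidate split. The function name and the sample values are hypothetical, and the population standard deviation is assumed for sd.

```python
from statistics import pstdev


def sdr(parent, subsets):
    """Standard deviation reduction of splitting `parent` into `subsets`.

    Assumes sd is the population standard deviation.
    """
    total = len(parent)
    # Each subset's deviation is weighted by its share of the instances.
    weighted = sum(len(s) / total * pstdev(s) for s in subsets)
    return pstdev(parent) - weighted


# Hypothetical target values at a node and one candidate binary split.
node_values = [1, 2, 3, 4]
candidate_split = [[1, 2], [3, 4]]
print(round(sdr(node_values, candidate_split), 3))
```

Among the candidate splits at a node, the one with the largest SDR would be chosen.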
10.3.2.2 Linear Regression
Linear regression is one of the simplest supervised learning algorithms. It assumes a linear relationship between the variable Y that we are trying to predict and the predictive variables X1, X2, …, Xn. In practice, the true regression function is almost never exactly linear, so a linear model can only approximate it. In Fig. 10.5, the blue line shows the function estimated by linear regression and the red curve shows the true function. Although we assume a linear relationship between X and Y, the figure shows that the relationship is not uniformly linear. Nevertheless, a linear method can still come quite close to the true function. Assume that we have a model as in Eq. 10.2:

\[ Y = a_0 + a_1 X + \varepsilon \tag{10.2} \]

Models such as this one, in which a response variable (Y) is predicted from a single predictive variable (X), are called simple linear regression [18].

Fig. 10.5 Linear regression
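For the simple linear regression model Y = a0 + a1 * X + ε, the coefficients can be estimated by ordinary least squares. The sketch below is a minimal illustration with hypothetical data; the function name is an assumption and it is not the WEKA implementation used in the study.

```python
def fit_simple_linear(xs, ys):
    """Least-squares estimates (a0, a1) for Y = a0 + a1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a0 = mean_y - a1 * mean_x
    return a0, a1


# Hypothetical noise-free data lying on the line Y = 1 + 2 * X.
a0, a1 = fit_simple_linear([0, 1, 2, 3], [1, 3, 5, 7])
print(a0, a1)
```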
10.3.2.3 SMO Regression
Support vector machines are used in both classification and regression problems. For regression, the goal is reversed: instead of finding the widest possible street between the classes while keeping the data samples off the street, as in classification, the method tries to fit as many data samples as possible on the street. The width of the street is determined by the hyper-parameter ε (Fig. 10.6) [19].
Fig. 10.6 Support vector machines
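The "street" idea in support vector regression is commonly formalized as the ε-insensitive loss: a prediction that falls within a distance ε (epsilon, the half-width of the street) of the true value costs nothing, and only the excess beyond ε is penalized. The sketch below is a hedged illustration of that loss alone, with a hypothetical function name and sample values; it is not the SMO optimization routine itself.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the street of half-width epsilon, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical values: the first prediction lies on the street,
# the second lies outside it and is penalized by the excess distance.
print(epsilon_insensitive_loss(3.0, 3.4, 0.5))
print(epsilon_insensitive_loss(3.0, 4.0, 0.5))
```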
10.3.2.4 M5Rule Algorithm
M5Rules generates a series of M5 trees (described in Sect. 10.3.2.1), where only the “best” (highest coverage) leaf/rule is retained from each tree. At each stage, the instances covered by the best rule are removed from the training data before the next tree is generated. The algorithm is similar to the PART method for classification trees, except that it always builds a full tree at each stage and does not employ the partial tree building speed-up of PART. M5P, by contrast, builds a single decision tree. It is certainly possible that an M5Rules model could outperform M5P on a given dataset.
10.3.2.5 Random Tree
Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. A decision tree is a simple representation for classifying examples. For this section, assume that all of the input features have finite discrete domains, and that there is a single target feature called the “classification”. Each element of the domain of the classification is called a class.

A decision tree or classification tree is a tree in which each internal (non-leaf) node is labeled with an input feature. The arcs coming from a node labeled with an input feature are labeled with each of the possible values of that feature, and each arc leads either to a leaf or to a subordinate decision node on a different input feature. Each leaf of the tree is labeled with a class or a probability distribution over the classes, signifying that the data set has been classified by the tree into either a specific class or a particular probability distribution (which, if the decision tree is well constructed, is skewed towards certain subsets of classes).

A tree is built by splitting the source set, which constitutes the root node of the tree, into subsets, which constitute the successor children. The splitting is based on a set of splitting rules that use the classification features. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is completed when the subset at a node has all the same values of the target variable, or when splitting no longer adds value to the predictions. This process of top-down induction of decision trees (TDIDT) is an example of a greedy algorithm, and it is by far the most common strategy for learning decision trees from data. In data mining, decision trees can also be described as the combination of mathematical and computational techniques that aid the description, categorization and generalization of a given set of data [20]. WEKA's RandomTree builds such a tree while considering only a randomly chosen subset of the attributes at each node.
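The recursive partitioning described above can be sketched as follows. This is a minimal, hedged illustration of greedy top-down induction on discrete features, using entropy as the splitting measure (an assumption made for the example). The data and function names are hypothetical, and unlike WEKA's RandomTree the sketch does not sample a random subset of attributes at each node.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def split_entropy(rows, labels, feature):
    """Weighted entropy of the subsets produced by splitting on `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return sum(len(g) / len(labels) * entropy(g) for g in groups.values())


def build_tree(rows, labels, features):
    """Greedy top-down induction; a leaf is returned as a class label."""
    # Stop when the node is pure or no features are left to split on.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    best = min(features, key=lambda f: split_entropy(rows, labels, f))
    remaining = [f for f in features if f != best]
    subsets = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = subsets.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    children = {value: build_tree(sub_rows, sub_labels, remaining)
                for value, (sub_rows, sub_labels) in subsets.items()}
    return {"feature": best, "children": children}


def predict(tree, row):
    """Follow matching arcs to a leaf; raises KeyError for unseen values."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["feature"]]]
    return tree


# Hypothetical toy data: 0/1 usage of two paper types and a stop group.
rows = [{"FL": 1, "KR": 0}, {"FL": 1, "KR": 1},
        {"FL": 0, "KR": 1}, {"FL": 0, "KR": 0}]
labels = ["D", "D", "A", "A"]
tree = build_tree(rows, labels, ["KR", "FL"])
print(tree["feature"], predict(tree, {"FL": 1, "KR": 1}))
```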
10.3.2.6 Multi-layer Perceptron
The perceptron, proposed by F. Rosenblatt in the 1950s, is one of the first models of artificial neural networks. The perceptron model is based on the biological structure of a neuron. An output value is generated when the net value, which is the sum of the products of the inputs and their corresponding weight values, exceeds the threshold value. Learning in an artificial neural network takes place through repeated forward and backward passes. A single neuron can model linearly separable logical functions such as AND and OR, but it is not sufficient to model a nonlinear data set, of which XOR is the classic example. In such non-linear situations, multilayered artificial neural networks are used. A multilayered neural network that solves the XOR problem is shown in Fig. 10.7. This network structure is a two-layer artificial neural network. N hidden nodes are used in the hidden layer, which makes the network capable of producing results in a nonlinear solution space. In multi-layered neural networks, learning is carried out with forward and backward propagation, as in a single neuron. In the forward pass, a prediction is made and the amount of error is calculated. In the backward pass, the derivative of the error with respect to the weights (w) and the threshold values (bias, b) is taken, and these parameters are moved toward the values that minimize the error, with a step size set by the learning rate. In this back propagation, the weights and threshold values are updated and the model is learned.

Fig. 10.7 Multilayer artificial neural network
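To make the XOR case concrete, the sketch below wires a two-layer network with two hidden neurons by hand. The weights and biases are illustrative values chosen for the example, not learned by back propagation, and a step activation stands in for the threshold behaviour of the perceptron.

```python
def step(net):
    """Fire (1) only when the net input exceeds the threshold of zero."""
    return 1 if net > 0 else 0


def xor_network(x1, x2):
    # Hidden layer: hand-picked weights and biases (illustrative, not learned).
    h_or = step(x1 + x2 - 0.5)   # behaves like OR
    h_and = step(x1 + x2 - 1.5)  # behaves like AND
    # Output neuron: OR but not AND, which is XOR.
    return step(h_or - h_and - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

Each hidden neuron on its own draws a single straight line through the input space; only their combination in the output neuron separates the XOR classes.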
10.3.2.7 Association Rules (Apriori)
Association rules are if–then statements that help to show the probability of relationships between data items within large data sets in various types of databases. Association rule mining has a number of applications and is widely used to help discover sales correlations in transactional data or in medical data sets. At a basic level, association rule mining involves the use of machine learning models to analyze data for patterns, or co-occurrence, in a database. It identifies frequent if–then associations, which are called association rules. An association rule has two parts: an antecedent (if) and a consequent (then). An antecedent is an item found within the data. A consequent is an item found in combination with the antecedent. Association rules are created by searching data for frequent if–then patterns and using the criteria support and confidence to identify the most important relationships. Support is an indication of how frequently the items appear in the data. Confidence indicates the proportion of cases in which the if–then statement is found to be true. A third metric, called lift, compares the confidence with the expected confidence.

Association rules are calculated from itemsets, which are made up of two or more items. If rules were built from all possible itemsets, there could be so many rules that they would hold little meaning. For this reason, association rules are typically created from itemsets that are well represented in the data. With the Apriori algorithm, candidate itemsets are generated using only the large (frequent) itemsets of the previous pass. The large itemset of the previous pass is joined with itself to generate all itemsets with a size that is larger by one. Each generated itemset that has a subset which is not large is then deleted. The remaining itemsets are the candidates. The Apriori algorithm considers any subset of a frequent itemset to also be a frequent itemset. With this approach, the algorithm reduces the number of candidates being considered by only exploring the itemsets whose support count is greater than the minimum support count [21].
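The three rule metrics described above can be computed directly from a list of transactions. The sketch below is a minimal illustration with hypothetical transactions and function names; it covers only the metrics, not Apriori's candidate generation.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the baseline frequency of the consequent."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


# Hypothetical transactions; each set lists the paper families in a product.
transactions = [{"FL", "KR"}, {"FL", "KR", "TL"}, {"FL"}, {"TL"}]
print(support({"FL"}, transactions))
print(confidence({"FL"}, {"KR"}, transactions))
print(lift({"FL"}, {"KR"}, transactions))
```

A lift above 1 indicates that the consequent appears with the antecedent more often than its overall frequency would suggest.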
10.4 Results
Table 10.4 summarizes the results of the six methods for the two data types. Comparing the correlation coefficients of the methods across the two data types shows that data type-1 (grouped machine stops) gives better results in the training of the model. The correlation coefficient measures the relationship between the actual value (Y) and the estimated value (Ŷ). The model that fits the training data best is the random tree (Table 10.5). When the trained models are compared on the test data, the best result is obtained by the SMOreg model, which achieves the highest test correlation coefficient for both data types. In contrast, although random tree and multi layer perceptron fit the training data best, they adapt very poorly to the test data. This is overfitting: these methods memorized the training data instead of learning from it. Regression models are not very suitable for this data set, because their correlation results were very low, while decision tree and artificial neural network models adapt too closely to the training data, so their error rates on the test data are high.

Table 10.4 Results found according to 2 data types
| Data type | Method | Training: Corr. Coeff. (%) | Training: MAE | Training: RMSE | Test: Corr. Coeff. (%) | Test: MAE | Test: RMSE |
|---|---|---|---|---|---|---|---|
| Data type-1 | M5P | 40.2 | 1.25 | 1.48 | 4.3 | 1.99 | 2.44 |
| Data type-1 | Linear regression | 41.7 | 1.23 | 1.47 | 4.0 | 1.99 | 2.44 |
| Data type-1 | SMO regression | 39.8 | 1.15 | 1.51 | 11.0 | 2.02 | 2.53 |
| Data type-1 | M5Rule | 40.2 | 1.25 | 1.48 | 4.3 | 1.99 | 2.44 |
| Data type-1 | Random tree | 90.7 | 0.30 | 0.68 | −3.5 | 2.10 | 2.59 |
| Data type-1 | Multi layer perceptron | 74.2 | 1.19 | 1.51 | −11.1 | 2.26 | 2.95 |
| Data type-2 | M5P | 18.1 | 4.48 | 8.34 | 6.8 | 7.46 | 12.22 |
| Data type-2 | Linear regression | 35.2 | 4.13 | 7.80 | 11.4 | 7.46 | 12.30 |
| Data type-2 | SMO regression | 28.1 | 3.19 | 8.32 | 15.4 | 7.02 | 12.55 |
| Data type-2 | M5Rule | 32.0 | 4.13 | 7.90 | 6.8 | 7.46 | 12.22 |
| Data type-2 | Random tree | 85.8 | 1.04 | 4.29 | 5.8 | 7.54 | 13.14 |
| Data type-2 | Multi layer perceptron | 69.6 | 4.05 | 6.27 | 0.5 | 10.24 | 15.03 |

Table 10.5 Comparison of the correlation results of the methods
| Method | Data type-1 (%) | Data type-2 (%) |
|---|---|---|
| M5P | 40.21 | 18.05 |
| Linear regression | 41.66 | 35.20 |
| SMO regression | 39.82 | 28.07 |
| M5Rule | 40.21 | 32.01 |
| Random tree | 90.73 | 85.78 |
| Multi layer perceptron | 74.21 | 69.58 |

When the third data set, prepared for the association rule, is run through the Apriori algorithm, the following rules are obtained:

1. N=FL, T=KR 15 ==> L=FL 14 <conf:(0.93)> lift:(2.90) lev:(0.07) [9] conv:(5.09)
2. N=FL 39 ==> L=FL 36 <conf:(0.92)> lift:(2.87) lev:(0.17) [23] conv:(6.62)

Two different rules were obtained. According to rule 1, of the 15 product combinations with N = FL and T = KR, 14 also have L = FL (confidence 14/15 ≈ 0.93). According to rule 2, of the 39 combinations with N = FL, 36 also have L = FL (confidence 36/39 ≈ 0.92). Both rules relate raw materials only to other raw materials. According to the result of the association rule algorithm, the raw materials used are therefore not a parameter related to machine failures. If they were, the algorithm would have produced a rule linking the use of a raw material to a failure group.
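The three statistics reported in Table 10.4 can be computed from the actual and predicted values as sketched below. This is a minimal illustration with hypothetical data and a hypothetical function name; the values in the table come from WEKA, and the correlation here is returned as a fraction rather than a percentage.

```python
import math


def evaluate(actual, predicted):
    """Return (correlation coefficient, MAE, RMSE) for paired values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # Undefined (division by zero) if either series is constant.
    corr = cov / math.sqrt(var_a * var_p)
    return corr, mae, rmse


# Hypothetical stop counts: every prediction is off by exactly one stop.
print(evaluate([1, 2, 3, 4], [2, 3, 4, 5]))
```

The example also shows why the correlation coefficient should be read together with MAE and RMSE: predictions can be perfectly correlated with the actual values and still carry a constant error.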
10.5 Conclusion
This study used machine learning algorithms to determine whether raw materials are a cause of machine failures. The raw data was reorganized in three different ways and made available to the algorithms. After the data was converted to the format required by the WEKA software, the algorithms were applied. Seven different algorithms were applied to the three data types. In the results, the random tree and multi layer perceptron methods memorized the training data and did not give good results on the test data, whereas the results of the SMOreg method on the test data were closer to those it obtained on the training data. This smaller difference suggests that the method is suitable for the problem. Taken together, the results of the decision tree, linear regression and association rule methods indicate that the raw materials used in the production of cardboard boxes do not affect machine stops and are not the source of unplanned stops. Many parameters in production can affect machine downtime; this research concludes that raw materials are not among them.
References
1. Elovici Y, Braha D (2003) A decision-theoretic approach to data mining. IEEE Trans Syst Man Cyber Part A, Syst 42–51
2. Shen L, Tay FEH, Qu LS, Shen Y (2000) Fault diagnosis using rough set theory. Comput Ind 61–72
3. Maki H, Teranishi Y (2001) Development of automated data mining system for quality control in manufacturing. Lect Notes Comput Sci 93–100
4. Kusiak A, Kurasek C (2001) Data mining of printed circuit board defects. IEEE Trans Robot Autom 191–196
5. Fountain T, Dietterich T, Sudykya B (2003) Data mining for manufacturing control: an application in optimizing IC test. In: Lakemeyer G, Nebel B. Exploring artificial intelligence in the new millennium, pp 381–400
6. Romanowski C, Nagi R (2004) A data mining approach to forming generic bills of material in support of variant design activities. ASME J Comput Inf Sci 316–328
7. Lee S, Ng YC (2006) Hybrid case-based reasoning for on-line product fault diagnosis. Int J Adv Manuf Technol 823–840
8. Dengiz O, Smith A, Nettleship I (2006) Two stage data-mining for flaw identification in ceramics manufacturing. Int J Prod Res 2839–2851
9. Chien C, Wang W, Chang J (2007) Data mining for yield enhancement in semiconductor manufacturing and an empirical study. Exp Syst With Appl 33:192–198
10. Kerdprasop K, Kerdprasop N (2011) A data mining approach to automate fault detection model development in the semiconductor manufacturing process. Int J Mech 4:336–344
11. Verdier G, Ferreira A (2011) Adaptive Mahalanobis distance and k-nearest neighbour rule for fault detection in semiconductor manufacturing. IEEE Trans Semiconduct Manuf 59–68
12. Maiorana F, Mongioj A (2012) A data mining approach for bill of materials for motor revision. In: Proceedings of the federated conference on computer science and information systems. Italy
13. Ison AM, Li W, Spanos CJ (1997) Fault diagnosis of plasma etch equipment. In: Semiconductor manufacturing conference proceedings, 1997 IEEE international symposium
14. Liao W, Wang Y, Pan E (2012) Single-machine-based predictive maintenance model considering intelligent machinery prognostics. Int J Adv Manuf Technol 63:51–63
15. Chen HH, Hsu R, Yang P, Shyr JJ (2013) Predicting system-level test and in-field customer failures using data mining. In: International test conference. Hsinchu, Taiwan
16. Bastos P, Lopes I, Pires L (2014) Application of data mining in a maintenance system for failure prediction. Safety, Reliab Risk Anal 1:933–940
17. Djelloul I, Sarı Z, Sıdıbe I (2018) Fault diagnosis of manufacturing systems using data mining techniques. In: 5th international conference on control, decision and information technologies (CoDIT’18). Thessaloniki, Greece
18. Noshi C, Noynaert S, Shubert J (2018) Casing failure data analytics: a novel data mining technique predicting casing failure for improved drilling performance and production optimization. In: 2018 SPE annual technical conference and exhibition, Dallas
19. Lemsalu M (2017) Quora. 14 December 2017. (Online). Available: https://www.quora.com/How-does-an-M5P-M5-model-trees-algorithm-in-data-mining-work. Accessed 30 May 2020
20. Gareth J, Hastie T, Witten D, Tibshirani R (2014) An introduction to statistical learning with applications in R. Springer
21. Rouse M (2018) Search business analytics. 29 November 2018. (Online). Available: https://searchbusinessanalytics.techtarget.com/definition/association-rules-in-data-mining. Accessed 30 May 2020
Chapter 11
Classification of People Both Wearing Medical Mask and Safety Helmet Emel Soylu and Tuncay Soylu
11.1 Introduction
COVID-19 has been declared a pandemic by the World Health Organization (WHO). The coronavirus epidemic emerged in China and turned into a global health crisis by affecting the world in a short time, with the number of cases approaching 92 million. According to the data published in the WHO Coronavirus Disease (COVID-19) Dashboard, 122,524,424 coronavirus cases had been recorded worldwide as of March 2021, and the loss of life had reached 2,703,620. The countries most affected by the outbreak are the United States, India, Brazil, Russia, Britain, France, Turkey, Italy, Spain, and Germany [1]. Scientists have concentrated their studies on many areas, such as the detection, treatment, reduction, and prevention of the COVID-19 epidemic that affects the whole world [2–6]. Following hygiene rules, maintaining social distance, and wearing a mask are among the factors that reduce the risk of contamination. It is stated that people who use masks are exposed to fewer viruses and can be protected from severe infections as well as from seasonal respiratory infections such as flu. As long as it is applied correctly, the use of masks is the most effective solution to minimize coronavirus transmission. In many countries, the use of masks has been made mandatory. In their study on the effect of mask use, Eikenberry et al. showed that the transmission rate decreased by 91% and that the mortality rate also decreased [7].

E. Soylu (B) Faculty of Engineering, Department of Software Engineering, Samsun University, 55420, Samsun, Turkey e-mail: [email protected]
T. Soylu Faculty of Engineering, Department of Electric-Electronic Engineering, Samsun University, 55420, Samsun, Turkey e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_11
In addition to the occupational health and safety rules that exist in many working environments, it has become mandatory to wear a mask. This will continue until a definitive cure for the Covid-19 disease is found. It is very important to use personal protective equipment to ensure safety during work. It is necessary to use personal safety equipment to prevent any work accident and to minimize the damage that may occur during the accident. One of the most important pieces of equipment in terms of work safety is helmets used for head protection. The helmet is a safety material that protects the head of the employees against impacts, object drops, and electric shocks at the time of contact (low voltage). The helmet protects against the risk of hitting the head, against falling objects [8, 9]. Although helmets are mostly considered to be protective against heavy objects that may fall, they are also protective against problems such as bad weather, electric shocks, and UV rays. Safety helmets are used in mines, constructions, construction sites, shipyards, the iron and steel industry, the industrial sector [10, 11]. Unfortunately, sometimes people do not follow the rules of wearing masks and helmets [12]. Those who break the rules risk both their health and the health of other people. From the images obtained from the camera, it can be detected instantly with image processing methods whether people wear masks and helmets. The success of deep learning techniques in big data analysis has attracted attention in recent years. With deep learning techniques, results with a high success rate are obtained for such problems. This study, it is aimed to determine whether people are wearing masks and helmets with the AlexNet-based deep transfer learning technique. The human brain represents information. For decades, artificial intelligence has been trying to mimic the robustness and efficiency of the human brain. 
People are exposed to sensory data every second of the day and somehow manage to capture the critical aspects of this data, which allows them to use the information later. The dimensionality of the data to be processed is one of the main problems in computer science. Especially in classification applications such as pattern recognition, learning complexity grows exponentially with a linear increase in the dimensionality of the data. Recent neuroscience findings have provided information about the principles governing the representation of information in the human brain and sparked new ideas for designing systems that represent information. One of the most important findings is that the neocortex, which is associated with many cognitive abilities, does not preprocess sensory signals, but allows them to propagate through a complex hierarchy of modules that, over time, learn to represent observations based on the regularities they exhibit. This discovery gave rise to the subfield of deep machine learning, which focuses on computational models for information representation that exhibit properties similar to those of the neocortex [13]. The history of deep learning dates to the 1960s. In 1965, the first general learning algorithm for deep feedforward multilayer perceptrons was published by Ivakhnenko and Lapa [14]. In this study, the best features in each layer are selected by statistical methods and transferred to the next layer. They used backpropagation to train their networks end-to-end. Fourteen years later, Fukushima proposed the “Neocognitron” deep learning architecture [15]. This structure was inspired by the visual nerve cells of vertebrates. A self-organizing network was developed with unsupervised
learning. The first successful deep neural network application was presented by Yann LeCun et al. for handwritten postal codes on mail [16]. The network gave successful results, but its three-day training time was not considered practical. After this work, LeCun applied backpropagation with convolutional networks to classify handwritten digits (MNIST) using the “LeNet” network [17]. In 1995, Brendan Frey, Peter Dayan, and Geoffrey Hinton trained a network with six fully connected layers and several hundred hidden units using the wake-sleep algorithm they developed; training took 2 days [18]. In 1997, further important developments followed, such as long short-term memory for recurrent neural networks, introduced by Hochreiter and Schmidhuber. Despite their advantages, ANN algorithms saw little use in this period because of their computational cost. From the 1990s to the 2000s, simpler, problem-specific models that rely on manually prepared features, such as support vector machines, were preferred. In the period known as the “AI winter”, studies in this area came to a halt because of hardware limitations and other problems. In the early 2000s, artificial neural networks became a popular area again. With faster computers and the use of graphics processing units (GPUs) in calculations, great improvements were made; computing speed increased about 1000-fold over 10 years. In this period, ANNs began to rival support vector machines again. With these developments, the field shifted from shallow networks to deep networks. This approach has been used successfully in a wide range of areas, from image processing to natural language processing and from medical applications to activity recognition [13]. In the context of ANNs, the expression “Deep Learning” was introduced in 2000 by Igor Aizenberg et al. [19].
In an article published by Geoffrey Hinton, he demonstrated how a multilayer feed-forward neural network can be effectively pre-trained one layer at a time and then fine-tuned with supervised backpropagation [18]. With the increase in GPU speeds, it has become possible to train deep networks without pre-training. Ciresan et al. used this approach in their deep networks, which won competitions in traffic sign recognition, medical imaging, and character recognition. Krizhevsky, Sutskever, and Hinton designed similar architectures in 2012. In their GPU-supported work, the regularization method called “dropout” was used to reduce overfitting. The effectiveness of dropout was demonstrated when it brought them outstanding results in the ILSVRC-2012 ImageNet competition. After these developments, technology companies such as Google, Facebook, and Microsoft noticed this trend and started to invest in deep learning [20]. Deep learning draws attention in both the academic and industrial fields. Deep learning algorithms use multilayer (deep) architectures to extract features from data and can discover rich structure in large amounts of data [21]. A standard neural network consists of simple interconnected processors called neurons, each producing a sequence of real-valued activations. Input neurons are activated by sensors that perceive the environment. Other neurons are activated through weighted connections from previously active neurons. Some neurons can influence the environment by triggering actions. Learning consists of finding weights that make the neural network exhibit the desired behavior. Depending on the problem and how the neurons are connected, such behavior may require long causal chains of computational stages [22].
Deep learning provides computational models composed of multiple processing layers. The data is passed through many layers before the result is reached. In this way, many areas such as speech recognition and visual object recognition have advanced. Deep learning uses the backpropagation algorithm to update the parameters that compute the representation in each layer from the representation in the previous layer. In this way, complex structure in big data is discovered. Deep learning sheds light on many areas such as image and video processing, natural language processing, and speech recognition [23]. Bu et al. performed masked face detection with the convolutional neural network (CNN) technique. Their study was carried out before the Covid-19 outbreak, to identify criminals [24]. Loey et al. achieved a 99.49% success rate in classifying masked and unmasked people with a hybrid deep transfer learning technique [25]. They obtained the data set from the Real-World Masked Face Dataset (RMFD). Lippert et al. conducted a mask recognition study with 99% accuracy using the Single Shot Detection architecture on the RMFD dataset [26]. Wang et al. created a large-scale data set for medical mask recognition [27]. Yadav developed a CNN-based system that raises an alarm for people who do not maintain social distance or do not wear masks [28]. Rohith et al. developed a deep learning system that detects whether motorcycle riders are wearing helmets with 86% accuracy [29]. Using image processing techniques, Li et al. determined whether perambulatory workers were wearing masks or not [30]. Using a data set obtained from the Google Crawler, Zhang et al. used deep learning techniques to determine whether workers were wearing safety helmets [31]. Li et al. developed a system that detects from camera images whether workers are wearing safety helmets, using image processing and machine learning techniques [31].
Deep learning techniques give successful results in recognizing masks and helmets [31–43]. The literature contains many studies that consider separately whether people wear masks or whether they wear helmets. However, no study was found that determines whether people are wearing both a helmet and a mask. In this study, an AlexNet-based deep transfer learning technique is used to determine whether people wear masks and helmets. Two methods are studied and compared for this purpose. The rest of the study covers the data set, the method, and the experimental studies.
11.2 Materials and Methods
11.2.1 Dataset
The aim of this study is to determine whether people wear masks and safety helmets, so a new data set was built by combining several existing data sets. Image data were collected for four categories: wearing neither a mask nor a safety helmet, wearing a mask, wearing a safety helmet, and wearing both a mask and a safety helmet.
• Images of the “not wearing masks and safety helmets” category: MPII Human Pose dataset [44]. This dataset contains 25,000 images covering 410 activities. Sample photos from this dataset are given in Fig. 11.1.
• Images of the “wearing masks” category: Real-World-Masked-Face-Dataset (RMFD) [45]. It contains 5000 masked faces of 525 people and 90,000 normal faces. Sample photos from this dataset are given in Fig. 11.2.
• Images of the “wearing safety helmets” category: Safety Helmet Detection from Kaggle [46]. It contains 5000 images. Sample photos from this dataset are given in Fig. 11.3.
• Images of the “wearing both masks and safety helmets” category: obtained from Google Crawler. Sample photos from this dataset are given in Fig. 11.4.
The preparation steps of the dataset are given in Fig. 11.5. First, the pictures are read from the location where they are saved on the device and the raw data set is created. Pictures with .jpg and .png extensions are used. In the second step, the pictures are resized to match the input size of the network. In the third step, the variety and amount of data are increased through augmentation. In the fourth step, the data set is divided into a training set and a test set in the specified ratio. In this study, 80% of the data set was used for training and 20% for testing.
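The first and fourth preparation steps above can be sketched in Python. This is a minimal illustration and not the authors' Matlab pipeline: the function name `split_dataset`, the fixed random seed, and the per-class (stratified) shuffling are assumptions. Only the .jpg/.png filter and the 80/20 ratio come from the text. Resizing and augmentation are left out because they depend on an image library.

```python
import random
from collections import defaultdict

IMAGE_EXTENSIONS = (".jpg", ".png")  # file types named in the text


def split_dataset(labelled_files, train_ratio=0.8, seed=0):
    """Split (filename, label) pairs into train and test lists, class by class.

    Stratifying by class and fixing the seed are illustrative assumptions.
    """
    by_label = defaultdict(list)
    for name, label in labelled_files:
        # Step 1: keep only the supported image types.
        if name.lower().endswith(IMAGE_EXTENSIONS):
            by_label[label].append(name)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        names = sorted(by_label[label])
        rng.shuffle(names)
        # Step 4: the first 80% of each class goes to training, the rest to test.
        cut = int(len(names) * train_ratio)
        train += [(n, label) for n in names[:cut]]
        test += [(n, label) for n in names[cut:]]
    return train, test


# Hypothetical file names, used only to exercise the function.
files = [(f"mask_{i}.jpg", "mask") for i in range(10)]
files += [(f"helmet_{i}.png", "helmet") for i in range(5)]
files += [("notes.txt", "mask")]
train_set, test_set = split_dataset(files)
```

Splitting each class separately keeps the class proportions similar in both subsets, which matters here because the four categories differ considerably in size.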
Fig. 11.1 Sample photos of people not wearing masks or safety helmets
Fig. 11.2 Sample photos from masked people
11.2.2 Method
Two methods were used to classify the pictures in this study. AlexNet was preferred for both methods because of the high performance reported for it in the literature. The study was carried out in the Matlab environment. The technical specifications of the computer used in this study are as follows:
• GPU: NVIDIA GeForce GTX 1060 6 GB
• CPU: Intel i7 3.4 GHz
• RAM: 8 GB
• Operating system: 64-bit
Fig. 11.3 Sample photos of people wearing a safety helmet
AlexNet is a CNN designed by Alex Krizhevsky [47]. The architecture of AlexNet is given in Fig. 11.6. AlexNet has about 60 million parameters, which is a large number to learn; running the layers on GPUs speeds up training. The layer order of this architecture is: image input, convolution, ReLU, channel normalization, pooling, convolution, ReLU, channel normalization, pooling, convolution, ReLU, convolution, ReLU, convolution, ReLU, pooling, fully connected, ReLU, dropout, fully connected, ReLU, dropout, fully connected, softmax, classification. The duties of these layers are briefly as follows [49]:
• Convolution layer: creates multiple feature maps.
• ReLU: the Rectified Linear Unit (ReLU) is used to increase non-linearity. Its output is given in Eq. 11.1, which returns 0 for negative inputs and passes positive inputs through unchanged; $x_i$ is the input value.
$$\mathrm{ReLU}(x_i) = \max(0, x_i) \qquad (11.1)$$
• Pooling: produces the pooled feature map.
• Fully connected: the hidden layer is fully connected.
• Softmax: determines the class probabilities. The $i$th probabilistic output of the softmax function is given in Eq. 11.2. The outputs lie between 0 and 1 and sum to 1 over the $n$ classes; $x_i$ is the input value [50].
$$\mathrm{softmax}(x_i) = \frac{\exp(x_i)}{\sum_{k=1}^{n} \exp(x_k)} \qquad (11.2)$$
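Equations 11.1 and 11.2 can be checked numerically with a few lines of Python. This is a minimal sketch using plain lists rather than a deep learning framework; the example scores are hypothetical, and the subtraction of the maximum inside `softmax` is a standard numerical-stability step that is not part of the chapter's formula (it leaves the result unchanged).

```python
import math


def relu(x):
    """Eq. 11.1: pass positive inputs through, map negative inputs to 0."""
    return max(0.0, x)


def softmax(xs):
    """Eq. 11.2: exponentiate and normalise so the outputs sum to 1."""
    shift = max(xs)  # stability step, an addition to the chapter's formula
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical pre-softmax scores for the four categories.
scores = [2.0, -1.0, 0.5, 0.0]
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
```

The class with the largest score also receives the largest probability, so the classification layer can simply pick the index of the maximum softmax output.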
Fig. 11.4 Sample photos of people wearing both masks and safety helmets
Fig. 11.5 Data preparation process (flow chart: input image dataset → resize dataset → data augmentation → split data for training and test → train data and test data)
In the first of the proposed methods, the input is applied to a single deep neural network, which classifies it into one of 4 categories. In the second method, the input picture is applied to two different deep neural networks. One of these networks classifies the image as with or without a mask, the other as with or without a helmet. Together, the two outputs again yield 4 categories. To find the best result, the training parameters of the network were changed and the training was repeated many times. The effects of the Adam (adaptive moment estimation), stochastic gradient descent with momentum (SGDM), and root mean square propagation (RMSProp) optimizers and of the batch size parameter on network performance were examined.
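To make the difference between the three optimizers concrete, the following Python sketch applies their textbook update rules to a single scalar weight on a toy loss. It is not the Matlab training code used in the study: the function names, the toy loss f(w) = w², and all hyperparameter values (learning rate 0.01, momentum 0.9, decay rates) are illustrative assumptions.

```python
import math


def sgdm_step(w, g, state, lr=0.01, momentum=0.9):
    """SGD with momentum: accumulate a velocity, then move along it."""
    state["v"] = momentum * state.get("v", 0.0) - lr * g
    return w + state["v"]


def rmsprop_step(w, g, state, lr=0.01, decay=0.9, eps=1e-8):
    """RMSProp: scale the step by a running RMS of recent gradients."""
    state["s"] = decay * state.get("s", 0.0) + (1 - decay) * g * g
    return w - lr * g / (math.sqrt(state["s"]) + eps)


def adam_step(w, g, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected running means of the gradient and its square."""
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * g
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimise(step, w=5.0, iterations=2000):
    """Minimise the toy loss f(w) = w**2, whose gradient is 2*w."""
    state = {}
    for _ in range(iterations):
        w = step(w, 2 * w, state)
    return w


final_weights = {
    "sgdm": minimise(sgdm_step),
    "rmsprop": minimise(rmsprop_step),
    "adam": minimise(adam_step),
}
```

All three rules drive the weight from 5.0 towards the minimum at 0, but along different trajectories. That is why the optimizer choice can change both the training time and the final accuracy of a network trained for a fixed number of epochs.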
Fig. 11.6 The architecture of AlexNet [48] (227×227 input image; convolution and pooling stages with feature maps from 55×55×96 down to 6×6×256, flattened to 9216 values; two fully connected layers of 4096 units; softmax output with 4 classes)
Fig. 11.7 Block diagram of the single DNN system (input image → deep neural network → one of four mask/helmet combinations)
11.2.3 Single Deep Neural Network
Figure 11.7 shows the block diagram of the system using a single deep neural network. The picture applied as input to the DNN is classified into one of 4 categories. Four categories of pictures were used to build this network: 366 pictures of people with masks, 573 with helmets, 952 without masks and helmets, and 125 with both masks and helmets. In total, 2016 pictures were used. Table 11.1 shows the training time and accuracy according to the optimizer type, learning rate, number of epochs, and batch size. According to this table, the best result was obtained with the rmsprop optimizer, a learning rate of 0.00001, 30 epochs, and a batch size of 20.
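The picture counts above can be cross-checked against the “iterations per epoch” column of Table 11.1. The sketch below assumes the 80% training share from Sect. 11.2.1 and assumes that an incomplete final mini-batch is discarded in each epoch; the second point is an assumption about the training setup, although it reproduces the tabulated values of 161 and 80. The function name and dictionary keys are illustrative.

```python
def iterations_per_epoch(total_images, batch_size, train_percent=80):
    """Number of full mini-batches in one pass over the training split."""
    train_images = total_images * train_percent // 100
    # Assumption: a final batch smaller than batch_size is dropped.
    return train_images // batch_size


# Picture counts per category, as given in the text.
class_counts = {"mask": 366, "helmet": 573, "neither": 952, "both": 125}
total_images = sum(class_counts.values())

iters_batch_10 = iterations_per_epoch(total_images, 10)
iters_batch_20 = iterations_per_epoch(total_images, 20)
```

Doubling the batch size halves the number of iterations per epoch, which is consistent with the shorter training times reported for batch size 20.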
Table 11.1 Mask and safety helmet detection DNN learning parameters and results
| Training | Learning algorithm | Learning rate | Epoch | Iterations per epoch | Batch size | Training time | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1 | sgdm | 0.00001 | 30 | 161 | 10 | 237 min 12 s | 93.3 |
| 2 | rmsprop | 0.00001 | 30 | 161 | 10 | 235 min 37 s | 93.8 |
| 3 | adam | 0.00001 | 30 | 161 | 10 | 238 min 57 s | 93.55 |
| 4 | sgdm | 0.00001 | 30 | 80 | 20 | 133 min 56 s | 92.06 |
| 5 | adam | 0.00001 | 30 | 80 | 20 | 88 min 11 s | 93.55 |
| 6 | rmsprop | 0.00001 | 30 | 80 | 20 | 128 min 25 s | 94.79 |
11.2.4 Double Deep Neural Network
Figure 11.8 shows the block diagram of the system using two DNNs. The first network determines whether a mask is worn and the second whether a helmet is worn. In this proposed method, the data set of pictures of people wearing both helmets and masks was not used. The data set of people without a mask and a helmet is common to both networks. Nevertheless, successful results were obtained for pictures of people wearing both masks and helmets, even though such pictures were not included in the training data. 1206 pictures were used for mask recognition and 1525 pictures were used for safety helmet recognition. Tables 11.2 and 11.3 give the results for different learning parameters. According to Tables 11.2 and 11.3, the most successful results were obtained with the Adam optimizer and a batch size of 10. Sample results of the system are given in Figs. 11.9 and 11.10.
Fig. 11.8 Block diagram of the two DNN system (input image → Deep Neural Network-1 (mask) and Deep Neural Network-2 (safety helmet), each giving a worn / not worn decision)
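The way the two binary decisions are merged into the four categories can be sketched as follows. This is an illustrative Python outline, not the authors' implementation: `mask_net` and `helmet_net` are hypothetical callables standing in for the two trained networks, each assumed to return the probability that the item is worn, and the 0.5 decision threshold is likewise an assumption.

```python
def combine(mask_worn, helmet_worn):
    """Map the two binary decisions onto the four categories of Sect. 11.2.1."""
    if mask_worn and helmet_worn:
        return "mask and safety helmet"
    if mask_worn:
        return "mask only"
    if helmet_worn:
        return "safety helmet only"
    return "no mask, no safety helmet"


def classify(image, mask_net, helmet_net, threshold=0.5):
    """Run both networks on the same image and merge their outputs."""
    mask_worn = mask_net(image) >= threshold
    helmet_worn = helmet_net(image) >= threshold
    return combine(mask_worn, helmet_worn)


# Stand-in networks with fixed outputs, used only to exercise the logic.
example = classify("worker.jpg", lambda img: 0.91, lambda img: 0.12)
```

Because each network decides independently, the “mask and safety helmet” category needs no training pictures of its own, which matches the observation that such pictures were classified successfully without being in the training data.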
Table 11.2 Mask detection DNN learning parameters and results
| Training | Learning algorithm | Learning rate | Epoch | Iterations per epoch | Batch size | Training time | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1 | rmsprop | 0.00001 | 10 | 105 | 10 | 44 min 35 s | 95.82 |
| 2 | sgdm | 0.00001 | 10 | 105 | 10 | 45 min 24 s | 96.2 |
| 3 | adam | 0.00001 | 10 | 105 | 10 | 46 min 48 s | 96.96 |
| 4 | rmsprop | 0.00001 | 10 | 52 | 20 | 24 min 52 s | 96.95 |
| 5 | sgdm | 0.00001 | 10 | 52 | 20 | 24 min 44 s | 93.92 |
| 6 | adam | 0.00001 | 10 | 52 | 20 | 26 min 13 s | 96.58 |
Table 11.3 Safety helmet detection DNN learning parameters and results
| Training | Learning algorithm | Learning rate | Epoch | Iterations per epoch | Batch size | Training time | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1 | rmsprop | 0.00001 | 10 | 122 | 10 | 56 min 29 s | 96.07 |
| 2 | sgdm | 0.00001 | 10 | 122 | 10 | 53 min 38 s | 97.05 |
| 3 | adam | 0.00001 | 10 | 122 | 10 | 55 min 52 s | 97.38 |
| 4 | rmsprop | 0.00001 | 10 | 61 | 20 | 29 min 49 s | 97.05 |
| 5 | sgdm | 0.00001 | 10 | 61 | 20 | 29 min 43 s | 94.43 |
| 6 | adam | 0.00001 | 10 | 61 | 20 | 28 min 53 s | 95.74 |
Fig. 11.9 Sample results (a)
Fig. 11.10 Sample results (b)
11.3 Conclusions and Future Work
A practical system for detecting the wearing of safety helmets and medical masks is developed in this study using AlexNet-based deep transfer learning. Two methods are proposed: applying the input picture to a single network, or to two networks. In the first method, a 94.79% success rate was achieved. In the second method, success rates of 96.96% in mask recognition and 97.38% in helmet recognition were achieved. This study can be used to determine whether people wear helmets and medical masks in work areas where wearing them is mandatory. Only a webcam and a computer with standard features are required for the application to work, so it does not involve high costs. The system does not require a monitoring device for each safety helmet or mask. Experimental results show that the proposed approach is effective and efficient: in the comparative study, maximum success rates of 97.38% in helmet detection and 96.96% in mask recognition were obtained. The data set can be expanded to increase accuracy in the next stages of the work. The performance of different deep learning architectures can be compared. This system can be used in real time in working environments, and worker safety can be raised to high levels by classifying the images obtained from the cameras. The work can also be extended to check whether other equipment necessary for work safety, such as gloves and glasses, is worn.
References
1. WHO Coronavirus Disease (COVID-19) Dashboard [Internet]. World Health Organization. [cited 2021 Mar 22]. Available from https://covid19.who.int/
2. Livingston E, Bucher K (2020) Coronavirus disease 2019 (COVID-19) in Italy. JAMA 323(14):1335
3. Wang L, Lin ZQ, Wong A (2020) COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci Rep [Internet] 10(1):1–12. Available from https://doi.org/10.1038/s41598-020-76550-z
4. Oh Y, Park S, Ye JC (2020) Deep learning COVID-19 features on CXR using limited training data sets. 39(8):2688–700. arXiv
5. Zheng C, Deng X, Fu Q, Zhou Q, Feng J, Ma H et al (2020) Deep learning-based detection for COVID-19 from chest CT using weak label, pp 1–13. medRxiv
6. Minaee S, Kafieh R, Sonka M, Yazdani S, Jamalipour Soufi G (2020) Deep-COVID: predicting COVID-19 from chest X-ray images using deep transfer learning. Med Image Anal 65
7. Eikenberry SE, Mancuso M, Iboi E, Phan T, Eikenberry K, Kuang Y et al (2020) To mask or not to mask: modeling the potential for face mask use by the general public to curtail the COVID-19 pandemic. Infect Dis Model [Internet] 5:293–308. Available from https://doi.org/10.1016/j.idm.2020.04.001
8. Bíl M, Dobiáš M, Andrášik R, Bílová M, Hejna P (2018) Cycling fatalities: when a helmet is useless and when it might save your life. Saf Sci 105:71–76
9. Fung IWH, Lee YY, Tam VWY, Fung HW (2014) A feasibility study of introducing chin straps of safety helmets as a statutory requirement in Hong Kong construction industry. Saf Sci [Internet] 65:70–8. Available from https://doi.org/10.1016/j.ssci.2013.12.014
10. Mills NJ, Gilchrist A (1993) Industrial helmet performance in impacts. Saf Sci 16(3–4):221–238
11. Brolin K, Lanner D, Halldin P (2020) Work-related traumatic brain injury in the construction industry in Sweden and Germany. Saf Sci [Internet] 136:105147. Available from https://doi.org/10.1016/j.ssci.2020.105147
12. Jamshidi M, Lalbakhsh A, Talla J, Peroutka Z, Hadjilooei F, Lalbakhsh P et al (2019) Artificial intelligence and COVID-19: deep learning approaches for diagnosis and treatment. IEEE Access 2020(8):109581–109595
13. Arel I, Rose D, Karnowski T (2010) Deep machine learning - a new frontier in artificial intelligence research. IEEE Comput Intell Mag 5(4):13–18
14. Ivakhnenko AG, Lapa VG (1965) Cybernetic predicting devices. CCM Inf Corp
15. Fukushima K (1979) Neural network model for a mechanism of pattern recognition unaffected by shift in position - Neocognitron. IEICE Tech Rep A 62(10):658–665
16. Simard P, LeCun Y, Denker JS (1993) Efficient pattern recognition using a new transformation distance. In: Advances in neural information processing systems, pp 50–8
17. LeCun Y, Jackel LD, Bottou L, Brunot A, Cortes C, Denker JS et al (1995) Comparison of learning algorithms for handwritten digit recognition. In: International conference on artificial neural networks, pp 53–60
18. Hinton GE, Dayan P, Frey BJ, Neal RM (1995) The “wake-sleep” algorithm for unsupervised neural networks. Science (80-) 268(5214):1158–61
19. Aizenberg IN, Aizenberg NN, Vandewalle J (2000) Multiple-valued threshold logic and multivalued neurons. In: Multi-valued and universal binary neurons. Springer, pp 25–80
20. Şeker BA, Diri B, Hüseyin H (2017) Derin Öğrenme Yöntemleri ve Uygulamaları Hakkında Bir İnceleme. Gazi J Eng Sci 3(3):47–64
21. Lv Y, Duan Y, Kang W, Li Z, Wang F (2014) Traffic flow prediction with big data: a deep learning approach. IEEE Trans Intell Transp Syst (99):1–9
22. Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
23. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature [Internet] 521:436. Available from https://doi.org/10.1038/nature14539
24. Bu W, Xiao J, Zhou C, Yang M, Peng C (2017) A cascade framework for masked face detection. In: 2017 IEEE international conference on cybernetics and intelligent systems (CIS) and IEEE conference on robotics, automation and mechatronics (RAM), January 2018, pp 458–62
25. Loey M, Manogaran G, Taha MHN, Khalifa NEM (2020) A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic. Meas J Int Meas Confed [Internet] 167:108288. Available from https://doi.org/10.1016/j.measurement.2020.108288
26. Ahmed A, Adeel S, Shahriar H (2020) Face mask detector face mask recognition view project, p 13. Available from https://www.researchgate.net/publication/344173985
27. Wang Z, Wang G, Huang B, Xiong Z, Hong Q, Wu H et al (2020) Masked face recognition dataset and application, pp 1–3. arXiv
28. Yadav S (2020) Deep learning based safe social distancing and face mask detection in public areas for COVID-19 safety guidelines adherence. Int J Res Appl Sci Eng Technol 8(7):1368–1375
29. Rohith CA, Nair SA, Nair PS, Alphonsa S, John NP (2019) An efficient helmet detection for MVD using deep learning. In: Proceedings of international conference on trends in electronics and informatics (ICOEI), April 2019, pp 282–6
30. Li K, Zhao X, Bian J, Tan M (2017) Automatic safety helmet wearing detection. In: 2017 IEEE 7th annual international conference on CYBER technology in automation, control, and intelligent systems (CYBER), pp 617–22
31. Zhang W, Yang CF, Jiang F, Gao XZ, Zhang X (2020) Safety helmet wearing detection based on image processing and deep learning. In: Proceedings of 2020 international conference on communications, information system and computer engineering (CISCE) (2), pp 343–7
32. Hariri W (2020) Efficient Masked Face Recognition Method during the COVID-19 pandemic
33. Sharma V (2018) Face mask detection using YOLOv5 for COVID-19, pp 10–4. Available from https://scholarworks.calstate.edu/downloads/wp988p69r?locale=en
34. Sandesara AG, Joshi DD, Joshi SD (2020) Facial mask detection using stacked CNN model. Int J Sci Res Comput Sci Eng Inf Technol 3307:264–270
35. Loey M, Manogaran G, Taha MHN, Khalifa NEM (2020) Fighting against COVID-19: a novel deep learning model based on YOLO-v2 with ResNet-50 for medical face mask detection. Sustain Cities Soc [Internet] 65:102600. Available from https://doi.org/10.1016/j.scs.2020.102600
36. Wu F, Jin G, Gao M, He Z, Yang Y (2019) Helmet detection based on improved YOLO V3 deep model. In: Proceedings of 2019 IEEE 16th international conference on networking, sensing and control (ICNSC), pp 363–8
37. Cao R, Li H, Yang B, Feng A, Yang J, Mu J (2020) Helmet wear detection based on neural network algorithm. J Phys Conf Ser 1650(3)
38. Ansor A, Ritzkal, Afrianto Y (2020) Mask detection using framework tensorflow and pretrained CNN model based on raspberry pi. J Mantik 4(3):1539–45
39. Golwalkar R, Mehendale N (2020) Masked face recognition using deep metric learning and FaceMaskNet-21. SSRN Electron J
40. Technology I, Stack F, Development S (2020) Real-time masked face recognition using machine learning
41. Said Y (2020) Pynq-YOLO-Net: an embedded quantized convolutional neural network for face mask detection in COVID-19 pandemic era. Int J Adv Comput Sci Appl 11(9):100–106
42. Kamboj A, Powar N (2020) Safety helmet detection in industrial environment using deep learning, pp 197–208
43. Long X, Cui W, Zheng Z (2019) Safety helmet wearing detection based on deep learning. In: Proceedings of 2019 IEEE 3rd information technology, networking, electronic and automation control conference (ITNEC), pp 2495–9
44. Andriluka M, Pishchulin L, Gehler P, Schiele B (2014) 2d human pose estimation: new benchmark and state of the art analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3686–93
45. Real world masked face dataset [Internet]. [cited 2021 Jan 20]. Available from https://github.com/X-zhangyang/Real-World-Masked-Face-Dataset
46. Safety helmet detection [Internet]. [cited 2021 Jan 20]. Available from https://www.kaggle.com/andrewmvd/hard-hat-detection
47. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
48. AlexNet [Internet]. [cited 2021 Jan 20]. Available from http://datahacker.rs/deep-learning-alexnet-architecture/
49. CNN and Softmax [Internet]. [cited 2021 Jan 20]. Available from https://www.andreaperlato.com/aipost/cnn-and-softmax/
50. Teow MYW (2017) Understanding convolutional neural networks using a minimal model for handwritten digit recognition. In: Proceedings of 2017 IEEE 2nd international conference on automatic control and intelligent systems (I2CACIS), 2017-Decem(October), pp 167–72
Chapter 12
Anonymization Methods for Privacy-Preserving Data Publishing
Burak Cem Kara and Can Eyupoglu
12.1 Introduction
The term “Big Data” has become widespread lately. With the prevalence of social networks, the internet of things, and cloud-based outsourcing, we have witnessed explosive growth of data in terms of larger volume, higher velocity, and greater variety [1]. Big data offers great value for every organization and has been recognized as a driving force behind economic growth and technological innovation. Application service providers and data companies that provide computer-based services to customers over the network record user behavior in order to improve service quality or achieve business goals. Emerging technologies such as social networks, smart grids, and e-health systems provide excellent tools for analyzing data to better understand and serve users [2]. Approaches from statistics, modern computing, computer science, artificial intelligence, machine learning, and mathematics are used to obtain value from high-volume data. Big data analytics, supported by these different disciplines, helps institutions make more effective decisions. Gaining value from these approaches is a challenging but necessary process. With big data analytics methods, many different services can be offered to individuals and institutions, existing services can be improved, and the systems that provide them can be used more effectively [3].
B. C. Kara (B) · C. Eyupoglu
Department of Computer Engineering, Air Force Academy, National Defence University, 34149 Istanbul, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_12
Ensuring the privacy and protection of big data is one of the most important issues. As the value produced from big data increases, security risks also increase. To ensure the privacy of big data, valuable data (health data, social network data, identity data) must be kept out of the hands of unauthorized persons. As the size of big data increases, so does the critical information it contains. Therefore, this information, especially personal, institutional, and national information, should be hidden to avoid undesirable situations. In this context, one of the challenges of big data is providing data privacy. In big data processing, privacy preservation can be divided into two parts: publishing data in a privacy-preserving way and extracting information from data. In the first, because the collected data may contain critical information about the data subject, the aim is to protect that information from undesirable disclosure. In the second, the aim is to extract meaningful information from the data without violating privacy. Our study focuses on privacy-preserving data publishing (PPDP). Big data privacy involves several challenges of its own. The first is that the data collected for PPDP contains critical information about the data subject [4]. Big data comprises many data sets. It is necessary to determine whether the different contents of these data sets hold critical information, how critical that information is, and how much privacy protection each content and data set requires. All these concerns make it necessary to modify the data. The aim of these modifications is that personal information about the data subject is never disclosed. On the other hand, the modifications should never distort the real purpose of the work.
This is because the main purpose of publishing data is to obtain meaningful information from it, so the data must always remain useful. However, data privacy and data utility are inversely related [5]. There are several methods for PPDP. This study focuses on anonymization methods and models, which are frequently preferred for PPDP. The remaining sections are organized as follows: the concept of big data is explained in Sect. 12.2. Anonymization methods and the models developed from them are given in Sect. 12.3. Related studies in the literature are summarized in Sect. 12.4 and compared in Sect. 12.5. Finally, the study is concluded in Sect. 12.6.
12.2 Big Data Definition

Big data refers to data sets so large that traditional hardware and software systems cannot handle them. Although the term big data has emerged in recent years, the practice of collecting and storing large amounts of information for detailed analysis has existed for many years. Big data must be scalable to be usable; data that cannot be scaled is difficult to manage [6]. Big data is not just a large data set; it differs from classical data sets in many respects, such as type, form, size, source, frequency, storage techniques, consumers, usage, analysis type, processing purpose and processing method [3]. Various components determine whether data counts as big data. Doug Laney, an analyst at META Group (now part of Gartner), described big data in three dimensions, the 3 Vs, in a 2001 research report: volume, velocity and variety [7, 8].

Volume: the size of the data sets in big data. With current technology, data volumes reach terabytes or even petabytes.

Velocity: the speed at which data is obtained. For example, velocity is an important factor in the real-time processing and storage of the tweets sent every minute on Twitter, so the data must be collected very quickly. Sometimes even a one-minute delay can cause inconsistency in the analyzed output.

Variety: the diversity of data structures in big data. The data can be of any type, structured or unstructured, such as text, audio and video data, sensor data, log files and e-mail messages.

Later, these V groups were extended to 4 Vs (veracity in addition to the 3 Vs), 5 Vs (value in addition to the 4 Vs) [6], 7 Vs (variability and visualization in addition to the 5 Vs) [9] and 10 Vs (validity, venue, vocabulary and vagueness in addition to the 7 Vs) [10].

Veracity: as data size and variety increase, it becomes difficult to obtain meaningful results by relying on the data. Leading companies, following the logic that unreliable inputs generate unreliable outputs, do not completely trust the information obtained from big data. However, they do not ignore it either, because it helps them make better decisions.

Value: the transformation of big data into value. Data converted into value should be usable in real time, so that it can affect decisions instantly and support making the right decisions immediately.
According to the literature on the concept of big data, many of the features that data must have to qualify as big data are well established. Recently, one more V has been proposed for these properties, which are referred to as V groups: vulnerability (Fig. 12.1). Vulnerability addresses the issue of data privacy, because the significant increase in data has led not only to problems with the volume, velocity, variety and accuracy of data, but also to data security and privacy issues [11].
Fig. 12.1 New V characteristics: vulnerability [11]

12.3 Data Anonymization

Anonymization is a data processing technique that manipulates data by removing or changing, according to the purpose, identifiers such as identity information, quasi-identifiers such as age and gender, and sensitive attributes. The aim is to remove or change all direct and indirect identifiers in a data set so that individuals' identities, or their distinctiveness within a group, cannot be disclosed in a way that can be associated with a natural person. Once the data is anonymized, meaningful conclusions can be drawn from the sensitive or non-sensitive information belonging to the data owners without disclosing their identities.

In a data set to be anonymized, attributes are examined in four classes: identifiers, quasi-identifiers, sensitive attributes and non-sensitive attributes [12]. Identifiers (ID) are information such as full name, passport number or phone number that can be used to uniquely identify a person. If any of these is present in a data set, the person can be fully disclosed. A quasi-identifier (QI) is a set of attributes that do not identify a person on their own but allow the person to be disclosed if they are linked to external data sets or matched across multiple records. Examples of QIs are age, gender, ethnicity and date of birth. Sensitive attributes (SA), such as an individual's salary or illness, are attributes that individuals do not want third parties to know. Non-sensitive attributes (NSA) are attributes that do not violate the user's privacy if disclosed and fall outside the above classes [6].

Data anonymization faces many challenges, such as the obligation to anonymize sensitive data to a level at which it cannot be converted back to its original form. Many methods and models have been suggested in the literature to overcome these difficulties. This study examines some of the anonymization and protection methods and models found in the literature.
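The four attribute classes can be made concrete with a small sketch. The column names, the schema and the helper functions below are hypothetical and chosen only for illustration; which class a column belongs to always depends on the data set and the publishing purpose.

```python
from enum import Enum


class AttributeClass(Enum):
    ID = "identifier"
    QI = "quasi-identifier"
    SA = "sensitive attribute"
    NSA = "non-sensitive attribute"


# Hypothetical schema: column names and their classes are illustrative only.
SCHEMA = {
    "name": AttributeClass.ID,
    "phone": AttributeClass.ID,
    "age": AttributeClass.QI,
    "gender": AttributeClass.QI,
    "disease": AttributeClass.SA,
    "favorite_color": AttributeClass.NSA,
}


def drop_identifiers(record, schema=SCHEMA):
    """Return a copy of the record without its direct identifiers (ID class)."""
    return {k: v for k, v in record.items() if schema[k] is not AttributeClass.ID}


def columns_of(cls, schema=SCHEMA):
    """List the column names that belong to one attribute class."""
    return [name for name, c in schema.items() if c is cls]


record = {"name": "A. Example", "phone": "555-0100", "age": 22,
          "gender": "F", "disease": "flu", "favorite_color": "blue"}
released = drop_identifiers(record)
```

Removing the ID columns is only a first step: the QI columns that remain in `released` can still be linked to external data, which is what the methods and models in the following sections address.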
12.3.1 Protection Methods with Anonymization

Before anonymizing a data set, the purpose should be clearly determined and action taken accordingly, because there is a wide variety of anonymization techniques and each is structured to serve a different purpose. Certain techniques may therefore be more suitable for a situation than others. In addition, one should be aware that the original information in the data set decreases to some extent regardless of the technique used. This study emphasizes the following protection methods: generalization, masking, anatomization, permutation and perturbation [12, 13].

Generalization replaces the values of the QI attributes of a record with less specific descriptions. With this technique, part of the data is removed or replaced with a common value. For example, suppose that an individual's age is 22. Replacing this value with the range 20 to 25 in the table applies generalization to the record. Note that generalization is irreversible. While generalizing values in a data set, it should also be ensured that the data remains accurate.

Masking, also called suppression, replaces some values with predetermined special symbols. Data sets anonymized with masking are considered to be excessively anonymized. The method is simple to implement and is regarded as one of the most effective for hiding sensitive values and preventing reverse engineering. However, the statistical and analytical value of the masked data is lost.

Anatomization breaks the link between QI and SA rather than changing their values. QI and SA are published in two separate tables: one contains the QI, the other contains the SA. Both tables contain a common attribute, often called a group number.
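Generalization, masking and anatomization as described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the fixed range width of 5, the `*` mask symbol, the fixed group size and the sample records are all hypothetical.

```python
def generalize_age(age, width=5):
    """Replace an exact age with a range label; the upper bound is exclusive."""
    low = (age // width) * width
    return f"{low}-{low + width}"


def mask(value, keep=2, symbol="*"):
    """Replace every character after the first `keep` with a mask symbol."""
    text = str(value)
    return text[:keep] + symbol * max(len(text) - keep, 0)


def anatomize(records, qi_cols, sa_col, group_size=2):
    """Publish QI and SA in two tables that share only a group number."""
    qi_table, sa_table = [], []
    for i, rec in enumerate(records):
        group = i // group_size
        qi_table.append({**{c: rec[c] for c in qi_cols}, "group": group})
        sa_table.append({"group": group, sa_col: rec[sa_col]})
    # Sort the SA table so that row order inside a group does not mirror the QI table.
    sa_table.sort(key=lambda r: (r["group"], str(r[sa_col])))
    return qi_table, sa_table


records = [
    {"age": 22, "zip": "34010", "disease": "flu"},
    {"age": 24, "zip": "34015", "disease": "asthma"},
    {"age": 41, "zip": "06100", "disease": "cancer"},
    {"age": 43, "zip": "06120", "disease": "flu"},
]
qi_table, sa_table = anatomize(records, ["age", "zip"], "disease")
```

With these assumptions, `generalize_age(22)` yields the range label `"20-25"` from the text's example, and `mask("34010")` keeps only the first two characters. In the anatomized output an observer can see that group 0 contains one flu and one asthma case but cannot tell which QI row each belongs to.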
Permutation, also known as data swapping or shuffling, destroys the detectability of individuals by mixing values within the data set without harming its overall utility. For example, if the ages of the individuals in a class are exchanged among the records when only the average age is of interest, the average is unchanged and a permutation has been performed. In this method, the positions of the values in each column are changed randomly, taking the other columns into account.

Perturbation replaces original values in the data set with synthetic values, for example by rounding numbers or adding random noise. Additions and subtractions are made to produce a determined degree of distortion in a selected variable. This method is mostly applied to data sets that contain numerical values. The distortion is applied in the same way to every value, so the statistics calculated from the modified data do not differ significantly from those calculated from the original data.
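The two methods above can be illustrated with a short sketch. The function names, the sample data and the choice of zero-mean uniform noise are assumptions made for this example; the chapter does not prescribe a particular noise distribution.

```python
import random
import statistics


def permute_column(records, column, rng):
    """Shuffle one column across records; the set of values itself is unchanged."""
    values = [r[column] for r in records]
    rng.shuffle(values)
    return [{**r, column: v} for r, v in zip(records, values)]


def perturb_column(records, column, rng, scale=2.0):
    """Add zero-mean uniform noise drawn from [-scale, scale] to a numeric column."""
    return [{**r, column: r[column] + rng.uniform(-scale, scale)} for r in records]


people = [
    {"name": "p1", "age": 21},
    {"name": "p2", "age": 34},
    {"name": "p3", "age": 45},
    {"name": "p4", "age": 52},
]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
swapped = permute_column(people, "age", rng)
noisy = perturb_column(people, "age", rng, scale=2.0)
```

After permutation the average age is exactly the same as before, because only the assignment of ages to records changes. After perturbation each age moves by at most `scale`, so aggregate statistics stay close to the originals while individual values are no longer exact.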
12.3.2 Anonymization and Protection Models

The k-anonymity, l-diversity, t-closeness and differential privacy models, applied to the records of data sets that hold personal data, are all called data anonymization models. The purpose of these models is to arrange the sensitive data in the data sets so that it is never known to whom the data belongs. In this way, the desired values can be extracted from the data without identifying the owners of the sensitive values [14].

The k-anonymity model removes or masks the ID of the records in the data set and changes the QI values so that the QI attributes of at least k records are the same. Here k is a number representing the size of the group. If, for any individual in the data set, there are at least k-1 other individuals with the same characteristics, then k-anonymity has been achieved for that data set. For example, consider the age values among the QI attributes of 10 records in a data set. If every record has the same age value, then for each person there are 9 others with the same value. No one can be identified by looking only at the age value, and anonymity is provided to some extent [15].

The l-diversity model was introduced by Machanavajjhala et al. [16]. It emerged to address the weaknesses of the k-anonymity model and to establish a structure that prevents anonymized groups from being composed of homogeneous values. If all individuals in a data set share the same SA value, merely knowing that these individuals are part of that data set can expose sensitive information. l-diversity is used to reduce this risk. For example, suppose k-anonymity with k = 5 is applied to a data set that shows the name, surname, age, gender, nationality and disease status of individuals. Methods such as masking, data removal and generalization are applied to change the identifying and QI attributes in the data set, which makes it difficult to tell to which individual a record belongs.
However, suppose that all individuals in one of the groups with k = 5 are 40 years old and that their disease is cancer. In such a case, someone who wants to exploit the data and knows that everyone in their 40s in the data set has cancer, regardless of nationality and gender, can easily conclude that a person with this characteristic has cancer. For this reason, l-diversity should be applied, taking care to create a certain variety within each group [17].

t-closeness was suggested by Li et al. [18] because the l-diversity approach was considered unable to provide sufficient protection. The model balances the semantic closeness of SA values within each QI group, because l-diversity provides diversity in sensitive data but is not concerned with the content of these data. For example, suppose k-anonymity with k = 5 and l-diversity with l = 5 are applied to a data set according to the date of birth, gender and postal code fields. Although the records are divided into groups of 5 with sufficient variety in each group, they are still open to abuse by malicious people. If one of the groups contains only serious diseases and another contains only minor ones, a person who examines the data set does not know an individual's exact disease but does know whether it is serious or minor. t-closeness resolves this by mixing serious and non-serious diseases within the groups, which reduces what can be inferred about the patients in a group [15, 17].
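The three group-based models discussed so far can be expressed as checks on a table that has already been generalized. The sketch below is illustrative only. It assumes the simplest "distinct" form of l-diversity, and for t-closeness it measures the distance between distributions as half the sum of absolute differences, which corresponds to the Earth Mover's Distance only when all SA values are treated as equally far apart. The function names and the sample table are hypothetical.

```python
from collections import Counter, defaultdict


def equivalence_classes(records, qi_cols):
    """Group records that share the same values on all QI columns."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[c] for c in qi_cols)].append(r)
    return groups


def is_k_anonymous(records, qi_cols, k):
    """Every equivalence class must contain at least k records."""
    return all(len(g) >= k for g in equivalence_classes(records, qi_cols).values())


def is_l_diverse(records, qi_cols, sa_col, l):
    """Distinct l-diversity: every class holds at least l different SA values."""
    return all(len({r[sa_col] for r in g}) >= l
               for g in equivalence_classes(records, qi_cols).values())


def distribution(records, sa_col):
    """Relative frequency of each SA value."""
    counts = Counter(r[sa_col] for r in records)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def is_t_close(records, qi_cols, sa_col, t):
    """Each class's SA distribution must lie within distance t of the overall one."""
    overall = distribution(records, sa_col)
    for g in equivalence_classes(records, qi_cols).values():
        local = distribution(g, sa_col)
        distance = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        if distance > t:
            return False
    return True


table = [
    {"age": "40-45", "disease": "cancer"},
    {"age": "40-45", "disease": "cancer"},
    {"age": "40-45", "disease": "cancer"},
    {"age": "20-25", "disease": "flu"},
    {"age": "20-25", "disease": "asthma"},
    {"age": "20-25", "disease": "cold"},
]
```

The sample table reproduces the homogeneity problem from the text: it is 3-anonymous on `age`, yet the `40-45` class contains only cancer, so the distinct l-diversity check with l = 2 fails, and the t-closeness check fails for a small threshold such as t = 0.4.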
Differential privacy, in short, works by adding mathematical noise to the data. When differential privacy is applied to a data collection, it is difficult to determine whether a record is part of a particular data set, because the result of a given algorithm looks almost the same regardless of whether any one individual's information is included. For example, suppose you have a database of employees' salaries, and one of the queries you allow is the average salary of the employees in the database. If someone who wants to exploit this database knows the number of employees in the company and runs this query before and after person X joins the company, they can calculate X's salary. Differential privacy ensures that the results of statistical queries cannot be used to gather information about specific individuals or, more broadly, to access specific rows in a database. Information is accessible only collectively. Thus, the requested information is obtained from the data without disclosing to whom the data belongs. It should be kept in mind that the data becomes less useful when noise is added in the differential privacy model, as in every anonymization model [19]. Table 12.1 shows the pros and cons of the anonymization models described above.

Table 12.1 Pros and cons of anonymization models [20]

k-anonymity
Pros:
• k-anonymity algorithms (such as Mondrian, Incognito and Datafly) are widely preferred for privacy-preserving data publishing
• The implementation cost of k-anonymity is lower compared to other anonymity methods
Cons:
• When the k-anonymity technique is applied to large data sets, it causes high utility loss
• It is vulnerable to homogeneity attack and background attack

l-diversity
Pros:
• Increases data protection by increasing the variety of SA within the group
• It is an improved version of the k-anonymity technique and protects against attribute disclosure
Cons:
• Applying l-diversity to some data sets may be unnecessary or too tedious
• It is vulnerable to similarity attack and distorted attack

t-closeness
Pros:
• It provides data confidentiality by balancing the semantic affinities of SA within the QI group
• It provides protection against homogeneity and background attacks, where k-anonymity is insufficient
Cons:
• It lacks computational procedures to achieve closeness with minimal loss of data utility. That is, loss of data utility is possible when obtaining t-closeness

Differential privacy
Pros:
• It enables technology companies to obtain and share collective information about user habits while preventing the disclosure of individuals' identities
Cons:
• Differential privacy protects the privacy of individuals by adding sufficient noise to data from queries made through the server; however, the original data is still on the server, which is vulnerable to data breaches
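The average-salary query from the differential privacy discussion can be protected with the Laplace mechanism. The following is a minimal sketch under stated assumptions: the number of employees is treated as public and fixed, salaries are clipped to a hypothetical public bound `max_salary`, and the function names and figures are illustrative rather than taken from the studies cited in this chapter.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(salaries, epsilon, max_salary, rng):
    """Release a noisy mean of salaries clipped to [0, max_salary]."""
    clipped = [min(max(s, 0.0), max_salary) for s in salaries]
    n = len(clipped)
    # Assumption: n is public, so replacing one record changes the mean
    # by at most max_salary / n (the sensitivity of the query).
    sensitivity = max_salary / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


salaries = [52000.0, 61000.0, 58000.0, 75000.0]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
noisy_average = private_mean(salaries, epsilon=0.5, max_salary=100000.0, rng=rng)
```

A smaller `epsilon` means more noise and stronger privacy, and a larger one means less noise and a more useful answer, which is the privacy and utility trade-off noted in the text. With only four records the noise is large relative to the true mean, which illustrates why differential privacy is better suited to aggregate queries over many individuals.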
12.4 Literature Review

k-anonymity was first introduced by Sweeney [21]. It was developed to prevent the disclosure of information about individuals who show unique characteristics in certain combinations of fields, by ensuring that each such combination identifies more than one person in a data set. If more than one record shares each combination of the chosen variables, the probability of identifying the persons corresponding to that combination is reduced. In [22], a new algorithm is proposed that processes the QI attributes and derives the privacy parameter k at the end of the process. The study also reviews several works on the k-anonymity technique. The article [23] argues that the p-sensitive k-anonymity model is not fully sufficient to protect SA and proposes two new p-sensitive k-anonymity models to overcome its shortcomings.

Publishing data without disclosing sensitive information about people is a major problem. It is partly solved by a definition of privacy called k-anonymity. Later, the l-diversity method was developed from studies on the shortcomings of k-anonymity. This method considers the diversity of the SA values corresponding to the same combination of variables. In [16], the authors found that a k-anonymized data set was subject to strong attacks due to a lack of diversity in SA. To remedy these shortcomings, they put forward l-diversity, a framework that gives stronger privacy guarantees, and introduced the term to the literature. Although the l-diversity method provides diversity in personal data, it does not deal with the content and sensitivity of the data, so there are cases where it cannot provide sufficient protection.
Data breaches, together with the increasing importance of big data analysis, have raised data owners' concerns about the privacy and security of their personal information. Anonymization models such as k-anonymity, l-diversity and t-closeness have long been used to protect confidentiality in published data. However, these models cannot be applied directly to large amounts of data, so new data anonymization models are needed for big data sharing. In [24], the authors proposed an improved scalable l-diversity (ImSLD) approach, an extension of improved scalable k-anonymity (ImSKA), to increase the level of protection in existing PPDP approaches.

In many studies on anonymization, the attributes of the data set to be anonymized are divided into four classes: ID, QI, SA and NSA. Sei et al. [25], however, argued that QI and SA cannot be treated as completely separate, since QI attributes in some data sets can also be sensitive. By treating QI attributes as SA, they introduced new anonymization models, (l1, …, lq)-diversity and (t1, …, tq)-closeness, that can process data sets containing more than one sensitive QI attribute. The method consists of two algorithms: an anonymization algorithm and a reconstruction algorithm. The anonymization algorithm, applied by the data owners, is simple but effective, while the reconstruction algorithm is adapted to the purpose of the people analyzing the data.
The t-closeness method calculates the degree of closeness among personal data values and anonymizes the data set by dividing it into subclasses according to these degrees of closeness. The article [18] proposed this privacy concept, which requires the distribution of an SA in any equivalence class to be close to the distribution of the attribute in the overall table. The authors also noted that l-diversity has limitations in some cases and demonstrated that it is neither necessary nor sufficient to prevent attribute disclosure. Many studies in the literature apply the t-closeness model as an anonymization model for big data security and privacy, and many new algorithms have been proposed to improve it. In [26], the authors proposed a new algorithm called variable t-closeness for data sets with sensitive numerical attributes, such as a salary table. Whereas the studies in this field set the threshold t to a fixed value when applying t-closeness, the proposed approach varies it. The proposed algorithm was observed to be effective for data anonymization, but its algorithmic complexity still needs to be analyzed [26]. Soria-Comas et al. suggested using the microaggregation technique instead of the masking and generalization methods used in the k-anonymity model to anonymize data sets. The article also presents t-closeness as one of the anonymization models that offers the strictest confidentiality guarantees, and shows how microaggregation is applied to obtain k-anonymous and t-close data sets.
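To give an idea of what microaggregation does, the sketch below applies a simple one-dimensional, fixed-size variant to a single numeric column. It is not one of the algorithms proposed by Soria-Comas et al., which work on multiple attributes and also enforce t-closeness; the function name and the sample values are hypothetical.

```python
def microaggregate(values, k):
    """Replace each value with the mean of its group of at least k sorted neighbours."""
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    # Indices of the values in ascending order, so results map back to input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder so that no group is smaller than k.
        end = len(values) if g == n_groups - 1 else start + k
        members = order[start:end]
        mean = sum(values[i] for i in members) / len(members)
        for i in members:
            result[i] = mean
    return result


ages = [30, 10, 20, 50, 40, 60, 70]
aggregated = microaggregate(ages, k=3)
```

Because every group holds at least k records that end up with the same value, the aggregated column satisfies k-anonymity on that attribute without mapping values to predefined ranges, and the overall mean of the column is preserved.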
The most important contribution of the article is three new microaggregation-based algorithms for generating t-close data sets [27].

The broad class of web applications that predict a user's responses to options is called recommendation systems. These are algorithms that aim to present the most meaningful and accurate products to the user by filtering useful content from a large pool of data. Many e-commerce sites, such as eBay, Amazon and Alibaba, use recommendation systems to provide customers with a personalized service. Collaborative filtering (CF) is a method applied in recommendation systems to predict users' preferences by filtering information or patterns. Privacy-preserving collaborative filtering (PPCF), which aims to protect privacy during the recommendation process, is increasingly used in recommendation systems and has attracted much attention in recent years. The article [28] proposes a (p, l, α)-diversity method that enhances the existing k-anonymity method in PPCF. Here p is the attacker's prior knowledge of users' ratings, and (l, α) describes the diversity among the users in each group, which improves the level of privacy protection. The method is suitable for anonymizing recommendation data sets. Current PPCF methods are mainly based on cryptography, obfuscation, perturbation and differential privacy. These methods have shortcomings such as high computational cost, low data quality and difficulty in adjusting the magnitude of noise [28].
Yin et al. [29] addressed the privacy protection problem in personalized recommendation systems and proposed an efficient privacy-preserving CF algorithm based on differential privacy protection and a time factor. The MovieLens data set was used to test the proposed method. According to the experimental results, the new model is very successful at protecting private data, but its recommendation accuracy is lower than that of many traditional CF algorithms in the literature.

Differential privacy allows tech companies to collect and share information about user habits while protecting the privacy and confidentiality of individual users. The definition of differential privacy was first proposed in Cynthia Dwork's ICALP paper [30]. Since then, many contributions have been made to differential privacy, both in theoretical analysis and in practical examples, and it has become an increasingly popular area of research. Dwork also gave a mathematical proof of how much noise is enough to meet differential privacy requirements [31].

Many countries have strict policies on how technology companies collect and share user data, because industry leaders such as Facebook, Twitter and Google, as well as medium-sized and small private companies, routinely collect a great deal of personal information about their customers and users. Much of this information is very private or sensitive. For this reason, storing and processing this data is emerging as an important technological challenge for the future.
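For reference, the standard definition behind the question of how much noise is enough can be stated compactly. The notation is not used elsewhere in this chapter and is introduced here only as an assumption for illustration: M is a randomized mechanism, D and D' are data sets that differ in one individual's record, S is any set of possible outputs, and f is the query being answered.

```latex
% epsilon-differential privacy: the output distribution changes by at most
% a factor e^epsilon when one individual's record is changed
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S]

% Laplace mechanism: noise scaled to the sensitivity of the query f
\mathcal{M}(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right),
\qquad
\Delta f = \max_{D, D'} \lvert f(D) - f(D') \rvert
```

The smaller the privacy parameter ε, the larger the noise scale Δf/ε, so stronger privacy guarantees come at the cost of less accurate query results.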
The goal of studies on this subject is to determine how systems and data processing techniques should be designed to draw inferences from these large-scale data while protecting the privacy and security of the data and of the individuals to whom the data belongs. For this reason, many new studies on the differential privacy model have been added to the literature. In privacy-preserving statistical databases, which contain individual records and aim to support information discovery, the differential privacy approach protects privacy by adding noise to the values of personal data [32]. Sarwate et al. [33] reviewed the progress made in differentially private machine learning and signal processing and emphasized that differential privacy for continuous data would be the most suitable approach for signal processing. Lin et al. [19] proposed a differential privacy protection model for body area networks, which significantly reduces the risk of privacy loss for sensitive big data while largely preserving the availability of the data. Andrew et al. [34] presented a new (K, L) anonymity model, developed on the basis of privacy protection approaches such as k-anonymity and differential privacy. The model aims to enable better data analysis while protecting data security and privacy. It uses the Laplace and exponential mechanisms to add differential privacy noise to the data. The experiments in the study indicated that the proposed model enables more successful data analysis than existing approaches in the literature. In addition, data sets anonymized with this model resist re-identification and linkage attacks. As a result, the model provides security against data breaches while allowing data to be published with less information loss.
12.5 Comparison of Existing Studies

In this section, the most important articles in the literature on the models used to anonymize data sets are compared in Table 12.2. The table lists the authors of the articles, the year of publication, their main purpose and the anonymization models they use. As these articles from 2002 to 2020 show, no single optimal model for protecting data privacy and security has yet been found. A model that is very successful on some data sets may not be suitable for another. We can therefore conclude that anonymization models for protecting data sets still need to be developed.
12.6 Conclusion

With the enormous increase in data, risks such as data breaches, disclosures and leaks have emerged from companies' efforts to benefit from this data. At the same time, these data have become the lifeblood of companies, and it has become necessary to use them without violating privacy regulations such as the General Data Protection Regulation. Confidentiality should be ensured alongside the effort to draw conclusions from the values in the data sets, and before publishing any information that could reveal a person in the data sets. This chapter analyzes studies that attempt to ensure security and privacy in big data through anonymization models and techniques. Its main contribution to the literature is to set out the pros and cons of the anonymization models in current studies and to guide researchers who want to improve these models in the future.
Table 12.2 Comparison of studies in literature (grouped by basic technique; each entry gives year, authors and used methods)

k-anonymity
• 2002, Sweeney: k-anonymity was introduced by Sweeney. This technique is effective against record linkage attacks by applying operations such as generalization, suppression and randomization on QI
• 2018, Ouazzani and Bakkali: A new algorithm for processing QI attributes is proposed where the privacy parameter k is extracted at the end of the process
• 2010, Sun et al.: Two new p-sensitive k-anonymity models are proposed to overcome the shortcomings of the p-sensitive k-anonymity model
• 2018, Wei et al.: A new (p, l, α)-diversity method has been proposed that improves the existing k-anonymity method in PPCF
• 2019, Rao et al.: This article proposes an equivalence class generation algorithm and scalable anonymization with k-anonymity and l-diversity using the MapReduce programming paradigm to protect privacy when publishing data

l-diversity
• 2007, Machanavajjhala et al.: The l-diversity model was developed by demonstrating that a k-anonymized data set is subject to strong attacks due to lack of diversity in sensitive attributes
• 2019, Mehta and Rao: A new ImSLD approach, an extension of ImSKA, has been proposed to increase the levels of protection in existing PPDP approaches
• 2017, Sei et al.: By treating QI attributes as sensitive attributes, they introduced new anonymization models, (l1, …, lq)-diversity and (t1, …, tq)-closeness, that can process data sets containing more than one sensitive QI attribute

t-closeness
• 2007, Li et al.: To overcome the limitations of the l-diversity principle, they proposed the t-closeness principle, which copes with attribute disclosure and similarity attacks
• 2017, Ouazzani and Bakkali: To increase the anonymization quality of sensitive numerical attributes in data sets, a new algorithm called variable t-closeness has been proposed
• 2015, Soria-Comas et al.: The most important contribution of the article to the literature is that three new algorithms have been proposed for t-closeness data sets based on the microaggregation technique

Differential privacy
• 2016, Gazeau et al.: The focus is on differential privacy, a privacy approach used in the field of statistical databases. A new method has been proposed for measuring the loss of privacy caused by limited sensitivity
• 2013, Sarwate et al.: They emphasized that differential privacy in continuous data would be the most suitable algorithm for signal processing
• 2016, Lin et al.: They proposed a differential privacy protection model in body area networks, which significantly reduces the risk of privacy loss for sensitive big data and largely preserves the availability of data
• 2018, Var and Inan: Unique attribute selection methods based on differential privacy, the most comprehensive and secure solution known for statistical database security, have been integrated and tested with Weka [35]
• 2020, Andrew et al.: A new approach to privacy protection called (K, L) anonymity has been proposed that combines k-anonymity and Laplace differential privacy techniques
• 2019, Yin et al.: By addressing the privacy protection problem in personalized recommendation systems, a privacy-preserving collaborative filtering algorithm based on differential privacy protection and a time factor has been proposed
References 1. Sagiroglu S, Sinanc D (2013) Big data: a review. In: 2013 International conference on collaboration technologies and systems (CTS), pp 42–47. https://doi.org/10.1109/CTS.2013.656 7202 2. Chen H, Chiang RH, Storey VC (2012) Business intelligence and analytics: from big data to big impact. 36(4):1165–1188 3. Sinanc Terzi D, Terzi R, Sagiroglu S (2015) A survey on security and privacy issues in big data. In: 2015 10th International conference for internet technology and secured transactions, pp 202–207 4. Fung BCM, Wang K, Chen R, Yu PS (2010) Privacy-preserving data publishing: a survey of recent developments. ACM Comput Surv 42(4). https://doi.org/10.1145/1749603.1749605
158
12 Anonymization Methods for Privacy-Preserving Data Publishing
5. Canbay Y, Vural Y, Sağıroğlu Ş (2019) Mahremiyet Korumalı Büyük Veri Yayınlama İçin Kavramsal Model Önerileri. J Polytech 0900(3):785–798. https://doi.org/10.2339/politeknik.535184
6. Storey VC, Song IY (2017) Big data technologies and management: what conceptual modeling can do. Data Knowl Eng 108(February):50–67. https://doi.org/10.1016/j.datak.2017.01.001
7. Emmanuel I, Stanier C (2016) Defining big data. In: International conference on big data and advanced wireless technologies, vol 5. pp 1–6. https://doi.org/10.1145/3010089.3010090
8. Sharma S, Tim US, Wong J, Gadia S, Sharma S (2014) A brief review on leading big data models. Data Sci J 13(December):138–157. https://doi.org/10.2481/dsj.14-041
9. Khan MA, Uddin MF, Gupta N (2014) Seven V’S of big data: understanding big data to extract value. In: Proceedings of the 2014 zone 1 conference of the American Society for Engineering Education. IEEE, pp 1–5
10. Agrahari A, Rao DTVD (2017) A review paper on big data: technologies, tools and trends. Int Res J Eng Technol 4(10):640–649 [Online]. Available: www.irjet.net
11. Bhathal GS, Singh A (2019) Big data: Hadoop framework vulnerabilities, security issues and attacks. Array 1–2(July):100002. https://doi.org/10.1016/j.array.2019.100002
12. Eyupoglu C, Aydin MA, Zaim AH, Sertbas A (2018) An efficient big data anonymization algorithm based on chaos and perturbation techniques. Entropy 20(5):1–18. https://doi.org/10.3390/e20050373
13. Gkountouna O (2011) A survey on privacy preservation methods. pp 1–30 [Online]. Available: http://www.dblab.ece.ntua.gr/~olga/papers/olga_tr11.pdf
14. Tran HY, Hu J (2019) Privacy-preserving big data analytics a comprehensive survey. J Parallel Distrib Comput 134:207–218. https://doi.org/10.1016/j.jpdc.2019.08.007
15. Rajendran K, Jayabalan M, Rana ME (2017) A study on k-anonymity, l-diversity, and t-closeness techniques focusing medical data. IJCSNS Int J Comput Sci Netw Secur 17(12):172–177
16. Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2006) l-diversity: privacy beyond k-anonymity. In: Proceedings of the International Conference on Data Engineering, vol 2006. p 24. https://doi.org/10.1109/ICDE.2006.1
17. Patel T, Amin K (2019) A Study on k-anonymity, l-diversity, and t-closeness techniques of privacy preservation data publishing. 6(6):19–24
18. Li N (2011) t-closeness: privacy beyond k-anonymity and l-diversity. vol 2. pp 106–115
19. Lin C, Song Z, Song H, Zhou Y, Wang Y, Wu G (2016) Differential privacy preserving in big data analytics for connected health. J Med Syst 40(4):1–9. https://doi.org/10.1007/s10916-016-0446-0
20. Shamsi JA, Khojaye MA (2018) Understanding privacy violations in big data systems. IT Prof 20(3):73–81. https://doi.org/10.1109/MITP.2018.032501750
21. Sweeney L (2002) k-anonymity: a model for protecting privacy. IEEE Secur Priv 10(5):1–14
22. El Ouazzani Z, El Bakkali H (2018) A new technique ensuring privacy in big data: k-Anonymity without prior value of the threshold k. Procedia Comput Sci 127:52–59. https://doi.org/10.1016/j.procs.2018.01.097
23. Sun X, Sun L, Wang H (2011) Extended k-anonymity models against sensitive attribute disclosure. Comput Commun 34(4):526–535. https://doi.org/10.1016/j.comcom.2010.03.020
24. Mehta BB, Rao UP (2019) Improved l-diversity: scalable anonymization approach for privacy preserving big data publishing. J King Saud Univ Comput Inf Sci pp 4–11. https://doi.org/10.1016/j.jksuci.2019.08.006
25. Sei Y, Okumura H, Takenouchi T, Ohsuga A (2019) Anonymization of sensitive quasi-identifiers for l-diversity and t-closeness. IEEE Trans Dependable Secur Comput 16(4):580–593. https://doi.org/10.1109/TDSC.2017.2698472
26. El Ouazzani Z, El Bakkali H (2018) A new technique ensuring privacy in big data: variable t-closeness for sensitive numerical attributes. In: 2017 3rd International conference of cloud computing technologies and applications CloudTech 2017, vol Jan 2018. pp 1–6. https://doi.org/10.1109/CloudTech.2017.8284733
27. Soria-Comas J, Domingo-Ferrer J, Sánchez D, Martínez S (2015) T-closeness through microaggregation: strict privacy with enhanced utility preservation. IEEE Trans Knowl Data Eng 27(11):3098–3110. https://doi.org/10.1109/TKDE.2015.2435777
28. Wei R, Tian H, Shen H (2018) Improving k-anonymity based privacy preservation for collaborative filtering. Comput Electr Eng 67:509–519. https://doi.org/10.1016/j.compeleceng.2018.02.017
29. Yin C, Shi L, Sun R, Wang J (2020) Improved collaborative filtering recommendation algorithm based on differential privacy protection. J Supercomput 76(7):5161–5174. https://doi.org/10.1007/s11227-019-02751-7
30. Tu S (2013) [Tutorial] Introduction to differential privacy. People.Csail.Mit.Edu, pp 1–7 [Online]. Available: http://people.csail.mit.edu/stephentu/writeups/6885-lec20-b.pdf
31. Dwork C (2006) Differential privacy. Lecture Notes Computer Science (including Subseries Lecture Notes Artificial Intelligence Lecture Notes Bioinformatics), vol 4052. LNCS, pp 1–12. https://doi.org/10.1007/11787006_1
32. Gazeau I, Miller D, Palamidessi C (2016) Preserving differential privacy under finite-precision semantics. Theor Comput Sci 655:92–108. https://doi.org/10.1016/j.tcs.2016.01.015
33. Sarwate AD, Chaudhuri K (2013) Signal processing and machine learning with differential privacy. IEEE Signal Process Mag 30(5):86–94
34. Andrew J, Karthikeyan J (2021) Privacy-preserving big data publication: (k, l) anonymity. vol 1167. Springer, Singapore
35. Var E, İnan A (2018) Differentially private attribute selection for classification. J Fac Eng Archit Gazi Univ 33(1):323–336. https://doi.org/10.17341/gazimmfd.406804
Chapter 13
Improving Accuracy of Document Image Classification Through Soft Voting Ensemble Semih Sevim , Sevinç Ilhan Omurca , and Ekin Ekinci
S. Sevim (B)
Faculty of Engineering and Natural Sciences, Computer Engineering Department, Bandırma Onyedi Eylul University, Bandırma, Turkey
S. I. Omurca
Faculty of Engineering, Computer Engineering Department, Kocaeli University, Kocaeli, Turkey
E. Ekinci
Faculty of Technology, Computer Engineering Department, Sakarya University of Applied Sciences, Sakarya, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_13

13.1 Introduction
In today’s world, many physical documents are transferred to the computer environment because digitalization makes them easier to store, protect and handle. Documents such as bank receipts, invoices, agreements and application forms are digitized by scanning or camera capture. These document images provide valuable information about business and transactions in numerous sectors such as finance, healthcare and law [25]. However, automatically and efficiently extracting information from document images and analyzing them are challenging tasks because documents are scanned from different perspectives, scan quality is often poor, and document images are diverse. Unlike plain text, documents generally contain graphics, tables and figures, and character types can vary within the text. Therefore, extracting relevant information from documents is a time-consuming process that requires expert knowledge [24].
Document image classification is one of the most challenging tasks in analyzing and automatically managing documents. Researchers work on this problem to reduce both the classification error and the human effort involved in the process. However, little research has been published in this domain [19]. The document classification problem is examined under three headings: content-based, structure/layout-based and hybrid approaches [20, 27]. Content-based classification focuses on the plain text in document images, and Optical Character Recognition (OCR) is used to extract the textual content [25]. Natural Language Processing (NLP) methods are widely used to convert the text into information for classification models. This approach ignores the document layout. Structure/layout-based approaches use the document images directly for analysis. Either approach can be chosen according to the structure of the documents under study: content-based approaches are preferred more often for free-form documents, whereas image-based approaches are used for documents that contain the same text in different layouts.
In recent years, deep learning has attracted researchers and is used in many computer vision and NLP applications. With the increasing performance of Convolutional Neural Networks (CNNs), document images can be classified without handcrafted features [22]. CNNs yield prediction accuracies comparable to state-of-the-art neural networks. In this study, we aim to classify document images received from the Kocaeli University digital document management system. Students can upload many types of documents through this system, and it has been observed that they frequently upload a different type of document than the one expected. Therefore, every uploaded document must be checked for correctness. Considering the number of registered students and the variety of document types, checking the uploaded documents is a time-consuming process.
13.2 Related Works
Many methods have been developed in the literature to analyze and manage large collections of document images, and document image classification is one of them. Classifying document image files supports tasks such as organization, analysis and automatic archiving. The proposed approaches generally fall into structure-based, content-based and hybrid methods, where hybrid methods combine the other two [27]. These approaches are briefly reviewed in this section.
Kang et al. [21] classified structurally similar documents with a CNN. The motivation of the study is the hierarchical nature of document layout, and the model achieved state-of-the-art performance, surpassing previous approaches. Jadli et al. [4] used pre-trained VGG19, InceptionV3, DenseNet121 and MobileNetV2 models to extract features from document images and then trained traditional machine learning models on these features. They also applied the Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) dimension reduction methods to the features. On a small document dataset, the best result was obtained by combining Logistic Regression, LDA and the DenseNet121 deep model. Kumar et al. [23] trained random forest classifiers with a codebook of SURF descriptors obtained from document images; the method gave successful results on structurally similar document images. Harley et al. [17] showed that features obtained from a CNN model were more effective than handcrafted features. Roy et al. [29] proposed an ensemble model consisting of Deep Convolutional
Neural Network (DCNN) models for document structure learning. The base models were trained on whole documents and on specific regions of them, and the outputs of the base classifiers were combined with a Support Vector Machine (SVM). Csurka et al. [11] compared three ways of representing document images: shallow features, deep CNN-based features, and features derived from a hybrid model inspired by these two approaches. The best representation was obtained from the CNN model. Afzal et al. [2] used AlexNet for document classification, with initial weights obtained by training on the ImageNet dataset; the study argued that visual recognition problems share similarities. Yaman et al. [35] used three pre-trained CNN models to classify document images. Among AlexNet, GoogLeNet and VGG-16, the best result came from VGG-16. Zavalishin et al. [36] proposed a framework for layout-based document classification that also used text regions obtained with an MSER-based approach. The visual features were Spatial Local Binary Pattern (SLBP), Grayscale Runlength Histogram (GRLH) and Fisher vectors based on a Bernoulli Mixture Model (BMMFV), and SVMs served as both the classification models and the meta-classifier. To provide a guideline for parameter selection, Csurka [10] performed experiments on Run-Length Histogram (RL) and Fisher Vector (FV) representations obtained from document images; these features were used in classical machine learning models. Tensmeyer and Martinez [33] investigated the behaviour of CNN models on document images across various tasks. Kölsch et al. [22] pointed out that existing approaches for document classification were not time-efficient and proposed a two-stage real-time classification approach. AlexNet was used in the first stage as a feature extractor. In the second stage, an Extreme Learning Machine (ELM) was fed with the features from the first stage and classified the images. Afzal et al. [3] used AlexNet, GoogLeNet, VGG-16 and ResNet-50 models with and without pre-training to classify document images, and observed that the pre-trained models were more successful depending on the size of the dataset. Das et al. [12] used region-based DCNNs for document image classification and carried out stacked generalization with multiple meta-classifiers. Hassanpour and Malek [18] examined the SqueezeNet model pre-trained on the ImageNet dataset for the document classification problem; training was more successful with the weights obtained from pre-training than with randomly initialized weights. Mohsenzadegan et al. [26] performed classification with a CNN model fed with six channels. Şahin et al. [30] used a keyword-based approach to classify Turkish documents. They extracted keywords for each class from OCRed documents and then classified documents according to the number of matching words. Noce et al. [28] combined visual and textual data of document images to increase the accuracy of the classification model. A dictionary was created from keywords extracted from OCRed documents. Words were first assigned to classes, and each class was matched with a color. Before classification, the words in a document were colored with the color of their class according to keywords
in the dictionary. The colored documents were used to train the CNN model. Bakkali et al. [8] built an ensemble network that learned from visual and textual features through a self-attention-based mutual learning strategy. During training, the positive knowledge shared between the image and text modalities was learnt. A small training set is a challenge for the generalization of classification models. Audebert et al. [5] designed a multimodal deep network that learnt from word embeddings and visual features. The word embeddings were computed on text extracted with OCR, while the visual features were learnt with a MobileNetV2 model. Jain and Wigington [20] introduced spatial and multimodal feature fusion to classify document images. In spatial fusion, word embedding representations and visual features were combined, and these new features were used to train the VGG-16 model. Feature fusion was realized by combining the outputs of text ensemble networks and the VGG-16 network. Mandivarapu et al. [25] used a graph CNN to classify document images. The model was trained with textual, visual and layout information, and its main advantage over pre-trained models was a shorter processing time. Ferrando et al. [15] used an ensemble pipeline that contained EfficientNets and BERT. LayoutLM combines token, layout and image embeddings and exploits the advantage of multimodal inputs; Xu et al. [34] used LayoutLM to classify document images. Cosma et al. [9] applied Latent Dirichlet Allocation (LDA) to extract topics from text and predicted these topics with CNN models. Bakkali et al. [7] proposed a hybrid model that used token embeddings extracted from the text corpus together with structural image information to perform end-to-end classification; the two modalities were merged with a fusion scheme. In the real world, the amount of annotated data available to train classification models is limited. Abuelwafa et al. [1] produced training data from an unlabeled dataset with augmentation techniques. Each generated sample was assigned a label according to the sample from which it was derived, and the new dataset was then used to re-train the classification model (Fig. 13.1).
Fig. 13.1 Soft voting model
13.3 Methodology
13.3.1 Document Image Classification
Document image classification can be defined as assigning a class label to documents according to their features. These features are the textual or visual contents of the documents, and two approaches use them for classification. The first, the content-based approach, uses the textual content of documents extracted with OCR applications and is preferred for free-form documents. The second is the image-based approach, in which layout-based features are generally used to classify documents [6, 12, 22]. In this work, we aim to classify document images with a structure-based approach. Therefore, we use an ensemble of pre-trained deep CNN models. CNN models are state of the art in computer vision tasks and provide high accuracy without handcrafted features, so time-consuming pre-processing steps are not needed. In the proposed ensemble approach, the results of the individual models are fused by soft voting to increase the classification accuracy.
13.3.2 Image Pre-processing
The dataset used in the experiments consists of documents in PDF format. For the structure/layout-based approach, the documents are converted to image files. A resolution of 100 dpi is used for the conversion, which yields images of 1170 × 827 pixels. However, these images are too large for the CNN models used in training. Therefore, the images are resized to 331 × 331 × 3, 299 × 299 × 1 and 300 × 300 × 3 for NasNetLarge, InceptionV3 and EfficientNetB3, respectively.
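The resizing step can be illustrated with a minimal pure-Python sketch. The text does not say which tool or interpolation method was used, so the nearest-neighbour sampling below and the names `MODEL_INPUT_SIZES` and `resize_nearest` are assumptions made only for illustration; the sizes in the dictionary are the ones quoted above.

```python
# Input sizes quoted in the text for each model (height, width, channels).
MODEL_INPUT_SIZES = {
    "NasNetLarge": (331, 331, 3),
    "InceptionV3": (299, 299, 1),
    "EfficientNetB3": (300, 300, 3),
}


def resize_nearest(image, new_height, new_width):
    """Resize a 2-D grid of pixel values with nearest-neighbour sampling.

    `image` is a list of rows; each row is a list of pixel values.
    The interpolation method is an assumption, not taken from the text.
    """
    old_height = len(image)
    old_width = len(image[0])
    resized = []
    for r in range(new_height):
        # Map the target row back to a source row (floor of scaled position).
        src_r = r * old_height // new_height
        row = []
        for c in range(new_width):
            src_c = c * old_width // new_width
            row.append(image[src_r][src_c])
        resized.append(row)
    return resized


# A synthetic single-channel page of 1170 rows by 827 columns.
page = [[(r + c) % 256 for c in range(827)] for r in range(1170)]
height, width, _ = MODEL_INPUT_SIZES["EfficientNetB3"]
small = resize_nearest(page, height, width)
```

In practice an image library would normally handle both the PDF rendering and the resizing; the sketch only shows how a page is reduced to a fixed square input.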
13.3.3 Convolutional Neural Network
A Convolutional Neural Network (CNN) is a feed-forward neural network model. It has two parts: the convolutional layers and the fully connected layers. The idea behind the CNN architecture is to extract dense features from raw input data with trainable kernels [13]. Each feature point is computed from a local receptive field in the previous layer using a kernel and a bias value, and the kernel and bias values are shared by all local receptive fields. CNNs also contain subsampling layers called pooling layers. A pooling layer is generally used after a convolution operation to reduce the size of the feature maps; this lowers the computational cost and helps control overfitting. The fully connected layers form the last part of the CNN architecture and are the same as those of a multilayer neural network. They classify the input data according to the features obtained from the convolutional layers [2, 11, 14].
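To make the shared kernel and the pooling step concrete, the following is a small pure-Python sketch of one convolution followed by 2 × 2 max pooling. It is illustrative only: the function names, the ReLU activation, the stride of 1 and the absence of padding are assumptions, not details given in the chapter.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide one shared kernel over `image` (no padding, stride 1)."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Each output value depends only on a local receptive field,
            # and every field reuses the same kernel and bias.
            total = bias
            for i in range(k_h):
                for j in range(k_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(max(0.0, total))  # ReLU activation (an assumption)
        feature_map.append(row)
    return feature_map


def max_pool_2x2(feature_map):
    """Keep the maximum of every non-overlapping 2 x 2 window."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(feature_map[r][c], feature_map[r][c + 1],
                           feature_map[r + 1][c], feature_map[r + 1][c + 1]))
        pooled.append(row)
    return pooled


image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
kernel = [[1.0, 1.0], [1.0, 1.0]]
features = conv2d_valid(image, kernel)   # 5 x 5 input -> 4 x 4 feature map
pooled = max_pool_2x2(features)          # 4 x 4 feature map -> 2 x 2
```

The 5 × 5 input shrinks to a 4 × 4 feature map and then to a 2 × 2 map, which is the size reduction that lowers the cost of the later layers.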
Nowadays, deep learning models with millions of parameters are developed for difficult computer vision tasks. However, many annotated samples are required to train such a large number of learnable parameters [16]. Therefore, many studies use transfer learning with pre-trained deep learning models. Transfer learning is the transfer of experience between problems in similar domains: a model pre-trained on a large dataset is used to increase the classification accuracy on another problem for which training data is insufficient. In this work, the NasNetLarge, InceptionV3 and EfficientNet models are used with the transfer learning approach. The fully connected layers were removed from the models to fit our problem. Then, a global pooling layer and a fully connected layer containing 12 nodes were added to each model.
NasNet [38] is a deep CNN model developed by the Google Brain team. It combines Neural Architecture Search (NAS) and AutoML. A recurrent neural network controller is trained with the CIFAR-10 dataset to compose the optimum convolution cell. At the end of this process, two convolution cells are obtained, called the Normal Cell and the Reduction Cell. The output of a normal cell is a feature map with the same dimensions as its input, whereas the reduction cell halves the height and width. These two cells are used to construct a bigger network architecture that is trained with the ImageNet dataset. This large model is named NasNet, and it takes a 331 × 331 × 3 image as input.
InceptionV3 [31] is the third generation of GoogLeNet and a member of the Inception model family. The model contains inception blocks and uses factorized convolutions and label smoothing. As the filter size increases, the cost of convolution increases disproportionately, so in a factorized convolution a large filter is split into smaller convolutions. Label smoothing is a regularization method that rearranges the probability distribution of the ground truth labels. The input size of the InceptionV3 model is 299 × 299 × 1.
EfficientNet is a model family that contains eight individual models from B0 to B7. As the model number increases, the size of the model also increases. EfficientNet [32] was developed to minimize the operation cost and maximize the accuracy of the model. Three parameters are used to construct the optimum models: model width, model depth and image resolution. EfficientNet has a special block architecture that contains depth-wise convolution, point-wise convolution and a residual connection. All EfficientNet models are trained with the ImageNet dataset. In this work, EfficientNetB3 is used, and this model has a 300 × 300 × 3 input size.
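The head replacement described above (remove the fully connected layers, add a global pooling layer and a 12-node output layer) might look as follows in Keras. This is a hedged sketch, not the authors' code: the chapter does not name the framework, the pooling type (average pooling is assumed) or the output activation (softmax is assumed), and `build_classifier` and `BACKBONE_INPUT_SHAPES` are hypothetical names. The sketch also uses three channels for InceptionV3, because the Keras ImageNet weights expect RGB input, which differs from the 299 × 299 × 1 size quoted in the text.

```python
NUM_CLASSES = 12  # number of document classes stated in the text

# Backbone name -> input shape. InceptionV3 uses three channels here because
# the Keras ImageNet weights require RGB input (an assumption of this sketch).
BACKBONE_INPUT_SHAPES = {
    "NASNetLarge": (331, 331, 3),
    "InceptionV3": (299, 299, 3),
    "EfficientNetB3": (300, 300, 3),
}


def build_classifier(backbone_name, num_classes=NUM_CLASSES):
    """Return a pre-trained backbone with a new pooling + softmax head."""
    import tensorflow as tf  # imported lazily; only needed to build a model

    backbone_cls = getattr(tf.keras.applications, backbone_name)
    backbone = backbone_cls(
        include_top=False,   # drop the original fully connected layers
        weights="imagenet",  # start from ImageNet pre-training
        input_shape=BACKBONE_INPUT_SHAPES[backbone_name],
    )
    # Global pooling; the average variant is assumed, the text says "global pooling".
    pooled = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    # New fully connected layer with one node per document class.
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(pooled)
    return tf.keras.Model(inputs=backbone.input, outputs=outputs)
```

Calling `build_classifier("EfficientNetB3")` would download the ImageNet weights on first use and return a model whose output is a vector of 12 class probabilities.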
13.3.4 Soft Voting
Soft voting is a simple and efficient ensemble technique for fusing the predictions of multiple trained classifiers. For each class, it computes a weighted sum of the probabilities that the classifiers assign to that class and selects the class with the maximum sum. When all classifiers are equally weighted, it can be thought of as the mean of all predictions [37].
In this work, soft voting was used to combine the three CNN models with equal weights. To combine the predictions, the output probabilities $P = \{p_1, p_2, \ldots, p_m\}$ of the classifiers $C = \{c_1, c_2, \ldots, c_n\}$ are added linearly and divided by the number of classifiers (Eq. 13.1). The index $i$ of the highest averaged probability among $p_1^{soft}, p_2^{soft}, \ldots, p_m^{soft}$ is chosen as the final prediction (Eq. 13.2). In Eq. 13.1, $p_i^j$ represents the $i$-th predicted probability of classifier $c_j$.

$$p_i^{soft} = \frac{1}{n} \sum_{j=1}^{n} p_i^{j} \quad \text{for } i = 1, 2, \ldots, m \tag{13.1}$$

$$prediction = \arg\max_i \left( p_i^{soft} \right) \quad \text{for } i = 1, 2, \ldots, m \tag{13.2}$$
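The averaging and arg-max steps of Eqs. 13.1 and 13.2 can be sketched in a few lines of Python. The function name `soft_vote` and the example probabilities are made up for illustration; they are not results from the chapter.

```python
def soft_vote(probabilities):
    """Average class probabilities over classifiers and pick the arg max.

    `probabilities[j][i]` is the probability classifier j assigns to class i.
    Returns (averaged_probabilities, predicted_class_index).
    """
    n_classifiers = len(probabilities)
    n_classes = len(probabilities[0])
    # Eq. 13.1: equal-weight mean of the per-class probabilities.
    averaged = [
        sum(probabilities[j][i] for j in range(n_classifiers)) / n_classifiers
        for i in range(n_classes)
    ]
    # Eq. 13.2: index of the largest averaged probability.
    prediction = max(range(n_classes), key=lambda i: averaged[i])
    return averaged, prediction


# Illustrative outputs of three classifiers for one image and three classes.
outputs = [
    [0.90, 0.10, 0.00],
    [0.40, 0.60, 0.00],
    [0.45, 0.55, 0.00],
]
averaged, prediction = soft_vote(outputs)
```

In this example two of the three classifiers individually prefer class 1, yet the averaged probabilities favour class 0 because the first classifier is far more confident. This is the difference between soft voting and a simple majority vote.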
13.4 Experiments and Result
13.4.1 Dataset
In the experiments, we use a real-world dataset provided by the Kocaeli University student application system. The dataset consists of 1044 samples grouped into 12 classes, and it is divided into 90% training and 10% test data (Fig. 13.2; Table 13.1).
Table 13.1 Summary of class labels

| Document class | Number of samples |
|---|---|
| ALES | 100 |
| CV | 100 |
| Equivalence certificate | 71 |
| Course content | 100 |
| Course list | 100 |
| Diploma | 100 |
| Prep class status certificate | 100 |
| Student certificate | 100 |
| OSYM result document | 49 |
| OSYM placement certificate | 58 |
| Transcript | 99 |
| Foreign language certificate | 98 |
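A 90%/10% split of the kind described above can be sketched as follows. The chapter does not state whether the split was random or stratified by class, so the shuffled split, the fixed seed and the name `train_test_split_indices` are assumptions made for illustration.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.10, seed=0):
    """Shuffle sample indices and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatability
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(1044)
```

For 1044 samples this yields 940 training indices and 104 test indices with no overlap.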
Fig. 13.2 Samples of document image dataset
13.4.2 Evaluation Metrics
Precision, recall and F1-Score are used to evaluate the performance of the classifiers. They are computed from TP, FP and FN, which denote true positives, false positives and false negatives, respectively. For a class $c_n$, TP is the number of document images correctly classified as $c_n$, FP is the number of document images of other classes that are misclassified as $c_n$, and FN is the number of document images of class $c_n$ that are assigned to a different class. Precision is the ratio of correctly classified document images (TP) to all images predicted as class $c_n$ (TP + FP). Recall is the ratio of correctly classified document images (TP) to all document images that belong to class $c_n$ (TP + FN). F1-Score is the harmonic mean of precision and recall (Fig. 13.3).

$$precision = \frac{TP}{TP + FP} \tag{13.3}$$

$$recall = \frac{TP}{TP + FN} \tag{13.4}$$

$$F1\text{-}Score = \frac{2 \cdot precision \cdot recall}{precision + recall} \tag{13.5}$$

Fig. 13.3 Confusion matrices of (a) NasNet Large, (b) InceptionV3, (c) EfficientNetB3, (d) soft voting
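As an illustration of Eqs. 13.3 to 13.5, the per-class metrics can be read off a confusion matrix. The sketch below assumes that rows hold the actual class and columns the predicted class; the function name and the small 3 × 3 matrix are invented for the example and are not the chapter's results.

```python
def per_class_metrics(confusion, class_index):
    """Precision, recall and F1 for one class of a square confusion matrix.

    `confusion[a][p]` counts samples with actual class a and predicted class p
    (this row/column convention is an assumption of the sketch).
    """
    n = len(confusion)
    tp = confusion[class_index][class_index]
    # Column total minus TP: other classes predicted as this class.
    fp = sum(confusion[a][class_index] for a in range(n)) - tp
    # Row total minus TP: this class predicted as something else.
    fn = sum(confusion[class_index][p] for p in range(n)) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [1, 0, 9],
]
precision, recall, f1 = per_class_metrics(confusion, 0)
```

For class 0 in this toy matrix there are 8 true positives, 2 false positives and 2 false negatives, so precision, recall and F1-Score are all 0.8.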
13.4.3 Experiments
Building, training and testing of the models are carried out on Colab with the Python programming language. The models are run with the Hardware Accelerator setting in Colab set to GPU and to None. The pre-trained NasNet Large, InceptionV3 and EfficientNetB3 models are used to classify the document images. Our dataset consists of text-based images, whereas the models were pre-trained on the ImageNet dataset. Although both datasets consist of images, they belong to different domains. Therefore, the parameters of the pre-trained models are updated with a fine-tuning process. Stochastic Gradient Descent (SGD) is used for model optimization. The learning rate is 0.01 for all models, and the mini-batch size is 16. Different numbers of epochs were tried for each model and the best ones were selected: 17 epochs are sufficient for training NasNet Large, InceptionV3 takes 20 epochs, and EfficientNetB3 reaches its highest performance in the fewest epochs, with 10 being enough. About 85M parameters are optimized for the NasNet Large model. The InceptionV3 model has 22M trainable parameters, while the EfficientNetB3 model has a total of 4M parameters (Tables 13.2 and 13.3).
When the models are evaluated individually by their F1-Score after the test phase, EfficientNetB3 achieves the highest F1-Score, NasNet Large gives the second-best result, and InceptionV3 ranks last. When the test times are compared, the times of EfficientNetB3 and InceptionV3 are very close to each other, and NasNet Large has the longest running time. In addition, the predictions of the models are combined with the soft voting method. The ensemble model yields a higher result than the individual models [25].
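The training settings reported above can be collected in a short Keras-style sketch. Only the learning rate, batch size, epoch counts and the use of SGD come from the text; the framework, the loss function, the one-hot label encoding and the name `fine_tune` are assumptions, and any SGD momentum or learning-rate schedule the authors may have used is not known.

```python
# Hyperparameters reported in the text.
LEARNING_RATE = 0.01
BATCH_SIZE = 16
EPOCHS = {"NASNetLarge": 17, "InceptionV3": 20, "EfficientNetB3": 10}


def fine_tune(model, backbone_name, train_images, train_labels):
    """Fine-tune all layers of a Keras `model` with plain SGD."""
    import tensorflow as tf  # lazy import: only needed when training

    model.trainable = True  # fine-tuning also updates the pre-trained weights
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=LEARNING_RATE),
        loss="categorical_crossentropy",  # assumed; the text names no loss
        metrics=["accuracy"],
    )
    return model.fit(
        train_images,
        train_labels,  # one-hot encoded labels (an assumption)
        batch_size=BATCH_SIZE,
        epochs=EPOCHS[backbone_name],
    )
```

Keeping the per-model epoch counts in one dictionary makes it easy to see that EfficientNetB3 needs the fewest epochs of the three models.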
Table 13.2 Results for traditional models on the test: precision, recall, F-Score

| Model | Precision (%) | Recall (%) | F-Score (%) |
|---|---|---|---|
| NasNet Large | 92.56 | 90.38 | 90.56 |
| InceptionV3 | 89.20 | 87.50 | 87.85 |
| EfficientNetB3 | 94.54 | 93.27 | 93.20 |
| Soft voting | 95.26 | 94.23 | 94.04 |

Table 13.3 Running time of the models

| Model | None (s) | GPU (s) |
|---|---|---|
| NasNet Large | 138.18 | 2.73 |
| InceptionV3 | 26.15 | 0.6 |
| EfficientNetB3 | 26.44 | 0.85 |
13.5 Conclusion
Pre-trained models have been used on diverse classification problems and have generally succeeded on these tasks. There are two ways to use pre-trained models: transfer learning and fine-tuning. In this work, we used the pre-trained CNN models NasNet Large, InceptionV3 and EfficientNetB3 to classify document images from the Kocaeli University digital document management system. These models are trained on the ImageNet dataset. ImageNet images and document images share the same file type but have different structures, so fine-tuning is used for model training. Comparing the test results shows that EfficientNetB3 is the most successful individual model and that NasNet Large has the longest testing time. In addition, soft voting is used as an ensemble strategy for efficient classification. In terms of F-score, soft voting outperforms the individual CNN architectures, achieving 94.04% even though the training data is limited. In future work, we plan to apply different ensemble strategies to fuse the outputs of the base classifiers.
Acknowledgements This work has been supported by the Kocaeli University Scientific Research and Development Support Program (BAP) in Turkey under project number FBA-2020-2152.
References
1. Abuelwafa S, Pedersoli M, Cheriet M (2019) Unsupervised exemplar-based learning for improved document image classification. IEEE Access 7:133738–133748
2. Afzal MZ, Capobianco S, Malik MI, Marinai S, Breuel TM, Dengel A, Liwicki M (2015) Deepdocclassifier: document classification with deep convolutional neural network. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp 1111–1115. https://doi.org/10.1109/ICDAR.2015.7333933
3. Afzal MZ, Kölsch A, Ahmed S, Liwicki M (2017) Cutting the error by half: investigation of very deep CNN and advanced training strategies for document image classification. CoRR arXiv preprint http://arxiv.org/abs/1704.03557
4. Aissam J, Mustapha H, Hasbaoui A (2021) An improved document image classification using deep transfer learning and feature reduction. Int J Adv Trends Comput Sci Eng 10:549–557. https://doi.org/10.30534/ijatcse/2021/141022021
5. Audebert N, Herold C, Slimani K, Vidal C (2019) Multimodal deep networks for text and image-based document classification. arXiv preprint arXiv:1907.06370
6. Bakkali S, Ming Z, Coustaty M, Rusiñol M (2020) Cross-modal deep networks for document image classification. In: 2020 IEEE International Conference on Image Processing (ICIP). IEEE, pp 2556–2560
7. Bakkali S, Ming Z, Coustaty M, Rusiñol M (2020) Visual and textual deep feature fusion for document image classification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 562–563
8. Bakkali S, Ming Z, Coustaty M, Rusiñol M (2021) Eaml: ensemble self-attention-based mutual learning network for document image classification. Int J Document Anal Recogn (IJDAR) 1–18
9. Cosma A, Ghidoveanu M, Panaitescu-Liess M, Popescu M (2020) Self-supervised representation learning on document images. In: International workshop on document analysis systems. Springer, Heidelberg, pp 103–117
10. Csurka G (2016) Document image classification, with a specific view on applications of patent images. CoRR http://arxiv.org/abs/1601.03295
11. Csurka G, Larlus D, Gordo A, Almazán J (2016) What is the right way to represent document images? CoRR abs/1603.01076, http://arxiv.org/abs/1603.01076
12. Das A, Roy S, Bhattacharya U (2018) Document image classification with intra-domain transfer learning and stacked generalization of deep convolutional neural networks. CoRR abs/1801.09321, http://arxiv.org/abs/1801.09321
13. Dutta A, Garai A, Biswas S, Das AK (2021) Segmentation of text lines using multi-scale cnn from warped printed and handwritten document images. Int J Document Anal Recogn (IJDAR) 1–15
14. Fanany MI et al (2017) Handwriting recognition on form document using convolutional neural network and support vector machines (cnn-svm). In: 2017 5th international conference on information and communication technology (ICoIC7). IEEE, pp 1–6
15. Ferrando J, Domínguez JL, Torres J, García R, García D, Garrido D, Cortada J, Valero M (2020) Improving accuracy and speeding up document image classification through parallel systems. In: International conference on computational science. Springer, Heidelberg, pp 387–400
16. Han D, Liu Q, Fan W (2018) A new image classification method using cnn transfer learning and web data augmentation. Expert Syst Appl 95:43–56
17. Harley AW, Ufkes A, Derpanis KG (2015) Evaluation of deep convolutional nets for document image classification and retrieval. CoRR abs/1502.07058, http://arxiv.org/abs/1502.07058
18. Hassanpour M, Malek H (2019) Document image classification using squeezenet convolutional neural network. In: 2019 5th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), pp 1–4. https://doi.org/10.1109/ICSPIS48872.2019.9066032
19. Hua Y, Huang Z, Guo J, Qiu W (2020) Attention-based graph neural network with global context awareness for document understanding. In: Proceedings of the 19th Chinese national conference on computational linguistics, pp 853–862. Chinese Information Processing Society of China, Haikou, China. https://aclanthology.org/2020.ccl-1.79
20. Jain R, Wigington C (2019) Multimodal document image classification. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp 71–77. https://doi.org/10.1109/ICDAR.2019.00021
21. Kang L, Kumar J, Ye P, Li Y, Doermann D (2014) Convolutional neural networks for document image classification. In: 2014 22nd international conference on pattern recognition, pp 3168–3172. https://doi.org/10.1109/ICPR.2014.546
22. Kölsch A, Afzal MZ, Ebbecke M, Liwicki M (2017) Real-time document image classification using deep cnn and extreme learning machines. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol 01, pp 1318–1323. https://doi.org/10.1109/ICDAR.2017.217
23. Kumar J, Ye P, Doermann D (2014) Structural similarity for document image classification and retrieval. Pattern Recogn Lett 43:119–126
24. Mahajan K, Sharma M, Vig L (2019) Character keypoint-based homography estimation in scanned documents for efficient information extraction. CoRR abs/1911.05870, http://arxiv.org/abs/1911.05870
25. Mandivarapu JK, Bunch E, You Q, Fung G (2021) Efficient document image classification using region-based graph neural network. CoRR abs/2106.13802, https://arxiv.org/abs/2106.13802
26. Mohsenzadegan K, Tavakkoli V, De Silva P, Kolli A, Kyamakya K, Pichler R, Bouwmeester O, Zupan R (2020) A convolutional neural network model for robust classification of document-images under real-world hard conditions. In: Developments of artificial intelligence technologies in computation and robotics: proceedings of the 14th international FLINS conference (FLINS 2020). World Scientific, pp 1023–1030
27. Nemcová K (2018) Document functional type classification. In: Horák A, Rychlý P, Rambousek A (eds) The 12th workshop on recent advances in Slavonic natural languages processing, RASLAN 2018, Karlova Studanka, Czech Republic, 7–9 Dec 2018. Tribun EU, pp 95–100
References
173
28. Noce L, Gallo I, Zamberletti A, Calefati A (2016) Embedded textual content for document image classification with convolutional neural networks. In: Proceedings of the 2016 ACM symposium on document engineering, pp 165–173 29. Roy S, Das A, Bhattacharya U (2016) Generalized stacking of layerwise-trained deep convolutional neural networks for document image classification. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp 1273–1278. https://doi.org/10.1109/ICPR.2016. 7899812 30. Sahin ¸ S et al (2020) Dijital dokümanların anahtar kelime tabanlı do˘grulanması 31. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826 32. Tan M, Le Q (2019) Efficientnet: rethinking model scaling for convolutional neural networks. In: International conference on machine learning. PMLR, pp 6105–6114 33. Tensmeyer C, Martinez TR (2017) Analysis of convolutional neural networks for document image classification. CoRR abs/1708.03273, http://arxiv.org/abs/1708.03273 34. Xu Y, Li M, Cui L, Huang S, Wei F, Zhou M (2002) Layoutlm: pre-training of text and layout for document image understanding. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp 1192–1200 35. Yaman D, Eyiokur FI, Ekenel HK (2017) Comparison of convolutional neural network models for document image classification. In: 2017 25th signal processing and communications applications conference (SIU), pp 1–4. https://doi.org/10.1109/SIU.2017.7960562 36. Zavalishin S, Bout A, Kurilin I, Rychagov M (2017) Document image classification on the basis of layout information. Electronic Imaging 78–86. https://doi.org/10.2352/ISSN.24701173.2017.2.VIPC-412 37. Zhou Q, Wu, H (2018) Nlp at iest 2018: Bilstm-attention and lstm-attention via soft voting in emotion classification. 
In: Proceedings of the 9th workshop on computational approaches to subjectivity, sentiment and social media analysis, pp 189–194 38. Zoph B, Vasudevan V, Shlens J, Le QV (2018) Learning transferable architectures for scalable image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8697–8710
Chapter 14
Improved Performance of Adaptive UKF SLAM with Scaling Parameter
Kübra Yalçin, Serhat Karaçam, and Tuğba Selcen Navruz
14.1 Introduction
Simultaneous Localization and Mapping (SLAM) is a research area widely used in mobile robot applications; it involves estimating the robot's position and the map of the environment at the same time [1]. The technology has been studied for more than 25 years in autonomous systems navigation and artificial intelligence. When the Global Positioning System (GPS) data of the mobile robot cannot be accessed and the environment map is unknown or very limited, localization and mapping can be performed using information from the robot's sensors together with Kalman filters [2]. The most commonly used probabilistic methods are the Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF). In the EKF, the nonlinear system is linearized by a Taylor expansion [3], which requires Jacobian calculations. Neglecting the higher-order terms of the Taylor expansion introduces an error when the kinematics and observation equations are linearized, so the EKF performs poorly for SLAM [4]. To avoid the poor estimation accuracy caused by this linearization error, Andrade-Cetto et al. [5] implemented mobile robot tracking with a UKF-based SLAM algorithm. The UKF is an estimation scheme that uses a sampling strategy to approximate the nonlinear distribution [6, 7].
K. Yalçin (corresponding author)
Test Systems Design Department, MGEO, ASELSAN Inc, Ankara, Turkey
S. Karaçam · T. S. Navruz
Department of Electrical and Electronics Engineering, Gazi University, Ankara, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_14
Since the UKF handles the nonlinearity by sampling rather than by analytical linearization, no Jacobian calculations are needed for the kinematics and observation equations. As a result, the accuracy of the UKF for robot tracking is better than that of the EKF in SLAM applications [8]. Martinez-Cantin and Castellanos [9] examined a UKF-based SLAM algorithm with a consistency check in a large-scale outdoor environment. Huang et al. proposed a UKF-based SLAM formulation that reduces computational complexity [10]. Further developments of UKF-SLAM can be found in [11, 12]. Because the ideal simulation environment differs from the environment encountered in real life, many researchers have turned to adaptive filtering to improve filter performance [13, 14]. The scaling parameter computed for the adaptation of UKF SLAM is an important design parameter because it contributes to the approximation error [15]. A closed-form solution for a suitable adaptive parameter is usually not available, so methods such as systematic search and grid search are recommended for choosing the scaling parameter [16, 17]. The adaptive random search maximization method is an appropriate technique for adapting non-differentiable and discontinuous functions [16, 18, 19]. In this study, simulation results of Adaptive UKF SLAM are compared with the classical EKF SLAM and UKF SLAM methods. For this comparison, the root mean square error (RMSE) of the robot position and landmark estimates is calculated under different noise levels. The paper is organized as follows: Sect. 14.2 reviews the UKF algorithm and the scaling parameter adaptation method and presents the Adaptive UKF (AUKF) SLAM algorithm. Section 14.3 presents the simulation results and comparisons between the methods. Section 14.4 concludes the study.
14.2 Adaptive UKF SLAM
The UKF operates on probability distribution functions (PDF) instead of directly approximating the nonlinear function. Consider the discrete-time nonlinear stochastic system

$$x_{k+1} = f_k(x_k, u_k, t_k) + w_k, \quad k = 0, 1, 2, \ldots \tag{14.1}$$

$$z_k = h_k(x_k, t_k) + v_k, \quad k = 0, 1, 2, \ldots \tag{14.2}$$

$$w_k \sim (0, Q_k) \tag{14.3}$$

$$v_k \sim (0, R_k) \tag{14.4}$$

where $x_k \in \mathbb{R}^n$, $u_k \in \mathbb{R}^m$ and $z_k \in \mathbb{R}^p$ are the state, input and observation vectors at time $t_k$, respectively; $f: \mathbb{R}^n \to \mathbb{R}^n$ and $h: \mathbb{R}^n \to \mathbb{R}^p$ are known vector functions; and $Q$ and $R$ are the system and observation noise covariance matrices, respectively. The noises $w_k$ and $v_k$ are assumed to be zero-mean, Gaussian, uncorrelated white sequences [8]. The initial state vector $\hat{x}_0^+$ and covariance matrix $P_0^+$ are defined below.
$$\hat{x}_0^+ = E(x_0) \tag{14.5}$$

$$P_0^+ = E\left[\left(x_0 - \hat{x}_0^+\right)\left(x_0 - \hat{x}_0^+\right)^T\right] \tag{14.6}$$
The UKF is built on a set of sigma points. These points are drawn from the state distribution so that the true mean vector and covariance matrix can be recovered from them easily. The sigma points are calculated as

$$\mathcal{X}_{k-1}^+ = \left[\hat{x}_{k-1}^+, \ \hat{x}_{k-1}^+ \pm \sqrt{(n_x + \kappa) P_{k-1}^+}\right] \tag{14.7}$$

where $n_x$ is the length of the state vector $x_k$ and $\kappa$ is the scaling parameter. Each sigma point is propagated through the nonlinear function of the system, which yields a cloud of transformed points:

$$\mathcal{X}_k^- = f\left(\mathcal{X}_{k-1}^+, u_k, t_k\right) \tag{14.8}$$
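The mean and covariance matrix of the a priori state estimate at time $k$ are obtained by

$$\hat{x}_k^- = \sum_{i=0}^{2n_x} W_i \mathcal{X}_{i,k}^- + q_{k-1} \tag{14.9}$$

$$P_k^- = \sum_{i=0}^{2n_x} W_i \left(\mathcal{X}_{i,k}^- - \hat{x}_k^-\right)\left(\mathcal{X}_{i,k}^- - \hat{x}_k^-\right)^T + Q_{k-1} \tag{14.10}$$

where $q_k$ is the process noise mean and $W_i$ is the weight, which is the same for the mean and the covariance and is given by

$$W_0^{(m)} = W_0^{(c)} = \frac{\kappa}{n_x + \kappa} \tag{14.11}$$

$$W_i^{(m)} = W_i^{(c)} = \frac{1}{2(n_x + \kappa)}, \quad i = 1, \ldots, 2n_x \tag{14.12}$$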
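The sigma point construction and the weights above can be illustrated with a minimal sketch. It uses only the Python standard library and takes the lower Cholesky factor as the matrix square root, which is one common choice and an assumption here, since the chapter does not state which square root is used. The function names and the 2-D numbers are hypothetical.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a small symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def sigma_points(x_hat, p, kappa):
    """Return the 2*n_x + 1 sigma points (Eq. 14.7) and weights (Eqs. 14.11-14.12)."""
    n = len(x_hat)
    scale = n + kappa
    # Columns of the Cholesky factor of (n_x + kappa) * P act as the square root.
    root = cholesky([[scale * p[r][c] for c in range(n)] for r in range(n)])
    points = [list(x_hat)]
    for col in range(n):
        points.append([x_hat[r] + root[r][col] for r in range(n)])
    for col in range(n):
        points.append([x_hat[r] - root[r][col] for r in range(n)])
    weights = [kappa / scale] + [1.0 / (2.0 * scale)] * (2 * n)
    return points, weights

x_hat = [1.0, 2.0]                  # hypothetical 2-D state estimate
p = [[0.04, 0.01], [0.01, 0.09]]    # hypothetical covariance
points, weights = sigma_points(x_hat, p, kappa=1.0)
# The weighted sample mean and variance reproduce x_hat and P.
mean = [sum(w * pt[r] for w, pt in zip(weights, points)) for r in range(2)]
var_x = sum(w * (pt[0] - mean[0]) ** 2 for w, pt in zip(weights, points))
```

Before any nonlinear propagation, the weighted mean and covariance of the points match the inputs, which is a quick sanity check on an implementation.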
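For the measurement update, the equations are as follows.

$$\mathcal{X}_k^- = f\left(\mathcal{X}_{k-1}^+, u_k, t_k\right) \tag{14.13}$$

$$\mathcal{X}_k^- = \left[\hat{x}_k^-, \ \hat{x}_k^- \pm \sqrt{(n_x + \kappa) P_k^-}\right] \tag{14.14}$$

$$\hat{z}_{i,k} = h\left(\mathcal{X}_{i,k}^-, t_k\right), \quad i = 0, \ldots, 2n_x \tag{14.15}$$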
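$$\bar{z}_k = \sum_{i=0}^{2n_x} W_i \hat{z}_{i,k} + r_k \tag{14.16}$$

$$P_z = \sum_{i=0}^{2n_x} W_i \left(\hat{z}_{i,k} - \bar{z}_k\right)\left(\hat{z}_{i,k} - \bar{z}_k\right)^T + R_k \tag{14.17}$$

$$P_{xz} = \sum_{i=0}^{2n_x} W_i \left(\mathcal{X}_{i,k}^- - \hat{x}_k^-\right)\left(\hat{z}_{i,k} - \bar{z}_k\right)^T \tag{14.18}$$

$$K_k = P_{xz} P_z^{-1} \tag{14.19}$$

$$\hat{x}_k^+ = \hat{x}_k^- + K_k \left(z_k - \bar{z}_k\right) \tag{14.20}$$

$$P_k^+ = P_k^- - K_k P_z K_k^T \tag{14.21}$$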
where $r_k$ is the measurement noise mean. In Eq. 14.14, the sigma points are recomputed around the predicted state so that the covariance matrix of the predicted measurement can be updated. The cross-covariance matrix $P_{xz}$ is estimated by Eq. 14.18 and used to find the Kalman gain $K_k$ in Eq. 14.19. The scaling parameter $\kappa$ is an important design parameter of the UKF. In most UKF studies, $\kappa = 3 - n_x$ has been the most common choice [20]. However, for a state vector of size $n_x > 3$ this choice makes $\kappa$ negative, which may cause the observation covariance matrix $P_z$ to lose positive definiteness [15]. Bayesian recursive relations can therefore be used to determine the scaling parameter. The PDF of the state vector is [21]:

$$p\left(x_k \mid z^k\right) = \frac{p\left(z_k \mid x_k\right) p\left(x_k \mid z^{k-1}\right)}{p\left(z_k \mid z^{k-1}\right)} \tag{14.22}$$

The following assumptions rely on the Gaussian nature of the prediction PDFs [22]:

$$\hat{p}\left(z_k \mid x_k, \kappa\right) \approx \mathcal{N}\left\{z_k; h_k(x_k, \kappa), R_k\right\} \tag{14.23}$$

$$\hat{p}\left(x_k \mid z^{k-1}, \kappa\right) \approx \mathcal{N}\left\{x_k; \hat{x}_{k|k-1}(\kappa), P_{k|k-1}(\kappa)\right\} \tag{14.24}$$

$$\hat{p}\left(z_k \mid z^{k-1}, \kappa\right) \approx \mathcal{N}\left\{z_k; \hat{z}_{k|k-1}(\kappa), P_{z,k|k-1}(\kappa)\right\} \tag{14.25}$$
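The remark above about the common choice $\kappa = 3 - n_x$ can be checked numerically. The short sketch below evaluates the central weight $W_0$ of Eq. 14.11 for a few state dimensions; the function name is hypothetical. In SLAM the state vector grows with every new landmark, so $n_x > 3$ is the normal case rather than the exception.

```python
def default_kappa_weights(n_x):
    """Return (kappa, W_0, W_i) for the common choice kappa = 3 - n_x."""
    kappa = 3 - n_x
    w0 = kappa / (n_x + kappa)          # Eq. 14.11; the denominator is always 3
    wi = 1.0 / (2 * (n_x + kappa))      # Eq. 14.12
    return kappa, w0, wi

for n_x in (2, 3, 5, 9):
    kappa, w0, wi = default_kappa_weights(n_x)
    print(f"n_x={n_x}: kappa={kappa}, W_0={w0:+.3f}, W_i={wi:.3f}")
```

Once $W_0$ is negative, the weighted sums in Eqs. 14.10 and 14.17 contain a subtracted outer product, which is how positive definiteness can be lost.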
Various adaptation techniques for the scaling parameter are proposed in [15, 17, 21]. Two problems arise in these methods. The first occurs when $\kappa$ is chosen before estimation: positive definiteness may be lost, which causes errors in the estimation steps. The second is that, when only the maximum likelihood criterion is used, information about the dynamics of the system may be neglected.
Fig. 14.1 Implementation flowchart of AUKF SLAM
In [8], a random search technique that maximizes the posterior probability is proposed for adapting the scaling parameter. In the proposed approach, observation data is used instead of odometry data in the Bayesian recursive relation of Eq. 14.22, since data obtained from sensors are more consistent than data obtained from odometry in SLAM applications. As a result, the likelihood function and the measurement PDF are used to calculate the adaptation criterion, as shown in Eq. 14.26.

$$\hat{\kappa}_k = \arg\max_{\kappa \in [\kappa_{\min}, \kappa_{\max}]} \frac{\hat{p}\left(z_k \mid x_{k|k-1}, \kappa\right)}{\hat{p}\left(z_k \mid z^{k-1}, \kappa\right)} \tag{14.26}$$

The domain $[\kappa_{\min}, \kappa_{\max}]$ represents random $\kappa$ values chosen between $\kappa_{\min}$ and $\kappa_{\max}$. The adaptation criterion is evaluated for the random $\kappa$ values selected from this domain, and the $\kappa$ that gives the maximum value is chosen as the adaptive scaling parameter $\hat{\kappa}_k$. To solve this equation, the likelihood functions in Eqs. 14.27 and 14.28 can be used.

$$\hat{p}\left(z_k \mid x_{k|k-1}, \kappa\right) \approx \frac{1}{(2\pi)^{p/2} |R_k|^{1/2}} \exp\left(-\frac{1}{2}\left(z_k - h_k(x_{k|k-1}, \kappa)\right)^T R_k^{-1} \left(z_k - h_k(x_{k|k-1}, \kappa)\right)\right) \tag{14.27}$$

$$\hat{p}\left(z_k \mid z^{k-1}, \kappa\right) \approx \frac{1}{(2\pi)^{p/2} |P_{z,k|k-1}|^{1/2}} \exp\left(-\frac{1}{2}\left(z_k - \hat{z}_{k|k-1}\right)^T P_{z,k|k-1}^{-1} \left(z_k - \hat{z}_{k|k-1}\right)\right) \tag{14.28}$$
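The selection step can be sketched for a scalar measurement, assuming the criterion of Eq. 14.26 is the ratio of the likelihood (Eq. 14.27) to the measurement PDF (Eq. 14.28). The per-candidate predictions would come from running the UKF prediction once for each $\kappa$; here they are hypothetical numbers supplied directly, and the function names are illustrative only.

```python
import math

def gaussian_pdf(z, mean, var):
    """Scalar Gaussian density N{z; mean, var}."""
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def select_kappa(z, r_var, predictions):
    """Pick the candidate kappa that maximizes the ratio of Eq. 14.26.

    predictions maps each candidate kappa to a tuple (h_pred, z_pred, pz_var):
    h_pred  is h_k(x_{k|k-1}, kappa), used in the likelihood of Eq. 14.27
    z_pred  is the predicted measurement mean, used in Eq. 14.28
    pz_var  is the predicted measurement variance P_z, used in Eq. 14.28
    """
    best_kappa, best_score = None, -math.inf
    for kappa, (h_pred, z_pred, pz_var) in predictions.items():
        score = gaussian_pdf(z, h_pred, r_var) / gaussian_pdf(z, z_pred, pz_var)
        if score > best_score:
            best_kappa, best_score = kappa, score
    return best_kappa, best_score

# Hypothetical per-kappa filter outputs for one scalar range measurement.
predictions = {
    0.0: (10.4, 10.3, 0.30),
    0.5: (10.1, 10.2, 0.25),
    1.0: (9.6, 9.8, 0.35),
}
best_kappa, best_score = select_kappa(z=10.0, r_var=0.01, predictions=predictions)
```

In a vector-valued implementation the scalar densities would be replaced by the multivariate forms of Eqs. 14.27 and 14.28, and working with log-densities would avoid underflow for large innovations.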
The flowchart of Adaptive UKF SLAM is described in Fig. 14.1. Detailed information about steps of the algorithm is given in previous sections.
14.3 Simulation Results and Discussions
In the simulation studies, an open-source SLAM simulator prepared by Bailey et al. [23] for SLAM research was used. The simulator assumes a known set of waypoints and generates control signals to move the robot between them. Simulation results for EKF, UKF and AUKF SLAM at the different noise levels (Tables 14.1, 14.3 and 14.5) are presented in Tables 14.2, 14.4 and 14.6. In the AUKF SLAM algorithm, the adaptation parameter is randomly generated in the range $[\kappa_{\min}, \kappa_{\max}]$. Then, from the sigma points calculated for the different $\kappa$ values, the prediction and update of the mean and covariance matrices of the recently observed landmarks are computed. The resulting data set is checked against the adaptation criterion. As seen from Tables 14.2, 14.4 and 14.6, AUKF provided more accurate results than EKF and UKF in all tests. It was also observed that the RMSE values of AUKF were less sensitive to an increase in initial noise, whereas the RMSE values of EKF increased sharply as the initial noise increased. The performance of UKF is higher than that of EKF but lower than that of AUKF. For AUKF, the RMSE decreased when the candidate values of the adaptation parameter $\kappa$ were sampled more densely within the same range. For the 1st test, the RMSE values at each time step in the x and y coordinates are shown in Fig. 14.2. Figure 14.3 shows the estimated and true paths together with the estimated and true landmark positions using UKF and AUKF ($\kappa \in \{0 : 0.5 : 4\}$) with the 2nd test noise values. As seen in Figs. 14.2 and 14.3, Adaptive UKF gives more reliable results than EKF and UKF: the robot path and landmark positions are estimated more accurately. It can therefore be said that adapting the scaling parameter improves the precision of estimation for nonlinear systems.

Table 14.1 Initial noise values for 1st test
Process noise mean            $q_0 = [0.2, \ 2\pi/180]^T$
Process noise covariance      $Q_0 = \mathrm{diag}\left(0.2^2, \ (2\pi/180)^2\right)$
Measurement noise mean        $r_0 = [0.1, \ \pi/180]^T$
Measurement noise covariance  $R_0 = \mathrm{diag}\left(0.1^2, \ (\pi/180)^2\right)$

Table 14.2 RMSE values of test 1 for EKF, UKF and AUKF
Filter                      EKF        UKF        AUKF               AUKF
Scaling parameter           Without κ  κ = 3 − n  κ ∈ {0 : 0.5 : 4}  κ ∈ {0 : 1 : 4}
RMSE of robot pose x-axis   4.5184     2.9287     0.9016             2.4058
RMSE of robot pose y-axis   5.7519     4.0145     1.5744             4.1241
RMSE of landmark position   6.1358     5.0914     4.4132             4.8048

Table 14.3 Initial noise values for 2nd test
Process noise mean            $q_0 = [0.3, \ 3\pi/180]^T$
Process noise covariance      $Q_0 = \mathrm{diag}\left(0.3^2, \ (3\pi/180)^2\right)$
Measurement noise mean        $r_0 = [0.1, \ 0.5\pi/180]^T$
Measurement noise covariance  $R_0 = \mathrm{diag}\left(0.1^2, \ (0.5\pi/180)^2\right)$

Table 14.4 RMSE values of test 2 for EKF, UKF and AUKF
Filter                      EKF        UKF        AUKF               AUKF
Scaling parameter           Without κ  κ = 3 − n  κ ∈ {0 : 0.5 : 4}  κ ∈ {0 : 1 : 4}
RMSE of robot pose x-axis   4.8457     2.9287     1.2704             3.6521
RMSE of robot pose y-axis   2.5507     4.0145     2.6265             6.5731
RMSE of landmark position   6.2054     4.9194     4.4371             6.1764

Table 14.5 Initial noise values for 3rd test
Process noise mean            $q_0 = [0.3, \ 3\pi/180]^T$
Process noise covariance      $Q_0 = \mathrm{diag}\left(0.3^2, \ (3\pi/180)^2\right)$
Measurement noise mean        $r_0 = [0.1, \ \pi/180]^T$
Measurement noise covariance  $R_0 = \mathrm{diag}\left(0.1^2, \ (\pi/180)^2\right)$

Table 14.6 RMSE values of test 3 for EKF, UKF and AUKF
Filter                      EKF        UKF        AUKF               AUKF
Scaling parameter           Without κ  κ = 3 − n  κ ∈ {0 : 0.5 : 4}  κ ∈ {0 : 1 : 4}
RMSE of robot pose x-axis   5.9667     2.9287     1.2704             3.6523
RMSE of robot pose y-axis   6.8190     4.0145     2.6265             6.5734
RMSE of landmark position   7.1359     5.0914     4.5137             5.1097
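For readers who want to reproduce the comparison metric, the following minimal sketch computes an RMSE over a trajectory and builds the two candidate sets written as $\{0 : 0.5 : 4\}$ and $\{0 : 1 : 4\}$. The function names and the sample trajectory are hypothetical and are not taken from the simulator.

```python
import math

def rmse(estimates, truths):
    """Root mean square error between two equal-length sequences."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(e - t) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))

def kappa_grid(start, step, stop):
    """Candidate values start : step : stop, endpoints included."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]

x_true = [0.0, 1.0, 2.0, 3.0]     # hypothetical true x positions
x_est = [0.1, 0.9, 2.2, 2.8]      # hypothetical filter estimates
x_rmse = rmse(x_est, x_true)
fine = kappa_grid(0.0, 0.5, 4.0)    # 9 candidates
coarse = kappa_grid(0.0, 1.0, 4.0)  # 5 candidates
```

The finer grid contains every value of the coarser one, so under the same criterion it can only match or improve the selected score at each step.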
Fig. 14.2 RMSE value of robot position for x-axis and y-axis respectively (for AUKF, κ ∈ {0 : 0.5 : 4}) in 1st test noise values
Fig. 14.3 Estimated (black) and true (green) paths throughout with estimated (red) and true (blue) landmarks position using UKF and AUKF (κ ∈ {0 : 0.5 : 4}) respectively, in 2nd test noise values
14.4 Conclusion and Suggestions
In this study, Adaptive UKF with an appropriately chosen scaling parameter was compared with EKF and UKF in a SLAM application under three different initial noise levels. AUKF was the method least affected by noise variation, and it provided more accurate results at all tested noise levels. The quality of the AUKF results improved with proper adjustment of the scaling parameter $\kappa$. Since the mean and covariance matrices of the landmarks observed at the current time instant are used to adjust $\kappa$, the approach is suitable for real-time applications such as SLAM.
References
1. Durrant-Whyte H, Bailey T (2006) Simultaneous localization and mapping: Part I. IEEE Robot Autom Mag 13:99–108
2. Taheri H, Xia ZC (2020) SLAM; definition and evolution. Eng Appl Artif Intell 97:104032
3. Julier SJ, Uhlmann JK (2001) A counter example to the theory of simultaneous localization and map building. Proc IEEE Int Conf Robot Autom 4238–4243
4. Huang GP, Mourikis AI, Roumeliotis SI (2008) Analysis and improvement of the consistency of extended Kalman filter based SLAM. Proc IEEE Int Conf Robot Autom 473–479
5. Andrade-Cetto J, Vidal-Calleja T, Sanfeliu A (2005) Unscented transformation of vehicle states in SLAM. Proc IEEE Int Conf Robot Autom 323–328
6. Wan EA, Van Der Merwe R (2000) The unscented Kalman filter for nonlinear estimation. IEEE 2000 Adapt Syst Signal Process Commun Control Symp 153–158
7. Julier SJ (2003) The spherical simplex unscented transformation. Proc Am Control Conf 2430–2434
8. Bahraini MS (2020) On the efficiency of SLAM using adaptive unscented Kalman filter. Iran J Sci Technol Trans Mech 44:727–735
9. Martinez-Cantin R, Castellanos JA (2005) Unscented SLAM for large-scale outdoor environments. IEEE/RSJ Int Conf Intell Robot Syst IROS 3427–3432
10. Huang GP, Mourikis AI, Roumeliotis SI (2009) On the complexity and consistency of UKF-based SLAM. Proc IEEE Int Conf Robot Autom 4401–4408
11. Maeyama S, Takahashi Y, Watanabe K (2015) A solution to SLAM problems by simultaneous estimation of kinematic parameters including sensor mounting offset with an augmented UKF. Adv Robot 29:1137–1149
12. Cadena C (2016) Past, present, and future of simultaneous localization and mapping: toward the robust-perception age. IEEE Trans Robot 32:1309–1332
13. Sasiadek JZ, Monjazeb A, Necsulescu D (2008) Navigation of an autonomous mobile robot using EKF-SLAM and FastSLAM. Mediterr Conf Control Autom Conf Proc MED'08 517–522
14. Bahraini MS, Bozorg M, Rad AB (2018) SLAM in dynamic environments via ML-RANSAC. Mechatronics 49:105–118
15. Bhaumik S, Date P (2019) Unscented Kalman filter. Nonlinear Estim 57:51–64
16. Straka O, Duník J, Šimandl M, Blasch E (2014) Comparison of adaptive and randomized unscented Kalman filter algorithms. FUSION 2014 17th Int Conf Inf Fusion 1–8
17. Straka O, Duník J, Šimandl M (2014) Unscented Kalman filter with advanced adaptation of scaling parameter. Automatica 50:2657–2664
18. Duník J, Straka O, Šimandl M (2011) The development of a randomised unscented Kalman filter. 18th IFAC World Congress 44:8–13
19. Duník J, Straka O, Šimandl M (2013) Stochastic integration filter. IEEE Trans Automat Contr 58:1561–1566
20. Julier SJ, Uhlmann JK (2004) Corrections to "Unscented filtering and nonlinear estimation". Proc IEEE 92:1958
21. Straka O, Duník J, Šimandl M (2011) Performance evaluation of local state estimation methods in bearings-only tracking problems. Fusion 2011 14th Int Conf Inf Fusion 1–8
22. Bahraini M, Bozorg M, Rad A (2018) A new adaptive UKF algorithm to improve the accuracy of SLAM. Int J Robot 5:1–12
23. Bailey T, Nieto J, Guivant J, Stevens M, Nebot E (2006) Consistency of the EKF-SLAM algorithm. IEEE/RSJ Int Conf Intell Robot Syst 3562–3568
Chapter 15
An Adaptive EKF Algorithm with Adaptation of Noise Statistic Based on MLE, EM and ICE
Serhat Karaçam, Kübra Yalçin, and Tuğba Selcen Navruz
S. Karaçam (corresponding author) · T. S. Navruz
Department of Electrical and Electronics Engineering, Faculty of Engineering, Gazi University, Ankara, Turkey
K. Yalçin
Test Systems Design Department, MGEO, ASELSAN Inc., Ankara, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_15

15.1 Introduction
Researchers have been interested in the autonomous movement of mobile robots for many years. Autonomous movement requires a map of the environment, which is generally not given to the robot in advance, so the robot must be able to build its own map while it moves. Determining the map and the location of the robot simultaneously is called Simultaneous Localization and Mapping (SLAM) [1, 2]. While SLAM offers many advantages for autonomous movement, it also raises problems, the most common being accuracy, solution complexity, and continuous uncertainties [3]. Probability-based solutions have been proposed to address these problems [4]; those based on the Kalman Filter (KF) and the Particle Filter (PF) are widely used. Among the KF [5] solutions, the most commonly used is the Extended Kalman Filter (EKF) [4]. The EKF is easy to apply to the SLAM problem and gives acceptable results when the system model is known. Its biggest weakness is the way it linearizes the system: the EKF approximately linearizes the nonlinear system with a first-order Taylor expansion that uses Jacobian matrices, and these matrices increase the computational cost of the SLAM algorithm [6]. The Unscented Kalman Filter (UKF), which uses a different linearization method, was proposed to reduce the computational cost [7]. Although there is almost no difference between the results of the two methods, the computational cost of UKF is lower than that of EKF [8]. The common problem of both methods is that, although they can be applied effectively in simulation, they are difficult to adapt to real life. Both require known noise statistics and a consistent, known system model, whereas in real life the noise statistics are constantly changing and growing. If UKF and EKF are designed under the assumption of known noise statistics, the resulting system will not yield consistent results in real-life applications [9], and the filter may even diverge [10]. The noise statistics must therefore be updated recursively.

Several EKF approaches, called Adaptive EKF (AEKF), have been proposed for this purpose. Huang et al. [11] calculated the estimation error covariance and measurement noise statistics at each step using Maximum Likelihood Estimation (MLE) and Expectation Maximization (EM). Duan et al. [9] suggested adjusting the measurement and noise covariance matrices using the latest measurements. Kim et al. [10] proposed a two-stage improved AEKF in which an adaptive damping EKF is applied to calculate the adaptive terms. Sarkka et al. [12] discussed the estimation of time-varying noise statistics at each step using the variational Bayesian method. The shortcoming of these methods is that they do not estimate all of the noise statistics: some studies ignore the measurement noise covariance when estimating process noise, and others ignore the process noise covariance. Gao et al. [13] proposed an adaptive Kalman Filter that calculates the noise statistics at each step for SINS/DVL systems, and Suwoyo et al. [14] used the method of Gao et al. in an AEKF SLAM application. However, the estimated process and measurement noise covariance matrices calculated in that study are likely to exhibit undesirable characteristics, such as becoming negative definite. To solve this problem, the Innovation Covariance Estimation (ICE) method, proposed by He et al. [15] to minimize the innovation covariance matrix, was applied to AEKF by Tian et al. [16]. In this study, the method proposed by Tian et al. is compared with EKF and UKF in different noise scenarios. In addition, the effect of data association, an important parameter for SLAM applications [17], is examined in these noise scenarios.
15.2 Methods
15.2.1 Extended Kalman Filter (EKF)
A classic nonlinear EKF SLAM system [4, 14] can be written as:

$$x_k = f(x_{k-1}, u_k) + w_{k-1} \tag{15.1}$$

$$z_k = h(x_k) + v_k \tag{15.2}$$
where $k$ is the time, $x$ is the state vector, $u$ is the command vector, and $z$ is the measurement vector. The function $f(\cdot)$ is the nonlinear process function and $h(\cdot)$ is the nonlinear measurement function. $w_k$ and $v_k$ represent the process and measurement noises, respectively. The noise terms are given as follows [14]:

$$E[w_k] = q_k, \quad \mathrm{Cov}[w_k, w_j] = Q_k \delta_{kj} \tag{15.3}$$

$$E[v_k] = r_k, \quad \mathrm{Cov}[v_k, v_j] = R_k \delta_{kj} \tag{15.4}$$
where $\delta_{kj}$ is the Kronecker delta function, $E[\cdot]$ stands for the mean and $\mathrm{Cov}[\cdot]$ for the covariance. From Eqs. (15.3) and (15.4), the noise statistics are the process noise mean $q_k$, the process noise covariance $Q_k$, the measurement noise mean $r_k$ and the measurement noise covariance $R_k$. The EKF equations are then obtained as below [14]:

$$\hat{x}_{k|k-1} = f\left(\hat{x}_{k-1|k-1}, u_k\right) + q_{k-1} \tag{15.5}$$

$$P_{k|k-1} = F_{k-1} P_{k-1|k-1} F_{k-1}^T + Q_{k-1} \tag{15.6}$$

$$\hat{z}_{k|k-1} = h\left(\hat{x}_{k|k-1}\right) + r_k \tag{15.7}$$

$$e_{z,k|k-1} = z_k - \hat{z}_{k|k-1} \tag{15.8}$$

$$S_k = H P_{k|k-1} H^T + R_k \tag{15.9}$$

$$K_k = P_{k|k-1} H^T S_k^{-1} \tag{15.10}$$

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k e_{z,k|k-1} \tag{15.11}$$

$$P_{k|k} = (I - K_k H) P_{k|k-1} \tag{15.12}$$

Here $F$ is the Jacobian matrix of the function $f(\cdot)$, and $H$ is the Jacobian matrix of the function $h(\cdot)$.
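The EKF cycle of Eqs. (15.5) to (15.12) can be traced on a scalar toy problem. The sketch below assumes a hypothetical model $f(x, u) = x + u$ and $h(x) = x^2$, chosen only so that the Jacobians are trivial to write down ($F = 1$, $H = 2x$); it is not the vehicle model used in the chapter.

```python
def ekf_step(x_prev, p_prev, u, z, q_mean, q_var, r_mean, r_var):
    """One EKF cycle (Eqs. 15.5-15.12) for a hypothetical scalar model.

    Assumed model: f(x, u) = x + u and h(x) = x**2, so F = 1 and H = 2 * x.
    """
    # Prediction, Eqs. 15.5-15.6
    x_pred = x_prev + u + q_mean
    f_jac = 1.0
    p_pred = f_jac * p_prev * f_jac + q_var
    # Innovation and its covariance, Eqs. 15.7-15.9
    h_jac = 2.0 * x_pred
    z_pred = x_pred ** 2 + r_mean
    innovation = z - z_pred
    s = h_jac * p_pred * h_jac + r_var
    # Gain and update, Eqs. 15.10-15.12
    gain = p_pred * h_jac / s
    x_new = x_pred + gain * innovation
    p_new = (1.0 - gain * h_jac) * p_pred
    return x_new, p_new, innovation

x_new, p_new, innovation = ekf_step(
    x_prev=1.0, p_prev=0.5, u=1.0, z=4.2,
    q_mean=0.0, q_var=0.1, r_mean=0.0, r_var=0.2,
)
```

With these numbers the predicted variance is 0.6 and the update shrinks it considerably, because the measurement is much more informative than the prior.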
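15.2.2 Unscented Kalman Filter (UKF)
Unlike EKF, UKF handles the nonlinear system with the unscented transform (UT), which uses sigma points [18]. The number of sigma points is $2L + 1$, where $L$ is the dimension of the state vector $x$. The sigma points and their weights are:

$$\chi_{i,k-1|k-1} = \hat{x}_{k-1|k-1}, \quad i = 0 \tag{15.13}$$

$$\chi_{i,k-1|k-1} = \hat{x}_{k-1|k-1} + \left(\sqrt{(L + \lambda) P_{k-1|k-1}}\right)_i, \quad i = 1, \ldots, L \tag{15.14}$$

$$\chi_{i,k-1|k-1} = \hat{x}_{k-1|k-1} - \left(\sqrt{(L + \lambda) P_{k-1|k-1}}\right)_{i-L}, \quad i = L + 1, \ldots, 2L \tag{15.15}$$

$$W_0^m = \frac{\lambda}{L + \lambda} \tag{15.16}$$

$$W_0^c = \frac{\lambda}{L + \lambda} + \left(1 - \alpha^2 + \beta\right) \tag{15.17}$$

$$W_i^m = W_i^c = \frac{1}{2(L + \lambda)}, \quad i = 1, \ldots, 2L \tag{15.18}$$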
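As a small numerical check of the weights, the sketch below builds the mean and covariance weight vectors for given $L$, $\lambda$, $\alpha$ and $\beta$. It assumes the central covariance weight carries the correction $1 - \alpha^2 + \beta$ of the standard scaled unscented transform [6]; the function name and the parameter values are hypothetical.

```python
def ukf_weights(dim, lam, alpha, beta):
    """Mean and covariance weights for 2*dim + 1 sigma points (Eqs. 15.16-15.18)."""
    w_mean = [lam / (dim + lam)] + [1.0 / (2.0 * (dim + lam))] * (2 * dim)
    w_cov = list(w_mean)
    # Assumed standard correction term for the central covariance weight.
    w_cov[0] += 1.0 - alpha ** 2 + beta
    return w_mean, w_cov

w_mean, w_cov = ukf_weights(dim=3, lam=1.0, alpha=1.0, beta=2.0)
```

The mean weights always sum to one, whereas the covariance weights generally do not, because of the extra term on the central point.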
Here $W^m$ denotes the mean weights, $W^c$ the covariance weights, $\lambda$ is the scaling parameter and $\chi_i$ are the sigma points. These sigma points are transformed by the nonlinear function:

$$\gamma_{i,k|k-1} = f\left(\chi_{i,k-1|k-1}\right), \quad i = 0, \ldots, 2L \tag{15.19}$$

With the weighted sum of these transformed sigma points, the estimated state vector $\hat{x}_{k|k-1}$ and its covariance matrix $P_{k|k-1}$ can be calculated:

$$\hat{x}_{k|k-1} = \sum_{i=0}^{2L} W_i^m \gamma_{i,k|k-1} + q_{k-1} \tag{15.20}$$

$$P_{k|k-1} = \sum_{i=0}^{2L} W_i^c \left(\gamma_{i,k|k-1} - \hat{x}_{k|k-1}\right)\left(\gamma_{i,k|k-1} - \hat{x}_{k|k-1}\right)^T \tag{15.21}$$
Sigma points are also required for the measurement update. The sigma points from the previous step can be reused, or they can be recalculated:

$$\chi_{i,k|k-1} = \hat{x}_{k|k-1}, \quad i = 0 \tag{15.22}$$

$$\chi_{i,k|k-1} = \hat{x}_{k|k-1} + \left(\sqrt{(L + \lambda) P_{k|k-1}}\right)_i, \quad i = 1, \ldots, L \tag{15.23}$$

$$\chi_{i,k|k-1} = \hat{x}_{k|k-1} - \left(\sqrt{(L + \lambda) P_{k|k-1}}\right)_{i-L}, \quad i = L + 1, \ldots, 2L \tag{15.24}$$

These sigma points are transformed by the nonlinear measurement function:

$$\xi_{i,k|k-1} = h\left(\chi_{i,k|k-1}\right), \quad i = 0, \ldots, 2L \tag{15.25}$$
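With the weighted sum of these projected sigma points, the estimated measurement vector $\hat{z}_{k|k-1}$, the measurement covariance matrix $P^{zz}_{k|k-1}$, and the cross-covariance matrix between the state and measurement vectors $P^{xz}_{k|k-1}$ can be calculated as follows.

$$\hat{z}_{k|k-1} = \sum_{i=0}^{2L} W_i^m \xi_{i,k|k-1} + r_k \tag{15.26}$$

$$P^{zz}_{k|k-1} = \sum_{i=0}^{2L} W_i^c \left(\xi_{i,k|k-1} - \hat{z}_{k|k-1}\right)\left(\xi_{i,k|k-1} - \hat{z}_{k|k-1}\right)^T \tag{15.27}$$

$$P^{xz}_{k|k-1} = \sum_{i=0}^{2L} W_i^c \left(\gamma_{i,k|k-1} - \hat{x}_{k|k-1}\right)\left(\xi_{i,k|k-1} - \hat{z}_{k|k-1}\right)^T \tag{15.28}$$

Finally, the Kalman gain $K$, the state vector and the covariance matrix are obtained as below.

$$K = P^{xz}_{k|k-1} \left(P^{zz}_{k|k-1}\right)^{-1} \tag{15.29}$$

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K\left(z_k - \hat{z}_{k|k-1}\right) \tag{15.30}$$

$$P_{k|k} = P_{k|k-1} - K P^{zz}_{k|k-1} K^T \tag{15.31}$$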
15.2.3 Adaptive Extended Kalman Filter (AEKF)
The unknown noise statistics of the EKF are estimated by applying MLE and EM [16]. MLE [19] searches for the parameter values under which the observed measurements are most probable, and the equations that result from MLE are solved by EM. The logarithm of the likelihood, $J$, is formed, and the partial derivative of $J$ with respect to each noise term is set to 0 (see [16] for more details). Suppose $\hat{\theta} = (q, r, Q, R)$ denotes the noise statistics of the nonlinear system. The MLE of $\hat{\theta}$ is calculated as:

$$\hat{\theta}_{ML} = \arg\max_{\theta} \left\{\ln\left[L(q, r, Q, R \mid Z_k, X_k)\right]\right\} \tag{15.32}$$

where $L[\cdot]$ denotes the likelihood function. Setting the partial derivatives of $J = E\left\{\ln\left[L(q, r, Q, R \mid Z_k, X_k)\right]\right\}$ to 0 gives the equations for the noise statistics that maximize $J$:

$$\frac{dJ}{d(q, r, Q, R)} = 0 \tag{15.33}$$

$$\hat{q}_k = \frac{1}{k}\sum_{i=1}^{k}\left[x_{i|i} - f\left(x_{i-1|i}\right)\right], \quad \hat{r}_k = \frac{1}{k}\sum_{i=1}^{k}\left[z_i - h\left(x_{i|i}\right)\right] \tag{15.34}$$

$$\hat{Q}_k = \frac{1}{k}\sum_{i=1}^{k}\left[x_{i|i} - f\left(x_{i-1|i}\right) - q\right]\left[x_{i|i} - f\left(x_{i-1|i}\right) - q\right]^T \tag{15.35}$$

$$\hat{R}_k = \frac{1}{k}\sum_{i=1}^{k}\left[z_i - h\left(x_{i|i}\right) - r\right]\left[z_i - h\left(x_{i|i}\right) - r\right]^T \tag{15.36}$$
The term $x_{i-1|i}$ in Eqs. (15.34) and (15.35) cannot be obtained with the classical EKF, so a fixed-point smoothing algorithm is used to find it. The goal is to obtain a prior state estimate for $x_j$ at time $j + 1$ [13]. When these equations are simplified [16], the expressions for $\hat{q}_k$, $\hat{r}_k$, $\hat{Q}_k$ and $\hat{R}_k$ become:

$$\hat{q}_k = \frac{1}{k}\sum_{i=1}^{k}\left[K_i e_{z,i} + q_{i-1}\right], \quad \hat{r}_k = \frac{1}{k}\sum_{i=1}^{k}\left[(I - H K_i) e_{z,i} + r_i\right] \tag{15.37}$$

$$\hat{Q}_k = \frac{1}{k}\sum_{i=1}^{k}\left[K_i e_{z,i} e_{z,i}^T K_i^T + F_{i-1} P_{i-1|i-1} F_{i-1}^T - P_{i|i}\right] \tag{15.38}$$

$$\hat{R}_k = \frac{1}{k}\sum_{i=1}^{k}\left[(I - H K_i) e_{z,i} e_{z,i}^T (I - H K_i)^T + H P_{i|i-1} H^T\right] \tag{15.39}$$
In Eqs. (15.32)–(15.39), the noise statistics were treated as time-invariant, although in practice they vary with time. For such cases, the weighted damping memory method proposed by Gao et al. [20], in which the 1/k term is replaced by a damping term $d_k$, aims to increase the weight of recent variations. When the necessary derivations are made in Eqs. (15.37)–(15.39), the estimation equations of the time-varying noise statistics with damping term $d_k$ are obtained [13]:
$$\hat{q}_k = \hat{q}_{k-1} + d_k K_k e_{z,k}, \quad \hat{r}_k = \hat{r}_{k-1} + d_k (I - H K_k) e_{z,k} \tag{15.40}$$
$$\hat{Q}_k = (1 - d_k)\hat{Q}_{k-1} + d_k\left[K_k e_{z,k} e_{z,k}^T K_k^T + F_{k-1} P_{k-1|k-1} F_{k-1}^T - P_{k|k}\right] \tag{15.41}$$
$$\hat{R}_k = (1 - d_k)\hat{R}_{k-1} + d_k\left[(I - H K_k) e_{z,k} e_{z,k}^T (I - H K_k)^T + H P_{k|k-1} H^T\right] \tag{15.42}$$
$$d_k = \frac{1 - b}{1 - b^k} \tag{15.43}$$
where b is the damping factor, with 0 < b < 1.
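As a rough illustration of Eqs. (15.40)–(15.43), the following Python sketch evaluates the damping term and applies one recursive update in the scalar case, where every matrix reduces to a number. The function names and all numeric inputs are hypothetical and were chosen only to keep the arithmetic easy to follow; a real filter would use matrix operations and the quantities produced by its own prediction and update steps.

```python
def damping_term(b: float, k: int) -> float:
    """Damping term d_k = (1 - b) / (1 - b**k) of Eq. (15.43)."""
    if not 0.0 < b < 1.0:
        raise ValueError("damping factor b must satisfy 0 < b < 1")
    if k < 1:
        raise ValueError("time step k must be at least 1")
    return (1.0 - b) / (1.0 - b ** k)


def update_noise_statistics(q_prev, r_prev, Q_prev, R_prev, d, K, H, F,
                            e, P_prev_post, P_prior, P_post):
    """One scalar step of Eqs. (15.40)-(15.42).

    e is the innovation e_{z,k}; P_prev_post, P_prior and P_post stand for
    P_{k-1|k-1}, P_{k|k-1} and P_{k|k}. In the scalar case the identity
    matrix is 1 and transposes disappear.
    """
    q = q_prev + d * K * e
    r = r_prev + d * (1.0 - H * K) * e
    Q = (1.0 - d) * Q_prev + d * (K * e * e * K + F * P_prev_post * F - P_post)
    R = (1.0 - d) * R_prev + d * ((1.0 - H * K) * e * e * (1.0 - H * K)
                                  + H * P_prior * H)
    return q, r, Q, R


# Hypothetical numbers, for illustration only.
d1 = damping_term(0.5, 1)  # at k = 1 the newest sample gets full weight
d2 = damping_term(0.5, 2)
q, r, Q, R = update_noise_statistics(
    q_prev=0.0, r_prev=0.0, Q_prev=1.0, R_prev=1.0,
    d=0.5, K=0.5, H=1.0, F=1.0, e=2.0,
    P_prev_post=1.0, P_prior=2.0, P_post=1.0)
```

Note that as k grows, b**k vanishes and d_k settles at 1 - b, so a smaller b gives more weight to the most recent innovation.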
In the previous operations, the Q and R matrices were assumed to be positive definite. However, Q and R may become negative definite while AEKF is running, and the filter may then diverge. To avoid this, an approach called Innovation Covariance Estimation (ICE) is proposed for the innovation covariance term $e_{z,k} e_{z,k}^T$ [16]. It is formulated as:
$$ICE_k = \frac{1}{N}\sum_{j=0}^{N} e_{z,k-j} e_{z,k-j}^T = ICE_{k-1} + \frac{1}{N}\left(e_{z,k} e_{z,k}^T - e_{z,k-N} e_{z,k-N}^T\right) \tag{15.44}$$
The ICE value calculated with Eq. (15.44) can be substituted for the $e_{z,k} e_{z,k}^T$ terms in Eqs. (15.41) and (15.42).
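The recursive form of Eq. (15.44) can be sketched in Python for a scalar innovation, where the outer product reduces to a square. This is only a minimal sketch: the class name is hypothetical, and treating innovations before start-up as zero is an assumption made here, not something stated in the text.

```python
from collections import deque


class ScalarICE:
    """Recursive innovation covariance estimate for a scalar innovation."""

    def __init__(self, window: int):
        if window < 1:
            raise ValueError("window length N must be at least 1")
        self.window = window
        self.value = 0.0
        # Squared innovations inside the window; zeros before start-up
        # (an assumption of this sketch).
        self._squares = deque([0.0] * window, maxlen=window)

    def update(self, innovation: float) -> float:
        oldest = self._squares[0]            # e_{z,k-N}^2, about to leave
        newest = innovation * innovation     # e_{z,k}^2, about to enter
        self.value += (newest - oldest) / self.window
        self._squares.append(newest)         # maxlen drops the oldest entry
        return self.value


# Hypothetical innovations with a window of N = 2.
ice = ScalarICE(window=2)
history = [ice.update(e) for e in (1.0, 2.0, 3.0)]
```

Because the estimate is an average of squares (or of outer products in the matrix case), it cannot become negative, which is the property the text relies on to keep the filter from diverging.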
15.2.4 Data Association
Data association establishes a relationship between previously seen landmarks and new observations. The most commonly used method for data association is the Mahalanobis distance:
$$e_z^T S^{-1} e_z < d_{min} \tag{15.45}$$
Here $e_z$ is the difference between the predicted observation position of the landmark and the true position of the landmark, and S is the covariance of $e_z$. $d_{min}$ is the minimum distance required to accept a data association (see [17] for more details).
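The gating test of Eq. (15.45) can be illustrated with a short Python sketch for a two-dimensional innovation such as range and bearing. The function names, the hand-written 2 × 2 inverse and the numbers are assumptions for illustration only and are not taken from the simulator used in the study.

```python
def mahalanobis_squared(e, S):
    """Return e^T S^{-1} e for a 2-vector e and a 2x2 covariance S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("covariance matrix S is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    t0 = inv[0][0] * e[0] + inv[0][1] * e[1]
    t1 = inv[1][0] * e[0] + inv[1][1] * e[1]
    return e[0] * t0 + e[1] * t1


def is_associated(e, S, d_min):
    """Accept the observation-landmark pair when Eq. (15.45) holds."""
    return mahalanobis_squared(e, S) < d_min


# Hypothetical innovation and covariance.
distance = mahalanobis_squared((2.0, 1.0), ((4.0, 0.0), (0.0, 1.0)))
accepted = is_associated((2.0, 1.0), ((4.0, 0.0), (0.0, 1.0)), d_min=4.0)
```

Scaling by the covariance S means that a large innovation along a direction of high uncertainty counts for less than the same innovation along a well-known direction.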
15.2.5 AEKF-SLAM Algorithm
The flowchart of the AEKF method described in the previous sections is given in Fig. 15.1.
Fig. 15.1 Flowchart of AEKF algorithm
15.3 Simulation Results and Discussion
The simulation studies used an open source SLAM simulator prepared by Bailey et al. [21] for SLAM research. The simulator assumes a known set of waypoints and generates control signals to move the robot between them. The simulations analyzed the effects of data association and prior noise levels on AEKF, AEKF-ICE, EKF and UKF. Two main tests were defined for this purpose. In the first test the robot was assumed to know the data association before starting its motion; in the second the data association was not given to the robot. In both tests, the prior values of the process noise and measurement noise statistics were varied. The noise statistics matrices used in all tests are:
$$q_0 = \begin{bmatrix} q_V \\ q_G \end{bmatrix}, \quad Q_0 = \begin{bmatrix} q_V^2 & 0 \\ 0 & q_G^2 \end{bmatrix}, \quad r_0 = \begin{bmatrix} r_R \\ r_B \end{bmatrix}, \quad R_0 = \begin{bmatrix} r_R^2 & 0 \\ 0 & r_B^2 \end{bmatrix} \tag{15.46}$$
The prior noise statistics values used in each test are given in Table 15.1.

Test-1: Data Association is Given to Robot Before Motion
In this test, the root mean square error (rmse) of the estimated x and y coordinates of the robot's position and of the estimated landmark locations was measured under the noise conditions given in Table 15.1. The calculated rmse values are given in Table 15.2. Table 15.2 shows that both AEKF and AEKF-ICE determined the robot position more accurately than EKF and UKF. For landmark position, EKF, UKF and AEKF had approximately similar rmse values, while AEKF-ICE made predictions with less error than the other algorithms. In addition, AEKF and AEKF-ICE are less sensitive to variations in the noise statistics than EKF and UKF. For the Noise-5 case, the rmse values at each timestep in the x and y coordinates can be seen in Figs. 15.2 and 15.3, respectively.
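For reference, the rmse measure used in these tests can be computed as in the minimal Python sketch below. The function name and the sample values are hypothetical; the study's own values come from the simulator runs reported in the tables.

```python
import math


def rmse(estimates, truths):
    """Root mean square error between estimated and true values."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(est - true) ** 2 for est, true in zip(estimates, truths)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical x coordinates (m): estimated versus true robot positions.
error_x = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

The same function applies to the y coordinate and to landmark positions by passing the corresponding estimated and true values.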
Table 15.1 Prior noise statistics values used in each test

| Case | q_V (m/s) | q_G (rad) | r_R (m) | r_B (rad) |
|---|---|---|---|---|
| Noise-1 | 0.3 | π/180 | 0.1 | π/180 |
| Noise-2 | 0.3 | π/180 | 0.25 | π/180 |
| Noise-3 | 0.3 | π/180 | 0.1 | 0.5π/180 |
| Noise-4 | 0.3 | 0.75π/180 | 0.1 | π/180 |
| Noise-5 | 0.25 | 0.75π/180 | 0.1 | π/180 |
Table 15.2 rmse values for EKF, UKF, AEKF and AEKF-ICE in Test-1

| Case | EKF rmse-x | EKF rmse-y | EKF rmse-lm | UKF rmse-x | UKF rmse-y | UKF rmse-lm | AEKF rmse-x | AEKF rmse-y | AEKF rmse-lm | AEKF-ICE rmse-x | AEKF-ICE rmse-y | AEKF-ICE rmse-lm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Noise-1 | 1.6277 | 2.0144 | 4.8075 | 1.5598 | 1.9544 | 4.7962 | 0.4606 | 0.6034 | 4.4712 | 0.4669 | 0.6091 | 2.0267 |
| Noise-2 | 2.1169 | 2.5132 | 4.8512 | 1.9644 | 2.4141 | 4.8409 | 0.4685 | 0.6109 | 4.4596 | 0.4679 | 0.6101 | 2.0249 |
| Noise-3 | 1.4930 | 1.3717 | 4.7879 | 1.4534 | 1.2460 | 4.7839 | 0.4668 | 0.6087 | 4.4630 | 0.4662 | 0.6078 | 2.0233 |
| Noise-4 | 1.1893 | 1.6617 | 4.7818 | 1.1376 | 1.5501 | 4.7777 | 0.2677 | 0.4031 | 4.9280 | 0.2683 | 0.4030 | 2.1515 |
| Noise-5 | 1.2525 | 1.6717 | 4.7845 | 1.2083 | 1.5821 | 4.7808 | 0.2731 | 0.3794 | 4.4950 | 0.2733 | 0.3791 | 2.0320 |
Fig. 15.2 rmse values(m) of robot’s position estimation in x coordinate for Noise-5 state in test-1
Fig. 15.3 rmse values(m) of robot’s position estimation in y coordinate for Noise-5 state in test-1
Test-2: Data Association is not Given to Robot Before Motion
The operations performed in Test-1 were repeated in Test-2, and the resulting rmse values are shown in Table 15.3. Because data association was not given to the robot before motion, higher error values were observed in Test-2 (Table 15.3) than in Test-1 (Table 15.2) for all algorithms. As seen from Table 15.3, AEKF performed better than EKF and UKF in four noise cases, especially in position estimation. When ICE was added to AEKF, the landmark estimation error decreased significantly, but the position estimation error did not improve. In addition, AEKF was less affected than EKF and UKF by changes in the process and measurement noise statistics, and a decrease in the estimation error was even observed.
1.8198
Noise-5
2.6431
2.6393
2.3425
2.2956
1.7813
Noise-3
Noise-4
Noise-2
3.5945
3.3949
2.6178
2.6106
Noise-1
4.8204
4.8183
4.8326
4.9323
4.9534
1.7937
1.7026
2.2652
2.6101
2.4571
UKF rmse—x
rmse—lm
rmse—x
rmse—y
EKF
2.6637
2.4082
2.3109
3.4825
3.1784
rmse—y
Table 15.3 rmse values for EKF, UKF, AEKF and AEKF-ICE in Test-2
4.8210
4.8007
4.8291
4.9379
4.8989
rmse—lm
0.5495
2.9510
1.7273
1.4045
2.0789
rmse—x
AEKF
0.7461
3.3273
1.4342
1.9506
1.9674
rmse—y
4.4449
4.9873
5.0185
4.8176
4.8907
rmse—lm
2.0375
1.6739
2.9668
2.9661
3.1541
rmse—x
AEKF-ICE
2.3249
1.8198
3.5428
3.8646
1.9697
rmse—y
2.1439
2.1209
2.1495
2.1473
2.1132
rmse—lm
Fig. 15.4 rmse values(m) of robot’s position estimation in x coordinate for Noise-5 state in test-2
Fig. 15.5 rmse values(m) of robot’s position estimation in y coordinate for Noise-5 state in test-2
In case of Noise-5, rmse values, at each timestep, in x and y coordinates can be seen in Figs. 15.4 and 15.5, respectively.
15.4 Conclusions and Future Work
In this study, AEKF and AEKF-ICE were compared with EKF and UKF, with data association both known and unknown, in five different noise statistics cases. These comparisons showed that the position and landmark estimation accuracy of AEKF was higher than that of EKF and UKF. AEKF gave much better results especially when the data association was given to the robot before its motion. On that basis, the choice of sensor type is important in SLAM applications where AEKF will be used. In landmark position estimation, the accuracy of AEKF is approximately the same as that of EKF and UKF, while the accuracy of AEKF-ICE was much better. Further studies can be carried out to improve landmark position estimation in AEKF and robot position estimation in AEKF-ICE, and to test these filters in SLAM applications using different sensors.
References
1. Durrant-Whyte H, Bailey T (2006) Simultaneous localization and mapping: Part I. IEEE Robot Autom Mag 13(2):99–108. https://doi.org/10.1109/MRA.2006.1638022
2. Bailey T, Durrant-Whyte H (2006) Simultaneous localization and mapping (SLAM): Part II. IEEE Robot Autom Mag 13(3):108–117. https://doi.org/10.1109/MRA.2006.1678144
3. Taheri H, Xia ZC (2021) SLAM; definition and evolution. Eng Appl Artif Intell 97:104032. https://doi.org/10.1016/j.engappai.2020.104032
4. Thrun S (2002) Probabilistic robotics. Commun ACM 45(3):52–57. https://doi.org/10.1145/504729.504754
5. Kalman RE, Bucy RS (1961) New results in linear filtering and prediction theory. J Fluids Eng Trans ASME 83(1):95–108. https://doi.org/10.1115/1.3658902
6. Kim C, Sakthivel R, Chung WK (2008) Unscented FastSLAM: a robust and efficient solution to the SLAM problem. IEEE Trans Robot 24(4):808–820. https://doi.org/10.1109/TRO.2008.924946
7. Bhaumik S, Date P (2019) Unscented Kalman filter. Nonlinear Estim 57(9):51–64. https://doi.org/10.1201/9781351012355-3
8. Kurt-Yavuz Z, Yavuz S (2012) A comparison of EKF, UKF, FastSLAM2.0, and UKF-based FastSLAM algorithms. In: Proceedings of 2012 IEEE 16th international conference on intelligent engineering systems (INES), pp 37–43. https://doi.org/10.1109/INES.2012.6249866
9. Duan H et al (2017) The application of AUV navigation based on cubature Kalman filter. In: 2017 IEEE OES international symposium of underwater technology (UT), pp 8–11. https://doi.org/10.1109/UT.2017.7890310
10. Kim KH, Lee JG, Park CG (2009) Adaptive two-stage extended Kalman filter for a fault-tolerant INS-GPS loosely coupled system. IEEE Trans Aerosp Electron Syst 45(1):125–137. https://doi.org/10.1109/TAES.2009.4805268
11. Huang Y, Zhang Y, Xu B, Wu Z, Chambers JA (2018) A new adaptive extended Kalman filter for cooperative localization. IEEE Trans Aerosp Electron Syst 54(1):353–368. https://doi.org/10.1109/TAES.2017.2756763
12. Sarkka S, Nummenmaa A (2009) Recursive noise adaptive Kalman filtering by variational Bayesian approximations. IEEE Trans Autom Contr 54(3):596–600. https://doi.org/10.1109/TAC.2008.2008348
13. Gao W, Li J, Zhou G, Li Q (2015) Adaptive Kalman filtering with recursive noise estimator for integrated SINS/DVL systems. J Navig 68(1):142–161. https://doi.org/10.1017/S0373463314000484
14. Suwoyo H, Tian Y, Wang W, Hossain MM, Li L (2019) A MAPAEKF-SLAM algorithm with recursive mean and covariance of process and measurement noise statistic. Sinergi 24(1):37. https://doi.org/10.22441/sinergi.2020.1.006
15. He J, Chen Y, Zhang Z, Yin W, Chen D (2018) A hybrid adaptive unscented Kalman filter algorithm. Int J Eng Model 31(3)(Regular Issue):51–65. https://doi.org/10.31534/ENGMOD.2018.3.RI.04D
16. Tian Y, Suwoyo H, Wang W, Mbemba D, Li L (2020) An AEKF-SLAM algorithm with recursive noise statistic based on MLE and EM. J Intell Robot Syst Theor Appl 97(2):339–355. https://doi.org/10.1007/s10846-019-01044-8
17. Bailey T (2002) Mobile robot localisation and mapping in extensive outdoor environments. Philosophy 31:212. https://doi.org/10.1016/S0921-8890(99)00078-0
18. Wan EA, Van Der Merwe R (2000) The unscented Kalman filter for nonlinear estimation. In: IEEE 2000 adaptive systems for signal processing, communications, and control symposium (AS-SPCC), pp 153–158. https://doi.org/10.1109/ASSPCC.2000.882463
19. Myung IJ (2003) Tutorial on maximum likelihood estimation, vol 47, pp 90–100. https://doi.org/10.1016/S0022-2496(02)00028-7
20. Gao X, You D, Katayama S (2012) Seam tracking monitoring based on adaptive Kalman filter embedded Elman neural network during high-power fiber laser welding. IEEE Trans Ind Electron 59(11):4315–4325
21. Bailey T, Nieto J, Guivant J, Stevens M, Nebot E (2006) Consistency of the EKF-SLAM algorithm. In: IEEE international conference on intelligent robots and systems (1), pp 3562–3568. https://doi.org/10.1109/IROS.2006.281644
Chapter 16
Artificial Intelligence Based Detection of Estrus in Animals Using Pedometer Data
Ali Hakan Işık, Seyit Hasoğlu, Ömer Can Eskicioğlu, and Edin Dolicanin

16.1 Introduction
We live in an age in which technology can be applied to every line of business and tailored solutions can be produced. People are constantly intertwined and interacting with technology. Sensor, artificial intelligence, big data and wireless communication technologies have an important place in our lives, and these opportunities make unique solutions for farms possible. Today, farms aim to achieve maximum efficiency by using technological opportunities. The technological needs of farms change in direct proportion to their size, and a wide range of products helps farmers on farms both large and small. Farmers want to use technology to increase the quantity and quality of the product obtained from their existing animals. For this reason, the condition and health of the animals are extremely important, and parameters such as animal welfare and health values should be monitored constantly. Sensor-based products such as collars and pedometers, which can provide up-to-date information, are of great importance for the continuous monitoring of these parameters.

A. H. Işık (B) · Ö. C. Eskicioğlu
Department of Computer Engineering, Burdur Mehmet Akif Ersoy University, Burdur, Türkiye
e-mail: [email protected]
Ö. C. Eskicioğlu
e-mail: [email protected]
S. Hasoğlu
Suffatech Arge Yazılım ve Bilişim Teknolojileri ve Ticaret Limited Şirketi, Kayseri, Türkiye
e-mail: [email protected]
E. Dolicanin
Faculty of Technical Sciences, Department of Technical Sciences, State University of Novi Pazar, Novi Pazar, Serbia
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_16
It is necessary to know the status of the animals on a farm. With the help of sensors, data about many conditions of the animals can be obtained and predictions can be made. Knowing the status of an animal provides information about the periods it goes through. In certain periods, females are ready to reproduce; this period is called estrus. Detecting the estrus period in a timely manner is extremely important for increasing productivity on farms and achieving pregnancy in animals. Estrus detection requires constant observation. In this study, machine learning algorithms are used to classify the estrus period from data obtained from sensors.
16.2 Related Works
A review of the literature shows a number of similar studies. Barriuso et al. developed an application that allows remote monitoring of animals on farms. The aim of that study was to increase yield by reducing costs in animal husbandry. An embedded agent model with autonomous sensors is used, and the proposed model examines the temperature, physical activity, moment of birth and estrus state of the animals [1].

Higaki et al. aimed to determine estrus in cattle from the ventral tail surface temperature using random forest, support vector machines and artificial neural networks. Temperature data were obtained over 51 estrus periods from 11 female cattle. The precision of the three machine learning models was 60.6% for the random forest algorithm, 65.4% for the support vector machine and 73.1% for the artificial neural network. The low performance of the models can be attributed to the difficulty of separating the temperature changes of estrus from those caused by the environment [2].

In the thesis study conducted by Yıldız K., estrus was estimated in animals using an artificial neural network (ANN) with data from a pedometer device. Data from 186 estrus periods of 78 cattle were recorded over 4 months together with seasonal data. The most successful model was a 2-layer ANN with a ROC score of 0.9733, with 37 neurons in the first layer and 40 neurons in the second layer. Estrus was thus predicted with high performance [3].

In the study by Higaki et al. [4], values such as vaginal temperature and conductivity were measured in cattle. Decision trees, support vector machines and artificial neural networks were applied to these values, measured over certain time periods in 17 animals. The highest accuracy rates obtained were 86, 84 and 94%, respectively [4].

In the study conducted by Çetin A. F., a literature review of data mining methods used in animal husbandry was prepared. Methods such as k-nearest neighbor, k-means, Bayesian classifier, multivariate adaptive regression curves, support vector machines, decision trees and artificial neural networks were investigated, and their application in animal husbandry was discussed [5].

Akıllı and Atıl noted that researchers and animal breeders can draw meaningful conclusions from various data with the methods in the literature. The authors used fuzzy logic to estimate the efficiency of these results and to facilitate decision-making processes. They also gave examples of applications in dairy cattle based on classification with artificial neural networks [6].

Brunassi et al. obtained more than 25 thousand cases of estrus on a farm of 350 animals. A system based on a fuzzy inference function, evaluated together with the receiver operating characteristic curve, was created. The system categorizes the available data into 3 classes: 'Estrus', 'Maybe Estrus' and 'Not Estrus'. It detected estrus efficiently, with a sensitivity rate of 84.2% [7].

Ahamed created an image dataset to detect cat behavior. The image set consists of 4 classes in total: angry, sad, sick and sleepy. The images were processed with the OpenCV library, and a neural network was then created with the TensorFlow library. By applying transfer learning, 84.5% success was achieved with the Inception-ResNet-v2 architecture, showing that animal behavior can be detected successfully with image processing [8].

Shahriar et al. determined the activity levels of cows from temperature and accelerometer data using unsupervised learning. The K-means algorithm was used for grouping. The activities were determined by applying change detection techniques to the activity index (AIxL) defined in the study. With these techniques, 100% sensitivity and an overall accuracy between 82 and 100% were obtained [9].

The study by Romadhonny et al. is expected to help the management of dairy cows. It works on training and testing data from 1790 animals and estimates the estrus cycle using multiple logistic regression, with an accuracy of over 80%. When the variables were interpreted as polynomial, an accuracy rate of 83.2% was obtained [10].

Miura et al. monitored the surface temperature of the ventral tail base with a wearable wireless sensor to detect estrus periods in animals. Temperature monitoring was implemented on farms. Taking seasonal differences into account, the approach is thought to be useful for future studies [11].

In the studies in the literature, estrus is usually detected from differences between temperature levels. However, seasonal differences lead to problems such as inaccurate results and low performance. In our study, parameters obtained from the pedometer, such as the lying/not lying status and the number of steps, are monitored via LoRa, and estrus is classified with high performance.
16.3 Method and Material
16.3.1 Architectural Design
In our study, estrus was classified with the help of a pedometer attached to the feet of the animals. An Atmel ATmega328P microcontroller was used in the pedometer device. The animal's lying/not lying condition is detected by the MPU6050 and GY-63 sensors, which communicate with the microcontroller over the I2C protocol. Wireless communication from the clients to the server is provided by LoRa SX1278 modules, using the LoRaWAN protocol between server and client. The data arriving at the server is classified with high performance by the pre-trained machine learning algorithms XGBoost, Support Vector Machine, Naive Bayes and AdaBoost. Figure 16.1 shows the flow chart of our study in detail.
Fig. 16.1 Structure of the study
16.3.2 Devices
Atmel ATmega328: A 28-pin, 8-bit microcontroller based on a RISC architecture [12]. It has port, memory, timer and interrupt systems, analog-to-digital conversion (ADC) capability and serial communication. It has three main memory sections: Flash program memory, SRAM and EEPROM. Figure 16.2 shows the structure of the microcontroller.
MPU6050: Combines an accelerometer and a gyroscope in a single chip, giving 6 axes in total. Each sensor has 3 axes (x, y and z), with 16-bit ADC hardware on each channel [13]. It communicates over the I2C protocol, which is low speed but needs only 2 wires, SDA (data) and SCL (clock). Figure 16.3 shows the sensor in detail with its technical drawing.
Fig. 16.2 Atmel Atmega328 microcontroller [12]
Fig. 16.3 MPU6050—accelerometer and gyro sensor [13]
GY-63: An altimeter module measuring 5 × 3 mm. With its 24-bit ADC, the module supports a resolution of up to 10 cm [14]. Communication is possible via the SPI and I2C protocols [15]. The GY-63 shown in Fig. 16.4 was used in our study.
Fig. 16.4 GY-63 barometric sensor [15]
Fig. 16.5 LoRa SX1278 Ra+01
LoRa (RA-01 SX1278): LoRa is a low-power wide area network (LPWAN) technology. LPWANs complement short-range wireless networks such as Bluetooth, Wi-Fi and IEEE 802.15.4, and their most striking feature is long range at minimum cost. LoRa provides low power consumption and high security through AES-encrypted transmission [16]. It targets machine-to-machine communication and IoT networks. The name stands for long range radio. It can operate in the 433, 868 and 915 MHz frequency bands and has a very low maintenance cost. The LoRaWAN protocol is used in the communication between the receiver and the transmitter. In our study, the LoRa SX1278 RA-01 model shown in Fig. 16.5 was used. This model has a range of 10–15 km.
16.3.3 Electronic Circuit Design
In our study, a pedometer apparatus to be attached to the ankles of the animals was designed. Using the GY-63 and MPU6050 in the client, the microcontroller sends the status of the animal to the server through the LoRa SX1278. On the server side, estrus is detected by processing the incoming data with machine learning algorithms. Figure 16.6 shows the client's circuit diagram and port connections. The client consists of battery, inertial measurement unit (IMU), barometric sensor, LoRa and microcontroller modules. Figure 16.7 shows the front and back sides of the client (transmitter) circuit prototype. On the front there are a battery, microcontroller, barometric sensor and IMU; on the back there is a LoRa wireless communication module. Figure 16.8 shows the drawing of the receiver circuit prototype and its printed circuit. There are a battery and a microcontroller on the front of the circuit and a LoRa wireless communication module on the back. Figure 16.9 shows the 3D-printed box designed for the transmitter card.
Fig. 16.6 Circuit diagram of the study
Fig. 16.7 Client prototype
Fig. 16.8 Server prototype
Fig. 16.9 Transmitter box

16.3.4 Proposed Algorithms
XGBoost Classifier: XGBoost has the same structure as the Gradient Boosting algorithm and is an optimized and improved version of it. It is known in the literature for its high performance and is reported to run about 10 times faster than other popular machine learning solutions. Hardware and software optimization techniques are used to make it work more efficiently [17]. XGBoost is regarded as one of the best-performing tree-based classifiers, and it also handles large data sets efficiently. Regularization, system optimization, pruning and the handling of null values are its most important differences from the Gradient Boosting algorithm.
AdaBoost Classifier: A classifier proposed by Robert Schapire and Yoav Freund in 1996. AdaBoost is an iterative ensemble method that creates a powerful classifier by combining low-performance classifiers [18]. Its main idea is to adjust the weights of the classifiers and of the training samples at each iteration so as to ensure accurate predictions of unusual observations [19].
Support Vector Classifier (SVC): A supervised machine learning algorithm. It seeks the separating line (hyperplane) that lies as far as possible from the data of both groups. In its basic form, it draws a line to separate the points on a plane [20]. The area between the two separated classes is known as the margin, and the points that lie on the margin boundaries are called support vectors. The algorithm has several kernels with hyperparameters, so it can be modified and adapted to the dataset; some kernels work better than others depending on the type of data. The support vector classifier is recommended for small and medium scale datasets.
Naive Bayes Classifier: Bayes' theorem is a conditional probability formula named after Thomas Bayes and published posthumously in 1763. Equation (16.1) shows the relationship between the conditional and marginal probabilities of random variables according to the theorem [21].
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{16.1}$$
P(A|B): the probability of "A" being true, given that "B" is true.
P(B|A): the probability of "B" being true, given that "A" is true.
P(A): the probability of "A" being true.
P(B): the probability of "B" being true.
The basic logic behind the Naive Bayes classifier is based on Bayes' theorem. It is a probability-based supervised classification algorithm that can work on unbalanced data.
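A small worked example may make Eq. (16.1) more concrete. In the Python sketch below, "A" stands for an animal being in estrus and "B" for a high step count; the function name and all probabilities are hypothetical values chosen for illustration, not figures from the study. P(B) is expanded with the law of total probability.

```python
def posterior(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """P(A|B) from Bayes' theorem, Eq. (16.1).

    The evidence P(B) is expanded as
    P(B) = P(B|A) P(A) + P(B|not A) (1 - P(A)).
    """
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    if p_b == 0.0:
        raise ValueError("P(B) is zero, so the posterior is undefined")
    return p_b_given_a * p_a / p_b


# Hypothetical values: P(high steps | estrus) = 0.9, P(estrus) = 0.1,
# P(high steps | not estrus) = 0.2.
p_estrus_given_high_steps = posterior(0.9, 0.1, 0.2)
```

With these assumed numbers the posterior is 0.09 / 0.27, or one third: even a strong indicator leaves the probability of estrus well below one half when the prior is low.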
16.4 Discussion and Result
Our study uses 250 sensor data series belonging to different periods of 50 animals. Each data series contains slope and pressure data recorded over periods of several hours. The aim is to classify the estrus state of the animals from these data into two classes, estrus and not estrus. In the data set created with the help of the sensors, there are 116 data series in the estrus class and 134 in the not estrus class. The XGBoost, AdaBoost, Support Vector Classifier and Naive Bayes Classifier algorithms were run on this data set, with success rates of 83.6, 82.4, 81.2 and 78.8%, respectively. The highest performance, 83.6%, is reached with the XGBoost classifier. Grid search hyperparameter optimization was applied to all of the machine learning algorithms, so that each model is run with the parameters that give the best results.
The AdaBoost classifier was run on the data set with the parameters in Table 16.1. After training, the model reached a maximum success rate of 82.4%. Figure 16.10 shows how the success rate changes with the parameters used.
The XGBoost classifier was run with the hyperparameters in Table 16.2. At the end of the training and testing process, it reached the highest success rate, 83.6%, which makes it the best of the 4 methods used. Figure 16.11 shows the success rates for the parameters tried.
The Support Vector Classifier was run with the parameter values in Table 16.3. Its best performance is obtained with the parameters C = 10 and gamma = 0.0001 in Tuning1. Figure 16.12 shows the performance graph.
Table 16.1 Hyperparameters of AdaBoost classifier

| Tuning count | n_estimators | Depth | Accuracy (%) |
|---|---|---|---|
| Tuning1 | 100 | 2 | 76.4 |
| Tuning2 | 250 | 6 | 82.4 |
| Tuning3 | 50 | 8 | 77.6 |
| Tuning4 | 500 | 1 | 78.8 |
| Tuning5 | 1000 | 4 | 80.0 |
Fig. 16.10 Performance of Adaboost classifier
Table 16.2 Hyperparameters of XGBoost classifier

| Tuning count | n_estimators | Depth | Accuracy (%) |
|---|---|---|---|
| Tuning1 | 100 | 2 | 75.2 |
| Tuning2 | 300 | 3 | 82.4 |
| Tuning3 | 500 | 6 | 83.6 |
| Tuning4 | 400 | 5 | 77.6 |
| Tuning5 | 50 | 1 | 81.2 |

Fig. 16.11 Performance of XGBoost classifier

Table 16.3 Hyperparameters of support vector classifier

| Tuning count | n_C Value | n_gamma | Accuracy (%) |
|---|---|---|---|
| Tuning1 | 10 | 0.0001 | 81.2 |
| Tuning2 | 100 | 1 | 78.8 |
| Tuning3 | 1 | 0.1 | 77.6 |
| Tuning4 | 0.1 | 0.01 | 80.0 |
| Tuning5 | 1000 | 0.001 | 75.2 |
Fig. 16.12 Performance of support vector classifier
Table 16.4 Hyperparameters of Naive Bayes classifier

| Tuning count | n_smoothing | Accuracy (%) |
|---|---|---|
| Tuning1 | 1 | 72.8 |
| Tuning2 | 0.1 | 75.2 |
| Tuning3 | 0.01 | 77.6 |
| Tuning4 | 0.001 | 75.2 |
| Tuning5 | 0.0001 | 78.8 |
The Naive Bayes classifier was run by varying the n_smoothing hyperparameter as shown in Table 16.4. With n_smoothing = 0.0001, it achieved its own highest performance, a success rate of 78.8%. It is the lowest performing of the four models. Figure 16.13 shows the performance graph of the model according to the parameters.
Fig. 16.13 Performance of Naive Bayes classifier
Fig. 16.14 Comparison of machine learning models 1
Fig. 16.15 Comparison of machine learning models 2
The 250 data series in our study were run with 4 machine learning algorithms. As can be seen in Fig. 16.14, the highest performance is obtained with the XGBoost classifier. The study shows that estrus detection in animals can be done successfully with machine learning models. In the light of the data we have, the XGBoost classifier is proposed to solve this classification problem. Figure 16.15 compares the algorithms when each is run with the hyperparameters that give its best results.
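The grid search used for hyperparameter optimization can be sketched generically in Python as an exhaustive loop over every combination of candidate values. This is only a minimal sketch: the function names are hypothetical, and `toy_score` is a stand-in for the accuracy that would come from training and testing a classifier, which the study obtained from its own models and data.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(n_estimators, depth):
    """Hypothetical stand-in for a measured accuracy (not study data)."""
    return 100.0 - abs(n_estimators - 500) / 100.0 - abs(depth - 6)


# Candidate values are illustrative; a real run would train a model per point.
grid = {"n_estimators": [50, 100, 300, 400, 500], "depth": [1, 2, 3, 5, 6]}
best_params, best_score = grid_search(grid, toy_score)
```

The cost grows with the product of the list lengths (25 evaluations here), which is why grid search suits small hyperparameter spaces such as those in Tables 16.1 to 16.4.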
16.5 Conclusions and Future Work
In our study, the estrus state was classified with machine learning algorithms using numerical data from an apparatus attached to the ankle of the animals. AdaBoost, XGBoost, Support Vector Classifier and Naive Bayes Classifier were used for classification. To get maximum efficiency from these algorithms, hyperparameter optimization was done with grid search. The highest performance, 83.6%, was obtained with the XGBoost classifier. It is thought that the results can be improved with more data. It is recommended that the apparatus be fitted to each animal on a farm. With the data to be obtained in future studies, it will be possible to estimate estrus using regression algorithms. Keeping the data of each animal in a database with systems such as collars, pedometers, stomach apparatus and RFID can be considered for future studies. With these data, parameters such as the health values of the animals, the amount of water consumed, the state of estrus and the eating time can be found, and inferences can be made from them about the health and lactation performance of the animals. A mobile or desktop application can be created so that farmers can see these values for their animals and get information about them from their smartphones. With a simple, user-friendly interface, large farms can be managed easily. Early diagnosis of disease can be achieved with a few sensors added to the hardware. By adding cameras to some parts of the farm, some diseases could be detected through real-time object recognition and diagnosis. The work to be done will be shared on the web, and a web portal will be created to share solutions to the problems encountered with other farm owners. In this way, possible outbreaks can be detected early by keeping in constant contact with the surrounding farm owners. Adding sensors will provide more detailed information about the animals, and the data obtained can contribute to future academic studies.
Acknowledgments This research has been supported by Burdur Mehmet Akif Ersoy University Scientific Projects. Project numbers: 0745-YL-21 and 0671-YL-20.
References
1. Barriuso A-L, González G-V, Paz J-F-D, Lozano A, Bajo J (2018) Combination of multi-agent systems and wireless sensor networks for the monitoring of cattle. Sensors 18(1):108
2. Higaki S, Darhan H, Suzuki C, Suda T, Sakurai R, Yoshioka K (2021) An attempt at estrus detection in cattle by continuous measurements of ventral tail base surface temperature with supervised machine learning. J Reprod Dev 67(1):67–71
3. Yıldız K-A, Özgüven M-M (2016) Determination of estrus in cattle with artificial neural networks using mobility and environmental data. J Agric Facul Gaziosmanpaşa Univer (JAFAG) 9(1):40–45
4. Higaki S, Miura R, Suda T, Andersson L-M, Okada H, Zhang Y, Itoh T, Miwakeichi F, Yoshioka K (2019) Estrous detection by continuous measurements of vaginal temperature and conductivity with supervised machine learning in cattle. Theriogenology 123:90–99
5. Çetin F-A, Mikail N (2016) Hayvancılıkta Veri Madenciliği Uygulamaları. Türkiye Tarımsal Araştırmalar Dergisi 3(1):79–88
6. Akıllı A, Atıl H (2014) Süt sığırcılığında yapay zeka teknolojisi: Bulanık mantık ve yapay sinir ağları. Hayvansal Üretim 55(1):39–45
7. Brunassi LDA, Moura DJD, Nääs IDA, Vale MDD, Souza SRLD, Lima KAOD, Carvalho TMRD, Bueno LGDF (2010) Improving detection of dairy cow estrus using fuzzy logic. Scientia Agricola 67(5):503–509
8. Ahamed M, Ahsan M (2019) Animal behavior detection using deep learning
9. Shahriar M-S, Smith D, Rahman A, Freeman M, Hills J, Rawnsley R, Henry D, Bishop-Hurley G (2016) Detecting heat events in dairy cows using accelerometers and unsupervised learning. Comput Electron Agric 128:20–26
10. Romadhonny RA, Gumelar AB, Fahrudin TM, Setiawan WPA, Putra FDC, Nugroho RD, Budiani JR (2019) Estrous cycle prediction of dairy cows for planned artificial insemination (AI) using multiple logistic regression. In: 2019 international seminar on application for technology of information and communication (iSemantic), pp 157–162
11. Miura R, Yoshioka K, Miyamoto T, Nogami H, Okada H, Itoh T (2017) Estrous detection by monitoring ventral tail base surface temperature using a wearable wireless sensor in cattle. Animal Reprod Sci 180:50–57
12. Barrett SF (2013) Arduino microcontroller processing for everyone! Syn Lect Dig Circ Syst 8(4):1–513
13. Al-Dahan ZT, Bachache NK, Bachache LN (2016) Design and implementation of fall detection system using MPU6050 Arduino. In: International conference on smart homes and health telematics, pp 180–187
14. Hurtado AM (2021) Research and development of tactile feedback (or sensory feedback) technologies for application in upper limb prosthesis, Doctoral dissertation, Vilniaus Gedimino Technikos Universitetas
15. MS5611 GY-63 Basınç-Altimetre Sensörü. https://www.direnc.net/ms5611-gy-63-basinc-altimetre-sensoru
16. Daud S, Yang TS, Romli MA, Ahmad ZA, Mahrom N, Raof RAA (2018) Performance evaluation of low cost LoRa modules in IoT applications. In: IOP conference series: materials science and engineering, vol 318, no 1, p 012053
17. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 785–794
18. An TK, Kim MH (2010) A new diverse AdaBoost classifier. In: 2010 International conference on artificial intelligence and computational intelligence, vol 1, pp 359–363
19. Freund Y, Schapire R, Abe N (1999) A short introduction to boosting. J-Jpn Soc Artifc Intell 14(771–780):1612
20. Noble WS (2006) What is a support vector machine? Nature Biotechnol 24(12):1565–1567
21. Rish I (2001) An empirical study of the naive Bayes classifier. In: IJCAI 2001 workshop on empirical methods in artificial intelligence, vol 3, no 22, pp 41–46
Chapter 17
Enhancing Lexicon Based Sentiment Analysis Using n-gram Approach
Hassan Abdirahman Farah and Arzu Gorgulu Kakisim

H. A. Farah · A. G. Kakisim (B) Department of Computer Engineering, Istanbul Commerce University, Istanbul, Turkey e-mail: [email protected] H. A. Farah e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_17

17.1 Introduction
With the rapid increase in the use of social media, analyzing people’s opinions and feelings from the textual data found in digital resources such as social networks, discussion forums, review sites, and blogs has become an important problem. It has become important to perceive the attitudes and reactions of users to current events from posts shared on social media such as Twitter, Facebook and LinkedIn, or to learn what users think about the products of companies from posts shared on platforms such as Amazon and IMDB. In particular, companies use this real-time content to analyze their customers’ attitudes towards their products, and thereby develop appropriate strategies before launching a new product. Readers classify the texts published by users on social media, often subconsciously, as positive, negative or neutral according to the words and meaning they contain. Sentiment analysis attempts to extract people’s opinions or emotions automatically from user-generated content, and decides whether the underlying emotion is positive, negative or neutral. There are two main approaches to sentiment analysis: machine learning based [1–3] and rule-based [4–6]. The machine learning based approaches aim to classify user-generated content as positive or negative using commonly used classifiers together with effective feature selection and embedding methods. Recently, researchers have focused on capturing the semantic and sentiment knowledge of words by modelling the relationships among words. For instance, word embedding methods such as Word2Vec [7] and GloVe [8] are very effective at modelling proximity among
words. However, these methods are unable to capture polysemy, synonymy and antonymy [3]. They usually determine the distance between words from the contexts in which the words occur. Therefore, they can produce similar representations for words like “good” and “bad” despite their opposite sentiment polarity. This can negatively affect the results of sentiment analysis, especially for short texts such as tweets. Moreover, when most word embedding methods encounter words that they have not processed before (unknown or out-of-vocabulary (OOV) words), they either generate vectors that are far from ideal or have to restart the learning process with the new inputs [9]. Another shortcoming of these methods arises with different and new languages [10]: new representations must be generated for each new language, because a model built for one language cannot be scaled to other languages.
The rule-based approaches utilize publicly available lexicons to evaluate the sentiment of given texts. A sentiment lexicon is a dictionary of words labeled with sentiment classes such as positive or negative. In most lexicons, each word has a sentiment intensity within a predetermined range. Some lexicons keep the different forms of words (e.g., singular and plural nouns, tenses of verbs) with one or more semantic categories. Rule-based approaches identify sentiment sentences using the synonyms and antonyms in available lexical databases such as WordNet [11]. The words in these lexicons are collected manually. Existing methods generally aim to expand the rules in these dictionaries and to add new words by searching for their synonyms and antonyms. The most important advantage of dictionary-based methods is that they can be used flexibly in many languages through direct word translation and the updating of a few grammatical structures, since sentiment scores are calculated from the polarity of words [12]. However, for a given sentence or paragraph, these lexicons generate an overall score by evaluating the scores of all the words in that sentence. This can reduce the performance of sentiment classification for sentences that contain words with both positive and negative polarities, because the overall score is approximately the mean of the sentiment scores of these words [13].
In this paper, we propose a new sentiment analysis method that learns a sequential vector representation of each sample (opinion text) from the sentiment scores obtained using sentiment lexicons and the n-gram approach. First, our method produces the n-grams (such as bi-grams or tri-grams) of each text and computes a sentiment score for each n-gram using a lexicon. After obtaining the sentiment scores, the method constructs a sequential vector for the given text by summarizing the sequence and applying zero-padding. In this work, we present a hybrid approach that uses both lexicons and machine learning to enhance the performance of lexicon-based sentiment analysis.
The rest of the paper is structured as follows: Sect. 17.2 describes the sentiment lexicons used in this work. We present the proposed framework in Sect. 17.3. Section 17.4 presents the experimental results before concluding the paper.
17.2 Sentiment Lexicons
17.2.1 Vader
Vader [14] is an open-source lexicon and rule-based sentiment analysis tool, released under the MIT license and designed for the social media domain. It utilizes the lexical features of traditional sentiment lexicons such as LIWC [15] and ANEW [16]. In addition, the authors of [14] added new lexical and grammatical features, including punctuation, emoticons, capitalization, contrastive conjunctions, sentiment-related acronyms and commonly used slang, in order to capture the sentiment of social media texts and texts from other domains. The authors obtained ten intensity ratings from ten independent human raters for each candidate lexical feature in the dictionary and then computed a valid point estimate of the sentiment score of each feature. By keeping the lexical features with a non-zero mean rating and a standard deviation of less than 2.5, they created a lexicon of more than 7500 lexical features with verified valence scores. The sign of a valence score gives the sentiment polarity (positive or negative), and the sentiment intensity is measured on a scale from −4 to +4. Vader also calculates a compound (i.e. aggregated) score for multi-word expressions such as sentences or paragraphs, obtained by adding the valence scores of the words found in the lexicon. It applies its rules to adjust the compound score, and normalizes the score to lie between −1 (most extreme negative) and +1 (most extreme positive).
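The normalization of the compound score can be illustrated with a short sketch. The chapter does not state the formula; the mapping x / sqrt(x² + α) with α = 15 used here is the one found in the publicly available vaderSentiment implementation, and it is stated as an assumption. The valence values in `TOY_VALENCES` are hypothetical, and Vader's rule-based adjustments (negation, capitalization, punctuation) are left out.

```python
import math

# Hypothetical valences on the -4..+4 scale; NOT taken from the real Vader lexicon.
TOY_VALENCES = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -2.1}


def normalize(score_sum: float, alpha: float = 15.0) -> float:
    """Map an unbounded sum of valences into the open interval (-1, 1).

    Assumption: this is the x / sqrt(x^2 + alpha) mapping of the public
    vaderSentiment code, with its default alpha of 15.
    """
    return score_sum / math.sqrt(score_sum * score_sum + alpha)


def compound(tokens) -> float:
    """Sum the valences of the known tokens and normalize the result."""
    total = sum(TOY_VALENCES.get(token, 0.0) for token in tokens)
    return normalize(total)


if __name__ == "__main__":
    print(compound(["good", "bad"]))      # mixed sentence, slightly negative
    print(compound(["great", "good"]))    # clearly positive
```

Because the mapping is monotonic and odd, the sign of the summed valences is preserved, and very long texts saturate towards −1 or +1 instead of growing without bound.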
17.2.2 TextBlob
TextBlob [17] is an open-source Natural Language Processing (NLP) toolkit provided as a Python library. It supports many NLP tasks, such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification and translation. TextBlob can be used for both word-level and sentence-level sentiment analysis. It calculates the polarity of an entire paragraph from the positive and negative sentences/words it contains. For each word, TextBlob computes an average polarity score over all the grammatical tags of the word (noun, adjective, etc.) and the polarity scores associated with these tags. The sentiment polarity of a word is defined as a score between −1 and 1.
17.2.3 Afinn
Afinn [18] is an open-source lexicon provided by Finn Årup Nielsen in 2009–2011. It consists of over 3300 words and phrases, whose valence scores are measured on a scale from −5 to +5. Each word was labeled by Nielsen according to its sentiment meaning, leaving out subjectivity/objectivity, arousal and dominance. To build this lexicon, the author used different resources such as Twitter postings, some published affective word lists, the Urban Dictionary (including acronyms), and the Microsoft Web n-gram similarity Web service. In the Afinn lexicon, the sentiment scores of most positive words are set to +2 and those of most negative words to −2. Strongly obscene words are labeled with either −4 or −5.
17.2.4 SentiWordNet
SentiWordNet [19, 20] is a lexical resource that uses sets of synonyms, or synsets (sets of term and part-of-speech pairs sharing the same meaning), instead of individual terms. It operates on the database provided by WordNet [11]. The main objective is to exploit the fact that different senses of the same term may have different opinion-related properties. SentiWordNet assigns three numerical scores (objective, positive, and negative) to each synset to indicate how objective, positive, and negative the terms contained in the synset are. All three scores can be non-zero, indicating that the synset has each of the three opinion-related properties to a certain degree. Each score takes a value between 0 and 1. To predict the class of a synset, such as positive or negative, the method applies a sentiment classification process using a committee of eight individual synset classifiers. This means that SentiWordNet determines the valence scores of the synsets by quantitative analysis. It builds vectorial representations of the synsets, taking into account that different senses can have different polarities, for semi-supervised synset classification. The sentiment scores of a synset are given by the normalized proportion of the classifiers that assign the corresponding label to it.
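The idea of deriving the three scores from the proportion of committee votes can be sketched as follows. This is only an illustration of the "normalized proportion" step under the assumption that each of the eight classifiers casts exactly one vote per synset; the function name `synset_scores` and the example votes are hypothetical and do not come from the SentiWordNet code.

```python
from collections import Counter

LABELS = ("positive", "negative", "objective")


def synset_scores(votes):
    """Turn one vote per committee member into three scores in [0, 1].

    The scores are the fraction of classifiers voting for each label,
    so they always sum to 1.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return {label: counts[label] / len(votes) for label in LABELS}


if __name__ == "__main__":
    # Hypothetical committee of eight classifiers voting on one synset.
    votes = ["positive"] * 5 + ["objective"] * 2 + ["negative"]
    print(synset_scores(votes))
```

With five positive, two objective and one negative vote, the synset receives a positive score of 5/8, an objective score of 2/8 and a negative score of 1/8, so all three properties are present to some degree.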
17.3 Proposed Framework
In this section, we present a detailed description of our proposed framework, which constructs a feature matrix from the sentiment polarities of the n-grams of each text, scored with the lexicons. The steps of the proposed work are given in Fig. 17.1.
Fig. 17.1 The illustration of the proposed framework
17.3.1 Pre-processing Step
In the pre-processing step, we tokenize each opinion text in corpus C into words. Then, we apply stemming to reduce related words to a common stem. Stop-words, URLs, and numbers are removed from the corpus. Opinion texts such as tweets or reviews can contain emojis, which are frequently used to express moods, emotions, and feelings in social media. To take advantage of these emojis, we replace each emoji with the text that it semantically corresponds to.
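A minimal sketch of these pre-processing steps is given below, using only the Python standard library. The stop-word list, the emoji mapping and the suffix-stripping `stem` function are hypothetical stand-ins chosen for illustration; the chapter does not name the stemmer or the resources it uses. URLs are removed before tokenization here simply because that is easier with regular expressions.

```python
import re

# Hypothetical resources for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "to", "and", "of", "at"}
EMOJI_TEXT = {"\U0001F600": "happy", "\U0001F621": "angry"}
SUFFIXES = ("ing", "ed", "s")  # crude stand-in for a real stemmer


def stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list:
    """Replace emojis, drop URLs and numbers, tokenize, remove stop-words, stem."""
    for emoji, name in EMOJI_TEXT.items():
        text = text.replace(emoji, f" {name} ")
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"\d+", " ", text)                    # numbers
    tokens = re.findall(r"[a-z']+", text.lower())       # simple tokenizer
    return [stem(token) for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The flight was delayed 3 hours \U0001F621 http://t.co/x"))
```

In practice the toy stemmer would be replaced by an established one and the emoji table by a complete emoji-to-text resource; the order of the steps is the only thing this sketch is meant to show.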
17.3.2 N-gram Extraction
Existing lexicon-based approaches can classify some sentences with a negative meaning as positive, or vice versa. A possible reason is that sentiment lexicons generally produce an average polarity for a sentence that contains both positive and negative words by combining the positive and negative scores. To deal with this problem, we represent each text as a list of lexical bundles such as unigrams, bigrams or trigrams. In this way, we preserve more than one sentiment score for each opinion in the corpus. The n-gram approach [21] is used to generate the multi-word expressions belonging to each text. In Fig. 17.1, we provide bi-gram representations of the given examples (Review 1 and Review 2).
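Word-level n-gram extraction can be sketched in a few lines. The function name `ngrams` is an illustrative choice, and the example tokens reuse the bi-grams "what absolutely" and "absolutely stunning" that the chapter mentions for its example review.

```python
def ngrams(tokens, n):
    """Return the word-level n-grams of a token list, in their original order."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    tokens = ["what", "absolutely", "stunning"]
    print(ngrams(tokens, 1))
    print(ngrams(tokens, 2))
    print(ngrams(tokens, 3))
```

A text with fewer than n tokens yields an empty list, which later becomes an all-zero row after padding.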
17.3.3 Feature Space Construction
We aim to represent each opinion text as a vector and to construct a new feature space for the corpus, so that a learning algorithm can be applied to the sentiment scores obtained from a sentiment lexicon. Therefore, for each gram of an opinion text, we obtain a sentiment score using a sentiment lexicon. The lexicon mostly gives a positive score for a word with a positive meaning and a negative score for a word with a negative meaning. A zero score indicates that the word has a neutral meaning. Thus, our method obtains a positive, negative, or zero-valued score scr for each gram and stores these scores in a vector. Figure 17.1 shows the vectors corresponding to the given example reviews. For instance, a lexicon such as Vader gives a zero score for the first bi-gram “what absolutely” and a positive score (0.439) for the second bi-gram “absolutely stunning”. After the bi-gram scores of each review are obtained, consecutive zeroes are replaced with a single zero to reduce the dimensionality of the vector, in other words to summarize the sequence. This means that consecutive grams with neutral scores are represented by a single neutral score. To give the vectors of all opinion texts the same length, we apply zero padding by adding zero scores at the end of the vectors, and thus obtain our new feature space F. The vector of each text preserves the order of the words in that text. The steps of our method are given in Algorithm 1. After the matrix F is obtained, a learning algorithm is used to assign a class label (positive or negative) to the opinion texts.
Algorithm 1: The algorithm of the proposed framework
Input: C, n
Output: F, M
1: for each opinion o ∈ C do
2:     G ← n-gram(o, n)
3:     for each gram g ∈ G do
4:         scr ← lexicon(g)
5:         F(o, g) ← scr
6: Replace consecutive zeroes with a single zero value in each row of F
7: Do zero padding to equalize the length of each vector in F
8: M ← LearningAlgorithm(F)
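Steps 1 to 7 of Algorithm 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `TOY_LEXICON` and `score_gram` are hypothetical stand-ins for a real lexicon such as Vader (a gram is scored here by summing toy word scores), and step 8, the learning algorithm, is omitted because it needs a machine learning library.

```python
from itertools import groupby

# Hypothetical word scores; a real run would query Vader, Afinn, TextBlob or SentiWordNet.
TOY_LEXICON = {"stunning": 0.4, "great": 0.6, "awful": -0.5, "rude": -0.5}


def ngrams(tokens, n):
    """Word-level n-grams in their original order (Algorithm 1, step 2)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def score_gram(gram):
    """Toy stand-in for lexicon(g): sum of the word scores in the gram (step 4)."""
    return sum(TOY_LEXICON.get(word, 0.0) for word in gram.split())


def collapse_zero_runs(scores):
    """Replace each run of consecutive zeroes with a single zero (step 6)."""
    collapsed = []
    for value, run in groupby(scores):
        if value == 0:
            collapsed.append(0.0)
        else:
            collapsed.extend(run)
    return collapsed


def build_feature_matrix(corpus, n):
    """Build the zero-padded score matrix F for a corpus of token lists (steps 1-7)."""
    rows = [
        collapse_zero_runs([score_gram(g) for g in ngrams(tokens, n)])
        for tokens in corpus
    ]
    width = max((len(row) for row in rows), default=0)
    return [row + [0.0] * (width - len(row)) for row in rows]


if __name__ == "__main__":
    corpus = [
        ["what", "absolutely", "stunning", "view"],
        ["the", "staff", "were", "rude", "and", "the", "room", "was", "awful"],
    ]
    for row in build_feature_matrix(corpus, n=2):
        print(row)
```

The rows of the resulting matrix all have the length of the longest summarized sequence, so they can be passed directly to a classifier such as the Random Forest used in the experiments.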
17.4 Experimental Results
The experiments were conducted on three datasets that are commonly used in the literature [22–24]. Detailed information on the datasets is given in Table 17.1. The US airline dataset is a publicly available dataset released by CrowdFlower. It contains 14,640 tweets related to six different US airlines, such as American Airlines, United Airlines, and US Airways. The hotel dataset [25] consists of 20,491 reviews crawled from the Tripadvisor website, together with the review scores (e.g. 1, 2, 3 stars) given by users. Based on these ratings, we assigned a positive class label to reviews with 4 and 5 stars, a negative label to reviews with 1 and 2 stars, and a neutral label to reviews with 3 stars. The IMDB dataset [26] includes 50,000 movie reviews crawled from the IMDB website. In this experiment, we considered only the samples from the positive and negative classes and eliminated the samples from the neutral class.
Table 17.1 Statistics of the datasets

| Class    | Airline | Hotel  | IMDB   |
|----------|---------|--------|--------|
| Positive | 9080    | 15,093 | 24,884 |
| Negative | 2283    | 3214   | 24,698 |
| Overall  | 11,363  | 18,307 | 49,582 |
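The labeling rule described for the hotel reviews (4 and 5 stars positive, 1 and 2 stars negative, 3 stars neutral and then discarded) can be written as a small helper. The function names and the sample reviews are hypothetical; only the star thresholds come from the text.

```python
def label_from_stars(stars: int) -> str:
    """Map a 1-5 star rating to the class label used in the chapter."""
    if stars in (4, 5):
        return "positive"
    if stars in (1, 2):
        return "negative"
    if stars == 3:
        return "neutral"
    raise ValueError(f"unexpected rating: {stars}")


def binary_subset(reviews):
    """Label (text, stars) pairs and drop the neutral samples."""
    labeled = [(text, label_from_stars(stars)) for text, stars in reviews]
    return [(text, label) for text, label in labeled if label != "neutral"]


if __name__ == "__main__":
    # Made-up reviews for illustration.
    sample = [("lovely stay", 5), ("average room", 3), ("dirty bathroom", 1)]
    print(binary_subset(sample))
```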
In our experiments, the value of n was varied from 1 to 5 to evaluate the impact of the n-gram order on the classification performance of the proposed model. For comparison, different sentiment lexicons were implemented and tested on our datasets. Random Forest (RF) was selected as the baseline classifier. We applied tenfold cross-validation to randomly select training and test samples. The performance of the proposed method is evaluated by comparing it with popular lexicon-based methods. The comparison results are given in Table 17.2. The results show that our method outperforms the baseline lexicon-based methods in all cases except SentiWordNet on IMDB, where the two accuracies are nearly equal (0.6658 for the baseline and 0.6628 for our method). For the Airline dataset, the proposed method improves accuracy by about 16 percentage points over Vader, 13 over Afinn, 14 over TextBlob, and 15 over SentiWordNet. With Vader, Afinn and TextBlob, our method also achieves higher accuracy than the corresponding baselines on both the Hotel and IMDB datasets, and with SentiWordNet it does so on the Hotel dataset. The highest results for all datasets are obtained by our method with Vader, followed by our method with Afinn. Our method generates a feature matrix with a different dimensionality for each dataset. The dimension of matrix F is usually determined by the longest text in the corpus containing the least number of grams having neutral meaning. After feature extraction, the dimensions of the feature matrices are 88 for US airline, 1744 for Hotel, and 2122 for IMDB.

Table 17.2 Comparison of accuracy performances of our method with baseline methods

| Method                                  | Airline | Hotel  | IMDB   |
|-----------------------------------------|---------|--------|--------|
| Vader                                   | 0.7073  | 0.8858 | 0.6879 |
| Our method (Vader + n-gram + RF)        | 0.8745  | 0.9128 | 0.7318 |
| Afinn                                   | 0.7284  | 0.8909 | 0.7036 |
| Our method (Afinn + n-gram + RF)        | 0.8601  | 0.9061 | 0.7230 |
| TextBlob                                | 0.6867  | 0.8850 | 0.6821 |
| Our method (TextBlob + n-gram + RF)     | 0.8266  | 0.8951 | 0.7106 |
| SentiWordNet                            | 0.6529  | 0.7976 | 0.6658 |
| Our method (SentiWordNet + n-gram + RF) | 0.8080  | 0.8528 | 0.6628 |
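The accuracy differences between each baseline and the corresponding n-gram variant can be recomputed directly from Table 17.2. The sketch below copies the accuracies from the table and reports the gain in percentage points; the names `ACCURACY` and `gains_in_points` are illustrative.

```python
DATASETS = ("Airline", "Hotel", "IMDB")

# Accuracies copied from Table 17.2: lexicon alone vs. lexicon + n-gram + RF.
ACCURACY = {
    "Vader": {"baseline": (0.7073, 0.8858, 0.6879), "ours": (0.8745, 0.9128, 0.7318)},
    "Afinn": {"baseline": (0.7284, 0.8909, 0.7036), "ours": (0.8601, 0.9061, 0.7230)},
    "TextBlob": {"baseline": (0.6867, 0.8850, 0.6821), "ours": (0.8266, 0.8951, 0.7106)},
    "SentiWordNet": {"baseline": (0.6529, 0.7976, 0.6658), "ours": (0.8080, 0.8528, 0.6628)},
}


def gains_in_points(lexicon: str) -> dict:
    """Accuracy gain of the n-gram variant over the plain lexicon, in percentage points."""
    rows = ACCURACY[lexicon]
    return {
        dataset: round((ours - base) * 100, 2)
        for dataset, base, ours in zip(DATASETS, rows["baseline"], rows["ours"])
    }


if __name__ == "__main__":
    for lexicon in ACCURACY:
        print(lexicon, gains_in_points(lexicon))
```

The gains are largest on the Airline dataset and much smaller on Hotel and IMDB, and the only negative difference is the SentiWordNet entry for IMDB.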
17.5 Conclusion
In this paper, we presented a hybrid approach that uses both lexicons and machine learning techniques for sentiment analysis. We create a new sequential feature space by combining the n-gram approach with lexicon-based sentiment tools such as Vader, Afinn, TextBlob and SentiWordNet. The experimental results demonstrate that our method can enhance the performance of lexicon-based approaches. In the future, we plan to apply different sequence summarization approaches, especially for long texts, to decrease the dimensionality of the generated vectors.
References
1. Alshari EM, Azman A, Doraisamy S, Mustapha N, Alkeshr M (2017) Improvement of sentiment analysis based on clustering of Word2Vec features. In: 2017 28th international workshop on database and expert systems applications (DEXA). IEEE, pp 123–126, 28 Aug 2017
2. Gao Z, Feng A, Song X, Wu X (2019) Target-dependent sentiment classification with BERT. IEEE Access 7:154290–154299, 11 Oct 2019
3. Naseem U, Razzak I, Musial K, Imran M (2020) Transformer based deep intelligent contextual embedding for twitter sentiment analysis. Future Gener Comput Syst 113:58–69, 1 Dec 2020
4. Yang CS, Shih HP, A rule-based approach for effective sentiment analysis
5. Liu H, Cocea M (2017) Fuzzy rule based systems for interpretable sentiment analysis. In: 2017 ninth international conference on advanced computational intelligence (ICACI). IEEE, pp 129–136, 4 Feb 2017
6. Sundararajan K, Palanisamy A (2020) Multi-rule based ensemble feature selection model for sarcasm type detection in twitter. Comput Intell Neurosci, 9 Jan 2020
7. Rexha A, Kröll M, Dragoni M, Kern R (2016) Polarity classification for target phrases in tweets: a Word2Vec approach. In: European semantic web conference. Springer, Cham, pp 217–223, 29 May 2016
8. Jianqiang Z, Xiaolin G, Xuejun Z (2018) Deep convolution neural networks for twitter sentiment analysis. IEEE Access 6:23253–23260, 1 Jan 2018
9. Won MS, Lee JH (2018) Embedding for out of vocabulary words considering contextual and morphosyntactic information. In: 2018 international conference on fuzzy theory and its applications (iFUZZY). IEEE, pp 212–215, 14 Nov 2018
10. Al-Matham RN, Al-Khalifa HS (2021) Synoextractor: a novel pipeline for Arabic synonym extraction using Word2Vec word embeddings. Complexity, 17 Feb 2021
11. Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41, 1 Nov 1995
12. Hogenboom A, Heerschop B, Frasincar F, Kaymak U, de Jong F (2014) Multi-lingual support for lexicon-based sentiment analysis guided by semantics. Decis Support Syst 62:43–53, 1 June 2014
13. Borg A, Boldt M (2020) Using VADER sentiment and SVM for predicting customer response sentiment. Expert Syst Appl 162:113746, 30 Dec 2020
14. Hutto C, Gilbert E (2014) Vader: a parsimonious rule-based model for sentiment analysis of social media text. In: Proceedings of the international AAAI conference on web and social media, vol 8, no 1, 16 May 2014
15. Pennebaker JW, Francis ME, Booth RJ (2001) Linguistic inquiry and word count: LIWC2001. Mahwah: Lawrence Erlbaum Associates. 71(2001):2001, Jan 2001
16. Bradley MM, Lang PJ (1999) Affective norms for English words (ANEW): instruction manual and affective ratings. Technical report C-1, The Center for Research in Psychophysiology, University of Florida, Jan 1999
17. Loria S (2018) Textblob documentation. Release 0.15. 2:269, Dec 2018
18. Nielsen FA (2011) A new ANEW: evaluation of a word list for sentiment analysis in microblogs. arXiv preprint arXiv:1103.2903, 15 Mar 2011
19. Sebastiani F, Esuli A (2006) Sentiwordnet: a publicly available lexical resource for opinion mining. In: Proceedings of the 5th international conference on language resources and evaluation, pp 417–422, 22 May 2006
20. Baccianella S, Esuli A, Sebastiani F (2010) Sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: LREC, vol 10, no 2010, pp 2200–2204, 17 May 2010
21. Cavnar WB, Trenkle JM (1994) N-gram-based text categorization. In: Proceedings of SDAIR-94, 3rd annual symposium on document analysis and information retrieval, pp 161–175, 11 Apr 1994
22. Dey A, Jenamani M, Thakkar JJ (2018) Senti-N-Gram: an n-gram lexicon for sentiment analysis. Expert Syst Appl 103:92–105, 1 Aug 2018
23. Vashishtha S, Susan S (2019) Sentiment cognition from words shortlisted by fuzzy entropy. IEEE Trans Cogn Dev Syst 12(3):541–550, 27 Aug 2019
24. Naseem U, Khan SK, Razzak I, Hameed IA (2019) Hybrid words representation for airlines sentiment analysis. In: Australasian joint conference on artificial intelligence. Springer, Cham, pp 381–392, 2 Dec 2019
25. Alam MH, Ryu WJ, Lee S (2016) Joint multi-grain topic sentiment: modeling semantic aspects for online reviews. Inf Sci 339:206–223
26. Maas A, Daly RE, Pham PT, Huang D, Ng AY, Potts C (2011) Learning word vectors for sentiment analysis. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pp 142–150, June 2011
Chapter 18
A Comparison of Word Embedding Models for Turkish
Ahmet Tuğrul Bayrak, Musa Berat Bahadir, Güven Yücetürk, İsmail Utku Sayan, Melike Demirdağ, and Sare Melek Yalçinkaya

A. T. Bayrak (B) · M. B. Bahadir · G. Yücetürk · İ. U. Sayan · M. Demirdağ · S. M. Yalçinkaya Ata Technology Platforms, Research and Innovation Center, İstanbul, Turkey e-mail: [email protected] M. B. Bahadir e-mail: [email protected] G. Yücetürk e-mail: [email protected] İ. U. Sayan e-mail: [email protected] M. Demirdağ e-mail: [email protected] S. M. Yalçinkaya e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_18

18.1 Introduction
For a machine learning model to work, all the features must be represented numerically. Similarly, in analyses of texts, words have to be represented by numbers. Bag of words (BOW) is one such method. However, BOW only counts the occurrences of words, so the semantic relationships between words cannot be captured. To capture them, embedding methods are applied, which compute the relations of each word with the other words in its sentences. With embedding methods, words can be represented in different dimensions and in different semantic relations. The term word embedding was first used by Bengio [1]. Vector representations may be computed at different levels: character [2], word [3], expression [4], sentence [5], and document [6]. Different embedding methods have their advantages and disadvantages in terms of data amount, time and memory. There is no single correct embedding method; the embedding is chosen according to the needs and requirements. Although there are ready-made models for
different languages, embedding models can be trained and used with a domain-specific corpus if necessary. The numerical representation of words is constantly evolving. Many different embedding methods have been developed, and these methods are gradually improving. The spread of research on word embeddings and their applications started with a representation learning method called the Word2Vec model. In 2013, Mikolov et al. at Google released a software tool for the Word2Vec model that learns word embeddings using artificial neural networks and supports the use of pre-trained word embeddings [7]. The Word2Vec model has been largely successful [7]. A year later, in 2014, Pennington et al. introduced GloVe, a similar word embedding method [8]. In 2016, another word embedding method, called FastText, was proposed [9]. Influenced by these studies, many new word embedding models, such as ELMo [10], have also appeared. At the time of writing, Bert is regarded as the most advanced of these models [11]. It has been trained for many different languages and used in different studies. Researchers have also compared different word embedding methods for other languages. Makrai et al. compare Hungarian analogical questions to English questions by training a skip-gram model on the Hungarian Web corpus. They also generate proto-dictionaries with the Word2Vec model and compare Hungarian, Slovenian and Lithuanian with English [12]. Kilimci and Akyokuş, on the other hand, compare Word2Vec, GloVe, and FastText models in Turkish [13]. Yıldırım et al. use and compare traditional BOW approaches, topic modeling, embedding-based approaches, and deep learning methods for document representation in Turkish. Besides, Toshevska et al. study different embedding techniques [14] using RNN [15].
In this study, the latest versions of the most commonly used embedding methods are compared by applying them to the same long short-term memory (LSTM) [16] classification model. In addition, the data on which the classification model is trained are retrieved from Wikipedia by subject, and a dataset is created by labeling the data automatically. The remainder of the study is organized as follows: Sect. 18.2 explains the details of the method and data used in the study, Sect. 18.3 discusses the experiments and their results, and Sect. 18.4 summarizes the study.
18.2 Data and Data Preprocessing Steps
For Turkish, labeled datasets are already available and used in many different fields. In this study, however, an existing dataset is not used; the analysis data are retrieved from Wikipedia. Subject-based data are labeled according to their similarity to keywords. In this way, a mechanism is established that creates and dynamically labels a dataset related to a given subject, and a new dataset is created. After the data are retrieved from Wikipedia according to the subject titles, they are preprocessed. First, the tags and special characters in the retrieved data are removed. Afterward, runs of multiple spaces are reduced to one. The numbers are also removed, and all characters are converted to lowercase. After data cleaning, the sentences are labeled. Since not all sentences are directly related to the topic title and some may be general sentences, the aim is to eliminate unrelated sentences. For this purpose, the words that occur most often in the data of each category are listed, and the sentences containing a word at a Levenshtein distance of at least 2 from the 10 most common words and from the words among their 3 closest neighbors according to the established FastText model are used. For the study, data on the topics of “artificial intelligence” and “economy” are retrieved. In this step, sub-headings in Wikipedia are followed down to 3 levels. While descending to the sub-headings, the titles in the “See also” section of the Wikipedia page structure and the contents of the titles under them are used. The titles and subtitles used can be seen in Tables 18.1 and 18.2. Deeper sub-titles are not covered because their subjects diverge semantically from the main title. The code of the mechanism can be accessed from the webpage.1 With this code, data extraction can be performed for any language after the language option is selected. After the preprocessing and labeling steps are completed, the dataset size and average sentence lengths for both labels can be seen in Table 18.3.

Table 18.1 Data category (economics)
Titles and subcategories
Economy
  1. Microeconomics
  2. Macroeconomics
    2.1 Fiscal policy
    2.2 Monetary policy
  3. Economic history
    3.1 The great depression
    3.2 Economics in Ancient Greece
    3.3 Economic history of Japan
    3.4 1948 Turkish economy congress
  4. History of economic thought
  5. International economics
  6. Turkish economy
    6.1 List of banks in Turkey
    6.2 History of external debt in Turkey
    6.3 Producer price index
      6.3.1 Wholesale price index
      6.3.3 List of price index formulas
      6.3.4 Quantity index numbers
    6.4 Consumer price index
    6.5 Republic of Turkey ministry of commerce
    6.6 Agricultural products grown in Turkey
  7. Development economics
  8. Public economy
  9. Labor economics
  10. Agricultural economy
  11. New economy
  12. Mixed economy
  13. Business economics

1 https://github.com/tbayrak/text-classification
Table 18.2 Data category (artificial intelligence)
Titles and subcategories
Artificial intelligence
1. Machine intelligence
2. Artificial neural networks
   2.1 Cognitive science
   2.2 Comparison of deep learning software
   2.3 Convolutional neural networks
   2.4 Naive Bayes classifier
       2.4.1 Bayes’ theorem
       2.4.2 Bayesian statistics
       2.4.3 Pattern recognition
       2.4.4 Machine learning
       2.4.5 Bayesian spam filtering
   2.5 Neuroscience
   2.6 Artificial life
3. Natural language processing
   3.1 Natural language generation
4. Speech synthesis
5. Speech comprehension
6. Expert systems
7. Pattern recognition
8. Genetic algorithms
9. Genetic programming
10. Fuzzy logic
    10.1 Fuzzy set
    10.2 Logic
    10.3 Mathematical logic
    10.4 Semantics
Table 18.3 Labeled data details
Label                                Average sentence length   Data size
Label 1 (artificial intelligence)    140                       1316
Label 2 (economy)                    130                       1586
Label 1 + label 2                    135                       2902
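The cleaning and sentence-filtering steps of Sect. 18.2 can be sketched in Python as follows. This is only a minimal illustration and not the code from the authors' repository: the function names (clean_text, levenshtein, is_relevant), the regular expressions, and the reading of the threshold as a maximum edit distance of 2 are assumptions. The FastText neighbor expansion is left out, so the keyword list is passed in directly.

```python
import re


def clean_text(raw: str) -> str:
    """Remove tags, special characters and numbers, collapse spaces, lowercase."""
    text = re.sub(r"<[^>]+>", " ", raw)       # drop markup tags
    text = re.sub(r"[^\w\s]", " ", text)      # drop special characters, keep letters such as ı, ş, ğ
    text = re.sub(r"\d+", " ", text)          # drop numbers
    text = re.sub(r"\s+", " ", text).strip()  # reduce repeated whitespace to one space
    # Note: str.lower() is not Turkish-aware (it maps "I" to "i", not to "ı").
    return text.lower()


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def is_relevant(sentence: str, keywords: list[str], max_dist: int = 2) -> bool:
    """Keep a sentence if any of its words is close to any keyword (assumed rule)."""
    return any(levenshtein(word, key) <= max_dist
               for word in sentence.split()
               for key in keywords)


print(clean_text("<p>Yapay  zeka, 1956 yılında!</p>"))
print(is_relevant("yapay zekaa gelişiyor", ["zeka"]))
```

In a fuller pipeline, the keyword list would hold the 10 most frequent words of a category plus their nearest FastText neighbors.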
18.3 Method
This section presents the embedding methods used in the study and the text classification model that is built on top of them.
18.3.1 Embedding Models
In this study, popular embedding methods trained for Turkish are used and compared: Word2Vec [4], FastText [17], ELMo [10], and Bert [18]. The most up-to-date Turkish versions of Word2Vec,2 FastText,3 ELMo,4 and Bert5 are applied for embedding. The Word2Vec model cannot produce a vector for words that are not in its training corpus. FastText, in contrast, builds word vectors from character n-grams, so it can still return the nearest words for an unseen form such as a typo. Like Word2Vec, FastText produces a single vector for each word regardless of context. The ELMo model, on the other hand, is entirely character-based and is trained to distinguish the different meanings of a word. The Bert model, like ELMo, represents the different meanings of an expression separately; however, ELMo uses LSTMs, whereas Bert uses the Transformer. The effect of the embedding models on classification success is examined in the next section.
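To illustrate why a character n-gram model can still handle a typo, the sketch below extracts the n-grams of a word in the usual fastText style (word wrapped in the boundary symbols < and >, n from 3 to 6). The function name char_ngrams and the example words are assumptions made for illustration. The real library additionally hashes n-grams into buckets and sums their vectors, which is not shown here.

```python
def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """Return all character n-grams of '<word>' with n_min <= n <= n_max."""
    token = f"<{word}>"  # boundary symbols mark the word start and end
    return [token[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(token) - n + 1)]


# A correctly spelled word and a typo still share several n-grams,
# so their subword-based vectors remain close.
correct = set(char_ngrams("ekonomi"))
typo = set(char_ngrams("ekonmi"))
shared = correct & typo

print(char_ngrams("where", 3, 3))
print(sorted(shared))
```

A pure Word2Vec lookup has no counterpart to this step, because an out-of-vocabulary string has no entry in its vector table.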
18.3.2 Classification Model
In the study, a category classification problem is examined on the dataset described in the data section. Although the embedding layer varies, the same classification model is used throughout. An LSTM model, which is a variant of the Recurrent Neural Network (RNN) and works well on sequential data, is trained [19].
2 https://github.com/akoksal/Turkish-Word2Vec
3 https://fasttext.cc/docs/en/crawl-vectors.html
4 https://github.com/HIT-SCIR/ELMoForManyLangs
5 https://github.com/stefan-it/turkish-bert
18.4 Experiments
For the study, the classification model is trained with the same parameters and different embedding methods. To measure the success of the classification model objectively, approximately 10% of the data (290 rows) is randomly selected and used for testing. After this process, the size of the training set is 2616 rows. The LSTM model is trained as a bidirectional model. The value of hidden_nodes is 32 and batch_size is 64. Dropout (drop_out) is added to the LSTM layers to prevent over-fitting. The activation function is sigmoid and the optimizer is Adam. The model has 242,718 parameters and is trained until the accuracy value starts to decrease. The change in accuracy during training is presented in Figs. 18.1 and 18.2. Table 18.4 shows the model successes for the different embedding methods. When the results of the classification model trained with the same parameters are examined, the Bert model is the most successful of the four. The Python programming language and the scikit-learn, keras, and wikipedia libraries are used in the study.
Fig. 18.1 The learning curve of Word2Vec and ELMo models
Fig. 18.2 The learning curve of FastText and Bert models

Table 18.4 Bi-LSTM with different embeddings
Embedding layer   Precision   Recall   F1 score
Bert              0.801       0.791    0.800
ELMo              0.782       0.779    0.780
FastText          0.763       0.757    0.759
Word2Vec          0.738       0.743    0.740
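For reference, the three scores reported in Table 18.4 can be computed from predicted and true labels as sketched below. The chapter does not state how the scores were averaged over the two labels, so this example computes them for a single positive label only. The function name and the toy labels are made up for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive label, from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy labels (1 = artificial intelligence, 0 = economy), not the study's data.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))
```

With scikit-learn available, the same values would normally be obtained from its metrics module rather than computed by hand.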
18.5 Conclusion
Embedding methods used to convert text data into numeric form are developing day by day. In terms of performance and time, each embedding method has its advantages and disadvantages. In this study, we used 4 different embedding methods, all of which are ready-made models trained for Turkish. After embedding, a classification model is created, and the effect of the embedding models on the classification model is examined.
Our study stands out among the existing studies by using and comparing the Bert and ELMo models for embedding. In addition, with the mechanism established in the Data and Data Preprocessing Steps section, subject-based data is extracted from Wikipedia and then labeled. In this way, a dataset that is dynamically labeled according to the subject is created, and embedding methods are used in the classification of this dataset. An LSTM model is trained for the classification task. For the embedding layer of the classification model, the most current Turkish Word2Vec, FastText, ELMo, and Bert models are used. When the success of the embedding methods is compared, the embedding layer created with Bert yields the most successful model. In future studies, the ready-made embedding methods will be trained on the same Turkish dataset, and the labeling of the data according to the subjects will be carried out in a more intelligent way.
References
1. Bengio Y, Ducharme R, Vincent P, Janvin C (2003) A neural probabilistic language model. J Mach Learn Res 3:1137–1155
2. Hellrich J, Hahn U (2017) Don’t get fooled by word embeddings-better watch their neighborhood. In: DH
3. Wang H, Raj B (2017) On the origin of deep learning
4. Joulin A, Grave E, Bojanowski P, Mikolov T (2017) Bag of tricks for efficient text classification. In: Proceedings of the 15th conference of the European chapter of the association for computational linguistics: volume 2, short papers. Association for Computational Linguistics, Valencia, Spain, pp 427–431
5. Li B, Zhou H, He J, Wang M, Yang Y, Li L (2020) On the sentence embeddings from pre-trained language models. In: EMNLP
6. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. ser. Curran Associates Inc., NIPS’13. Red Hook, NY, USA, pp 3111–3119
7. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: ICLR
8. Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: EMNLP
9. Joulin A, Grave E, Bojanowski P, Douze M, Jégou H, Mikolov T (2016) Fasttext.zip: compressing text classification models. ArXiv, vol. abs/1612.03651
10. Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee L, Zettlemoyer L (2018) Deep contextualized word representations. In: NAACL-HLT
11. Devlin J, Chang M-W, Lee K, Toutanova K (2019) Bert: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, vol 1 (Long and Short Papers), pp 4171–4186
12. Makrai M (2015) Comparison of distributed language models on medium-resourced languages. In: Tanács A, Varga V, Vincze V (eds) XI. Magyar Számítógépes Nyelvészeti Konferencia (MSZNY 2015)
13. Kilimci ZH, Akyokuş S (2019) The evaluation of word embedding models and deep learning algorithms for Turkish text classification. In: 2019 4th international conference on computer science and engineering (UBMK), pp 548–553
14. Toshevska M, Stojanovska F, Kalajdjieski J (2020) Comparative analysis of word embeddings for capturing word similarities. ArXiv, vol. abs/2005.03812
15. Liu P, Qiu X, Huang X (2016) Recurrent neural network for text classification with multi-task learning. arXiv preprint arXiv:1605.05101
16. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780
17. Grave E, Bojanowski P, Gupta P, Joulin A, Mikolov T (2018) Learning word vectors for 157 languages. In: Proceedings of the international conference on language resources and evaluation (LREC 2018)
18. Schweter S (2020) Berturk - bert models for turkish, 4
19. Sak H, Senior AW, Beaufays F (2014) Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In: INTERSPEECH
Chapter 19
The Unfairness of Collaborative Filtering Algorithms’ Bias Towards Blockbuster Items
Emre Yalcin
E. Yalcin (B)
Computer Engineering Department, Sivas Cumhuriyet University, Sivas, Turkey
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_19

19.1 Introduction
Recommender systems are enhanced artificial intelligence solutions that assist individuals in their decision-making process by recommending appropriate items while filtering out irrelevant ones [1]. Because of their benefits for both users and businesses, such systems have become increasingly critical in recent years for many prominent digital platforms in different application domains, such as Spotify, Netflix, Twitter, and Booking.com. Many practical algorithms of different types (e.g., content- or knowledge-based filtering, collaborative filtering, hybrid methods) have been introduced so far to accomplish the recommendation task [2]. Collaborative filtering (CF) methods are the most prominent among them due to their efficiency and high performance. These methods mainly aim to predict a rating score for an item or to produce a ranked list of relevant items for individuals by operating on a rating matrix that contains users’ past preferences about items [3]. Although the performance of CF algorithms is usually evaluated by their capability to produce highly accurate recommendations, the literature has recently acknowledged that these algorithms are intrinsically biased towards some items because of their particular characteristics [4]. Unfortunately, such biases lead to recommendation lists in which a few items appear too frequently while those in the large remaining part of the item catalog do not receive enough attention [5–7]. This issue also results in ranked item lists that are unqualified in terms of beyond-accuracy aspects such as coverage and diversity [8]. Moreover, awareness of such biases makes the system more vulnerable to social bots or fake reviews: providers can manipulate the rating data to increase the appearance of their products in the produced recommendations and consequently boost their sales.

One of the most prominent of such biases is the well-known popularity bias problem, which occurs when the most frequently rated (i.e., popular) items are recommended more and more often than less rated (i.e., non-popular) ones in the produced ranked lists [9–11]. The literature on popularity bias, however, considers only the count of the ratings provided for the items and disregards their magnitude when exploring such a bias and controlling it in the recommendations. Nevertheless, it is difficult to say that the popularity of an item always means that individuals like it enormously. For instance, according to the review counts on Metacritic,1 a well-known website collecting user reviews for many products of several different types, Mortal Kombat is one of the most discussed movies of 2021. However, the mean of the ratings provided for this movie is 6.3 on a [1, 10] rating scale, meaning that individuals are usually neutral towards this movie even though it is a huge hit. Such a trade-off between liking degree and popularity can be observed for many items in different application domains [12].

In this study, we therefore consider blockbuster items, that is, items that are not only popular but also highly liked by individuals, and investigate potential bias issues of CF algorithms towards such items. Moreover, we analyze how different CF algorithms propagate such a blockbuster bias differently for different individuals and expose users to recommendations that are unfair with respect to their original propensities. In doing so, we first define three different groups of users based on their level of interest in blockbuster items and then scrutinize how the blockbuster biases of CF algorithms affect these user groups differently.
Our comprehensive analysis shows that four prominent CF algorithms are strongly biased towards blockbuster items, and that individuals with the lowest level of interest in blockbuster items are affected the most by such a bias. We also show that users who focus on underappreciated items are more active in the system than others; they can therefore be considered the most significant stakeholders, whose expectations should be fulfilled appropriately by the system. In a nutshell, the main research questions we investigate in the present study are as follows:
RQ1. How can we describe blockbuster items correctly?
RQ2. What is the level of interest in blockbuster items for different users or groups of users, and how does the relation between blockbuster inclination and profile size change?
RQ3. Do the CF algorithms induce an undesirable bias in their recommendations towards blockbuster items?
RQ4. How do CF algorithms affect different groups of users with varying levels of interest in blockbuster items?
We address RQ1 in Sect. 19.3 by adopting a practical method that measures the blockbuster level of the items. We address RQ2 in Sect. 19.4 by comprehensively analyzing user propensities for blockbuster items and their relationship with users’ profile size. In Sect. 19.6, we address both RQ3 and RQ4 by investigating the potential blockbuster bias of CF algorithms and analyzing how this bias affects individuals differently.
1 https://www.metacritic.com/
19.2 Related Works
Popularity bias is one of the most studied bias types in the recommender systems domain, with the goal of achieving more qualified recommendations, especially in terms of beyond-accuracy aspects such as diversity and coverage [13, 14]. The pioneering research on this topic has mainly focused on exploring the adverse effects of such a bias towards popular items in recommendations in different application domains such as movies [6, 7], music [15, 16], tourism activities [17], and online education [18]. Many recent studies have also aimed at developing practical treatment procedures that control such a bias by increasing the visibility of long-tail (i.e., unpopular) items in the produced recommendations [5, 6, 19]. The fairness of recommendations has also been gaining more and more attention in recent years. For instance, many studies have focused on recommending items belonging to different providers or categories [9] and on eliminating algorithmic discrimination against individuals based on demographic attributes such as gender or race [20, 21]. Some recent studies have also linked the fairness concept with the popularity bias problem and investigated how popularity bias in rating data leads the recommendations to deviate from the actual tastes of individuals, measured by how many popular items they expect to see in the recommended list [16, 22]. Such deviations are usually acknowledged as unfair, as users usually desire recommendations in which the ratio of popular to non-popular items is the same as in their original profiles. In a study similar to ours [22], the authors analyze how the biases of CF algorithms towards popular items affect individuals with different degrees of interest in popular items, and show that users with lower interest in popular items are affected the most by the popularity bias in movie recommendations.
Such an analysis has also been reproduced in [16] for recommendations in the music domain. In another related study [20], the authors introduce the genre calibration concept: the produced recommendations should be consistent with the spectrum of items that users have evaluated. For instance, if an individual has evaluated 60% romance and 40% action movies, the same pattern is expected in the produced recommendations. Our study differs from the previous research, as we consider bias towards blockbuster items rather than popular ones and explore the potential unfairness of this bias in the recommendations without using content information.
19.3 Description of Blockbuster Items
In this study, we use the term blockbuster to describe items that are consumed by a majority of the users and evaluated with high ratings. Here, one challenge is to determine whether an item is a blockbuster or not. To accomplish this task, we adopt a practical method that measures the blockbuster level of the items by properly combining their liking-degree and popularity [23]. Note that this method also addresses RQ1. Let $r_{u,i}$ denote the rating given by user $u$ to item $i$. We first calculate the popularity of $i$, denoted $p_i$, using Eq. 19.1, and then compute the liking-degree of $i$, denoted $ld_i$, as in Eq. 19.2.

$$p_i = \sum_{u \in U_i} 1 \tag{19.1}$$

$$ld_i = \frac{\sum_{u \in U_i} r_{u,i}}{|U_i|} \tag{19.2}$$
where $U_i$ denotes the set of users who provided a rating for item $i$. The maximum value of $p_i$ equals the number of all individuals in the system, which occurs when all users have rated item $i$. The maximum value of $ld_i$, on the other hand, is the highest rating on the utilized rating scale, which occurs when all ratings provided for $i$ are equal to the highest rating. Therefore, we first transform both the $p_i$ and $ld_i$ values into the [0, 1] range using min-max normalization, and then combine them via the harmonic mean to obtain the $B_i$ scores that describe the blockbuster level of the items, as in Eq. 19.3. Note that the harmonic combination helps balance the trade-off between liking-degree and popularity, so that both aspects of an item are represented appropriately in its blockbuster score.

$$B_i = \frac{2 \times \tilde{p}_i \times \widetilde{ld}_i}{\tilde{p}_i + \widetilde{ld}_i} \tag{19.3}$$

where $\tilde{p}_i$ and $\widetilde{ld}_i$ represent the normalized popularity and liking-degree values of item $i$, respectively.
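Equations 19.1 to 19.3 can be sketched in a few lines of Python. This is a minimal illustration rather than the original implementation: the function name blockbuster_scores, the (user, item, rating) input format, taking the minimum and maximum over the items present in the data, and returning 0 when both normalized values are 0 are all assumptions.

```python
from collections import defaultdict


def blockbuster_scores(ratings):
    """ratings: iterable of (user, item, rating). Returns {item: B_i}."""
    count = defaultdict(int)    # p_i: number of users who rated item i (Eq. 19.1)
    total = defaultdict(float)  # running sum of ratings per item
    for _user, item, rating in ratings:
        count[item] += 1
        total[item] += rating
    liking = {i: total[i] / count[i] for i in count}  # ld_i (Eq. 19.2)

    def min_max(values):
        lo, hi = min(values.values()), max(values.values())
        span = hi - lo
        return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}

    p_n, ld_n = min_max(dict(count)), min_max(liking)
    # Harmonic mean of the two normalized values (Eq. 19.3).
    return {i: (2 * p_n[i] * ld_n[i] / (p_n[i] + ld_n[i])
                if p_n[i] + ld_n[i] else 0.0)
            for i in count}


toy = [("u1", "A", 5), ("u2", "A", 5), ("u3", "A", 5),
       ("u1", "B", 1),
       ("u1", "C", 3), ("u2", "C", 3)]
print(blockbuster_scores(toy))
```

In the toy data, item A is both the most rated and the best liked, so it receives the highest score, while an item that is extreme on only one of the two aspects is pulled down by the harmonic mean.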
19.4 Blockbuster Bias in User Profiles
In all of the following sections, we employ the well-known MovieLens-1M (ML1M) dataset, which contains 1,000,209 ratings of 6040 individuals about 3900 movies [23, 24]. Note that all ratings in this dataset are discrete and on the [1–5] rating scale. As explained in the Introduction section, the popularity bias problem usually originates from imbalances in the rating data towards popular items: a few popular items have received the majority of the provided ratings, while the large remaining part of the items have received only a few ratings. We therefore first investigate whether similar imbalances towards blockbuster items exist in the rating data, as in [23]. To this end, we present the long-tail distribution of the blockbuster level of the items in the ML1M dataset in Fig. 19.1. As shown in Fig. 19.1, a small portion of the items have significantly higher blockbuster scores than the majority of the items in the ML1M dataset. Such an imbalance in the rating data on which recommendation algorithms are trained may lead them to over-feature blockbuster items in their produced ranked lists and, as a result, cause an undesirable bias in favor of such items in the recommendations. However, the actual interests of individuals in blockbuster items may differ widely: some concentrate mainly on blockbuster items, while others are mostly interested in non-blockbuster ones. Therefore, the system should provide recommendations that fit these different inclinations, as individuals naturally desire recommendations with the same pattern as their original profiles. We consider users as different stakeholders of the system and investigate how well the algorithms satisfy them according to their original propensities towards blockbuster items.

Fig. 19.1 Blockbuster score distribution of the items in the ML1M dataset. On the x-axis, items are sorted in descending order of their calculated blockbuster scores
19.4.1 The Propensities of Users for Blockbuster Items
As emphasized in the previous section, users’ interests in blockbuster items may differ widely, and the system should take their different expectations into account in the provided recommendations. In this section, we therefore explore the level of interest of users in blockbuster items. Figure 19.2 presents the ratio of blockbuster items in the profiles of users. As shown in Fig. 19.2, while a small portion of the users lies at either extreme end of the scale, the remaining ones show varying levels of interest in blockbuster items. Note that, in Fig. 19.2, to decide whether an item is a blockbuster or not, we follow the famous Pareto principle [25]: we first sort the items in descending order of their calculated blockbuster scores, refer to the top items that together received 20% of all provided ratings as the blockbuster items, and classify the remaining ones in the whole catalog as non-blockbuster items.

Fig. 19.2 The ratio of blockbuster items in the profiles of users in the ML1M dataset. On the x-axis, users are sorted in ascending order of their level of interest in blockbuster items
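The Pareto-style split and the per-user ratio described above can be sketched as follows. The function names, the input dictionaries, and the way the boundary is handled (items are added until the accumulated rating count first reaches the 20% budget) are assumptions; the chapter does not specify how the item that crosses the threshold is treated.

```python
def blockbuster_items(scores, rating_counts, share=0.2):
    """Top-scored items that together hold `share` of all ratings (assumed cut-off rule)."""
    budget = share * sum(rating_counts.values())
    chosen, covered = set(), 0
    for item in sorted(scores, key=scores.get, reverse=True):
        if covered >= budget:  # the blockbuster head already holds 20% of the ratings
            break
        chosen.add(item)
        covered += rating_counts[item]
    return chosen


def blockbuster_ratio(profile, blockbusters):
    """Fraction of the items in a user profile that are blockbusters."""
    return sum(i in blockbusters for i in profile) / len(profile) if profile else 0.0


# Toy catalogue: blockbuster scores and number of ratings per item.
scores = {"A": 0.9, "B": 0.5, "C": 0.1, "D": 0.05}
counts = {"A": 2, "B": 3, "C": 3, "D": 2}
head = blockbuster_items(scores, counts)
print(head, blockbuster_ratio(["A", "B", "C", "D"], head))
```

The ratio computed per user is the quantity plotted in Fig. 19.2 and later used to form the user groups.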
19.4.2 Profile Size and Blockbuster Bias
This section explores whether there are considerable correlations between users’ profile sizes and their propensities towards blockbuster items. To this end, we analyze the correlation between users’ profile sizes and (i) the number of blockbuster items (see Fig. 19.3), (ii) the ratio of blockbuster items (see Fig. 19.4), and (iii) the average blockbuster score of the rated items (see Fig. 19.5) in their profiles. We also report the correlation value for each pair and the corresponding p-value on top of the figures to show the strength of the relationship and its statistical significance, respectively. Figure 19.3 demonstrates that as users’ profiles grow, the number of blockbuster items in them tends to grow as well, since the likelihood of including a blockbuster item increases. The correlation level and the obtained p-value indicate a strong correlation between these parameters. Figure 19.4 shows the opposite effect for the ratio: even if the number of blockbuster items is high in large profiles, their relative share of the whole profile diminishes. This finding implies that individuals who interact more with the platform eventually become more interested in non-blockbuster items; thus, their opinions become more valuable for the recommender system. The observed negative correlation between the profile size and the ratio of blockbuster items is visible in the correlation value and is statistically significant at the 95% confidence level. Lastly, Fig. 19.5 shows the same fact from a different perspective by presenting the average blockbuster score of the rated items in each user profile. Based on the observed negative correlation, it can be concluded that profiles tend to contain items with lower blockbuster scores as they grow. These observations demonstrate that individuals who interact more with the system tend not to focus on blockbuster items and therefore desire recommendations that comply with such needs. Also, since they provide more ratings across a diverse range of items, their contribution to the success of the recommender system is crucial. Therefore, the system should not disregard the satisfaction of such valuable users. Note that all analyses performed in this section address RQ2.

Fig. 19.3 Correlation of the number of blockbuster items in user profile and profile size
Fig. 19.4 Correlation of the ratio of blockbuster items in user profile and profile size
Fig. 19.5 Correlation of the average blockbuster score of items in user profile and profile size
19.5 Different User Groups in Terms of Inclination for Blockbuster
In this section, we divide users into three groups according to the ratio of blockbuster items in their profiles: a ratio in [0, 0.2] as Low, in (0.2, 0.5) as Moderate, and in [0.5, 1] as High. Table 19.1 gives detailed information about these groups, such as the number of users and the average profile size of the users.

Table 19.1 Detailed information about the constructed user groups
Group      Group size   Average profile size
High       299          53.36
Moderate   3784         140.96
Low        1957         351.90
As shown in Table 19.1, most users fall into the Moderate group, while only a minority of the users is classified as High. On the other hand, almost one-third of the population is categorized as Low. These users also have the largest profiles on average, which is in line with the observations in the previous section. In other words, users in the Low category can be considered the users who interact the most with the system, even though their profiles are not mainly focused on blockbuster items.
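The grouping rule of this section, together with the kind of summary reported in Table 19.1, can be sketched as below. The function names and the toy users are illustrative assumptions; only the interval boundaries come from the text.

```python
from collections import defaultdict


def user_group(ratio: float) -> str:
    """Map a blockbuster ratio to Low [0, 0.2], Moderate (0.2, 0.5) or High [0.5, 1]."""
    if ratio <= 0.2:
        return "Low"
    if ratio < 0.5:
        return "Moderate"
    return "High"


def summarise(users):
    """users: {user: (blockbuster_ratio, profile_size)} -> {group: (size, avg profile size)}."""
    sizes = defaultdict(list)
    for ratio, profile_size in users.values():
        sizes[user_group(ratio)].append(profile_size)
    return {g: (len(v), sum(v) / len(v)) for g, v in sizes.items()}


# Toy users, not taken from ML1M.
toy_users = {"u1": (0.1, 300), "u2": (0.2, 400), "u3": (0.3, 100), "u4": (0.9, 50)}
print(summarise(toy_users))
```

Note that the boundary values 0.2 and 0.5 fall into the Low and High groups, respectively, because the Moderate interval is open on both sides.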
19.6 Algorithmic Propagation of Blockbuster Bias
Section 19.4 shows a strong bias towards blockbuster items in the rating data; a few items have high blockbuster scores while many others have relatively low ones. In this section, we analyze how different CF algorithms propagate such a bias into their produced recommendations. In doing so, we first investigate their general performance without considering how they perform for different user groups. We employ four prominent CF algorithms from two different families (i.e., neighborhood-based and matrix factorization-based): UserKNN, ItemKNN, SVD, and SVD++ [26, 27]. When applying these algorithms, we follow the well-known leave-one-out cross-validation methodology [28]. Accordingly, we treat one active user as the test set and all remaining users as the training set, and then produce predictions for the active user on all items by running one of these CF algorithms on the training set. This process is repeated for each user in the dataset. Finally, for each individual, we sort the items in descending order of their predicted ratings and select the top-N items as the recommendation list. Here, we consider two different N values, 5 and 10, to scrutinize how the recommendation list size influences the potential blockbuster bias in the recommendations. Under these settings, Figs. 19.6 and 19.7 sketch the correlation between the blockbuster score of an item and the number of times these CF algorithms recommend it in the top-5 and top-10 recommendation lists, respectively. As can be seen in Figs. 19.6 and 19.7, many items are almost never recommended by any of the utilized CF algorithms in either the top-5 or the top-10 recommendation lists. These are the items falling on the horizontal tail of the scatter plot.
On the other hand, a few items with higher blockbuster scores appear too frequently in the produced recommendation lists, leading to unqualified ranked lists dominated by such blockbuster items. In addition, the matrix factorization-based CF algorithms, i.e., SVD and SVD++, show the strongest correlation between the blockbuster score of an item and the number of times it is suggested, followed by ItemKNN and UserKNN, which also show a strong correlation. From these observations, we conclude that the most prominent CF algorithms are strongly biased towards blockbuster items and that this bias degrades the beyond-accuracy quality of the recommendations, such as diversity and coverage. Note that these findings also address RQ3.
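The list-construction and counting steps behind Figs. 19.6 and 19.7 can be sketched as follows, assuming the predicted ratings have already been produced by some CF algorithm. The function names and the toy predictions are assumptions; ties are broken by dictionary order here, which the chapter does not specify.

```python
from collections import Counter


def top_n(predictions, n):
    """predictions: {item: predicted rating} for one user -> list of the n best items."""
    return sorted(predictions, key=predictions.get, reverse=True)[:n]


def recommendation_counts(all_predictions, n):
    """Count how often each item appears in the users' top-n lists."""
    counts = Counter()
    for user_predictions in all_predictions.values():
        counts.update(top_n(user_predictions, n))
    return counts


# Toy predicted ratings for two users (hypothetical values).
all_predictions = {
    "u1": {"A": 4.8, "B": 3.1, "C": 4.0},
    "u2": {"A": 4.5, "B": 4.9, "C": 2.0},
}
print(recommendation_counts(all_predictions, n=2))
```

Plotting these counts against the blockbuster score of each item gives the kind of scatter plot described in this section.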
19.6.1 Blockbuster Bias in Recommendations for Different User Groups
To address RQ4, we also evaluate how well the CF algorithms keep the ratio of blockbuster and non-blockbuster items suggested to different groups of users close to the ratio those users desire. To accomplish this task, we propose the Group Average Blockbuster ($GAB(g)$) metric, which measures the average blockbuster score of the items in the profiles of the individuals in a particular group $g$ or in their recommendation lists. From the perspective of the recommendations produced for a given user group, $GAB(g)_r$ measures the average blockbuster score of the items recommended to the individuals in that group and is calculated using Eq. 19.4. From the view of users’ profiles, on the other hand, $GAB(g)_p$ measures the average blockbuster score of the items evaluated by the individuals in that group and is computed by Eq. 19.5.

$$GAB(g)_r = \frac{\sum_{u \in g} \frac{\sum_{i \in N_u} B_i}{|N_u|}}{|g|} \tag{19.4}$$

$$GAB(g)_p = \frac{\sum_{u \in g} \frac{\sum_{i \in p_u} B_i}{|p_u|}}{|g|} \tag{19.5}$$

where $g$ represents the group of users (in our case, High, Moderate, or Low) and $B_i$ denotes the blockbuster score of a particular item $i$, calculated by Eq. 19.3.
Fig. 19.6 The correlation between the blockbuster score of the items and the number of times they have been recommended by different CF algorithms for Top-5 recommendations
Fig. 19.7 The correlation between the blockbuster score of the items and the number of times they have been recommended by different CF algorithms for Top-10 recommendations
Also, $N_u$ is the recommendation list produced for user $u$, and $p_u$ is the list of the items in the profile of user $u$. For each utilized CF algorithm, we measure the relative change from $GAB(g)_p$ to $GAB(g)_r$, which quantifies the unwanted blockbuster bias imposed by the CF algorithm on each defined group, as in Eq. 19.6. Accordingly, $\Delta GAB = 0$ indicates a fair representation of users’ propensities towards blockbuster items in the recommendations. Positive $\Delta GAB$ values indicate more blockbuster bias in the produced recommendations than in the user profiles, while negative $\Delta GAB$ values mean less blockbuster bias in the produced recommendations than in the user profiles. Note also that the proposed $\Delta GAB$ metric is inspired by Abdollahpouri et al. [22].

$$\Delta GAB = \frac{GAB(g)_r - GAB(g)_p}{GAB(g)_p} \tag{19.6}$$

Figure 19.8 depicts the change in the group average blockbuster ($\Delta GAB$) for the three defined user groups using various CF algorithms for both the top-5 and top-10 recommendation lists. As shown in Fig. 19.8, for both list sizes, users in the Low group have positive and the largest $\Delta GAB$ values for all utilized CF algorithms. This observation means that the utilized CF algorithms are highly unfair to such users, since they receive recommendations that are much more blockbuster-oriented than their profiles. On the other hand, the $\Delta GAB$ values of users in the High group usually vary around 0, meaning that they get recommendations with almost the same blockbuster level as their profiles. Also, for all defined user groups, the matrix factorization-based CF algorithms, i.e., SVD and SVD++, generally impose the highest $\Delta GAB$ values.
Fig. 19.8 ΔGAB results of different CF algorithms in user groups
These findings show the unfair nature of blockbuster bias in the recommendations and how its impacts on different individuals vary according to how interested they are in blockbuster items.
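The relative change in Eq. 19.6 can be illustrated with a minimal Python sketch. The function name and the group values below are hypothetical and chosen only for illustration; they are not results from the experiments.

```python
def delta_gab(gab_recs: float, gab_profiles: float) -> float:
    """Relative change of the group average blockbuster score (Eq. 19.6).

    gab_recs     -- GAB(g)_r, measured on the recommendation lists of group g
    gab_profiles -- GAB(g)_p, measured on the profiles of group g
    """
    if gab_profiles == 0:
        raise ValueError("GAB(g)_p must be non-zero")
    return (gab_recs - gab_profiles) / gab_profiles


# Hypothetical (GAB(g)_r, GAB(g)_p) pairs for the three user groups.
groups = {"Low": (0.30, 0.10), "Moderate": (0.36, 0.30), "High": (0.50, 0.50)}
changes = {name: delta_gab(r, p) for name, (r, p) in groups.items()}
```

With these made-up inputs the Low group shows the largest positive change and the High group shows none, which is the qualitative pattern described for Fig. 19.8.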
19.7 Conclusion and Future Work Because of the biases in rating data towards blockbuster items, various recommendation algorithms usually propagate that bias by suggesting blockbuster items too frequently while not giving enough chance to non-blockbuster ones. In this paper, we evaluate such a problem from a user perspective and comprehensively analyze different inclinations of individuals towards blockbuster items. Also, we further define three different groups of users, i.e., High, Moderate, and Low, based on the degree of their interest towards blockbuster items and explore how the blockbuster bias in recommendations differently affects these groups. Our experiments show that four prominent CF algorithms usually recommend items that
are much more blockbuster than what the individuals in those groups (especially in the Low and Moderate groups) have evaluated. Our future directions include employing more datasets for our analysis. We will also explore other aspects of the user groups' satisfaction, such as how diverse and how relevant their recommendations are. Acknowledgements This work is supported by the Scientific Research Project Fund of Sivas Cumhuriyet University under grant no. M-2021-811.
References
1. Bobadilla J, Ortega F, Hernando A, Gutiérrez A (2013) Recommender systems survey. Knowl-Based Syst 46:109–132
2. Chen R, Hua Q, Chang YS, Wang B, Zhang L, Kong X (2018) A survey of collaborative filtering-based recommender systems: from traditional methods to hybrid methods based on social networks. IEEE Access 6:64301–64320
3. Batmaz Z, Yurekli A, Bilge A, Kaleli C (2019) A review on deep learning for recommender systems: challenges and remedies. Artif Intell Rev 52(1):1–37
4. Chen J, Dong H, Wang X, Feng F, Wang M, He X (2020) Bias and debias in recommender system: a survey and future directions. arXiv preprint arXiv:2010.03240
5. Yalcin E, Bilge A (2021) Investigating and counteracting popularity bias in group recommendations. Inf Proc Manage 58(5):102608
6. Abdollahpouri H, Burke R, Mobasher B (2017) Controlling popularity bias in learning-to-rank recommendation. In: Proceedings of the eleventh ACM conference on recommender systems, pp 42–46
7. Boratto L, Fenu G, Marras M (2021) Connecting user and item perspectives in popularity debiasing for collaborative recommendation. Inf Proc Manage 58(1):102387
8. Jannach D, Lerche L, Kamehkhosh I, Jugovac M (2015) What recommenders recommend: an analysis of recommendation biases and possible countermeasures. User Model User-Adap Inter 25(5):427–491
9. Ekstrand MD, Tian M, Azpiazu IM, Ekstrand JD, Anuyah O, McNeill D, Pera MS (2018) All the cool kids, how do they fit in?: popularity and demographic biases in recommender evaluation and effectiveness. In: Conference on fairness, accountability and transparency. PMLR, pp 172–186
10. Chen C, Zhang M, Liu Y, Ma S (2018) Missing data modeling with user activity and item popularity in recommendation. In: Asia information retrieval symposium. Springer, pp 113–125
11. Abdollahpouri H, Burke R, Mobasher B (2018) Popularity-aware item weighting for long-tail recommendation. arXiv preprint arXiv:1802.05382
12. Cremonesi P, Garzotto F, Negro S, Papadopoulos AV, Turrin R (2011) Looking for "good" recommendations: a comparative evaluation of recommender systems. In: IFIP conference on human-computer interaction. Springer, pp 152–168
13. Vargas S, Baltrunas L, Karatzoglou A, Castells P (2014) Coverage, redundancy and size-awareness in genre diversity for recommender systems. In: Proceedings of the 8th ACM conference on recommender systems, pp 209–216
14. Kunaver M, Požrl T (2017) Diversity in recommender systems-a survey. Knowl-Based Syst 123:154–162
15. Jannach D, Kamehkhosh I, Bonnin G (2016) Biases in automated music playlist generation: a comparison of next-track recommending techniques. In: Proceedings of the 2016 conference on user modeling adaptation and personalization, pp 281–285
16. Kowald D, Schedl M, Lex E (2020) The unfairness of popularity bias in music recommendation: a reproducibility study. In: European conference on information retrieval. Springer, pp 35–42
17. Sánchez P (2019) Exploiting contextual information for recommender systems oriented to tourism. In: Proceedings of the 13th ACM conference on recommender systems, pp 601–605
18. Boratto L, Fenu G, Marras M (2019) The effect of algorithmic bias on recommender systems for massive open online courses. In: European conference on information retrieval. Springer, pp 457–472
19. Abdollahpouri H, Burke R, Mobasher B (2019) Managing popularity bias in recommender systems with personalized re-ranking. In: The thirty-second international FLAIRS conference
20. Steck H (2018) Calibrated recommendations. In: Proceedings of the 12th ACM conference on recommender systems, pp 154–162
21. Zhu Z, Hu X, Caverlee J (2018) Fairness-aware tensor-based recommendation. In: Proceedings of the 27th ACM international conference on information and knowledge management, pp 1153–1162
22. Abdollahpouri H, Mansoury M, Burke R, Mobasher B (2019) The unfairness of popularity bias in recommendation. arXiv preprint arXiv:1907.13286
23. Yalcin E (2021) Blockbuster: a new perspective on popularity-bias in recommender systems. In: 2021 6th international conference on computer science and engineering (UBMK). IEEE, pp 107–112
24. Yalcin E, Ismailoglu F, Bilge A (2021) An entropy empowered hybridized aggregation technique for group recommender systems. Expert Syst Appl 166:114111
25. Sanders R (1987) The Pareto principle: its use and abuse. J Serv Market 1(2):37–40
26. Koren Y (2010) Factor in the neighbors: scalable and accurate collaborative filtering. ACM Trans Knowl Disc Data (TKDD) 4(1):1–24
27. Koren Y (2008) Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, pp 426–434
28. Vehtari A, Gelman A, Gabry J (2017) Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput 27(5):1413–1432
Chapter 20
Improved Gradient-Based Optimizer with Dynamic Fitness Distance Balance for Global Optimization Problems
Durdane Ayşe Taşci, Hamdi Tolga Kahraman, Mehmet Kati, and Cemal Yilmaz
D. A. Taşci (B) · H. T. Kahraman
Software Engineering of Technology Faculty, Karadeniz Technical University, 61080 Trabzon, Turkey
M. Kati
HAVELSAN, Ankara, Turkey
C. Yilmaz
Mingachevir State University, Mingachevir, Azerbaijan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_20
20.1 Introduction One of the first studies on evolutionary-based metaheuristic search (MHS) algorithms is the development of the Genetic Algorithm (GA) [1]. Differential Evolution (DE) [2], harmony search (HS) [3], simulated annealing (SA) [4], ant colony optimization (ACO) [5], particle swarm optimization (PSO) [6], artificial bee colony (ABC) [7], Cuckoo Search (CS) [8], and the gravitational search algorithm (GSA) [9] are among the best-known heuristic search methods. A number of powerful metaheuristic search algorithms have been introduced in recent years. LSHADE [10], EBOCMAR [11], BSA [12], WDE [13], COA [14], SFS [15], AGDE [16], TLABC [17], MRFO [18], AEO [19], the Chameleon Swarm Algorithm (CSA) [20], the capuchin search algorithm (CapSA) [21], and the adaptive opposition slime mould algorithm (AOSMA) [22] are some of these up-to-date and powerful algorithms. However, no single algorithm can yet be called the strongest, because none outperforms its competitors on all problems. Metaheuristic search algorithms perform differently depending on the problem. In addition, some algorithms converge more efficiently than their competitors. Two main factors determine the search performance of algorithms. These are the guide selection process and the
exploitation-exploration processes. Guide selection determines the reference points that the algorithms follow in the search space and hence the direction of the new positions. The guides used in the exploitation-exploration operators of the algorithms, that is, in the convergence formulas, have a great influence on the outcome of the search process. An effective guide selection method introduced in recent years is Fitness-Distance Balance (FDB) [23]. The FDB-based stochastic fractal search (FDB-SFS) [24], FDB-based Adaptive Guided Differential Evolution (FDB-AGDE) [25], Lévy flight and FDB-based coyote optimization algorithm (LRFDB-COA) [26], and Supply-Demand-Based Optimization Algorithm with FDB (FDB-SDO) [27] algorithms were developed using the FDB selection method. The dynamic FDB (dFDB) [28] method was introduced to dynamically adjust the fitness and distance effects in the FDB method and to improve the balanced search performance of MHS algorithms. In this study, research was conducted to improve the search performance of the Gradient Based Optimization (GBO) algorithm [29], which is one of the most up-to-date and powerful MHS algorithms in the literature. To improve the search performance of GBO, the solution candidates that guide the search process were determined using the dynamic fitness-distance balance (dFDB) [28] method. The CEC 2020 benchmark suite [30] was used to test and verify the performance of the developed dFDB-based GBO algorithm. This pool contains ten unconstrained test problems of four different types. To test and verify the performance of the proposed algorithm in search spaces of different types and sizes, the test problems are designed in 30/50/100 dimensions. In addition, five different engineering design problems are used to test and verify the performance of the proposed algorithm on constrained engineering problems. Data from both experimental studies were analyzed using the non-parametric Wilcoxon and Friedman statistical tests. The results show that the dFDB method significantly improves the balanced search capability of the GBO algorithm.
20.2 Related Works In this section, firstly, the design and operation of the Gradient Based Optimization (GBO) algorithm and dynamic FDB method are summarized. Secondly, the guide selection mechanism of the GBO algorithm is designed using the dynamic-FDB (dFDB) method and the pseudocode of the proposed algorithm is presented.
20.2.1 GBO The GBO algorithm is a gradient-based algorithm inspired by Newton's search method. Combining gradient-based and population-based methods, GBO determines the search direction by Newton's method and explores the search space using a set of vectors and two operators. In gradient methods and most other optimization methods, a search direction is chosen and the search process proceeds in that direction towards the optimum solution [29]. Determining the search direction in these methods requires the derivatives of the objective function and of the constraints. This type of optimization has two main disadvantages:
(1) the convergence rate is very slow, and (2) there is no guarantee of reaching the optimal solution [24, 29, 31]. Let the population in GBO be represented by P, the number of solution candidates by n, and the iteration number by m. The solution candidate with the best fitness value is $P_{best}$. The Local Escaping Operator (LEO) is used to improve the performance of the GBO algorithm on complex problems. This operator can significantly change the position of the solution $P_n^{m+1}$. Using the best position ($P_{best}$), the solutions $P1_n^m$ and $P2_n^m$, two random solutions $P_{r1}^m$ and $P_{r2}^m$, and a new randomly generated solution ($P_k^m$), LEO produces a solution with superior performance ($P_{LEO}^m$). Solution $P_{LEO}^m$ is generated by Eqs. (20.1) and (20.2) [29]:

$$P_{LEO}^m = P_n^{m+1} + f_1 \times \left(u_1 \times P_{best} - u_2 \times P_k^m\right) + f_2 \times \rho_1 \times \left(u_3 \times \left(P2_n^m - P1_n^m\right) + u_2 \times \left(P_{r1}^m - P_{r2}^m\right)\right)/2 \tag{20.1}$$

$$P_{LEO}^m = P_{best} + f_1 \times \left(u_1 \times P_{best} - u_2 \times P_k^m\right) + f_2 \times \rho_1 \times \left(u_3 \times \left(P2_n^m - P1_n^m\right) + u_2 \times \left(P_{r1}^m - P_{r2}^m\right)\right)/2 \tag{20.2}$$

In Eqs. (20.1) and (20.2), $f_1$ is a uniform random number in the range [−1, 1], $f_2$ is a random number from the normal distribution with a mean of 0 and a standard deviation of 1, and $u_1$, $u_2$ and $u_3$ are three random numbers defined as given in Eq. (20.3) [29]:

$$u_1 = \begin{cases} 2 \times rand & \text{if } \mu_1 < 0.5 \\ 1 & \text{else} \end{cases}, \quad u_2 = \begin{cases} rand & \text{if } \mu_1 < 0.5 \\ 1 & \text{else} \end{cases}, \quad u_3 = \begin{cases} rand & \text{if } \mu_1 < 0.5 \\ 1 & \text{else} \end{cases} \tag{20.3}$$
In Eq. (20.3), rand is a random number in the range [0, 1] and µ1 is a number in the range [0, 1] [29].
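As a rough illustration of Eqs. (20.1)–(20.3), the following Python sketch builds one LEO candidate. It is a sketch under stated assumptions rather than the reference implementation of [29]: the function and argument names are hypothetical, a single draw of μ1 is shared by u1, u2 and u3, each rand is an independent draw, the two equations are chosen with probability 0.5 each (the rand < 0.5 test listed in Table 20.1), the grouping of terms follows the usual reading of the LEO formula, and the factor written p1 in the text is passed in as `rho1` because its definition is not given here.

```python
import random


def leo_candidate(p_next, p_best, p_k, p1, p2, p_r1, p_r2, rho1, rng):
    """Return one LEO candidate built from Eq. (20.1) or Eq. (20.2).

    All positions are equal-length lists of floats. rho1 is supplied by the
    caller (assumption). rng is a random.Random instance.
    """
    mu1 = rng.random()
    # Eq. (20.3): one shared mu1 draw is an assumption of this sketch.
    u1 = 2.0 * rng.random() if mu1 < 0.5 else 1.0
    u2 = rng.random() if mu1 < 0.5 else 1.0
    u3 = rng.random() if mu1 < 0.5 else 1.0
    f1 = rng.uniform(-1.0, 1.0)   # uniform in [-1, 1]
    f2 = rng.gauss(0.0, 1.0)      # standard normal
    use_next = rng.random() < 0.5  # Eq. (20.1) if True, Eq. (20.2) otherwise

    candidate = []
    for j in range(len(p_best)):
        step = (f1 * (u1 * p_best[j] - u2 * p_k[j])
                + f2 * rho1 * (u3 * (p2[j] - p1[j])
                               + u2 * (p_r1[j] - p_r2[j])) / 2.0)
        base = p_next[j] if use_next else p_best[j]
        candidate.append(base + step)
    return candidate
```

A useful sanity check on the structure: when the best and random positions are all zero and P1 equals P2, both difference terms vanish, so the candidate collapses to the base position of whichever equation was selected.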
20.2.2 Dynamic Fitness-Distance Balance (dFDB) Solution candidates that guide the search process in MHS algorithms should be selected from the population. The most frequently used guide selection method in this process is the greedy one. When the greedy selection method is used, the solution candidates with the best fitness value in the population are selected as a guide. This situation destroys the diversity within the population in a short time. After certain stages of the search process, algorithms cannot provide enough diversity. Considering this situation, the dFDB method has been developed as a guide selection mechanism that supports diversity in the search process of the GBO algorithm. In this method, the effect of the distance is increased to strengthen the diversity of the population, while the effect of the fitness value is increased for a sensitive neighborhood search.
In the MHS algorithm, a population of n vectors of dimension k is represented by P. Let F be the vector representing the fitness values of the solution candidates in the population, and let the best solution candidate in the population be represented by the vector $P_{best}$. According to the dFDB method, Eq. (20.4) is used to calculate the distance of each solution candidate in P from $P_{best}$ [28].

$$\forall_{x=1}^{n}\; D(P_x, P_{best}) \equiv D_{P_x} \leftarrow \sqrt{\left(P_x^1 - P_{best}^1\right)^2 + \dots + \left(P_x^k - P_{best}^k\right)^2} \tag{20.4}$$
The fitness value of the x-th solution candidate in P is $F_x$. The dFDB score ($S_{P_x}$) of the x-th solution candidate, $\forall_{x=1}^{n} F_x \in F$, is calculated using Eq. (20.5) [28]:

$$S_{P_x} = w_{dFDB} \cdot normF_x + (1 - w_{dFDB}) \cdot normD_{P_x} \tag{20.5}$$

In Eq. (20.5), $normF_x$ denotes the normalized fitness values of the vectors in P and $normD_{P_x}$ denotes their normalized distance values. $w_{dFDB}$ is the weighting coefficient that makes the FDB method dynamic. It is calculated as given in Eq. (20.6) [28].

$$w_{dFDB} = \frac{l}{maxFEs} \cdot (1 - lb) + lb \tag{20.6}$$
In Eq. 20.6, the current value of the objective function evaluation number is represented by the parameter l, the maximum number of evaluations of the objective function is represented by the maxFEs parameter, and the minimum value of wd F D B is represented by the lb parameter. According to these explanations, using Eq. (20.5), the dFDB score of each solution candidate in P is calculated and the vector with the largest score is chosen as the guide. When more than one guide needs to be selected, the solution candidates are ranked according to their dFDB scores and a mating pool is created from the ones with the highest scores [28].
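The scoring in Eqs. (20.4)–(20.6) can be sketched in Python as follows. This is an illustrative sketch, not the implementation of [28]: the function names are hypothetical, min-max normalization is assumed, and for minimization the normalized fitness is inverted so that a better (lower) fitness gives a larger score, which is an assumption needed for "largest score wins" to make sense.

```python
import math


def dfdb_weight(l, max_fes, lb):
    """Eq. (20.6): weight grows linearly from lb to 1 as evaluations are used."""
    return (l / max_fes) * (1.0 - lb) + lb


def _min_max(values):
    """Min-max normalization to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def dfdb_scores(population, fitness, l, max_fes, lb):
    """Eq. (20.5) scores for a minimization problem."""
    best = min(range(len(fitness)), key=fitness.__getitem__)
    # Eq. (20.4): Euclidean distance of every candidate to the best one.
    distances = [math.dist(p, population[best]) for p in population]
    norm_d = _min_max(distances)
    # Invert so that lower (better) fitness maps to a higher value (assumption).
    norm_f = [1.0 - v for v in _min_max(fitness)]
    w = dfdb_weight(l, max_fes, lb)
    return [w * f + (1.0 - w) * d for f, d in zip(norm_f, norm_d)]


def select_guide(population, fitness, l, max_fes, lb):
    """Index of the candidate with the largest dFDB score."""
    scores = dfdb_scores(population, fitness, l, max_fes, lb)
    return max(range(len(scores)), key=scores.__getitem__)
```

With a small weight early in the run the distance term dominates and a far-away candidate can be chosen as guide; once l reaches maxFEs the weight is 1 and the choice reduces to the fittest candidate.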
20.2.3 Improved GBO with Dynamic Fitness Distance Balance In this section, the guide selection process in the GBO algorithm is redesigned using the dFDB method. To overcome the premature convergence problem in the GBO algorithm and to provide the intensification-diversification balance, the dFDB selection method was used to determine the vectors that guide the search process. Since the dFDB selection method supports diversity, it is effective in balancing the derivative-based neighborhood search process of the GBO algorithm. Thus, the balance between the neighborhood search and diversity capabilities of the GBO is also improved. Variations with dFDB have been created to improve the performance of the GBO algorithm. All of these variations, designed to be effective in balancing the derivative-based neighborhood search process of the GBO algorithm, were used in the Local
Escaping Operator (LEO) process. The design changes concern the best position in the LEO. Accordingly, the LEO equations of the dFDB-GBO variations and of the base GBO model designed in this study are summarized in Table 20.1.

Table 20.1 LEO equations in GBO [29] and variations of dFDB-GBO (proposed algorithm)

Algorithm: GBO
LEO equations:
if (rand < 0.5)
  $X_{LEO}^m = X_n^{m+1} + f_1 \times (u_1 \times x_{best} - u_2 \times x_k^m) + f_2 \times \rho_1 \times (u_3 \times (X2_n^m - X1_n^m) + u_2 \times (x_{r1}^m - x_{r2}^m))/2$
else
  $X_{LEO}^m = x_{best} + f_1 \times (u_1 \times x_{best} - u_2 \times x_k^m) + f_2 \times \rho_1 \times (u_3 \times (X2_n^m - X1_n^m) + u_2 \times (x_{r1}^m - x_{r2}^m))/2$
end

Algorithm: Case-1
20% of the search process lifecycle uses the LEO equations from GBO; 80% of the search process lifecycle uses the LEO equations arranged as follows with dFDB.
LEO equations:
if (rand < 0.5)
  $X_{LEO}^m = X_{dfdb} + f_1 \times (u_1 \times x_{best} - u_2 \times x_k^m) + f_2 \times \rho_1 \times (u_3 \times (X2_n^m - X1_n^m) + u_2 \times (x_{r1}^m - x_{r2}^m))/2$
else
  $X_{LEO}^m = x_{best} + f_1 \times (u_1 \times x_{dfdb} - u_2 \times x_k^m) + f_2 \times \rho_1 \times (u_3 \times (X2_n^m - X1_n^m) + u_2 \times (x_{r1}^m - x_{r2}^m))/2$
end

Algorithm: Case-2
20% of the search process lifecycle uses the LEO equations from GBO; 80% of the search process lifecycle uses the LEO equations arranged as follows with dFDB.
LEO equations:
if (rand < 0.5)
  $X_{LEO}^m = X_n^{m+1} + f_1 \times (u_1 \times x_{best} - u_2 \times x_k^m) + f_2 \times \rho_1 \times (u_3 \times (X2_n^m - X1_n^m) + u_2 \times (x_{dfdb} - x_{r2}^m))/2$
else
  $X_{LEO}^m = X_{dfdb} + f_1 \times (u_1 \times x_{best} - u_2 \times x_k^m) + f_2 \times \rho_1 \times (u_3 \times (X2_n^m - X1_n^m) + u_2 \times (x_{r1}^m - x_{r2}^m))/2$
end

Algorithm: Case-3
60% and 20% of the search process lifecycle uses the LEO equations from GBO.
LEO equations:
if (rand < 0.5)
  40% of the search process lifecycle uses the LEO equation arranged as follows with dFDB:
  $X_{LEO}^m = X_n^{m+1} + f_1 \times (u_1 \times x_{best} - u_2 \times x_k^m) + f_2 \times \rho_1 \times (u_3 \times (X2_n^m - x_{dfdb}) + u_2 \times (x_{r1}^m - x_{r2}^m))/2$
else
  80% of the search process lifecycle uses the LEO equation arranged as follows with dFDB:
  $X_{LEO}^m = X_{best} + f_1 \times (u_1 \times x_{dfdb} - u_2 \times x_k^m) + f_2 \times \rho_1 \times (u_3 \times (X2_n^m - X1_n^m) + u_2 \times (x_{r1}^m - x_{r2}^m))/2$
end
Each variation given in Table 20.1 corresponds to new designs of guide positions used in the Local Escaping Operator in the GBO algorithm. Table 20.1 shows the changes made on the GBO algorithm and the percentage of the dFDB method applied. In a single variation, the dFDB method was applied to select two design parameters at the same time. The effect of this method on the performance of the algorithm was investigated by conducting a large-scale experimental study and the results are given below.
20.3 Experimental Study This section presents the experimental settings, the benchmark problems, the constrained engineering design problems, and the statistical analyses of the experimental results, in that order.
20.3.1 Settings An experimental study was conducted to test and explore improvements in the neighborhood search and diversity performance of the GBO algorithm. The main purpose of the study is to reveal the effect of the dFDB method on the performance of the base algorithm. Four different types of unconstrained benchmark problems were used in the experimental studies. The performance of the base model of the GBO algorithm and of the dFDB-GBO variations was tested in search spaces of three different dimensions (30/50/100). The following procedures were followed in conducting the experimental studies:
• The conditions defined at the CEC 2020 conference [30] are taken as reference for the experimental settings.
• The parameters of the GBO algorithm, namely the population size and other settings, are set as in its original study [29].
• The termination criterion is the maximum number of objective function evaluations, set to 10,000·d, where d is the problem dimension [30].
• The dynamically resizable CEC 2020 benchmark functions are used to reveal the performance of the proposed method in low-, middle- and high-dimensional search spaces.
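The evaluation budget in the third item can be made concrete with a short Python sketch. The helper names are hypothetical, and the counting wrapper is just one possible way to enforce a budget of 10,000·d evaluations; the chapter does not describe how the budget was tracked.

```python
def max_function_evaluations(dimension, factor=10_000):
    """Budget of objective evaluations: factor * d (10,000 * d in the text)."""
    return factor * dimension


class BudgetedObjective:
    """Wrap an objective function and count calls against a fixed budget."""

    def __init__(self, func, max_fes):
        self.func = func
        self.max_fes = max_fes
        self.evaluations = 0

    @property
    def exhausted(self):
        return self.evaluations >= self.max_fes

    def __call__(self, x):
        if self.exhausted:
            raise RuntimeError("evaluation budget exhausted")
        self.evaluations += 1
        return self.func(x)


# Budgets for the three dimensions used in the experiments.
budgets = {d: max_function_evaluations(d) for d in (30, 50, 100)}
```

A search loop would then run `while not objective.exhausted:` so that every compared algorithm stops after the same number of evaluations.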
Table 20.2 CEC 2020 comparison problems and features [30]

| Problem type | Definition |
|---|---|
| Unimodal | It is used to test the local search capabilities of algorithms |
| Basic | It is used to test the global search (diversity) capabilities of algorithms |
| Hybrid | It is used to investigate the balanced search capabilities of algorithms |
| Composition | It is used to test the performance of algorithms in search spaces with high complexity |
20.3.2 Benchmark Problems There are four types of problems in the CEC 2020 benchmark suite [30]: unimodal, basic, hybrid and composition. Unimodal problems were developed to measure the convergence performance of algorithms, basic (multimodal) problems to measure their diversity performance, hybrid functions to measure both capabilities (convergence and avoidance of traps), and composition functions to measure their balanced search capabilities. The following subsections provide information about the problems. The CEC 2020 benchmark suite includes 1 unimodal, 3 basic, 3 hybrid and 3 composition functions (Table 20.2).
20.3.3 Constrained Engineering Design Problems Five real engineering problems were used to test and verify the performance of the proposed method in constrained engineering design problems. Information about the problems is given in Table 20.3. Then the problems are introduced and their functions are given.

Table 20.3 Constrained engineering design problems and features

| No | Problem | Number of constraints | Number of parameters |
|---|---|---|---|
| P1 | Planetary gear train design optimization problem | 11 | 6 |
| P2 | Blending-pooling-separation problem | 32 | 38 |
| P3 | Propane, isobutane, n-butane nonsharp separation | 38 | 48 |
| P4 | Optimal design of industrial refrigeration system | 15 | 14 |
| P5 | Step-cone pulley problem | 3 | 5 |
20.3.3.1 Planetary Gear Train Design Optimization Problem

The main purpose of this problem is to minimize the errors in the gear ratios used in automobiles. To minimize the maximum error, the total gear-tooth number is calculated for an automatic planetary transmission system. The problem includes six integer variables and 11 geometric and assembly constraints. It can be defined as follows [30].

Objective function:

$$f(x) = \max |i_k - i_{0k}|, \quad k = \{1, 2, \dots, R\},$$

$$i_1 = \frac{N_6}{N_4},\; i_{01} = 3.11, \quad i_2 = \frac{N_6 (N_4 N_3 + N_2 N_4)}{N_1 N_3 (N_6 - N_4)},\; i_{02} = 1.84, \quad i_R = -\frac{N_2 N_6}{N_1 N_3},\; i_{0R} = -3.11,$$

$$x = \{p, N_6, N_5, N_4, N_3, N_2, N_1, m_2, m_1\}$$

Constraints:

$$\begin{aligned}
g_1(x) &= m_3 (N_6 + 2.5) - D_{max} \le 0, \\
g_2(x) &= m_1 (N_1 + N_2) + m_1 (N_2 + 2) - D_{max} \le 0, \\
g_3(x) &= m_3 (N_4 + N_5) + m_3 (N_5 + 2) - D_{max} \le 0, \\
g_4(x) &= |m_1 (N_1 + N_2) - m_3 (N_6 - N_3)| - m_1 - m_3 \le 0, \\
g_5(x) &= -(N_1 + N_2) \sin\frac{\pi}{p} + N_2 + 2 + \delta_{22} \le 0, \\
g_6(x) &= -(N_6 + N_3) \sin\frac{\pi}{p} + N_3 + 2 + \delta_{23} \le 0, \\
g_7(x) &= -(N_4 + N_5) \sin\frac{\pi}{p} + N_5 + 2 + \delta_{55} \le 0, \\
g_8(x) &= (N_3 + N_5 + 2 + \delta_{35})^2 - (N_6 - N_3)^2 - (N_4 + N_5)^2 + 2 (N_6 - N_3)(N_4 + N_5) \cos\left(\frac{2\pi}{p} - \beta\right) \le 0, \\
g_9(x) &= N_4 - N_6 + 2 N_5 + 2 \delta_{56} + 4 \le 0, \\
g_{10}(x) &= 2 N_3 - N_6 + N_4 + 2 \delta_{34} + 4 \le 0, \\
h_1(x) &= \frac{N_6 - N_4}{p} = \text{integer},
\end{aligned}$$

where

$$\delta_{22} = \delta_{33} = \delta_{55} = \delta_{35} = \delta_{56} = 0.5, \quad \beta = \cos^{-1}\left(\frac{(N_4 + N_5)^2 + (N_6 - N_3)^2 - (N_4 + N_5)^2}{2 (N_6 - N_3)(N_4 + N_5)}\right), \quad D_{max} = 220.$$

Bounds:

$$p = (3, 4, 5), \quad m_1 = (1.75, 2.0, 2.25, 2.5, 2.75, 3.0), \quad m_3 = (1.75, 2.0, 2.25, 2.5, 2.75, 3.0),$$

$$17 \le N_1 \le 96, \quad 14 \le N_2 \le 54, \quad 14 \le N_3 \le 51, \quad 17 \le N_4 \le 46, \quad 14 \le N_5 \le 51, \quad 48 \le N_6 \le 124, \quad N_i = \text{integer}.$$
20.3.3.2 Blending-Pooling-Separation Problem

This problem involves a three-component feed mix that is separated into two multicomponent outputs using separators and separation/mixing/pooling. The operating cost of each separator depends linearly on the flow rate of the separator, and the constraints are based on the mass balances around the individual separators, splitters and mixers [30].

Objective function:

$$f(x) = 0.9979 + 0.00432 x_5 + 0.0117 x_{13}$$

Constraints:

$$\begin{aligned}
h_1(x) &= x_4 + x_3 + x_2 + x_1 = 300, \\
h_2(x) &= x_6 - x_8 - x_7 = 0, \\
h_3(x) &= x_9 - x_{11} - x_{10} - x_{12} = 300, \\
h_4(x) &= x_{14} - x_{16} - x_{17} - x_{15} = 0, \\
h_5(x) &= x_{18} - x_{20} - x_{19} = 0, \\
h_6(x) &= x_5 x_{21} - x_6 x_{22} - x_9 x_{23} = 0, \\
h_7(x) &= x_5 x_{24} - x_6 x_{25} - x_9 x_{26} = 0, \\
h_8(x) &= x_5 x_{27} - x_6 x_{28} - x_9 x_{29} = 0, \\
h_9(x) &= x_{13} x_{30} - x_{14} x_{31} - x_{18} x_{32} = 0, \\
h_{10}(x) &= x_{13} x_{33} - x_{14} x_{34} - x_{18} x_{35} = 0, \\
h_{11}(x) &= x_{13} x_{36} - x_{14} x_{37} - x_{18} x_{35} = 0, \\
h_{12}(x) &= 0.333 x_1 + x_{15} x_{31} - x_5 x_{21} = 0, \\
h_{13}(x) &= 0.333 x_1 + x_{15} x_{34} - x_5 x_{24} = 0, \\
h_{14}(x) &= 0.333 x_1 + x_{15} x_{37} - x_5 x_{27} = 0, \\
h_{15}(x) &= 0.333 x_2 + x_{10} x_{23} - x_{13} x_{30} = 0, \\
h_{16}(x) &= 0.333 x_2 + x_{10} x_{26} - x_{13} x_{33} = 0, \\
h_{17}(x) &= 0.333 x_2 + x_{10} x_{29} - x_{13} x_{36} = 0, \\
h_{18}(x) &= 0.333 x_3 - x_7 x_{22} + x_{11} x_{23} - x_{16} x_{31} + x_{19} x_{32} = 30, \\
h_{19}(x) &= 0.333 x_3 - x_7 x_{25} + x_{11} x_{26} - x_{16} x_{34} + x_{19} x_{35} = 50, \\
h_{20}(x) &= 0.333 x_3 - x_7 x_{28} + x_{11} x_{29} - x_{16} x_{37} + x_{19} x_{38} = 30, \\
h_{21}(x) &= x_{21} + x_{24} + x_{27} = 1, \\
h_{22}(x) &= x_{22} + x_{25} + x_{28} = 1, \\
h_{23}(x) &= x_{23} + x_{26} + x_{29} = 1, \\
h_{24}(x) &= x_{30} + x_{33} + x_{36} = 1, \\
h_{25}(x) &= x_{31} + x_{34} + x_{37} = 1, \\
h_{26}(x) &= x_{32} + x_{35} + x_{38} = 1, \\
h_{27}(x) &= x_{25} = 0, \\
h_{28}(x) &= x_{28} = 0, \\
h_{29}(x) &= x_{23} = 0, \\
h_{30}(x) &= x_{37} = 0, \\
h_{31}(x) &= x_{32} = 0, \\
h_{32}(x) &= x_{35} = 0.
\end{aligned}$$

Bounds:

$$\begin{aligned}
&0 \le x_1, x_3, x_8, x_9, x_5, x_6, x_{14}, x_{18}, x_{10}, x_{16}, x_{13}, x_{20} \le 90, \\
&0 \le x_2, x_4, x_7, x_{11}, x_{12}, x_{15}, x_{17}, x_{19} \le 150, \\
&0 \le x_{21}, x_{23}, x_{24}, x_{25}, x_{27}, x_{28} \le 1, \\
&0 \le x_{22}, x_{32}, x_{34}, x_{35}, x_{37}, x_{38} \le 1.2, \\
&0 \le x_{26}, x_{29}, x_{30}, x_{31}, x_{33}, x_{36} \le 0.5.
\end{aligned}$$
20.3.3.3 Propane, Isobutane, n-Butane Nonsharp Separation

This test problem involves a three-component feed mix that has to be separated into two three-component products. The problem is defined as a nonlinear constrained optimization problem and has the following form [30].

Objective function:

$$f(x) = c_{11} + (c_{21} + c_{31} x_{24} + c_{41} x_{28} + c_{51} x_{33} + c_{61} x_{34}) x_5 + c_{12} + (c_{22} + c_{32} x_{26} + c_{42} x_{31} + c_{52} x_{38} + c_{62} x_{39}) x_{13},$$

where the coefficients are:

| c | i = 1 | i = 2 |
|---|---|---|
| c1i | 0.23947 | 0.75835 |
| c2i | -0.0139904 | -0.0661588 |
| c3i | 0.0093514 | 0.0338147 |
| c4i | 0.0077308 | 0.373349 |
| c5i | -0.0005719 | 0.0016371 |
| c6i | 0.0042656 | 0.0288996 |

Constraints:

$$\begin{aligned}
h_1(x) &= x_4 + x_3 + x_2 + x_1 = 300, \\
h_2(x) &= x_6 - x_8 - x_7 = 0, \\
h_3(x) &= x_9 - x_{12} - x_{10} - x_{11} = 0, \\
h_4(x) &= x_{14} - x_{17} - x_{15} - x_{16} = 0, \\
h_5(x) &= x_{18} - x_{20} - x_{19} = 0, \\
h_6(x) &= x_6 x_{21} - x_{24} x_{25} = 0, \\
h_7(x) &= x_{14} x_{22} - x_{26} x_{27} = 0, \\
h_8(x) &= x_9 x_{23} - x_{28} x_{29} = 0, \\
h_9(x) &= x_{18} x_{30} - x_{31} x_{32} = 0, \\
h_{10}(x) &= x_{25} - x_5 x_{33} = 0, \\
h_{11}(x) &= x_{29} - x_5 x_{33} = 0, \\
h_{12}(x) &= x_{35} - x_5 x_{36} = 0, \\
h_{13}(x) &= x_{37} - x_{13} x_{38} = 0, \\
h_{14}(x) &= x_{27} - x_{13} x_{39} = 0, \\
h_{15}(x) &= x_{32} - x_{13} x_{40} = 0, \\
h_{16}(x) &= x_{25} - x_6 x_{21} - x_9 x_{41} = 0, \\
h_{17}(x) &= x_{29} - x_6 x_{42} - x_9 x_{23} = 0, \\
h_{18}(x) &= x_{35} - x_6 x_{43} - x_9 x_{44} = 0, \\
h_{19}(x) &= x_{37} - x_{14} x_{45} - x_{18} x_{46} = 0, \\
h_{20}(x) &= x_{27} - x_{14} x_{22} - x_{18} x_{47} = 0, \\
h_{21}(x) &= x_{32} - x_{14} x_{48} - x_{18} x_{30} = 0, \\
h_{22}(x) &= 0.333 x_1 + x_{15} x_{45} - x_{25} = 0, \\
h_{23}(x) &= 0.333 x_1 + x_{15} x_{22} - x_{29} = 0, \\
h_{24}(x) &= 0.333 x_1 + x_{15} x_{48} - x_{35} = 0, \\
h_{25}(x) &= 0.333 x_2 + x_{10} x_{41} - x_{37} = 0, \\
h_{26}(x) &= 0.333 x_2 + x_{10} x_{23} - x_{27} = 0, \\
h_{27}(x) &= 0.333 x_2 + x_{10} x_{44} - x_{32} = 0, \\
h_{28}(x) &= 0.333 x_3 + x_7 x_{21} + x_{11} x_{41} + x_{16} x_{45} + x_{19} x_{46} = 30, \\
h_{29}(x) &= 0.333 x_3 + x_7 x_{22} + x_{11} x_{23} + x_{16} x_{22} + x_{19} x_{47} = 50, \\
h_{30}(x) &= 0.333 x_3 + x_7 x_{43} + x_{11} x_{44} + x_{16} x_{48} + x_{19} x_{30} = 30, \\
h_{31}(x) &= x_{33} + x_{34} + x_{36} = 1, \\
h_{32}(x) &= x_{21} + x_{42} + x_{43} = 1, \\
h_{33}(x) &= x_{41} + x_{23} + x_{44} = 1, \\
h_{34}(x) &= x_{38} + x_{39} + x_{40} = 1, \\
h_{35}(x) &= x_{45} + x_{22} + x_{48} = 1, \\
h_{36}(x) &= x_{46} + x_{47} + x_{30} = 1, \\
h_{37}(x) &= x_{43} = 0, \\
h_{38}(x) &= x_{46} = 0.
\end{aligned}$$

Bounds:

$$\begin{aligned}
&0 \le x_1, \dots, x_{20} \le 150; \\
&0 \le x_{25}, x_{27}, x_{32}, x_{35}, x_{37}, x_{29} \le 30; \\
&0 \le x_{21}, x_{22}, x_{23}, x_{30}, x_{33}, x_{34}, x_{36}, x_{37}, x_{38}, x_{39}, x_{40}, x_{41}, x_{42}, x_{43}, x_{44}, x_{45} \le 1; \\
&0 \le x_{46}, x_{47}, x_{48} \le 1; \\
&0.85 \le x_{24}, x_{26}, x_{28}, x_{31} \le 1.
\end{aligned}$$
20.3.3.4 Optimal Design of Industrial Refrigeration System

The mathematical model of this problem is explained in [32–34]. This problem can be formulated as a nonlinear inequality-constrained optimization problem and has the following form [30, 34].

Minimize:

$$\begin{aligned}
f(x) ={}& 63098.88 x_2 x_4 x_{12} + 5441.5 x_2^2 x_{12} + 115055.5 x_2^{1.664} x_6 + 6172.27 x_2^2 x_6 \\
&+ 63098.88 x_1 x_3 x_{11} + 5441.5 x_1^2 x_{11} + 115055.5 x_1^{1.664} x_5 + 6172.27 x_1^2 x_5 \\
&+ 140.53 x_1 x_{11} + 281.29 x_3 x_{11} + 70.26 x_1^2 + 281.29 x_1 x_3 + 281.29 x_3^2 \\
&+ 14437 x_8^{1.8812} x_{12}^{0.3424} x_{10} x_{14}^{-1} x_1^2 x_7 x_9^{-1} + 20470.2 x_7^{2.893} x_{11}^{0.316} x_1^2
\end{aligned}$$

Constraints:

$$\begin{aligned}
g_1(x) &= 1.524 x_7^{-1} \le 1, \\
g_2(x) &= 1.524 x_8^{-1} \le 1, \\
g_3(x) &= 0.07789 x_1 - 2 x_7^{-1} x_9 - 1 \le 0, \\
g_4(x) &= 7.05305 x_9^{-1} x_1^2 x_{10} x_8^{-1} x_2^{-1} x_{14}^{-1} - 1 \le 0, \\
g_5(x) &= 0.0833 x_{13}^{-1} x_{14} - 1 \le 0, \\
g_6(x) &= 47.136 x_2^{0.333} x_{10}^{-1} x_{12} - 1.333 x_8 x_{13}^{2.1195} + 62.08 x_{13}^{2.1195} x_{12}^{-1} x_8^{0.2} x_{10}^{-1} - 1 \le 0, \\
g_7(x) &= 0.04771 x_{10} x_8^{1.8812} x_{12}^{0.3424} - 1 \le 0, \\
g_8(x) &= 0.0488 x_9 x_7^{1.893} x_{11}^{0.316} - 1 \le 0, \\
g_9(x) &= 0.0099 x_1 x_3^{-1} - 1 \le 0, \\
g_{10}(x) &= 0.0193 x_2 x_4^{-1} - 1 \le 0, \\
g_{11}(x) &= 0.0298 x_1 x_5^{-1} - 1 \le 0, \\
g_{12}(x) &= 0.056 x_2 x_6^{-1} - 1 \le 0, \\
g_{13}(x) &= 2 x_9^{-1} - 1 \le 0, \\
g_{14}(x) &= 2 x_{10}^{-1} - 1 \le 0, \\
g_{15}(x) &= x_{12} x_{11}^{-1} - 1 \le 0.
\end{aligned}$$

Bounds:

$$0.001 \le x_i \le 5, \quad i = 1, \dots, 14.$$
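To show how such a formulation is typically evaluated inside a search algorithm, the sketch below codes the refrigeration objective and its 15 inequalities in Python. The function names are hypothetical, the exponent placement follows one reading of the formulas listed for this problem, and g1 and g2 are rewritten in the form g(x) ≤ 0 by subtracting 1; none of this is taken from the authors' code.

```python
def refrigeration_objective(x):
    """Objective value for a 14-element design vector (x1..x14)."""
    x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14 = x
    return (63098.88 * x2 * x4 * x12 + 5441.5 * x2**2 * x12
            + 115055.5 * x2**1.664 * x6 + 6172.27 * x2**2 * x6
            + 63098.88 * x1 * x3 * x11 + 5441.5 * x1**2 * x11
            + 115055.5 * x1**1.664 * x5 + 6172.27 * x1**2 * x5
            + 140.53 * x1 * x11 + 281.29 * x3 * x11 + 70.26 * x1**2
            + 281.29 * x1 * x3 + 281.29 * x3**2
            + 14437 * x8**1.8812 * x12**0.3424 * x10 / x14 * x1**2 * x7 / x9
            + 20470.2 * x7**2.893 * x11**0.316 * x1**2)


def refrigeration_constraints(x):
    """Values of g1..g15, each written so that feasibility means g <= 0."""
    x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14 = x
    return [
        1.524 / x7 - 1.0,                       # g1, shifted from "<= 1"
        1.524 / x8 - 1.0,                       # g2, shifted from "<= 1"
        0.07789 * x1 - 2.0 * x9 / x7 - 1.0,
        7.05305 * x1**2 * x10 / (x9 * x8 * x2 * x14) - 1.0,
        0.0833 * x14 / x13 - 1.0,
        (47.136 * x2**0.333 * x12 / x10 - 1.333 * x8 * x13**2.1195
         + 62.08 * x13**2.1195 * x8**0.2 / (x12 * x10) - 1.0),
        0.04771 * x10 * x8**1.8812 * x12**0.3424 - 1.0,
        0.0488 * x9 * x7**1.893 * x11**0.316 - 1.0,
        0.0099 * x1 / x3 - 1.0,
        0.0193 * x2 / x4 - 1.0,
        0.0298 * x1 / x5 - 1.0,
        0.056 * x2 / x6 - 1.0,
        2.0 / x9 - 1.0,
        2.0 / x10 - 1.0,
        x12 / x11 - 1.0,
    ]


def is_feasible(x, lower=0.001, upper=5.0, tol=1e-9):
    """True when x is inside the bounds and violates no inequality."""
    in_bounds = all(lower <= v <= upper for v in x)
    return in_bounds and all(g <= tol for g in refrigeration_constraints(x))
```

How constraint violations are folded into the fitness (for example by a penalty term or a feasibility rule) is not specified in this section, so the sketch only reports them.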
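20.3.3.5 Step-Cone Pulley Problem

The main objective of this problem is to minimize the weight of the 4-step cone pulley using five variables, four of which are the diameters of the pulley steps and the last of which is the width of the pulley. This problem includes 11 non-linear constraints to ensure that the transmitted power is 0.75 hp. The mathematical formulation of this problem can be defined as follows [30].

Minimize:

$$f(x) = \rho \omega \left[ d_1^2 \left\{1 + \left(\frac{N_1}{N}\right)^2\right\} + d_2^2 \left\{1 + \left(\frac{N_2}{N}\right)^2\right\} + d_3^2 \left\{1 + \left(\frac{N_3}{N}\right)^2\right\} + d_4^2 \left\{1 + \left(\frac{N_4}{N}\right)^2\right\} \right]$$

Constraints:

$$\begin{aligned}
h_1(x) &= C_1 - C_2 = 0, \\
h_2(x) &= C_1 - C_3 = 0, \\
h_3(x) &= C_1 - C_4 = 0, \\
g_{i=1,2,3,4}(x) &= -R_i \le 2, \\
g_{i=5,6,7,8}(x) &= (0.75 \times 745.6998) - P_i \le 0,
\end{aligned}$$

where

$$C_i = \frac{\pi d_i}{2} \left(1 + \frac{N_i}{N}\right) + \frac{\left(\frac{N_i}{N} - 1\right)^2}{4a} + 2a, \quad i = (1, 2, 3, 4),$$

$$R_i = \exp\left(\mu \left\{\pi - 2 \sin^{-1}\left[\left(\frac{N_i}{N} - 1\right) \frac{d_i}{2a}\right]\right\}\right), \quad i = (1, 2, 3, 4),$$

$$P_i = s t \omega (1 - R_i) \frac{\pi d_i N_i}{60}, \quad i = (1, 2, 3, 4),$$

$$t = 8\ \text{mm}, \quad s = 1.75\ \text{MPa}, \quad \mu = 0.35, \quad \rho = 7200\ \text{kg/m}^3, \quad a = 3\ \text{mm}.$$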
20.4 Analysis Results The results of the experimental studies, comparing the base algorithm with the dFDB variations, are given in the subsections below. Two tests (Friedman and Wilcoxon) were applied to the algorithm and its variations. The Friedman test compares the base algorithm and its variations and ranks them according to their performance. The Wilcoxon test, on the other hand, compares each variation pairwise with the GBO base algorithm and determines the win, draw and loss counts.
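The ranking step behind a Friedman-style comparison can be sketched in a few lines of Python. The helper names and the error values are hypothetical; the sketch only computes mean ranks (lower error gets the better rank, ties share the average rank) and does not compute the Friedman test statistic or any p-value.

```python
def average_ranks(row):
    """Rank the values in row (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(row)), key=row.__getitem__)
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def friedman_mean_ranks(errors):
    """errors[problem][algorithm] -> mean rank of each algorithm."""
    n_algorithms = len(errors[0])
    totals = [0.0] * n_algorithms
    for row in errors:
        for a, r in enumerate(average_ranks(row)):
            totals[a] += r
    return [t / len(errors) for t in totals]


# Hypothetical error values: two problems, four algorithms.
errors = [[3.0, 1.0, 2.0, 2.0],
          [4.0, 2.0, 1.0, 3.0]]
mean_ranks = friedman_mean_ranks(errors)
```

For k algorithms the mean ranks always sum to k(k + 1)/2, which is a quick consistency check on any reported ranking table.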
20.4.1 Statistical Analysis Results Table 20.4 shows the rankings of the three different dFDB-GBO variations and the base model of the GBO algorithm according to the Friedman test method. Rankings were obtained using data from studies performed for the 30/50/100 dimensions of problems in the CEC 2020 benchmark suite. In Table 20.4, the averages of dFDB-GBO variations according to the Friedman test result are given in order. When we examine the ranks given in Table 20.4, the superiority of the dFDB-GBO variations is seen. Case-2 has the best average of all 30/50/100 dimensions. For this reason, the algorithm with the best Friedman score of the three algorithms is Case-2. Wilcoxon pairwise comparison results between the base model of the GBO algorithm and the dFDB-GBO variations are as presented in Table 20.5. When the pairwise comparison results conducted on ten problems in the CEC 2020 test suite are
Table 20.4 Ranks of algorithms according to Friedman test

| Algorithms | Dimension = 30 | Dimension = 50 | Dimension = 100 | Mean rank |
|---|---|---|---|---|
| Case-2 | 2.250 | 2.289 | 2.261 | 2.267 |
| Case-1 | 2.470 | 2.375 | 2.410 | 2.418 |
| Case-3 | 2.459 | 2.540 | 2.481 | 2.493 |
| GBO | 2.819 | 2.795 | 2.846 | 2.820 |
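Table 20.5 Wilcoxon pairwise comparison results

| Vs. GBO | Dimension = 30 (+) | Dimension = 30 (=) | Dimension = 30 (−) | Dimension = 50 (+) | Dimension = 50 (=) | Dimension = 50 (−) | Dimension = 100 (+) | Dimension = 100 (=) | Dimension = 100 (−) |
|---|---|---|---|---|---|---|---|---|---|
| Case-1 | 5 | 5 | 0 | 6 | 4 | 0 | 4 | 6 | 0 |
| Case-2 | 5 | 5 | 0 | 3 | 7 | 0 | 3 | 7 | 0 |
| Case-3 | 4 | 6 | 0 | 4 | 6 | 0 | 3 | 7 | 0 |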
examined, it is seen that the dFDB-GBO variations are superior to the GBO base algorithm. None of the dFDB-GBO variations lost to GBO on any problem. These results indicate that the dFDB method is effective in improving the performance of the GBO algorithm. In 30 dimensions, Case-1 outperformed the base algorithm (GBO) in 5 of the 10 problems and performed similarly in the other 5. In 50 dimensions, Case-1 outperformed the base algorithm in 6 of the 10 problems and performed similarly in 4 of them. In 100 dimensions, Case-1 had the advantage in 4 of the 10 problems and performed similarly in the remaining 6.
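The mean ranks reported by the Friedman test can be illustrated with a short sketch: on each problem the algorithms are ranked by error (1 = best, ties share the average rank), and the ranks are then averaged over all problems. The error values below are hypothetical and the function names are illustrative; this is not the authors' implementation.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0  # average of the 1-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def friedman_mean_ranks(errors):
    """errors[p][a] is the error of algorithm a on problem p."""
    n_algorithms = len(errors[0])
    totals = [0.0] * n_algorithms
    for row in errors:
        for a, rank in enumerate(average_ranks(row)):
            totals[a] += rank
    return [total / len(errors) for total in totals]


names = ["GBO", "Case-1", "Case-2", "Case-3"]
errors = [            # hypothetical mean errors on three problems
    [3.0, 2.0, 1.0, 2.0],
    [0.0, 0.0, 0.0, 0.0],
    [5.0, 4.0, 4.5, 6.0],
]
mean_ranks = friedman_mean_ranks(errors)
for name, rank in zip(names, mean_ranks):
    print(f"{name}: {rank:.3f}")
```

With four algorithms the ranks on every problem sum to 10, so the mean ranks always average to 2.5. Each dimension column of Table 20.4 is consistent with this, summing to roughly 10.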
20.4.2 Convergence Analysis Results
In this section, box plots of the error values obtained by GBO and the dFDB-GBO variations over 51 independent runs on the benchmark problems are presented. To examine the box plots, one problem was selected from each of the four problem types in the CEC 2020 problem set. For the selected problems, search spaces of 30, 50 and 100 dimensions were used and the error values of the algorithms were observed. The selected problems are F1 (unimodal), F3 (basic multimodal), F7 (hybrid) and F9 (composition). Box plots are one of the most suitable tools for showing the spread and the limits of the performance of algorithms. When the box plots in Fig. 20.1 are examined, it is clearly seen that the GBO variants with dFDB perform better than the base algorithm. This becomes more evident as the problem dimension and the complexity of the problems increase. According to these results, it can be said that the guide selection mechanism in the original LEO operator, which was developed to escape the local solution traps of the GBO algorithm and to increase diversity, is not designed effectively enough. Because the guide selection process in the LEO operator was redesigned using the dFDB method, the GBO variations with dFDB were able to avoid the local solution traps.

Fig. 20.1 Box-plot presentation of error values of algorithms (panel columns: Dimension = 30, Dimension = 50, Dimension = 100; panel rows: F1 (Unimodal), F3 (Basic), F7 (Hybrid), F9 (Composition))
20.4.3 Results for Engineering Design Problems
The five constrained engineering problems are abbreviated as P1, P2, …, P5, in the same order as in Table 20.3, where these problems are introduced. The performances of GBO and the dFDB-GBO variations on the constrained engineering problems are presented in Table 20.6. The results in Table 20.6 show that Case-2 has a better mean error value than its competitors in four of the five constrained engineering problems. This indicates that Case-2 performs a more stable search than its competitors.
20.5 Conclusions and Future Work
In this study, the performance of GBO, a recent mathematics-based meta-heuristic search algorithm, was significantly improved. It was shown that the Local Escaping Operator (LEO), which was designed to help GBO escape local solution traps, can be made to operate more effectively. The redesign of LEO focused on the guide selection mechanism, for which dFDB, a selection method recently introduced in the literature, was used. Thanks to dFDB, the efficiency of both the diversity process and the derivative-based neighborhood search process of the GBO algorithm was increased. Thus, a better balance between exploitation and exploration was achieved.

The experimental studies carried out to test and verify the performance of the dFDB-GBO algorithm followed the rules and standards defined at the CEC conferences. The CEC 2020 benchmark suite and engineering design problems frequently used in the literature were employed, covering different problem types and search space dimensions. The statistical analysis of the experimental data showed that the proposed dFDB-GBO algorithm outperforms the competitor algorithm: its variations obtained better Friedman ranks than GBO in all dimensions and did not lose to GBO in any Wilcoxon pairwise comparison.

The MATLAB source code of the dFDB-GBO algorithm developed and proposed for the first time in this article will be shared on the MATLAB File Exchange platform after the article is published. The source code can be found by searching the MATLAB File Exchange platform with the keyword dFDB-GBO.
P2
2.49E+12
2.75E+12
2.19E+12
Case1
Case2
Case3
0.523250
1.2E+13
Case3
GBO
0.523250
0.523250
Case1
0.523250
Case2
Best
Algorithm
GBO
Problem
P1
5.7E+13
5.31E+13
6.53E+13
7.39E+13
0.524379
0.523805
0.524257
0.524352
Mean
2.66E+13
2.75E+13
2.41E+13
1.84458E+13
0.001196
0.000933
0.001253
0.001331
Std. deviation
X1 = 12.486302, X2 = 66.314903, X3 = 71.272116, X4 = 149.926676, X5 = 14.520246, X6 = 44.478766, X7 = 1.040571, X8 = 43.438194, X9 = 55.610160, X10 = 29.000604, X11 = 26.609306, X12 = 0.000249, X13 = 62.032273, X14 = 24.434997, X15 = 0.540346, X16 = 23.893594, X17 = 9.78E06, X18 = 38.772073, X19 = 10.756865, X20 = 28.015207, X21 = 0.286641, X22 = 0.093110, X23 = 0.000372, X24 = 0.324006, X25 = 0.000221, X26 = 0.084424, X27 = 0.286664, X28 = 0.935490, X29 = 2.65E-05, X30 = 0.356520, X31 = 4.07E06, X32 = 0.570402, X33 = 0.395815, X34 = 1.004062, X35 = 0.0004062, X36 = 0.356359, X37 = 0.000613, X38 = 0.569760
X1 = 45.100304, X2 = 27.253482, X3 = 15.504052, X4 = 18.863283, X5 = 13.510000, X6 = 68.951157, X7 = 0.822151, X8 = 1.841422, X9 = 5.388829
Best design parameters (X1 ,X2 ,…………Xn )
Table 20.6 Performance of dFDB-GBO and GBO algorithms in engineering design problems
P5
P4
16.090
16.090
16.090
16.090
GBO
Case1
Case2
Case3
0.0322
0.0322
Case2
Case3
0.0322
0.0322
GBO
2.1E+12
Case3
Case1
2.82E+12
4.33E+13
Case1
2.95E+12
GBO
P3
Case2
Best
Algorithm
Problem
Table 20.6 (continued)
16.090
16.090
16.090
16.135
4.18E+11
1.9E+11
3.99E+11
2.28E+11
6.96E+13
8.67E+13
8.26E+13
5.83E+13
Mean
8.46E-07
1.18E-06
6.2E-07
0.159
4.85E+11
3.88E+11
4.82E+11
4.15017E+11
3.6E+13
9.71E+12
1.87E+13
3.82139E+13
Std. deviation
X1 = 38.413759, X2 = 52.858080, X3 = 70.471953, X4 = 84.495716, X5 = 90.000000
X1 = 0.001000, X2 = 0.001000, X3 = 0.001000, X4 = 0.001000, X5 = 0.001000, X6 = 0.001000, X7 = 1.524000, X8 = 1.524000, X9 = 5.000000, X10 = 2.000000, X11 = 0.001000, X12 = 0.001000, X13 = 0.007293, X14 = 0.087556
X1 = 34.076133, X2 = 37.228755, X3 = 88.696121, X4 = 139.998988, X5 = 37.229673, X6 = 34.916429, X7 = 1.333146, X8 = 33.583282, X9 = 23.114909, X10 = 1.308273, X11 = 16.727196, X12 = 5.079439, X13 = 35.853694, X14 = 25.390714, X15 = 6.125740, X16 = 19.264965, X17 = 8.37E-06, X18 = 24.933044, X19 = 2.05E-06, X20 = 24.933043, X21 = 0.325283, X22 = 0.515455, X23 = 0.627983, X24 = 0.999911, X25 = 11.358710, X26 = 0.989162, X27 = 13.231158, X28 = 0.999967, X29 = 14.516251, X30 = 0.497452, X31 = 0.999415, X32 = 12.410248, X33 = 0.305098, X34 = 0.389911, X35 = 11.360461, X36 = 0.305145, X37 = 12.409641, X38 = 0.346119, X39 = 0.369032, X40 = 0.346136, X41 = 4.36E-05, X42 = 1.38E-05, X43 = 0.325025, X44 = 0.000508, X45 = 2.1E-09, X46 = 0.497719, X47 = 0.005751, X48 = 0.000286
Best design parameters (X1 ,X2 ,…………Xn )
References
1. Holland JH (1975) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. Q Rev Biol 1:211. https://doi.org/10.1086/418447
2. Storn R, Price K (1997) Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J Global Optim 11(4):341–359
3. Geem ZW, Kim JH, Loganathan GV (2001) A new heuristic optimization algorithm: harmony search. SIMULATION 76:60–68
4. Van Laarhoven PJ, Aarts EH (1987) Simulated annealing: theory and applications. Springer, Dordrecht, pp 7–15
5. Dorigo M, Di Caro G (1999) Ant colony optimization: a new meta-heuristic. In: Proceedings of the 1999 congress on evolutionary computation, IEEE CEC99. Washington, DC, USA, pp 1470–1477
6. Kennedy RJ (2011) Particle swarm optimization. In: Encyclopedia of machine learning. Springer, Boston, MA, USA, pp 760–766
7. Karaboga D, Basturk B (2007) A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J Global Optim 39(3):459–471
8. Yang XS, Deb S (2009) Cuckoo search via Lévy flights. In: 2009 World congress on nature & biologically inspired computing (NaBIC). IEEE, pp 210–214
9. Rashedi E, Nezamabadi-Pour H, Saryazdi S (2009) GSA: a gravitational search algorithm. Inf Sci 179(13):2232–2248
10. Tanabe R, Fukunaga AS (2014) Improving the search performance of SHADE using linear population size reduction. In: 2014 IEEE congress on evolutionary computation (CEC). IEEE, pp 1658–1665
11. Kumar A, Misra RK, Singh D (2017) Improving the local search capability of effective butterfly optimizer using covariance matrix adapted retreat phase. In: 2017 IEEE congress on evolutionary computation (CEC). IEEE, pp 1835–1842
12. Civicioglu P (2013) Backtracking search optimization algorithm for numerical optimization problems. Appl Math Comput 219(15):8121–8144
13. Civicioglu P, Besdok E, Gunen MA, Atasever UH (2018) Weighted differential evolution algorithm for numerical function optimization: a comparative study with cuckoo search, artificial bee colony, adaptive differential evolution, and backtracking search optimization algorithms. Neural Comput Appl 1–15
14. Pierezan J, Coelho LDS (2018) Coyote optimization algorithm: a new metaheuristic for global optimization problems. In: IEEE congress on evolutionary computation (CEC). Rio de Janeiro, pp 1–8
15. Salimi H (2015) Stochastic fractal search: a powerful metaheuristic algorithm. Knowl-Based Syst 75:1–18
16. Mohamed AW, Mohamed AK (2019) Adaptive guided differential evolution algorithm with novel mutation for numerical optimization. Int J Mach Learn Cybern 10(2):253–277
17. Chen X, Xu B (2018) Teaching-learning-based artificial bee colony. In: International conference on swarm intelligence. Springer, Cham, Shanghai, China, pp 166–178
18. Zhao W, Zhang Z, Wang L (2020) Manta ray foraging optimization: an effective bio-inspired optimizer for engineering applications. Eng Appl Artif Intell 87:103300
19. Zhao W, Wang L, Zhang Z (2020) Artificial ecosystem-based optimization: a novel nature-inspired meta-heuristic algorithm. Neural Comput Appl 32:9383–9425
20. Braik MS (2021) Chameleon swarm algorithm: a bio-inspired optimizer for solving engineering design problems. Expert Syst Appl 174:114685
21. Braik M, Sheta A, Al-Hiary H (2021) A novel meta-heuristic search algorithm for solving optimization problems: capuchin search algorithm. Neural Comput Appl 33(7):2515–2547
22. Naik MK, Panda R, Abraham A (2021) Adaptive opposition slime mould algorithm. Soft Comput 1–17
23. Kahraman HT, Aras S, Gedikli E (2020) Fitness-distance balance (FDB): a new selection method for meta-heuristic search algorithms. Knowl-Based Syst 190:105169
24. Aras S, Gedikli E, Kahraman HT (2021) A novel stochastic fractal search algorithm with fitness-distance balance for global numerical optimization. Swarm Evol Comput 61:100821
25. Guvenc U, Duman S, Kahraman HT, Aras S, Katı M (2021) Fitness-distance balance based adaptive guided differential evolution algorithm for security-constrained optimal power flow problem incorporating renewable energy sources. Appl Soft Comput 108:107421
26. Duman S, Kahraman HT, Guvenc U, Aras S (2021) Development of a Lévy flight and FDB-based coyote optimization algorithm for global optimization and real-world ACOPF problems. Soft Comput 25(8):6577–6617
27. Katı M, Kahraman HT (2020) Improving supply-demand-based optimization algorithm with FDB method: a comprehensive research on engineering design problems. J Eng Sci Des (JESD) 8(5):156–172
28. Kahraman HT, Bakir H, Duman S, Katı M, Aras S, Guvenc U (2021) Dynamic FDB selection method and its application: modeling and optimizing of directional overcurrent relays coordination. Appl Intell 1–36
29. Ahmadianfar I, Bozorg-Haddad O, Chu X (2020) Gradient-based optimizer: a new metaheuristic optimization algorithm. Inf Sci 540:131–159
30. Liang J, Suganthan PN, Qu BY, Gong DW, Yue CT (2019) Problem definitions and evaluation criteria for the CEC 2020 special session on multimodal multiobjective optimization, vol 201912. Zhengzhou University. https://doi.org/10.13140/RG.2.2.31746.02247
31. Salajegheh F, Salajegheh E (2019) PSOG: enhanced particle swarm optimization by a unit vector of first and second order gradient directions. Swarm Evol Comput 46:28–51
32. Kumar A, Wu Z, Ali A, Mallipeddi R, Suganthan PN, Das S (2020) A test-suite of non-convex constrained optimization problems from the real-world and some baseline results, August 2020. In: Swarm and evolutionary computation, vol 100693
33. Andrei N (2013) Nonlinear optimization applications using the GAMS technology. Springer
34. Pant M, Thangaraj R, Singh V (2009) Optimization of mechanical design problems using improved differential evolution algorithm. Int J Recent Trends Eng 1:21
Chapter 21
TR-SUM: An Automatic Text Summarization Tool for Turkish
Yiğit Yüksel and Yalçın Çebi
21.1 Introduction
In today’s world of big data, tracing and obtaining specific and useful information from stored textual data is one of the most significant challenges that summarization addresses. Summarization is basically the process of shortening a text. However, retrieving a short summary or only the relevant information from large texts is demanding. The summary should include the main target, the main idea and the important information in the text. Diverse fields such as public opinion monitoring, news recommendation and content/article summarization concentrate on summarization [1]. The demand for summarizing long texts and electronic documents will increase, since their volume is growing exponentially [2]. Text summarization algorithms are therefore required in today’s big data environment in all languages. According to the literature reviewed so far, an abstractive text summarization system for Turkish is a very significant research direction. This study aims to develop an abstractive text summarization tool for the Turkish language to address this gap in the Turkish text summarization literature. For this purpose, firstly, a web interface for the data collection system is built to collect data from users. Then, a news dataset named “TR-NEWS-SUM” is generated from the collected data. The collected dataset is preprocessed to be fed to the deep neural network models. Next, three deep neural network models are studied on the “TR-NEWS-SUM” dataset and compared in terms of
Y. Yüksel (B) · Y. Çebi Department of Computer Engineering, Dokuz Eylül University, Izmir, Turkey e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_21
ROUGE scores. Hence, an automatic abstractive text summarization method for Turkish, named “TR-SUM”, is developed. This study is structured as follows. Section 21.2 reviews the related work on the Turkish language and briefly discusses the text summarization datasets for Turkish. The proposed text summarization tool for Turkish (TR-SUM) is presented in Sect. 21.3, and the results of the TR-SUM method are discussed in Sect. 21.4. Finally, concluding comments and possible future research directions are given in Sect. 21.5.
21.2 Literature Review
Text summarization systems have recently become a growing need for people worldwide who deal with huge amounts of digital text data [3]. Most people encounter extensive amounts of text in their working lives, and reading a summary is usually preferable to reading the entire text. The need for text summarization arises not only at work but also in daily life. Hence, in recent years, researchers have increasingly worked on text summarization systems in all languages. In this section, firstly, the related studies on text summarization in the Turkish language are reviewed. Then, the datasets available in the literature for the Turkish language are presented.
21.2.1 Related Studies in Turkish
The first study on Turkish text summarization is that of Altan [4], who provided a statistics-based system containing five different modules. The system analyses the words, sentences and paragraphs of the input document using pre-calculated weights. Then, Karakaya and Güvenir [5] proposed an automatic report generation method that involves both categorization and summarization. In the initial step of the algorithm, the given document is classified according to the given topics. The categorized documents are then summarized, and the summary is added to the generated report. In another early study on Turkish, Uzundere [6] weighted the sentences according to various features such as title, keywords and positive/negative words. The summary of a text is then obtained based on the summary percentage specified by the user. A success rate of 55% was obtained. One of the leading studies on text summarization in the Turkish language is that of Kutlu et al. [7]. They proposed a generic text summarization methodology that extracts sentences based on important surface-level document features such as term frequency, title similarity, key phrases, sentence position and centrality. Since this was one of the leading studies in the Turkish summarization literature, new data had to be created. Hence, they generated two datasets: the first consists of 120 newspaper articles and the second consists of 100 Turkish journal articles. While
comparing the performance of the proposed system, they used the ROUGE-1, ROUGE-2, ROUGE-L and ROUGE-W scores. In the same year, Ozsoy et al. [8] integrated two new algorithms, called the Cross and Topic methods, into latent semantic analysis. They compared the performance of their algorithms with well-known text summarization algorithms and concluded that the Cross method outperforms the others in terms of the ROUGE-L F-measure score. Güran et al. [9] introduced a hybrid summarization method for Turkish that incorporates five structural features (length, position, title, frequency, class relevance) and three semantic features (relevance to each topic, relevance to overall topic, relevance to other sentences). The aim of this study is to obtain an overall score function that integrates all features. The feature weights of the overall score function are obtained by two algorithms: the Analytical Hierarchy Process, where the weights are calculated manually, and the Artificial Bee Colony algorithm, where the weights are learned automatically. The proposed algorithms were evaluated on a newly introduced Turkish corpus (110 news documents), and the results show that integrating the features is more advantageous than analyzing them individually. Hatipoğlu and Omurca [10] studied a hybrid method that combines statistical sentence scoring with latent semantic analysis. The same authors developed a mobile Turkish summarization system in which structural features (length, position and title), latent semantic analysis based features (the cross method and the meaningful word-set method) and Wikipedia based semantic features (keyword counting) are integrated [11]. The cross method that they incorporated is the method of Ozsoy et al. [8].

Next, Demirci et al. [12] studied producing a single summary for multi-document news, where they used term frequency (TF) to score the documents. Namely, the sentences with higher importance are selected. The system has a 43% success rate. In another extraction-based text summarization study, Torun and Inner [13] also used the term frequency (TF) methodology. In this study, the news articles are summarized, and then the similarity ratio between the summarized texts is calculated. Later, Doğan and Kaya [14] proposed an LSTM model followed by an LSA model to summarize texts. The LSTM part of the study is used for sentiment analysis, which is a classification problem. Hark et al. [15] proposed a model that represents texts as graphs, in which the input sentences are represented as nodes and the edges capture the structure of the input text. Furthermore, Karakoc and Yılmaz [16] studied abstractive text summarization with an encoder-decoder model. The study aims to generate headlines for news articles. The model was trained on the Kemik Haber dataset, and the results were evaluated with the ROUGE-1 score.

The studies on text summarization methods for the Turkish language are categorized in Table 21.1. As Table 21.1 illustrates, few studies have been performed on the Turkish language. Most of them focus on latent semantic analysis, namely feature extraction methods, while only a few focus on encoder-decoder mechanisms. To the best of our knowledge, an abstractive deep neural network method has been applied to the Turkish language in only one study [16]. Thus, there is a gap in the literature on abstractive text summarization for Turkish.
Table 21.1 Text summarization in Turkish

| Author(s)/Year | Summarization method | Methodology applied | Used dataset | Evaluation metrics |
|---|---|---|---|---|
| Altan [4] | Extractive | Statistical analysis | 50 different articles about economics | X |
| Karakaya and Güvenir [5] | Extractive | Statistical analysis | 11 different articles related to the economic crisis in Turkey | X |
| Uzundere [6] | Extractive | Statistical analysis | 10 different articles | Standard deviation |
| Kutlu et al. [7] | Extractive | Generic text summarization | 120 newspaper articles and 100 Turkish journal articles | ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-W |
| Ozsoy et al. [8] | Extractive | Latent semantic analysis | Two different data sets of scientific articles in Turkish (including articles from various areas) | ROUGE-L F-measure score |
| Güran et al. [9] | Extractive | A hybrid summarization method | 110 news documents and 3 analysts | ROUGE-1 |
| Hatipoğlu and Omurca [10] | Extractive | A hybrid summarization | 10 different documents from Wikipedia | Selection ratio |
| Hatipoğlu and Omurca [11] | Extractive | A hybrid summarization | 10 different documents from Wikipedia | Average precision and breakeven point |
| Demirci et al. [12] | Extractive | Latent semantic analysis | 106 selected news | ROUGE-1 |
| Torun and Inner [13] | Extractive | TF-IDF | 12,000 News | Average calculation |
| Doğan and Kaya [14] | Extractive | Latent semantic analysis | 30,000 Tweets | F-measure score |
| Hark et al. [15] | Extractive | Graph entropy | DUC-2002 | ROUGE-1, ROUGE-2, ROUGE-3, ROUGE-4 |
| Karakoc and Yılmaz [16] | Abstractive | RNN | “Kemikhaber” dataset | ROUGE-1 |
21.2.2 Datasets in Turkish
Labeled datasets constitute a substantial part of supervised machine learning models. In text summarization models, the labeled data contains the original text and the summarized text. There are multiple natural language datasets for Turkish, which serve different purposes. The datasets for Turkish are listed below, and their resources are presented in Table 21.2.
• TTC-3600: A new benchmark dataset for Turkish text categorization: The dataset has been used for text classification. It has 600 news articles in each of 6 categories: economy, sport, culture-art, politics, health and technology [17].
• Turkish Datasets from Kemik (Yıldız Technical University): Kemik is a Natural Language Processing group at Yıldız Technical University. The group has various datasets that serve different areas such as classification, n-gram analysis and document similarity [18].
• Turkish Sentiment Dataset: The data was collected by crawling Turkish tweets for use in sentiment analysis [19].
• English/Turkish Wikipedia Named-Entity Recognition and Text Categorization Dataset: The dataset was obtained from Turkish and English Wikipedia pages for the purpose of text categorization and named-entity recognition [20].
• TS Corpus: TS Corpus is an independent project that contains multiple corpora for different objectives [21].
• 1B Tokens Turkish Corpus and Turkish word vectors and analogical reasoning task pairs: The dataset contains word pairs and word vectors [22].
• METU-Sabanci Turkish treebank: The purpose of the dataset is to serve as the test set of the CoNLL-XI shared task [23].
• Turkish Paraphrase Corpus (TuPC): The corpus contains 1270 pairs of texts and paraphrases that have been checked semantically [24].
• Turkish Product Reviews: The dataset contains 235,000 product reviews from various vendors [25].

Table 21.2 Datasets in Turkish

| Dataset | Resource |
|---|---|
| TTC-3600: a new benchmark dataset for Turkish text categorization | https://github.com/denopas/TTC-3600 |
| Turkish datasets from Kemik (Yıldız Technical University) | http://www.kemik.yildiz.edu.tr/ |
| Turkish sentiment dataset | http://www.baskent.edu.tr/~msert/research/datasets/SentimentDatasetTR.html |
| English/Turkish Wikipedia named-entity recognition and text categorization dataset | https://data.mendeley.com/datasets/cdcztymf4k/1 |
| TS Corpus | https://tscorpus.com/ |
| 1B Tokens Turkish Corpus and Turkish word vectors and analogical reasoning task pairs | https://github.com/onurgu/linguisticfeaturesin-turkish-wordrepresentations/releases |
| METU-Sabanci Turkish treebank | http://tools.nlp.itu.edu.tr/Datasets |
| Turkish paraphrase corpus (TuPC) | https://osf.io/wp83a/ |
| Turkish product reviews | https://github.com/fthbrmnby/turkish-textdata |
21.3 TR-SUM: A Text Summarization Tool for Turkish
21.3.1 General Overview of “TR-SUM: A Text Summarization Tool for Turkish”
“TR-SUM: A Text Summarization Tool for Turkish” contains three different models that are trained and evaluated separately. The general flow of the system is given in Fig. 21.1.
Fig. 21.1 Flow chart of TR-SUM: a text summarization tool for Turkish
The data collection process and the collected dataset (TR-NEWS-SUM) are described in Sect. 21.3.2. Then, the data pre-processing is explained in Sect. 21.3.3. Finally, three different deep neural network models for Turkish are presented in Sect. 21.3.4.
21.3.2 TR-NEWS-SUM Dataset
In this study, news texts are used as the input data. The data were collected in collaboration with the Journalism Department of the Faculty of Communication, Ege University. A website was developed to collect news texts and the corresponding summaries from the Journalism Department students. The website is accessible through the link “http://txtsum.cs.deu.edu.tr/Account/Login”. A screenshot of the user interface of the data collection system is given in Fig. 21.2. The data collection system works as follows. The system users (students) are registered by an authorized person, i.e., a professor in the Journalism Department. The users must fill in the following mandatory fields: main text, summarized text, headline, category, link (if it exists) and publication date, as depicted in Fig. 21.2. The collected data are saved into the database and then exported in CSV format for training. In total, 150 news texts and their summaries were collected via the system. These data are used to evaluate and compare the performance of the neural network models proposed in this study. In the future, the data collection process can be extended in a separate study to contribute to the Turkish dataset literature.
Fig. 21.2 A screenshot of the data collection system
21.3.3 Data Pre-processing
Preparing the data is an important step for getting better results from deep neural network models. Data preprocessing contains two phases: (i) cleaning, and (ii) generating a lookup dictionary with pre-trained word embeddings.
(i) Cleaning
This phase removes undesired characters, i.e., HTML-encoded characters (e.g., &, %20), excessive spaces and stop words. Namely, it is the formatting phase of the texts. One advantage of this process is that fewer null embeddings are created.
(ii) Generating a Lookup Dictionary with Pre-trained Word Embeddings
Neural network models cannot be fed with actual words; thus, words must be converted to numbers. Initially, a lookup dictionary is generated in which each word is represented by a number that depends on the frequency of the word in the corpus. Furthermore, special tokens are added. The End of Sentence (<EOS>) and Start of Sentence tokens are crucial for the neural network models. Additionally, a padding token is introduced, because all summaries and texts in a batch must have the same length. Next, the pre-trained word embeddings from ConceptNet Numberbatch are used to increase the training speed and the accuracy. Lastly, the input words are constrained by the following criterion: either the word exists in the word embeddings, or it occurs in the input more than 20 times. Words that do not meet this criterion are replaced by the unknown-word token. The reason for this approach is that less frequent words must be eliminated to increase the summarization quality. The characteristics of the “TR-NEWS-SUM” dataset can be seen in Table 21.3. An example of the input can be seen in Table 21.4, whose left part shows the input text and whose right part shows the input vector of that text.
Table 21.3 The characteristics of the “TR-NEWS-SUM” dataset
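| Characteristic | Value |
|---|---|
| Number of words | 39,709 |
| Number of unique words | 13,887 |
| Number of words used in model | 3432 |
| Percentage of used words | 24.70% |
| Total number of unknown-word tokens | 18,341 |
| Unknown-word token ratio | 46.24% |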
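The cleaning phase described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used for TR-SUM: the function name `clean_text`, the tiny stop-word set, and the lowercasing and punctuation-stripping steps are assumptions made for the example.

```python
import html
import re
from urllib.parse import unquote

# Illustrative stop-word set; the actual list used for TR-SUM is not given.
STOP_WORDS = {"ve", "bu", "da", "de", "için"}


def clean_text(raw, remove_stop_words=True):
    """Decode HTML/URL escapes, drop tags and symbols, and squeeze spaces."""
    text = html.unescape(raw)               # "&amp;" -> "&"
    text = unquote(text)                    # "%20" -> " "
    text = re.sub(r"<[^>]+>", " ", text)    # drop HTML tags (assumed step)
    text = text.lower()                     # assumed step
    text = re.sub(r"[^\w\s]", " ", text)    # drop punctuation and stray symbols
    tokens = text.split()                   # also collapses excessive spaces
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(tokens)


print(clean_text("Twitter&amp;%20çalışanlarına   evden  çalışmalarını söyledi."))
# twitter çalışanlarına evden çalışmalarını söyledi
```

Note that Python's default `str.lower()` does not apply Turkish casing rules for the letters I and İ, so a production pipeline would probably need locale-aware lowercasing.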
Table 21.4 An example from dataset as input vector
Input text:
[' ', 'twitter', 'çalışanlarına', 'koronavirüsün', 'yayılmasını', 'durdurmak', 'için', 'evden', 'çalışmalarını', 'söyledi', 'twitter', 'ın', 'kurucusu', 'jack', 'dorsey', 'uzun', 'zamandır', 'uzaktan', 'çalışma', 'iş', 'modelini', 'desteklediğini', 've', 'kasım', 'ayında', 'bu', 'yılın', 'altı', 'aya', 'kadar', 'olan', 'kısmını', 'afrika', 'da', 'geçirmeyi', 'planladığını', 'açıkladı', '<EOS>']
Input vector:
[397, 208, 2097, 3000, 110, 3001, 5143, 1599, 102, 5144, 8, 153, 186, 312, 5145, 3014, 7, 5146, 287, 2098, 5, 703, 10, 5147, 808, 704, 1601, 254, 1277, 5148, 64, 154, 36, 5149, 2099, 1602, 254, 5150, 5151, 3015, 33, 125, 5152, 2100, 208, 1283, 4, 3016, 2087, 5153, 9, 701, 134, 193, 207, 803, 155, 564, 31, 267, 5154, 5155]
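The lookup-dictionary step described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: the token spellings `<PAD>`, `<SOS>` and `<UNK>`, the function names `build_vocab` and `encode`, and the representation of the embeddings as a plain dictionary are all hypothetical; only the selection rule (a word is kept if it has a pre-trained embedding or occurs more than 20 times) comes from the text.

```python
from collections import Counter

# Special tokens. Only the end-of-sentence token appears in the chapter's
# example; the other spellings are illustrative assumptions.
PAD, SOS, EOS, UNK = "<PAD>", "<SOS>", "<EOS>", "<UNK>"


def build_vocab(texts, embeddings, min_count=20):
    """Map words to integer ids.

    A word is kept if it has a pre-trained embedding or occurs more than
    `min_count` times; every other word is later mapped to UNK.
    """
    counts = Counter(word for text in texts for word in text.split())
    vocab = {tok: i for i, tok in enumerate([PAD, SOS, EOS, UNK])}
    # Iterate from most to least frequent, so frequent words get small ids.
    for word, n in counts.most_common():
        if word in embeddings or n > min_count:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab, max_len):
    """Convert a text to a fixed-length id sequence ending in EOS, then PAD."""
    ids = [vocab.get(w, vocab[UNK]) for w in text.split()][: max_len - 1]
    ids.append(vocab[EOS])
    ids += [vocab[PAD]] * (max_len - len(ids))
    return ids
```

Padding every sequence to `max_len` is what lets all texts and summaries in a batch share the same length, as the section requires.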
21.3.4 The Proposed Neural Network Models for Turkish Text Summarization
Three different deep neural network models are proposed for the Turkish language: (i) Attention Based Seq2Seq Neural Networks, (ii) Pointer Generator Seq2Seq Neural Networks, and (iii) Reinforcement Learning with Seq2Seq Neural Networks.
(i) Attention Based Seq2Seq Neural Network
First, the Attention Based Seq2Seq Neural Network of Yu et al. [26] is adapted for the Turkish language and applied to the TR-NEWS-SUM dataset. In this model, the input data is first passed to the encoder layer. The output of the encoder layer is called the “context vector”. The context vector is then passed to the decoder layer to generate the output sequence. In this step, the attention mechanism introduced by Bahdanau et al. [27] is used. Instead of attempting to learn a single vector representation for each sentence, the model is trained to focus on various input vectors in the input sequence based on attention weights. During training, the decoder layer updates the attention mechanism to adjust these weights. The attention weights provide contextual information to the decoder when it generates the output.
(ii) Pointer Generator Seq2Seq Neural Network
As the second model, the Pointer Generator Seq2Seq Neural Network of See et al. [28] is adapted for the Turkish language and applied to the TR-NEWS-SUM dataset. The model has the option of copying words from the source via pointing, while still being able to generate words from a preset vocabulary. This brings two major benefits. First, the model can handle unseen words and can therefore use a smaller vocabulary. Second, the model can reproduce words from the input text at lower cost. At each decoder step, a generation probability (Pgen ∈ [0, 1]) is calculated, and the decoder weights are adjusted according to the Pgen value. The final distribution, from which the prediction is made, is the Pgen-weighted sum of the vocabulary and attention distributions.
(iii) Reinforcement Learning Based Seq2Seq Neural Network Model (Deep Reinforcement Learning for Sequence-to-Sequence Models)
As another approach, the Reinforcement Learning Based Seq2Seq model of Keneshloo et al. [29] is adapted for the Turkish language and applied to the TR-NEWS-SUM dataset. In this approach, the decision-making process and the LSTM part of the model are driven by reinforcement learning. The model tries to solve two main problems of the previous models: exposure bias and the inconsistency between the training and test measurements [29]. An actor-critic model is designed to resolve these issues. The actor is a pointer-generator model, and the critic is a regression model that optimizes the action distribution.
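The mixing step of the pointer-generator model in (ii) can be illustrated with a small sketch. It follows the formulation of See et al. [28], in which the probability of a word is Pgen times its vocabulary probability plus (1 − Pgen) times the attention mass on every source position holding that word. The function name and the dictionary-based representation are illustrative assumptions, chosen for readability rather than taken from the chapter's TensorFlow code.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix the generator's vocabulary distribution with copy probabilities.

    p_gen:         generation probability in [0, 1]
    vocab_dist:    dict word -> probability over the fixed vocabulary
    attention:     list of attention weights, one per source token
    source_tokens: list of source words aligned with `attention`
    """
    # Generation part: scale every vocabulary probability by p_gen.
    final = {word: p_gen * p for word, p in vocab_dist.items()}
    # Copy part: add (1 - p_gen) * attention for each source position,
    # creating entries for source words that are not in the vocabulary.
    for word, weight in zip(source_tokens, attention):
        final[word] = final.get(word, 0.0) + (1.0 - p_gen) * weight
    return final
```

Because out-of-vocabulary source words receive probability only through the copy term, the model can still emit them, which is why a smaller vocabulary suffices.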
21.4 Discussion and Results
The three proposed abstractive text summarization models are applied to the TR-NEWS-SUM dataset and analyzed for the Turkish language. The pseudocode for the application of the three models is presented in Fig. 21.3. 80% of the dataset is used for training, while the remaining 20% is reserved for testing the models. The models are trained on Google Colaboratory. The features of the system are as follows.
GPU: 1 × Tesla K80, compute capability 3.7, 2496 CUDA cores, 12 GB GDDR5 VRAM
CPU: 1 × single-core hyper-threaded Xeon processor @ 2.3 GHz (1 core, 2 threads)
RAM: ~12.6 GB available
Disk: ~33 GB available
Input: Input news text (X) and user-summarized news text (Y).
Output: Trained Attention Based Seq2Seq Model / Pointer Generator Seq2Seq Neural Network / Reinforcement Learning Based Seq2Seq Neural Network Model.
Training Steps:
for k-fold divided input and output sequences (k = 10)
    for batch of input and output sequences X and Y do
        Run encoding on X and get the last encoder state.
        Run decoding by feeding the decoder and obtain the sampled output sequence Ŷ.
        Calculate the loss according to the cross-entropy loss function and update the parameters of the model.
    end for
end for
Testing Steps:
for k-fold divided input and output sequences (k = 10)
    for batch of input and output sequences X and Y do
        Use the trained model and the argmax function to sample the output Ŷ.
        Evaluate the model using a performance measure, e.g., ROUGE.
    end for
end for
Fig. 21.3 Pseudocode of the models
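The outer loop of Fig. 21.3 iterates over k = 10 folds. The chapter does not say how the folds were built, so the following is only a sketch of one common way to produce them in plain Python; the helper name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size
```

Each sample lands in exactly one test fold, so the per-fold ROUGE scores can be averaged over all folds.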
Table 21.5 Parameter settings
Parameter              Value
Learning rate decay    0.95
Min learning rate      0.0005
Display step           20
Stop early             0
Stop count             4000
Epoch count            3
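The decay and minimum learning-rate settings of Table 21.5, together with the early-stopping rule described in this section, suggest a schedule of roughly the following form. This is a hedged sketch: the initial learning rate is not given in the chapter and must be supplied by the caller, the multiplicative form of the decay is an assumption, and both function names are hypothetical.

```python
def decayed_learning_rate(initial_lr, step, decay=0.95, min_lr=0.0005):
    """Multiply the rate by `decay` at every step, but never go below `min_lr`."""
    return max(initial_lr * decay ** step, min_lr)


def should_stop(losses, patience):
    """Return True when the last `patience` losses brought no improvement.

    `losses` is the history of loss values, oldest first.
    """
    if len(losses) <= patience:
        return False
    best_before = min(losses[:-patience])
    return min(losses[-patience:]) >= best_before
```

With a patience of 40, `should_stop` would implement the rule that training ends once the loss has failed to improve for 40 consecutive checks.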
The application is developed in Python 3.6.9 in Jupyter Notebook. The libraries used are NumPy, Pandas, and TensorFlow (version 1.10.1). Because Google Colaboratory has time limitations, the model states must be saved on a cloud provider, Google Drive. The saved states are used as a knowledge transfer mechanism between the models to increase the summarization quality. The parameters of the application are given in Table 21.5. As can be seen in the table, the parameters control the flow of the system and the performance of the models. The models are trained for 4000 epochs. However, if the loss does not improve over 40 consecutive epochs, the training process is stopped early.
To compare the performance of the three deep neural network models, Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores, introduced by Lin [30], are computed. ROUGE scores are the most commonly used performance metrics in the text summarization literature. Hence, in this study, the three proposed models are evaluated on the collected Turkish dataset based on the ROUGE-1, ROUGE-2, and ROUGE-L scores. The average ROUGE-1, ROUGE-2, and ROUGE-L scores for each model are reported in Table 21.6. As can be seen in Table 21.6, the Attention Based Seq2Seq Neural Network has values of 0.146, 0.097, and 0.112 for the ROUGE-1, ROUGE-2, and ROUGE-L scores, respectively. The Pointer Generator Seq2Seq Neural Network has values of 0.147, 0.102, and 0.122, respectively. Lastly, the Reinforcement Learning Based Seq2Seq Neural Network has values of 0.136, 0.090, and 0.117, respectively. Although the ROUGE-1 score is the highest of the three ROUGE scores for each model, all ROUGE scores are quite low. Thus, there is considerable room to improve the performance of the models. The reasons for the low ROUGE scores can be listed as follows.
The existing summaries in the dataset generally consist of the first two sentences of the main texts. Another reason is that there are not sufficient word embeddings for the Turkish language.
Table 21.6 Average ROUGE scores of the three neural network models
Model                                                        ROUGE-1   ROUGE-2   ROUGE-L
(i) Attention based Seq2Seq neural network                   0.146     0.097     0.112
(ii) Pointer generator Seq2Seq neural network                0.147     0.102     0.122
(iii) Reinforcement learning based Seq2Seq neural network    0.136     0.090     0.117
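To make the ROUGE scores in this section concrete, the sketch below computes recall-oriented ROUGE-N and ROUGE-L for a single candidate and reference pair. It is a simplified illustration, assuming whitespace tokenization and reporting recall only; the ROUGE package of Lin [30] also reports precision and F-measure and offers further options, and the chapter does not state which variant it used. All function names are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """Fraction of reference n-grams that also occur in the candidate."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match by the number of times the n-gram occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programme)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """Longest common subsequence length divided by the reference length."""
    ref = reference.split()
    return lcs_length(candidate.split(), ref) / len(ref) if ref else 0.0
```

Averaging these values over all test summaries would give figures comparable in kind, though not necessarily in exact value, to those in Table 21.6.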
21.5 Conclusion and Future Work
According to the research and review of the literature performed so far, an abstractive text summarization system for Turkish is a very significant research direction. A web interface for the data collection system is developed to collect data from users. Then, a news dataset named the “TR-NEWS-SUM Dataset” is generated from the collected data. The collected dataset is preprocessed to be fed to the deep neural network models. Next, three deep neural network models are adapted for the Turkish language and studied on the “TR-NEWS-SUM” dataset. These models are (i) Attention Based Seq2Seq Neural Networks, (ii) Pointer Generator Seq2Seq Neural Networks, and (iii) Reinforcement Learning with Seq2Seq Neural Networks. To compare the performance of the three deep neural network models, Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores, introduced by Lin [30], are computed. The three proposed models are evaluated on the collected Turkish dataset based on the ROUGE-1, ROUGE-2, and ROUGE-L scores. Hence, an automatic abstractive text summarization method for Turkish named “TR-SUM” is developed. Although the ROUGE-1 score is the highest of the three ROUGE scores for each model, all ROUGE scores are quite low. Thus, there is considerable room to improve the performance of the models. The contributions of this study can be listed as follows.
i. The related works in the literature on extractive and abstractive text summarization are analyzed. Then, the works for the Turkish language are also explored.
ii. A web interface for the data collection system is built to collect data from users.
iii. A novel news dataset named the “TR-NEWS-SUM Dataset” is generated from the collected data.
iv. The collected dataset is preprocessed to be fed to the deep neural network models.
v. Three deep neural network models are studied on the “TR-NEWS-SUM” dataset. These models are (i) Attention Based Seq2Seq Neural Networks, (ii) Pointer Generator Seq2Seq Neural Networks, and (iii) Reinforcement Learning with Seq2Seq Neural Networks. These models are evaluated on the collected Turkish dataset based on the ROUGE-1, ROUGE-2, and ROUGE-L scores.
There are several possible future research directions for this study. One of them is to improve the generated dataset by adding new texts and their summaries. More summaries are required to increase the performance of the models. In addition, as can be seen from the dataset, the existing summaries generally consist of the first two sentences of the main texts. Hence, increasing the quality of these summaries will in turn improve the performance of the models: the more unique the summaries, the better the resulting models. Moreover, different models can also be studied to generate more meaningful summaries in the Turkish language. Last but not least, the existing word embeddings can be adapted for state-of-the-art neural network models, or new word embeddings can be generated.
Acknowledgements We would particularly like to thank Prof. Dr. Selda Akçalı and Asst. Prof. Tolga Çelik from the Journalism Department, Faculty of Communication, Ege University, and the students of this department for their contribution during the dataset collection period of this thesis.
References
1. Fang C, Mu D, Deng Z, Wu Z (2017) Word-sentence co-ranking for automatic extractive text summarization. Expert Syst Appl 72:189–195
2. Uçkan T, Karcı A (2020) Extractive multi-document text summarization based on graph independent sets. Egypt Inform J 21(3):145–157
3. Yousefi-Azar M, Hamey L (2017) Text summarization using unsupervised deep learning. Expert Syst Appl 68:93–105
4. Altan Z (2004) A Turkish automatic text summarization system
5. Karakaya KM, Güvenir HA (2004) ARG: a tool for automatic report generation. İstanbul Univ J Electr Electron Eng 4.2:1101–1109
6. Uzundere E, Dedja E, Diri B, Amasyali MF, Fakültesi E-E, Bölümü B (2008) Türkçe haber metinleri için otomatik özetleme. In: Akıllı Sistemlerde Yenilikler ve Uygulamaları Sempozyumu, Isparta, Türkiye, pp 1–3
7. Kutlu M, Cıgır C, Cicekli I (2010) Generic text summarization for Turkish. Comput J 53(8):1315–1323
8. Ozsoy MG, Cicekli I, Alpaslan FN (2010) Text summarization of Turkish texts using latent semantic analysis. In: Proceedings of the 23rd international conference on computational linguistics (Coling 2010), pp 869–876
9. Güran A, Bayazıt NG, Gürbüz MZ (2013) Efficient feature integration with Wikipedia based semantic feature extraction for Turkish text summarization. Turkish J Electr Eng Comput Sci 21(5):1411–1425
10. Hatipoğlu A, Omurca Sİ (2015) Türkçe metin özetlemede melez modelleme (A hybrid modelling for Turkish text summarization). Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Derg 17(50):95–108
11. Hatipoglu A, Omurca Sİ (2016) A Turkish Wikipedia text summarization system for mobile devices. Inf Technol Comput Sci 01:1–10. https://doi.org/10.5815/ijitcs.2016.01.01
12. Demirci F, Karabudak E, Ilgen B (2017) Multi-document summarization for Turkish news. In: 2017 international artificial intelligence and data processing symposium, October, pp 1–5
13. Torun H, Inner AB (2018) Detecting similar news by summarizing Turkish news. In: 2018 26th IEEE signal processing and communications applications conference, SIU, İzmir 2018, July 2018, pp 1–4
14. Doğan E, Kaya B (2019) Deep learning based sentiment analysis and text summarization in social networks. In: 2019 international conference on artificial intelligence and data processing symposium (IDAP), Malatya, Turkey, Sep 2019, pp 1–6
15. Hark C, Uckan T, Seyyarer E, Karci A (2019) Extractive text summarization via graph entropy (Çizge entropi ile çikarici metin özetleme). In: 2019 international conference on artificial intelligence and data processing symposium (IDAP), Malatya, Turkey, Sep 2019, pp 1–5
16. Karakoc E, Yılmaz B (2019) Deep learning based abstractive Turkish news summarization. In: 27th signal processing and communications applications conference (SIU), IEEE, pp 1–4
17. “TTC-3600: A new benchmark dataset for Turkish text categorization.” https://github.com/denopas/TTC-3600 (Accessed 09 Aug 2021)
18. “Turkish datasets from Kemik, Yıldız Technical University.” http://www.kemik.yildiz.edu.tr/ (Accessed 09 Aug 2021)
19. “Turkish sentiment dataset.” http://www.baskent.edu.tr/~msert/research/datasets/SentimentDatasetTR.html (Accessed 09 Aug 2021)
20. “English/Turkish Wikipedia named-entity recognition and text categorization dataset.” https://data.mendeley.com/datasets/cdcztymf4k/1 (Accessed 09 Aug 2021)
21. “TS Corpus dataset.” https://tscorpus.com/ (Accessed 09 Aug 2021)
22. “1B Tokens Turkish corpus and Turkish word vectors and analogical reasoning task pairs.” https://github.com/onurgu/linguistic-features-in-turkish-word-representations/releases (Accessed 09 Aug 2021)
23. “METU-Sabanci Turkish treebank dataset.” http://tools.nlp.itu.edu.tr/Datasets (Accessed 09 Aug 2021)
24. “Turkish paraphrase corpus.” https://osf.io/wp83a/ (Accessed 09 Aug 2021)
25. “Turkish product reviews.” https://github.com/fthbrmnby/turkish-text-data (Accessed 09 Aug 2021)
26. Yu H, Yue C, Wang C (2017) News article summarization with attention-based deep recurrent neural networks. Technical Report, Stanford University
27. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473
28. See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator networks. In: Proceedings of the 55th annual meeting of the Association for Computational Linguistics (ACL 2017), vol 1: long papers, pp 1073–1083
29. Keneshloo Y, Shi T, Ramakrishnan N, Reddy CK (2019) Deep reinforcement learning for sequence-to-sequence models. IEEE Trans Neural Netw Learn Syst 31(7):2469–2489
30. Lin C-Y (2004) ROUGE: a package for automatic evaluation of summaries. In: Text summarization branches out, pp 74–81
Chapter 22
Automatic and Semi-automatic Bladder Volume Detection in Ultrasound Images
Uğur Can Kavuncubaşi, Görkem Tek, Kayra Acar, Burak Ertosun, and Mehmet Feyzi Akşahin
22.1 Introduction
In mammalian anatomy, the bladder is the organ that stores urine after the blood is filtered in the kidneys. Normal bladder capacity can vary between 400 and 750 ml. The first sensation of filling occurs at 100–200 ml, the feeling of fullness at 300–400 ml, and “urgency”, which can be defined as pain and the need for urgent evacuation, is felt at 400–500 ml [1, 2]. The bladder volume can be determined with many methods. In recent years, it has generally been determined using ultrasound images. Therefore, this study focuses on ultrasound image segmentation. The non-invasive nature of ultrasound makes it an attractive method for evaluating bladder volume in subjects with bladder outlet obstruction. In the last 20 years, many methods have been described that use ultrasound bladder images to measure bladder urine volumes. In the article titled “Urinary Bladder Volume Measurements: Comparison of Three Ultrasound Calculation Methods”, published in 2002, the prolate ellipsoid was determined to be the most useful method [3]. In this study, semi-automatic and fully automatic measurement methods have been developed that can be an alternative to the prolate ellipsoid method. These two newly developed methods are based on integration. In the semi-automatic measurement, the boundaries of the bladder are determined by the user, and the area of the front view is calculated within these limits. The calculated front view area is scaled in proportion to the height change in the side view, and the volume is obtained by summing these scaled areas. In the fully automatic measurement method, the discrete wavelet transform is applied to the images and the edges are determined with edge detection algorithms on the resulting image; after morphological operations, the images are ready for the integration process.
U. C. Kavuncubaşi (corresponding author) · G. Tek · K. Acar · B. Ertosun, Başkent University, 06790 Ankara, Turkey
M. F. Akşahin, Gazi University, 06560 Ankara, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_22
22.2 Related Works
Dicuio et al. [4] obtained bladder volumes using five different methods: the prolate ellipsoid method, the double area method, the double ellipsoid method, the method of the 1-D shape of the bladder outlined manually with the maximal longitudinal diameter, and the method of the 1-D shape of the bladder outlined by a smooth ellipsoid with the maximal longitudinal diameter. Although not always precise, these methods are recommended because they are effective and easy to apply [4]. To apply these methods, the ultrasound images must first be segmented, and there are few studies in the literature for this purpose. In the study of Padmapriya et al. [5], the images were first passed through a low-pass filter. Afterward, a gray-scale image was obtained by dividing the images into two regions with the thresholding method. In the final part, the edges of the bladder were detected by performing edge detection, and the thickness of the bladder walls was measured in this way [5]. In another study by Padmapriya et al., named “Application Of Quad-Split Technique Over Edge Segmented Ultrasound Image To Find The Area And Volume Of The Urinary Bladder”, segmentation with the Hough transform was used to measure bladder volume. First, a low-pass filter was used, followed by the thresholding method. Afterward, the edge detection method was applied and the Hough transform was performed. Then, after segmentation and conversion to physical units, the result was obtained [6]. In contrast to these studies, the present work combines the 2-D discrete wavelet transform, an edge detection algorithm, and some morphological operations for bladder segmentation on ultrasound images, and then uses an integration operation for bladder volume calculation. The results are compared in Sect. 22.4.
22.3 Method and Material
22.3.1 Data Set
The images used in this study were obtained from the Baskent University Hospital using an ultrasound device called Vivid™ E9 with XDclear™. Retrospective ultrasound images of 10 different subjects who were registered for the diagnosis and treatment of various diseases were used. The images were recorded at a size of 344 × 444.
22.3.2 Method
The semi-automatic algorithm presented in Fig. 22.1 and the fully automatic algorithm presented in Fig. 22.2 were used to calculate the volume of the bladder from the ultrasound images. All of the algorithms used in this study were implemented in MATLAB R2021a.
Fig. 22.1 Semi-automatic algorithm (flowchart: image acquisition; manually defining the boundaries of the front view and of the side view of the bladder; creating a binary image to fill each specified area; calculating the front view area; integration)
Fig. 22.2 Fully automatic algorithm (flowchart: image acquisition; applying the wavelet transform three times; applying Canny edge detection; resizing the image; suppressing unwanted pixels; bridging pixels; dilation; field filling; erosion; integration)
Region of Interest Detection. In the semi-automatic method, the user has to determine the bladder limits manually, so a structure is needed that enables manual determination of the region of interest. First, the user marks the edge of the bladder on the ultrasound images; then a binary image is created within these manually set limits, ready for integration, as shown in Fig. 22.3.
Fig. 22.3 Bladder images created by user defined boundaries
Discrete-Wavelet Transform. The 2-D discrete wavelet transform requires a two-dimensional multi-resolution analysis. A two-dimensional function $f(x, y)$ can be decomposed as follows.
\begin{align}
f(x, y) &= A_1 f + \sum_{p=1}^{3} D_{1,p} f \tag{22.1} \\
&= A_2 f + \sum_{p=1}^{3} D_{2,p} f + \sum_{p=1}^{3} D_{1,p} f \tag{22.2} \\
&= A_n f + \sum_{j=1}^{n} \left[ D_{j,1} f + D_{j,2} f + D_{j,3} f \right] \tag{22.3}
\end{align}
Here $A_j f(x, y)$ are the approximations and $D_{j,p} f(x, y)$ are the difference components, where
\[ A_j f(x, y) = \sum_{l \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} a_{j,k,l}\, \phi_{j,k,l}(x, y) \tag{22.4} \]
and
\[ D_{j,p} f(x, y) = \sum_{l \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} d_{j,p,k,l}\, \psi_{j,p,k,l}(x, y) \tag{22.5} \]
Here $\phi(x, y)$ is the two-dimensional scaling function and $\psi(x, y)$ is the wavelet function, which are defined as follows.
\[ \phi_{j,k,l}(x, y) = 2^{-j} \phi\left( 2^{-j}(x - k),\, 2^{-j}(y - l) \right), \quad (j, k, l) \in \mathbb{Z}^3 \tag{22.6} \]
\[ \psi_{j,p,k,l}(x, y) = 2^{-j} \psi_p\left( 2^{-j}(x - k),\, 2^{-j}(y - l) \right), \quad p = 1, 2, 3, \quad (j, k, l) \in \mathbb{Z}^3 \tag{22.7} \]
\[ a_{j,k,l} = \left\langle f(x, y),\, \phi_{j,k,l}(x, y) \right\rangle \tag{22.8} \]
\[ d_{j,p,k,l} = \left\langle f(x, y),\, \psi_{j,p,k,l}(x, y) \right\rangle \tag{22.9} \]
For a separable multiresolution analysis, $\phi(x, y)$ and $\psi(x, y)$ can be decomposed as
\[ \phi(x, y) = \phi(x)\phi(y), \tag{22.10} \]
\[ \psi_1(x, y) = \phi(x)\psi(y), \tag{22.11} \]
\[ \psi_2(x, y) = \psi(x)\phi(y), \tag{22.12} \]
\[ \psi_3(x, y) = \psi(x)\psi(y), \tag{22.13} \]
where the scaling function $\phi$ is one-dimensional, with its wavelet function $\psi$ [7, 8].
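The separable decomposition above can be illustrated with one level of a 2-D transform in plain Python. The chapter does not name the wavelet family it used in MATLAB, so the Haar wavelet here is an assumption made purely for simplicity, as are the function names and the convention that x runs along each row. The four outputs correspond to the approximation and to the three detail components built from φ(x)ψ(y), ψ(x)φ(y) and ψ(x)ψ(y).

```python
import math


def haar_1d(signal):
    """One level of the 1-D Haar transform: returns (approximation, detail)."""
    s = 1 / math.sqrt(2)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def haar_2d(image):
    """One level of the separable 2-D Haar transform.

    Returns (A, D1, D2, D3): the approximation and three detail sub-bands.
    The image height and width must both be even.
    """
    # Filter along x, i.e. within each row.
    row_low, row_high = zip(*(haar_1d(row) for row in image))

    def along_y(band):
        # Filter along y, i.e. down each column of the row-filtered band.
        cols = [haar_1d(col) for col in transpose(band)]
        low = transpose([c[0] for c in cols])
        high = transpose([c[1] for c in cols])
        return low, high

    a, d1 = along_y(list(row_low))     # phi(x)phi(y) and phi(x)psi(y)
    d2, d3 = along_y(list(row_high))   # psi(x)phi(y) and psi(x)psi(y)
    return a, d1, d2, d3
```

Applying `haar_2d` again to the returned approximation gives the next level of the decomposition; images with an odd height or width would need padding first, which this sketch does not handle.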
Edge Detection. The edge detection method used in this study includes various mathematical methods that aim to identify points where image brightness changes sharply or where distortions occur in a digital image. Points where image brightness changes sharply are typically arranged into a series of curved line segments called edges. Finding discontinuities in one-dimensional signals is called step detection, and finding signal discontinuities over time is called change detection. Edge detection is one of the essential tools in image processing [9]. The edge detection process is implemented in certain steps. First, as much noise as possible must be suppressed without disturbing the real edges. This step is known as “Smoothing”. Next, a filter is applied to improve the quality of the edges in the image, which is called “Enhancement”. Then the image is ready for “Detection”, which means determining which edge pixels should be removed as noise and which should be preserved. Finally, “Localization” determines the precise location of an edge. In this study, the Canny method, which was determined to be the most suitable among the edge detection methods, was used. The Canny edge detection method is an edge detection operator that uses a multistage algorithm to detect a wide variety of edges in images. This technique is used to extract useful structural information from different view objects and significantly reduce the amount of data to be processed. The Canny algorithm is based on the edge detection operations described above, but is carried out in more specific steps. The first step is smoothing, as previously described; here a Gaussian filter is used to remove the noise and smooth the image. Then the intensity gradients of the image are found. The third step is to apply non-maximum suppression to avoid false responses to edge detection. A double threshold is then applied to identify potential edges. The last step is hysteresis edge tracking, which can be defined as the detection of edges by suppressing all other edges that are weak and not connected to strong edges [10].
\[ \text{Kernel} = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \; [4] \tag{22.14} \]
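The intensity-gradient step can be sketched with the kernel of Eq. (22.14). The following plain-Python example slides that kernel over an image to estimate the vertical gradient and uses its transpose for the horizontal gradient. Using the transpose as the second kernel is an assumption (the chapter prints only one kernel), the function names are illustrative, and the remaining Canny stages (Gaussian smoothing, non-maximum suppression, double thresholding, hysteresis) are not shown.

```python
KERNEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]    # kernel of Eq. (22.14)
KERNEL_X = [list(row) for row in zip(*KERNEL_Y)]   # its transpose (assumed)


def correlate3x3(image, kernel):
    """Apply a 3x3 kernel to the interior pixels; border pixels stay at 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = sum(kernel[i][j] * image[r - 1 + i][c - 1 + j]
                            for i in range(3) for j in range(3))
    return out


def gradient_magnitude(image):
    """Combine both directional responses into a per-pixel edge strength."""
    gx = correlate3x3(image, KERNEL_X)
    gy = correlate3x3(image, KERNEL_Y)
    return [[(gx[r][c] ** 2 + gy[r][c] ** 2) ** 0.5
             for c in range(len(image[0]))]
            for r in range(len(image))]
```

Large values in the result mark pixels where brightness changes sharply, which is the quantity the later Canny stages thin and threshold.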
Morphological Operations. Morphological operations are image processing analyses based on set operations. They are mainly used to erode and/or dilate a determined group of pixels in the image. Pixel erasing, pixel bridging, dilation, filling, and erosion were applied in that order, as shown in Fig. 22.4; all of them build on the erosion and dilation operations [11].
\[ \text{Dilation}: \; A \oplus B = \left\{ c \in E^N \mid c = a + b \text{ for some } a \in A \text{ and } b \in B \right\} \tag{22.15} \]
\[ \text{Erosion}: \; A \ominus B = \left\{ x \in E^N \mid x + b \in A \text{ for every } b \in B \right\} \tag{22.16} \]
Fig. 22.4 Morphological operations
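The two set definitions translate almost literally into code. In the sketch below, an image region A and a structuring element B are both represented as Python sets of (row, column) pairs; this representation and the function names are illustrative choices, and B is assumed to be non-empty. The chapter's own processing was done in MATLAB.

```python
def dilate(a, b):
    """A (+) B = {a + b : a in A, b in B} for sets of (row, col) points."""
    return {(ar + br, ac + bc) for ar, ac in a for br, bc in b}


def erode(a, b):
    """A (-) B = {x : x + b in A for every b in B}; B must be non-empty."""
    # Any valid x satisfies x + b0 in A for one chosen b0 in B,
    # so it is enough to test the candidates x = p - b0 for p in A.
    b0r, b0c = next(iter(b))
    candidates = {(ar - b0r, ac - b0c) for ar, ac in a}
    return {(xr, xc) for xr, xc in candidates
            if all((xr + br, xc + bc) in a for br, bc in b)}
```

Eroding a filled 3 × 3 square with a plus-shaped element leaves only its centre pixel, and dilating that pixel with the same element restores the plus shape.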
Integration. After the binary image in Fig. 22.3 or 22.4 has been obtained, either by region of interest determination or by image processing, the image is divided into two parts: the front view and the side view of the bladder. In the first step, the white (binary 1) pixels in the front view were counted, which gives the front view area in pixels. Then, the vertical height changes between point A and point B in the side view were used to calculate the bladder volume, as shown in Fig. 22.5. The volume was calculated with the formula:
Fig. 22.5 Binary front and side view
\[ \text{Volume} = \int_{A}^{B} \text{Area}(x)\, dx \tag{22.17} \]
To calculate the volume of the bladder, the maximum height in the front view area is taken to be proportional to the height values in the side view. The area of each slice is therefore obtained from the height values in the side view, and the volume is obtained by summing these slice areas.
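A discrete version of Eq. (22.17) can be sketched as follows. The chapter does not spell out the exact scaling rule, so this example makes explicit assumptions: the front view area is assigned to the tallest side-view column, the area of every other one-pixel-thick slice is scaled linearly with its height, and a single hypothetical parameter `pixel_size_cm` converts pixels to centimetres. A different reading, for example scaling the area with the square of the height, would change the result.

```python
def bladder_volume(front_mask, side_mask, pixel_size_cm):
    """Approximate Volume = integral of Area(x) dx from two binary masks.

    front_mask, side_mask: lists of rows containing 0/1 values
    pixel_size_cm:         edge length of one pixel in cm (assumed isotropic)
    """
    # Front view area in pixels: the number of white pixels.
    front_area_px = sum(sum(row) for row in front_mask)
    # Bladder height (white-pixel count) in each column of the side view.
    heights = [sum(col) for col in zip(*side_mask)]
    max_height = max(heights)
    if max_height == 0:
        return 0.0
    # Assumed rule: slice area is proportional to the local height.
    slice_areas_px = [front_area_px * h / max_height for h in heights]
    # Each slice is one pixel thick; convert pixel^3 to cm^3.
    return sum(slice_areas_px) * pixel_size_cm ** 3
```

Summing the slice areas is the rectangle-rule approximation of the integral between the first and last side-view columns.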
22.4 Discussion and Results
The volumes calculated for 10 bladder images with the ellipsoid method were accepted as the reference. These results were therefore used to calculate the relative error of the semi-automatic and fully automatic methods. Since the ellipsoid method depends on the user's judgment, each image was measured very carefully. If the depth, width, and length values used in the ellipsoid method are chosen carelessly, a high measurement error results. To eliminate this user subjectivity, two different methods were developed, which form the basis of our study. The results obtained are shown in Tables 22.1 and 22.2. The measurements showed a relative error of 4.83% for the developed semi-automatic algorithm and a relative error of 3.94% for the fully automatic algorithm. The results show that the fully automatic method has a smaller error margin. It is also more objective, since it removes the dependence on the user.
Table 22.1 Fully-automatic segmentation results versus reference method
Reference volume (cm³) (ellipsoid method)    Result volume (cm³) (fully-automatic method)
263.11                                       245.50
225.84                                       224.98
250.95                                       245.83
229.95                                       253.93
232.81                                       227.90
226.46                                       227.25
254.80                                       235.95
237.56                                       231.89
257.20                                       273.06
255.88                                       252.24
Mean absolute percentage error (MAPE)        3.94%
Table 22.2 Semi-automatic segmentation results versus reference method
Reference volume (cm³) (ellipsoid method)    Result volume (cm³) (semi-automatic method)
263.11                                       234.25
225.84                                       210.86
250.95                                       231.86
229.95                                       223.80
232.81                                       230.45
226.46                                       220.57
254.80                                       245.32
237.56                                       234.96
257.20                                       272.22
255.88                                       240.15
Mean absolute percentage error (MAPE)        4.83%
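The two reported error figures can be reproduced from the tabulated volumes. The short sketch below computes the mean absolute percentage error of each result column against the ellipsoid reference; the list and function names are illustrative, and all values are copied from the two tables.

```python
# Reference volumes (ellipsoid method) and the two result columns, in cm^3.
REFERENCE = [263.11, 225.84, 250.95, 229.95, 232.81,
             226.46, 254.80, 237.56, 257.20, 255.88]
FULLY_AUTOMATIC = [245.50, 224.98, 245.83, 253.93, 227.90,
                   227.25, 235.95, 231.89, 273.06, 252.24]
SEMI_AUTOMATIC = [234.25, 210.86, 231.86, 223.80, 230.45,
                  220.57, 245.32, 234.96, 272.22, 240.15]


def mape(reference, measured):
    """Mean absolute percentage error of `measured` relative to `reference`."""
    errors = [abs(m - r) / r for r, m in zip(reference, measured)]
    return 100 * sum(errors) / len(errors)


print(round(mape(REFERENCE, FULLY_AUTOMATIC), 2))  # fully automatic method
print(round(mape(REFERENCE, SEMI_AUTOMATIC), 2))   # semi-automatic method
```

The results agree with the 3.94% and 4.83% stated in the text, which confirms that the values listed under the fully-automatic column belong to the lower error.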
22.5 Conclusions and Future Work
According to the estimations and statistical calculations made on the bladder volumes obtained with the ellipsoid method, the semi-automatic detection method, and the fully automatic detection method, the fully automatic method was determined to be the most accurate. The main reason is that the ellipsoid method is a basic calculation that gives an approximate result from only three variables (height, width, and depth), relying on the resemblance of the bladder's shape to an ellipsoid. Similarly, the result of the semi-automatic method is highly affected by the hand sensitivity of the user. The fully automatic volume calculation method, on the other hand, makes the full volume of the image suitable for the integration process after the image processing steps, independently of the user, and the result is determined with high accuracy. When these methods are compared, the fully automatic bladder volume calculation method has a lower error rate, higher accuracy, and is more objective than the ellipsoid and semi-automatic methods. In later stages of this study, a bladder simulation algorithm that works completely independently of the user could be added, with an artificial intelligence algorithm checking whether high-noise images from older devices, which the system cannot otherwise interpret, are ready for processing. The developed algorithm can also be applied to and tested on different organs.
References
1. Boron WF, Boulpaep EL (2016) Medical physiology E-book. Elsevier Health Sciences
2. Walker-Smith J, Murch S (1999) Diseases of the small intestine in childhood. CRC Press
3. Hvarness H, Skjoldbye B, Jakobsen H (2002) Urinary bladder volume measurements: comparison of three ultrasound calculation methods. Scand J Urol Nephrol 36(3):177–181
4. Dicuio M, Pomara G, Menchini Fabris F, Ales V, Dahlstrand C, Morelli G (2005) Measurements of urinary bladder volume: comparison of five ultrasound calculation methods in volunteers. Arch Ital Urol Androl 77(1):60–62
5. Padmapriya B, Kesavamurthi T, Ferose HW (2012) Edge based image segmentation technique for detection and estimation of the bladder wall thickness. Procedia Eng 30:828–835
6. Padmapriya B, Kesavamurthi T (2012) Application of quad-split technique over edge segmented ultrasound image to find the area and volume of the urinary bladder. Int J Eng Res Technol 1:1–7
7. Karasu S, Saraç Z (2018) Güç Kalitesi Bozulmalarının 2 Boyutlu Ayrık Dalgacık Dönüşümü ve Torbalama Karar Ağaçları Yöntemi ile Sınıflandırılması. Politeknik Dergisi 21(4):849–855
8. Vyas A, Paik J (2016) Review of the application of wavelet theory to image processing. IEIE Trans Smart Proc Comput 5(6):403–417
9. Umbaugh SE (2010) Digital image processing and analysis: human and computer vision applications with CVIP tools. CRC Press
10. Rong W, Li Z, Zhang W, Sun L (2014) An improved CANNY edge detection algorithm. In: 2014 IEEE international conference on mechatronics and automation. IEEE, pp 577–582
11. Haralick RM, Sternberg SR, Zhuang X (1987) Image analysis using mathematical morphology. IEEE Trans Pattern Anal Mach Intell 4:532–550
Chapter 23
Effects of Variable UAV Speed on Optimization of Travelling Salesman Problem with Drone (TSP-D)
Enes Cengiz, Cemal Yilmaz, Hamdi Tolga Kahraman, and Çağri Suiçmez
E. Cengiz
Mechatronics Engineering, Technology Faculty, Afyon Kocatepe University, Afyonkarahisar, Turkey
C. Yilmaz
Mingachevir State University, Mingachevir, Azerbaijan
H. T. Kahraman
Software Engineering of Technology Faculty, Karadeniz Technical University, 61080 Trabzon, Turkey
Ç. Suiçmez (B)
Electrical Electronics Engineering, Technology Faculty, Gazi University, Ankara, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_23

23.1 Introduction
Besides their military use, drones are now frequently encountered in many civilian areas [1]. Since the transition to the online shopping era, marketing and delivery processes have changed significantly [2]. Delivery, the last-mile operation of e-commerce, still depends on trucks. However, customers' desire to receive their cargo early, traffic jams, and the air pollution caused by trucks have led many researchers to look for new solutions. Using drones, which are unmanned aerial vehicles, to deliver cargo packages offers many advantages. Because drones are unmanned, the need for human labor is greatly reduced. Because they fly, cargo is delivered in a shorter time. Delivery operations can be performed at lower cost than with trucks. In addition, in terms of carbon emissions, drones are very attractive compared to trucks [3].
Drone routing problems have been studied frequently in recent years [4–6]. These problems aim to minimize the total flight cost or the total completion time of the operation [7–9]. The Flying Sidekick Traveling Salesman Problem (FSTSP), introduced by Murray and Chu in 2015, describes the synchronous operation of a drone and a truck. In that study, a truck and a drone visit customer points, and the aim is to complete the delivery operation in the least amount of time. Both vehicles can work in parallel and deliver cargo. Given its carrying capacity, flight range, and battery capacity, the drone can carry one package per flight and can therefore serve only one customer. To fly to a different customer location, the drone meets the truck, where its battery is changed and the package for the new customer is loaded [10]. Ha et al. addressed TSP-D for package delivery with the aim of minimizing the operation time. For this, they presented a new Hybrid Genetic Algorithm (HGA) with newly developed local search and crossover operators, and reported it to be effective for solving the routing problem. In comparisons with existing good methods, the HGA provided better results [11]. The limited flight range of the drone is seen as one of the biggest problems in package delivery. To keep the drone's flight operation effective, the drone battery must be replaced or charged. In both cases, the drone has to return to a depot or station [12], which takes extra time in delivery problems. In TSP-D, the truck serves as a mobile depot for the drone. Yurek and Ozmutlu presented a heuristic algorithm for changing or charging the drone battery. Their empirical study investigates the effect of different charging rates and battery lifetimes on delivery time [13].
The first study to deal with multiple drones and multiple trucks is by Wang et al. In this problem, delivery operations are performed with multiple trucks and the drones assigned to them. For this version, a worst-case analysis of the extreme limits and various evaluations are presented [14]. Another study presents a hybrid heuristic method for the synchronous delivery of the drone and the truck, with the aim of minimizing the completion time of the work. The test results of that study show that the speed of the drone has a significant effect on the delivery process. The time spent on delivery varies with the speeds of the drone and the truck. Flying the drone faster than the truck reduces the overall delivery time compared with the case in which their speeds are equal. However, making the drone 2 or 3 times faster than the truck does not change the waiting time, so these two settings give very similar results [15].
In this study, the aim is to minimize the total delivery time in the synchronous operation of a drone and a truck. Since the speed of the drone is known to affect the completion time of the work, the drone speed is determined dynamically instead of being fixed at an average value. The speed is set according to the weight of the cargo package that the drone delivers, with the aim of minimizing the waiting time of the truck caused by the drone. The approach was tested on problems with 30, 45 and 60 customers. According to the test results, determining the drone speed dynamically from the weight of the cargo package gave more effective results than using a constant average speed. The remainder of the chapter is organized as follows: Sect. 23.2 presents the problem definition for TSP-D. Section 23.3 explains the method applied to solve the problem. Section 23.4 describes the experimental studies and gives a detailed analysis of the numerical results. Section 23.5 presents the conclusion.
23.2 Problem Definition
TSP-D is the problem in which a truck and multiple drones deliver cargo packages to customers synchronously. Because of difficult conditions, some customers have to be served by the drone instead of the truck. The drone must meet the truck at the end of each delivery because its payload and battery life are limited. Another reason for the meeting is to pick up the package for the next customer. The assumptions set for the problem are given below [10].
Assumptions determined for the problem
• Although the drone can only visit one customer per sortie, the truck can serve more than one customer during this time.
• The drone is assumed to remain in continuous flight during a sortie, except while delivering the package to the customer. Therefore, if the drone arrives at the meeting point before the truck, it cannot land temporarily to conserve battery power.
• If the drone is launched from a point, it cannot return to the truck at the same point.
• If the final step of a drone sortie is meeting the truck, the meeting must take place at a customer location where the truck is delivering packages.
• The drone cannot meet the truck at intermediate locations. Also, the truck cannot revisit a customer to meet the drone.
• Neither the drone nor the truck can visit any non-customer point. Additionally, neither vehicle can revisit any customer.
The drone and the truck can serve different customers at the same time. The basic scheme of package delivery by drone and truck is shown in Fig. 23.1.
Fig. 23.1 The operation of drone and truck to serve customers
According to Fig. 23.1, the drone leaves the truck at customer 1 to serve customer 2. Before launch, the customer's package is placed on the drone and the drone's battery is charged. After the drone is launched at customer 1 and delivers the cargo to customer 2, it moves to customer 3. Meanwhile, the truck moves from customer 1 to customer 3 to deliver packages and meet the drone. After leaving the truck, the drone must meet it again, and this must be done synchronously, because synchronization is very important for optimizing the completion time of the delivery. After the drone completes its mission, it moves, subject to its flight range, to the designated customer location to meet the truck. At the same time, the truck travels to that customer location. If the drone arrives at the meeting point before the truck, the drone waits in the air. If the truck arrives before the drone, the truck waits for the drone in the parking lot. Waiting time therefore arises in both cases, which is undesirable when the completion time of the operation is being optimized. Many studies in the literature consider the waiting time of the drone or the truck [3, 16–19]. The ratio of truck speed to drone speed is a very important parameter in TSP-D problems [20]. In this study, the speed of the drone is determined by the weight of the cargo package once the drone is launched from the truck to serve the customer. As a result, the ratio of truck speed to drone speed is determined dynamically, and the waiting time of the truck is reduced to a minimum. Customer assignments to the truck or the drone are made accordingly.
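The waiting times described above can be illustrated with a minimal sketch. It assumes constant speeds on each leg and neglects launch, service, and recovery times; the function name and the distances are hypothetical and are not taken from the chapter.

```python
def rendezvous_wait(truck_km, drone_km, truck_kmh, drone_kmh):
    """Return (truck_wait_h, drone_wait_h) for one sortie.

    truck_km: distance the truck drives from the launch point to the meeting point.
    drone_km: distance the drone flies (launch -> customer -> meeting point).
    Whichever vehicle arrives first waits for the other.
    """
    truck_time = truck_km / truck_kmh
    drone_time = drone_km / drone_kmh
    truck_wait = max(0.0, drone_time - truck_time)
    drone_wait = max(0.0, truck_time - drone_time)
    return truck_wait, drone_wait


# Hypothetical distances: the truck drives 6 km, the drone flies 12 km in total.
truck_wait, drone_wait = rendezvous_wait(6.0, 12.0, truck_kmh=40.0, drone_kmh=48.0)
print(f"truck waits {truck_wait:.2f} h, drone waits {drone_wait:.2f} h")
```

In this toy case the truck reaches the meeting point first, so raising the drone speed shortens the truck's wait, while lowering it lengthens the wait.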
Figure 23.2 shows the relationship between UAV speed and flight range for different payload weights [21]. According to Fig. 23.2, the flight range increases with drone speed up to a certain level and decreases rapidly beyond it. Once the cargo package to be carried by the drone is known, the speed of the drone can be determined dynamically. Thus, the waiting time of the truck caused by the drone in synchronous package delivery can be reduced or even eliminated. In this chapter, the weights of the cargo packages are predetermined, while the speed of the drone is determined dynamically and included in the algorithm.
Fig. 23.2 UAV speed and flight range relationship according to different payload weight [21]
23.3 Methodology
23.3.1 Truck-Drone Algorithm Approach
A population-based evolutionary algorithm has been adapted for synchronous package delivery by drone and truck. In the developed algorithm, a customer is assigned to the drone if the drone's flight range is sufficient to serve that customer. Otherwise, the truck serves the customer or customers. The general flowchart of evolutionary algorithm-based optimization is given in Fig. 23.3.
Fig. 23.3 Flow chart of the evolutionary algorithm [22]
After the start step in Fig. 23.3, the population is created according to the number of customers. Each row of the population is a candidate solution, so the number of rows equals the population size, and the columns hold the customer points. Then, customers are assigned to the drone or the truck, taking into account the distances between customer locations in the created population. The battery life of the drone must be sufficient to deliver the package to the relevant customer. After the candidate solutions in the population are assigned to the drone or truck, flight times are calculated and fitness values are determined. The candidates are then ranked from best to worst according to the fitness function. After this stage, new solution candidates are determined in the search process life cycle and the population is updated. The population is updated in groups of five rows. The first row of each group is stored as the best solution candidate of the group and transferred to the next iteration. In the second row, two customer locations are swapped randomly. In the third row, an inversion is performed over all nodes. In the fourth row, the first and last customer points are swapped and shifted. In the fifth row, two random customer points are exchanged with the first and last customer points in the row. This operation is performed for every five rows in the population. When the termination criterion of 1000 iterations is reached, the best solution in the population and the best calculated result are returned. Otherwise, the algorithm returns to the search process life cycle and continues the search for the global optimum.
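The five-row update can be sketched as follows. This is only one possible reading of the description, not the authors' MATLAB implementation: it assumes that each operator acts on its own row, that the inversion reverses the whole visiting order, and that "swapped and shifted" means swapping the first and last customers and then shifting the route cyclically by one position. All function names are hypothetical.

```python
import random


def update_group(rows, rng):
    """Update one group of five routes (each a list of customer ids).

    rows[0] is assumed to be the best route of the group and is kept unchanged.
    """
    n = len(rows[0])  # needs at least 4 customers for the row-5 operator
    out = [list(r) for r in rows]

    # Row 2: swap two randomly chosen customers.
    i, j = rng.sample(range(n), 2)
    out[1][i], out[1][j] = out[1][j], out[1][i]

    # Row 3: inversion, read here as reversing the whole visiting order.
    out[2].reverse()

    # Row 4: swap first and last customers, then shift left by one (assumed).
    out[3][0], out[3][-1] = out[3][-1], out[3][0]
    out[3] = out[3][1:] + out[3][:1]

    # Row 5: exchange two random interior customers with the first and last ones.
    i, j = rng.sample(range(1, n - 1), 2)
    out[4][0], out[4][i] = out[4][i], out[4][0]
    out[4][-1], out[4][j] = out[4][j], out[4][-1]
    return out


def update_population(population, rng):
    """Apply update_group to consecutive groups of five rows of a sorted population."""
    full = len(population) - len(population) % 5
    new_pop = []
    for start in range(0, full, 5):
        new_pop.extend(update_group(population[start:start + 5], rng))
    new_pop.extend(list(r) for r in population[full:])  # leftover rows unchanged
    return new_pop


rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = [list(range(1, 7)) for _ in range(10)]  # 10 toy routes over 6 customers
updated = update_population(population, rng)
```

Every operator only permutes the customers within a row, so each updated row remains a valid visiting order in which every customer appears exactly once.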
23.4 Experimental Studies
23.4.1 Settings
Appropriate optimization settings are required in studies that optimize the completion time of the operation in TSP-D problems. During optimization with the genetic algorithm, the population size was set to five times the number of customers. The maximum number of iterations was set to 1000. To ensure a fair comparison between the proposed case and the other cases, the termination criterion is defined in terms of the maximum number of objective function evaluations. The experiments were run in MATLAB® R2019b on a computer with an AMD Ryzen 7 4800H 2.90 GHz x64-based processor and 16 GB of RAM.
23.4.2 Experimental Studies and Results
Good synchronization between the two vehicles is required for the drone and truck to deliver packages to customers. Therefore, the customer points where the drone is launched from the truck and where it meets the truck are very important. The speeds of the drone and the truck are among the factors that affect the performance of this synchronization. In the papers examined, the ratio of average drone speed to truck speed is generally greater than 1 [23, 24], and the studies were carried out with fixed average speeds for the drone and the truck. In this section, experiments are carried out both for the case where average speeds are assigned to the drone and the truck and for the case where the drone speed is determined dynamically from the weight of the cargo package. In the proposed case, the speed of the drone during a delivery is determined by the weight of the package it carries. Thus, the drone flies at different speeds, depending on its characteristics, while traveling to different customers. In this way, the waiting time of the truck can be reduced or even eliminated. Based on Fig. 23.2, the maximum flight speed and flight range for each cargo weight are given in Table 23.1.

Table 23.1 Drone speed and flight range values according to payload weight [21]
| Cargo weight (lbs) | Drone speed (km/h) | Flight range (km) |
|---|---|---|
| 1 | 48 | 27 |
| 2 | 50 | 22 |
| 3 | 55 | 18 |
| 4 | 60 | 16 |
| 5 | 70 | 14 |

In the proposed case, the drone flight speed and flight range for TSP-D operation depend on the weight of the cargo packages. For the other cases compared in the study, the drone flight range was set to 14 km. The truck speed was set to 40 km/h, as in a study in the literature [24]. The speed of the drone is changed dynamically with reference to Fig. 23.2 [21]. The launch and meeting times of the drone are neglected for simplicity. For the experiments, sample delivery problems with 30, 45 and 60 customers were created and the results were compared. Each case was run 51 times, and the mean and standard deviation values are presented in Table 23.2. Table 23.2 shows that the completion time of the work is noticeably lower in the proposed case than in the other cases. Dynamically changing the drone speed positively affects the synchronous delivery process: the waiting time of the truck is shorter than in the other cases. Figure 23.4 gives the delivery routes and convergence curves. Fig. 23.4a, b, and c show the drone and truck synchronously visiting the customers in the problems with 30, 45 and 60 customers, respectively. The dashed route belongs to the drone, and the solid route belongs to the truck. Figure 23.4 also shows the convergence curve of the completion time of the operation as a function of the number of iterations.

Table 23.2 Minimum completion times and standard deviation values of problems with 30, 45 and 60 customers in different cases
| Instances | Case-1 (Truck = 40 km/h, Drone = 40 km/h) | Case-2 (Truck = 40 km/h, Drone = 48 km/h) | Case-3 (Truck = 40 km/h, Drone = 56 km/h) | Proposed case (Truck = 40 km/h, Drone = different speeds) |
|---|---|---|---|---|
| N-30 | 61.388 (1.812) | 53.879 (1.393) | 49.699 (1.424) | 45.858 (1.707) |
| N-45 | 84.684 (3.098) | 73.684 (2.498) | 67.118 (2.579) | 61.517 (2.286) |
| N-60 | 94.577 (4.376) | 83.307 (3.937) | 78.221 (3.690) | 74.764 (3.863) |
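The payload-dependent speed selection and the flight-range check used for customer assignment can be sketched with the values of Table 23.1. This is a minimal illustration rather than the authors' implementation: the rule of rounding a package weight up to the next tabulated weight, the use of a single speed for the whole sortie, and the function names are assumptions.

```python
# Table 23.1: cargo weight (lbs) -> (drone speed in km/h, flight range in km)
PAYLOAD_TABLE = {1: (48, 27), 2: (50, 22), 3: (55, 18), 4: (60, 16), 5: (70, 14)}


def drone_profile(weight_lbs):
    """Return (speed_kmh, range_km) for a package weight.

    Assumption: a weight between two table rows uses the next heavier row.
    """
    for tabulated in sorted(PAYLOAD_TABLE):
        if weight_lbs <= tabulated:
            return PAYLOAD_TABLE[tabulated]
    raise ValueError("package is heavier than the largest tabulated payload")


def assign_vehicle(weight_lbs, sortie_km):
    """Return ("drone", flight_time_h) if the sortie fits the range, else ("truck", None).

    sortie_km is the full drone path: launch point -> customer -> meeting point.
    """
    try:
        speed_kmh, range_km = drone_profile(weight_lbs)
    except ValueError:
        return "truck", None
    if sortie_km <= range_km:
        return "drone", sortie_km / speed_kmh
    return "truck", None


print(assign_vehicle(3, 11.0))  # within the 18 km range of a 3 lb payload
print(assign_vehicle(5, 15.0))  # exceeds the 14 km range of a 5 lb payload
```

In this sketch a heavier package maps to a shorter range, so the same sortie distance can be feasible for the drone with a light package and infeasible with a heavy one.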
Fig. 23.4 Delivery completion time with mean and variable drone speeds of problems with 30, 45 and 60 customers: (a) N-30, truck = 40 km/h and drone = 40 km/h; (b) N-45, truck = 40 km/h and drone = 48 km/h; (c) N-60, truck = 40 km/h and drone = different speeds
In the study, the drone flight speed was determined from the weights of the cargo packages to be delivered to the customers. The proposed case and the three comparison cases were each run 51 times, with customers assigned to the drone and the truck in each run. In this assignment process, the completion time of the operation was determined from the speeds of the drone and truck and the distances they traveled. For the problems with 30, 45 and 60 customers, the data were recorded after 51 runs, and their distribution is shown as box-plots in Fig. 23.5. Figure 23.5 shows that, over 51 runs, the proposed case has a lower average value than the other cases. This means that dynamically determining the drone speed gives high-performance results in TSP-D problems. In addition, increasing the ratio of drone speed to truck speed gave slightly better results, but it cannot be said to have much effect.
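To put the comparison in relative terms, the completion times reported in Table 23.2 can be converted into percentage reductions of the proposed case against each constant-speed case. The sketch below only re-expresses the tabulated values; the dictionary and function names are hypothetical.

```python
# Completion times copied from Table 23.2 (same unit as in the table).
COMPLETION_TIME = {
    "N-30": {"Case-1": 61.388, "Case-2": 53.879, "Case-3": 49.699, "Proposed": 45.858},
    "N-45": {"Case-1": 84.684, "Case-2": 73.684, "Case-3": 67.118, "Proposed": 61.517},
    "N-60": {"Case-1": 94.577, "Case-2": 83.307, "Case-3": 78.221, "Proposed": 74.764},
}


def reduction_pct(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


for instance, row in COMPLETION_TIME.items():
    for case in ("Case-1", "Case-2", "Case-3"):
        pct = reduction_pct(row[case], row["Proposed"])
        print(f"{instance}: proposed case is {pct:.1f}% lower than {case}")
```

For every instance the proposed case has the lowest tabulated value, and the reduction is smallest against Case-3, the fastest of the constant-speed cases.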
Fig. 23.5 Box-plot of the algorithms as a result of 51 runs for the problems: (a) N-30; (b) N-45; (c) N-60
23.5 Discussions and Conclusion
In this paper, a new variant of TSP, called the traveling salesman problem with drone (TSP-D), is presented. A numerical study is carried out in which the drone speed in TSP-D depends on the weight of the cargo package, and the effect of varying the drone speed is investigated. Test problems with 30, 45 and 60 customers were created, and the proposed case was compared with cases that use a constant drone speed. Appropriate optimization settings were used for all cases so that the comparison is fair. The experimental results show that determining the drone speed dynamically for each delivery flight is effective in reducing the completion time of the delivery. The box-plots presented in the experimental studies show that the proposed case gives better results in the delivery process than the average drone speed cases. Due to the increasing interest of commercial companies in this new delivery concept, comprehensive studies in this field will be required. Future studies should focus on situations that reflect real-life applications.
References
1. Otto A, Agatz N, Campbell J, Golden B, Pesch E (2018) Optimization approaches for civil applications of unmanned aerial vehicles (UAVs) or aerial drones: a survey. Networks 72(4):411–458
2. Meola A (2017) Shop online and get your items delivery by a drone delivery service: the future Amazon and Dominos have envisioned for us. Business Insider
3. Ha QM, Deville Y, Pham QD, Hà MH (2018) On the min-cost traveling salesman problem with drone. Transp Res Part C: Emerg Technol 86:597–621
4. Toth P, Vigo D (eds) (2014) Vehicle routing: problems, methods, and applications. Soc Ind Appl Math
5. Moshref-Javadi M, Hemmati A, Winkenbach M (2020) A truck and drones model for last-mile delivery: a mathematical model and heuristic approach. Appl Math Model 80:290–318
6. Zhang K, He F, Zhang Z, Lin X, Li M (2020) Multi-vehicle routing problems with soft time windows: a multi-agent reinforcement learning approach. Transp Res Part C: Emerg Technol 121:102861
7. Kim S, Moon I (2018) Traveling salesman problem with a drone station. IEEE Trans Syst Man Cybern: Syst 49(1):42–52
8. Bouman P, Agatz N, Schmidt M (2018) Dynamic programming approaches for the traveling salesman problem with drone. Networks 72(4):528–542
9. Chang YS, Lee HJ (2018) Optimal delivery routing with wider drone-delivery areas along a shorter truck-route. Expert Syst Appl 104:307–317
10. Murray CC, Chu AG (2015) The flying sidekick traveling salesman problem: optimization of drone-assisted parcel delivery. Transp Res Part C: Emerg Technol 54:86–109
11. Ha QM, Deville Y, Pham QD, Hà MH (2020) A hybrid genetic algorithm for the traveling salesman problem with drone. J Heuristics 26(2):219–247
12. El-Adle AM, Ghoniem A, Haouari M (2021) Parcel delivery by vehicle and drone. J Oper Res Soc 72(2):398–416
13. Yurek EE, Ozmutlu HC (2021) Traveling salesman problem with drone under recharging policy. Comput Commun 179:35–49
14. Wang X, Poikonen S, Golden B (2017) The vehicle routing problem with drones: several worst-case results. Optim Lett 11(4):679–697
15. de Freitas JC, Penna PHV (2020) A variable neighborhood search for flying sidekick traveling salesman problem. Int Trans Oper Res 27(1):267–290
16. Yurek EE, Ozmutlu HC (2018) A decomposition-based iterative optimization algorithm for traveling salesman problem with drone. Transp Res Part C: Emerg Technol 91:249–262
17. Tu PA, Dat NT, Dung PQ (2018) Traveling salesman problem with multiple drones. In: Proceedings of the ninth international symposium on information and communication technology, Dec 2018, pp 46–53
18. Choi Y, Schonfeld PM (2021) A comparison of optimized deliveries by drone and truck. Transp Plan Technol 44(3):319–336
19. Cokyasar T, Dong W, Jin M, Verbas İÖ (2021) Designing a drone delivery network with automated battery swapping machines. Comput Oper Res 129:105177
20. Moshref-Javadi M, Lee S, Winkenbach M (2020) Design and evaluation of a multi-trip delivery model with truck and drones. Transp Res Part E: Logist Transp Rev 136:101887
21. Raj R, Murray C (2020) The multiple flying sidekicks traveling salesman problem with variable drone speeds. Transp Res Part C: Emerg Technol 120:102813
22. Rich R (2020) Inverting the truck-drone network problem to find best case configuration. Adv Oper Res
23. Eş Yürek E, Özmutlu HC (2017) Analysis of traveling salesman problem with drones under varying drone speed. In: The 15th international logistics and supply chain congress (LMSCM), İstanbul, Turkey
24. Ha QM, Deville Y, Pham QD, Ha MH (2015) Heuristic methods for the traveling salesman problem with drone. Technical Report. Retrieved from https://arxiv.org/abs/1509.08764v1
Chapter 24
Improved Phasor Particle Swarm Optimization with Fitness Distance Balance for Optimal Power Flow Problem of Hybrid AC/DC Power Grids
Serhat Duman, Hamdi Tolga Kahraman, Busra Korkmaz, Huseyin Bakir, Ugur Guvenc, and Cemal Yilmaz
S. Duman (B)
Electrical Engineering, Engineering and Natural Sciences Faculty, Bandirma Onyedi Eylul University, Bandirma 10200, Turkey
H. T. Kahraman · B. Korkmaz
Software Engineering of Technology Faculty, Karadeniz Technical University, Trabzon 61080, Turkey
H. Bakir
Department of Electronics and Automation, Dogus Vocational School, Dogus University, Istanbul 34775, Turkey
U. Guvenc
Electrical and Electronics Engineering, Engineering Faculty, Duzce University, Duzce 81620, Turkey
C. Yilmaz
Mingachevir State University, Mingachevir, Azerbaijan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_24

24.1 Introduction
The No Free Lunch (NFL) theorem emphasizes that there is no single best metaheuristic search (MHS) algorithm for all optimization problems [1]. From this point of view, researchers have done a lot of work on the development of MHS algorithms. A comprehensive analysis of the literature reveals that studies on MHS algorithms focus on two issues. The first is the development of new MHS optimization algorithms, including particle swarm optimization (PSO) [2], artificial bee colony (ABC) [3], ant colony optimization (ACO) [4], gravitational search algorithm (GSA) [5], grey wolf optimizer (GWO) [6], moth-flame optimization (MFO) [7], symbiotic organisms search (SOS) [8], stochastic fractal search (SFS) [9], backtracking search algorithm (BSA) [10], crow search algorithm (CSA) [11], salp swarm algorithm (SSA) [12], artificial electric field algorithm (AEFA) [13], atom search optimization (ASO) [14], slime mould algorithm (SMA) [15], supply–demand-based optimization (SDO) [16], Henry gas solubility optimization (HGSO) [17], equilibrium optimizer (EO) [18], artificial ecosystem based optimization (AEO) [19], marine predators algorithm (MPA) [20], and student psychology based optimization (SPBO) [21]. The other is the development of variants of existing MHS algorithms, including phasor particle swarm optimization (PPSO) [22], orthogonal PSO (OPSO) [23], self-learning PSO (SLPSO) [24], opposition-based differential evolution (ODE) [25], success history based adaptive differential evolution (SHADE) [26], linear population size reduction adaptive differential evolution (LSHADE) [27], modified grey wolf optimizer (mGWO) [28], complex-valued encoding grey wolf optimizer (CGWO) [29], chaotic Harris hawks optimization (CHHO) [30], and enhanced coyote optimization algorithm (ECOA) [31]. The literature review shows that researchers have been working on improving the search performance of algorithms using various methods (phasor concept, self-learning mechanism, chaos theory). However, these methods alone are not sufficient to improve algorithm performance. Fitness-Distance Balance (FDB) [32], a recently developed effective and powerful selection method that allows MHS algorithms to be designed in harmony with nature, is widely applied to improve the search performance of algorithms.
Some of these studies are FDB-based symbiotic organism search (FDBSOS) [32], FDB-based stochastic fractal search (FDBSFS) [33], FDB-based adaptive guided differential evolution algorithm (FDBAGDE) [34], dynamic fitness distance balance manta ray foraging optimization (dFDB-MRFO) [35], and Lévy roulette and fitness distance balance-based coyote optimization algorithm (LRFDBCOA) [36]. In the relevant literature, the effect of the FDB selection method on the exploration and balanced search capabilities of MHS algorithms has been clearly demonstrated. The search performance of MHS algorithms is directly related to their ability to mimic the process in nature. This rule applies to PPSO as it does to other MHS algorithms. When the search performance of PPSO on different types of benchmark problems is examined, the algorithm is found to suffer from poor diversity and premature convergence. To eliminate these problems, PPSO's search operators have been redesigned. The FDB method was used to select the solution candidates that guide the search process in PPSO, and two FDB-based PPSO variations were developed. The performance of the developed variations was tested, and the most effective one was determined. The proposed optimization model has been applied to the optimal power flow problem in hybrid AC/DC power systems as well as to unimodal, multimodal, hybrid and composition type problems in the CEC 2020 test suite.
Recently, the rapid increase in energy demand has raised the stress on transmission lines. To reduce this stress, numerous researchers have focused on optimal power flow (OPF) studies involving multi-terminal high-voltage direct current (MTHVDC) transmission links. Pinto et al. [37] used the genetic algorithm (GA) for the optimal and economical operation of a power system incorporating offshore wind farms. Elattar et al. [38] applied the improved manta ray foraging optimizer (IMRFO) to the OPF problem in modified AC/DC power systems with voltage source converter (VSC) stations. Abdul-hamied et al. [39] proposed the equilibrium optimizer (EO) for solving the OPF problem in an AC/DC power grid based on VSC technology. Shaheen et al. [40] proposed the multi-objective manta ray foraging optimization (MO-MRFO) for the AC/DC OPF problem, taking into account the techno-economic operation of the power system. Elsayed et al. [41] proposed the multi-objective marine predators algorithm (MO-MPA) for the OPF problem of hybrid AC/DC electric grids containing multi-terminal VSC-HVDC. Shaheen et al. [42] used the improved crow search algorithm (ICSA) to solve the OPF problem with economic and environmental objective functions for AC-MTHVDC networks.
The optimal power flow problem of hybrid AC/DC power grids is a non-linear, high-dimensional optimization problem. The constraints, the challenging objective functions, and the high geometric complexity of the search space in which the problem is defined make it difficult to produce a feasible solution. This requires a powerful meta-heuristic search algorithm with effective diversity and balanced search capabilities. This paper presents a novel meta-heuristic algorithm called fitness-distance balance phasor particle swarm optimization (FDBPPSO). The proposed algorithm is applied to the optimization of the AC-MTHVDC OPF problem for the IEEE-30 bus power system. The simulation results show that the proposed algorithm is a powerful and effective meta-heuristic algorithm for solving the OPF problem under study.
The main contributions of this paper are listed below:
• Using the FDB selection method, the search performance of PPSO has been improved, and a powerful and robust algorithm called FDBPPSO has been introduced to the literature.
• The search performance of the FDBPPSO algorithm was tested using the CEC 2020 benchmark problems.
• In the experimental studies, unimodal, multimodal, hybrid, and composition problem types were used.
• Non-parametric Friedman and Wilcoxon statistical analysis results confirmed that the proposed algorithm exhibits stable and robust search performance in low, middle and high dimensional search spaces.
• The proposed algorithm is applied to the optimization of the OPF problem in hybrid AC/DC networks.
• The dominance of the proposed algorithm over its competitors was confirmed with various objective functions, namely total generation cost with valve-point effect, voltage deviation, and power loss.
The remainder of the paper is organized as follows: The second section presents the mathematical model of the OPF problem in a hybrid AC/DC grid. The third section consists of three sub-sections introducing the FDB selection method, an overview of PPSO, and the proposed FDBPPSO, respectively. The fourth section gives the experimental settings. The fifth section presents the results of the experimental studies.
The performance of the proposed algorithm is tested both on unconstrained benchmark problems and on a constrained real-world engineering problem. Accordingly, the results are presented in two sub-sections. The first sub-section reports the experimental studies performed to identify the most effective FDBPPSO variant. In the second sub-section, the proposed FDBPPSO method is applied to the optimization of the OPF problem in a hybrid AC/DC grid. Finally, the paper closes with the conclusions.
24.2 Mathematical Formulation of Optimal Power Flow Problem of Hybrid AC/DC Power Grids

Optimal power flow (OPF) remains a hot topic among power system researchers due to its important role in modern power system planning and operation. The OPF problem aims to determine the best settings of the control variables that optimize the chosen objective function, such as total cost or voltage deviation, subject to a set of equality and inequality constraints [43]. The generic formulation of the OPF can be defined as follows [44]:

$$
\begin{aligned}
\text{minimize}\quad & K = \{K_1(O,P),\ K_2(O,P),\ \ldots,\ K_p(O,P)\} \\
\text{subject to}\quad & g(O,P) = 0 \\
& h(O,P) \le 0
\end{aligned}
\tag{24.1}
$$
where, K is described as vector of the p objective functions, O and P are state and control variables, and g(O, P) and h(O, P) represent the equality and inequality constraints, respectively.
24.2.1 State and Control Variables

The state variables of the OPF problem can be represented by the vector O shown in Eq. (24.2) [41]:

$$
O = \left[ P_{Thg,1},\ V_{l,1} \ldots V_{l,N_{PQ}},\ Q_{Thg,1} \ldots Q_{Thg,N_{THG}},\ \ldots,\ S_{l,1} \ldots S_{l,N_{TL}} \right]
\tag{24.2}
$$
where $P_{Thg,1}$ is the active power of the slack generator, $Q_{Thg}$ is the reactive power of the thermal generation units, $V_l$ is the voltage magnitude at a load bus, and $S_l$ is the transmission line loading. $N_{PQ}$, $N_{THG}$, and $N_{TL}$ are the numbers of load buses, thermal generators, and transmission lines. In addition, the state variables of the VSC-based MTHVDC grid are the voltage magnitudes of the DC buses and the power flows in the DC lines. The control variable vector P is defined as follows [41]:

$$
P = \left[ P_{Thg,2} \ldots P_{Thg,N_{THG}},\ V_{Thg,1} \ldots V_{Thg,N_{THG}},\ Q_{SH,1} \ldots Q_{SH,N_{SH}},\ T_1 \ldots T_{N_T},\ V_{dc,i},\ V_{conv,i},\ P_{conv,i},\ Q_{conv,i} \right]
\tag{24.3}
$$
where $P_{Thg}$ is the active power of the thermal generation units except for the slack generator, and $V_{Thg}$ is the voltage magnitude of the thermal generation units. $Q_{SH}$ and $T$ represent the output of the shunt VAR compensators and the tap ratio of the transformers, respectively. $N_{THG}$, $N_{SH}$, and $N_T$ are the numbers of thermal generators, shunt compensators, and tap-setting transformers. $V_{dc}$ and $V_{conv}$ are the DC bus voltage and the voltage magnitude at the AC side of the VSC. $P_{conv}$ and $Q_{conv}$ are the active and reactive power outputs of the converter, respectively.
24.2.2 Constraints

The AC-MTHVDC OPF problem must be solved subject to various equality and inequality constraints, which are given in the following sub-sections.

24.2.2.1 Equality Constraints
Generally, the power balance equations, in which the active and reactive power produced equal the demand plus the losses in the electric grid, are treated as equality constraints. The AC power balance equations are formulated as follows [40]:

$$
P_{Thg,i} - P_{D,i} - V_i \sum_{j=1}^{N_{AC}} V_j \left[ G_{ij}\cos(\delta_{ij}) + B_{ij}\sin(\delta_{ij}) \right] = 0, \quad i = 1, 2, \ldots, N_{AC}
\tag{24.4}
$$

$$
Q_{Thg,i} - Q_{D,i} + Q_{SH,i} - V_i \sum_{j=1}^{N_{AC}} V_j \left[ G_{ij}\sin(\delta_{ij}) - B_{ij}\cos(\delta_{ij}) \right] = 0, \quad i = 1, 2, \ldots, N_{AC}
\tag{24.5}
$$

where $P_{Thg,i}$ and $Q_{Thg,i}$ represent the active and reactive power of the generation units, respectively. $P_{D,i}$ and $Q_{D,i}$ are the active and reactive load demands. $Q_{SH,i}$ is the reactive power output of the shunt VAR compensator. $\delta_{ij}$ is the voltage angle difference between the i-th and j-th buses. $G_{ij}$ and $B_{ij}$ represent conductance and susceptance. $N_{AC}$ is the number of AC buses. The DC power is calculated using Eq. (24.6) [40]. In this equation, $P_{dc,i}$ is the active power flow through a DC line, $G_{dc,ij}$ is the conductance of the DC line, and $V_{dc,i}$ and $V_{dc,j}$ are the voltage magnitudes of the i-th and j-th DC buses.

$$
P_{dc,i} = V_{dc,i} \sum_{\substack{j=1 \\ j \ne i}}^{N_{VSC}} G_{dc,ij} \left( V_{dc,i} - V_{dc,j} \right)
\tag{24.6}
$$
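As a small illustration of Eq. (24.6), the sketch below evaluates the DC power at every bus of a three-terminal grid. The voltages and conductances are hypothetical values chosen only for the example, and the function name `dc_bus_power` is not taken from the chapter or from MATACDC.

```python
def dc_bus_power(v_dc, g_dc):
    """Evaluate Eq. (24.6) for every DC bus.

    v_dc: list of DC bus voltage magnitudes (p.u.)
    g_dc: symmetric matrix of DC line conductances G_dc,ij (p.u.)
    """
    n = len(v_dc)
    p_dc = []
    for i in range(n):
        # Sum over all other buses j != i, as in the summation of Eq. (24.6)
        flow = sum(g_dc[i][j] * (v_dc[i] - v_dc[j]) for j in range(n) if j != i)
        p_dc.append(v_dc[i] * flow)
    return p_dc


# Hypothetical three-terminal DC grid (illustrative numbers only)
v_dc = [1.00, 0.99, 0.98]
g_dc = [[0.0, 10.0, 5.0],
        [10.0, 0.0, 8.0],
        [5.0, 8.0, 0.0]]

p_dc = dc_bus_power(v_dc, g_dc)
net_injection = sum(p_dc)  # equals the resistive loss in the DC lines
```

In this example the first bus exports power and the other two import it; the small positive sum of the three values is the power dissipated in the DC lines.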
24.2.2.2 Inequality Constraints
The AC grid constraints consist of the operating limits of the power system components (generator, shunt compensator, transformer) and the security constraints. The AC grid inequality constraints are as follows [40, 41]:

• Generator Constraints

$$P_{Thg,i}^{\min} \le P_{Thg,i} \le P_{Thg,i}^{\max} \quad \forall i \in N_{THG} \tag{24.7}$$

$$Q_{Thg,i}^{\min} \le Q_{Thg,i} \le Q_{Thg,i}^{\max} \quad \forall i \in N_{THG} \tag{24.8}$$

$$V_{Thg,i}^{\min} \le V_{Thg,i} \le V_{Thg,i}^{\max} \quad \forall i \in N_{THG} \tag{24.9}$$

• Shunt Compensator Constraints

$$Q_{SH,i}^{\min} \le Q_{SH,i} \le Q_{SH,i}^{\max} \quad \forall i \in N_{SH} \tag{24.10}$$

• Transformer Constraints

$$T_i^{\min} \le T_i \le T_i^{\max} \quad \forall i \in N_T \tag{24.11}$$

• Security Constraints

$$V_{l,i}^{\min} \le V_{l,i} \le V_{l,i}^{\max} \quad \forall i \in N_{PQ} \tag{24.12}$$

$$|S_{l,i}| \le S_{l,i}^{\max} \quad \forall i \in N_{TL} \tag{24.13}$$

Similarly, the MTHVDC grid inequality constraints are as follows [40, 41]:

$$V_{conv,i}^{\min} \le V_{conv,i} \le V_{conv,i}^{\max}, \quad i = 2, 3, 5, 6 \tag{24.14}$$

$$V_{dc,i}^{\min} \le V_{dc,i} \le V_{dc,i}^{\max}, \quad i = 1, 4 \tag{24.15}$$

$$P_{conv,i}^{\min} \le P_{conv,i} \le P_{conv,i}^{\max}, \quad i = 2, 3, 5, 6 \tag{24.16}$$

$$Q_{conv,i}^{\min} \le Q_{conv,i} \le Q_{conv,i}^{\max}, \quad i = 1, 4 \tag{24.17}$$
where $V_{conv,i}$ is the voltage magnitude of the i-th converter, $V_{dc,i}$ is the voltage magnitude of the DC bus, and $P_{conv,i}$ and $Q_{conv,i}$ are the active and reactive powers of the converter (AC side).
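Each of the inequality constraints above is a simple lower and upper bound, so a candidate solution can be screened with a bound check. The sketch below adds up the amount by which each quantity leaves its range. The chapter does not say how violations are handled inside the optimizer, so this summed-violation measure is only an assumption for illustration, and the function names are hypothetical. The limits used are the ones listed for the test system later in the chapter (load bus voltage, converter voltage, and reactive power of the slack unit).

```python
def limit_violation(value, lower, upper):
    """Amount by which value lies outside [lower, upper]; 0.0 when inside."""
    if value < lower:
        return lower - value
    if value > upper:
        return value - upper
    return 0.0


def total_violation(values, limits):
    """Sum of bound violations over all named quantities."""
    return sum(limit_violation(values[name], *limits[name]) for name in limits)


# Limits taken from the test system tables of this chapter
limits = {
    "V_l": (0.94, 1.06),      # load bus voltage, p.u.
    "V_conv": (0.9, 1.1),     # converter AC-side voltage, p.u.
    "Q_Thg1": (-20.0, 150.0)  # reactive power of unit 1, MVAr
}

# Hypothetical candidate: V_l is 0.01 p.u. too high, Q_Thg1 is 5 MVAr too low
candidate = {"V_l": 1.07, "V_conv": 1.0, "Q_Thg1": -25.0}
violation = total_violation(candidate, limits)
```

A value of zero means the candidate satisfies every listed bound. Because the quantities have different units, a practical implementation would likely scale each term before summing.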
24.2.3 Objective Functions

In this paper, three objective functions are used to optimize the AC-MTHVDC OPF problem. Their mathematical models are described below.

24.2.3.1 Minimization of Total Generation Cost

The total generation cost of the thermal units considering the valve-point effect is defined as follows [34]:

$$
F_{obj}(O,P) = F_{obj1} = C_F(P_{Thg}) = \sum_{i=1}^{N_{THG}} \left( z_i P_{Thg,i}^{2} + y_i P_{Thg,i} + t_i + \left| h_i \times \sin\left( p_i \times \left( P_{Thg,i}^{\min} - P_{Thg,i} \right) \right) \right| \right)
\tag{24.18}
$$

where $P_{Thg,i}$ represents the active power of the i-th thermal generation unit, $z_i$, $y_i$, and $t_i$ are the fuel cost coefficients, and $h_i$ and $p_i$ are the valve-point loading effect coefficients.
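Eq. (24.18) can be checked numerically with a few lines of code. The sketch below uses the cost coefficients as printed in Table 24.5, the lower generation limits from Table 24.6, and the FDBPPSO dispatch reported for Case 1. The data layout and the function name `generation_cost` are illustrative choices, not the authors' implementation.

```python
import math

# (z, y, t, h, p, P_min) per unit: coefficients as printed in Table 24.5,
# lower active power limits as listed in Table 24.6
UNITS = {
    "Thg1": (0.0037, 2.00, 0.0, 18.0, 0.037, 50.0),
    "Thg2": (0.0175, 1.75, 0.0, 16.0, 0.038, 20.0),
    "Thg5": (0.0625, 1.00, 0.0, 14.0, 0.040, 15.0),
    "Thg8": (0.0083, 3.25, 0.0, 12.0, 0.045, 10.0),
    "Thg11": (0.0250, 3.00, 0.0, 13.0, 0.042, 10.0),
    "Thg13": (0.0250, 3.00, 0.0, 13.5, 0.041, 12.0),
}


def generation_cost(dispatch):
    """Total cost in $/h following Eq. (24.18); dispatch maps unit -> MW."""
    total = 0.0
    for name, p in dispatch.items():
        z, y, t, h, pc, p_min = UNITS[name]
        quadratic = z * p ** 2 + y * p + t
        valve_point = abs(h * math.sin(pc * (p_min - p)))
        total += quadratic + valve_point
    return total


# Active power outputs reported for FDBPPSO in Case 1 (Table 24.6)
fdbppso_case1 = {
    "Thg1": 200.0000, "Thg2": 46.1500, "Thg5": 19.9949,
    "Thg8": 10.0003, "Thg11": 10.4446, "Thg13": 12.2521,
}
cost = generation_cost(fdbppso_case1)
```

With the coefficients exactly as printed, the sketch gives roughly 847.5 $/h, about 2 $/h below the 849.5388 $/h reported for FDBPPSO in Case 1. The gap may come from rounding in the printed coefficients; this is an assumption, since the chapter does not comment on it.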
24.2.3.2 Minimization of Voltage Deviation

Voltage deviation is mathematically defined as follows [39]:

$$
F_{obj}(O,P) = F_{obj2} = \sum_{r=1}^{N_{PQ}} \left( V_{l,r} - 1 \right)^2 + \sum_{p=1}^{N_{DC}} \left( V_{dc,p} - 1 \right)^2
\tag{24.19}
$$

where $V_{l,r}$ and $V_{dc,p}$ are the voltage magnitudes of the r-th PQ bus and the p-th DC bus. $N_{PQ}$ represents the number of load buses in the AC grid, and $N_{DC}$ is the number of DC buses.
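A direct reading of Eq. (24.19) is shown below. The bus voltages are hypothetical per-unit values used only to show how the two sums are combined.

```python
def voltage_deviation(v_load, v_dc):
    """Eq. (24.19): squared deviation from 1 p.u. over PQ buses and DC buses."""
    ac_part = sum((v - 1.0) ** 2 for v in v_load)
    dc_part = sum((v - 1.0) ** 2 for v in v_dc)
    return ac_part + dc_part


# Hypothetical voltages (p.u.): three load buses and two DC buses
vd = voltage_deviation([1.02, 0.98, 1.00], [1.01, 0.99])
```

Here the load buses contribute 0.0008 and the DC buses 0.0002, so the objective value is 0.001 p.u.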
24.2.3.3 Minimization of Active Power Loss

Considering the power losses in the AC and DC transmission systems, the active power loss objective function is defined as follows [40, 41]:

$$
F_{obj}(O,P) = F_{obj3} = P_{AC\_loss} + P_{DC\_loss} + P_{VSC\_loss}
\tag{24.20}
$$

The AC grid, DC grid, and VSC losses are calculated as follows:

$$
P_{AC\_loss} = \sum_{i,j \in N_{AC}} G_{ij} \left[ V_i^2 + V_j^2 - 2 V_i V_j \cos(\delta_{ij}) \right]
\tag{24.21}
$$

$$
P_{DC\_loss} = \sum_{i,j \in N_{DC}} R_{ij} I_{ij}^2
\tag{24.22}
$$

$$
P_{VSC\_loss} = \sum_{i=1}^{N_{VSC}} \left( \omega_{1,i} I_{c,i}^2 + \omega_{2,i} I_{c,i} + \omega_{3,i} \right)
\tag{24.23}
$$

where $\omega_1$, $\omega_2$, and $\omega_3$ are the loss coefficients of the VSCs.
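The loss objective is a plain sum of three components, which makes the reported results easy to cross-check. The sketch below evaluates Eq. (24.23) for hypothetical converter currents and loss coefficients, and then adds up the three loss components reported for FDBPPSO in Case 1 (Table 24.6). Function names and the example coefficients are assumptions made for illustration.

```python
def vsc_loss(currents, coefficients):
    """Eq. (24.23): quadratic loss model summed over all converters.

    currents: converter currents I_c,i
    coefficients: one (w1, w2, w3) tuple per converter
    """
    return sum(w1 * i_c ** 2 + w2 * i_c + w3
               for i_c, (w1, w2, w3) in zip(currents, coefficients))


def total_loss(p_ac_loss, p_dc_loss, p_vsc_loss):
    """Eq. (24.20): total active power loss."""
    return p_ac_loss + p_dc_loss + p_vsc_loss


# Hypothetical currents and coefficients for two converters
example_vsc = vsc_loss([0.5, 1.0], [(0.01, 0.02, 0.03), (0.01, 0.02, 0.03)])

# Loss components reported for FDBPPSO in Case 1 (Table 24.6), in MW
p_loss_case1 = total_loss(6.8060, 0.5888, 8.0474)
```

The three reported components add up to 15.4422 MW, which agrees with the total loss of 15.4421 MW listed in the same table to within rounding.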
24.3 Method

This study focused on improving the search performance of the PPSO algorithm by using the FDB selection method. In this context, the selection strategy of the PPSO algorithm was redesigned using the FDB method, and thus a novel algorithm called FDBPPSO was proposed. In the next sub-sections, the fitness-distance balance method, the phasor particle swarm optimization algorithm, and the proposed FDBPPSO method are introduced, respectively.
24.3.1 Fitness-Distance Balance Method

Fitness-distance balance (FDB) [32] is an effective selection method used to determine the solution candidate(s) that can provide the most benefit to the search process. In the FDB selection method, a score value showing the contribution to the search process is calculated for every solution candidate in the population. The FDB score is calculated from the fitness values of the solution candidates and their current distances from the best solution in the population. Accordingly, the solution candidate with the highest FDB score guides the search process [32–36]. Before explaining the basics of the FDB selection method, two steps that are common to all MHS algorithms should be recalled: the generation of a random population and the calculation of fitness values [33].

• MHS optimization algorithms start the optimization process with the random generation of the population (P) of solution candidates. Each solution candidate (X) contains one or more design parameters. Assume that P represents the population as depicted in Eq. (24.24), where n and m are the numbers of solution candidates and design parameters, respectively.

$$
P \equiv \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix} \equiv \begin{bmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nm} \end{bmatrix}_{n \times m}
\tag{24.24}
$$

• The objective function, or fitness function, calculates a fitness value for each solution candidate. Accordingly, the F vector holding the fitness values of the solution candidates is given in Eq. (24.25).

$$
F \equiv \begin{bmatrix} f_1 \\ \vdots \\ f_n \end{bmatrix}_{n \times 1}
\tag{24.25}
$$
In order to calculate the FDB score, the following steps should be followed [32].

1. First, the $D_{X_i}$ value, which represents the Euclidean distance between the i-th solution candidate ($X_i$) and the best solution ($X_{best}$) of the population, is calculated as in Eq. (24.26).

$$
\forall X_i \ne X_{best},\ i = 1, \ldots, n: \quad D_{X_i} = \sqrt{ \left( X_{i[1]} - X_{best[1]} \right)^2 + \left( X_{i[2]} - X_{best[2]} \right)^2 + \cdots + \left( X_{i[m]} - X_{best[m]} \right)^2 }
\tag{24.26}
$$

2. The distance vector $D_P$ is shown as follows:

$$
D_P \equiv \begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix}_{n \times 1}
\tag{24.27}
$$

3. Thereafter, the FDB score is calculated. For the score calculation, the normalized fitness ($normF_i$) and distance ($normD_{X_i}$) values of the solution candidates are used. The w parameter shown in Eq. (24.28) adjusts the relative effect of fitness and distance in the score.

$$
\forall X_i,\ i = 1, \ldots, n: \quad S_{X_i} = w \cdot normF_{[i]} + (1 - w) \cdot normD_{X[i]}
\tag{24.28}
$$

4. Finally, the n-dimensional FDB score vector ($S_P$) is generated as in Eq. (24.29). Accordingly, the solution candidate with the highest FDB score is selected as the guide solution that contributes the most to the search process.

$$
S_P \equiv \begin{bmatrix} s_1 \\ \vdots \\ s_n \end{bmatrix}_{n \times 1}
\tag{24.29}
$$
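The four steps above can be sketched in a few lines. The chapter does not spell out the normalization or the value of w, so the sketch assumes min-max normalization, a minimization problem (normalized fitness is inverted so that a lower objective value gives a higher score), and w = 0.5. All function names are illustrative.

```python
import math


def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fdb_scores(population, fitness, w=0.5):
    """FDB score of every candidate, Eqs. (24.26)-(24.29)."""
    best_index = min(range(len(fitness)), key=fitness.__getitem__)
    best = population[best_index]
    # Step 1-2: Euclidean distance of each candidate to the best one
    distances = [math.dist(x, best) for x in population]
    # Step 3: normalized fitness (assumption: minimization, so inverted)
    norm_f = [1.0 - f for f in min_max(fitness)]
    norm_d = min_max(distances)
    return [w * nf + (1.0 - w) * nd for nf, nd in zip(norm_f, norm_d)]


def fdb_select(population, fitness, w=0.5):
    """Step 4: index of the candidate with the highest FDB score."""
    scores = fdb_scores(population, fitness, w)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical population evaluated on the sphere function f(x) = sum(x^2)
population = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0], [0.5, 0.5]]
fitness = [0.0, 1.0, 25.0, 0.5]

scores = fdb_scores(population, fitness)
guide_index = fdb_select(population, fitness)
```

In this example the best candidate (index 0) and the worst but most distant candidate (index 2) both score 0.5, while the candidate at index 1 scores 0.58 and is selected: it has good fitness and still lies some distance from the current best, which is the balance the method aims for.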
24.3.2 Overview of Phasor Particle Swarm Optimization (PPSO) Algorithm

Particle swarm optimization (PSO) [22] is a nature-inspired optimization method based on the collective behavior of animals such as birds and fish. PSO [2] is one of the most frequently used swarm-based optimization algorithms for solving global optimization problems. Artificial intelligence researchers have studied different variants of PSO to improve its search performance [45]. One of them is a powerful and efficient variant of PSO called phasor particle swarm optimization (PPSO). The PPSO algorithm models the acceleration coefficients ($C_1$ and $C_2$) and the inertia weight ($w$) control parameters with a phase angle ($\theta$). PPSO is a non-parametric, self-adaptive meta-heuristic algorithm developed using the mathematical model of the phasor concept. In the optimization process of PPSO, n particles are first created randomly in the m-dimensional search space, each with its own phase angle ($\theta_i$), and their fitness values are calculated. Each particle in the swarm is represented by a position vector ($X_i$) and a velocity vector ($V_i$). Equations (24.30) and (24.31) represent the position and velocity vectors for the i-th particle in the m-dimensional search space, respectively [22].

$$X_i = [x_{i1}, x_{i2}, \ldots, x_{im}] \tag{24.30}$$

$$V_i = [v_{i1}, v_{i2}, \ldots, v_{im}] \tag{24.31}$$

After that, the velocity and position of each particle are updated using Eqs. (24.32) and (24.33), respectively [22].

$$
V_i^{Iter} = \left| \cos\theta_i^{Iter} \right|^{2 \sin\theta_i^{Iter}} \times \left( Pbest_i^{Iter} - X_i^{Iter} \right) + \left| \sin\theta_i^{Iter} \right|^{2 \cos\theta_i^{Iter}} \times \left( Gbest^{Iter} - X_i^{Iter} \right)
\tag{24.32}
$$

$$
\overrightarrow{X_i^{Iter+1}} = \overrightarrow{X_i^{Iter}} + \overrightarrow{V_i^{Iter}}
\tag{24.33}
$$

where $Pbest_i$ and $Gbest$ are the personal and global best position vectors, respectively. $V_i^{Iter}$ and $X_i^{Iter}$ are the current velocity and position of the i-th particle. $V_i^{Iter+1}$ and $X_i^{Iter+1}$ are the velocity and position of the i-th particle in the next iteration. Throughout the search process, Pbest and Gbest are updated with the following equations [22]:

$$
Pbest_i^{Iter+1} = \begin{cases} Pbest_i^{Iter}, & \text{if } f\left(Pbest_i^{Iter}\right) \le f\left(X_i^{Iter+1}\right) \\ X_i^{Iter+1}, & \text{otherwise} \end{cases}
\tag{24.34}
$$

$$
Gbest^{Iter+1} = \begin{cases} Pbest_i^{Iter+1}, & \text{if } f\left(Pbest_i^{Iter+1}\right) \le f\left(Gbest^{Iter}\right) \\ Gbest^{Iter}, & \text{otherwise} \end{cases}
\tag{24.35}
$$

For the next iteration, the phase angle and maximum velocity of the particle are updated as shown in Eqs. (24.36) and (24.37), respectively [22].

$$
\theta_i^{Iter+1} = \theta_i^{Iter} + T(\theta) \times (2\pi) = \theta_i^{Iter} + \left| \cos\left(\theta_i^{Iter}\right) + \sin\left(\theta_i^{Iter}\right) \right| \times (2\pi)
\tag{24.36}
$$

$$
V_{max,i}^{Iter+1} = W(\theta) \times (X_{max} - X_{min}) = \left| \cos\theta_i^{Iter} \right|^{2} \times (X_{max} - X_{min})
\tag{24.37}
$$

where $\theta$ denotes the phase angle. $X_{min}$ and $X_{max}$ are the minimum and maximum limits of the design parameters, respectively.
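The phasor-based updates of Eqs. (24.32), (24.36), and (24.37) are short enough to write out directly. The sketch below is a minimal reading of those three equations for a single particle; the function names and the example vectors are illustrative, and bound handling is left out.

```python
import math


def ppso_velocity(x, pbest, gbest, theta):
    """Velocity of one particle following Eq. (24.32)."""
    c_p = abs(math.cos(theta)) ** (2.0 * math.sin(theta))  # weight of Pbest term
    c_g = abs(math.sin(theta)) ** (2.0 * math.cos(theta))  # weight of Gbest term
    return [c_p * (pb - xi) + c_g * (gb - xi)
            for xi, pb, gb in zip(x, pbest, gbest)]


def next_theta(theta):
    """Phase angle update following Eq. (24.36)."""
    return theta + abs(math.cos(theta) + math.sin(theta)) * 2.0 * math.pi


def next_vmax(theta, x_min, x_max):
    """Maximum velocity update following Eq. (24.37)."""
    return abs(math.cos(theta)) ** 2 * (x_max - x_min)


# Hypothetical two-dimensional particle with phase angle pi/4
theta = math.pi / 4.0
velocity = ppso_velocity([0.0, 0.0], [1.0, 1.0], [2.0, 0.0], theta)
position = [xi + vi for xi, vi in zip([0.0, 0.0], velocity)]  # Eq. (24.33)
v_max = next_vmax(theta, -5.0, 5.0)
theta_new = next_theta(theta)
```

At a phase angle of π/4 both weights are equal (about 0.61), so the particle is pulled equally towards its personal best and the global best. As the angle changes from iteration to iteration, the two weights shift, which is how PPSO adapts its behavior without separate control parameters.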
24.3.3 Proposed FDBPPSO Algorithm

In this study, we focused on eliminating the poor diversification (exploration) and intensification (exploitation) shortcomings of the PPSO algorithm, which is an improved variant of PSO, and thereby improving the search performance of the algorithm. In this regard, the selection strategy of PPSO was redesigned by using the FDB selection method, which was developed to determine the solution candidates that guide the search process of MHS algorithms. For the design of the proposed FDBPPSO algorithm, two different variants were created. In the first variant, called Case-1, the position vector $X_i^{Iter}$ given in Eq. (24.33) was selected by the FDB method ($X_i^{index\_fdb}$). In the other variant, called Case-2, the velocity vector $V_i^{Iter}$ given in Eq. (24.33) was selected by the FDB method ($V_i^{index\_fdb}$). The mathematical models of the developed FDBPPSO variants are given in Table 24.1. The pseudo-code of the proposed FDBPPSO algorithm is depicted in Algorithm 1.

Table 24.1 Mathematical model of the proposed FDBPPSO

Case-1
  PPSO:              $\overrightarrow{X_i^{Iter+1}} = \overrightarrow{X_i^{Iter}} + \overrightarrow{V_i^{Iter}}$
  Proposed FDBPPSO:  $\overrightarrow{X_i^{Iter+1}} = \overrightarrow{X_i^{index\_fdb}} + \overrightarrow{V_i^{Iter}}$   (24.38)

Case-2
  PPSO:              $\overrightarrow{X_i^{Iter+1}} = \overrightarrow{X_i^{Iter}} + \overrightarrow{V_i^{Iter}}$
  Proposed FDBPPSO:  $\overrightarrow{X_i^{Iter+1}} = \overrightarrow{X_i^{Iter}} + \overrightarrow{V_i^{index\_fdb}}$   (24.39)
Algorithm 1. Pseudo-code of FDBPPSO algorithm

Input: Swarm size (n), number of design parameters (m), maxFEs, $X_{min}$ and $X_{max}$
Output: Gbest
Begin
  for i = 1:n do
    $\theta_i^{1} = U(0, 2\pi)$
    for d = 1:m do
      $v_{max,id}^{1} = (x_{max,d} - x_{min,d})$
    end
  end
  // Generate the initial population and evaluate fitness //
  P: randomly create a population of particles as given in Eq. (24.24)
  for i = 1:n (particle number) do
    f: evaluate the fitness of each particle as given in Eq. (24.25)
  end
  Determine $Pbest_i^{1}$ and $Gbest^{1}$
  // Meta-heuristic search process //
  while search process lifecycle: up to termination criterion (maxFEs) do
    for i = 1:n (particle number) do
      Update velocities of particles using Eq. (24.32)
      // Implementation of FDB selection method //
      for i = 1:n do
        Calculate the Euclidean distance of each particle using Eq. (24.26)
        Calculate the FDB score of each particle using Eq. (24.28)
      end for
      Identify the $D_P$ and $S_P$ vectors
      Determine $X_i^{index\_fdb}$ and $V_i^{index\_fdb}$ according to the FDB philosophy (Eq. 24.29)
      Update positions of particles using Eq. (24.38) // Case-1 //
      Update positions of particles using Eq. (24.39) // Case-2 //
      Update the local best positions $Pbest_i^{Iter+1}$
      Update the global best position $Gbest^{Iter+1}$
      Update the phase angle $\theta_i^{Iter+1}$ of particles
      Update the maximum velocity $V_{max,i}^{Iter+1}$ of particles
    end for
  end while
  return Gbest
End
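To make Algorithm 1 concrete, the following is a compact sketch of the Case-1 variant, tested on the sphere function. It is not the authors' implementation. It assumes min-max normalization with inverted fitness and w = 0.5 for the FDB score, computes the FDB guide once per iteration from the current swarm, and clamps velocities to the current maximum velocity and positions to the search bounds. All names are illustrative.

```python
import math
import random


def min_max(values):
    lo, hi = min(values), max(values)
    return [0.0] * len(values) if hi == lo else [(v - lo) / (hi - lo) for v in values]


def fdb_index(xs, fs, w=0.5):
    """Index of the candidate with the highest FDB score (minimization assumed)."""
    best = xs[min(range(len(fs)), key=fs.__getitem__)]
    norm_d = min_max([math.dist(x, best) for x in xs])
    norm_f = [1.0 - f for f in min_max(fs)]
    scores = [w * a + (1.0 - w) * b for a, b in zip(norm_f, norm_d)]
    return max(range(len(scores)), key=scores.__getitem__)


def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def fdbppso_case1(objective, m, x_min, x_max, n=20, max_fes=2000, seed=0):
    rng = random.Random(seed)
    xs = [[rng.uniform(x_min, x_max) for _ in range(m)] for _ in range(n)]
    fs = [objective(x) for x in xs]
    fes = n
    thetas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    vmax = [x_max - x_min] * n
    pbest, pbest_f = [x[:] for x in xs], fs[:]
    g = min(range(n), key=fs.__getitem__)
    gbest, gbest_f = xs[g][:], fs[g]
    history = [gbest_f]
    while fes + n <= max_fes:
        guide = xs[fdb_index(xs, fs)][:]  # X^{index_fdb}, fixed for this iteration
        for i in range(n):
            th = thetas[i]
            c_p = abs(math.cos(th)) ** (2.0 * math.sin(th))
            c_g = abs(math.sin(th)) ** (2.0 * math.cos(th))
            new_x = []
            for d in range(m):
                v = c_p * (pbest[i][d] - xs[i][d]) + c_g * (gbest[d] - xs[i][d])  # Eq. (24.32)
                v = clamp(v, -vmax[i], vmax[i])
                new_x.append(clamp(guide[d] + v, x_min, x_max))  # Eq. (24.38)
            xs[i], fs[i] = new_x, objective(new_x)
            fes += 1
            if fs[i] < pbest_f[i]:  # Eq. (24.34)
                pbest[i], pbest_f[i] = new_x[:], fs[i]
            if pbest_f[i] < gbest_f:  # Eq. (24.35)
                gbest, gbest_f = pbest[i][:], pbest_f[i]
            thetas[i] = th + abs(math.cos(th) + math.sin(th)) * 2.0 * math.pi  # Eq. (24.36)
            vmax[i] = abs(math.cos(th)) ** 2 * (x_max - x_min)  # Eq. (24.37)
        history.append(gbest_f)
    return gbest, gbest_f, history


def sphere(x):
    return sum(v * v for v in x)


best_x, best_f, history = fdbppso_case1(sphere, m=5, x_min=-10.0, x_max=10.0)
```

The returned `history` holds the best objective value after each iteration, so it can be plotted as a convergence curve. Switching to Case-2 would mean keeping the particle's own position in the update and taking the velocity of the FDB-selected particle instead, as in Eq. (24.39).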
24.4 Experimental Settings

In order to clearly demonstrate the search performance of MHS algorithms, the conditions taken into account in experimental studies are highly critical. For this reason, it is essential that experimental studies are conducted fairly and in accordance with standards. The experimental settings considered in this study are summarized below:

• The maximum number of function evaluations (maxFEs) was used as the termination criterion so that every algorithm receives the same search budget. The value of maxFEs is set to 10,000 × D, where D is the number of design variables.
• The performance of the algorithms was tested using the benchmark problems in the CEC 2020 test suite [46], which is one of the most prestigious test suites in the literature.
• Exploitation, exploration, and balanced search abilities of MHS algorithms are investigated in unimodal, multimodal, hybrid, and composition type problems.
• The convergence performance of the algorithms to the global optimum was tested in 30-, 50-, and 100-dimensional search spaces.
• In order to obtain statistically significant results, the algorithms were run 25 times for each test function.
• Experimental studies were carried out on an Intel(R) Core(TM) i7-7500U CPU @ 2.70 GHz with 12 GB RAM and an x64-based processor.
24.5 Results and Analysis

This section gives a summary of the two experimental studies carried out to reveal the effects of the FDB selection method on the search performance of the PPSO algorithm. First, the success of PPSO and the two FDBPPSO variants in solving the unconstrained CEC 2020 benchmark problems was tested. The data obtained from the experimental studies were analyzed using the Friedman and Wilcoxon statistical analysis methods. After that, the proposed FDBPPSO method was applied to the optimal power flow problem of hybrid AC/DC networks.
24.5.1 Determining the Best FDBPPSO Variant on CEC 2020 Test Suite

This subsection provides detailed information on the results of the first experimental study, in which the search performances of the PPSO and FDBPPSO variants (Case-1, Case-2) on the CEC 2020 benchmark problems were analyzed.

24.5.1.1 Statistical Analysis

In order to analyze the search performance of MHS algorithms, experimental studies were carried out in three different dimensions using the CEC 2020 benchmark problems.
Table 24.2 Friedman test ranking of PPSO and FDBPPSO variants (CEC 2020)

Algorithms   Dimension = 30   Dimension = 50   Dimension = 100   Mean rank
Case-1       1.37             1.35             1.15              1.29
Case-2       1.87             1.90             1.97              1.91
PPSO         2.76             2.75             2.89              2.80
Table 24.3 Wilcoxon pairwise comparison results for PPSO and FDBPPSO variants (CEC 2020)

Versus PPSO (+/=/−)   Dimension = 30   Dimension = 50   Dimension = 100
Case-1                9/1/0            9/1/0            9/1/0
Case-2                9/1/0            9/1/0            9/1/0
Accordingly, the Friedman rank values calculated using the error values obtained by the algorithms in each experimental study are reported in Table 24.2. In the table, the best value obtained for each experiment is marked in bold-grey. The Friedman test results in Table 24.2 show that the FDB-based PPSO variants exhibited better search performance than the base PPSO in all experiments. PPSO achieved a worse ranking than its competitors due to the premature convergence problem. A comparison between the FDBPPSO variants shows that Case-1 is the only variant that achieves the best ranking in all experiments. Table 24.3 presents the Wilcoxon pairwise comparison results between PPSO and the FDBPPSO variants. According to the table, the FDBPPSO variants outperformed the PPSO algorithm in low-, middle-, and high-dimensional search spaces: the score of each FDBPPSO variant against PPSO on the benchmark functions of the CEC 2020 test suite was 9/1/0 at every dimension. That is, in nine of the ten test functions Case-1 or Case-2 was better, and in one the two algorithms had a similar error value. Briefly, the Friedman and Wilcoxon test results indicate that the FDB selection method eliminates the premature convergence problem of the PPSO algorithm and thus improves its exploration and balanced search capabilities.
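The mean ranks behind a Friedman comparison are simple to compute: the algorithms are ranked on each problem by error value (rank 1 for the smallest error, tied values share the average rank), and the ranks are then averaged over the problems. The sketch below shows this on hypothetical error values, since the chapter reports only the resulting ranks. It also recomputes the "Mean rank" column of Table 24.2 for Case-1 from the three per-dimension ranks.

```python
def rank_row(errors):
    """Ranks for one problem (1 = smallest error); ties share the mean rank."""
    order = sorted(range(len(errors)), key=errors.__getitem__)
    ranks = [0.0] * len(errors)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and errors[order[end + 1]] == errors[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0  # average of the tied positions, 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def friedman_mean_ranks(error_table):
    """Average rank of each algorithm over all problems (rows)."""
    rows = [rank_row(row) for row in error_table]
    return [sum(col) / len(rows) for col in zip(*rows)]


# Hypothetical errors: rows = problems, columns = (Case-1, Case-2, PPSO)
errors = [
    [0.1, 0.2, 0.9],
    [0.0, 0.0, 0.5],
    [1.5, 1.2, 3.0],
    [0.3, 0.4, 0.4],
]
mean_ranks = friedman_mean_ranks(errors)

# Mean rank of Case-1 in Table 24.2, from its ranks at D = 30, 50, and 100
case1_mean = round((1.37 + 1.35 + 1.15) / 3, 2)
```

For the hypothetical data the mean ranks are 1.375, 1.75, and 2.875, so the first algorithm ranks best. The recomputed mean for Case-1 is 1.29, matching the value in Table 24.2.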
24.5.1.2 Convergence Analysis

In order to clearly observe the search performance of the base PPSO algorithm and the developed FDBPPSO variants, box plots were drawn. Box-plot graphs indicating the error values obtained by the algorithms over 25 independent runs for the F1 (unimodal), F4 (multimodal), F7 (hybrid), and F10 (composition) benchmark functions selected from the CEC 2020 test suite are depicted in Fig. 24.1. From the figure, it can be clearly seen that the FDBPPSO variants exhibited a more stable and robust search performance than the base algorithm for all problem types in the experimental studies performed at 30, 50, and 100 dimensions. This indicates that the FDB selection method significantly improves the exploration and balanced search capabilities of the base algorithm. On the other hand, when comparing the FDB-based PPSO variants, it is possible to say that Case-1 performs more successfully in converging to the minimum error value. To sum up, both the Friedman-Wilcoxon statistical analysis results and the convergence analysis showed that Case-1 exhibited superior search performance compared to its competitors. Accordingly, the Case-1 variant will henceforth be referred to as FDBPPSO.
24.5.2 Application of the Proposed FDBPPSO Method for Optimal Power Flow Problem of Hybrid AC/DC Power Grids

24.5.2.1 Experimental Settings

All simulation studies were performed using Intel (R) Core (TM) i5-1135G7 [email protected] GHz, 8 GB RAM, and an x64-based processor. 25 independent runs were conducted for all optimization algorithms to obtain statistically significant data. The maximum number of iterations and the population size for all algorithms were set to 300 and 50, respectively. The MATACDC [47] software package was used for the AC/DC power calculations.
24.5.2.2 Simulation Results

In this section, optimal power flow optimization was performed in a hybrid AC-MTHVDC power system, which is a popular real-world engineering problem. The proposed FDBPPSO, barnacles mating optimizer (BMO) [48], whale optimization algorithm (WOA) [49], and grey wolf optimizer (GWO) [6] methods were applied to solve the AC-MTHVDC OPF problem of the modified IEEE 30-bus power system. Table 24.4 gives detailed information about the test system configuration. The cost coefficients of the thermal generators are given in Table 24.5 [50]. The schematic diagram of the modified IEEE 30-bus test system with VSC-MTHVDC is depicted in Fig. 24.2. In order to evaluate the performances of the optimization algorithms, different simulation cases were tested on the modified IEEE 30-bus system. The simulation cases are described below:
Fig. 24.1 Box-plot charts for CEC 2020 benchmark problems: (a)–(c) F1 (Unimodal) at D = 30, 50, and 100; (d)–(f) F4 (Multimodal) at D = 30, 50, and 100; (g)–(i) F7 (Hybrid) at D = 30, 50, and 100; (j)–(l) F10 (Composition) at D = 30, 50, and 100
Table 24.4 Configuration of the test system (modified IEEE 30-bus system)

Parameters                          Number   Details
Buses                               30       [41]
Branches                            41       [41]
Thermal generation units            6        Buses: 1 (swing), 2, 5, 8, 11 and 13
Transformers                        4        Branches: 11, 12, 15 and 36
Shunt capacitors                    9        Buses: 10, 12, 15, 17, 20, 21, 23, 24 and 29
VSC based MTDC grids                2        Multi-terminal DC Grid-1: VSC1, VSC2, VSC3; Multi-terminal DC Grid-2: VSC4, VSC5, VSC6
Total active and reactive demands   –        283.4 MW, 126.2 MVAr
Load bus voltage limits             24       [0.94–1.06] p.u.
Table 24.5 The cost coefficients for thermal generation units (modified IEEE 30-bus system)

Thermal generator   z ($/MW²h)   y ($/MWh)   t ($/h)   h ($/h)   p (rad/MW)
Thg1                0.0037       2.00        0         18        0.037
Thg2                0.0175       1.75        0         16        0.038
Thg5                0.0625       1.00        0         14        0.040
Thg8                0.0083       3.25        0         12        0.045
Thg11               0.0250       3.00        0         13        0.042
Thg13               0.0250       3.00        0         13.5      0.041
Case 1: Minimization of total cost with valve-point effects for thermal generation units.

Case 1 handles the total generation cost optimization of the thermal generators using the quadratic fuel cost function including the valve-point effect given in Eq. (24.18). The optimal settings of the control and state variables, as well as the corresponding objective function values of each algorithm for Case 1, are reported in Table 24.6, where the best result is highlighted in bold. The table shows that the total cost values from the proposed FDBPPSO, BMO, WOA, and GWO algorithms were 849.5388 $/h, 859.0905 $/h, 860.1894 $/h, and 870.4174 $/h, respectively. Accordingly, the proposed FDBPPSO method achieved the smallest objective function value (849.5388 $/h) and outperformed its competitors. For Case 1, the convergence graphs of the optimization algorithms are depicted in Fig. 24.3a. Upon examination, FDBPPSO appears to exhibit superior convergence performance compared to its competitors. The graph in Fig. 24.3b shows that the voltage magnitudes of the PQ buses are within reasonable limits at the end of the optimization.
Fig. 24.2 Modified IEEE-30 bus test system (single-line diagram with AC buses 1–30 at 132 kV and 33 kV, HVDC links, converters VSC-1 to VSC-6, and DC buses DC-1 to DC-6)
Case 2: Minimization of voltage deviation.

Voltage deviation is known as one of the most significant security indexes in electric grids. Considering this fact, we chose voltage deviation minimization as the objective function for Case 2. The simulation results of FDBPPSO and the other methods are reported in Table 24.7. It is seen from Table 24.7 that FDBPPSO provided the best objective function value (0.00093 p.u.), followed by WOA (0.00237 p.u.), BMO (0.00328 p.u.), and GWO (0.00511 p.u.). The convergence curves given in Fig. 24.4a for Case 2 illustrate that the proposed FDBPPSO method converges faster than its competitors. From the graph in Fig. 24.4b, it is
Table 24.6 The simulation results of the proposed FDBPPSO and other methods for case 1 Parameters
Min
Max
Case 1 BMO
WOA
GWO
FDBPPSO
PT hg1
50
200
199.9929
200.0000
192.9495
200.0000
PT hg2
20
80
49.4833
44.3166
41.5159
46.1500
PT hg5
15
50
19.9218
16.0500
15.2040
19.9949
PT hg8
10
35
10.0000
13.2623
14.4472
10.0003
PT hg11
10
30
10.0781
15.2679
23.5277
10.4446
PT hg13
12
40
12.0000
12.0731
12.8309
12.2521
VT hg1
0.95
1.10
0.9990
1.0506
0.9990
1.0624
VT hg2
0.95
1.10
0.9703
1.0539
0.9637
1.0409
VT hg5
0.95
1.10
0.9500
1.0470
0.9508
1.0170
VT hg8
0.95
1.10
1.0482
0.9552
1.0098
VT hg11
0.95
1.10
1.0526
1.0653
1.0091
VT hg13
0.95
1.10
1.0525
1.0272
1.0589
T11
0.90
1.10
0.9000
1.0227
0.9021
0.9769
T12
0.90
1.10
0.9112
1.0414
0.9920
0.9307
T15
0.90
1.10
0.9867
1.0117
0.9117
1.0163
T36
0.90
1.10
0.9861
0.9753
0.9547
0.9614
Q S H10
0
5
4.4714
0.6135
1.1147
4.8850
Q S H12
0
5
4.9982
3.6201
1.3383
2.9473
Q S H15
0
5
3.0571
4.0212
0.6524
Q S H17
0
5
0.0146
4.1880
0.6481
Q S H20
0
5
4.9798
3.8526
2.0861
3.7228
Q S H21
0
5
0
3.4556
1.1754
3.4163
Q S H23
0
5
1.2399
3.5234
1.5642
1.4000
Q S H24
0
5
4.8577
1.7158
0.2424
4.54440
Q S H29
0
5
0
0.4307
1.8089
1.33575
Q conv1
−100
100
−0.1208
−9.0874
0.2810
1.26061
0.9500 1.0693 0.9779
4.9999 2.5877
Q conv4
−100
100
−32.3940
40.3873
Vconv2
0.9
1.1
0.9531
1.0456
0.9556
1.01602
Vconv3
0.9
1.1
1.0918
1.0613
0.92580
0.98081
Vconv5
0.9
1.1
0.9825
1.0420
1.0147
1.04344
Vconv6
0.9
1.1
1.0439
1.0392
1.05856
1.05170
Pconv2
−100
100
12.4628
40.7967
Pconv3
−100
100
0
7.3728
39.2065
24.2880
Pconv5
−100
100
0
33.5993
6.4645
18.7693
Pconv6
−100
100
20.3393
8.5820
26.1934
10.2334
1.6360
18.7633
−2.6529
0.00124
Table 24.6 (continued)

Parameters         Min    Max    Case 1
                                 BMO        WOA        GWO        FDBPPSO
Vdc1               0.9    1.1    1.0521     1.0464     0.9357     0.99824
Vdc4               0.9    1.1    0.9865     1.0341     0.9410     1.0027
QThg1              −20    150    14.3127    −19.9826   −7.8376    8.2285
QThg2              −20    60     −5.7967    53.4722    −17.0475   11.5183
QThg5              −15    62.5   36.2141    42.4478    26.4085    25.4762
QThg8              −15    48.7   23.2611    41.0596    29.9579    16.4272
QThg11             −10    40     16.3208    12.8270    16.1530    −8.9480
QThg13             −15    44.7   −3.7645    9.5661     3.0310     13.5789
PAC_loss           –      –      10.1712    6.3282     6.2639     6.8060
PDC_loss           –      –      0.2322     1.4439     1.6062     0.5888
PVSC_loss          –      –      7.6727     9.7979     9.2054     8.0474
Total Cost ($/h)   –      –      859.0905   860.1894   870.4174   849.5388
VD (p.u.)          –      –      0.0254     0.0319     0.0474     0.0251
Ploss (MW)         –      –      18.0761    17.5700    17.0755    15.4421
Fig. 24.3 a Convergence curves of optimization algorithms for case 1. b Voltage profile of PQ buses for case 1
understood that at the end of the optimization, all algorithms were able to keep the load bus voltage magnitudes within acceptable limits.

Case 3: Minimization of active power loss of the power system.

For Case 3, the minimization of active power losses was studied. The objective function values, as well as the control and state variables obtained by FDBPPSO and the other optimization methods for this case, are given in Table 24.8. Accordingly, the objective function results obtained from the FDBPPSO, BMO, WOA and GWO methods
Table 24.7 The simulation results of the proposed FDBPPSO and other methods for case 2

Parameters         Min    Max    Case 2
                                 BMO         WOA        GWO        FDBPPSO
PThg1              50     200    93.7364     175.4675   131.7320   170.6725
PThg2              20     80     78.2865     51.6765    66.4322    41.3559
PThg5              15     50     49.8320     18.7954    40.6662    26.7796
PThg8              10     35     34.3244     11.4693    34.4134    23.6964
PThg11             10     30     30.0000     23.1615    14.1080    18.6264
PThg13             12     40     12.2231     17.7803    16.4271    18.0377
VThg1              0.95   1.10   1.05665     1.0155     1.0121     1.0022
VThg2              0.95   1.10   1.03706     1.0133     1.0172     0.9879
VThg5              0.95   1.10   0.9995      1.006      1.02176    1.0110
VThg8              0.95   1.10   1.0017      1.0122     1.0076     1.0060
VThg11             0.95   1.10   1.0599      1.0134     1.0312     1.0265
VThg13             0.95   1.10   0.9977      1.0015     1.0193     1.0303
T11                0.90   1.10   1.0578      1.0097     0.9359     0.9912
T12                0.90   1.10   0.9000      0.9000     1.0235     0.9517
T15                0.90   1.10   0.9000      0.9834     0.96821    1.0518
T36                0.90   1.10   0.9000      1.0103     1.0231     0.9834
QSH10              0      5      0.0000      2.6410     4.9329     3.1050
QSH12              0      5      0.0030      3.5701     3.8085     2.7841
QSH15              0      5      0.0000      1.0425     1.1280     2.8264
QSH17              0      5      1.3177      0.5496     1.3613     2.8303
QSH20              0      5      5.0000      1.3900     4.9801     2.1131
QSH21              0      5      2.4816      1.1458     0.2295     4.9954
QSH23              0      5      4.8750      2.0590     2.7170     4.2112
QSH24              0      5      3.1321      0.5859     1.8095     4.7358
QSH29              0      5      4.7229      0.6054     1.7418     2.1394
Qconv1             −100   100    −18.4760    −4.95373   −6.2712    −18.0426
Qconv4             −100   100    −16.0217    −20.7531   −24.3799   −4.6720
Vconv2             0.9    1.1    0.9990      1.0105     1.0050     0.9998
Vconv3             0.9    1.1    0.9436      1.0087     0.9838     0.9908
Vconv5             0.9    1.1    1.0007      1.0107     1.0197     1.0077
Vconv6             0.9    1.1    0.9650      1.0123     1.0176     0.9972
Pconv2             −100   100    0           18.4849    19.4646    0.9666
Pconv3             −100   100    0.3715      27.7554    60.7262    5.0572
Pconv5             −100   100    34.7691     21.8985    52.5130    14.0797
Pconv6             −100   100    3.2650      10.9729    27.9301    6.152E−10
Vdc1               0.9    1.1    1.0102      1.0037     1.0158     1.0015
Vdc4               0.9    1.1    1.0070      1.0029     1.0152     1.0027
QThg1              −20    150    61.2742     −15.7681   1.0239     −16.9297
QThg2              −20    60     38.6908     29.7951    45.4116    −17.0454
QThg5              −15    62.5   16.9290     31.4677    31.0546    60.2549
QThg8              −15    48.7   38.7508     37.3176    31.9744    47.2704
QThg11             −10    40     30.2134     4.3907     1.7679     7.6516
QThg13             −15    44.7   −11.7476    −4.9848    2.6381     18.6451
PAC_loss           –      –      4.0099      5.1409     2.3446     8.1810
PDC_loss           –      –      0.5446      0.9611     4.0884     0.1074
PVSC_loss          –      –      10.4481     8.8489     13.9461    7.4804
Total Cost ($/h)   –      –      1009.0000   872.7440   936.2459   883.2452
VD (p.u.)          –      –      0.00328     0.00237    0.00511    0.00093
Ploss (MW)         –      –      15.0026     14.9509    20.3792    15.7688
Fig. 24.4 a Convergence curves of optimization algorithms for Case 2 b Voltage profile of PQ buses for Case 2
were 9.4197 MW, 10.1854 MW, 10.1757 MW, and 13.4982 MW, respectively. In other words, the objective value of the FDBPPSO was 0.7657 MW, 0.7560 MW, and 4.0785 MW lower than those of BMO, WOA, and GWO, respectively. Figure 24.5a, b depict the convergence curves of all optimization algorithms for Case 3 and the voltage profile of the PQ buses, respectively. When the convergence curves are analyzed in depth, it is seen that the FDBPPSO is superior to the other algorithms in terms of convergence speed and accuracy. The voltage profile given in Fig. 24.5b shows that the load bus voltage magnitudes are in the acceptable range.
Table 24.8 The simulation results of the proposed FDBPPSO and other methods for case 3 Parameters
Min
Max
Case 3 BMO
WOA
GWO
FDBPPSO
PT hg1
50
200
61.1380
74.4153
148.9122
59.8700
PT hg2
20
80
80.0000
64.1604
30.9626
78.1255
PT hg5
15
50
50.0000
50.0000
29.8890
49.9981
PT hg8
10
35
34.9999
35.0000
33.5517
34.8260
PT hg11
10
30
30.0000
30.0000
13.8404
29.9999
PT hg13
12
40
37.4473
40.0000
39.7421
40.0000
VT hg1
0.95
1.10
0.9803
0.9937
1.0607
1.0412
VT hg2
0.95
1.10
0.9671
0.9929
1.0374
1.0379
VT hg5
0.95
1.10
0.9500
0.9932
1.0184
1.0233
VT hg8
0.95
1.10
0.9500
0.9937
1.0342
1.0294
VT hg11
0.95
1.10
1.0901
0.9875
0.9790
1.0016
VT hg13
0.95
1.10
1.0362
0.9937
1.0614
1.0140
T11
0.90
1.10
0.9000
0.9523
1.0947
0.9954
T12
0.90
1.10
0.9000
0.9856
0.9097
0.9647
T15
0.90
1.10
0.9774
0.9918
1.0071
0.9942
T36
0.90
1.10
0.9140
0.9824
0.9401
1.0191
Q S H10
0
5
0.3691
1.3811
2.7010
2.2426
Q S H12
0
5
2.3009
0.2999
1.9786
2.9369
Q S H15
0
5
1.2038
0.6809
2.3630
2.9507
Q S H17
0
5
0.3801
1.9342
1.3644
5.0000
Q S H20
0
5
2.3929
0.2683
2.4032
4.8413
Q S H21
0
5
0.0000
2.7487
3.4520
2.6927
Q S H23
0
5
4.9246
0.1328
3.7542
4.4833
Q S H24
0
5
2.8622
2.0030
4.8207
4.9289
Q S H29
0
5
0.2989
0.5018
3.3871
0.2980
Q conv1
−100
100
−14.0579
−4.2264
−32.0420
7.4095
−4.0593
Q conv4
−100
100
4.9022
0.1273
4.4801
Vconv2
0.9
1.1
0.9542
0.9873
1.0328
1.0269
Vconv3
0.9
1.1
0.9000
0.9822
0.9915
0.9085
Vconv5
0.9
1.1
1.0182
0.9884
1.0342
1.01647
Vconv6
0.9
1.1
1.0258
0.9857
1.0339
1.0171
Pconv2
−100
100
8.3782
6.8467
30.4172
2.1975
Pconv3
−100
100
11.7843
22.1708
15.3908
Pconv5
−100
100
3.8610
10.6385
7.3768
2.9626
Pconv6
−100
100
0.0000
9.6509
14.0578
8.4116
16.234
(continued)
Table 24.8 (continued) Min
Parameters
Max
Case 3 BMO
WOA
GWO
FDBPPSO
Vdc1
0.9
1.1
1.0674
0.9804
1.0547
1.0091
Vdc4
0.9
1.1
0.9831
0.9702
1.0795
1.0415
Q T hg1
−20
150
5.2945
11.4896
12.5587
−10.7845
Q T hg2
−20
60
8.8185
10.6950
28.0259
2.6928
Q T hg5
−15
62.5
20.5198
35.3566
18.1491
20.9474
Q T hg8
−15
48.7
16.0257
44.5070
32.6813
32.5559
Q T hg11
−10
40
20.5708
−1.7769
−5.1795
Q T hg13
−15
44.7
12.3271
−2.8219 3.2604
16.1323
−4.0189
PAC_loss
−
−
2.7939
2.2150
4.0148
2.0460
PDC_loss
−−
−
0.1933
0.2516
0.8565
0.1870
PV SC_loss
−
−
7.1982
7.7091
8.6269
7.1867
1043.7000
1033.3000
909.1696
1044.6000
Total cost ($/h) VD (p.u.) Ploss (MW)
0.0328
0.0181
0.0423
0.0111
10.1854
10.1757
13.4982
9.4197
Fig. 24.5 a Convergence curves of optimization algorithms for Case 3 b Voltage profile of PQ buses for Case 3
24.5.2.3 Statistical Analysis
The minimum, mean, maximum and standard deviation values obtained with different algorithms in 25 runs are reported in Table 24.9. For Case 1, based on the minimum and standard deviation indices, it can be said that the FDBPPSO method achieves better minimum values compared to its competitors and can explore the search space stably. When the simulation results obtained by the algorithms for Case 2 are analyzed
Table 24.9 Simulation results of optimization algorithms for different test cases in 25 independent runs

| Algorithms | Statistic | Case 1 | Case 2 | Case 3 |
|---|---|---|---|---|
| BMO | Min | 859.0905 | 0.0032 | 10.1854 |
| BMO | Mean | 579.1018 | 5.3339 | 14.9596 |
| BMO | Max | 921.9315 | 134.0239 | 32.2421 |
| BMO | Std | 17.6950 | 24.3662 | 4.5030 |
| WOA | Min | 860.1894 | 0.0023 | 10.1757 |
| WOA | Mean | 909.3961 | 30.5538 | 45.2332 |
| WOA | Max | 1193.1842 | 242.5893 | 239.9734 |
| WOA | Std | 76.4486 | 60.1879 | 59.4126 |
| GWO | Min | 870.4174 | 0.0051 | 13.4982 |
| GWO | Mean | 949.0404 | 29.6914 | 31.3732 |
| GWO | Max | 1206.382 | 216.0894 | 142.3056 |
| GWO | Std | 76.4627 | 61.4100 | 28.3453 |
| FDBPPSO | Min | 849.5388 | 0.0009 | 9.4197 |
| FDBPPSO | Mean | 853.8392 | 0.0015 | 9.9352 |
| FDBPPSO | Max | 862.4463 | 0.0021 | 10.8805 |
| FDBPPSO | Std | 2.9708 | 0.0003 | 0.3915 |
in depth, it is understood that the exploration and balanced search abilities of the proposed method are better than those of its competitors. In Case 3, the BMO, WOA, and GWO methods were caught in local solution traps and converged prematurely. On the other hand, FDBPPSO was successful in exploring the promising solution candidates of the search space and converging to the global optimum in the neighborhood of these solutions. The box-plot graphs given in Fig. 24.6 provide an opportunity to visually examine the search performance of the algorithms for the different test cases. It is clear from the box-plot charts that the FDBPPSO method exhibits a robust and stable search performance for all test cases. On the other hand, the BMO, WOA, and GWO methods converged prematurely and could not obtain acceptable minimum, mean, maximum, and standard deviation values.
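The four indices discussed in this sub-section (minimum, mean, maximum, and standard deviation over independent runs) can be computed as in the following minimal sketch. The function name `summarize_runs` and the sample values are hypothetical and are not taken from Table 24.9; the use of the sample standard deviation (n − 1 in the denominator) is also an assumption, since the chapter does not state which form was used.

```python
import statistics


def summarize_runs(best_values):
    """Return min, mean, max and standard deviation of per-run best objective values."""
    return {
        "min": min(best_values),
        "mean": statistics.mean(best_values),
        "max": max(best_values),
        # Sample standard deviation (n - 1); this choice is an assumption.
        "std": statistics.stdev(best_values),
    }


# Hypothetical best objective values from five runs (illustrative only).
runs = [9.5, 9.8, 10.1, 9.6, 10.0]
summary = summarize_runs(runs)
print(summary)
```

A small standard deviation relative to the mean indicates that the algorithm reaches similar solutions from run to run, which is the sense in which the text calls a search "stable".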
Fig. 24.6 Box-plot graphs of optimization algorithms for 25 independent runs: a Case 1, b Case 2, c Case 3.
24.6 Conclusions
In this study, the FDB selection method was applied to improve the search performance of the PPSO algorithm. In this way, the PPSO algorithm was redesigned to be more compatible with nature, and as a result, a powerful FDBPPSO method was proposed. A comprehensive study has been carried out to test the performance of the proposed algorithm. In the first part of the experimental studies, the performance of the algorithm in low-, middle-, and high-dimensional search spaces was tested using unimodal, multimodal, composition, and hybrid type benchmark problems in the CEC 2020 test suite. The data obtained from this experimental study were analyzed with the Friedman and Wilcoxon tests, two widely used nonparametric statistical tests. Both tests showed that the proposed FDBPPSO method exhibits a more stable search performance than PPSO on challenging test problems in different search spaces. In the second part of the experimental studies, the optimal power flow (OPF) problem of hybrid AC/DC power grids, a popular real-world engineering problem, was solved. The performance of the proposed FDBPPSO and of the BMO, WOA, and GWO methods was tested on the modified IEEE 30-bus power system for various simulation cases, including minimization of total generation cost with valve-point effect, voltage deviation, and power loss. The simulation results showed that the proposed FDBPPSO method outperforms its competitors in finding high-quality feasible solutions to the OPF problem of hybrid AC/DC power networks.
334
24 Improved Phasor Particle Swarm Optimization with Fitness …
References 1. Wolpert DH, Macready WG (1995) No free lunch theorems for search, vol 10. Technical Report SFI-TR-95-02-010, Santa Fe Institute 2. Eberhart R, Kennedy J (1995) A new optimizer using particle swarm theory. In: MHS’95. Proceedings of the sixth international symposium on micro machine and human science, IEEE, pp 39–43. https://doi.org/10.1109/MHS.1995.494215 3. Karaboga D, Basturk B (2007) A powerful and efficient algorithm for numerical function optimization: artificial bee colony (abc) algorithm. J Global Optim 39(3):459–471 4. Dorigo M, Di Caro G (1999) Ant colony optimization: a new meta-heuristic. In: Proceedings of the 1999 congress on evolutionary computation-CEC99 (Cat. No. 99TH8406), vol 2, IEEE, pp 1470–1477 5. Rashedi E, Nezamabadi-Pour H, Saryazdi S (2009) Gsa: a gravitational search algorithm. Inf Sci 179(13):2232–2248 6. Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61 7. Mirjalili S (2015) Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowl-Based Syst 89:228–249 8. Cheng M-Y, Prayogo D (2014) Symbiotic organisms search: a new metaheuristic optimization algorithm. Comput Struct 139:98–112 9. Salimi H (2015) Stochastic fractal search: a powerful metaheuristic algorithm. Knowl-Based Syst 75:1–18 10. Civicioglu P (2013) Backtracking search optimization algorithm for numerical optimization problems. Appl Math Comput 219(15):8121–8144 11. Askarzadeh (2016) A novel metaheuristic method for solving constrained engineering optimization problems: crow search algorithm. Comput Struct 169:1–12 12. Mirjalili S, Gandomi AH, Mirjalili SZ, Saremi S, Faris H, Mirjalili SM (2017) Salp swarm algorithm: a bio-inspired optimizer for engineering design problems. Adv Eng Softw 114:163– 191 13. Yadav et al (2019) Aefa: artificial electric field algorithm for global optimization. Swarm Evol Comput 48:93–108 14. 
Zhao W, Wang L, Zhang Z (2019) Atom search optimization and its application to solve a hydrogeologic parameter estimation problem. Knowl-Based Syst 163:283–304 15. Li S, Chen H, Wang M, Heidari AA, Mirjalili S (2020) Slime mould algorithm: a new method for stochastic optimization. Futur Gener Comput Syst 111:300–323 16. Zhao W, Wang L, Zhang Z (2019) Supply-demand-based optimization: a novel economicsinspired algorithm for global optimization. IEEE Access 7:73182–73206 17. Hashim FA, Houssein EH, Mabrouk MS, Al-Atabany W, Mirjalili S (2019) Henry gas solubility optimization: a novel physics-based algorithm. Futur Gener Comput Syst 101:646–667 18. Faramarzi MH,Stephens B, Mirjalili S (2019) Equilibrium optimizer: a novel optimization algorithm, Knowl-Based Syst 105190 19. Zhao W, Wang L, Zhang Z (2019) Artificial ecosystem-based optimization: a novel natureinspired meta-heuristic algorithm. Neural Comput Appl 1–43 20. Faramarzi A, Heidarinejad M, Mirjalili S, Gandomi AH (2020) Marine predators algorithm: a nature-inspired metaheuristic. Expert Syst Appl 152:113377 21. Das B, Mukherjee V, Das D (2020) Student psychology-based optimization algorithm: a new population based optimization algorithm for solving optimization problems. Adv Eng Softw 146:102804 22. Ghasemi M, Akbari E, Rahimnejad A, Razavi SE, Ghavidel S, Li L (2019) Phasor particle swarm optimization: a simple and efficient variant of PSO. Soft Comput 23(19):9701–9718 23. Ho SY, Lin HS, Liauh WH, Ho SJ (2008) OPSO: orthogonal particle swarm optimization and its application to task assignment problems. IEEE Trans Syst Man, Cybernet-Part A: Syst Humans 38(2):288–298 24. Li C, Yang S, Nguyen TT (2011) A self-learning particle swarm optimizer for global optimization problems. IEEE Trans Syst Man Cybernet Part B (Cybernet) 42(3):627–646
References
335
25. Rahnamayan S, Tizhoosh HR, Salama MM (2008) Opposition-based differential evolution. IEEE Trans Evol Comput 12(1):64–79 26. Tanabe R, Fukunaga A (2013) Success-history based parameter adaptation for differential evolution. In: 2013 IEEE congress on evolutionary computation. IEEE, pp 71–78 27. Tanabe R, Fukunaga AS (2014) Improving the search performance of SHADE using linear population size reduction. In: 2014 IEEE congress on evolutionary computation (CEC). IEEE, pp 1658–1665 28. Mittal N, Singh U, Sohi BS (2016) Modified grey wolf optimizer for global engineering optimization. Appl Comput Intell Soft Comput 29. Luo Q, Zhang S, Li Z, Zhou Y (2016) A novel complex-valued encoding grey wolf optimization algorithm. Algorithms 9(1):4 30. Menesy AS, Sultan HM, Selim A, Ashmawy MG, Kamel S (2019) Developing and applying chaotic harris hawks optimization technique for extracting parameters of several proton exchange membrane fuel cell stacks. IEEE Access 8:1146–1159 31. Abaza A, Fawzy A, El-Sehiemy RA, Alghamdi AS, Kamel S (2021) Sensitive reactive power dispatch solution accomplished with renewable energy allocation using an enhanced coyote optimization algorithm. Ain Shams Eng J 12(2):1723–1739 32. Kahraman HT, Aras S, Gedikli E (2020) Fitness-distance balance (FDB): a new selection method for meta-heuristic search algorithms. Knowl-Based Syst 190:105169 33. Aras S, Gedikli E, Kahraman HT (2021) A novel stochastic fractal search algorithm with fitness-distance balance for global numerical optimization. Swarm Evol Comput 61:100821 34. Guvenc U, Duman S, Kahraman HT, Aras S, Katı M (2021) Fitness-distance balance based adaptive guided differential evolution algorithm for security-constrained optimal power flow problem incorporating renewable energy sources. Appl Soft Comput 108:107421 35. 
Kahraman HT, Bakir H, Duman S, Katı M, Aras S, Guvenc U (2021) Dynamic FDB selection method and its application: modeling and optimizing of directional overcurrent relays coordination. Appl Intell 1–36 36. Duman S, Kahraman HT, Guvenc U, Aras S (2021) Development of a Lévy flight and FDBbased coyote optimization algorithm for global optimization and real-world ACOPF problems. Soft Comput 25(8):6577–6617 37. Pinto RT, Rodrigues SF, Wiggelinkhuizen E, Scherrer R, Bauer P, Pierik J (2013) Operation and power flow control of multi-terminal DC networks for grid integration of offshore wind farms using genetic algorithms. Energies 6(1):1–26 38. Elattar EE, Shaheen AM, Elsayed AM, El-Sehiemy RA (2020) Optimal power flow with emerged technologies of voltage source converter stations in meshed power systems. IEEE Access 8:166963–166979 39. Abdul-hamied DT, Shaheen AM, Salem WA, Gabr WI, El-sehiemy RA (2020) Equilibrium optimizer based multi dimensions operation of hybrid AC/DC grids. Alex Eng J 59(6):4787– 4803 40. Shaheen AM, El-Sehiemy RA, Elsayed AM, Elattar EE (2021) Multi-objective manta ray foraging algorithm for efficient operation of hybrid AC/DC power grids with emission minimisation. IET Generat Transm Distrib 41. Elsayed AM, Shaheen AM, Alharthi MM, Ghoneim SS, El-Sehiemy RA (2021) Adequate operation of hybrid AC/MT-HVDC power systems using an improved multi-objective marine predators optimizer. IEEE Access 9:51065–51087 42. Shaheen AM, Elsayed AM, El-Sehiemy RA (2021) Optimal economic–environmental operation for AC-MTDC grids by improved crow search algorithm. IEEE Syst J 43. Duman S, Güvenç U, Sönmez Y, Yörükeren N (2012) Optimal power flow using gravitational search algorithm. Energy Convers Manage 59:86–95 44. Guvenc U, Bakir H, Duman S, Ozkaya B (2020) Optimal power flow using manta ray foraging optimization. In: The international conference on artificial intelligence and applied mathematics in engineering. Springer, Cham, pp 136–149 45. 
Imran M, Hashim R, Abd Khalid NE (2013) An overview of particle swarm optimization variants. Proc Eng 53:491–496
336
24 Improved Phasor Particle Swarm Optimization with Fitness …
46. Yue CT, Price KV, Suganthan PN, Liang JJ, Ali MZ, Qu BY, Awad NH, Biswas PP (2019) Problem definitions and evaluation criteria for the CEC 2020 special session and competition on single objective bound constrained numerical optimization, Tech. Rep., Zhengzhou University and Nanyang Technological University 47. Beerten J, Belmans R (2015) MatACDC-an open source software tool for steady-state analysis and operation of HVDC grids 48. Sulaiman MH, Mustaffa Z, Saari MM, Daniyal H (2020) Barnacles mating optimizer: a new bio-inspired algorithm for solving engineering optimization problems. Eng Appl Artif Intell 87:103330 49. Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67 50. Chaib AE, Bouchekara HREH, Mehasni R, Abido MA (2016) Optimal power flow with emission and non-smooth cost functions using backtracking search optimization algorithm. Int J Electr Power Energy Syst 81:64–77
Chapter 25
Development of an FDB-Based Chimp Optimization Algorithm for Global Optimization and Determination of the Power System Stabilizer Parameters
Huseyin Bakir, Hamdi Tolga Kahraman, Seyithan Temel, Serhat Duman, Ugur Guvenc, and Yusuf Sonmez
H. Bakir (B) Department of Electronics and Automation, Dogus Vocational School, Dogus University, 34775 Istanbul, Turkey
H. T. Kahraman · S. Temel Software Engineering, Karadeniz Technical University, 61080 Trabzon, Turkey
S. Duman Electrical Engineering, Engineering and Natural Sciences Faculty, Bandirma Onyedi Eylul University, 10200 Bandirma, Turkey
U. Guvenc Electrical and Electronics Engineering, Engineering Faculty, Duzce University, 81620 Duzce, Turkey
Y. Sonmez Mingachevir State University, Mingachevir, Azerbaijan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_25
25.1 Introduction
The effective and powerful search performance of metaheuristic search (MHS) algorithms depends on two basic requirements: exploration and exploitation [1–4]. Exploration refers to the search for promising solution candidates in the search space, while exploitation is the intensification of the search around these solution candidates [5]. MHS algorithms can easily accomplish the exploitation task because searching in the neighborhood of a reference solution candidate is relatively simple. On the other hand, MHS algorithms have difficulties in fulfilling the exploration task and suffer from low diversity and premature convergence, because complex and non-convex search spaces contain many local solution traps. There is no explicit way to ensure convergence to the global optimum in such search spaces. Therefore, the effective selection of the solution candidates that will guide the metaheuristic search process is critical for a powerful exploration and balanced search capability. Random selection of guide solution candidates randomizes the search process and results in low diversity. Thus, for effective diversity, reference solution candidates must be determined using a strong and stable selection method. Fitness-Distance Balance (FDB) is an effective and powerful selection method used to identify the reference solution candidates that will most benefit the search process. The FDB-based selection mechanism calculates a score value for each solution candidate using the fitness value and the distance of the i-th solution candidate from the best solution. Accordingly, the solution candidate with the highest score guides the search process [6].
Power system stability is vital for the reliable and secure operation of modern power systems, and this is ensured by the optimal design of power system stabilizers (PSSs). Optimization of PSS parameters is a popular real-world engineering problem that power system researchers have studied extensively. Recently, various intelligent optimization algorithms have been utilized for the optimal design of lead-lag PSS structures.
Some of them are: differential evolution (DE) [7], tabu search algorithm (TSA) [8], cultural algorithm (CA) [9], simulated annealing (SA) [10], bacterial foraging optimization (BFO) [11], adaptive mutation breeder genetic algorithm (ABGA) [12], artificial bee colony (ABC) [13], culture-PSO-co evolutionary (CPCE) [14], bat algorithm (BA) [15], genetic algorithm (GA) [16], honey–bee algorithm (HBA) [17], ant colony optimization (ACO) [18], particle swarm optimization (PSO) [19], chaotic teaching learning-based optimization (CTLBO) [20], grey wolf optimization (GWO) [21], biogeography-based optimization (BBO) [22], backtracking search algorithm (BSA) [23], salp swarm algorithm (SSA) [24], cuckoo search optimization (CSO) [25], improved whale optimization algorithm (IWOA) [26], moth search algorithm (MSA) [27], farmland fertility algorithm (FFA) [28], slime mould algorithm (SMA) [29], hybrid grey wolf optimization-sine cosine algorithm (GWOSCA) [30], modified invasive weed optimization (MIWO) [31], and Archimedes optimization algorithm (AOA) [32]. When the literature is analyzed in detail, it is seen that the general trend among researchers is to use MHS algorithms with strong exploration and balanced search capabilities to obtain better control parameters for high PSS performance. This paper proposes a hybrid metaheuristic algorithm based on the fitness-distance balance (FDB) and the chimp optimization algorithm (ChOA) for the solution of global optimization problems. The ChOA is a recently developed population-based metaheuristic algorithm that mimics the social behavior of chimps. The search performance of the base ChOA algorithm was tested using unimodal, multimodal, hybrid, and composition type benchmark problems in the CEC 2020 test suite, and it was observed that the algorithm converged prematurely, especially in multimodal type problems.
The results of experimental studies carried out on different types of problems and search spaces revealed the need to improve the exploration and balanced search capabilities of ChOA. For this purpose, the hunting process of attacker and chaser chimps was redesigned using the FDB selection method. Moreover, the proposed FDBChOA algorithm has been applied to the optimization of PSS parameters, which is one of the most important power system problems. The main contributions of this study are summarized below:
• A novel hybrid metaheuristic search algorithm called FDBChOA was proposed.
• A comprehensive experimental study was carried out using unimodal, multimodal, hybrid, and composition type problems in the CEC 2020 test suite, considering different search space dimensions (D = 30, 50, and 100).
• The proposed FDBChOA algorithm was applied to the optimization of the power system stabilizer parameters.
• The simulation results showed that the proposed FDBChOA algorithm can produce solutions with lower error and higher accuracy compared to the original ChOA algorithm.
The remainder of the paper is organized as follows: Sect. 25.2 gives the mathematical formulation of the power system stabilizer. In Sect. 25.3, the design steps of the proposed FDBChOA are introduced. Section 25.3 consists of three sub-sections, which explain the FDB selection method, the chimp optimization algorithm, and the proposed FDBChOA algorithm, respectively. Section 25.4 introduces the settings used in the experimental studies. In Sect. 25.5, the results of the experimental studies are summarized. Section 25.5 includes two sub-sections. The first gives the results of the experimental studies performed to determine the best FDBChOA variant using the CEC 2020 benchmark test suite. The second covers the application of the proposed FDBChOA algorithm to the optimization of the power system stabilizer parameters. The paper ends with the conclusions.
25.2 Mathematical Formulation of Power System Stabilizer Parameters Optimization
When a fault in the power system disturbs the balance between mechanical and electrical power, the rotor speed changes and undesirable oscillations arise in the rotor angle of the synchronous generator. Generators are equipped with power system stabilizers (PSSs) to improve the damping of system oscillations. PSSs improve the damping of low-frequency oscillations associated with electromechanical modes and thus increase the power system stability limit [33]. In this study, a power system stabilizer that uses only the generator speed deviation $\Delta\omega$ as its input is considered. The output signal of the stabilizer, $V_{pss}$, is added as an additional input to the generator excitation system.
25.2.1 Power System Model with PSS Structure
Each machine in the power system is modeled with a set of nonlinear differential equations as shown below [24, 25].

$$\dot{X} = f(X, U) \tag{25.1}$$

$$X = \left[\delta, \omega, E'_q, E_{fd}\right]^T \tag{25.2}$$

where $X$ and $U$ are the vectors of state and input variables, respectively. The mathematical models of the components of the state variable vector $X$ are given in Eqs. (25.3–25.6) [24].

$$\dot{\delta}_k = \omega_s\left(\omega_k - 1\right) \tag{25.3}$$

$$\dot{\omega}_k = \left(P_{mk} - E'_{qk}\, i_{qk} - \left(x_{qk} - x'_{dk}\right) i_{dk}\, i_{qk} - D_i\left(\omega_k - 1\right)\right)/M_k \tag{25.4}$$

$$\dot{E}'_{qk} = \left(E_{fdk} - E'_{qk} - \left(x_{dk} - x'_{dk}\right) i_{dk}\right)/T'_{dok} \tag{25.5}$$

$$\dot{E}_{fdk} = \left(-E_{fdk} + K_{ak}\left(V_{ref,k} - V_k + V_{pss,k}\right)\right)/T_{ak} \tag{25.6}$$
where $\omega_s$ is the synchronous speed, and $\delta_k$ and $\omega_k$ represent the rotor angle and rotor speed of the k-th machine. $P_{mk}$, $D_i$, and $M_k$ are the mechanical input power, damping coefficient, and inertia constant. $E'_{qk}$ denotes the internal voltage behind $x'_{dk}$. $E_{fdk}$ and $T'_{dok}$ are the equivalent excitation voltage and the open-circuit transient time constant. $K_{ak}$ and $T_{ak}$ are the gain and time constant of the excitation system. $i_{dk}$ and $i_{qk}$ are the stator currents of the d-axis and q-axis circuits. $x_{qk}$ and $x_{dk}$ are the q-axis and d-axis reactances, and $x'_{dk}$ is the d-axis transient reactance. $V_{ref,k}$ and $V_k$ indicate the reference and terminal voltages, respectively. $V_{pss,k}$ is the PSS output signal at the k-th machine [24, 25].
The main goal of the PSS is to generate an additional signal that damps the rotor oscillations of the generator through the excitation system. This task is accomplished by three PSS components: gain, washout, and phase compensation. The gain component provides a gain constant to damp out oscillations. The washout component acts as a high-pass filter, and the phase compensation component compensates for the phase lag in the system. In the light of these definitions, the PSS transfer function is expressed as follows [34]:

$$V_{pss} = K_p\, \frac{sT_w}{1 + sT_w} \left[\frac{(1 + sT_1)}{(1 + sT_2)}\, \frac{(1 + sT_3)}{(1 + sT_4)}\right] \Delta\omega \tag{25.7}$$

where $K_p$ is the PSS gain and $T_w$ is the washout time constant, whose value is generally set to 3. $T_1$–$T_4$ are the phase compensation time constants, and $\Delta\omega$ is the angular speed deviation. The PSS input signal is the generator speed deviation $\Delta\omega$ and the output signal is the PSS voltage $V_{pss}$. The output signal $V_{pss}$ is added as an additional input to the generator excitation system. The next sub-section provides detailed information about the objective functions and constraints used in this study.
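To make the structure of the lead-lag PSS transfer function concrete, the following minimal sketch evaluates it at a complex frequency $s = j\omega$. The function name `pss_transfer` and all parameter values are hypothetical and chosen only for illustration; they are not tuned values from this study.

```python
import cmath
import math


def pss_transfer(s, kp, tw, t1, t2, t3, t4):
    """Evaluate V_pss / delta_omega of the lead-lag PSS at complex frequency s."""
    washout = (s * tw) / (1 + s * tw)  # high-pass (washout) stage
    lead_lag = ((1 + s * t1) / (1 + s * t2)) * ((1 + s * t3) / (1 + s * t4))
    return kp * washout * lead_lag


# Hypothetical parameter values, for illustration only.
params = dict(kp=20.0, tw=3.0, t1=0.5, t2=0.05, t3=0.5, t4=0.05)

for freq_hz in (0.1, 1.0, 2.0):
    s = 1j * 2 * math.pi * freq_hz
    h = pss_transfer(s, **params)
    # Print magnitude and phase (degrees) of the frequency response.
    print(freq_hz, abs(h), math.degrees(cmath.phase(h)))
```

The washout stage forces the steady-state (s = 0) output to zero, so the stabilizer reacts only to changes in speed, while the two lead-lag stages shape the phase in the frequency range of the electromechanical oscillations.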
25.2.2 Objective Functions and Constraints
To demonstrate the effectiveness of the proposed approach, four different objective functions are considered: the integral of absolute error (IAE), the integral of squared error (ISE), the integral of time-weighted absolute error (ITAE), and the integral of time-weighted squared error (ITSE). Mathematically, the objective functions are defined as follows [35]:

$$IAE = \int_0^{T_{sim}} |e(t)|\, dt \tag{25.8}$$

$$ISE = \int_0^{T_{sim}} e^2(t)\, dt \tag{25.9}$$

$$ITAE = \int_0^{T_{sim}} t\,|e(t)|\, dt \tag{25.10}$$

$$ITSE = \int_0^{T_{sim}} t\, e^2(t)\, dt \tag{25.11}$$

where $e(t)$ and $T_{sim}$ are the error value and the simulation time, respectively. In this study, the optimization of the PSS parameters is formulated as the minimization of the objective functions given in Eqs. (25.8–25.11), subject to various inequality constraints. The inequality constraints are as follows:

$$0.1 \le K_p \le 100 \tag{25.12}$$

$$0.2 \le T_1 \le 2 \tag{25.13}$$

$$0.02 \le T_2 \le 0.2 \tag{25.14}$$
In order to ensure stable operation of the system in the event of a fault in the power system, it is necessary to optimize the PSS parameters. For this, PSS parameters should be considered as control variables and optimized according to a selected objective function. The optimization process of PSS parameters is summarized in Fig. 25.1.
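The four error integrals in Eqs. (25.8–25.11) can be approximated from a sampled error signal, for example with the trapezoidal rule. The sketch below is illustrative only: the function name `error_indices` and the decaying oscillation used as the error signal are assumptions, not data from this study.

```python
import math


def error_indices(t, e):
    """Approximate IAE, ISE, ITAE and ITSE from samples e[i] taken at times t[i]."""

    def trapz(y):
        # Trapezoidal rule on a possibly non-uniform time grid.
        return sum(0.5 * (y[i] + y[i + 1]) * (t[i + 1] - t[i]) for i in range(len(t) - 1))

    return {
        "IAE": trapz([abs(v) for v in e]),
        "ISE": trapz([v * v for v in e]),
        "ITAE": trapz([ti * abs(v) for ti, v in zip(t, e)]),
        "ITSE": trapz([ti * v * v for ti, v in zip(t, e)]),
    }


# Hypothetical error signal: a decaying 1 Hz oscillation sampled for 10 s.
t = [0.001 * i for i in range(10001)]
e = [0.01 * math.exp(-0.5 * ti) * math.cos(2 * math.pi * ti) for ti in t]
print(error_indices(t, e))
```

Because ITAE and ITSE weight the error by time, they penalize oscillations that persist late in the simulation more heavily than IAE and ISE do.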
Fig. 25.1 The general concept of PSS parameter optimization problem
25.3 Method
This section presents the design steps of the developed FDBChOA method in three sub-sections, which introduce the FDB selection method, the original ChOA algorithm, and the proposed FDBChOA method, respectively.
25.3.1 Fitness-Distance Balance Selection Method
The FDB is a powerful selection method developed by Kahraman et al. in 2020 [4]. The researchers developed FDB to select effectively the solution candidates that guide the search process, thereby eliminating the premature convergence and low diversity problems of MHS algorithms. The FDB selection mechanism computes a score value for each solution candidate using the fitness value and the distance of the i-th solution candidate ($x_i$) from the best solution ($x_{best}$). Accordingly, the solution candidate with the highest score guides the search process [6, 36]. To apply the FDB selection method, the following steps should be followed [6]:

i. Assume that $z$ and $k$ represent the population size and the number of design variables, respectively. Accordingly, the population ($P$) and fitness ($f$) vectors are represented as follows:

$$P = \begin{bmatrix} x_{11} & \cdots & x_{1k} \\ \vdots & \ddots & \vdots \\ x_{z1} & \cdots & x_{zk} \end{bmatrix}_{z \times k} \tag{25.15}$$

$$f = \begin{bmatrix} f_1 \\ \vdots \\ f_z \end{bmatrix}_{z \times 1} \tag{25.16}$$

ii. The distance of the i-th solution candidate from $x_{best}$ (the best solution) is calculated as given in Eq. (25.17).

$$\forall_{i=1}^{z}\, x_i, \quad D_{x,i} = \sqrt{\left(x_{i[1]} - x_{best[1]}\right)^2 + \left(x_{i[2]} - x_{best[2]}\right)^2 + \cdots + \left(x_{i[k]} - x_{best[k]}\right)^2} \tag{25.17}$$

iii. The distance values of the solution candidates calculated in the previous step are represented by the $D_x$ vector as given in Eq. (25.18).

$$D_x \equiv \begin{bmatrix} D_1 \\ \vdots \\ D_z \end{bmatrix}_{z \times 1} \tag{25.18}$$

iv. In this step, the FDB score values of the solution candidates are calculated. For this, the normalized fitness ($f$) and distance ($D_x$) values of the relevant solution candidates are used. The coefficient $w$ adjusts the relative effect of the fitness and distance values in the FDB score calculation.

$$\forall_{i=1}^{z}\, x_i, \quad S_{x[i]} = w \cdot \mathrm{norm} f_{[i]} + (1 - w) \cdot \mathrm{norm} D_{x[i]} \tag{25.19}$$

v. The FDB score values of the solution candidates are represented by the $S_x$ vector. Finally, the solution candidate with the highest FDB score is selected as the solution to guide the search process.

$$S_x \equiv \begin{bmatrix} s_1 \\ \vdots \\ s_z \end{bmatrix}_{z \times 1} \tag{25.20}$$
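Steps i to v can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the reference implementation: the function name `fdb_select` is hypothetical, min-max scaling is assumed for the normalization, the problem is assumed to be a minimization problem (so the normalized fitness is inverted to reward lower objective values), and `w = 0.5` is an arbitrary default.

```python
import math


def fdb_select(population, fitness, w=0.5):
    """Return the index of the candidate with the highest FDB score, and all scores."""
    # Best solution = lowest objective value (minimization is an assumption).
    best = population[min(range(len(fitness)), key=fitness.__getitem__)]
    # Euclidean distance of every candidate from the best solution, Eq. (25.17).
    dist = [math.dist(x, best) for x in population]

    def minmax(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0.0] * len(values)
        return [(v - lo) / (hi - lo) for v in values]

    # Invert the normalized fitness so that better (lower) values score higher.
    norm_fit = [1.0 - v for v in minmax(fitness)]
    norm_dist = minmax(dist)
    # Weighted sum of normalized fitness and distance, Eq. (25.19).
    scores = [w * nf + (1.0 - w) * nd for nf, nd in zip(norm_fit, norm_dist)]
    return max(range(len(scores)), key=scores.__getitem__), scores


# Hypothetical population of three 2-D candidates and their objective values.
population = [[0.0, 0.0], [6.0, 8.0], [3.0, 4.0]]
fitness = [0.0, 0.2, 1.0]
index, scores = fdb_select(population, fitness)
print(index, scores)
```

In this example the second candidate is selected: it is nearly as fit as the best solution but lies far away from it, which is the kind of candidate FDB favors in order to preserve diversity.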
25.4 Overview of Chimp Optimization Algorithm
The chimp optimization algorithm (ChOA) [37] is a recently developed metaheuristic method inspired by the individual intelligence and sexual motivation of chimps in their group hunting. In the optimization process, ChOA uses four types of search agents, called attacker, barrier, chaser, and driver. Moreover, four primary steps of hunting, namely driving, chasing, attacking, and blocking, are implemented. The mathematical model of the ChOA algorithm is explained step by step as follows [37]. The driving and chasing steps of hunting are mathematically modeled by Eqs. (25.21) and (25.22).

$$d = \left| c \cdot x_{prey}(t) - m \cdot x_{chimp}(t) \right| \tag{25.21}$$

$$x_{chimp}(t + 1) = x_{prey}(t) - a \cdot d \tag{25.22}$$

where $x_{prey}$ and $x_{chimp}$ are the position vectors of the prey and the chimp, respectively. $a$, $m$, and $c$ are coefficient vectors calculated as follows:

$$a = 2 f r_1 - f \tag{25.23}$$

$$c = 2 r_2 \tag{25.24}$$

$$m = Chaotic\_value \tag{25.25}$$

where $f$ is reduced from 2.5 to 0 over the course of the iterations, $r_1$ and $r_2$ are random vectors between 0 and 1, and $m$ is a chaotic vector. In the attacking step, the behavior of chimps is simulated. Here, the four best search agents available, namely attacker, barrier, chaser, and driver, are used as reference solution candidates to mathematically model the behavior of chimps [38]. The other chimps update their positions based on the guide solution candidates (attacker, barrier, chaser, driver). These concepts are modeled as follows:

$$d_{attacker} = \left| c_1 x_{attacker}(t) - m_1 x(t) \right| \tag{25.26}$$

$$d_{barrier} = \left| c_2 x_{barrier}(t) - m_2 x(t) \right| \tag{25.27}$$

$$d_{chaser} = \left| c_3 x_{chaser}(t) - m_3 x(t) \right| \tag{25.28}$$

$$d_{driver} = \left| c_4 x_{driver}(t) - m_4 x(t) \right| \tag{25.29}$$
If the random vectors lie in the range [−1, 1], the next position of a chimp can be anywhere between its current position and the position of the target or prey:

$$x_1(t + 1) = x_{attacker}(t) - a_1 \cdot d_{attacker} \qquad (25.30)$$

$$x_2(t + 1) = x_{barrier}(t) - a_2 \cdot d_{barrier} \qquad (25.31)$$

$$x_3(t + 1) = x_{chaser}(t) - a_3 \cdot d_{chaser} \qquad (25.32)$$

$$x_4(t + 1) = x_{driver}(t) - a_4 \cdot d_{driver} \qquad (25.33)$$
The position of chimps throughout the search process lifecycle is updated using Eq. (25.34).

$$x_{chimp}(t + 1) = \frac{x_1 + x_2 + x_3 + x_4}{4} \qquad (25.34)$$
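The leader-guided update of Eqs. (25.23), (25.24) and (25.26) to (25.34) can be sketched as follows. This is an illustrative sketch only: the function names `leader_step` and `update_chimp` are hypothetical, the random numbers are assumed to be drawn independently per dimension, and the chaotic coefficient m is passed in as a plain scalar rather than being generated by a chaotic map.

```python
import random


def leader_step(x, leader, f, m, rng):
    """Move guided by one leader (attacker, barrier, chaser or driver)."""
    new = []
    for xi, li in zip(x, leader):
        a = 2.0 * f * rng.random() - f   # Eq. (25.23), fresh r1 per dimension (assumption)
        c = 2.0 * rng.random()           # Eq. (25.24), fresh r2 per dimension (assumption)
        d = abs(c * li - m * xi)         # Eqs. (25.26)-(25.29)
        new.append(li - a * d)           # Eqs. (25.30)-(25.33)
    return new


def update_chimp(x, leaders, f, m, rng):
    """Average the leader-guided moves, Eq. (25.34)."""
    moves = [leader_step(x, leader, f, m, rng) for leader in leaders]
    return [sum(column) / len(moves) for column in zip(*moves)]


rng = random.Random(0)
leaders = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
new_position = update_chimp([0.0, 0.0], leaders, f=2.5, m=0.7, rng=rng)
```

When f reaches 0 the coefficient a vanishes, so every move collapses onto its leader and the new position becomes the plain average of the four leaders; larger f values widen the step around them.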
Finally, it is assumed that during the metaheuristic search process chimps choose between the classical and the chaotic update mechanism with a probability of 50% each when updating their position. This situation is modelled in Eq. (25.35).

$$x_{chimp}(t + 1) = \begin{cases} x_{prey}(t) - a \cdot d & \text{if } \varphi < 0.5 \\ \mathrm{Chaotic\_value} & \text{if } \varphi \ge 0.5 \end{cases} \qquad (25.35)$$
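The switch in Eq. (25.35) can be illustrated with the short sketch below. It rests on assumptions that are not taken from the chapter: the logistic map stands in for the unspecified chaotic map, the names `logistic_map`, `chaotic_vector` and `next_position` are hypothetical, and a and c are treated as scalars for brevity.

```python
def logistic_map(value, r=4.0):
    """One iteration of the logistic map (an assumed stand-in chaotic map)."""
    return r * value * (1.0 - value)


def chaotic_vector(start, size):
    """Generate `size` successive chaotic values starting from `start`."""
    values = []
    current = start
    for _ in range(size):
        current = logistic_map(current)
        values.append(current)
    return values


def next_position(x_prey, x_chimp, a, c, m, phi):
    """Eq. (25.35): classical update if phi < 0.5, chaotic value otherwise."""
    if phi < 0.5:
        # Eqs. (25.21) and (25.22), applied per dimension.
        return [p - a * abs(c * p - mi * x)
                for p, x, mi in zip(x_prey, x_chimp, m)]
    # m is the chaotic vector of Eq. (25.25), so it is returned directly.
    return list(m)


m = chaotic_vector(0.3, 2)
classical = next_position([1.0, 2.0], [0.0, 0.0], a=0.5, c=1.0, m=m, phi=0.2)
chaotic = next_position([1.0, 2.0], [0.0, 0.0], a=0.5, c=1.0, m=m, phi=0.9)
```

In a full implementation phi would be drawn uniformly from [0, 1] for each chimp, which yields the 50% split between the two mechanisms.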
25.4.1 Proposed FDBChOA Algorithm

This study aims to eliminate the premature convergence and low-diversity problems of the ChOA algorithm by using the FDB selection method, which was designed to improve the search performance of nature-based MHS algorithms. To this end, a new hybrid algorithm called FDBChOA was developed by adapting the FDB selection method to the ChOA algorithm. Four different variants were created for the design of the proposed FDBChOA algorithm. The performance of these variants was tested with statistical analysis methods, and the most successful variant was named FDBChOA. The Case-1 variant uses the FDB-selected $x_{FDB}$ instead of $x_{attacker}$ in Eq. (25.26), with an FDB application rate of 70%. The Case-2 and Case-3 variants are similar to Case-1 but apply FDB at rates of 90% and 50%, respectively. In the Case-4 variant, the $x_{chaser}$ chimp used in Eq. (25.28) is replaced by $x_{FDB}$, the solution candidate that can contribute most to the search process, with an FDB application rate of 80%. The mathematical model of the FDB-based ChOA variants is reported in Table 25.1. The pseudo-code of the proposed FDBChOA algorithm is given in Algorithm 1.

Algorithm 1 Pseudo-code of the proposed FDBChOA algorithm

Input: Chimp population size (z), number of design parameters (k), maxFEs, x_min and x_max
Output: x_attacker
Begin
    Initialize f, m, a and c
    // Create the initial population and evaluate fitness //
    P: randomly create a population of chimps as given in Eq. (25.15)
    for i = 1: z (chimp population size) do
        f: evaluate the fitness of each chimp as given in Eq. (25.16)
    end for
    x_attacker = the best search agent
    x_barrier = the second best search agent
    x_chaser = the third best search agent
    x_driver = the fourth best search agent
    Divide chimps randomly into independent groups
    // Metaheuristic search process //
    while search process lifecycle: up to termination criteria (maxFEs) do
        for i = 1: z do
            Extract the chimp's group
            Use its group strategy to update f, m and c, then calculate a
            Calculate d
            // Implementation of FDB selection method //
            for j = 1: z do
                Calculate Euclidean distance of each chimp using Eq. (25.17)
                Calculate FDB score for each chimp using Eq. (25.19)
            end for
            Create D_X and S_X vectors as given in Eq. (25.18) and Eq. (25.20)
            Determine x_FDB based on FDB philosophy (Eq. 25.29)
            Calculate d_attacker using Eq. (25.36)   // Case-1 //
            Calculate d_attacker using Eq. (25.38)   // Case-2 //
            Calculate d_attacker using Eq. (25.40)   // Case-3 //
            Calculate d_chaser using Eq. (25.42)     // Case-4 //
            Calculate d_barrier using Eq. (25.27)
            Calculate d_driver using Eq. (25.29)
        end for
        for i = 1: z do
            if (ϕ < 0.5)
                if (|a| < 1)
                    Update attacker position using Eq. (25.37)   // Case-1 //
                    Update attacker position using Eq. (25.39)   // Case-2 //
                    Update attacker position using Eq. (25.41)   // Case-3 //
                    Update chaser position using Eq. (25.43)     // Case-4 //
                    Update barrier position using Eq. (25.31)
                    Update driver position using Eq. (25.33)
                    Update x_chimp(t + 1) using Eq. (25.34)
                else if (|a| > 1)
                    Select a random search agent
                end if
            else if (ϕ > 0.5)
                Update x_chimp(t + 1) using Eq. (25.35)
            end if
        end for
        Update f, m, a and c
        Update x_attacker, x_barrier, x_chaser and x_driver
    end while
    return x_attacker
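One way to read the "application rate" of the four variants is as the probability with which the FDB-selected candidate replaces the default guide in a given update. The sketch below illustrates that reading only; it is an assumption about the mechanism, not a statement of how the authors implemented it, and the names `FDB_RATE` and `pick_guide` are hypothetical.

```python
import random

# Application rates quoted in the text for the four variants.
FDB_RATE = {"Case-1": 0.70, "Case-2": 0.90, "Case-3": 0.50, "Case-4": 0.80}


def pick_guide(default_guide, x_fdb, rate, rng):
    """Return x_fdb with probability `rate`, otherwise the default leader.

    Assumed interpretation of the FDB application rate.
    """
    return x_fdb if rng.random() < rate else default_guide


rng = random.Random(42)
x_attacker = [0.1, 0.2]   # default guide for Case-1 to Case-3
x_fdb = [0.4, 0.9]        # candidate chosen by the FDB score
guide = pick_guide(x_attacker, x_fdb, FDB_RATE["Case-2"], rng)
```

For Case-4 the same helper would be called with the chaser as the default guide, matching the substitution in Eq. (25.28).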
25.5 Experimental Settings

This section gives the experimental settings applied to test the performance of the ChOA and FDBChOA variants. In this study, the test conditions of the IEEE Congress on Evolutionary Computation (IEEE CEC) [39, 40] conferences are considered for the analysis and verification of algorithm performance. The experimental settings considered in this paper are as follows:

• The maximum number of objective function evaluations (maxFEs) was used as the stopping criterion (maxFEs = 10000*dimension).
• In the experimental studies, unimodal, multimodal, hybrid, and composition test problems in the CEC 2020 test suite were used.
• For each test function, 51 independent runs were conducted.
• The search performance of the algorithms in 30, 50 and 100 dimensions was examined.
• Parameters defined in the original articles of MHS algorithms were used.
• Experimental studies were conducted on an Intel(R) Core™ i5-1135G7U CPU @ 2.40 GHz with 16 GB RAM and an x64-based processor.
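The evaluation budgets implied by these settings can be written down as a small configuration sketch. The constant and function names below are hypothetical and serve only to make the arithmetic explicit.

```python
DIMENSIONS = (30, 50, 100)   # search-space dimensions examined
RUNS_PER_FUNCTION = 51       # independent runs per test function


def max_fes(dimension, factor=10_000):
    """Stopping criterion: maxFEs = 10000 * dimension."""
    return factor * dimension


budgets = {d: max_fes(d) for d in DIMENSIONS}
for d, budget in budgets.items():
    print(f"D = {d}: maxFEs = {budget}")
```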
Table 25.1 Mathematical model of the proposed FDBChOA variants

Case-1 (70% FDB)
    Original ChOA: $d_{attacker} = |c_1 x_{attacker}(t) - m_1 x(t)|$; $x_1(t + 1) = x_{attacker}(t) - a_1 \cdot |c_1 x_{attacker}(t) - m_1 x(t)|$
    Proposed FDBChOA: $d_{attacker} = |c_1 x_{FDB}(t) - m_1 x(t)|$ (25.36); $x_1(t + 1) = x_{attacker}(t) - a_1 \cdot |c_1 x_{FDB}(t) - m_1 x(t)|$ (25.37)

Case-2 (90% FDB)
    Original ChOA: $d_{attacker} = |c_1 x_{attacker}(t) - m_1 x(t)|$; $x_1(t + 1) = x_{attacker}(t) - a_1 \cdot |c_1 x_{attacker}(t) - m_1 x(t)|$
    Proposed FDBChOA: $d_{attacker} = |c_1 x_{FDB}(t) - m_1 x(t)|$ (25.38); $x_1(t + 1) = x_{attacker}(t) - a_1 \cdot |c_1 x_{FDB}(t) - m_1 x(t)|$ (25.39)

Case-3 (50% FDB)
    Original ChOA: $d_{attacker} = |c_1 x_{attacker}(t) - m_1 x(t)|$; $x_1(t + 1) = x_{attacker}(t) - a_1 \cdot |c_1 x_{attacker}(t) - m_1 x(t)|$
    Proposed FDBChOA: $d_{attacker} = |c_1 x_{FDB}(t) - m_1 x(t)|$ (25.40); $x_1(t + 1) = x_{attacker}(t) - a_1 \cdot |c_1 x_{FDB}(t) - m_1 x(t)|$ (25.41)

Case-4 (80% FDB)
    Original ChOA: $d_{chaser} = |c_3 x_{chaser}(t) - m_3 x(t)|$; $x_3(t + 1) = x_{chaser}(t) - a_3 \cdot |c_3 x_{chaser}(t) - m_3 x(t)|$
    Proposed FDBChOA: $d_{chaser} = |c_3 x_{FDB}(t) - m_3 x(t)|$ (25.42); $x_3(t + 1) = x_{chaser}(t) - a_3 \cdot |c_3 x_{FDB}(t) - m_3 x(t)|$ (25.43)
25.6 Results and Analysis

This section contains a comprehensive analysis of data obtained from experimental studies.

• In the first sub-section, the search performance of the ChOA and FDBChOA variants (Case-1, …, Case-4) was analyzed using the CEC 2020 test suite problems. The data obtained from experimental studies carried out in 30, 50 and 100 dimensions using unimodal, multimodal, hybrid, and composition test problems were tested using Friedman and Wilcoxon statistical analysis methods, and the most effective FDBChOA variant was determined.
• In the second sub-section, the proposed algorithm is applied to the optimization of PSS parameters, which is a popular real-world engineering problem, and the obtained results are compared with other well-known optimization methods.
25.6.1 Determining the Best FDBChOA Variant on CEC 2020 Benchmark Test Suite

This section summarizes the experimental studies performed to identify the most effective FDBChOA variant. For this purpose, the statistical analysis results are presented first. Then, the convergence performance of the algorithms on different test problems is examined using box-plot graphs.
Table 25.2 Friedman test ranking of ChOA and FDBChOA variants

Algorithms | Dimension = 30 (CEC 2020) | Dimension = 50 (CEC 2020) | Dimension = 100 (CEC 2020) | Mean Rank
Case-2 | 2.83 | 2.81 | 2.62 | 2.75
Case-3 | 2.74 | 2.83 | 2.71 | 2.76
Case-1 | 3.15 | 2.77 | 2.82 | 2.91
Case-4 | 3.01 | 2.81 | 3.12 | 2.98
ChOA | 3.30 | 3.82 | 3.75 | 3.62
25.6.1.1 Statistical Analysis

In this sub-section, the search performance of the base ChOA algorithm and the FDBChOA variants, namely Case-1, Case-2, Case-3, and Case-4, was examined. The 10*3*51*5 = 7650 data points (number of test functions, dimensions, independent runs, and competing algorithms) obtained from the experimental studies were statistically analyzed with the Friedman method, which is used to rank the competing algorithms according to their performance. The Friedman test ranking of ChOA and the FDBChOA variants is reported in Table 25.2. Upon examination, it is seen that the FDB-based Case-3 variant achieved the best ranking in the 30-dimensional experiments. On the other hand, the Case-2 variant achieved a better ranking than its competitors in high-dimensional search spaces. Based on the "Mean Rank" performance indicator in the last column of Table 25.2, Case-2 is the most successful of the FDB-based ChOA variants. In brief, it has been statistically verified that the Case-2 variant exhibits stable and robust search performance. Wilcoxon pairwise comparison results between ChOA and the FDBChOA variants are given in Table 25.3. When the results are analyzed in depth, it is seen that the FDBChOA variants outperform the base algorithm in 11 of 12 experimental studies. The three numbers given in each cell indicate, respectively, the number of test problems in which ChOA lost to its competitor, the number of test problems in which the two competing algorithms performed similarly, and the number of test problems in which ChOA was superior to its competitor. According to the score of 4/6/0 in the second cell (Case-3/Dimension = 30) of Table 25.3, the ChOA algorithm lost to Case-3 in four of the 10 problems, and the two algorithms performed similarly in the remaining six. The score values given in Table 25.3 show that the variants equipped with the FDB-based selection mechanism had a clear advantage over the ChOA algorithm.
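As a quick consistency check, the "Mean Rank" column of Table 25.2 can be reproduced by averaging the three per-dimension Friedman ranks. The sketch below uses only the values reported in that table; the variable and function names are hypothetical, and the averaging itself is an observation about the reported numbers rather than a documented step of the authors' analysis.

```python
# Per-dimension Friedman ranks (D = 30, 50, 100) as reported in Table 25.2.
FRIEDMAN_RANKS = {
    "Case-2": (2.83, 2.81, 2.62),
    "Case-3": (2.74, 2.83, 2.71),
    "Case-1": (3.15, 2.77, 2.82),
    "Case-4": (3.01, 2.81, 3.12),
    "ChOA": (3.30, 3.82, 3.75),
}


def mean_rank(ranks):
    """Arithmetic mean of the per-dimension ranks."""
    return sum(ranks) / len(ranks)


# Lower mean rank means better overall performance.
ordering = sorted(FRIEDMAN_RANKS, key=lambda name: mean_rank(FRIEDMAN_RANKS[name]))

# Size of the analyzed data set: functions * dimensions * runs * algorithms.
total_results = 10 * 3 * 51 * 5

for name in ordering:
    print(f"{name}: {mean_rank(FRIEDMAN_RANKS[name]):.2f}")
```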
Table 25.3 Wilcoxon pairwise comparison results between ChOA and FDBChOA variants

Versus ChOA (+/=/−) | Dimension = 30 (CEC 2020) | Dimension = 50 (CEC 2020) | Dimension = 100 (CEC 2020)
Case-2 | 2/8/0 | 5/5/0 | 5/5/0
Case-3 | 4/6/0 | 6/4/0 | 6/4/0
Case-1 | 2/8/0 | 5/4/1 | 5/5/0
Case-4 | 1/8/1 | 6/4/0 | 5/5/0

25.6.1.2 Convergence Analysis

In this sub-section, the convergence performance of ChOA and the FDBChOA variants for different problem types is analyzed. For this purpose, box-plot graphs were drawn for unimodal (F1), multimodal (F3), hybrid (F6) and composition (F9) type problems selected from the CEC 2020 test suite. Figure 25.2 depicts the box-plot graphs of the algorithms for these problem types. Upon examination, it is seen that for the F1 unimodal problem, the FDBChOA variants converge to a lower error value than the base algorithm in all dimensions. Figure 25.2 (d, e, f) shows that in the F3 multimodal test problem, which has many local solution traps, the base algorithm has difficulty converging to the minimum error value due to the premature convergence problem. Analysis of the F6 (hybrid-type) problem (Fig. 25.2 g, h, i) shows that the Case-2 variant was able to converge to a minimum error value in a stable manner. Figure 25.2 (j, k, l) gives the box-plot charts of the algorithms for the F9 (composition-type) problem. Here, it is clearly seen that the Case-1 variant can provide a strong exploration–exploitation balance. In summary, the box plots revealed that the ChOA algorithm suffers from premature convergence and low diversity problems. On the other hand, the FDBChOA variants designed using the FDB selection method exhibited a stable and robust search performance.
25.6.2 Application of the Proposed FDB-Based Chimp Optimization Algorithm for Power System Stabilizer Parameters Optimization

In this section, the proposed FDBChOA, Lévy flight distribution (LFD) [41], and dynamic differential annealed optimization (DDAO) [42] algorithms are applied to the optimization of power system stabilizer parameters. The simulation studies were conducted in MATLAB/Simulink. The $K_p$, $T_1$ and $T_2$ parameters were considered as control variables. The $T_3$ and $T_4$ parameters are assumed to be equal to the optimized $T_1$ and $T_2$, respectively. The performance and effectiveness of the optimization algorithms were tested using the IAE, ISE, ITAE, and ITSE performance indicators. To ensure fairness between the optimization algorithms, the population size and the maximum number of iterations were set to 30 and 100, respectively.
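The four performance indicators are the standard integral error criteria: IAE integrates |e|, ISE integrates e², ITAE integrates t·|e| and ITSE integrates t·e² over the simulation time. A minimal sketch of how they could be estimated from a sampled error signal is given below. The chapter does not state which signal is used as the error e(t) or how the integrals are evaluated in Simulink, so the trapezoidal rule and the function name `performance_indices` are assumptions made for illustration.

```python
def performance_indices(times, errors):
    """Trapezoidal estimates of IAE, ISE, ITAE and ITSE for a sampled error signal."""
    integrands = {
        "IAE": lambda t, e: abs(e),        # integral of |e|
        "ISE": lambda t, e: e * e,         # integral of e^2
        "ITAE": lambda t, e: t * abs(e),   # integral of t * |e|
        "ITSE": lambda t, e: t * e * e,    # integral of t * e^2
    }
    totals = {name: 0.0 for name in integrands}
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        for name, g in integrands.items():
            left = g(times[k - 1], errors[k - 1])
            right = g(times[k], errors[k])
            totals[name] += 0.5 * (left + right) * dt
    return totals


# Hypothetical error samples (for example a speed deviation) over one second.
times = [0.0, 0.25, 0.5, 0.75, 1.0]
errors = [0.0, 0.02, -0.01, 0.005, 0.0]
indices = performance_indices(times, errors)
```

Because ITAE and ITSE weight the error by time, they penalize oscillations that persist late in the simulation more heavily than IAE and ISE do.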
Fig. 25.2 Box-plot charts for CEC 2020 benchmark problems: (a)–(c) F1 (Unimodal), (d)–(f) F3 (Multimodal), (g)–(i) F6 (Hybrid), (j)–(l) F9 (Composition), each for D = 30, 50 and 100
Fig. 25.3 Single machine-infinite bus test system
The simulation studies were carried out in a power system with a synchronous machine connected to an infinite bus, as shown in Fig. 25.3. The power system data were taken from Ref. [43]. Simulation studies were conducted considering the following conditions:

• A three-phase short-circuit fault occurs in one of the parallel lines between 0.6 and 0.78 s.
• The fault is cleared between 0.78 and 0.87 s, and the system is then returned to the pre-fault configuration.
• The simulation time is 10 s (Fig. 25.3).

Table 25.4 presents the optimal PSS parameters tuned by the proposed FDBChOA, LFD, and DDAO algorithms, as well as the objective function values. In this table, the best value obtained by the algorithms for each performance indicator is marked in bold. A lower objective function value for the IAE, ISE, ITAE, and ITSE performance indicators corresponds to a better system response in terms of time-domain properties such as faster damping of oscillations and minimum overshoot. As seen in Table 25.4, the objective function values of the FDBChOA, LFD, and DDAO algorithms for the IAE performance index are 6.43010, 6.99320, and 6.63250, respectively. In other words, the proposed algorithm achieved the minimum objective function value of 6.43010, which is 8.0521% and 3.0516% lower than the results obtained with LFD and DDAO, respectively. Figure 25.4 depicts the rotor angle, speed deviation, terminal voltage, electrical power, and convergence graphs for the IAE performance indicator. It can be seen from the figure that the proposed algorithm provides faster damping and minimum overshoot of the low-frequency oscillations in the rotor of the synchronous machine. The convergence curves given for the IAE objective function in Fig. 25.4 (i) show that the proposed algorithm is superior to its competitors in terms of convergence accuracy.
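The percentages quoted for the IAE index follow from the relative reduction (reference − proposed) / reference. The short check below reproduces them from the reported objective function values; the function name `percent_reduction` is hypothetical.

```python
def percent_reduction(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


# IAE objective function values reported for FDBChOA, LFD and DDAO.
iae_fdbchoa, iae_lfd, iae_ddao = 6.43010, 6.99320, 6.63250

versus_lfd = percent_reduction(iae_lfd, iae_fdbchoa)
versus_ddao = percent_reduction(iae_ddao, iae_fdbchoa)
print(f"vs LFD: {versus_lfd:.4f}%  vs DDAO: {versus_ddao:.4f}%")
```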
With regard to the ISE performance indicator, Table 25.4 shows that FDBChOA obtained the best objective function value (0.00025), followed by DDAO and LFD, respectively. The time-domain simulation results for the ISE objective function are shown in Fig. 25.5. It can be seen from the figure that the PSS tuned by the LFD method does not damp the oscillations satisfactorily. On the other hand, the FDBChOA and DDAO methods provide minimum overshoot and fast settling time. Figure 25.5i shows the convergence curves of the optimization algorithms for the ISE performance indicator. It is seen that the FDBChOA method is superior to its competitors in terms of convergence speed and accuracy.

According to the simulation results in Table 25.4, the results obtained with the FDBChOA, LFD, and DDAO algorithms for the ITAE performance indicator were 30.9947, 34.8351, and 31.0592, respectively. Specifically, the objective function value of the FDBChOA method was 11.0245% and 0.2076% lower than the simulation results of LFD and DDAO, respectively. From the curves in Fig. 25.6, it can be said that LFD is insufficient to suppress the system oscillations, while the FDBChOA and DDAO methods completely suppress them. Figure 25.6i shows the convergence curves of the optimization algorithms for the ITAE performance indicator. Upon examination, it is seen that the proposed algorithm converges faster than its competitors.

The optimum PSS parameters tuned by the FDBChOA, LFD, and DDAO algorithms using the ITSE objective function, as well as the objective function values, are tabulated in Table 25.4. As can be seen from the table, the proposed algorithm produced better quality solutions than its competitors. The graphs of rotor angle, speed deviation, terminal voltage, and electrical power obtained using the optimal PSS parameters are shown in Fig. 25.7. It is seen from Fig. 25.7a–d that the LFD method is insufficient to keep the rotor angle and speed deviation values at the desired level. The DDAO method failed to keep the terminal voltage and electrical power at the set values. On the other hand, the proposed method achieved superior performance in damping low-frequency oscillations and successfully ensured system stability even under various disturbances. The curves in Fig. 25.7i show that the convergence performances of the FDBChOA and DDAO methods for the ITSE performance indicator are quite close.

The robustness of the proposed method has been tested for different objective functions. The time-domain system response curves obtained for the FDBChOA-based PSS are given in Fig. 25.8. It is seen from the figure that the oscillations are suppressed successfully, with minimum overshoot and reasonable settling time, for the IAE, ISE, ITAE and ITSE objective functions. The comparison between the objective functions indicates that the ITAE performance indicator yields the best system response.

Table 25.4 Optimal PSS parameters tuned by FDBChOA, LFD and DDAO algorithms

Method | Objective function | Kp | T1 | T2 | Objective function value
FDBChOA | IAE | 4.7485 | 0.2349 | 0.0517 | 6.43010
FDBChOA | ISE | 3.0354 | 0.2000 | 0.0336 | 0.00025
FDBChOA | ITAE | 6.9658 | 0.2301 | 0.0591 | 30.9947
FDBChOA | ITSE | 6.3832 | 0.2069 | 0.0484 | 0.00148
LFD | IAE | 2.0020 | 0.2343 | 0.0200 | 6.99320
LFD | ISE | 2.2292 | 0.2000 | 0.0200 | 0.00027
LFD | ITAE | 3.3065 | 0.2000 | 0.0200 | 34.8351
LFD | ITSE | 7.8947 | 0.1999 | 0.2000 | 0.00160
DDAO | IAE | 0.1171 | 1.3547 | 0.0640 | 6.63250
DDAO | ISE | 3.5514 | 0.2185 | 0.0558 | 0.00026
DDAO | ITAE | 0.4994 | 0.9034 | 0.0913 | 31.0592
DDAO | ITSE | 2.3126 | 0.4242 | 0.0690 | 0.00149

Fig. 25.4 Simulation results for IAE objective function: (a) rotor angle response, (b) rotor angle response (zoom version), (c) speed deviation response, (d) speed deviation response (zoom version), (e) terminal voltage, (f) terminal voltage (zoom version), (g) electrical power, (h) electrical power (zoom version), (i) convergence curves of optimization algorithms
Fig. 25.5 Simulation results for ISE objective function: (a) rotor angle response, (b) rotor angle response (zoom version), (c) speed deviation response, (d) speed deviation response (zoom version), (e) terminal voltage, (f) terminal voltage (zoom version), (g) electrical power, (h) electrical power (zoom version), (i) convergence curves of optimization algorithms
Fig. 25.6 Simulation results based on ITAE objective function: (a) rotor angle response, (b) rotor angle response (zoom version), (c) speed deviation response, (d) speed deviation response (zoom version), (e) terminal voltage, (f) terminal voltage (zoom version), (g) electrical power, (h) electrical power (zoom version), (i) convergence curves of optimization algorithms

25.7 Conclusions

In this study, the FDB selection method was applied to increase the overall search performance of the ChOA algorithm. A comprehensive experimental study was carried out to investigate the effects of the FDB selection method on the exploration and balanced search capabilities of the ChOA algorithm. For this purpose, unimodal, multimodal, hybrid and composition type problems were used. In addition, low-, middle- and high-dimensional search spaces were considered. Data from the experimental studies have shown that the ChOA algorithm suffers from premature convergence and poor diversity. To overcome these issues, the position update operators of the attacker and chaser chimps, which are used as guide solution candidates in the base ChOA algorithm, were redesigned with the FDB method. Thus, a powerful and effective metaheuristic algorithm called FDBChOA has been developed. The experimental studies showed that the FDB method largely eliminated the shortcomings of ChOA. Moreover, the proposed algorithm was applied to the optimization of PSS parameters, which is a popular real-world engineering problem, and the obtained results were compared with two recently developed powerful optimization methods, namely LFD and DDAO. The performance of the optimization algorithms has been investigated with various simulation studies using the IAE, ISE, ITAE, and ITSE performance indicators. The simulation results demonstrated that the FDBChOA algorithm obtained better performance and stability characteristics than its competitors.
Fig. 25.7 Simulation results based on ITSE objective function: (a) rotor angle response, (b) rotor angle response (zoom version), (c) speed deviation response, (d) speed deviation response (zoom version), (e) terminal voltage, (f) terminal voltage (zoom version), (g) electrical power, (h) electrical power (zoom version), (i) convergence curves of optimization algorithms

Fig. 25.8 Simulation results of FDBChOA based PSS for different objective functions: (a) rotor angle response, (b) rotor angle response (zoom version), (c) speed deviation response, (d) speed deviation response (zoom version), (e) terminal voltage, (f) terminal voltage (zoom version), (g) electrical power, (h) electrical power (zoom version)
References

1. Aras S, Gedikli E, Kahraman HT (2021) A novel stochastic fractal search algorithm with fitness-distance balance for global numerical optimization. Swarm Evol Comput 61:100821
2. Salgotra R, Singh U, Saha S (2018) New cuckoo search algorithms with enhanced exploration and exploitation properties. Expert Syst Appl 95:384–420
3. Stanovov V, Akhmedova S, Semenkin E (2019) Selective pressure strategy in differential evolution: exploitation improvement in solving global optimization problems. Swarm Evol Comput 50:100463
4. Kahraman HT, Aras S, Sonmez Y, Guvenc U, Gedikli E. Analysis, test and management of the meta-heuristic searching process: an experimental study on SOS. Politeknik Dergisi 23(2):445–455
5. Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61
6. Kahraman HT, Aras S, Gedikli E (2019) Fitness-distance balance (FDB): a new selection method for meta-heuristic search algorithms. Knowl-Based Syst 105169
7. Panda S (2011) Robust coordinated design of multiple and multi-type damping controller using differential evolution algorithm. Int J Electr Power Energy Syst 33:1018–1030
8. Abido MA, Abdel-Magid YL (2002) Eigenvalue assignments in multimachine power systems using tabu search algorithm. Comput Electr Eng 28:527–545
9. Khodabakhshian A, Hemmati R (2012) Multi-machine power system stabilizers design by using cultural algorithms. Int J Electr Power Energy Syst 44:571–580
10. Abido MA (2000) Robust design of multimachine power system stabilizers using simulated annealing. IEEE Trans Energy Convers 15:297–304
11. Abd-Elazim S, Ali E (2012) Coordinated design of PSSs and SVC via bacteria foraging optimization algorithm in a multimachine power system. Int J Electr Power Energy Syst 41:44–53
12. Mary Linda M, Kesavan Nair N (2013) A new-fangled adaptive mutation breeder genetic optimization of global multi-machine power system stabilizer. Int J Electr Power Energy Syst 44:249–258
13. Shrivastava A, Dubey M, Kumar Y (2013) Design of interactive artificial bee colony based multiband power system stabilizers in multimachine power system. In: 2013 international conference on control, automation, robotics and embedded systems (CARE). IEEE, pp 1–6
14. Khodabakhshian A, Hemmati R, Moazzami M (2013) Multi-band power system stabilizer design by using CPCE algorithm for multi-machine power system. Electr Power Syst Res 101:36–48
15. Sambariya DK, Prasad R (2014) Robust tuning of power system stabilizer for small signal stability enhancement using metaheuristic bat algorithm. Int J Electr Power Energy Syst 61:229–238
16. Hassan LH, Moghavvemi M, Almurib HAF, Muttaqi KM, Ganapathy VG (2014) Optimization of power system stabilizers using participation factor and genetic algorithm. Int J Electr Power Energy Syst 55:668–679
17. Mohammadi M, Ghadimi N (2015) Optimal location and optimized parameters for robust power system stabilizer using honeybee mating optimization. Complexity 21(1):242–258
18. Peres W, de Oliveira EJ, Filho JAP, da Silva Jr IC (2015) Coordinated tuning of power system stabilizers using bio-inspired algorithms. Int J Electr Power Energy Syst 64:419–428
19. Labdelaoui H, Boudjema F, Boukhetala D (2016) A multiobjective tuning approach of power system stabilizers using particle swarm optimization. Turk J Electr Eng Comput Sci 24:3898–3909
20. Farah A, Guesmi T, Abdallah HH, Ouali A (2016) A novel chaotic teaching–learning-based optimization algorithm for multi-machine power system stabilizers design problem. Int J Electr Power Energy Syst 75:197–209
21. Shakarami MR, Faraji Davoudkhani I (2016) Wide-area power system stabilizer design based on Grey wolf optimization algorithm considering the time delay. Electr Power Syst Res 133:149–159
22. Hasan Z, Salman K, Talaq J, El-Hawary ME (2016) Optimal tuning of power system stabilizers by biogeography-based optimization method. In: Proceedings of 2016 IEEE Canadian conference on electrical and computer engineering (CCECE), pp 1–6
23. Islam NN, Hannan MA, Shareef H, Mohamed A (2017) An application of backtracking search algorithm in designing power system stabilizers for large multi-machine system. Neurocomputing 237:175–184
24. Ekinci S, Hekimoglu B (2018) Parameter optimization of power system stabilizer via salp swarm algorithm. In: 2018 5th international conference on electrical and electronic engineering (ICEEE). IEEE, pp 143–147
25. Chitara D, Niazi KR, Swarnkar A, Gupta N (2018) Cuckoo search optimization algorithm for designing of a multimachine power system stabilizer. IEEE Trans Ind Appl 54(4):3056–3065
26. Butti D, Mangipudi SK, Rayapudi SR (2020) An improved whale optimization algorithm for the design of multi-machine power system stabilizer. Int Trans Electr Energy Syst 30(5):e12314
27. Razmjooy N, Razmjooy S, Vahedi Z, Estrela VV, de Oliveira GG (2021) A new design for robust control of power system stabilizer based on Moth search algorithm. In: Metaheuristics and optimization in computer and electrical engineering. Springer, Cham, pp 187–202
28. Sabo A, Abdul Wahab NI, Othman ML, Mohd MZA, Beiranvand H (2020) Optimal design of power system stabilizer for multimachine power system using farmland fertility algorithm. Int Trans Electr Energy Syst 30(12):e12657
29. Ekinci S, Izci D, Zeynelgil HL, Orenc S (2020) An application of slime mould algorithm for optimizing parameters of power system stabilizer. In: 2020 4th international symposium on multidisciplinary studies and innovative technologies (ISMSIT). IEEE, pp 1–5
30. Devarapalli R, Bhattacharyya B (2020) A hybrid modified grey wolf optimization-sine cosine algorithm-based power system stabilizer parameter tuning in a multimachine power system. Optimal Control Appl Methods 41(4):1143–1159
31. Salik M, Rout PK, Mohanty MN (2020) Inter-area and intra-area oscillation damping of power system stabilizer design using modified invasive weed optimization. In: Advances in intelligent computing and communication. Springer, Singapore, pp 347–359
32. Aribowo W, Muslim S, Suprianto B, Haryudo SI, Hermawan AC. Intelligent control of power system stabilizer based on archimedes optimization algorithm–feed forward neural network
33. Ekinci S (2019) Optimal design of power system stabilizer using sine cosine algorithm. J Faculty Eng Archit Gazi Univ 34(3):1329–1350
34. Dasu B, Sivakumar M, Srinivasarao R (2019) Interconnected multi-machine power system stabilizer design using whale optimization algorithm. Protection Control Modern Power Syst 4(1):1–11
35. Duman S, Yörükeren N, Altaş İH (2016) Gravitational search algorithm for determining controller parameters in an automatic voltage regulator system. Turk J Electr Eng Comput Sci 24(4):2387–2400
36. Kahraman HT, Bakir H, Duman S, Katı M, Aras S, Guvenc U (2021) Dynamic FDB selection method and its application: modeling and optimizing of directional overcurrent relays coordination. Appl Intell 1–36
37. Khishe M, Mosavi MR (2020) Chimp optimization algorithm. Expert Syst Appl 149:113338
38. Kaur M, Kaur R, Singh N, Dhiman G (2021) Schoa: a newly fusion of sine and cosine with chimp optimization algorithm for hls of datapaths in digital filters and engineering applications. In: Engineering with computers, pp 1–29
39. Liang JJ, Qu BY, Suganthan PN (2013) Problem definitions and evaluation criteria for the CEC 2014 special session and competition on single objective real-parameter numerical optimization. Comput Intell Lab Zhengzhou Univ Zhengzhou China Tech Rep Nanyang Technol Univ Singapore 635:490
40. Yue CT, Price KV, Suganthan N, Liang JJ, Ali MZ, Qu BY, Awad NH, Biswas PP (2019) Problem definitions and evaluation criteria for the CEC 2020 special session and competition on single objective bound constrained numerical optimization. Tech Rep, Zhengzhou University and Nanyang Technological University
41. Houssein EH, Saad MR, Hashim FA, Shaban H, Hassaballah M (2020) Lévy flight distribution: a new metaheuristic algorithm for solving engineering optimization problems. Eng Appl Artif Intell 94:103731
42. Ghafil HN, Jármai K (2020) Dynamic differential annealed optimization: new metaheuristic optimization algorithm for engineering applications. Appl Soft Comput 93:106392
43. Demiroren A, Zeynelgil HL (2002) Modelling and simulation of synchronous machine transient analysis using SIMULINK. Int J Electr Eng Educ 39(4):337–346
Chapter 26
Deep Learning-Based Prediction Model of Fruit Growth Dynamics in Apple

Hamit Armağan, Ersin Atay, Xavier Crété, Pierre-Eric Lauri, Mevlüt Ersoy, and Okan Oral
H. Armağan (corresponding author), Department of Informatics, Suleyman Demirel University, 32100 Isparta, Turkey
E. Atay, Department of Crop and Livestock Production, Horticulture Program, Food Agriculture and Livestock School, Burdur Mehmet Akif Ersoy University, Burdur, Turkey
X. Crété, Station Expérimentale Fruits et Légumes (SUDEXPE-CEHM), 34590 Marsillargues, France
P.-E. Lauri, ABSys, INRAE, CIRAD, CIHEAM-IAMM, Montpellier SupAgro, Montpellier, France
M. Ersoy, Department of Computer Engineering, Suleyman Demirel University, 32100 Isparta, Turkey
O. Oral, Department of Mechatronics Engineering, Akdeniz University, Antalya, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_26

26.1 Introduction
Because of unexpected climatic events, population expansion, and food security concerns, the agricultural industries are looking for new ways to improve crop yields. As a result, agricultural artificial intelligence, also known as Agriculture Intelligence, is gradually emerging as a component of the industry's technological revolution [1]. In recent years, smart systems and techniques in different agricultural sectors have provided remarkable results in increasing crop production and reducing costs. Applying artificial intelligence algorithms to existing agricultural technologies helps the farmer with product selection, crop yield prediction, crop disease forecasting, weather forecasting, minimum support price estimation and smart irrigation [2]. Today, the Internet of Things (IoT) has infiltrated practically every functional area of industry and business, and it has evolved into a basic form of technological decision support. The current productivity focus is computing power centred on IoT
and artificial intelligence technology [3]. Thanks to the rapid development of IoT in agricultural technology, modern imaging techniques, and automated industrial equipment, high-throughput technology is now widely used to acquire plant characteristics in diverse growing settings through integrated image sensors [4]. Deep Learning (DL) is a complex artificial neural network architecture that provides state-of-the-art results in smart farming applications. By combining modern machine vision with DL architectures, precision agriculture has had a revolutionary impact on agricultural applications such as crop monitoring, disease diagnosis, and intelligent yield estimation [5]. DL has been successfully utilized in a variety of sectors, and it has lately entered the agricultural arena [6]. Nilay Ganatra et al. (2020) gave a comprehensive evaluation of research dedicated to deep learning applications for precision agriculture, including real-time applications, tools, and available datasets. Their findings show that deep learning techniques have much potential for precision agriculture. Ren et al. (2020) discussed the benefits and drawbacks of DL and future research areas. According to their survey, DL-based research outperforms traditional machine learning techniques in terms of accuracy. Khan et al. (2020) introduced a novel approach to predicting fruit production using deep neural networks, with the aim of developing a rapid and reliable agricultural output prediction system.
A Wireless Sensor Network (WSN) consists of nodes containing sensors that communicate with one another [10]. Sensors connected to these nodes independently collect data from indoor or outdoor environments [11]. The sensors convert the measured quantities into analogue or digital signals [11, 12]. The data collected by sensors can be very diverse, such as temperature, humidity, noise and gases in the air [11].
Sensors are integrated into an embedded system that can transfer data between nodes with the help of end-to-end wireless protocols [13]. Embedded systems are low-power devices with limited computing power, memory and battery capacity. Many studies in the military, health and agricultural fields have been carried out with WSNs [11]. DL is one of the artificial intelligence methods that use multi-layered artificial neural networks [14]. Instead of relying on the hand-coded rules of traditional machine learning methods, it learns from the data it is given [15]; in other words, DL learns through examples [16, 17]. DL is built on neural networks [16]. The system's input and output are the first and last layers of the network, respectively. The middle layers are the hidden layers, where the data is processed [17, 18]. Multi-layered models with more than one hidden layer can perform complex tasks [6, 17]. In an artificial neural network, the transfer of data from the input layer to the hidden layer and then to the output layer is defined as feedforward [19, 20]. This study aimed to create a model for the relationship between fruit growth, vapour pressure deficit (VPD) and the reference day using IoT, WSN and DL techniques. Such models can contribute to estimating how fruit could grow under different VPD scenarios.
26.2 Materials and Methods
The system we created here is a feedforward model consisting of an input layer, an output layer and two hidden layers. Figure 26.1 shows the computational model of a neural cell in the network. As the figure shows, neurons receive input signals through their dendrites, and the output signal produced by a neuron is transmitted along its axon. Axons are connected via synapses to the dendrites of other neurons. In the computational model, the incoming signal interacts with the dendrite according to the synaptic strength (the weight). This process allows one neuron to change the strength of the signal coming from another neuron [21]. Figure 26.2 shows a layer structure in a neural network. A single neuron is not enough to perform complex operations, so a large number of neurons are used. Transferring the input data from the input layer to the hidden layer and then to the output layer is defined as forward propagation. All operations on the network are carried out in the hidden layers. In fully connected networks, the neurons in each layer are connected to the neurons in the next layer [21].
Data were collected at an apple orchard (cv. 'Joya™') of Station Expérimentale Fruits et Légumes (SUDEXPE-CEHM; http://www.sudexpe.net/), situated in southeastern France. Three trees were chosen randomly for attaching the dendrometers (Megatron Elektronik AG & Co., München, Germany). One apple on each tree was used for measurements (Fig. 26.3). VPD and micrometric fruit diameter fluctuations were monitored at 15 min intervals via the Internet-of-Things (IoT) sensor network of Sud Agro Météo (SAM; http://www.sudagrometeo.fr/). KNIME Analytics Platform (https://www.knime.com/), free and open-source software, was used to model the data obtained. The flow chart (workflow) of the model created here is provided in Fig. 26.4.
DL4J Model Initializer: this node creates an empty deep learning model, which serves as the starting point of the network architecture.
Dense Layer: this metanode creates a simple general-purpose multi-layer perceptron consisting of two fully connected (FC) layers. Activation Function: ReLU.
Fig. 26.1 A computational model of a neuron
Fig. 26.2 Layer structure in the neural network
Fig. 26.3 A dendrometer (left) and antenna connection (right). Antennas are used for capturing signals from dendrometers
DL4J Feedforward Learner (Regression): this node performs supervised training of a feedforward DL model for regression. Optimization Algorithm: Stochastic Gradient Descent. Loss Function: Mean Squared Error. Neural Net Type: Deep Convolutional.
DL4J Feedforward Predictor (Regression): this node creates a regression prediction using the supplied network and test data [22–25].
In the model, average fruit diameter, day reference number and VPD values were taken as parameters. The components needed for reading, normalizing and partitioning the data were added to the design. The model was completed by adding the feedforward DL component and the testing, prediction analysis and visualization components. Data from the 100-day period (24 July to 31 October) were used in the model. Of these 100 daily records, 66 were used for network training and the remaining 34 for network testing. The average fruit diameter was calculated from the pooled data of the three individual fruits.
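The normalization and partitioning steps described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the KNIME workflow itself: the daily values below are synthetic placeholders, the helper names `z_score` and `partition` are hypothetical, and the use of the population standard deviation is an assumption.

```python
import math
import random
import statistics

def z_score(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population std (assumption); sample std is also common
    return [(v - mean) / std for v in values]

def partition(rows, n_train, seed=42):
    """Randomly split rows into a training part and a test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Synthetic placeholder records: reference day, VPD (kPa), mean fruit diameter (mm)
days = list(range(1, 101))
vpd = [1.5 + 0.5 * math.sin(d / 7.0) for d in days]
diameter = [40.0 + 0.3 * d for d in days]

rows = list(zip(z_score(days), z_score(vpd), z_score(diameter)))
train_rows, test_rows = partition(rows, n_train=66)  # 66 training days, 34 test days
```

Because the split is random, the test days are not consecutive, which matches the way the test points are later plotted.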
Fig. 26.4 DL workflow for the model
26.3 Results and Discussion
The high R² value (0.998) shows that the model fits the measured data closely (Table 26.1). The Mean Absolute Error (MAE) of 0.04 indicates that the difference between the actual data and the forecast data is at an acceptable level. The actual data and the fruit diameter predictions are compared in Fig. 26.5. This figure includes the diameter values of the fruit taken from the three individual trees and the fruit diameter estimates obtained from the model. As mentioned above, 34 randomly selected records from the 100-day period (24 July to 31 October) were used for the test. Z-score (Gaussian) normalization was applied to all of the data. Each point on the x-axis corresponds to one day in the test data; however, these days were selected randomly, not consecutively. The scenario of a 2 °C temperature increase under the same humidity conditions was simulated with the developed model. Temperature values are used to calculate VPD [26, 27], so the VPD values changed with the temperature values. When the reference days and the recalculated VPD values were fed to the model, the predicted fruit diameter decreased by 0.18% (Fig. 26.6).

Table 26.1 Goodness-of-fit and error values of the data obtained from the model. $y_i$: measured value; $\hat{y}_i$: estimated value; $\bar{y}$: mean of the measured values
Error | Formula | Value
R-square | $R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$ | 0.998
MAE (mean absolute error) | $MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$ | 0.04
Root mean squared error | $RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$ | 0.048
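The three measures in Table 26.1 can be computed directly from paired measured and estimated values. The sketch below is a plain-Python illustration; the four data points are made up for the example and are not the chapter's measurements.

```python
import math

def r_squared(y, y_hat):
    """R^2 = 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - sse / sst

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

# Illustrative values only
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.1, 1.9, 3.2, 3.8]

print(r_squared(measured, estimated), mae(measured, estimated), rmse(measured, estimated))
```

Note that RMSE is never smaller than MAE for the same data, which is consistent with the reported values of 0.048 and 0.04.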
Fig. 26.5 Visualization of the model
Fig. 26.6 The output of the developed model in the scenario of 2 °C temperature increase under the same humidity conditions
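To make the temperature scenario concrete, the sketch below recomputes VPD for a 2 °C warmer day at unchanged relative humidity. The chapter does not state which saturation vapour pressure equation was used; the Tetens-type formula and its constants here are an assumption, and the 25 °C / 60% inputs are illustrative.

```python
import math

def saturation_vapour_pressure(temp_c):
    """Saturation vapour pressure in kPa (Tetens-type formula, assumed here)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))

def vapour_pressure_deficit(temp_c, rh_percent):
    """VPD in kPa: saturation vapour pressure minus actual vapour pressure."""
    return saturation_vapour_pressure(temp_c) * (1.0 - rh_percent / 100.0)

# Illustrative day: 25 degC and 60% relative humidity, then +2 degC at the same humidity
baseline_vpd = vapour_pressure_deficit(25.0, 60.0)
warmer_vpd = vapour_pressure_deficit(27.0, 60.0)
print(round(baseline_vpd, 3), round(warmer_vpd, 3))
```

At constant relative humidity a warmer day always gives a larger VPD under this formula, which is why the recalculated VPD series differs from the measured one before it is passed to the trained model.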
To conclude, the model was successful in representing the actual data collected. It is thought that using the pooled data of three individual fruits, which provides replication, increased the model's coverage. In the model, reference days and VPD values were used as parameters over the fruit growth period. Increasing the number of parameters (such as trunk diameter, branch diameter, and soil analysis values) and the amount of data used in operations such as curve fitting, regression, and mathematical model development would increase the model's representation ability and reduce the margin of error.
References
1. Pathan M, Patel N, Yagnik H, Shah M (2020) Artificial cognition for applications in smart agriculture: a comprehensive review. Artif Intell Agric 4:81–95
2. Rehman TU, Mahmud MS, Chang YK, Jin J, Shin J (2019) Current and future applications of statistical machine learning algorithms for agricultural machine vision systems. Comput Electron Agric 156:585–605
3. Reddy MR, Srinivasa KG, Reddy BE (2018) Smart vehicular system based on the internet of things. J Organ End User Comput 30(3):45–62
4. Fan J, Zhang Y, Wen W, Gu S, Lu X, Guo X (2021) The future of internet of things in agriculture: plant high-throughput phenotypic platform. J Clean Prod 280:123651
5. Maheswari P, Raja P, Apolo-Apolo OE, Pérez-Ruiz M (2021) Intelligent fruit yield estimation for orchards using deep learning based semantic segmentation techniques - a review. Front Plant Sci 12:1247
6. Kamilaris A, Prenafeta-Boldú FX (2018) Deep learning in agriculture: a survey. Comput Electron Agric 147:70–90
7. Ganatra N, Patel A (2021) Deep learning methods and applications for precision agriculture. Mach Learn Predictive Anal 515–527
8. Ren C, Kim DK, Jeong D (2020) A survey of deep learning in agriculture: techniques and their applications. J Inf Process Syst 16(5):1015–1033
9. Khan T, Qiu J, Qureshi MAA, Iqbal MS, Mehmood R, Hussain W (2020) Agricultural fruit prediction using deep neural networks. Procedia Comput Sci 174:72–78
10. Yick J, Mukherjee B, Ghosal D (2008) Wireless sensor network survey. Comput Netw 52(12):2292–2330
11. Ersoy M, Yiğit T, Armağan H (2018) Kablosuz Algılayıcı Ağlarda Makine Öğrenme Tabanlı Çok Kriterli Yönlendirme. In: 2018 3rd international conference on computer science and engineering (UBMK). IEEE, pp 652–657, Sept 2018. https://doi.org/10.1109/UBMK.2018.8566317
12. Lo C, Lynch JP, Liu M (2016) Distributed model-based nonlinear sensor fault diagnosis in wireless sensor networks. Mech Syst Signal Process 66:470–484
13. Akyildiz IF, Su W, Sankarasubramaniam Y, Cayirci E (2002) Wireless sensor networks: a survey. Comput Netw 38(4):393–422. https://doi.org/10.1016/S1389-1286(01)00302-4
14. Goldberg DE, Holland JH (1988) Genetic algorithms and machine learning. Mach Learn 3(2):95–99
15. Jordan MI, Mitchell TM (2015) Machine learning: trends, perspectives, and prospects. Science 349(6245):255–260. https://doi.org/10.1126/science.aaa8415
16. Buduma N, Locascio N (2017) Fundamentals of deep learning: designing next-generation machine intelligence algorithms. O'Reilly Media, Inc.
17. Yilmaz A, Kaya U (2019) Derin Öğrenme. ISBN: 978-605-2118-39-9. Kodlab, Ltd. Şti.
18. Shrestha A, Mahmood A (2019) Review of deep learning algorithms and architectures. IEEE Access 7:53040–53065
19. İnik Ö, Ülker E (2017) Derin Öğrenme ve Görüntü Analizinde Kullanılan Derin Öğrenme Modelleri. Gaziosmanpaşa Bilimsel Araştırma Dergisi 6(3):85–104
20. Doğan F, Türkoğlu İ (2019) Derin Öğrenme Modelleri ve Uygulama Alanlarına İlişkin Bir Derleme. Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi 10(2):409–445. https://doi.org/10.24012/dumf.411130
21. Stanford-ConvNet (2020). Accessed on 28 July 2020. http://cs231n.stanford.edu https://cs231n.github.io/neural-networks-1/
22. Dense-Layer (2020). Accessed on 28 July 2020. https://kni.me/n/GubcYb-AtuxUkunf
23. DL4J-Learner (2020). Accessed on 28 July 2020. https://kni.me/n/K256mPx7jG7iBPna
24. DL4J-Model (2020). Accessed on 28 July 2020. https://kni.me/n/_ENUzFhgYJh9S_IV
25. DL4J-Predictor (2020). Accessed on 28 July 2020. https://kni.me/n/y6w-GPUhil3tn1lv
26. Atay E, Hucbourg B, Drevet A, Lauri PE (2016) Growth responses to water stress and vapour pressure deficit in nectarine. Acta Hort 1139:353–358. https://doi.org/10.17660/ActaHortic.2016.1139.61
27. Murray FW (1967) On the computation of saturation vapor pressure. J Appl Meteorol 6(1):203–204. https://doi.org/10.1175/1520-0450(1967)006%3c0203:OTCOSV%3e2.0.CO;2
Chapter 27
Prediction of Hepatitis C Disease with Different Machine Learning and Data Mining Technique Çağrı Suiçmez, Cemal Yılmaz, Hamdi Tolga Kahraman, Enes Cengiz, and Alihan Suiçmez
Ç. Suiçmez (corresponding author), Electrical and Electronics Engineering, Faculty of Technology, Gazi University, Ankara, Turkey
C. Yılmaz, Mingachevir State University, Mingachevir, Azerbaijan
H. T. Kahraman, Software Engineering, Faculty of Technology, Karadeniz Technical University, Trabzon, Turkey
E. Cengiz, Mechatronics Engineering, Faculty of Technology, Afyon Kocatepe University, Afyonkarahisar, Turkey
A. Suiçmez, Electrical and Electronic Engineering, Faculty of Engineering, Ondokuz Mayıs University, Samsun, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_27

27.1 Introduction
According to data from the World Health Organization (WHO), 71 million people in the world have chronic hepatitis C, and approximately 400 thousand of these patients have died. Since there is no effective vaccine for chronic hepatitis C, patients live with this disease for life. Severe conditions such as fibrosis, cirrhosis and hepatocellular carcinoma are frequently encountered in the livers of these patients [1]. Hepatitis C virus (HCV), the RNA virus that causes hepatitis C, is one of the major blood-borne human pathogens. HCV infection is largely asymptomatic and has few visible symptoms at the stage of infection. If appropriate treatment is not applied, most acute infections progress to chronic ones, followed by liver diseases such as cirrhosis and hepatocellular carcinoma [2]. The hepatitis C virus is a blood-borne virus and the most common transmission of infection is exposure to
small amounts of blood. Transmission can occur through injection drug use, unsafe injection procedures, unsafe healthcare, transfusion of unscreened blood and blood products, sexual practices, and other practices that cause blood exposure [3]. In recent years, about 200 learning algorithms have been developed in machine learning to provide suitable results for today's large datasets [4]. Machine learning (ML) is a branch of Artificial Intelligence (AI) that generally focuses on classification and regression; it learns from input data and generates logical results according to the algorithm's ability to process those data. Data mining is a set of processes that reveal the information in the data and the relationships within it, prepare the data for use, and thereby enable valid and consistent predictions and increase the efficiency of the algorithm. Classification is the process of estimating the classes of objects by feeding data processed through data mining into a suitable algorithm. In recent years, researchers have contributed to the prediction and diagnosis of the disease by building predictive models from relevant datasets and using appropriate algorithms to predict the response to therapy in patients with hepatitis C [5]. The primary purpose of the data mining techniques used in this study is to balance the data set, which has an imbalanced class distribution. For this purpose we use ADASYN, a synthetic data generation algorithm that belongs to the over-sampling methods. In short, when ADASYN is applied to an imbalanced data set, it uses the density distribution to determine automatically the number of synthetic samples to be generated for the minority classes, and it is reported to be the synthetic data generation algorithm least affected by the overfitting problem [6–8].
The second purpose is to obtain the maximum efficiency, that is, a realistic accuracy rate, from the synthetically balanced data set with the classification algorithms. For this, the K-fold cross-validation method was used. Machine learning algorithms such as classification trees and artificial neural networks (ANNs) are applied to problems in many areas, including prediction and classification. In medicine, machine learning algorithms are used as an aid to invasive methods in the prediction and diagnosis of diseases such as fibrosis, cirrhosis and hepatocellular carcinoma in hepatitis C patients [9]. Proactive identification of patients who need more frequent monitoring and treatment will be easier if highly sensitive risk estimation models are established. Such machine learning prediction models will be useful for diagnostic applications among people with chronic hepatitis C virus, a leading cause of cirrhosis [10]. In this study, hepatitis C is detected by combining data mining techniques with different machine learning techniques: logistic regression, support vector regression, random forest, decision trees, k-nearest neighbor, support vector machines, naive Bayes, gradient boost classifier and multilayer perceptrons. These predictions are an important step towards the early diagnosis of this disease.
27.2 Materials and Method In this study, the performances of nine machine learning algorithms are evaluated before and after data mining. The flow chart of the system is shown in Fig. 27.1.
27.2.1 Dataset Introduction The dataset used in this study is the dataset named “HepatitisCdata.csv” taken from the Kaggle website (https://www.kaggle.com/fedesoriano/hepatitis-c-dataset). A total of 615 patients were evaluated by looking at 11 parameters.
27.2.2 Data Mining Process The data set used in our study is imbalanced: it contains data for 533 BloodDonor, 7 SuspectBloodDonor, 24 Hepatitis, 21 Fibrosis and 30 Cirrhosis cases. ADASYN (adaptive synthetic sampling approach for imbalanced learning), an oversampling method, is used to make this imbalanced data set more usable and to obtain better results. Then K-fold cross-validation, one of the cross-validation techniques, is used to get the most out of the data set.
Fig. 27.1 Flow chart of hepatitis C prediction
27.2.2.1 ADASYN
In machine learning studies, it has become possible to save time and cost in many real-world problems by using appropriate classification algorithms together with the features obtained from data sets. However, these gains are maximized only when appropriate data sets are processed by appropriate algorithms. Today, data sets are produced by both humans and machines in countless varieties and sizes. Most of them, however, are not balanced in the way machine learning algorithms need in order to reach maximum classification performance [6–8]. A data set is called imbalanced when one of its classes contains far more samples than the others, or when some classes contain far too few [11]. On imbalanced data sets, machine learning algorithms tend to detect mainly the majority class, and the resulting score does not give users a realistic picture of classification quality. For example, consider a data set of patients in which 95% of people are cancer-free and 5% have cancer, and suppose we are trying to detect the people with cancer. With classical machine learning algorithms, the accuracy will mostly be close to 95%. However, the confusion matrix will show that this accuracy reflects the success on the other class, not on the class we want to predict. In other words, the classification algorithm has labelled all cancer patients as non-cancerous and made a vital mistake [6]. To prevent this, data preprocessing steps should be applied to the data set. The first of these steps is to balance the imbalanced data set synthetically.
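The 95%/5% example can be reproduced with a few lines of plain Python. The sketch below assumes a trivial classifier that always predicts the majority class; the labels and the helper name `confusion_counts` are illustrative.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, tn, fn) for the chosen positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

# 5 patients with cancer, 95 without; the classifier always answers "healthy"
y_true = ["cancer"] * 5 + ["healthy"] * 95
y_pred = ["healthy"] * 100

tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive="cancer")
accuracy = (tp + tn) / len(y_true)   # share of all correct predictions
recall = tp / (tp + fn)              # share of cancer patients that were found
print(accuracy, recall)
```

The accuracy looks high although not a single cancer patient is detected, which is why class-wise measures taken from the confusion matrix are needed on imbalanced data.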
We faced the problem illustrated by this example in our study and tried to overcome it by applying ADASYN, one of the synthetic data generation techniques. There are two issues related to the imbalanced learning problem: the metrics used to evaluate classification performance, and the sensitivity of the classification algorithm to the imbalanced training set [12, 13]. The ADASYN algorithm we used is one of the over-sampling techniques for balancing imbalanced data sets. These techniques include the Synthetic Minority Oversampling Technique (SMOTE), Borderline SMOTE, Safe-Level SMOTE and the Adaptive Synthetic Sampling Approach (ADASYN). Briefly, SMOTE is an oversampling technique proposed by Chawla et al. in 2002. Its shortcoming is that it overgeneralizes the minority class without paying attention to the distribution of the majority class [14]. The first step in the Borderline SMOTE technique, as the name suggests, is the identification of borderline minority samples. SMOTE is then applied to oversample these samples. The disadvantage of this technique is that it is prone to misclassification related to distant samples; that is, samples far from the borderline are not taken into account [15]. Safe-Level SMOTE was designed by Bunkhumpornpat, Sinapiromsaran and Lursinsap in 2009. Its working principle is that it assigns a safe level value to positive samples before generating synthetic data,
and then synthetic data is generated close to the largest safe level value. The safe level is the number of minority samples among the k nearest neighbors, and it is recalculated for each generated sample [16]. Among these techniques, ADASYN produces the synthetic data that are most meaningful and realistic with respect to the original data, because minority samples are generated adaptively according to their distribution: more synthetic data are generated for minority samples that are harder to learn than for those that are easy to learn. This helps to reduce the learning bias caused by the uneven distribution in the original imbalanced data set. SMOTE generates the same number of synthetic samples for each minority sample. ADASYN, in contrast, decides automatically from the density distribution how many synthetic samples to produce for each one. The main purpose of the algorithm is to produce different amounts of synthetic data for each sample by assigning weights to the different minority class samples [17]. ADASYN is reported to be advantageous in terms of both classification accuracy and run time.
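The weighting idea behind ADASYN can be sketched for a two-class problem as follows. This is a simplified illustration, not the implementation used in the study: the function name `adasyn`, the toy points and the choice to interpolate only towards the nearest minority samples are assumptions, and rounding means the number of generated samples may differ slightly from the target.

```python
import math
import random

def adasyn(minority, majority, k=2, beta=1.0, seed=0):
    """Generate synthetic minority samples, more of them near majority samples."""
    rng = random.Random(seed)
    labelled = [(x, 1) for x in minority] + [(x, 0) for x in majority]
    target = round((len(majority) - len(minority)) * beta)  # samples to create

    # r_i: share of majority samples among the k nearest neighbours of x_i
    ratios = []
    for x in minority:
        others = [item for item in labelled if item[0] is not x]
        nearest = sorted(others, key=lambda item: math.dist(x, item[0]))[:k]
        ratios.append(sum(1 for _, label in nearest if label == 0) / k)

    total = sum(ratios)
    if total == 0:
        return []  # no minority sample has majority neighbours

    synthetic = []
    for x, r in zip(minority, ratios):
        count = round(r / total * target)  # g_i: samples to create around x_i
        mates = sorted((m for m in minority if m is not x),
                       key=lambda m: math.dist(x, m))[:k]
        for _ in range(count):
            mate = rng.choice(mates)
            lam = rng.random()  # random position on the segment between x and mate
            synthetic.append([a + lam * (b - a) for a, b in zip(x, mate)])
    return synthetic

minority = [[0.0, 0.0], [0.2, 0.1], [2.0, 2.0]]
majority = [[2.1, 2.0], [1.9, 2.2], [2.2, 1.8], [3.0, 3.0],
            [3.1, 2.9], [2.9, 3.2], [4.0, 4.0]]
synthetic = adasyn(minority, majority)
```

In this toy set only the minority point at (2.0, 2.0) is surrounded by majority points, so it receives all four synthetic samples, while the two isolated minority points receive none.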
27.2.2.2 K-Fold Cross Validation
Cross-validation is one of the most commonly used data mining methods. It serves as a check of the robustness, classification ability and accuracy of the model, and it also helps to detect and limit overfitting. The main idea is to split the data one or more times, using part of the data to train the algorithms and the rest to test them. Cross-validation then selects the algorithm with the lowest estimated risk [18]. There are many different cross-validation methods, including the Jack Knife Test, bootstrapping, the Monte Carlo Test and the three-way split test [19]. The K-fold cross-validation used in this study is a branch of the Jack Knife Test. In K-fold cross-validation, the data set is divided into K equal parts. One of these K parts is held out as test data and the remaining K − 1 parts are used as training data. This process is repeated K times so that each part is used once as test data. Finally, the results of the K runs are averaged [20].
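The fold construction described above can be sketched without any library. The helper names below are hypothetical, and the 615 samples and K = 5 are used only as an example; the chapter does not state the value of K.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k parts of (almost) equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(n_samples, k, score_fn):
    """Average the score returned by score_fn(train, test) over all folds."""
    scores = [score_fn(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

# Example: 615 samples and 5 folds, so every sample is tested exactly once
splits = list(k_fold_indices(615, 5))
```

Here `score_fn` stands for any routine that trains a model on the training indices and returns its score on the test indices.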
27.2.3 Machine Learning Methods The data set used as input to the machine learning methods is first prepared by applying the data mining processes. Then about 80% of the data set is set aside as training data and 20% as test data. The models are trained on the prepared training data. In the study, the performances of the following machine learning methods are compared: logistic regression, support vector regression, random forest, decision trees, k-nearest neighbor, support vector machines, naive Bayes, multilayer perceptrons and gradient boost classifier.
27.2.3.1 Logistic Regression
Logistic regression usually calculates the membership of a sample in one of two categories in a data set. The parameters are determined from the data set by maximum likelihood estimation [21]. In logistic regression, x represents the input variable and y represents the output variable. Logistic regression is a supervised learning algorithm. Unlike in linear regression, the target variable is categorical, and each data point is assigned to one of two classes. The general equation of the logistic regression algorithm is as follows [22]:
$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X \tag{27.1}$$
Here, $p(X)$ is the probability associated with the dependent variable, $X$ is the independent variable, $\beta_0$ is the intercept and $\beta_1$ is the slope coefficient. Logistic regression estimates the probabilities of the model using this equation [22].
27.2.3.2 SVM Regressor
SVM regressors (SVRs) solve a regression problem: the model outputs a continuous value instead of a value from a finite set. The sparse solutions and good generalization of SVMs make them suitable for regression problems. SVM is generalized to SVR by placing an ε-insensitive region, called the ε-tube, around the function. This tube balances the complexity and the error of the model. As with SVMs, the support vectors in SVRs are the most important examples, and they determine the shape of the tube. The SVR problem is formulated from a geometrical perspective, starting with a one-dimensional example [23]. The general equation can be written as [23]:
$$y = f(x) = \langle \omega, x \rangle + b = \sum_{j=1}^{M} \omega_j x_j + b, \quad y, b \in \mathbb{R}, \; x, \omega \in \mathbb{R}^{M} \tag{27.2}$$
For multidimensional data, x is augmented by one and b is included in the vector ω, which gives the following multivariate regression [23]:
$$f(x) = \begin{bmatrix} x \\ 1 \end{bmatrix}^{T} \begin{bmatrix} \omega \\ b \end{bmatrix} = \omega^{T} x + b, \quad x, \omega \in \mathbb{R}^{M+1} \tag{27.3}$$
27.2.3.3 Random Forest
Although the random forest algorithm shown in Fig. 27.2 can be used for both classification and regression, it is mainly used in classification problems. The algorithm builds decision trees on samples of the data, obtains a prediction from each of them and selects the final result by voting [22].
Fig. 27.2 A picture for random forest
27.2.3.4 Decision Tree
The decision tree algorithm has been successfully applied in many areas such as signal classification, character recognition and medical diagnosis. One of its most important features is that it breaks the complex process of decision making into simple parts that are easier to interpret. The algorithm has advantages over single-stage classifiers, the most basic of which is flexibility: it can use different subsets of features and different decision rules at each classification stage. This results in improved classification accuracy [24].
27.2.3.5 K-NN
In classification problems such as those handled by K-NN, training examples are generally provided with class labels. The classifier learns from them and then tries to classify samples whose labels it does not know. Each sample is described by a set of attributes. An instance $x$ is described by a feature vector $\langle a_1(x), a_2(x), \ldots, a_n(x) \rangle$, where $a_i(x)$ denotes the value of the $i$-th attribute $A_i$ of $x$. $C$ and $c$ denote the class variable and its value, respectively, and the class of instance $x$ is denoted by $c(x)$. K-NN is based on measuring the distance between two vectors. The Euclidean distance between two instances is defined as follows [25]:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} \left(a_i(x) - a_i(y)\right)^2} \tag{27.4}$$
When an instance $x$ is given to the model, K-NN assigns $x$ to the most common class among its $k$ nearest neighbors, as in the equation below [25]:
$$c(x) = \arg\max_{c \in C} \sum_{i=1}^{k} \delta\left(c, c(y_i)\right) \tag{27.5}$$
Here $y_1, y_2, \ldots, y_k$ are the $k$ nearest neighbors of $x$, $k$ is the number of neighbors, and $\delta(c, c(y_i))$ equals 1 if $c = c(y_i)$ and 0 otherwise [25].
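Equations 27.4 and 27.5 translate almost directly into code. The following plain-Python sketch uses made-up two-dimensional points and illustrative class labels; the function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training samples.

    train is a list of (feature_vector, label) pairs; the distance is Euclidean (Eq. 27.4).
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)  # the sum of delta terms in Eq. 27.5
    return votes.most_common(1)[0][0]

# Illustrative training data with two attributes per sample
train = [
    ([1.0, 1.0], "donor"),
    ([1.2, 0.9], "donor"),
    ([0.9, 1.1], "donor"),
    ([3.0, 3.2], "hepatitis"),
    ([3.1, 2.9], "hepatitis"),
]

print(knn_predict(train, [1.1, 1.0]))  # three nearest neighbours are all "donor"
print(knn_predict(train, [3.0, 3.0]))  # two of the three nearest are "hepatitis"
```

Because the vote depends on raw distances, attributes on very different scales are usually normalized before K-NN is applied.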
27.2.3.6 SVM Classifier
SVM classifiers are supervised learning algorithms used for regression and classification. One of the main features of SVM is that it minimizes the empirical classification error and maximizes the geometric margin. SVM maps the input vectors to a higher-dimensional space in which a maximally separating hyperplane is constructed. Two parallel hyperplanes are formed, one on each side of the hyperplane that separates the data. The separating hyperplane is the one that maximizes the distance between these parallel hyperplanes. The greater the margin between the parallel hyperplanes, the smaller the error. Consider the data points $\{(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)\}$, where $y_i \in \{1, -1\}$ indicates the class to which $x_i$ belongs. Here $n$ is the number of samples and each $x_i$ is a $p$-dimensional vector. The separating hyperplane is given by the following equation [26]:
$$w \cdot x + b = 0 \tag{27.6}$$
Here $b$ is a scalar called the offset parameter, which also serves to increase the margin, and $w$ is a $p$-dimensional vector that is perpendicular to the separating hyperplane. The parallel hyperplanes on either side of it are defined by the following equations [26]:
$$w \cdot x + b = 1 \tag{27.7}$$
$$w \cdot x + b = -1 \tag{27.8}$$
The distance between these parallel hyperplanes is $2 / \lVert w \rVert$. Therefore, minimizing $\lVert w \rVert$ is important for the model. For the data points to lie outside the margin, the following must hold [26]:
$$w \cdot x_i + b \geq 1 \quad \text{or} \quad w \cdot x_i + b \leq -1 \tag{27.9}$$
This condition can be written as [26]:
$$y_i \left(w \cdot x_i + b\right) \geq 1, \quad 1 \leq i \leq n \tag{27.10}$$
27.2.3.7 Naive Bayes
The Naive Bayes model is built on conditional probabilities. A Bayesian network consists of a graph in which nodes represent attributes and arcs represent dependencies between attributes. These dependencies are quantified by conditional probabilities for each node. Bayesian networks are generally used for classification when the samples carry labels of the class to which they belong. Assume that A1, A2, …, An are n attributes (corresponding to the attribute nodes). An example E is represented by a vector (a1, a2, …, an), where ai is the value of Ai. C represents the class variable (corresponding to the class node). We use c to denote a value of C and c(E) to denote the class of E. The Bayesian classifier is defined by the following general equation [27].
$$c(E) = \arg\max_{c \in C} P(c)\, P(a_1, a_2, \ldots, a_n \mid c) \tag{27.11}$$
If we assume that the attributes are conditionally independent of each other given the class, the conditional probability in Eq. (27.11) takes the following form [27].
$$P(E \mid c) = P(a_1, a_2, \ldots, a_n \mid c) = \prod_{i=1}^{n} P(a_i \mid c) \tag{27.12}$$
The resulting classifier is called the naive Bayes classifier [27].
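The decision rule obtained by substituting Eq. (27.12) into Eq. (27.11) can be sketched in a few lines of Python. This is only an illustrative sketch for categorical attributes: the toy examples and class names are hypothetical, and the add-one (Laplace) smoothing used to avoid zero probabilities is an assumption of this sketch, not something stated in the chapter.

```python
from collections import Counter, defaultdict

def train_naive_bayes(examples, classes):
    """Collect the counts needed to estimate P(c) and P(a_i | c)."""
    class_counts = Counter(classes)
    # value_counts[(i, c)][a] = number of class-c examples whose i-th attribute is a
    value_counts = defaultdict(Counter)
    # values[i] = set of distinct values observed for attribute i
    values = defaultdict(set)
    for attrs, c in zip(examples, classes):
        for i, a in enumerate(attrs):
            value_counts[(i, c)][a] += 1
            values[i].add(a)
    return class_counts, value_counts, values

def classify(model, attrs):
    """Return arg max over c of P(c) * prod_i P(a_i | c)."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(c)
        for i, a in enumerate(attrs):
            # Add-one (Laplace) smoothing: an assumption of this sketch.
            score *= (value_counts[(i, c)][a] + 1) / (n_c + len(values[i]))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy data: two categorical attributes per example.
examples = [("high", "yes"), ("high", "no"), ("low", "no"), ("low", "no"), ("high", "yes")]
classes = ["sick", "sick", "healthy", "healthy", "sick"]
model = train_naive_bayes(examples, classes)
print(classify(model, ("high", "yes")))
print(classify(model, ("low", "no")))
```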
27.2.3.8 Multi Layer Perceptron
A multilayer perceptron (MLP) is a simple neural network enriched with hidden layers that maps the input data to the output data. In multilayer perceptrons, each node has a non-linear activation function, and the node outputs are computed through these functions. A simple MLP is shown in Fig. 27.3. MLPs consist of an input layer, an output layer, and at least one hidden layer. The number of neurons in the input layer is equal to the number of features of the data set used. The number of output neurons is equal to the number of categories in the classification task. The number of neurons in the hidden layers, however, must be chosen carefully; otherwise, the user pays for a poor choice in the form of higher cost or wrong predictions. MLPs are trained with supervised back-propagation learning. The general equation is given below.
Fig. 27.3 A simple MLP structure
$$y = \varphi\left(\sum_{i=1}^{n} \omega_i x_i + b\right) \quad [23] \tag{27.13}$$
In this equation, ω is the weight vector, x is the input vector, b is the bias, and φ is the activation function. The activation functions used in MLPs are usually the hyperbolic tangent (tanh), ReLU, or sigmoid. MLPs are often used for classification problems in supervised learning [23].
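As a small illustration of Eq. (27.13), the sketch below computes the output of a single neuron and of one fully connected layer with the three activation functions named above. The weights, inputs, and biases are hypothetical values used only to show the computation; training by back propagation is not covered here.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit activation."""
    return max(0.0, z)

def neuron_output(weights, inputs, bias, activation=math.tanh):
    """Compute y = phi(sum_i w_i * x_i + b) for a single neuron."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def layer_output(weight_rows, inputs, biases, activation=math.tanh):
    """Apply the single-neuron formula once per neuron of a layer."""
    return [neuron_output(w, inputs, b, activation)
            for w, b in zip(weight_rows, biases)]

# Hypothetical values: 2 input features feeding a hidden layer of 3 neurons.
x = [0.5, -1.0]
hidden_weights = [[1.0, 2.0], [0.3, -0.7], [-1.2, 0.4]]
hidden_biases = [1.5, 0.0, 0.1]
print(layer_output(hidden_weights, x, hidden_biases, activation=sigmoid))
```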
27.2.3.9 Gradient Boost Classifier
It has been reported in the literature that this classifier performs well, especially on imbalanced data sets. It owes this performance to the way it treats the data it has already classified. The algorithm aims to combine weak predictors into a strong one. Weak learners are learners whose prediction accuracy is only modest. In this algorithm, weak decision-tree learners are built with the help of the gradient descent optimization algorithm. By minimizing the loss function in this way, stronger learners are built in the subsequent steps. In each iteration, the algorithm obtains more accurate results by adjusting the weights of the examples that were misclassified in the previous iteration; it is an algorithm based on boosting [28].
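The idea of adding weak learners one after another, each fitted to what the current model still gets wrong, can be sketched as follows. This is a simplified illustration and not the classifier used in the chapter: it uses one-split decision stumps on a single feature and the squared loss (for which the negative gradient is simply the residual), and the data, number of rounds, and learning rate are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on a 1-D feature that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave samples on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Additively combine stumps, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps, losses = [], []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for x, p in zip(xs, predictions)]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    model = lambda x: base + learning_rate * sum(s(x) for s in stumps)
    return model, losses

# Hypothetical toy data: the target switches from 0 to 1 between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model, losses = gradient_boost(xs, ys)
print(losses[0], losses[-1])
```

The training loss shrinks from round to round because every new stump is fitted to the errors left by the stumps before it.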
27.3 Experimental Results
In this section, the machine learning techniques we used are compared with each other in terms of accuracy, precision, recall, f1-score, and the confusion matrix.
27.3.1 Evaluation Metrics
We evaluate the classifiers with a total of four metrics and a confusion matrix, as mentioned earlier. Before defining these metrics, we introduce the concepts and formulas on which they are based.
True Positives (TP): Positive data that the classifier correctly predicts as positive.
True Negatives (TN): Negative data that the classifier correctly predicts as negative.
False Positives (FP): Negative data that the classifier predicts as positive.
False Negatives (FN): Positive data that the classifier predicts as negative.
Accuracy: The proportion of data that is classified correctly. It is calculated with the following formula.
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
Precision: The metric that shows how many of the data that the classifier predicts as positive are actually positive. It is calculated with the following formula.
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall: The metric that shows how many of the data that are actually positive are predicted as positive by the classifier. It is calculated with the following formula.
$$\text{Recall} = \frac{TP}{TP + FN}$$
F1-Score: The harmonic mean of the precision and recall of the classifier. It is calculated with the following formula [28].
$$\text{f1-score} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$
Confusion Matrix: The matrix in which the classifier's predictions are compared with the actual labels of the data set; it is used to evaluate the performance of a classifier in classification problems. In the light of this information, the values obtained before and after the application of data mining techniques are compared in the following section.
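The four formulas above can be computed directly from the entries of a binary confusion matrix, as in the short sketch below. The counts are hypothetical and were chosen to mimic an imbalanced data set; returning 0.0 when a denominator is zero is a convention assumed here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and f1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Returning 0.0 for an empty denominator is an assumed convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts: 18 positive and 82 negative samples.
metrics = classification_metrics(tp=8, tn=80, fp=2, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the accuracy is high although more than half of the positive samples are missed, which is the kind of misleading accuracy on imbalanced data that the confusion matrix makes visible.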
27.3.2 Results and Findings
In this section, the classifiers are compared with each other in terms of accuracy, precision, recall, f1-score, and the confusion matrix before and after the application of data mining techniques. The comparison results are given in the tables below. After the initially imbalanced data set is made relatively more balanced with the ADASYN synthetic data generation technique, the classifier performances obtained with cross-validation generally increase. In particular, the tables below show that the accuracy obtained at first is misleading when checked against the confusion matrix, whereas the confusion matrix and the other metrics obtained after the data mining processes support each other and reveal the true accuracy. The change in the performance metrics of Logistic Regression before and after the data mining process is supported by the confusion matrix and presented in Table 27.1. The change in the performance metrics of the SVM Regressor before and after data mining is given in Table 27.2. While there is a modest increase in the scores, the improvement in the confusion matrix is clearly visible. The increase in the performance metrics of Random Forest before and after the data mining process is supported by the confusion matrix and presented in Table 27.3. As can be seen in Table 27.4, the performance values of the Decision Tree classifier increased noticeably. The increase in the performance metrics of K-NN before and after the data mining process is supported by the confusion matrix and presented in Table 27.5. When Table 27.6 is examined, some metrics of the SVM Classifier increase while others decrease; the most important point, however, is the improvement in the confusion matrix. When the confusion matrices were examined, the classifier recognized almost all of the blood donors (class value zero) in the first matrix but could not classify the sick people. In the second matrix, the classes are clearly predicted with much better accuracy. When Table 27.7 is examined, there is a clear decrease in the scoring metrics of the Naive Bayes algorithm, but this is compensated by the improvement in the confusion matrices.
Table 27.1 Logistic regression’s performance metrics before and after data mining
Metric      Without data mining   With data mining
Accuracy    0.93220               0.92459
Precision   0.91149               0.92459
Recall      0.93220               0.92459
f1-score    0.92106               0.92459
Table 27.2 SVM regressor’s performance metrics before and after data mining
Metric      Without data mining   With data mining
Accuracy    0.92372               0.92651
Precision   0.89107               0.92651
Recall      0.92372               0.92651
f1-score    0.90709               0.92651
Table 27.3 Random forest’s performance metrics before and after data mining
Metric      Without data mining   With data mining
Accuracy    0.92372               0.98731
Precision   0.91054               0.98671
Recall      0.92372               0.98418
f1-score    0.91629               0.98354
Table 27.4 Decision tree’s performance metrics before and after data mining
Metric      Without data mining   With data mining
Accuracy    0.91525               0.93722
Precision   0.92339               0.93855
Recall      0.91525               0.93407
f1-score    0.91613               0.93788
Table 27.5 K-NN’s performance metrics before and after data mining
Metric      Without data mining   With data mining
Accuracy    0.92372               0.95001
Precision   0.90023               0.95001
Recall      0.92372               0.95001
f1-score    0.90939               0.95001
Table 27.6 SVM classifier’s performance metrics before and after data mining
Metric      Without data mining   With data mining
Accuracy    0.93220               0.92651
Precision   0.91965               0.92651
Recall      0.93220               0.92651
f1-score    0.92520               0.92651
Table 27.7 Naive Bayes’s performance metrics before and after data mining
Metric      Without data mining   With data mining
Accuracy    0.91525               0.84673
Precision   0.87726               0.84673
Recall      0.91525               0.84673
f1-score    0.89569               0.84673
For the Multi Layer Perceptron classifier, Table 27.8 shows that every performance metric reached a very good level. When Table 27.9 is examined, an increase is observed in the performance metrics of the Gradient Boost Classifier.
27.4 Conclusions and Future Work
In this study, a classification process was applied for the prediction of hepatitis C disease. There are 5 output groups: blood donor, hepatitis, fibrosis, cirrhosis, and suspect blood donor. To predict these 5 outputs, the logistic regression, support vector regression, random forest, decision tree, k-nearest neighbor, support vector machine, naive Bayes, multilayer perceptron, and gradient boost classifier machine learning techniques were used. Considering the classification techniques given above, the highest score belongs to the random forest and then to the multilayer perceptron. As described above, these results were obtained after certain preprocessing was applied to the data set. Before this preprocessing was applied, the scores were high but the confusion matrices were not as good as desired, which showed us that the accuracy score was misleading. After the preprocessing was applied to the data set and the data were given to the classification algorithms, results consistent with the confusion matrix were obtained. This shows how important the data mining process and the selection of the right algorithm for the problem are. It is hoped that this study will help clinical and other research groups in disease prediction and shed light on future studies.
Table 27.8 Multi layer perceptron’s performance metrics before and after data mining
Metric      Without data mining   With data mining
Accuracy    0.92372               0.96834
Precision   0.92796               0.96834
Recall      0.92372               0.96834
f1-score    0.92467               0.96834
Table 27.9 Gradient boost classifier’s performance metrics before and after data mining
Metric      Without data mining   With data mining
Accuracy    0.96610               0.97408
Precision   0.95935               0.97409
Recall      0.96610               0.97471
f1-score    0.96182               0.97408
References
1. Chicco D, Jurman G (2021) An ensemble learning approach for enhanced classification of patients with hepatitis and cirrhosis. IEEE Access 9:24485–24498
2. Nandipati SC, XinYing C, Wah KK (2020) Hepatitis C virus (HCV) prediction by machine learning techniques. Appl Model Simul 4:89–100
3. Oladimeji OO, Oladimeji A, Olayanju O (2021) Machine learning models for diagnostic classification of hepatitis C tests. Front Health Inf 10(1):70
4. Haga H, Sato H, Koseki A, Saito T, Okumoto K, Hoshikawa K et al (2020) A machine learning-based treatment prediction model using whole genome variants of hepatitis C virus. PloS One 15(11):e0242028
5. Abd El-Salam SM, Ezz MM, Hashem S, Elakel W, Salama R, ElMakhzangy H, ElHefnawi M (2019) Performance of machine learning approaches on prediction of esophageal varices for Egyptian chronic hepatitis C patients. Inf Med Unlocked 17:100267
6. Durahim AO (2016) Comparison of sampling techniques for imbalanced learning. Yönetim Bilişim Sistemleri Dergisi 2(2):181–191
7. Gosain A, Sardana S (2017, Sept) Handling class imbalance problem using oversampling techniques: a review. In: 2017 International conference on advances in computing, communications and informatics (ICACCI). IEEE, pp 79–85
8. Susan S, Kumar A (2019) SSOMaj-SMOTE-SSOMin: three-step intelligent pruning of majority and minority samples for learning from imbalanced datasets. Appl Soft Comput 78:141–149
9. Hashem S, Esmat G, Elakel W, Habashy S, Raouf SA, Elhefnawi M et al (2017) Comparison of machine learning approaches for prediction of advanced liver fibrosis in chronic hepatitis C patients. IEEE/ACM Trans Comput Biol Bioinf 15(3):861–868
10. Konerman MA, Beste LA, Van T, Liu B, Zhang X, Zhu J et al (2019) Machine learning models to predict disease progression among veterans with hepatitis C virus. PloS One 14(1):e0208141
11. He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284
12. Fatourechi M, Ward RK, Mason SG, Huggins J, Schloegl A, Birch GE (2008, Dec) Comparison of evaluation metrics in classification applications with imbalanced datasets. In: 2008 Seventh international conference on machine learning and applications. IEEE, pp 777–782
13. Dal Pozzolo A, Caelen O, Waterschoot S, Bontempi G (2013, Oct) Racing for unbalanced methods selection. In: International conference on intelligent data engineering and automated learning. Springer, Berlin, pp 24–31
14. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
15. Han H, Wang WY, Mao BH (2005, Aug) Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: International conference on intelligent computing. Springer, Berlin, pp 878–887
16. Bunkhumpornpat C, Sinapiromsaran K, Lursinsap C (2009, Apr) Safe-level-smote: safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, Berlin, Heidelberg, pp 475–482
17. He H, Bai Y, Garcia EA, Li S (2008, June) ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE International joint conference on neural networks (IEEE world congress on computational intelligence). IEEE, pp 1322–1328
18. Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Statist Surv 4:40–79
19. Saud S, Jamil B, Upadhyay Y, Irshad K (2020) Performance improvement of empirical models for estimation of global solar radiation in India: a k-fold cross-validation approach. Sustain Energy Technol Assess 40:100768
20. Moreno-Torres JG, Sáez JA, Herrera F (2012) Study on the impact of partition-induced dataset shift on k-fold cross-validation. IEEE Trans Neural Networks Learn Syst 23(8):1304–1312
21. Dreiseitl S, Ohno-Machado L (2002) Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform 35(5–6):352–359
22. Azam MS, Habibullah M, Rana HK. Performance analysis of various machine learning approaches in stroke prediction. Int J Comput Appl 975:8887
23. Awad M, Khanna R (2015) Efficient learning machines: theories, concepts, and applications for engineers and system designers. Springer Nature, Berlin, p 268
24. Safavian SR, Landgrebe D (1991) A survey of decision tree classifier methodology. IEEE Trans Syst Man Cybern 21(3):660–674
25. Jiang L, Cai Z, Wang D, Jiang S (2007, Aug) Survey of improving k-nearest-neighbor for classification. In: Fourth international conference on fuzzy systems and knowledge discovery (FSKD 2007), vol 1. IEEE, pp 679–683
26. Bhavsar H, Panchal MH (2012) A review on support vector machine for data classification. Int J Adv Res Comput Eng Technol (IJARCET) 1(10):185–189
27. Jiang L, Wang D, Cai Z, Yan X (2007, Aug) Survey of improving Naive Bayes for classification. In: International conference on advanced data mining and applications. Springer, Berlin, pp 134–145
28. Singhal Y, Jain A, Batra S, Varshney Y, Rathi M (2018, Dec) Review of bagging and boosting classification performance on unbalanced binary classification. In: 2018 IEEE 8th International advance computing conference (IACC). IEEE, pp 338–343
Chapter 28
Prediction of Development Types from Release Notes for Automatic Versioning of OSS Projects
Abdulkadir Şeker, Saliha Yeşilyurt, İsmail Can Ardahan, and Berfin Çınar
A. Şeker (B) · S. Yeşilyurt · İ. Can Ardahan · B. Çınar
Department of Computer Engineering, Sivas Cumhuriyet University, Sivas, Turkey
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_28
28.1 Introduction
Software projects are usually developed as versions identified by individual numbers. Software versioning makes it easier for both developers and users to follow the changes made in the project. In addition, thanks to versioning, a project can be presented at an earlier stage in its smallest working form, called the MVP (minimum viable product), without waiting for all features to be developed. New features are offered to users in subsequent versions. When updating to new versions, bugs are fixed and new features are added. Using simple rules and a consistent logic in these version transitions avoids many problems that may be encountered in new versions. In recent years, project managers have used semantic versioning to manage their versions. In this method, projects are versioned with numbers in the form “major.minor.patch”. For semantic versioning to work, it is necessary to define a public API for the project. After that, when a version transition is made, the relevant part of the version number is incremented according to certain rules. For example, if some bugs have been resolved in version 1.0.0 without breaking the API, patch version 1.0.1 is released. Similarly, if the project is at version 1.0.1 and API additions/changes are made that are compatible with the current version, the project is upgraded to the new minor version 1.1.0. If an API change that is incompatible with the current version (an API break) has been introduced, a major version transition to 2.0.0 is made. In this context, the type and number of changes are very important in version management. All these changes are presented as release notes with each new version. In the release notes, the developments in the project are classified with different labels such as new, feature, bug fix, etc. These labels thus also help to determine the level of the version jump to be made. In most software projects, creating these labels and determining the version update steps are handled manually. However, this causes extra cost to developers and project managers. For this reason, different models have been proposed to automate these processes. This study aims to classify release notes, which is an important step toward automating version transitions. The types of changes given in the release notes were predicted with machine learning methods.
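The version transitions described in the introduction can be sketched as a small Python function. The function name `bump_version` and the change levels passed as strings are hypothetical choices made for this illustration; the chapter does not prescribe an implementation, and pre-release or build suffixes are not handled.

```python
def bump_version(version, change):
    """Return the next "major.minor.patch" version for the given change level."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":   # incompatible API change (API break)
        return f"{major + 1}.0.0"
    if change == "minor":   # backward-compatible API additions/changes
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # bug fixes that do not break the API
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change level: {change!r}")

print(bump_version("1.0.0", "patch"))
print(bump_version("1.0.1", "minor"))
print(bump_version("1.1.0", "major"))
```

A classifier that predicts the type of each development in the release notes could supply the change level to such a function, which is the automation step this study works toward.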
28.2 Related Works
In this section, classification methods applied to different problems in various software engineering fields and studies on software versioning are reviewed. One of the major challenges in software projects is the classification/labelling of issues. Although some issues are labelled by developers in projects, issue labelling is rarely used [1]. For this reason, different methods have been tried to label these issues. One study analyzed the cost of mislabelled issues [2]. It emphasized how important automatic issue labelling is, finding that approximately 1/3 of the more than 7000 issues examined were incorrectly labelled. Antoniol et al. used the naive Bayes, ADTree, and linear logistic regression methods to classify 1800 issues collected from the issue trackers of the Mozilla, Eclipse, and JBoss projects [3]. Another study on the same dataset used different text mining techniques and obtained higher F-score results than the previous study [4]. In another study, 4000 issues extracted from JIRA were analyzed and classified with the naive Bayes method [5]. When bugs are being resolved in software projects, a bug is sometimes assigned to the wrong developer. In this case, the bug is forwarded to another developer. The network formed by such redirections is called a bug tossing graph. One study used various machine learning methods and tossing graphs to solve the bug assignment problem [6]. Another problem in software development is the transition of projects to new versions after the bugs discussed above have been resolved. One of the important steps in software versioning is the creation of release notes. The release notes contain all the changes made in the new version of a project (i.e. descriptions of new features, improvements, bug fixes, deprecated features, etc.). Moreno et al. proposed a method called ARENA for the automatic generation of release notes [7]. In another study on the generation of release notes, it was reported that the generated notes were more accurate than manually created ones [8].
28.3 Method and Material
28.3.1 Dataset
First, a dataset must be created to label the developments in the release notes. In this study, a dataset was created from 800 release notes obtained from the Mozilla Firefox, Mozilla Thunderbird, OBS Studio, and Slack projects. While creating the dataset, the release notes were collected automatically using the Selenium library for the Python programming language. The dataset contains descriptions of all the improvements for each release. These descriptions have different labels depending on the development type. When creating the dataset, projects with common labels in their release notes were selected. Thus, the obtained release notes have 5 different labels: fixed, new, changed, unresolved, and other. The distribution of the improvements over the labels in the dataset is given in Fig. 28.1.
Fig. 28.1 The techniques of AI
28.3.2 Pre-processes
The release notes are textual data. Textual data is defined as unstructured data [9]. Before a model is developed on textual data, the data must be converted into a structured dataset with the help of various pre-processes so that machine learning algorithms can work correctly [10]. The preprocessing steps applied to the Release_Note column of the dataset used in the study are:
• All letters are converted to lower case.
• Punctuation symbols are removed.
• Alphanumerical characters are removed.
• All special characters are removed.
• All stop words are removed.
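The steps listed above can be sketched with Python's standard library as follows. Several details are assumptions of this sketch rather than the authors' implementation: the short `STOP_WORDS` set is a hypothetical stand-in for a full stop-word list, the step on alphanumerical characters is read here as removing digits, and anything that is not a lower-case ASCII letter or whitespace is treated as a special character.

```python
import re
import string

# Small illustrative stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "for"}

def preprocess(note):
    """Apply the listed cleaning steps to one release-note description."""
    text = note.lower()                                               # lower case
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation
    text = re.sub(r"[^a-z\s]", " ", text)   # digits and remaining special characters
    tokens = [t for t in text.split() if t not in STOP_WORDS]         # stop words
    return " ".join(tokens)

print(preprocess("Fixed: The app crashes on startup in Windows 10!"))
```

No stemming is applied in this sketch.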
In this study, we first tried stemming. However, this process had a negative impact on the success of the model, so we did not reduce the words to their stems. For the traditional machine learning algorithms used in the study, features must be extracted and the data prepared after the text pre-processing. Therefore, we converted the text to a vector representation using the term frequency-inverse document frequency (TF-IDF) method. With the bag of words method, each release note is represented as a vector according to the words it contains. Two multipliers are used in the TF-IDF approach. The TF multiplier used in this study is the number of occurrences of the relevant term in a release note. The IDF multiplier depends inversely on the number of different release notes in which the term occurs, so that terms appearing in many release notes receive lower weights (Eq. 28.1).
$$\text{tf\_idf}(t, d, D) = \text{tf}(t, d) \cdot \text{idf}(t, D) \tag{28.1}$$
Here t is a term, d is a release note, and D is the set of all release notes. The TfidfVectorizer method of the Sklearn library was used to convert the documents into feature vectors in the encoding phase. Thus, a TF-IDF matrix of dimension (800, 1545) was generated. As the last step of the pre-processing, the labels of the developments were converted into numerical format with the help of the LabelEncoder method.
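To show what the TF-IDF weighting computes, the sketch below builds a small TF-IDF matrix with the standard library only. It is an illustration, not the vectorizer used in the study: the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 and the L2 normalisation of each row are meant to follow the default behaviour documented for scikit-learn's TfidfVectorizer, but the tokenisation here is a plain whitespace split and the three documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_matrix(documents):
    """Return (vocabulary, rows), where rows[d][j] is the weight of term j in document d."""
    tokenized = [doc.split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (assumed to match the library default).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)                       # term frequency tf(t, d)
        row = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in row))      # L2 normalisation
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocabulary, rows

# Hypothetical, already pre-processed release-note descriptions.
documents = ["fixed crash startup", "new dark theme", "fixed theme crash"]
vocabulary, rows = tfidf_matrix(documents)
print(vocabulary)
print([round(v, 3) for v in rows[0]])
```

In the first document, the term that occurs in only one document ("startup") receives a larger weight than the terms shared with another document, which is the effect the IDF multiplier is meant to have.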
28.3.3 Methods
Five different machine learning methods are used in this study: Multinomial Naive Bayes (MNB), Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Classifier (GBM). The models were run on the Google Colaboratory platform. The Scikit-learn library was used to implement the classification methods. The choice of algorithms for text classification was based on studies on this subject in the literature and the algorithms they used. For example, Naive Bayes is one of the oldest learning algorithms and is used for purposes such as text classification, sentiment analysis, and document categorization. Naive Bayes is basically a probabilistic classifier and makes use of conditional probability while classifying. In this study, Multinomial Naive Bayes, a type of Naive Bayes classifier generally used in text classification, was used [11]. LR is one of the most easily explained machine learning models; it is widely used in classification problems and has significant applications in industry. If the variable to be predicted is categorical, the Logistic Regression model can be used [12]. To avoid overfitting, L1 or L2 regularization is usually applied. In this study, the penalty, which is the regularization parameter, is set to L2. SVM is a popular supervised learning algorithm used in pattern recognition and text classification [13]. There are two types of errors in the SVM algorithm, the “margin error” and the “classification error”, and both are to be optimized. When the SVM cost function is calculated, the C hyperparameter determines which error is more important. There is a trade-off between these two error types. If the C value is chosen large, the margin will decrease because of this trade-off, which can cause overfitting. Therefore, in our study, C was set to 0.1, which is smaller than its default value (1.0).
Table 28.1 The hyperparameters of used methods
Algorithms                     Hyperparameters
Multinomial NB                 default parameters
SVM (LinearSVC)                class_weight = ‘balanced’, max_iter = 10,000, tol = 1e-4, C = 0.1
Random forest                  n_estimators = 100, max_depth = 100, min_samples_split = 10, n_jobs = −1
Gradient boosting classifier   n_estimators = 50, max_depth = 10
RF is a model that performs classification by dividing the data into parts and building a separate decision tree from each part. RF is thus an ensemble learning method that aims to produce a strong model from more than one weak learner, and it is frequently used in multi-class text classification tasks [12]. In this study, the number of trees (n_estimators), the maximum depth of a tree (max_depth), and the minimum number of samples required to split a node (min_samples_split) were set to 100, 100, and 10, respectively. In addition, the n_jobs parameter was set to −1 to run jobs in parallel on all processors. GBM is an ensemble machine learning algorithm used for classification problems, similar to RF. Like RF, GBM combines weak learners to produce a strong model, but it does so with the boosting technique. It was preferred because it is a relatively new method compared with the other machine learning methods used. The hyperparameters of the algorithms used are given in Table 28.1. The Stratified K-Fold Cross Validation method was used because of the imbalanced distribution of the data set. As shown in Fig. 28.2, stratification is a method for dividing the dataset into training and test sets when the class labels do not have balanced numbers of samples; it keeps the proportion of samples of each class the same as in the original dataset.
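The effect of stratification can be illustrated with the following standard-library sketch, which deals the sample indices of every class round-robin into k folds so that each fold keeps roughly the class proportions of the whole dataset. It is not the Scikit-learn implementation used in the study: shuffling is omitted, and the function name and the toy labels are hypothetical.

```python
from collections import defaultdict

def stratified_k_fold(labels, k=5):
    """Yield (train_indices, test_indices) pairs with per-class proportions preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    # Deal the indices of each class round-robin into the k folds.
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test

# Hypothetical imbalanced labels: 10 "fixed" and 5 "new" release notes.
labels = ["fixed"] * 10 + ["new"] * 5
for train, test in stratified_k_fold(labels, k=5):
    print(test, [labels[i] for i in test])
```

Every test fold in this example contains two "fixed" notes and one "new" note, the same 2:1 ratio as the full label list.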
28.3.4 Model Evaluation
After the models were trained on the dataset with the relevant classification libraries, the accuracy, precision, recall, and F1-Score metrics were used for model evaluation. In addition, stratified 5-fold cross validation was applied to ensure the objectivity of the results and to prevent overfitting. The results of a k-fold cross-validation study are usually summarized by the average of the model performance metrics. For example, the F1-Score of the RF algorithm in Table 28.3 is 77.63%. This value is the average of the F1-Scores of 5 different folds. The value of ±2.91% in parentheses is the standard deviation, which measures the spread of the model's scores across the folds. In this example, the F1-Score values obtained over the 5 folds deviate by about ±2.91 percentage points from the average value.
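The "mean (± standard deviation)" summary of the fold scores can be reproduced with a few lines of Python. The five fold scores below are hypothetical and are not the scores behind Table 28.3; the use of the population standard deviation is also an assumption, since the chapter does not state which variant was used.

```python
import statistics

def summarize(scores):
    """Return the mean and population standard deviation of the per-fold scores."""
    return statistics.mean(scores), statistics.pstdev(scores)

# Hypothetical F1-Scores of 5 folds (not the study's values).
fold_scores = [0.75, 0.80, 0.78, 0.74, 0.81]
mean, std = summarize(fold_scores)
print(f"{mean:.2%} (±{std:.2%})")
```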
Fig. 28.2 The stratified K-fold cross validation
When the dataset is not evenly distributed among the classes, the accuracy, precision, and recall metrics are not sufficient by themselves to give a meaningful comparison. Therefore, the F1-Score metric is used. The confusion matrix used for binary classification is given in Table 28.2. In multi-class classification, the criteria in Table 28.2 should be generalized for each class Ci. Equations (28.2), (28.3), and (28.4) show the calculation of the precision, recall, and F1-Score metrics, respectively, where l is the number of classes and tp_i, fp_i, and fn_i are the true positives, false positives, and false negatives of class Ci. The subscript w indicates that the ‘weighted’ average was used when calculating these values.
$$\text{Precision}_w = \frac{\sum_{i=1}^{l} tp_i}{\sum_{i=1}^{l} (tp_i + fp_i)} \tag{28.2}$$
$$\text{Recall}_w = \frac{\sum_{i=1}^{l} tp_i}{\sum_{i=1}^{l} (tp_i + fn_i)} \tag{28.3}$$
$$\text{F1Score}_w = \frac{2 \cdot \text{Precision}_w \cdot \text{Recall}_w}{\text{Precision}_w + \text{Recall}_w} \tag{28.4}$$
Table 28.2 The confusion matrix for binary classification
              Predicted: H1    Predicted: H0
Actual: H1    True positive    False negative
Actual: H0    False positive   True negative
Table 28.3 The results of used methods
Algorithms   Accuracy           Precision          Recall             F1-Score           Time (s)
MNB          71.00% (±2.89%)    76.52% (±2.26%)    71.00% (±2.89%)    69.67% (±3.03%)    0.07
LR           74.38% (±3.81%)    76.80% (±3.38%)    74.38% (±3.81%)    74.04% (±3.99%)    0.53
SVM          77.38% (±4.93%)    78.38% (±4.82%)    77.38% (±4.93%)    77.45% (±4.98%)    0.08
RF           77.88% (±2.97%)    80.48% (±2.40%)    77.88% (±2.97%)    77.63% (±2.91%)    2.89
GBM          77.00% (±3.05%)    78.89% (±3.05%)    77.00% (±3.05%)    76.78% (±3.19%)    22.89
28.4 Results
In this section, the results obtained on the data set are interpreted with tables and graphs. For the release notes classification task, 5 different learning algorithms were implemented. The algorithms used in the study, their performance metrics, and their training times are given in Table 28.3, which reports the average and standard deviation of each performance metric for each algorithm. When runtime and performance are evaluated together, the most successful of the 5 classifiers is the SVM algorithm. The accuracy, precision, recall, and F1-score values of all models run with 5-fold cross validation are compared in Fig. 28.3, and the runtimes are compared in Fig. 28.4.
Fig. 28.3 The results of used methods
Fig. 28.4 The run time comparisons of used methods
28.5 Discussion
Software versioning is an important step, especially with regard to the dependency problem. When software moves to a new version, the developments in that version are described in the release notes. These developments are presented with labels according to their improvement types. Classifying these developments is important for creating an automatic versioning model. In this study, we proposed a classification model for the developments in the release notes. According to the results, the SVM method clearly gave the best results in terms of the balance between runtime and performance. As future work, we plan to expand the dataset. With this new dataset, we aim to produce new solutions using deep learning models, which are known to be successful especially in natural language processing. The labels of the developments in the release notes were assigned manually by the developers. When the results are analyzed, whether some labels are correct is a matter of debate. In this context, evaluating the performance of the developed models with explainable artificial intelligence techniques is planned as another future study.
Chapter 29
Design Optimization of Induction Motor with FDB-Based Archimedes Optimization Algorithm for High Power Fan and Pump Applications
Burak Yenipinar, Ayşegül Şahin, Yusuf Sönmez, Cemal Yilmaz, and Hamdi Tolga Kahraman
29.1 Introduction
Global energy efficiency is growing in importance as a result of environmental and economic concerns, and industry is the sector with the highest energy consumption. Induction motors (IM) are widely preferred in fan and pump applications because they can operate directly at mains voltage and have a robust structure, low cost and simple construction; they therefore account for the largest share of electrical energy consumption in industry [1–3]. The efficiency of IMs used in fan and pump systems is consequently an important research topic, and the regulations published by the European Union on the efficiency level of IMs are becoming steadily more stringent. For example, Regulation 2019/1781 came into force on July 1, 2021, replacing the previous regulation. With this Regulation, the use of IMs in efficiency class IE2 is prohibited in the countries of the European Union [4].
B. Yenipinar (B), Department of Electronics Technology, Vocational School, Ostim Technical University, Ostim, 06374 Ankara, Turkey
A. Şahin · H. T. Kahraman, Software Engineering of Technology Faculty, Karadeniz Technical University, Trabzon 61080, Turkey
Y. Sönmez, Faculty of Engineering, Mingachevir State University, Mingachevir, Azerbaijan
C. Yilmaz, Faculty of Information and Telecommunication Technologies, Azerbaijan Technical University, Baku, Azerbaijan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_29
When it comes to efficiency, permanent magnet synchronous motors (PMSM) are known to offer higher efficiency values than IMs. However, the dramatic increase in the prices of rare earth elements such as dysprosium and terbium, which are contained in the NdFeB magnets used in PMSMs, especially in 2011, and the subsequent failure to maintain price stability entered the literature as the rare-earth crisis [5]. For this reason, design optimization studies aimed at reducing the losses of IMs remain relevant. The design of electrical machines requires solving nonlinear multi-objective problems, such as achieving high efficiency, low cost and minimum weight of active materials at the same time [6]. Artificial intelligence optimization algorithms are a useful tool that helps the designer reduce design time and obtain an optimal electric machine design [7]. The literature shows that various optimization algorithms have been used to optimize the design of IMs. Srikumar et al. [8] used a Genetic Algorithm (GA) to optimize the effects of the core and rotor conductor materials of an IM on losses and on performance parameters such as efficiency, power factor, temperature rise and cost. The study determined that both the core material and the rotor conductor material have significant effects on IM performance, but that, once cost constraints are considered, the core material has the greater effect. Das et al. [9] designed a 6-phase IM with the traditional, assumption-based design method, with Artificial Bee Colony (ABC) and with GA, and presented the results comparatively. The study is single-objective and aims to minimize rotor and stator copper losses; the lowest stator and rotor copper losses were reported for the ABC algorithm. Rivière et al. [10] used the Meta-model of Optimal Prognosis (MOP) method to obtain an IM design with minimum core length, maximum efficiency, a peak power above 200 kW at the rated point and a weight below 44.6 kg. The optimum motor design obtained with the MOP approach was found to provide higher efficiency, higher specific torque and higher specific power than the IM used for traction in the Tesla 60S. Qingfeng et al. [11] showed that the Multi-objective Ant Colony Algorithm (MOACA), a type of evolutionary algorithm, produces effective solutions in the IM design process. Optimization algorithms such as GA, ABC and Symbiotic Organisms Search (SOS) are widely used in the design optimization of electrical machines. Although these algorithms contribute to the solution of many problems, according to the No Free Lunch Theorem no single algorithm can solve every problem in the best way [12, 13]. For this reason, applying an optimization algorithm that has not previously been used in the design optimization of the IM can contribute to the literature. In this study, the Fitness Distance Balance based Archimedes Optimization Algorithm (FDB-AOA) was used instead of the optimization algorithms commonly used for the design optimization of electrical machines. In the design optimization study, the
geometric parameters and inequality constraints of the IM were determined. Then, the design optimization of the IM was carried out using the FDB-AOA algorithm to satisfy the minimum cost/maximum efficiency criterion. The second part of this study presents the objective function, inequality constraints and design parameters used in the optimization process, together with the loss equations of the IM derived from the steady-state model. The third and fourth parts explain the development of the FDB-based AOA and the conditions of the experimental study, respectively. The fifth part presents the benchmark test results of the FDB-based AOA, the design optimization results of the IM obtained with the developed algorithm, and the Finite Element Analysis (FEA) results of the optimum design.
29.2 Mathematical Formulation of Optimization Problem
IM losses must be minimized to achieve an IM design in the IE3 efficiency class defined in the IEC 60034-30-1:2014 standard. The geometric dimensions and specifications of the IM for which the design optimization study was carried out are given in Table 29.1. Losses in an IM consist of iron loss, stator and rotor copper losses, mechanical losses and additional losses. When calculating the losses of the IM, the single-phase equivalent circuit given in Fig. 29.1 was used. For a three-phase IM, the stator copper losses ($P_s$) can be calculated using the steady-state model [14].

$$P_s = 3 R_1 I_1^2 \tag{29.1}$$

Table 29.1 Induction motor parameters
Parameter                 Value
Phase number              3
Rated voltage             400 V
Rated frequency           60 Hz
Number of poles           4
Rating output power       185 kW
Stator slot number        48
Rotor slot number         36
Rotor skewing             1 slot pitch
Connection                Delta
Stator outer diameter     500 mm
Stator inner diameter     300 mm
Core length               450 mm
Air gap length            1.5 mm
Number of stator slots    48
Number of rotor slots     36
Fig. 29.1 Steady state model of an IM [1]
where $R_1$ is the stator resistance and $I_1$ is the stator current. Rotor copper losses ($P_r$) can be calculated similarly to the stator copper losses [14].

$$P_r = 3 R_r I_2^2 \tag{29.2}$$

where $R_r$ is the rotor resistance and $I_2$ is the rotor current of the single-phase equivalent circuit referred to the stator. Core losses ($P_{Fe}$) can be calculated as follows [15].

$$P_{Fe} \approx \frac{3 U_i^2}{R_c} \cdot \frac{1}{\left(1 + \frac{X_1}{X_m}\right)^2} \tag{29.3}$$

where $X_1$ is the stator leakage reactance, $R_c$ is the iron loss resistance, $X_m$ is the magnetizing reactance and $U_i$ is the stator voltage. In addition, the sum of stray loss and mechanical losses was accepted as 0.1% of the motor output power. Finally, total motor losses and efficiency are given in Eqs. (29.4) and (29.5), respectively.

$$P_{loss} = P_s + P_r + P_{Fe} + P_{sl} + P_{mec} \tag{29.4}$$

$$\eta = \frac{P_{input} - P_{loss}}{P_{input}} \tag{29.5}$$

where $P_{sl}$ is the stray loss and $P_{mec}$ is the mechanical loss. Table 29.2 gives the prices of the active materials used in the IM. The cost calculation based on these unit prices is given in Eq. (29.6).

$$\text{cost} = 4 \cdot \text{Core\_weight} + 30 \cdot \text{Copper\_weight} + 8.5 \cdot \text{Aluminium\_weight} \tag{29.6}$$
Table 29.2 Materials prices used in IM
Material            USD/kg
Silicon steel       4
Aluminum            8.5
Enameled copper     30
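The loss, efficiency and cost relations in Eqs. (29.1)–(29.6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the equivalent-circuit values passed in the example are placeholders rather than data from the chapter, and the input power is taken as output power plus total losses, which the text does not state explicitly.

```python
def motor_losses(R1, I1, Rr, I2, Ui, Rc, X1, Xm, P_out):
    """Total losses in watts following Eqs. (29.1)-(29.4)."""
    P_s = 3 * R1 * I1 ** 2                          # stator copper loss, Eq. (29.1)
    P_r = 3 * Rr * I2 ** 2                          # rotor copper loss, Eq. (29.2)
    P_fe = 3 * Ui ** 2 / Rc / (1 + X1 / Xm) ** 2    # core loss, Eq. (29.3)
    P_sl_mec = 0.001 * P_out                        # stray + mechanical = 0.1% of output power
    return P_s + P_r + P_fe + P_sl_mec              # Eq. (29.4)


def efficiency_percent(P_input, P_loss):
    """Efficiency in percent, Eq. (29.5)."""
    return 100.0 * (P_input - P_loss) / P_input


def active_material_cost(core_kg, copper_kg, aluminium_kg):
    """Active material cost in USD, Eq. (29.6) with the unit prices of Table 29.2."""
    return 4 * core_kg + 30 * copper_kg + 8.5 * aluminium_kg


# Placeholder circuit values (assumed, not taken from the chapter).
P_out = 185_000.0
P_loss = motor_losses(R1=0.02, I1=180.0, Rr=0.015, I2=170.0,
                      Ui=400.0, Rc=300.0, X1=0.1, Xm=5.0, P_out=P_out)
eta = efficiency_percent(P_out + P_loss, P_loss)  # assumes P_input = P_out + P_loss
```

In the chapter these quantities come from the ANSYS Maxwell RMxprt analysis; the sketch only shows how the closed-form expressions combine.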
The objective function determined in order to minimize the cost and losses at the same time is presented in the equations below.

$$\min: \; f_1(x) = 100 - \text{efficiency}, \quad f_2(x) = \text{cost} \tag{29.7}$$

$$F_{obj} = f_1(x) + f_2(x) \tag{29.8}$$

The inequality constraints given in the equations below were selected from parameters, such as power factor and starting current, that affect the steady-state and transient characteristics of the IM.

$$g_1(x) = 0.85 - pf \le 0 \tag{29.9}$$

$$g_2(x) = tl - 200 \le 0 \tag{29.10}$$

$$g_3(x) = 90 - \text{efficiency} \le 0 \tag{29.11}$$

$$g_4(x) = B_{st} - 2 \le 0 \tag{29.12}$$

$$g_5(x) = B_{sy} - 1.6 \le 0 \tag{29.13}$$

$$g_6(x) = B_{rt} - 1.8 \le 0 \tag{29.14}$$

$$g_7(x) = B_{ry} - 1.6 \le 0 \tag{29.15}$$

$$g_8(x) = I_{start}/I_{nom} - 8 \le 0 \tag{29.16}$$

$$g_9(x) = 1.8 - T_{start}/T_{nom} \le 0 \tag{29.17}$$

where $pf$ is the power factor, $tl$ is the armature thermal load, $B_{st}$ is the stator teeth flux density, $B_{sy}$ is the stator yoke flux density, $B_{rt}$ is the rotor teeth flux density, $B_{ry}$ is the rotor yoke flux density, $I_{start}$ is the starting current, $I_{nom}$ is the rated current, $T_{start}$ is the starting torque and $T_{nom}$ is the rated torque.
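The objective of Eq. (29.8) and the constraints of Eqs. (29.9)–(29.17) translate directly into code. The sketch below uses hypothetical function names and made-up candidate values; it only evaluates the constraints and does not assume how the optimizer handles violations (for example through a penalty term), since the text does not specify this, nor how the two objective terms are scaled.

```python
def constraint_values(pf, tl, efficiency, Bst, Bsy, Brt, Bry,
                      Istart_Inom, Tstart_Tnom):
    """Return g1..g9 of Eqs. (29.9)-(29.17); a design is feasible when all are <= 0."""
    return {
        "g1": 0.85 - pf,            # power factor of at least 0.85
        "g2": tl - 200,             # armature thermal load of at most 200
        "g3": 90 - efficiency,      # efficiency of at least 90 %
        "g4": Bst - 2,              # stator teeth flux density limit
        "g5": Bsy - 1.6,            # stator yoke flux density limit
        "g6": Brt - 1.8,            # rotor teeth flux density limit
        "g7": Bry - 1.6,            # rotor yoke flux density limit
        "g8": Istart_Inom - 8,      # starting current ratio of at most 8
        "g9": 1.8 - Tstart_Tnom,    # starting torque ratio of at least 1.8
    }


def is_feasible(g):
    """True when every inequality constraint is satisfied."""
    return all(value <= 0 for value in g.values())


def objective(efficiency, cost):
    """F_obj = f1 + f2 with f1 = 100 - efficiency and f2 = cost, Eqs. (29.7)-(29.8)."""
    return (100 - efficiency) + cost


# Hypothetical candidate whose power factor is too low (g1 > 0).
g_bad = constraint_values(pf=0.80, tl=190, efficiency=95, Bst=1.7, Bsy=1.5,
                          Brt=1.7, Bry=1.5, Istart_Inom=7, Tstart_Tnom=2.0)
# Same candidate with an acceptable power factor.
g_ok = constraint_values(pf=0.90, tl=190, efficiency=95, Bst=1.7, Bsy=1.5,
                         Brt=1.7, Bry=1.5, Istart_Inom=7, Tstart_Tnom=2.0)
```

Writing every constraint in the form g(x) ≤ 0 lets a single feasibility test cover both lower bounds (g1, g3, g9) and upper bounds (the remaining constraints).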
High efficiency alone is not sufficient when an IM is evaluated in terms of performance. Therefore, some performance parameters, such as power factor and stator teeth flux density, were used as inequality constraints in the optimization process. The efficiency of the IM in continuous running duty (type S1) is closely related to the temperature rise of its stator and rotor windings. In order to minimize winding losses, a low temperature rise in continuous running duty S1 can be achieved by a low thermal load [16]. The flux density values of the stator and rotor teeth and of the stator and rotor back iron are the inequality constraints chosen to prevent the core from operating in saturation. Achieving a high starting torque together with a low starting current is one of the issues that must be solved in the design process; for this reason, the inequality constraints given in Eqs. (29.16) and (29.17) are used. The design parameters used in the design optimization process of the IM are shown in Fig. 29.2. Table 29.3 presents the minimum and maximum values that these parameters can take in the design optimization process. In order to interpret the stator copper losses correctly, the total stator slot fill factor was kept constant at 45% during the optimization process. The flowchart of the optimization process is presented in Fig. 29.3. In this study, the ANSYS Maxwell RMxprt package was used for the performance analysis of the motor, while the optimization algorithm ran in MATLAB. The design parameters of the IM were optimized according to the objective function and inequality constraints, based on the criterion of minimum cost/maximum efficiency.
Fig. 29.2 Design parameters
Table 29.3 Minimum and maximum value of design parameters
Parameters            Minimum value   Maximum value
Sod                   480             550
Sid                   290             330
Rod                   287.5           327.5
Sh2                   20              35
Rsr                   4               10
Rhs2                  30              40
Rhs0                  2               8
Core length           300             450
Conductor per slot    2               20
Fig. 29.3 Flowchart of optimization process
29.3 Method In this section, Fitness-Distance Balance based Archimedes Optimization Algorithm is introduced.
29.3.1 Archimedes Optimization Algorithm
The Archimedes Optimization Algorithm (AOA) is inspired by the forces acting on objects immersed in a liquid and by the positions of those objects in the liquid. In AOA, the individuals of the population are objects immersed in liquid [17]. The algorithm is built on Archimedes' principle, which states that when an object is completely or partially immersed in a liquid, the liquid exerts an upward force on the object equal to the weight of the liquid displaced by the object [17]. AOA uses two different equations. One of them (Eq. 29.18) is used in the exploration phase and the other (Eq. 29.19) in the exploitation phase.

$$X_i^{t+1} = X_i^t + C_1 \cdot \mathrm{rand} \cdot \mathrm{acc\_norm}_i \cdot \left(X_{rand} - X_i^t\right) \tag{29.18}$$

$$X_i^{t+1} = X_{best}^t \pm C_2 \cdot \mathrm{rand} \cdot \mathrm{acc\_norm}_i \cdot \left(C_3 \cdot TF \cdot X_{best}^t - X_i^t\right) \tag{29.19}$$

In the given equations, $X_i^t$ and $X_i^{t+1}$ are the positions of the i-th object at iterations t and t+1, and $X_{best}^t$ is the object that has the best position. The parameter values for which AOA obtains its best cost function values are $C_1 = 2$, $C_2 = 6$ and $C_3 = 2$. The parameter $\mathrm{acc\_norm}_i$ is the current normalized acceleration value. The transfer operator that switches the search from the exploration phase to the exploitation phase is represented by the TF parameter. A detailed description of the parameters used in the equations is given in the reference work [17].
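The two update rules can be sketched as plain Python functions. This is an illustrative sketch, not the authors' MATLAB implementation: the function names are hypothetical, the normalized acceleration is assumed to be given per dimension, one random number is drawn per update, and the sign of the exploitation step is passed in by the caller because the text does not describe how the ± is chosen.

```python
import random

C1, C2, C3 = 2.0, 6.0, 2.0  # parameter values quoted in the text


def explore_step(x_i, x_rand, acc_norm, rng):
    """Exploration update in the form of Eq. (29.18)."""
    r = rng.random()  # assumption: one random number shared by all dimensions
    return [xi + C1 * r * a * (xr - xi)
            for xi, xr, a in zip(x_i, x_rand, acc_norm)]


def exploit_step(x_i, x_best, acc_norm, tf, sign, rng):
    """Exploitation update in the form of Eq. (29.19); sign is +1 or -1."""
    r = rng.random()
    return [xb + sign * C2 * r * a * (C3 * tf * xb - xi)
            for xi, xb, a in zip(x_i, x_best, acc_norm)]


rng = random.Random(0)
new_position = explore_step([0.0, 1.0], [1.0, 3.0], [0.5, 0.5], rng)
```

Exploration moves an object relative to a randomly chosen population member, whereas exploitation builds the new position around the best object, scaled by the transfer operator TF.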
29.3.2 Archimedes Optimization Algorithm (AOA) with Fitness Distance Balance
In AOA, neighborhood search and diversity tasks are performed using Eqs. (29.18) and (29.19). The $X_{rand}$ vector in Eq. (29.18) is a randomly chosen object and serves the diversity process. $X_{best}^t$ in Eq. (29.19) represents the best object in the collection of objects in the absence of collisions between objects. In AOA, the exploitation task is fulfilled by Eq. (29.19) through intensification around $X_{best}^t$. According to these explanations, the search performance of AOA changes depending on the positions of the guide objects in Eqs. (29.18) and (29.19). When these equations are examined, it is seen that three different objects, namely $X_i^t$, $X_{rand}$ and $X_{best}$, guide the search process in AOA. The first guide, $X_i^t$, is the object selected sequentially from among the population members. The second guide, $X_{rand}$, is an object randomly selected from among the population members. The third guide, $X_{best}$, is the object with the best objective function value among the population members.
Fitness-Distance Balance (FDB) is a selection method [18] developed to identify the vectors that can best guide the search process in population-based meta-heuristic search algorithms. Many algorithms have been designed using the FDB-based guidance mechanism, and significant improvements have been achieved in their search performance [19–22]. Accordingly, the convergence equations that perform the neighborhood search and diversity tasks in AOA can be redesigned using the FDB-based guidance mechanism. The purpose of the redesign is to examine the effect of the FDB-based guide selection method on the exploitation and exploration phases of the AOA algorithm and to determine the guidance mechanism that exhibits the best search performance. Based on these explanations, Eqs. (29.18) and (29.19), which are used in the convergence process of AOA, have been redesigned with an FDB-based guidance mechanism: the $X_{fdb}$ vector selected by the FDB selection mechanism is used instead of some of the $X_i^t$, $X_{rand}$ and $X_{best}$ guides in these two equations. Three FDB-AOA variations were created in this way and labeled Case-1, Case-2 and Case-3; their mathematical models are presented in Table 29.4. The first column of Table 29.4 gives the case label, the second column states which guide is replaced by the FDB-selected vector, and the third column gives the resulting equation. The application rate of the FDB selection method varies between 20 and 80%. Whether these design changes are effective has been investigated with a comprehensive experimental study.

Table 29.4 Mathematical descriptions of variations of FDB-AOA
Case     Explanation        Convergence equations redesigned using the FDB-based guide selection method
Case-1   X_i ← X_fdb        X_i^{t+1} = X_fdb + C1 * rand * acc_norm_i .* (X_rand − X_i)
         X_best ← X_fdb     X_i^{t+1} = X_fdb + C2 * rand * acc_norm_i .* (C3 * TF * X_best − X_i)
         X_i ← X_fdb        X_i^{t+1} = X_best − C2 * rand * acc_norm_i .* (C3 * TF * X_best − X_fdb)
Case-2   X_i ← X_fdb        X_i^{t+1} = X_fdb + C1 * rand * acc_norm_i .* (X_rand − X_i)
         X_best ← X_fdb     X_i^{t+1} = X_fdb + C2 * rand * acc_norm_i .* (C3 * TF * X_best − X_i)
         X_best ← X_fdb     X_i^{t+1} = X_best − C2 * rand * acc_norm_i .* (C3 * TF * X_fdb − X_i)
Case-3   X_i ← X_fdb        X_i^{t+1} = X_fdb + C1 * rand * acc_norm_i .* (X_rand − X_i)
         X_best ← X_fdb     X_i^{t+1} = X_fdb + C2 * rand * acc_norm_i .* (C3 * TF * X_best − X_i)
         X_best ← X_fdb     X_i^{t+1} = X_fdb − C2 * rand * acc_norm_i .* (C3 * TF * X_best − X_i)
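The chapter does not spell out how the FDB guide is chosen, so the following sketch relies on an assumed formulation: each candidate receives a score that combines its normalized fitness with its normalized Euclidean distance to the best solution, and the candidate with the highest score becomes $X_{fdb}$. The function name `fdb_select`, the weight `w = 0.5` and the min-max normalization are assumptions made for illustration and should be checked against [18].

```python
import math


def fdb_select(positions, fitness, w=0.5):
    """Return the index of the FDB-selected guide for a minimization problem."""
    best = min(range(len(fitness)), key=fitness.__getitem__)
    # Distance of every candidate to the current best solution.
    dist = [math.dist(p, positions[best]) for p in positions]

    def normalize(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0.0] * len(values)
        return [(v - lo) / (hi - lo) for v in values]

    # Lower objective value means better fitness, so invert the normalized value.
    norm_fit = [1.0 - f for f in normalize(fitness)]
    norm_dist = normalize(dist)
    score = [w * f + (1.0 - w) * d for f, d in zip(norm_fit, norm_dist)]
    return max(range(len(score)), key=score.__getitem__)


# Toy population (made-up values): candidate 1 is almost as fit as the best
# candidate 0 but lies farther away, so it obtains the highest score.
positions = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]
fitness = [1.0, 1.1, 5.0]
guide = fdb_select(positions, fitness)
```

Rewarding distance as well as fitness is what allows the guide to differ from the best object, which is the mechanism the text credits with improving population diversity.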
29.4 Experimental Settings
• For the experimental study settings, the conditions defined for the CEC 2020 competition were taken as reference [23].
• The parameters of the AOA algorithm were set according to the settings given in the reference study [19].
• 51 independent runs were conducted to benchmark AOA and the FDB-AOA variations.
• To ensure fairness between AOA and the FDB-AOA variations, the termination criterion was defined as a maximum number of function evaluations. This value is 10,000 × D (D: problem size).
• The 30-, 50- and 100-dimensional settings of the CEC 2020 benchmark functions were used to reveal the performance of the proposed method in low-, medium- and high-dimensional search spaces [23].
• Experimental studies were performed in MATLAB® R2016b on an x64-based system with an Intel(R) Core™ i7-8550U CPU @ 1.80 GHz and 16 GB RAM.
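The evaluation budget stated in the settings above can be enforced with a small wrapper around the objective function. The sketch below is illustrative only: the class and exception names are hypothetical, and raising an exception is just one possible way to stop a run once 10,000 × D evaluations have been used.

```python
class BudgetExceeded(Exception):
    """Raised when the function-evaluation budget is exhausted."""


class CountedObjective:
    """Wrap an objective function and count how often it is evaluated."""

    def __init__(self, func, dim, evals_per_dim=10_000):
        self.func = func
        self.max_evals = evals_per_dim * dim  # termination criterion: 10,000 * D
        self.evals = 0

    def __call__(self, x):
        if self.evals >= self.max_evals:
            raise BudgetExceeded
        self.evals += 1
        return self.func(x)


# Budgets for the three problem sizes used in the experiments.
budgets = {d: 10_000 * d for d in (30, 50, 100)}
RUNS = 51  # independent runs per algorithm and problem
```

Counting evaluations rather than iterations keeps the comparison fair when algorithms evaluate a different number of candidates per iteration.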
29.5 Results and Analysis Experimental study section consists of two subsections. In the first subsection, the performance of the FDB-AOA algorithm is tested and analyzed. In the second subsection, the design parameters of the IM are optimized for high-power fan and pump applications using the FDB-AOA algorithm.
29.5.1 Determining the Best FDB-AOA Method on Benchmark Problems In this section, the performances of the FDB-AOA variations (Case-1, Case-2 and Case-3) are analyzed using statistical test methods [24, 25].
29.5.1.1 Statistical Analysis
Ten different problems in the CEC 2020 benchmark suite have been optimized using the FDB-AOA variations and the base model of the AOA algorithm. Friedman test was performed by using the error values obtained by the algorithms for the 30/50/100 dimensions of the problems. Friedman ranking results are presented in Table 29.5. Accordingly, all three of the FDB-AOA variations have a better rank than the base algorithm.
Table 29.5 Friedman test ranking of AOA and FDB-AOA variations
Algorithms   Dimension = 30   Dimension = 50   Dimension = 100   Mean rank
Case-1       3.37             3.22             2.96              3.18
Case-2       3.40             3.29             2.86              3.19
Case-3       3.35             3.38             2.95              3.23
AOA          4.01             4.21             4.85              4.36
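The ranking step behind a Friedman test can be illustrated with a short sketch: the algorithms are ranked on each problem (1 = lowest error, ties share the average rank) and the ranks are averaged over the problems. The function names are hypothetical. The input uses only the 100-dimensional mean errors of F1 to F3 reported in Table 29.7, so the resulting mean ranks are not expected to match Table 29.5, which is based on the full experiment.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_mean_ranks(errors_by_problem):
    """Average the per-problem ranks of each algorithm (column)."""
    per_problem = [average_ranks(row) for row in errors_by_problem]
    n = len(per_problem)
    return [sum(col) / n for col in zip(*per_problem)]


# Columns: AOA, Case-1, Case-2, Case-3 (mean errors, D = 100, Table 29.7).
errors = [
    [1.79e9, 1.04e7, 8.22e5, 1.07e7],  # F1
    [2.92e4, 2.85e4, 2.89e4, 2.84e4],  # F2
    [1.45e3, 8.24e2, 8.33e2, 8.37e2],  # F3
]
mean_ranks = friedman_mean_ranks(errors)
```

A lower mean rank indicates a better algorithm; the Friedman statistic and its p-value would be computed from these ranks and are omitted here.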
Table 29.6 Wilcoxon pairwise comparison results between AOA and FDB-AOA variations
Versus AOA (+/=/-)   Dimension = 30   Dimension = 50   Dimension = 100
Case-1               4/4/2            5/4/1            7/2/1
Case-2               4/4/2            6/2/2            8/1/1
Case-3               5/4/1            6/2/2            8/1/1
The results in Table 29.5 show that the FDB-based guidance mechanism improves the search performance of AOA: every FDB-AOA variation obtains a mean rank between 3.18 and 3.23, compared with 4.36 for the base algorithm. The results of the Wilcoxon pairwise comparison test between the FDB-AOA variations and AOA are presented in Table 29.6. According to Table 29.6, all three FDB-AOA variations obtained a better statistical result than the base algorithm in 30, 50 and 100 dimensions. For the 30-dimensional problems, the Case-1 algorithm outperformed its competitor in 4 of 10 problems, had a worse result in 2 problems, and achieved similar performance in 4 problems. When the problem size is 50, Case-1 outperformed its competitor in 5 of 10 problems, had a worse result in 1 problem, and achieved similar performance in 4 problems. When the problem size is 100, Case-1 outperformed its competitor in 7 of 10 problems, had a worse result in 1 problem, and achieved similar performance in 2 problems. According to the Wilcoxon pairwise comparison results, the FDB-AOA variation with the greatest superiority over the AOA algorithm is Case-3. The mean and standard deviation of the error values obtained in 30, 50 and 100 dimensions for the 10 problems of the CEC 2020 benchmark suite with AOA and the FDB-AOA variations are given in Table 29.7.
29.5.1.2 Convergence Analysis
Box plots of the error values obtained from 51 independent runs of AOA and the FDB-AOA variations are presented in Fig. 29.4 for four different types of benchmark problems.
Table 29.7 Mean and standard deviations obtained for CEC 2020
F     D     AOA                   Case-1                  Case-2                  Case-3
F1    30    1.71E+04 (3.32E+04)   3.14E+03 (3.38E+03) +   2.52E+03 (3.30E+03) +   1.90E+06 (3.22E+03) -
F1    50    8.41E+07 (3.62E+07)   1.72E+03 (2.55E+03) +   6.42E+03 (2.57E+04) +   2.74E+03 (3.89E+03) +
F1    100   1.79E+09 (2.58E+09)   1.04E+07 (4.89E+07) +   8.22E+05 (1.75E+06) +   1.07E+07 (3.30E+07) +
F2    30    6.64E+03 (3.49E+02)   5.12E+03 (6.46E+02) +   4.96E+03 (6.27E+02) +   5.22E+03 (5.61E+02) +
F2    50    1.29E+04 (5.91E+02)   1.08E+04 (1.17E+03) +   1.08E+04 (1.07E+03) +   1.09E+04 (1.02E+03) +
F2    100   2.92E+04 (6.47E+02)   2.85E+04 (1.90E+03) +   2.89E+04 (1.58E+03) +   2.84E+04 (2.28E+03) +
F3    30    2.12E+02 (5.9E+01)    1.69E+02 (1.56E+01) +   1.70E+02 (1.52E+01) +   1.71E+02 (1.7E+01) +
F3    50    3.62E+02 (8.2E+01)    3.55E+02 (7.62E+01) +   3.74E+02 (5.25E+01) -   3.46E+02 (7.54E+01) +
F3    100   1.45E+03 (2.05E+02)   8.24E+02 (1.24E+02) +   8.33E+02 (1.30E+02) +   8.37E+02 (1.25E+02) +
F4    30    5.06E+00 (5.52E+00)   7.34E+00 (1.43E+00)     6.87E+00 (2.78E+00) -   6.72E+00 (2.41E+00) -
F4    50    1.11E+01 (9.80E+00)   1.48E+01 (6.61E+00)     1.57E+01 (7.30E+00) -   1.75E+01 (5.54E+00) -
F4    100   0.00E+00 (0.00E+00)   1.54E+01 (2.47E+01)     1.90E+01 (2.50E+01) -   1.38E+01 (2.24E+01) -
F5    30    1.18E+06 (6.68E+05)   1.02E+06 (7.81E+05) +   1.01E+06 (5.98E+05) +   9.16E+05 (5.86E+05) +
F5    50    1.80E+06 (7.93E+05)   1.77E+06 (1.09E+06) +   1.43E+06 (8.42E+05) +   1.37E+06 (8.22E+05) +
F5    100   6.71E+06 (1.75E+06)   3.22E+06 (1.04E+06) +   3.34E+06 (1.06E+06) +   3.53E+06 (1.10E+06) +
F6    30    4.15E+02 (2.36E+02)   4.99E+02 (1.51E+02)     4.77E+02 (1.93E+02) -   4.67E+02 (1.46E+02) -
F6    50    1.17E+03 (5.54E+02)   1.18E+03 (4.20E+02)     1.15E+03 (4.72E+02) +   1.21E+03 (3.80E+02) -
F6    100   4.46E+03 (6.23E+02)   2.89E+03 (6.16E+02) +   2.82E+03 (6.28E+02) +   2.82E+03 (6.97E+02) +
F7    30    3.32E+05 (2.19E+05)   2.67E+05 (2.00E+05) +   3.11E+05 (2.37E+05) +   3.96E+05 (2.75E+05) -
F7    50    2.06E+06 (8.99E+05)   1.62E+06 (8.23E+05) +   1.49E+06 (9.96E+05) +   1.69E+06 (8.45E+05) +
F7    100   3.89E+06 (1.04E+06)   2.65E+06 (8.43E+05) +   2.43E+06 (7.31E+05) +   2.86E+06 (1.02E+06) +
F8    30    1.79E+03 (2.56E+03)   3.13E+03 (1.83E+03)     3.27E+03 (1.88E+03) -   2.83E+03 (1.82E+03) -
F8    50    1.29E+04 (4.57E+02)   1.08E+04 (1.24E+03) +   1.10E+04 (1.20E+03) +   1.10E+04 (1.25E+03) +
F8    100   3.00E+04 (7.30E+02)   2.93E+04 (1.93E+03) +   2.89E+04 (2.06E+03) +   2.88E+04 (2.14E+03) +
F9    30    5.31E+02 (6.07E+01)   5.57E+02 (1.77E+01)     5.54E+02 (1.52E+01) -   5.51E+02 (1.59E+01) -
F9    50    6.65E+02 (7.34E+01)   7.75E+02 (6.17E+01)     7.72E+02 (6.44E+01) -   7.85E+02 (5.43E+01) -
F9    100   1.59E+03 (1.01E+02)   1.27E+03 (7.49E+01) +   1.25E+03 (7.08E+01) +   1.25E+03 (6.07E+01) +
F10   30    4.33E+02 (2.27E+01)   3.91E+02 (9.77E+00) +   3.97E+02 (1.52E+01) +   3.93E+02 (1.28E+01) +
F10   50    6.88E+02 (5.1E+01)    5.84E+02 (2.67E+01) +   5.89E+02 (2.68E+01) +   5.82E+02 (2.89E+01) +
F10   100   1.38E+03 (2.27E+02)   9.63E+02 (6.22E+01) +   9.45E+02 (6.69E+01) +   9.61E+02 (6.50E+01) +
When the box plots seen in Fig. 29.4 are examined, it is understood that there is a significant improvement in the performances of FDB-AOA variations compared to AOA, in parallel with the increase in problem dimension. This improvement applies to all four different types of problems. This shows that the guidance mechanism designed based on FDB is effective on the ability of balanced search and diversity in the population in the AOA algorithm.
29.5.2 Application of the Proposed FDB-AOA Method for Design Optimization of Induction Motor
The motor design was optimized in ten independent runs with each of the three FDB-based variations and with the base AOA algorithm. The results given in Table 29.8 show that Case-3 offers the best candidate solution. In addition, all three FDB-AOA variations give better average results than the base model of the AOA algorithm. The result obtained in the 4th run of the Case-3 variation was accepted as the optimum result. The design parameters produced in this solution are shown in Table 29.9. The results of the analytical analyses performed with the optimum design parameters are presented in Table 29.10.
Fig. 29.4 Box-plot charts for CEC 2020 benchmark problems. Panels a–c: F1 (Unimodal); d–f: F2 (Basic); g–i: F7 (Hybrid); j–l: F10 (Composition). Within each group the panels show Dimension = 30, 50 and 100, respectively.
Table 29.8 Results of the FDB-AOA optimization problems
No     AOA      Case-1   Case-2   Case-3
1      0.4781   0.4837   0.4695   0.4445
2      0.4912   0.5129   0.5089   0.4443
3      0.5323   0.5026   0.4948   0.4556
4      0.4846   0.4896   0.4552   0.4400
5      0.4534   0.4691   0.4867   0.5025
6      0.4861   0.4832   0.4627   0.4612
7      0.4980   0.4729   0.4517   0.4678
8      0.5375   0.4760   0.4961   0.4636
9      0.4965   0.4770   0.4605   0.4777
10     0.4692   0.5170   0.4928   0.5585
Min    0.4534   0.4691   0.4517   0.4400
Max    0.5375   0.5170   0.5089   0.5585
Avg    0.4927   0.4884   0.4779   0.4716
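The summary rows of Table 29.8 can be reproduced from the ten objective values reported for each algorithm. The short sketch below is a consistency check written for illustration; the dictionary layout and names are assumptions, and the order of the runs does not affect the statistics.

```python
import statistics

# Objective values of the ten runs per algorithm, as reported in Table 29.8.
results = {
    "AOA":    [0.4781, 0.4912, 0.5323, 0.4846, 0.4534,
               0.4861, 0.4980, 0.5375, 0.4965, 0.4692],
    "Case-1": [0.4837, 0.5129, 0.5026, 0.4896, 0.4691,
               0.4832, 0.4729, 0.4760, 0.4770, 0.5170],
    "Case-2": [0.4695, 0.5089, 0.4948, 0.4552, 0.4867,
               0.4627, 0.4517, 0.4961, 0.4605, 0.4928],
    "Case-3": [0.4445, 0.4443, 0.4556, 0.4400, 0.5025,
               0.4612, 0.4678, 0.4636, 0.4777, 0.5585],
}

# (min, max, mean rounded to four decimals) for every algorithm.
summary = {
    name: (min(values), max(values), round(statistics.mean(values), 4))
    for name, values in results.items()
}

# Algorithm with the lowest (best) average objective value.
best_on_average = min(summary, key=lambda name: summary[name][2])
```

With these values Case-3 has both the lowest single result (0.4400) and the lowest average, which agrees with the choice of Case-3 as the best variation.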
Table 29.9 Optimum design parameters
Parameters            Value
Sod                   520.39
Sid                   322.21
Rod                   319.71
Sh2                   30.68
Rsr                   8.23
Rhs2                  31.89
Rhs0                  2.63
Core length           300.00
Conductor per slot    20

Table 29.10 Optimum IM performance parameters
Parameters                  Value
Efficiency                  96.23%
Power factor                0.873
Armature thermal load       192.24
Stator teeth flux density   1.61
Stator yoke flux density    1.09
Rotor teeth flux density    1.18
Rotor yoke flux density     0.65
Istart/Inom                 7.39
Tstart/Tnom                 2.52
When the performance parameters of the IM are examined, it is seen that the solution proposed by Case-3 satisfies all the inequality constraints and offers acceptable performance parameters. In the IEC 60034-30-1:2014 standard, the lower limit of the IE3 efficiency value for a 185 kW IM operating at a line frequency of 60 Hz is 96.2%. The analysis of the optimum design model gives an efficiency of 96.23%, so this design is in the IE3 efficiency class. After the optimal result had been obtained with the analytical methods, a 2D transient FEA of the optimum design was carried out with ANSYS Maxwell, and the design of the optimum squirrel cage IM (SCIM) was verified (Figs. 29.5 and 29.6). The magnetic flux density values, which are among the inequality constraints, were observed in the 2D FEA of the optimum model and their suitability was verified. It is also seen that the torque fluctuation is acceptable.
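As a quick arithmetic check, the values of Table 29.10 can be substituted into the inequality constraints of Eqs. (29.9)–(29.17). Every expression evaluates to a negative number, so each constraint holds with some margin:

```latex
\begin{aligned}
g_1 &= 0.85 - 0.873 = -0.023 \le 0 \\
g_2 &= 192.24 - 200 = -7.76 \le 0 \\
g_3 &= 90 - 96.23 = -6.23 \le 0 \\
g_4 &= 1.61 - 2 = -0.39 \le 0 \\
g_5 &= 1.09 - 1.6 = -0.51 \le 0 \\
g_6 &= 1.18 - 1.8 = -0.62 \le 0 \\
g_7 &= 0.65 - 1.6 = -0.95 \le 0 \\
g_8 &= 7.39 - 8 = -0.61 \le 0 \\
g_9 &= 1.8 - 2.52 = -0.72 \le 0
\end{aligned}
```

The margin over the IE3 limit quoted in the text is much smaller: 96.23 − 96.2 = 0.03 percentage points.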
Fig. 29.5 Mesh and distribution of magnetic flux density at 500 ms
Fig. 29.6 0−500 ms torque-time graph
29.6 Conclusions
In this study, two original contributions to the literature were presented. The first is an improvement in search performance obtained by working on the design of AOA, one of the most recent meta-heuristic search algorithms in the literature. This improvement was achieved by redesigning the equations used for convergence in AOA. The Fitness-Distance Balance selection method, used in the design of the guide selection mechanism, strengthened the diversity of AOA, which enables it to escape from local solution traps. The resulting FDB-AOA algorithm was tested experimentally on the CEC 2020 benchmark suite, and its performance was analyzed and verified using non-parametric statistical tests. Thus, the search performance has been improved and a powerful FDB-AOA algorithm has been introduced to the literature.
Beyond the benchmark problems examined in 30, 50 and 100 dimensions, the second original contribution of the article to the literature is the optimization study carried out on a real-world engineering design problem. The design of an induction motor for high-power fan and pump applications was optimized using the FDB-AOA algorithm proposed in this article. The analytical study showed that the optimum motor model meets the targeted IE3 efficiency class conditions. In addition, 2D FEA confirmed the motor design and showed that FDB-AOA, which is presented for the first time in this study and has not previously been used in the design optimization of electrical machines, is a powerful and effective method for the design optimization of IMs. Future studies will investigate other MHS algorithms that can be used to optimize electrical machine designs and will aim to develop effective optimization methods. The MATLAB source codes of the FDB-AOA algorithm developed and proposed for the first time in this article will be shared on the MATLAB File Exchange platform after the article is published. The source codes can be found by searching the MATLAB File Exchange platform with the keyword FDB-AOA.
Chapter 30
Collecting Health Information with LoRa Technology
Zinnet Duygu Akşehir, Sedat Akleylek, Erdal Kiliç, Burçe Şirin, and Korhan Cengiz
Z. D. Akşehir (corresponding author) · S. Akleylek · E. Kiliç
Department of Computer Engineering, Ondokuz Mayıs University, Samsun, Turkey
B. Şirin
Rönesans Holding, Ankara, Turkey
K. Cengiz
College of Information Technology University, Fujairah, UAE
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_30
30.1 Introduction
With the concept of the IoT, relations between objects and humans are becoming more interactive, while information is disseminated more widely and quickly. The aim is to offer powerful solutions by combining health, the most critical issue for humanity, with the many opportunities provided by the IoT. The importance of foresight in the health sector is indisputable, and with the IoT, meaningful data that can be delivered to distant points significantly increases predictive ability. IoT technology offers many sensors with different functions, and which sensors to include is determined by the purpose of the study. Pulse, electrocardiography (EKG), oxygen, temperature, and acceleration sensors are often used in health studies [1]. Sensors are considered the source of data, and the main task is assigned to the application part of the system, where this data is processed. For continuous access to data, it matters that the person carries the sensor, so studies aim to produce small, pluggable prototypes that make sensors portable [2–4]. IoT-based health monitoring systems are undeniably a core part of the wearable technology industry. Wearable technology is used in health sector studies to receive specific data about people, detect changes in the data, and trigger reactions depending on these changes. Smartwatches and wristbands, which collect data from the arms, can collect more data about the person compared to other wearable technologies, given the proportion of areas where smart clothes come into contact with the body [5].
Monitoring the health data of workers in the field can be seen as an important opportunity to provide early intervention in extraordinary situations and prevent possible work accidents, because many work accidents occur due to imprudence, carelessness, and not using the equipment required by the environment [6]. In this study, the health data (pulse and body temperature) of personnel working in a construction area and equipped with wearable devices were monitored instantly. Falls of field workers from a high place were also detected. With the developed mobile application, we aimed to raise an alarm in extraordinary situations so that the necessary measures can be taken quickly and possible accidents prevented. In addition to worker and workplace safety, this study will contribute to the long-term productivity planning of companies through software building blocks to be added later.
This paper is organized as follows. Section 30.2 examines studies in the literature on health monitoring systems. Section 30.3 describes the system designed to track workers' health information, detailing LoRa technology, the wearable module device, and the mobile application. Section 30.4 explains how the health information of the workers is obtained. The last section presents the conclusions and future work.
30.2 Related Works
Detecting emotions based on physiological changes in the body and tracking destructive emotions such as anger is necessary for a healthy life [7]. In their anger tracking study, Jha et al. developed a monitoring system that detects and analyzes the physiological changes that occur in people during anger. The monitoring system is built with Global System for Mobile Communication (GSM) and Subscriber Identity Module (SIM) 900. The GSM module gives the object the ability to connect to the internet, make calls, and send SMS. In this study, disturbances in a person's mood are detected by hardware. The IoT sensors used in the project are a pulse sensor, an acceleration sensor, and a temperature sensor, and the authors aimed to make the most accurate inference from the combined data of these sensors. Perez et al. designed a sensor that measures glucose in the blood. In this structure, the blood glucose level is measured with an Ion Sensitive Field-Effect Transistor (ISFET) sensor. The sensor is connected to a conditioning circuit, which generates a voltage value for each blood sample. The voltage value produced by the conditioning circuit is fed to the microcontroller. While this data is displayed on the LCD, it is also transferred to a remote database via the WiFi module. In addition, the results of the bidirectional interaction between the database and the server are shown to the users [8].
Santhi et al. addressed monitoring the health status of pregnant women, using the CC3200 as a development kit. According to Santhi et al., the study presents a solution in which a web application and the CC3200 are used together in wireless sensor networks. Since the project includes a wireless network, doctors can monitor the health of pregnant women who are far away and of their babies. The CC3200 enables data to be measured dynamically and transferred to the cloud using IoT technology. A body temperature sensor, a pressure sensor, and a pulse sensor were used in this study [9]. Delrobaei et al. focused on tremor, the most important symptom of Parkinson's disease. Bringing together healthy subjects and Parkinson's patients, they developed an inertial-sensor-based motion detection system to separate the two groups. The measurement system is based on the Unified Parkinson's Disease Rating Scale (UPDRS). An inertial measurement unit (IMU) sensor was used and placed on various parts of the body. Delrobaei et al. observed a significant relationship between UPDRS and tremor severity for Parkinson's patients, and the results of the study are appropriate and clinically meaningful for the evaluation of Parkinson's patients [10]. Arun and Alexander designed an EKG monitoring device that operates an electrocardiography sensor through an armband. Their armband design is lightweight and easily portable. Standard EKG devices are not portable and have 12 leads that must be connected to the human body, which makes them difficult to use. Because these leads can also cause allergic reactions, they can make long-term monitoring unsuccessful [11]. Yotha et al. created a wireless design for monitoring hypoglycemia, in other words, states in which blood sugar levels are low.
The wireless design has three main parts. The first part consists of the sensors; humidity and pulse sensors are used. The second part is the Arduino mini development board, which also contains a microprocessor. The Arduino Integrated Development Environment (IDE) was used for data processing in the study. The last part is the program that tracks and monitors the data [12]. Fu and Liu implemented a wearable tracking device for people engaged in sports activities, focusing on the person's pulse and oxygen level. The obtained data is transmitted over General Packet Radio Service (GPRS), WiFi, and Zigbee networks. Fu and Liu used an expert decision-making system to process the saturation and pulse data they obtained. The processed, meaningful data is sent to the mobile devices of the relevant people over GPRS and WiFi networks [13]. Oğuz and Bolat proposed a patient monitoring system. With this system, the patient's EKG signal, heart rate, oxygen saturation level, and body temperature can be monitored instantly through an Android-based interface. In addition, the application they developed sends a warning SMS to the patient's relatives or the specialist doctor when there is an anomaly in the patient's values [14]. Similar applications are used to increase safety at construction sites and factories. These applications increase the survival rate of personnel and reduce the compensation costs of businesses by 20–40% [15].
Two different oil companies use wearable devices on their drilling platforms to monitor their employees' activities, locations, and exposure to chemicals. This has been found to reduce the disease rate by more than 40% by limiting exposure to harsh conditions and ensuring that workers get the rest time they need [15]. An Australian construction company has developed smart helmets equipped with sensors to monitor the health of its workers. The IoT data generated by the helmets is uploaded to the cloud, where it is organized and analyzed with Microsoft Azure and Microsoft Power BI. The resulting data shows whether a worker is suffering from heatstroke, a common but dangerous event in Australia [15]. Intel and Honeywell recently developed the Honeywell Connected Worker solution. It combines data collection sensors with real-time analysis and processing software to reduce work-related injuries. Workers carry sensors that collect data about heartbeats, job-site activities, hand movements, toxic gases, and other factors. A mobile hub enables the data to be processed locally and monitored by both employees and remote operators [15]. Another company produces shoes that detect employee fatigue and falls and warn oncoming vehicles [16].
30.3 Material and Method
The system proposed in this study consists of three main parts: node, server, and client. Each part is connected to at least one other part. The system runs over a network built with LoRa communication technology.
30.3.1 LoRa Communication
LoRa (Long Range) is a radio frequency-based modulation technique and communication technology that enables data transfer over long distances with low power consumption [15]. Methods currently used to connect objects to the Internet, such as cellular communication, 6LowPAN, and WiFi, face various problems in transmitting data over long distances. Cellular data transfer systems struggle to meet requirements such as the low cost and flexibility needed by network applications that run locally [16]. As a result, Low Power Wide Area Network (LPWAN) technologies have come to the forefront in wireless communication. LoRa, one of the LPWAN technologies, is a good alternative because it operates at free frequencies that do not require a license, has standards [17], and works well in scenarios where nodes are moving. LoRa is currently the most widely adopted LPWAN technology. It uses radio frequency bands that differ by region and can communicate in open areas at distances of up to 15 km, while connected endpoint devices can keep communicating for 8 to 15 years. Because LoRa technology can transmit signals over large areas without the need for long-term maintenance, an intelligent working environment for worker health and safety can be monitored and improved much more easily.
Fig. 30.1 Basic elements of LoRa technology
A system designed with LoRa technology usually consists of four parts; Fig. 30.1 shows an example. The first part is the endpoints, called LoRa nodes. These endpoints are devices such as sensors, counters, and control devices that either contain a LoRa module or, being module-compatible, can have one plugged in later. The second part consists of gateway devices called LoRa Gateways. These devices transmit data from the LoRa nodes to the server, which is the third part, using communication technologies such as Ethernet, WiFi, or cellular communication. The fourth part is the application that the user interacts with. The application communicates with the server, so the system can be accessed from anywhere. Evaluated by its limits and features, LoRa technology clearly has great potential. Although it is still an emerging technology, its achievements in lighting, smart agriculture, waste tracking, and meter reading applications around the world are the strongest proof of this potential [18]. We used gateways that support LoRa technology in our proposed system. A RAK2245 LoRa module is integrated into Raspberry Pi devices, which provide the gateway feature with the necessary libraries and programs. Gateways provide communication between the node side and the server side. In the developed network structure, one of the first steps performed when the system starts is joining each node to the network.
30.3.2 Node Part
The node part consists of Multitech mDot devices, which contain a LoRa radio module and support LoRa technology. mDot devices run the Arm Mbed operating system. The node software is developed in C++ with the online Mbed Compiler; the compiled software is downloaded and installed on the mDot devices, giving the nodes their functionality. The node devices carry various sensors, and communication with these sensors takes place over the I2C serial communication protocol. These sensors are the BH1790GLC and MAX30105 pulse (heart rate) sensors, the BMA220 3-axis accelerometer, and a PPG sensor. In addition, a thermistor-based structure was created to obtain body temperature data. Pulse sensors measure the heartbeat: the photocells in the sensor detect the small mechanical pulse of blood passing through the vessel each time the contracting heart pumps. The temperature sensor indicates the person's body temperature. Assuming that the person is wearing the device, acceleration sensors are used to detect situations such as trembling, falling, running, and walking. All of these sensors are integrated with the mDot device through the assembly card.
Node devices send two different packets over radio signals: network participation packets and information packets. Gateway devices receive the radio signals and forward the packets to the server side. A single gateway is defined in the developed network to transfer the nodes' requests to join the network to the server and to notify each node of the node-ID returned from the server. This gateway is defined as root on the system. Each node that has joined the network and has a node-ID then broadcasts information packets. An information packet contains the values of the health parameters obtained through the sensors: node identity information, pulse, body temperature, falling status, and battery information. Special software on the root gateway gives the nodes a transmission mechanism that is sequential and does not congest the channel.
Listening for messages on LoRa nodes requires extra power. To save power, the nodes do not stay in continuous listening mode but enter listening mode for a short time after sending a message. The gateway chip SX1301 can time the messages to be sent to nodes [19]. In this way, if the timing is set at the moment a message is received from the node, a message can be sent very precisely within the node's listening window. Figure 30.2 shows the node algorithm flowchart, in which two different signals are sent at different time intervals. In the flowchart, tS denotes the sensor data transmission interval, tL the positioning signal transmission interval, and t the current time, all in seconds.
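The timing logic described for the flowchart can be sketched as follows. This is a minimal illustration and not the authors' C++ node firmware: it is written in Python for readability, the names `NodeTimers` and `due_signals` and the bookkeeping of last-transmission times are assumptions, and only the symbols tS, tL, and t come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeTimers:
    """Hypothetical bookkeeping for the two periodic signals of a node."""
    t_s: float                  # tS: sensor data transmission interval, seconds
    t_l: float                  # tL: positioning signal transmission interval, seconds
    last_sensor: float = 0.0    # time of the last sensor data transmission
    last_position: float = 0.0  # time of the last positioning transmission


def due_signals(timers: NodeTimers, t: float) -> List[str]:
    """Return the signals to transmit at current time t and update the timers."""
    due = []
    if t - timers.last_sensor >= timers.t_s:
        due.append("sensor")
        timers.last_sensor = t
    if t - timers.last_position >= timers.t_l:
        due.append("position")
        timers.last_position = t
    return due
```

With example intervals such as tS = 60 s and tL = 15 s (values chosen here only for illustration), the positioning signal would be sent four times for every sensor data transmission.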
30.3.3 Server Part
The server stores the sensor packets sent by the gateways in its database and shares this information with the mobile application. Communication between the mobile application and gateway is via a RESTful API. A RESTful (Representational State Transfer) API is an application programming interface that enables fast and easy communication between client and server using HTTP methods such as GET and POST. In the server part, a computer acts as the server hardware. The server software is written in the TypeScript language. The server computer runs a "message broker" program that receives the packets sent from the gateway via the MQTT protocol. Mosquitto was chosen as the message broker in the developed system.
Fig. 30.2 Node algorithm flowchart
Server and gateway pairs communicate via socket programming. The packets sent by the gateways are delivered to the port on the server computer where the message broker software is running. The server listens on this port and performs an operation according to the type of the incoming packet. When an incoming packet is a request to join the network, the ID information is transferred from the server to the node, and this ID is included in the packets that follow. When the incoming packet is an information packet, the data obtained from the packet is mapped to an object generated using the information data interface on the server. The obtained information is then saved in the MySQL database.
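The dispatch on packet type described above can be illustrated with a short sketch. It is not the authors' TypeScript server: it is written in Python, an in-memory list stands in for the MySQL database, no MQTT broker is involved, and the JSON field names (`type`, `node_id`, `pulse`, `temperature`, `fall`, `battery`) are assumptions made for illustration.

```python
import json
from itertools import count
from typing import Dict, List, Optional

_next_id = count(1)        # hypothetical node-ID generator
records: List[Dict] = []   # stands in for the MySQL table


def handle_packet(raw: str) -> Optional[Dict]:
    """Dispatch one JSON packet by its type and return a reply, if any."""
    packet = json.loads(raw)
    if packet["type"] == "join":
        # Join request: answer with a freshly assigned node ID.
        return {"type": "join_ack", "node_id": next(_next_id)}
    if packet["type"] == "info":
        # Information packet: map the fields to a record and store it.
        record = {
            "node_id": packet["node_id"],
            "pulse": packet["pulse"],
            "temperature": packet["temperature"],
            "fall": packet["fall"],
            "battery": packet["battery"],
        }
        records.append(record)
        return None
    raise ValueError(f"unknown packet type: {packet['type']!r}")
```

Mapping the packet onto a fixed set of fields before storing it mirrors the role that the text assigns to the information data interface: malformed packets fail early with a `KeyError` instead of reaching the database.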
30.3.4 Client Part
The client part has client software developed to present data to the user. The client software is developed using the TypeScript language and the Node.js platform so that it is mobile and web compatible. The client software interface displays the values of the health parameters obtained from the contents of the information packet. In the system architecture, communication between the node and the gateway is bidirectional, and the data transferred between these two devices is binary. Communication between gateway and server is also bidirectional and uses the JSON data format; it takes place via the MQTT protocol over the MQTT port. Communication between server and client is unidirectional, from server to client.
Fig. 30.3 The general architecture of the system
This communication takes place via the TCP protocol through socket programming, and the transferred data is in JSON format. The general structure of the system is given in Fig. 30.3; it consists of a variable number of nodes, at least four gateways, servers, and mobile applications.
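The step from the binary node-to-gateway payload to the JSON gateway-to-server message can be sketched as below. The 4-byte layout is an assumption made for illustration; the chapter only states (in Sect. 30.4) that the pulse value occupies 1 byte and the temperature is a 5-bit code, so the field order, the bit positions, and the names `pack_info` and `to_json` are hypothetical.

```python
import json
import struct

PACKET_FORMAT = "BBBB"  # assumed layout: node ID, pulse, temperature+fall, battery


def pack_info(node_id: int, pulse: int, temp_code: int, fall: bool, battery: int) -> bytes:
    """Pack an information packet into 4 bytes (assumed layout)."""
    if not 0 <= temp_code < 32:
        raise ValueError("temperature code must fit in 5 bits")
    flags = temp_code | (int(fall) << 5)  # bits 0-4: temperature, bit 5: fall alarm
    return struct.pack(PACKET_FORMAT, node_id, pulse, flags, battery)


def to_json(payload: bytes) -> str:
    """Decode the binary payload and re-encode it as JSON for the server."""
    node_id, pulse, flags, battery = struct.unpack(PACKET_FORMAT, payload)
    return json.dumps({
        "node_id": node_id,
        "pulse": pulse,
        "temp_code": flags & 0x1F,
        "fall": bool(flags & 0x20),
        "battery": battery,
    })
```

Keeping the radio payload to a few bytes suits the low data rates of LoRa, while JSON on the gateway-to-server link is easier to inspect and extend.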
30.3.5 Mobile Application
The mobile application was developed to allow instant monitoring of the health data of personnel in the field. It consists of a personnel search page and pages displaying the body temperature and pulse of the selected personnel. The application updates its displays by tracking the data in the database in real time. Personnel whose health data fall outside the critical limits are marked in the personnel list. In any risky situation, the mobile application warns the user with audio-visual notifications. It thus brings the information of the personnel at risk to the forefront and helps the intervention happen as quickly as possible.
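The marking of at-risk personnel can be illustrated with a short sketch. The source does not state the critical limits, so the ranges below are placeholder assumptions, as are the names `Reading`, `at_risk`, and `order_personnel`; the sketch is in Python rather than the TypeScript used for the actual client.

```python
from dataclasses import dataclass
from typing import List

# Assumed limits for illustration only; the source does not state critical values.
PULSE_RANGE = (50, 120)            # beats per minute
TEMPERATURE_RANGE = (35.0, 38.0)   # degrees Celsius


@dataclass
class Reading:
    name: str
    pulse: int
    temperature: float
    fall: bool = False


def at_risk(r: Reading) -> bool:
    """True when any value is outside its limits or a fall alarm is set."""
    pulse_ok = PULSE_RANGE[0] <= r.pulse <= PULSE_RANGE[1]
    temp_ok = TEMPERATURE_RANGE[0] <= r.temperature <= TEMPERATURE_RANGE[1]
    return r.fall or not (pulse_ok and temp_ok)


def order_personnel(readings: List[Reading]) -> List[Reading]:
    """Put at-risk personnel first, keeping the original order otherwise."""
    return sorted(readings, key=lambda r: not at_risk(r))
```

Because Python's sort is stable, personnel keep their relative order within the at-risk and not-at-risk groups, so the list does not reshuffle on every update.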
30.3.6 Wearable Module Hardware
The block diagram of the wearable module hardware is given in Fig. 30.4. As the block diagram shows, the mDot LoRa module is combined with various hardware and one LiPo battery. The circuitry needed to charge the battery properly is also integrated into the wearable device. The device is designed to run on the LiPo battery for a long time. When the LiPo battery needs charging, it can be charged via the USB interface integrated into the wearable device. The MAX30105 oximeter integrated circuit is connected to the mDot module via the I2C interface. This integration provides stable pulse measurement for users working in the field. The temperature sensor constantly monitors the body temperature of the users. The acceleration sensor integrated with the mDot module causes the system to generate an alarm when an unexpected acceleration of a worker (a fall, for example) is detected. The data obtained from these sensors are formatted on the mDot device and transferred to the central server at certain intervals via LoRa gateways. The image of the design is shown in Fig. 30.5.
Fig. 30.4 Wearable hardware structure
Fig. 30.5 Image of wearable hardware
30.4 Discussion and Results
Pulse data is obtained by one of two different pulse sensors. The sensors measure at one-minute intervals. The sensor measures continuously until the first heart rate value is obtained and is then put to sleep for the time specified in the node software. At the end of this period, the sensor is restarted and makes a new measurement. The obtained pulse value is 1 byte in size and is placed in the information packet. Body temperature is measured through thermistors located on the assembly card. Thermistors are circuit elements whose resistance changes exponentially with temperature. The measured body temperature is mapped to a 5-bit number whose value increases by 1 for every 0.5° and is transferred to the server. Falls are detected with the accelerometer. The total acceleration is calculated from the accelerations on the three axes, and this value is used in the algorithm developed to detect a fall. As stated in the datasheet of the BMA220 sensor used for this purpose, a fall is indicated by a succession of low-G and high-G interrupts. When these consecutive interrupts are detected, the alarm status of the node is set to true. This status information is sent to the server in the information packet.
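The temperature coding and the fall rule described above can be sketched as follows. This is a software illustration under stated assumptions, not the authors' firmware: the base temperature for code 0, the low-G and high-G thresholds, and the sample window are hypothetical, and on the real device the BMA220 raises the low-G and high-G interrupts in hardware rather than through a loop over samples.

```python
import math
from typing import Iterable, Optional, Tuple

TEMP_BASE = 30.0   # assumed temperature for code 0; not given in the source
TEMP_STEP = 0.5    # one code step per 0.5 degrees, from the text


def encode_temperature(temp: float) -> int:
    """Map a temperature to a 5-bit code (0..31), clamping out-of-range values."""
    code = round((temp - TEMP_BASE) / TEMP_STEP)
    return max(0, min(31, code))


def total_acceleration(ax: float, ay: float, az: float) -> float:
    """Magnitude of the acceleration vector over the three axes, in g."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def detect_fall(samples: Iterable[Tuple[float, float, float]],
                low_g: float = 0.4, high_g: float = 2.0, window: int = 10) -> bool:
    """True when a low-G sample is followed by a high-G one within `window` samples."""
    since_low: Optional[int] = None  # samples elapsed since the last low-G event
    for ax, ay, az in samples:
        g = total_acceleration(ax, ay, az)
        if g < low_g:
            since_low = 0            # free fall phase detected (low-G event)
        elif since_low is not None:
            since_low += 1
            if since_low > window:
                since_low = None     # too late to count as the same event
            elif g > high_g:
                return True          # impact after free fall (high-G event)
    return False
```

Requiring the low-G phase before the high-G impact is what separates a fall from a simple knock: a high reading without preceding free fall does not set the alarm.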
30.5 Conclusions and Future Work
A wearable device was designed to ensure the occupational health and safety of workers in the field. With this device, the pulse and body temperature of the personnel are monitored instantly, and falls from a high place are detected. To this end, a fast, efficient, and low-cost architecture was proposed. The proposed architecture consists of three parts (node, server, and client) and uses LoRa communication technology. The health data of personnel working in a construction site environment was monitored instantly with this low-cost system. In addition, thanks to the developed mobile application, the alarm raised in extraordinary situations allowed the necessary measures to be taken quickly. The goals of preventing accidents in the work environment and intervening immediately when accidents occur were therefore achieved. In future work, new sensors can be added to the developed system to obtain additional health parameters. Additional information can also be obtained by developing algorithms on the existing sensors; for example, step counting can be performed with the accelerometer.
Chapter 31
A New Hybrid Method for Indoor Positioning
Zinnet Duygu Akşehir, Sedat Akleylek, Erdal Kılıç, Ceyda Aksaç, and Ali Ghaffari
Z. D. Akşehir (B) · S. Akleylek · E. Kılıç: Department of Computer Engineering, Ondokuz Mayıs University, Samsun, Turkey
C. Aksaç: Rönesans Holding, Ankara, Turkey
A. Ghaffari: Department of Computer Engineering, Tabriz Branch, Islamic Azad University, Tabriz, Iran
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_31

31.1 Introduction
Today, positioning technologies are among the most needed technologies in many areas. Together with applications developed on various platforms, they help individuals in many aspects of life. Positioning technologies are used not only for position estimation but also for tasks such as mapping and tracking. GPS is the best known and most prominent of these technologies. It is a one-way ranging technology realized with satellites that send range codes and messages unidirectionally [1]. Indoor positioning is important in many areas where people are likely to be present. For example, it can greatly simplify personnel and material tracking in a business with many workers and a large working area. Because GPS relies on satellite signals, it is not well suited to indoor positioning: the satellite signals interact with walls, and as a result the signal strength is significantly weakened or the signals cannot pass through the walls at all [2]. Therefore, different approaches have been developed for indoor positioning. These generally fall into two groups: the model-based approach and the fingerprinting-based approach. The model-based approach depends on geometric models. In the fingerprinting approach, the basic idea is fingerprint matching: a database built from signal strength measurements is used to match the traces in incoming signals, and a meaningful location is generated from this match [3]. Indoor positioning approaches can be based on Wi-Fi, Bluetooth, Radio Frequency Identification (RFID), Ultra Wide Band (UWB), or ultrasound [4]. Signal strength is central to Wi-Fi-based indoor positioning; such systems work on the basis of the received signal strength indicator (RSSI) [5]. In addition to the technology used, various algorithms are available for building a system on top of it, such as triangulation, trilateration, time of arrival (ToA), angle of arrival, and time difference of arrival (TDoA).

In this study, an algorithm for generating estimated locations is proposed by supporting indoor positioning approaches with newly developed algorithms. The proposed system uses two different positioning approaches, namely the squares and circles methods, and two different filtering approaches. Through the filtering approaches, incoming signal strength information affects the system at different rates. LoRa technology was used for communication in the proposed system. Signal strength was calculated from packets obtained from nodes built with devices supporting LoRa technology. Using this calculated value together with the proposed positioning and filtering approaches, position detection was performed with an error margin of 10–90 cm.
This paper is organized as follows: In Sect. 31.2, studies in the literature on indoor positioning systems are examined. In Sect. 31.3, the system designed for indoor positioning is described, and the proposed positioning algorithm is detailed. The test results obtained with the designed system are given in Sect. 31.4. The last section presents the conclusions and future work.
31.2 Related Works
Li et al. [6] developed a system based on Bluetooth technology. Connection nodes and Bluetooth gateways are placed first. The Bluetooth gateways measure the real-time signal strengths of the nodes in the system and transmit them to the server, which records the average signal strength for each node. For blind nodes, the measured signal strengths are corrected on the server with a specific formula. The corrected RSSI values are smoothed with a Kalman filter and converted into distance information with a particle swarm optimization-back propagation neural network (PSO-BPNN) model. The least-squares algorithm is then used to find the blind node positions. The authors observed that the least-squares method used to estimate the position of the mobile device can be refined through observations and tests and can meet the needs of indoor positioning systems. In this study, however, Bluetooth positioning technology was at a disadvantage in producing continuous location estimates with high precision. Future work could address this disadvantage by using multiple sensors [6].

Zegeye et al. [7] proposed an RSSI fingerprinting-based Wi-Fi positioning system. Radio maps are created offline from measured RSSI values. Position estimation is then performed by using these radio maps in reverse. The system was developed on Android because the platform is open source and has a large user base. The radio maps are created by the developed Android application and saved on the SD card. After the radio maps are created offline, the analysis is performed in the online stage by the reverse function, which calculates the estimated position [7].

In the fingerprinting-based approach proposed by Lembo et al. [8], received signal strength (RSS) information is collected for sampled locations in the offline phase. RSS information consists of signals that are emitted from base stations and can be detected by user terminals. The data is collected and clustered as an average value. Coordinate mapping is then performed according to the selected method. The method based on fingerprint matching uses the radio maps stored in the database. For the neural network (NN) methods, the neural network is trained with the examples in this database. In the genetic algorithm-based method, the samples are used to create a signal strength surface (SS-surface) for each perceived base station. Each SS-surface is a function that maps coordinates to an RSS value. Once created, the SS-surfaces are used together with the RSS group in the genetic algorithm-based method, and the output is then obtained through the different positioning methods. The authors compared the performance of the genetic algorithm and neural network methods. The genetic algorithm was better for low percentages, while the NN gave better results above 55%.
The reason for this is that the performance of the genetic algorithm decreases as the number of base stations decreases [8].

Sugano et al. [9] estimated, in a sensor-side application, the position of targets located within the observation angle of the sensors in the system. The system network communicates according to ZigBee standards. The system aims to estimate the position using data from a certain number of fixed sensor nodes. Two message types are used: the measurement request and the received signal report. A measurement request is sent from a node to the sensor nodes to request a measurement of the node's signal. A sequence number is added to identify the node from which the measurement request was sent. The received signal report is used to report the RSSI value measured at the sensor node to the receiving node, and it contains the target ID and the sequence number. The system was tested in experimental environments to determine the accuracy of position estimation. Because the number of targets and sensors was limited, the amount of RSSI data affecting system accuracy was not high. For this reason, no clear statement can be made about the result to be obtained in practice [9].

Oldenburg et al. [10] first investigated various indoor positioning techniques as a preliminary stage. The system they then developed consists of two parts: the client and the back-end. The back-end is itself divided into three parts. A RESTful (Representational State Transfer) API was created in the back-end, which allows clients to communicate with the back-end in JSON format [10].

Jianyong et al. [11] used the Android operating system, for which client and server software were developed. The client software communicates with Bluetooth devices. It is responsible for calculating the RSSI value from incoming signals and transferring the data to the server software. The server-side application receives data from the client software running on the mobile device. Metadata is saved to the database, and coordinate calculations are then performed on this information [11].

Tekbaş et al. [12] performed location detection with a fingerprint algorithm based on artificial neural networks. The fingerprint of the environment was created with the help of sensors located indoors, and the artificial neural network method was then used to determine the position of the sensor nodes. Two scenarios were defined for the application: an empty environment and an environment with object/human mobility. For these two scenarios, position detection was performed with average errors of 18.2 cm and 24.2 cm, respectively [12].

Yoshitome et al. [13] proposed an outdoor positioning system using LoRa communication technology. Two different indicators were used in their system: TDoA and RSSI. The results showed that TDoA generally positioned better than RSSI, with margins of error ranging from 66 to 253 m [13].
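The offline/online fingerprinting scheme that recurs in several of the studies above [7, 8, 12] can be illustrated with a minimal nearest-neighbour matcher. This is a generic sketch, not the method of any cited study: the radio map values are made-up illustrative numbers, and the k-nearest-neighbour averaging is an assumed matching rule.

```python
import math

# Offline phase: a toy radio map. Coordinates (metres) and RSSI values (dBm)
# are made-up illustrative numbers, one RSSI value per access point.
RADIO_MAP = {
    (0.0, 0.0): [-40, -70, -75],
    (5.0, 0.0): [-65, -45, -72],
    (0.0, 5.0): [-68, -74, -42],
    (5.0, 5.0): [-75, -60, -58],
}


def estimate_position(observed, radio_map, k=2):
    """Online phase: average the k reference points whose stored
    fingerprints are closest (Euclidean distance) to the observed RSSI vector."""
    ranked = sorted(radio_map.items(),
                    key=lambda item: math.dist(item[1], observed))
    nearest = [xy for xy, _ in ranked[:k]]
    x = sum(p[0] for p in nearest) / len(nearest)
    y = sum(p[1] for p in nearest) / len(nearest)
    return x, y


estimate = estimate_position([-41, -69, -76], RADIO_MAP, k=1)
```

A neural network or genetic algorithm, as used in some of the cited works, would replace the nearest-neighbour lookup while keeping the same two-phase structure.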
31.3 Material and Method
The system proposed in this study consists of three main parts: node, server, and client. Each part is associated with at least one other part. The system is realized over a network built with LoRa communication technology. In general, it consists of a variable number of nodes, at least four gateways, servers, and mobile applications.
31.3.1 System Architecture
The node part consists of MultiTech mDot devices, which contain a LoRa radio module and support LoRa technology. The mDot devices run the Arm Mbed operating system. The node software is developed in C++ with the online Mbed Compiler. The compiled software is downloaded and installed on the mDot devices, giving the nodes their functionality. Figure 31.1 shows the flowchart of the node algorithm, in which two different signals are sent at different time intervals. In the flowchart, tS denotes the sensor data transmission interval, tL the positioning signal transmission interval, and t the current time, all in seconds. There are several sensors on the node devices. The mobile nodes, which the personnel will carry on their wrists, read the sensor data and transmit it to the gateway in the appropriate format. In addition, they periodically send TDoA packets for position detection. The sensor and TDoA packets are transmitted with different periods so that power consumption and communication performance can be controlled better.

Fig. 31.1 Node algorithm flowchart

Gateways contain internal GPS receivers for location information. A gateway listens on eight channels so that packets arriving simultaneously from multiple nodes are not lost. Gateways add a timestamp to the packets received from the nodes and forward them to the server over a 3G cellular connection. The TDoA algorithm, which runs on the server and detects the position of the nodes, uses this timestamp. The relationship between the gateway and the nodes is planned to be bidirectional. To provide this bidirectional communication, the nodes are programmed to listen for a certain period after transmitting. The server is the part that hosts the database and runs the TDoA algorithm. It processes the data it receives from the gateway via the cellular connection and makes it ready to be read by the mobile application. Authorized users of the mobile app can thus track the position of the personnel.
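The two-period transmission scheme of the node can be illustrated with a small simulation. The flowchart itself is not reproduced in the text, so the loop below is only a plausible reading of it; the function name and the packet labels are hypothetical, and the real node software runs in C++ on Mbed rather than in Python.

```python
def node_schedule(t_s, t_l, duration):
    """Return (time, packet_type) events for two periodic transmissions.

    t_s: sensor data transmission interval (tS), in seconds
    t_l: positioning signal transmission interval (tL), in seconds
    duration: number of seconds to simulate
    """
    events = []
    last_sensor = 0      # time of the last sensor packet
    last_position = 0    # time of the last positioning packet
    for t in range(1, duration + 1):          # t is the current time in seconds
        if t - last_sensor >= t_s:
            events.append((t, "sensor"))
            last_sensor = t
        if t - last_position >= t_l:
            events.append((t, "position"))
            last_position = t
    return events


events = node_schedule(t_s=3, t_l=2, duration=6)
```

Keeping the two intervals independent is what lets the positioning packets be sent more (or less) often than the sensor packets, which is the power and airtime trade-off mentioned above.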
31.3.2 Indoor Positioning
With the proposed system, the positions of nodes running the Mbed operating system are calculated from radio signals. Mbed is an operating system for IoT devices developed by Arm. MultiTech mDot nodes are devices that run the Mbed operating system and allow the use of LoRa libraries. These MultiTech mDot devices are called nodes throughout the study.
Signal strength information read from the nodes is transmitted to the server by RAK2245 devices, which serve as gateways. This signal strength information is sent from the server to the client via socket programming, and the position estimation is performed on the client.

Indoor Positioning Algorithm: The positioning algorithm is used to find the estimated location. It runs on the client part of the system and, when it finishes, produces an estimated position for a node. Various algorithms and methods are available for positioning. The primary indicator in this study is RSSI, the received signal strength indicator; TDoA and ToA information are also used indirectly. Figure 31.2 shows the flowchart of the positioning algorithm. Positioning starts when an RSSI packet arrives at the client. RSSI information is stored on the server in gateway-specific map structures, which contain the node ID and the RSSI information. Distance information is calculated from these RSSI values. At the start, the distance is set directly; in subsequent steps, each incoming value is filtered according to the positioning algorithm in Fig. 31.2. For the positioning process, two filtering types (mean and good) and two position calculation methods (squares and circles) are applied. The squares method is known as the MinMax algorithm. The MinMax algorithm is based on the intersection area of the squares drawn for position estimation, whereas the circles method is based on the point where the intersections of the circles are densest. For this reason, the squares method yields an estimated region, while the circles method yields an estimated point. In the squares method, the squares are drawn so that the gateways are at their centers.
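The conversion from RSSI to distance and the squares (MinMax) step can be sketched as follows. The chapter does not state its distance formula or define its parameters, so this sketch assumes the common log-distance path-loss model and assumes that RAOM is the RSSI measured at one metre and N the path-loss exponent; the default values are placeholders, not the authors' settings.

```python
def rssi_to_distance(rssi, raom=-50.0, n=2.5):
    """Log-distance path-loss model: distance in metres from an RSSI in dBm.

    raom: assumed RSSI at one metre (hypothetical default)
    n:    assumed path-loss exponent (hypothetical default)
    """
    return 10 ** ((raom - rssi) / (10 * n))


def min_max(gateways, distances):
    """Intersect the squares centred on each gateway with half-side = distance.

    Returns ((x_min, y_min, x_max, y_max), centre), or None when the
    squares have no common region.
    """
    x_min = max(x - d for (x, _), d in zip(gateways, distances))
    x_max = min(x + d for (x, _), d in zip(gateways, distances))
    y_min = max(y - d for (_, y), d in zip(gateways, distances))
    y_max = min(y + d for (_, y), d in zip(gateways, distances))
    if x_min > x_max or y_min > y_max:
        return None                     # the squares do not intersect
    centre = ((x_min + x_max) / 2, (y_min + y_max) / 2)
    return (x_min, y_min, x_max, y_max), centre


gateways = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
region, centre = min_max(gateways, [8.0, 8.0, 8.0, 8.0])
```

A stronger RSSI gives a shorter distance and therefore a smaller square; the centre of the common rectangle is one natural point estimate for the region, although the chapter does not say how the red point is chosen.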
Squares and circles are smaller when the RSSI value is high and larger when it is low. For each incoming RSSI value, a filtering operation is performed according to the selected filtering mode, and the position is calculated. For each gateway, the distance between the gateway and the node sending the RSSI information is recalculated. The distances of the nodes to the gateways are kept in per-gateway map structures. Three parameters are used in the positioning calculation: N, RAOM, and the maximum walking speed. These parameters can be set from the interface of the client software. The ratio value shown in Fig. 31.2 represents the “rating” that plays a role in filtering. In mean filtering, the old and new distance values each have half the effect. In good filtering, by contrast, the aim is to lower the impact of incoming data as its degree of disorder increases. To calculate the impact ratio, the ratio of the person’s calculated walking speed (calculatedSpeed) to the maximum walking speed (maxWalkingSpeed), which represents the abnormality in the incoming data, is found first. The ratio is then calculated as $\mathrm{ratio} = 1 / \bigl(1 + (\mathrm{calculatedSpeed}/\mathrm{maxWalkingSpeed})^{2}\bigr)$. Using this ratio, the distance information is updated with the formula given in the flow diagram (see Fig. 31.2). The updated location information is then assigned as the old location information, and the algorithm waits for new RSSI information. Position estimation was performed with the two filtering and two positioning approaches proposed.
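The two filtering modes can be sketched as below. The ratio formula is taken from the text, but the update formula lives in the flowchart and is not reproduced here, so the blend used in `good_filter` is an assumption, as is the way the speed is derived from two consecutive distances and the elapsed time `dt`.

```python
def impact_ratio(calculated_speed, max_walking_speed):
    """Ratio from the text: 1 / (1 + (calculatedSpeed / maxWalkingSpeed)^2)."""
    return 1.0 / (1.0 + (calculated_speed / max_walking_speed) ** 2)


def mean_filter(old_distance, new_distance):
    """Mean filtering: old and new values each contribute half."""
    return 0.5 * old_distance + 0.5 * new_distance


def good_filter(old_distance, new_distance, dt, max_walking_speed):
    """Good filtering: implausibly fast changes get a lower weight.

    Assumptions (not stated in the chapter): the speed is the change in
    distance divided by the elapsed time dt, and the update is a linear
    blend weighted by the ratio.
    """
    calculated_speed = abs(new_distance - old_distance) / dt
    ratio = impact_ratio(calculated_speed, max_walking_speed)
    return old_distance + ratio * (new_distance - old_distance)


smoothed = good_filter(old_distance=2.0, new_distance=5.0, dt=1.0,
                       max_walking_speed=1.5)
```

Under these assumptions the good filter coincides with the mean filter exactly when the calculated speed equals the maximum walking speed (ratio 0.5), trusts the new value fully when nothing has moved, and increasingly ignores jumps that would require a faster-than-walking node.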
Fig. 31.2 Flowchart for filtering approaches
31.4 Discussion and Results
The proposed positioning algorithm works efficiently. According to the experimental results, it has an error margin of 10–90 cm indoors. Among the parameters used in the positioning calculation, RAOM is the most important. If the RAOM parameter is set incorrectly, proper position results cannot be obtained. If the RAOM value is set well for a single node, the position is detected directly, without any error margin. As the number of nodes increases, small error margins occur because no single RAOM value fits all nodes; the suitable value differs slightly from node to node. The only way to minimize this error margin is to determine the parameter value that generates an appropriate position on all nodes. Indoor positioning tests were carried out on the system. In the test results, the blue point represents the result of the intersection/circles method, and the red point represents the result of the MinMax/squares method. The test results in Figs. 31.3 and 31.4 were obtained with a single node in the system. The result in Fig. 31.3 was obtained with the circles method and shows zero error margin and no axis shift. According to the test results in Fig. 31.4, the squares method has an error margin of 0, whereas the circles method shows a shift of approximately 30 cm in the positive direction on the y-axis and approximately 10 cm in the negative direction on the x-axis.

Fig. 31.3 First scenario
Fig. 31.4 Second scenario
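The per-axis shifts reported for the second scenario can be combined into a single distance error. The chapter reports only the axis shifts; treating the overall error as their Euclidean norm is an assumption made here for illustration.

```python
import math


def position_error(dx_cm, dy_cm):
    """Euclidean distance between estimated and true position, in cm."""
    return math.hypot(dx_cm, dy_cm)


# Second scenario, circles method: about -10 cm on x and +30 cm on y.
error = position_error(-10.0, 30.0)
print(f"{error:.1f} cm")  # prints "31.6 cm"
```

The resulting value of roughly 32 cm lies inside the 10–90 cm range quoted for the system as a whole.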
Fig. 31.5 Third scenario
Fig. 31.6 Fourth scenario
In Fig. 31.5, a test result obtained while the node was in motion is given. Because the MPS and RAOM values were set appropriately, the squares method produced smooth results. The circles method could not produce a result in Fig. 31.5 because there was no region where the circle intersections were denser. In the test shown in Fig. 31.6, the drawn squares do not intersect, so the squares method could not calculate an instantaneous position, while the circles method gave the position with 0 error margin. In the test in Fig. 31.7, the circles method has 0 error margin, while the squares method has an error margin of 15 cm. In Fig. 31.8, the squares method has 0 error margin, while the circles method has an error margin of approximately 40 cm on the y-axis.
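The behaviour of the circles method in these tests, including the case where no result is produced, can be illustrated with a small sketch. The chapter does not say how the "densest" intersection region is found, so the clustering rule below (count the intersection points within a fixed radius and average the largest group) and the `radius` value are assumptions.

```python
import math
from itertools import combinations


def circle_intersections(c0, r0, c1, r1):
    """Return the points where two circles cross (empty list if none)."""
    (x0, y0), (x1, y1) = c0, c1
    d = math.hypot(x1 - x0, y1 - y0)
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []                       # coincident centres, separate, or nested
    a = (r0 * r0 - r1 * r1 + d * d) / (2 * d)
    h = math.sqrt(max(0.0, r0 * r0 - a * a))
    xm, ym = x0 + a * (x1 - x0) / d, y0 + a * (y1 - y0) / d
    ox, oy = h * (y1 - y0) / d, h * (x1 - x0) / d
    return [(xm + ox, ym - oy), (xm - ox, ym + oy)]


def densest_intersection(gateways, distances, radius=0.5):
    """Average the largest group of intersection points lying within
    `radius` metres of one of them; None if no two points are that close."""
    points = []
    for i, j in combinations(range(len(gateways)), 2):
        points += circle_intersections(gateways[i], distances[i],
                                       gateways[j], distances[j])
    if not points:
        return None
    clusters = ([q for q in points if math.dist(p, q) <= radius] for p in points)
    cluster = max(clusters, key=len)
    if len(cluster) < 2:
        return None                     # no region is denser than the rest
    return (sum(p[0] for p in cluster) / len(cluster),
            sum(p[1] for p in cluster) / len(cluster))


gateways = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
distances = [5.0, math.sqrt(65), math.sqrt(45), math.sqrt(85)]  # node at (3, 4)
estimate = densest_intersection(gateways, distances)
```

With exact distances all six circle pairs cross at the true position, so the cluster is unambiguous; with only two gateways the two crossing points are far apart and the function returns None, which mirrors the missing circles result in the third scenario.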
Fig. 31.7 Fifth scenario
Fig. 31.8 Sixth scenario

31.5 Conclusions and Future Work
In this study, indoor position is estimated with LoRa technology and the proposed positioning algorithm. Two different filtering methods and two different positioning approaches were used in the estimation. Indoor positioning tests were performed in an indoor space with a maximum length of 10 m. The results showed an error margin between 10 and 90 cm for indoor positioning. The different parameter values used in the test stages made it clearer how the parameters of the positioning algorithm affect the system. We saw that the positioning algorithm fails if the RAOM parameter is set incorrectly. In addition, we observed that the applied filtering operations largely prevented errors and deviations. Consequently, we observed that the proposed approach locates an object in an indoor space close to its true location, without much deviation and at low cost. In the future, the filtering algorithm used for positioning can be improved.
References
1. Kumar S, Moore KB (2002) The evolution of global positioning system (GPS) technology. J Sci Educ Technol 11(1):59–80
2. Xiao A, Chen R, Li D, Chen Y, Wu D (2018) An indoor positioning system based on static objects in large indoor scenes by using smartphone cameras. Sensors 18(7):2229
3. Zou H, Xie L, Jia QS, Wang H (2014) Platform and algorithm development for a RFID-based indoor positioning system. Unmanned Syst 2(03):279–291
4. Akleylek S, Kılıç E, Söylemez B, Aruk TE, Yıldırım AÇ (2020) Kapalı mekân konumlandırma üzerine bir çalışma. Mühendislik Bilimleri ve Tasarım Dergisi 8(5):90–105
5. Wen LP, Nee CW, Chun KM, Shiang-Yen T, Idrus R (2011) Application of WiFi-based indoor positioning system in handheld directory system. In: 5th European computing conference
6. Li G, Geng E, Ye Z, Xu Y, Lin J, Pang Y (2018) Indoor positioning algorithm based on the improved RSSI distance model. Sensors 18(9):2820
7. Zegeye WK, Amsalu SB, Astatke Y, Moazzami F (2016) WiFi RSS fingerprinting indoor localization for mobile devices. In: 2016 IEEE 7th annual ubiquitous computing, electronics & mobile communication conference (UEMCON). IEEE, pp 1–6
8. Lembo S, Horsmanheimo S, Honkamaa P (2019) Indoor positioning based on RSS fingerprinting in a LTE network: method based on genetic algorithms. In: 2019 IEEE international conference on communications workshops (ICC Workshops). IEEE, pp 1–6
9. Sugano M, Kawazoe T, Ohta Y, Murata M (2006) Indoor localization system using RSSI measurement of wireless sensor network based on ZigBee standard. Wireless and Opt Commun 538:1–6
10. Oldenburg L, Meznaric J, Lukau E, Hechenberger A (2016) Indoor navigation/indoor positioning with mobile devices. https://doi.org/10.13140/RG.2.1.3100.4568
11. Jianyong Z, Haiyong L, Zili C, Zhaohui L (2014) RSSI based bluetooth low energy indoor positioning. In: 2014 international conference on indoor positioning and indoor navigation (IPIN). IEEE, pp 526–533
12. Tekbaş A, Tuncer T, Erdem E (2020) RSSI sinyalleri kullanarak iç ortamda parmak izi tabanlı YSA ile konum tespitinin gerçekleştirilmesi. Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi 11(3):925–931
13. Yoshitome EH, da Cruz JVR, Monteiro MEP, Rebelatto JL (2021) LoRa-aided outdoor localization system: RSSI or TDoA? Internet Technol Lett e319
Chapter 32
On the Android Malware Detection System Based on Deep Learning
Durmuş Özkan Şahin, Bilge Kağan Yazar, Sedat Akleylek, Erdal Kılıç, and Debasis Giri
D. Ö. Şahin (B) · B. K. Yazar · S. Akleylek · E. Kılıç: Department of Computer Engineering, Ondokuz Mayıs University, Samsun, Turkey
D. Giri: Maulana Abul Kalam Azad University of Technology, Kolkata 700 064, West Bengal, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_32

32.1 Introduction
Recently, smart mobile devices have taken the place of personal computers in daily life. The most important reason is that many operations performed on a personal computer can now also be carried out on smart mobile devices. In addition, new-generation mobile devices have hardware as powerful as that of personal computers. For these reasons, smart mobile devices have become indispensable in daily life. Banking transactions, electronic commerce, and electronic health are some of the operations frequently performed on mobile devices. With the widespread use of mobile devices, the amount of malware targeting them has increased remarkably in recent years. While 33.5 million mobile malware samples were detected in the first quarter of 2019, this figure reportedly rose to 46.1 million in the first quarter of 2021 [1]. Likewise, 1.6 million new mobile malware samples were detected in the first quarter of 2019, compared with 2.3 million in the first quarter of 2021 [1]. These figures show that mobile device users are under serious threat. Android is the world’s most popular mobile operating system. According to the most recent data, Android-based devices account for 72.18% of the market [2]. One of the most fundamental reasons for its popularity is that it is open source. In addition, the ability to install applications (apps) from application stores other than the official ones, and the ease of installing third-party applications on Android devices, are other important reasons for choosing Android. However, unofficial app stores and third-party apps pose a serious threat to Android. Although every application uploaded to the official application store, Google Play Store, is inspected in detail, some malware has nevertheless been detected there [3]. Devices are therefore even more likely to be infected with malware where the controls are less strict than in the official application repositories. Malware developers are interested in the Android operating system because it is extensively used and has some security weaknesses. As a result, malware detection research for Android has become increasingly important in recent years. In this research, several deep learning approaches are used to detect malware on the Android operating system.
32.1.1 Previous Works
There are basically two ways to detect malware: static analysis and dynamic analysis. Hybrid analysis, which combines the two, is also available. In static analysis, applications do not need to be run. Dynamic analysis, on the other hand, involves running apps on a real or virtual machine and collecting the essential information about them. The results of static and dynamic analyses are typically turned into feature vectors, which are then fed into machine learning algorithms. These feature vectors can thus be used to distinguish between benign and malicious software. To construct a malware detection system with machine learning techniques, meaningful input should be given to the algorithm. The algorithm learns from the inputs it receives during the training phase, and in the testing phase it decides to which class each new incoming sample belongs. The key challenge is choosing what to feed the algorithm, because the algorithm’s performance and the system’s success are closely tied to the input. There are many strategies for producing the inputs of machine learning algorithms in malware detection. In most of the studies given in Table 32.1, researchers perform classification using conventional machine learning techniques. Deep learning techniques, which generally give better results than classical machine learning approaches and have benefited from cheaper and more capable computer hardware, have recently become quite popular. Accordingly, a significant increase in deep learning-based Android malware detection has been observed in the past five years [4]. Some of the studies using deep learning are summarized as follows. Amin et al. utilized a static analysis technique to detect malware using deep learning [19]. Opcodes were extracted from Android Application Package (APK) files, and different deep learning approaches were applied to them.
Table 32.1 Studies on Android malware detection using machine learning

| Study | Year | Analysis type | Algorithms |
|---|---|---|---|
| Droidmat [5] | 2012 | Static | KNN |
| Gascon et al. [6] | 2013 | Dynamic | SVM |
| Droidapiminer [7] | 2013 | Static | ID3 DT, C4.5 DT, KNN, SVM |
| DroidDolphin [8] | 2014 | Dynamic | SVM |
| DRACO [9] | 2015 | Hybrid | SVM |
| ANASTASIA [10] | 2016 | Static | KNN, LR, DT, NB, SVM, XGBoost, AdaBoost, Deep learning |
| ScanMe mobile [11] | 2016 | Static | ANN |
| Samadroid [12] | 2018 | Hybrid | SVM |
| DroidFusion [13] | 2019 | Static | Ensemble learning |
| Kural et al. [14] | 2019 | Static | KNN, NB, SVM |
| Bhattacharya et al. [15] | 2019 | Static | Bayesian networks, NB, SMO, DT, RF, MLP |
| Şahin et al. [16] | 2020 | Static | Regression techniques |
| Şahin et al. [17] | 2021 | Static | SMO, RF, LR, NB, KNN, MLP, C4.5 |
| Şahin et al. [18] | 2021 | Static | SMO, RF, LR, NB, KNN, MLP, C4.5 |

ANN: Artificial neural network; DT: Decision tree; KNN: K-nearest neighbors; LR: Logistic regression; MLP: Multi-layer perceptron; NB: Naive Bayes; RF: Random forest; SMO: Sequential minimal optimization; SVM: Support vector machine.

Many deep learning approaches were used in the study, including Convolutional Neural Network (CNN), Deep Belief Networks (DBN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and
Bidirectional Long Short-Term Memory (Bi-LSTM). The best result, an accuracy of 0.99, was attained with Bi-LSTM. Haq et al. proposed Android malware detection using a hybrid of deep learning techniques [20]. The hybrid structure is based on CNN and LSTM architectures. The feature vector was constructed by extracting data from the APK file's code, classes, and manifest files, and was then given to the hybrid network. Preprocessing steps such as feature selection and normalization were applied to the dataset to make classification more effective. Alzaylaee et al. presented extensive experiments with a system they call DL-Droid [21]. They first compare deep neural networks with classical machine learning techniques and then examine in detail the effects of static and dynamic features on classification performance.

More recently, APK files have been converted to images and classified with deep learning techniques. Shiqi et al. used application programming interface (API) call sequences and image texture models [22]. Ganesh et al. converted the permissions extracted from APK files to 12 × 12 images [23], which were then classified with a CNN. Ding et al., on the other hand, created two-dimensional images from API call sequences retrieved from APK files [24] and used them to train a CNN that classifies programs as benign or malicious. Xiao proposed a CNN-based technique that converts Dalvik bytecode to images [25]. Hsien-De Huang and Kao converted Dalvik executables to fixed-size RGB images [26], which were then analyzed with a CNN.

In this study, we examine three different models, namely deep neural networks (DNN), one-dimensional convolutional neural networks (1D-CNN), and two-dimensional convolutional neural networks (2D-CNN). In all models, ReLU is used as the activation function in the intermediate layers, and the sigmoid activation function is used in the output layer since the problem is a two-class problem. RMSProp is used as the optimization method because it adapts the learning rate dynamically.
32.1.2 Motivation and Contribution

Deep learning has recently been applied to a variety of domains, including image classification, speech recognition, natural language processing, and machine translation [27]. Deep learning algorithms are also widely preferred in Android malware detection [28]. The primary motivation of this study is therefore to examine how different deep learning approaches perform on Android malware detection. In addition, in recent years malware detection has been carried out by converting applications to images. Since two-dimensional convolutional neural networks are very successful in image classification, an alternative approach is presented in which the feature vector of static features is transformed into a black-white image in order to distinguish malware from benign applications. The study's primary contributions can be summarized as follows:

• Performances are compared under different metrics using DNN, 1D-CNN, and 2D-CNN.
• Extensive experiments are conducted on two separate datasets, Malgenome-215 and Drebin-215.
• The 215 static features in these datasets are given directly to DNN and 1D-CNN. In addition, each vector is converted to a 15 × 15 image by padding, and these images are classified with 2D-CNN. In general, classification on the images is about as successful as the other techniques.
32.1.3 Organization

The rest of this chapter is structured as follows. The datasets and evaluation metrics are detailed in Sect. 32.2. The methods used for Android malware detection are presented in Sect. 32.3. The results are described in Sect. 32.4. Finally, the last section offers a general assessment and outlines future work.
32.2 Experimental Settings

This section has two subsections. The datasets are discussed in Sect. 32.2.1, and the measures used to evaluate the performance of the algorithms are discussed in Sect. 32.2.2.
32.2.1 Used Datasets

The datasets utilized in this investigation, Malgenome-215 [29] and Drebin-215 [30], are the same as those used in [13]. The Malgenome-215 dataset contains 3799 applications, of which 1260 are malicious and the remaining 2539 are benign. The Drebin-215 dataset contains 15,036 applications, of which 5560 are malicious and the remaining 9476 are benign. In the Drebin-215 dataset, the TelephonyManager.getSimCountryIso attribute of the applications at positions 178, 1973, 2111, 2952, and 5176 appears as “?”. Since this attribute is “0” for most applications in the dataset, the experiments are performed with “?” replaced by “0”. Each dataset is split into 70% for training, 15% for validation, and 15% for testing. To make a fair assessment, the same training, validation, and test examples are given to the DNN, 1D-CNN, and 2D-CNN techniques.
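The preprocessing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the fixed random seed, and the rounding of the split sizes are hypothetical choices, and the text does not say whether the original split was stratified or how it was shuffled.

```python
import random


def clean_features(row):
    """Convert raw feature strings to integers, mapping the placeholder "?" to 0."""
    return [0 if value == "?" else int(value) for value in row]


def split_indices(n_samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle the sample indices once and cut them into train/validation/test.

    Reusing the returned index lists for every model gives all models the
    same examples. The seed and the rounding rule are assumptions.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * val_frac)
    n_test = round(n_samples * test_frac)
    n_train = n_samples - n_val - n_test
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


print(clean_features(["1", "0", "?", "1"]))   # [1, 0, 0, 1]
train, val, test = split_indices(3799)        # Malgenome-215 size
print(len(train), len(val), len(test))        # 2659 570 570
```

With these rounding rules the test partitions contain 570 and 2255 applications for the two datasets, which agrees with the totals of the confusion matrices reported in Sect. 32.4.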
32.2.2 Performance Measure

The confusion matrix is frequently used in machine learning studies to determine the performance of a classifier. Its entries are the counts termed True Negative (TN), True Positive (TP), False Negative (FN), and False Positive (FP). The accuracy in Eq. 32.1, the precision in Eq. 32.2, and the recall in Eq. 32.3 are calculated from these counts.

$$\text{accuracy} = \frac{TN + TP}{TN + FN + FP + TP} \tag{32.1}$$

$$\text{precision} = \frac{TP}{FP + TP} \tag{32.2}$$

$$\text{recall} = \frac{TP}{FN + TP} \tag{32.3}$$
Equation 32.4 gives the f-measure, which is the harmonic mean of the precision (Eq. 32.2) and the recall (Eq. 32.3). The values obtained from Eqs. 32.2, 32.3, and 32.4 are for one class only. The same metrics are calculated for the other class, and the results are averaged over the two classes. Finally, the performances of the deep learning techniques are interpreted on this basis.

$$\text{f-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{32.4}$$
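As a worked example, Eqs. 32.1 to 32.4 can be evaluated directly from the four confusion-matrix counts. The sketch below uses the counts reported for DNN on Malgenome-215 (TN = 188, FP = 1, FN = 2, TP = 379). The function names are illustrative, and weighting the two per-class f-measures by class size is an assumption about how the averaging is done.

```python
def per_class_scores(tp, fp, fn):
    """Precision, recall and f-measure (Eqs. 32.2 to 32.4) for one class."""
    precision = tp / (fp + tp)
    recall = tp / (fn + tp)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def evaluate(tn, fp, fn, tp):
    """Accuracy (Eq. 32.1) and class-size-weighted f-measure over both classes."""
    total = tn + fn + fp + tp
    accuracy = (tn + tp) / total
    # Scores for the class treated as positive in the confusion matrix.
    _, _, f_pos = per_class_scores(tp, fp, fn)
    # For the other class the roles swap: TN are its hits, FN are its
    # false alarms, and FP are its misses.
    _, _, f_neg = per_class_scores(tn, fn, fp)
    support_pos = tp + fn
    support_neg = tn + fp
    weighted_f = (support_pos * f_pos + support_neg * f_neg) / total
    return accuracy, weighted_f


accuracy, weighted_f = evaluate(tn=188, fp=1, fn=2, tp=379)
print(f"accuracy={accuracy:.4f} f-measure={weighted_f:.4f}")
# accuracy=0.9947 f-measure=0.9947
```

The sketch does not guard against empty classes; a production version would need to handle zero denominators.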
32.3 Methodologies

The methodology of the study is described in this section. The first subsection gives information about static analysis. The second subsection discusses how the black-white images are created. Finally, the third subsection describes the deep learning techniques used.
32.3.1 Static Analysis

Static analysis examines application files without running them. Since the application files are not run, information is collected faster than in dynamic analysis. However, static analysis is vulnerable to zero-day attacks. In Android malware detection, information such as application permissions and API calls is extracted from APK files with static analysis. This information is then evaluated with machine learning and deep learning techniques to detect malicious software. The datasets used in this study include four groups of attributes: application permissions, API call signatures, intents, and command signatures. These static properties can be described as follows:

• Application permissions: Permissions are one of the essential components of Android security. The AndroidManifest.xml file included in the APK file lists the permissions requested by the application. Various access permissions can be granted to an application through this list. For example, operations such as accessing the camera, reading SMS messages, accessing contacts, and accessing photos each require the corresponding permission.
• API call signature: Attributes in this group specify how applications use the operating system's resources and services. Application developers provide various functions in their applications by using these API functions. Malware developers likewise take advantage of security-sensitive APIs. With a reverse engineering approach, information about the APIs used in an application can be obtained from the Java source files.
• Intent: Like the application permissions, the attributes in this group are found in the AndroidManifest.xml file. Through intents, an application can request certain functions from another application. For example, if an application needs a web browser, the developer does not need to write one. When the developer declares the need for a web browser through an intent, the operating system opens a suitable program. If more than one suitable program is available, the alternatives are presented to the user. In this way, the need is met by switching to an existing application instead of writing a new one. Malware developers exploit intents because they allow switching between applications.
• Commands signature: There are two kinds of commands, root commands and botnet commands. Root commands such as “cp”, “cat”, “kill”, and “mount” come from Unix, where administrators use them to perform particular functions. Since the Android operating system is of Unix origin, these commands are also encountered in Android applications. Malware developers often prefer root commands to control their target devices, which makes these commands critical in malware detection. With a reverse engineering approach, the root commands used in an application are obtained from the Java source files.
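To make the permission group concrete, the sketch below reads the requested permissions from a manifest and turns them into binary features. It assumes the manifest has already been decoded to plain-text XML (inside an APK it is stored in a binary format, which tools such as apktool can decode). The sample manifest, the three-entry permission vocabulary, and the function names are hypothetical and are not taken from the datasets used in this study.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Android for attributes such as android:name.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical decoded manifest, for illustration only.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
    <uses-permission android:name="android.permission.CAMERA" />
    <uses-permission android:name="android.permission.READ_SMS" />
    <application android:label="Example" />
</manifest>
""".strip()


def requested_permissions(manifest_xml):
    """Return the permission names declared in <uses-permission> elements."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{" + ANDROID_NS + "}name"
    names = [elem.get(name_attr) for elem in root.iter("uses-permission")]
    return [name for name in names if name is not None]


def permission_features(manifest_xml, vocabulary):
    """Binary vector: 1 if the app requests the permission, 0 otherwise."""
    requested = set(requested_permissions(manifest_xml))
    return [1 if permission in requested else 0 for permission in vocabulary]


vocabulary = [
    "android.permission.CAMERA",
    "android.permission.READ_CONTACTS",
    "android.permission.READ_SMS",
]
print(permission_features(SAMPLE_MANIFEST, vocabulary))  # [1, 0, 1]
```

Intent attributes could be collected from the same file in a similar way, whereas API calls and commands require decompiled code and are outside the scope of this sketch.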
32.3.2 Converting Static Properties to Images

Feature vectors with a total of 215 features are generated from the four static feature groups detailed in Sect. 32.3.1. Thus, each application is represented by a 1 × 215 vector, a structure to which machine learning techniques can be applied directly. The dataset in this form is classified with DNN and 1D-CNN. Because 2D-CNN is very successful in image classification and extracts features from images automatically, Algorithm 1 is applied to the datasets so that each application is also represented as an image. Classification is then performed on these images. Since there are 215 features in the datasets, padding is used to bring the size up to the nearest perfect square: ten zeros are appended to the feature vector, increasing its dimension to 225 (15 × 15). Line 10 of Algorithm 1 illustrates this case. In the other steps of the algorithm, the attributes of each application are checked. In the datasets, an attribute is 1 if the application contains it and 0 if it does not. If the value of a feature is 0, the corresponding pixel is set to 0, and if the value is 1, the corresponding pixel is set to 255. In this way, the applications are converted into images.
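The conversion described above can be sketched as follows. This is not a reproduction of Algorithm 1; it is a minimal illustration that assumes the padded vector is written into the image row by row, and the function name is hypothetical.

```python
import math


def vector_to_image(features):
    """Pad a binary feature vector with zeros to the next perfect square and
    map it to a square black-white image (0 stays 0, 1 becomes 255)."""
    side = math.isqrt(len(features))
    if side * side < len(features):
        side += 1                      # 215 features give a 15 x 15 image
    padded = list(features) + [0] * (side * side - len(features))
    pixels = [255 if value == 1 else 0 for value in padded]
    # Row-major layout is an assumption; the text does not specify the order.
    return [pixels[row * side:(row + 1) * side] for row in range(side)]


features = [1, 0] * 107 + [1]          # a made-up vector with 215 entries
image = vector_to_image(features)
print(len(image), len(image[0]))       # 15 15
print(image[-1])                       # last row: 5 real features, then 10 padded zeros
```

For a 215-element vector the last ten pixels of the final row are always the padded zeros.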
32.3.3 Deep Learning Techniques

Machine learning is increasingly embedded in consumer electronics products such as smartphones. Machine learning algorithms are used, for example, to recognize objects in photos, convert speech to text, build recommendation systems, or detect anomalies. With growing data sizes and the increasing processing capacity of computers, these applications now use techniques called deep learning. Traditional machine learning techniques are limited in their ability to handle data in its raw form [31]. Deep learning, as opposed to task-specific methods, is a broader family of machine learning methods in which learning is based on data representations. Compared with earlier methods, deep learning methods have the advantage of constructing deep structures that can learn more abstract information.

Artificial Neural Networks (ANN) are shallow networks with an input layer, an output layer, and a hidden layer in between. A network is considered deep when it has more than three layers, including the input and output layers [32]. As the number of layers increases, the network becomes more complex. A DNN is built by adding further hidden layers, of various sizes, between the input and output layers of an ANN. These layers correspond to different levels of concepts, and high-level concepts are defined in terms of low-level ones [33]. Since shallow neural networks have only one hidden layer, they lack advanced feature extraction capabilities and cannot learn the high-level concepts that deep neural networks can. This also applies to other machine learning algorithms. In an ANN or DNN, a nonlinear operation is applied to the weighted sum of the units in the preceding layer. Many activation functions are available for this purpose; the most common are the sigmoid, softmax, hyperbolic tangent, and ReLU functions.

Deep learning methods can model nonlinear relationships, and one of the most popular methods is the CNN. In the 1990s, LeCun et al. achieved successful results on the handwritten digit classification problem by applying a gradient-based learning algorithm to a CNN [34]. CNNs offer a number of advantages over DNNs, such as being better suited to processing images and more effective at learning abstract features from images [35]. Furthermore, compared to a fully connected deep network of the same size, a sparsely connected CNN with shared weights has considerably fewer parameters. The general architecture of a CNN consists of three types of layers: convolutional, pooling, and output layers. In a typical ANN, each hidden layer has weights, an input, and an output. Due to the 2-dimensional nature of images, however, each neuron in a CNN has 2-dimensional planes as inputs and outputs, called feature maps, and 2-dimensional weights, known as kernels [36]. Nodes in the convolution layers extract features from the input images. When the input images are fed into the network, the outputs of the convolution and pooling layers are grouped into 2-dimensional planes termed feature maps. Each plane in a layer is created by combining the outputs of planes in the previous layers. In this way, the high-level information in the images (the distinctive features that allow them to be separated) is derived by propagating from the bottom layers. As the features propagate to the upper layers, the feature sizes shrink according to the filter sizes used in the convolution and pooling layers. The number of feature maps, however, is increased in order to provide the best features from the input images and improve classification success. In the models created, pooling layers are generally used after the convolution layers.
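The way feature sizes shrink through convolution and pooling can be seen on a tiny one-dimensional example. The sketch below is illustrative only: it assumes an unpadded convolution with stride 1 (computed as a sliding weighted sum, as deep learning libraries do), a ReLU activation, and non-overlapping max pooling; the input signal and kernel values are made up.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal without padding (stride 1) and
    return the weighted sum at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """ReLU activation: negative responses are set to zero."""
    return [max(0, v) for v in values]


def max_pool1d(values, size=2):
    """Keep the maximum of each non-overlapping window of the given size."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]


signal = [0, 1, 1, 0, 1, 0, 0, 1]      # 8 binary features
kernel = [1, -1, 1]                    # one filter of size 3

feature_map = relu(conv1d_valid(signal, kernel))
pooled = max_pool1d(feature_map)
print(feature_map)                     # [0, 0, 2, 0, 1, 1]
print(pooled)                          # [0, 2, 1]
```

The length goes from 8 to 6 after the size-3 convolution and to 3 after size-2 pooling. A real layer applies many such filters in parallel, which is why the number of feature maps grows while each map shrinks.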
After the convolution and pooling operations, the obtained features are flattened into a vector, and classification is performed with fully connected (Dense) layers, which are known to perform better [35]. 1D-CNNs have become popular in recent years, performing quite well in applications with a limited amount of labeled data and with signals from different sources. Their application is very similar to that of 2D-CNNs; the only difference is that operations are performed in one dimension instead of two.

Deep learning networks are usually trained with the backpropagation algorithm. At each step of the process, the gradient with respect to each network parameter, such as the weights of the convolution or fully connected layers, is computed. These values are then used to update the model parameters until a stopping criterion is reached. Gradient-descent optimization methods used together with backpropagation in the literature include Stochastic Gradient Descent (SGD), Momentum SGD, AdaGrad, RMSProp, and Adam.

In the DNN, the input layer is followed by layers of 128, 64, and 32 units and, finally, a one-unit output layer. In the 2D-CNN model, two convolution layers with 32 and 64 filters, respectively, are used with a kernel size of 3 × 3. These two layers are followed by MaxPooling and Dropout layers; the MaxPooling kernel size is 2 × 2, and the Dropout rate is 0.25. The outputs of this layer are flattened and given to a 512-unit Dense layer, followed by a one-unit output layer, with a Dropout rate of 0.2 between these two layers. The Dropout layers improve model performance by preventing over-fitting. In the 1D-CNN model, a 3 × 1 filter is used. The only other difference is that the outputs of the MaxPooling layer, which has a kernel size of 2 × 1, are given to a Dense layer of size 128. This choice was made because the number of weights to be trained grows too large when a 512-unit layer is used instead of a 128-unit one.

After the models were created, 70% of each dataset was used for training, 15% for validation, and 15% for testing. The models were trained for a total of 15 epochs with a batch size of 64. For the RMSProp optimization method, the learning rate was set to 0.001, and binary cross-entropy was used as the loss function. All coding was done using the TensorFlow library.
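The argument about the size of the Dense layer can be checked with simple arithmetic. The sketch below assumes unpadded convolutions with stride 1, pooling that discards any leftover position, and 32 and 64 filters in the 1D-CNN as in the 2D-CNN; none of these details is stated explicitly in the text. It also includes one possible Keras formulation of the 2D-CNN, with the layer order read from the description above. That builder is an interpretation rather than the authors' code, and it imports TensorFlow only when it is called.

```python
def conv_out(size, kernel=3):
    """Length along one axis after an unpadded, stride-1 convolution."""
    return size - kernel + 1


def dense_params(n_inputs, n_units):
    """Trainable weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def flat_size_2d(side=15, filters=64):
    """Flattened size after two 3x3 convolutions and 2x2 max pooling."""
    side = conv_out(conv_out(side)) // 2       # 15 -> 13 -> 11 -> 5
    return side * side * filters


def flat_size_1d(length=215, filters=64):
    """Flattened size after two size-3 convolutions and size-2 max pooling."""
    length = conv_out(conv_out(length)) // 2   # 215 -> 213 -> 211 -> 105
    return length * filters


def build_2d_cnn():
    """One possible Keras version of the described 2D-CNN (an assumption)."""
    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential([
        layers.Input(shape=(15, 15, 1)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


print(dense_params(flat_size_2d(), 512))   # 819712  (2D-CNN, 512 units)
print(dense_params(flat_size_1d(), 512))   # 3441152 (1D-CNN, 512 units)
print(dense_params(flat_size_1d(), 128))   # 860288  (1D-CNN, 128 units)
```

Under these assumptions, a 512-unit layer after the 1D convolutions would need roughly four times as many weights as the same layer in the 2D-CNN, while a 128-unit layer keeps the two models at a comparable size. Training would then follow the settings in the text, for example `model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=15, batch_size=64)`, where the array names are placeholders.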
32.4 Results and Discussions

This section has two subsections. The results on the Malgenome-215 dataset are presented in Sect. 32.4.1, and the results on the Drebin-215 dataset are reviewed in Sect. 32.4.2.
32.4.1 Results with Malgenome-215 Dataset

Table 32.2 shows the results obtained with the DNN technique. The weighted performances computed from the confusion matrix in Table 32.2 are 0.9947 and 0.994 for the accuracy and f-measure metrics, respectively. The results obtained with the 1D-CNN technique are given in Table 32.3; this method yields 0.9982 and 0.998 for the accuracy and f-measure metrics, respectively. Table 32.4 shows the results obtained by classifying the images with 2D-CNN, which yields 0.9947 and 0.994 for the accuracy and f-measure metrics, respectively. Image classification is thus about as successful as the other methods. With DNN, 1 malicious application is classified as benign, while 2 benign applications are classified as malicious. With 2D-CNN, 2 malicious applications are classified as benign, while 1 benign application is classified as malicious. On the Malgenome-215 dataset, the 1D-CNN technique is more successful than DNN and 2D-CNN: it correctly predicts all the benign applications and classifies only 1 malicious application as benign.
Table 32.2 Results of DNN on Malgenome-215 dataset

| Initial class | Estimated class: Malicious | Estimated class: Benign |
|---|---|---|
| Malicious | TN (188) | FP (1) |
| Benign | FN (2) | TP (379) |
Table 32.3 Results of 1D-CNN on Malgenome-215 dataset

| Initial class | Estimated class: Malicious | Estimated class: Benign |
|---|---|---|
| Malicious | TN (188) | FP (1) |
| Benign | FN (0) | TP (381) |
Table 32.4 Results of 2D-CNN on Malgenome-215 dataset

| Initial class | Estimated class: Malicious | Estimated class: Benign |
|---|---|---|
| Malicious | TN (187) | FP (2) |
| Benign | FN (1) | TP (380) |
32.4.2 Results with the Drebin-215 Dataset

Table 32.5 shows the results obtained with the DNN technique. The weighted performances computed from the confusion matrix in Table 32.5 are 0.9783 and 0.978 for the accuracy and f-measure metrics, respectively. The results obtained with the 1D-CNN technique are given in Table 32.6; this technique yields 0.98 for both the accuracy and f-measure metrics. Table 32.7 shows the results obtained by classifying the images with 2D-CNN, which yields 0.9769 and 0.976 for the accuracy and f-measure metrics, respectively. Image classification is again about as successful as the other methods. With DNN, 24 malicious applications are classified as benign, while 25 benign applications are labeled as malicious. With 2D-CNN, 32 malicious applications are labeled as benign, whereas 20 benign applications are classified as malicious. Finally, with 1D-CNN, 30 malicious applications are identified as benign, whereas 15 benign applications are labeled as malicious.

Table 32.5 Results of DNN on Drebin-215 dataset

| Initial class | Estimated class: Malicious | Estimated class: Benign |
|---|---|---|
| Malicious | TN (810) | FP (24) |
| Benign | FN (25) | TP (1396) |
Table 32.6 Results of 1D-CNN on Drebin-215 dataset

| Initial class | Estimated class: Malicious | Estimated class: Benign |
|---|---|---|
| Malicious | TN (804) | FP (30) |
| Benign | FN (15) | TP (1406) |
Table 32.7 Results of 2D-CNN on Drebin-215 dataset

| Initial class | Estimated class: Malicious | Estimated class: Benign |
|---|---|---|
| Malicious | TN (802) | FP (32) |
| Benign | FN (20) | TP (1401) |
On the Drebin-215 dataset, the 1D-CNN technique is again more successful than DNN and 2D-CNN. It mislabels 45 applications in total, whereas DNN makes 49 wrong predictions and 2D-CNN makes 52. Although the same feature groups are used in both datasets, the deep learning techniques make more classification errors on Drebin-215 than on Malgenome-215. The reason is that the applications in the Drebin-215 dataset are harder to discriminate: some malicious applications in this dataset appear benign, while some benign applications appear malicious.
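The comparison can be reproduced from the six confusion matrices. The short script below recomputes the number of misclassified applications and the accuracy for every model and dataset from the counts in Tables 32.2 to 32.7; the dictionary layout and function name are illustrative choices.

```python
# (TN, FP, FN, TP) as reported in Tables 32.2 to 32.7.
RESULTS = {
    "Malgenome-215": {
        "DNN": (188, 1, 2, 379),
        "1D-CNN": (188, 1, 0, 381),
        "2D-CNN": (187, 2, 1, 380),
    },
    "Drebin-215": {
        "DNN": (810, 24, 25, 1396),
        "1D-CNN": (804, 30, 15, 1406),
        "2D-CNN": (802, 32, 20, 1401),
    },
}


def summarize(tn, fp, fn, tp):
    """Return the number of wrong predictions and the accuracy."""
    total = tn + fp + fn + tp
    errors = fp + fn
    return errors, (total - errors) / total


for dataset, models in RESULTS.items():
    for model, counts in models.items():
        errors, accuracy = summarize(*counts)
        print(f"{dataset:14s} {model:7s} errors={errors:3d} accuracy={accuracy:.4f}")
    best = min(models, key=lambda name: summarize(*models[name])[0])
    print(f"{dataset}: fewest errors with {best}")
```

On both datasets the fewest errors are obtained with 1D-CNN (1 on Malgenome-215 and 45 on Drebin-215), in line with the discussion above.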
32.5 Conclusion and Future Works

Deep learning has been one of the most important areas of research in machine learning in recent years. In this study, several deep learning approaches are compared on Android malware detection. Malware is detected by giving feature vectors of static features to the DNN and 1D-CNN techniques. In addition, it is shown that malware can be detected with 2D-CNN by generating black-white images from the same features. 2D-CNN, which is very successful in image classification, also classifies the images obtained in this study with a high degree of accuracy. Since the number of features in the datasets is low, the operations are performed on very small images. Larger images can be obtained when the number of static features increases, which may improve classification performance. For this reason, future studies aim to create larger images by increasing the number of features. Furthermore, other deep learning techniques such as LSTM and RNN, as well as transfer learning approaches, will be used to compare model performance.
References

1. McAfee Mobile Threat Report. https://www.mcafee.com/en-us/consumer-support/2020-mobile-threat-report.html. Accessed 21 Aug 2021
2. Mobile Operating System Market Share Worldwide. https://gs.statcounter.com/os-market-share/mobile/worldwide. Accessed 21 Aug 2021
3. Malicious apps on Google Play dropped banking Trojans on user devices. https://www.zdnet.com/article/malicious-apps-on-google-play-dropped-banking-trojans-on-user-devices/. Accessed 21 Aug 2021
4. Naway A, Li Y (2018) A review on the use of deep learning in android malware detection. arXiv preprint arXiv:1812.10360
5. Wu DJ, Mao CH, Wei TE, Lee HM, Wu KP (2012) Droidmat: android malware detection through manifest and api calls tracing. In: 2012 seventh Asia joint conference on information security. IEEE, pp 62–69
6. Gascon H, Yamaguchi F, Arp D, Rieck K (2013) Structural detection of android malware using embedded call graphs. In: Proceedings of the 2013 ACM workshop on artificial intelligence and security, pp 45–54
7. Aafer Y, Du W, Yin H (2013) Droidapiminer: mining api-level features for robust malware detection in android. In: International conference on security and privacy in communication systems. Springer, pp 86–103
8. Wu WC, Hung SH (2014) Droiddolphin: a dynamic android malware detection framework using big data and machine learning. In: Proceedings of the 2014 conference on research in adaptive and convergent systems, pp 247–252
9. Bhandari S, Gupta R, Laxmi V, Gaur MS, Zemmari A, Anikeev M (2015) Draco: Droidanalyst combo an android malware analysis framework. In: Proceedings of the 8th international conference on security of information and networks, pp 283–289
10. Fereidooni H, Conti M, Yao D, Sperduti A (2016) Anastasia: android malware detection using static analysis of applications. In: 2016 8th IFIP international conference on new technologies, mobility and security (NTMS). IEEE, pp 1–5
11. Zhang H, Cole Y, Ge L, Wei S, Yu W, Lu C, Chen G, Shen D, Blasch E, Pham KD (2016) Scanme mobile: a cloud-based android malware analysis service. ACM SIGAPP Appl Comput Rev 16(1):36–49
12. Arshad S, Shah MA, Wahid A, Mehmood A, Song H, Yu H (2018) Samadroid: a novel 3-level hybrid malware detection model for android operating system. IEEE Access 6:4321–4339
13. Yerima SY, Sezer S (2018) Droidfusion: a novel multilevel classifier fusion approach for android malware detection. IEEE Trans Cybern 49(2):453–466
14. Kural OE, Şahin DÖ, Akleylek S, Kiliç E (2019) Permission weighting approaches in permission based android malware detection. In: 2019 4th international conference on computer science and engineering (UBMK). IEEE, pp 134–139
15. Bhattacharya A, Goswami RT, Mukherjee K (2019) A feature selection technique based on rough set and improvised pso algorithm (psors-fs) for permission based detection of android malwares. Int J Mach Learn Cybern 10(7):1893–1907
16. Şahin DÖ, Kural OE, Akleylek S, Kiliç E (2020) Comparison of regression methods in permission based android malware detection. In: 2020 28th signal processing and communications applications conference (SIU). IEEE, pp 1–4
17. Şahin DÖ, Kural OE, Akleylek S, Kiliç E (2021) A novel permission-based android malware detection system using feature selection based on linear regression. Neural Comput Appl 1–16
18. Şahin DÖ, Kural OE, Akleylek S, Kiliç E (2021) A novel android malware detection system: adaption of filter-based feature selection methods. J Ambient Intell Humanized Comput 1–15
19. Amin M, Tanveer TA, Tehseen M, Khan M, Khan FA, Anwar S (2020) Static malware detection and attribution in android byte-code through an end-to-end deep system. Futur Gener Comput Syst 102:112–126
20. Haq IU, Khan TA, Akhunzada A, Liu X (2021) Maldroid: secure dl-enabled intelligent malware detection framework. IET Commun
21. Alzaylaee MK, Yerima SY, Sezer S (2020) Dl-droid: deep learning based android malware detection using real devices. Comput Secur 89:101663
22. Shiqi L, Shengwei T, Long Y, Jiong Y, Hua S (2018) Android malicious code classification using deep belief network. KSII Trans Internet Inf Syst 12(1)
23. Ganesh M, Pednekar P, Prabhuswamy P, Nair DS, Park Y, Jeon H (2017) Cnn-based android malware detection. In: 2017 international conference on software security and assurance (ICSSA). IEEE, pp 60–65
24. Ding YX, Zhao WG, Wang ZP, Wang LF (2018) Automatically learning features of android apps using cnn. In: 2018 international conference on machine learning and cybernetics (ICMLC), vol 1. IEEE, pp 331–336
25. Xiao X (2019) An image-inspired and cnn-based android malware detection approach. In: 2019 34th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 1259–1261
26. Hsien-De Huang T, Kao HY (2018) R2-d2: color-inspired convolutional neural network (cnn)-based android malware detections. In: 2018 IEEE international conference on big data (BigData). IEEE, pp 2633–2642
27. Yapici MM, Tekerek A, Topaloğlu N (2019) Literature review of deep learning research areas. Gazi Mühendislik Bilimleri Dergisi (GMBD) 5(3):188–215
28. Qiu J, Zhang J, Luo W, Pan L, Nepal S, Xiang Y (2020) A survey of android malware detection with deep neural models. ACM Comput Surv (CSUR) 53(6):1–36
29. Malgenome-215: Android malware dataset for machine learning 1. https://figshare.com/articles/dataset/Android_malware_dataset_for_machine_learning_1/5854590/1. Accessed 21 Aug 2021
30. Drebin 215: Android malware dataset for machine learning 2. https://figshare.com/articles/dataset/Android_malware_dataset_for_machine_learning_2/5854653. Accessed 21 Aug 2021
31. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
32. Dargan S, Kumar M, Ayyagari MR, Kumar G (2020) A survey of deep learning and its applications: a new paradigm to machine learning. Arch Comput Methods Eng 27(4):1071–1092
33. Berman DS, Buczak AL, Chavis JS, Corbett CL (2019) A survey of deep learning methods for cyber security. Information 10(4):122
34. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
35. Alom MZ, Taha TM, Yakopcic C, Westberg S, Sidike P, Nasrin MS, Hasan M, Van Essen BC, Awwal AA, Asari VK (2019) A state-of-the-art survey on deep learning theory and architectures. Electronics 8(3):292
36. Kiranyaz S, Avci O, Abdeljaber O, Ince T, Gabbouj M, Inman DJ (2021) 1d convolutional neural networks and applications: a survey. Mech Syst Signal Process 151:107398
Chapter 33
Poisson Stability in Inertial Neural Networks

Marat Akhmet, Madina Tleubergenova, Roza Seilova, and Akylbek Zhamanshin
33.1 Introduction

Classical neural network models are described by first-order differential equations [1–3]. This article discusses inertial neural networks (INNs), which are described by second-order differential equations. Among the first to explore inertial neural networks were Babcock and Westervelt [4]. Later, Wheeler and Schieve [5] considered the Hopfield effective-neuron system with an added inertial term. Currently, many works are devoted to the study of INN dynamics, since INNs are widely used in applications [4, 6–8]. For example, the stability problem is studied for inertial bidirectional associative memory neural networks in [9–11], for inertial Cohen-Grossberg neural networks in [12, 13], and for inertial memristive neural networks in [14, 15].

In our previous papers [16, 17] and book [18], a new method for confirming Poisson stable oscillations was provided in the course of investigating unpredictable oscillations. Unpredictable motions are Poisson stable, so to test for unpredictability it is first necessary to check whether Poisson stability is satisfied. Our method is significantly different from that used in [19–22]. It is specifically focused on identifying Poisson stable functions, is easier to implement, and is ready to be applied to all those problems, in mathematics and in neuroscience, where oscillations must be confirmed. In the present paper, we study the Poisson stable motion of INNs, applying the method that was used in [16–18, 23–26].

M. Akhmet (B) · A. Zhamanshin
Department of Mathematics, Middle East Technical University, 06800 Ankara, Turkey
M. Tleubergenova · R. Seilova · A. Zhamanshin
Department of Mathematics, Aktobe Regional University, 030000 Aktobe, Kazakhstan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_33
33.2 Main Result
In this paper, we consider INNs of the form
$$\frac{d^2u_i(t)}{dt^2} = -d_i\frac{du_i(t)}{dt} - b_iu_i(t) + \sum_{j=1}^{p}c_{ij}g_j(u_j(t)) + h_i(t), \tag{33.1}$$
where $i = 1, \ldots, p$, $p$ is the number of neurons, $b_i > 0$ and $d_i > 0$ are constants, $u_i(t)$ is the state of the $i$th neuron and $h_i(t)$ is its input, $g_i$ is a continuous activation function, and $c_{ij}$ is a constant denoting the neuron connection weight. We will use the norm $\|v\|_1 = \sup_{t\in\mathbb{R}}\|v(t)\|$, where $\|v\| = \max_{1\le i\le p}|v_i|$, $v = (v_1, \ldots, v_p)$, $t, v_i \in \mathbb{R}$, $i = 1, \ldots, p$, $p \in \mathbb{N}$. Assume that $c_{ij} \in \mathbb{R}$ and that the activation function $g_i : \mathbb{R} \to \mathbb{R}$ satisfies the following assumption:
(C1) there exists a positive number $L$ such that $|g_i(u_1) - g_i(u_2)| \le L|u_1 - u_2|$ for all $u_1, u_2 \in \mathbb{R}$.
If one considers the transformation
$$v_i(t) = \xi_i\frac{du_i(t)}{dt} + \zeta_iu_i(t), \quad i = 1, \ldots, p, \tag{33.2}$$
then Eq. (33.1) can be written as the system
$$\begin{cases} \dfrac{du_i(t)}{dt} = -\dfrac{\zeta_i}{\xi_i}u_i(t) + \dfrac{1}{\xi_i}v_i(t),\\[2ex] \dfrac{dv_i(t)}{dt} = -\Big(d_i - \dfrac{\zeta_i}{\xi_i}\Big)v_i(t) - \Big(\xi_ib_i - \zeta_i\Big(d_i - \dfrac{\zeta_i}{\xi_i}\Big)\Big)u_i(t) + \xi_i\displaystyle\sum_{j=1}^{p}c_{ij}g_j(u_j(t)) + \xi_ih_i(t), \end{cases} \quad i = 1, \ldots, p. \tag{33.3}$$
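The passage from (33.1) to the first-order system (33.3) can be checked numerically at a single state. The following is a minimal sketch, not the authors' code: the function names and the sample values of $u$, $u'$ and $h$ are assumptions chosen for illustration. It computes $u_i''$ directly from (33.1) and again from (33.3) through the transformation (33.2), and returns the largest mismatch.

```python
import math

def inertial_rhs(u, du, d, b, c, g, h_t):
    """Second derivative u_i'' given by (33.1) for state u and velocity du."""
    p = len(u)
    return [
        -d[i] * du[i] - b[i] * u[i]
        + sum(c[i][j] * g(u[j]) for j in range(p)) + h_t[i]
        for i in range(p)
    ]

def first_order_rhs(u, v, d, b, c, g, h_t, xi, zeta):
    """Right-hand side (du/dt, dv/dt) of the first-order system (33.3)."""
    p = len(u)
    du = [(-zeta[i] * u[i] + v[i]) / xi[i] for i in range(p)]
    dv = []
    for i in range(p):
        k = d[i] - zeta[i] / xi[i]
        coupling = sum(c[i][j] * g(u[j]) for j in range(p))
        dv.append(-k * v[i] - (xi[i] * b[i] - zeta[i] * k) * u[i]
                  + xi[i] * coupling + xi[i] * h_t[i])
    return du, dv

def consistency_gap(u, du, d, b, c, g, h_t, xi, zeta):
    """Largest mismatch between u'' from (33.1) and u'' recovered from (33.3)."""
    p = len(u)
    # transformation (33.2): v = xi * u' + zeta * u
    v = [xi[i] * du[i] + zeta[i] * u[i] for i in range(p)]
    du1, dv1 = first_order_rhs(u, v, d, b, c, g, h_t, xi, zeta)
    direct = inertial_rhs(u, du, d, b, c, g, h_t)
    # differentiating (33.2): v' = xi * u'' + zeta * u', so u'' = (v' - zeta * u') / xi
    recovered = [(dv1[i] - zeta[i] * du1[i]) / xi[i] for i in range(p)]
    return max(abs(x - y) for x, y in zip(direct, recovered))

# Illustrative values only (assumed sample state and input at one time instant).
sample_gap = consistency_gap(
    u=[0.523, 0.361, 0.375], du=[0.1, -0.2, 0.3],
    d=[5.0, 6.0, 7.0], b=[4.0, 5.0, 8.0],
    c=[[0.04, 0.02, 0.03], [0.02, 0.04, 0.02], [0.05, 0.06, 0.05]],
    g=lambda s: 0.45 * math.atan(s), h_t=[2.0, 1.0, -4.0],
    xi=[0.8, 1.0, 3.0], zeta=[3.2, 2.5, 4.4],
)
```

The mismatch should vanish up to rounding for any $\xi_i \neq 0$, since the transformation is purely algebraic.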
According to the results in [27], a pair of bounded functions $u(t) = (u_1(t), \ldots, u_p(t))$, $v(t) = (v_1(t), \ldots, v_p(t))$ satisfies (33.3) if and only if it is a solution of the following integral equation:
$$\begin{cases} u_i(t) = \dfrac{1}{\xi_i}\displaystyle\int_{-\infty}^{t}e^{-\frac{\zeta_i}{\xi_i}(t-s)}v_i(s)\,ds,\\[2ex] v_i(t) = \displaystyle\int_{-\infty}^{t}e^{-(d_i-\frac{\zeta_i}{\xi_i})(t-s)}\Big[\Big(\zeta_i\Big(d_i-\dfrac{\zeta_i}{\xi_i}\Big)-\xi_ib_i\Big)u_i(s) + \xi_i\sum_{j=1}^{p}c_{ij}g_j(u_j(s)) + \xi_ih_i(s)\Big]ds, \end{cases} \tag{33.4}$$
where $i = 1, \ldots, p$. In what follows, we shall focus on the integral equation (33.4).
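The equivalence between bounded solutions of the differential system and of the integral equation follows the usual variation-of-constants argument. As a brief worked step for the first component, assuming $\zeta_i/\xi_i > 0$ (the chapter's later conditions on $\zeta_i$ and $\xi_i$ make both positive) and that $u_i$ is bounded on $\mathbb{R}$:

```latex
\begin{align*}
\frac{d}{ds}\Big(e^{\frac{\zeta_i}{\xi_i}s}u_i(s)\Big)
  &= e^{\frac{\zeta_i}{\xi_i}s}\Big(u_i'(s)+\frac{\zeta_i}{\xi_i}u_i(s)\Big)
   = \frac{1}{\xi_i}e^{\frac{\zeta_i}{\xi_i}s}v_i(s),\\
u_i(t) &= e^{-\frac{\zeta_i}{\xi_i}(t-r)}u_i(r)
   + \frac{1}{\xi_i}\int_{r}^{t}e^{-\frac{\zeta_i}{\xi_i}(t-s)}v_i(s)\,ds
   \quad\text{(integrating over } [r,t]\text{)},\\
u_i(t) &= \frac{1}{\xi_i}\int_{-\infty}^{t}e^{-\frac{\zeta_i}{\xi_i}(t-s)}v_i(s)\,ds
   \quad\text{(letting } r\to-\infty\text{)}.
\end{align*}
```

The boundary term disappears in the limit because $u_i$ is bounded and the exponential factor decays. The second component is handled in the same way with the exponent $d_i - \zeta_i/\xi_i$, which must likewise be positive.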
Let $P$ be the Banach space of functions $\varphi(t) = (\varphi_1, \ldots, \varphi_{2p})$ such that:
(L1) $\varphi(t)$ is uniformly continuous, and $\|\varphi\|_1 < H$, where $H$ is a positive number;
(L2) there exists a sequence $\{t_n\}$, $t_n \to \infty$ as $n \to \infty$, such that $\varphi(t + t_n) \to \varphi(t)$ uniformly on any bounded interval.
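We introduce an operator $\Pi$ on the space $P$, $\Pi\varphi(t) = (\Pi_1\varphi_1(t), \Pi_2\varphi_2(t), \ldots, \Pi_{2p}\varphi_{2p}(t))$, $\varphi(t) \in P$, where
$$\Pi_i\varphi_i(t) = \begin{cases} \dfrac{1}{\xi_i}\displaystyle\int_{-\infty}^{t}e^{-\frac{\zeta_i}{\xi_i}(t-s)}\varphi_{i+p}(s)\,ds, & i = 1, \ldots, p,\\[2ex] \displaystyle\int_{-\infty}^{t}e^{-(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}})(t-s)}\Big[\Big(\zeta_{i-p}\Big(d_{i-p}-\dfrac{\zeta_{i-p}}{\xi_{i-p}}\Big)-\xi_{i-p}b_{i-p}\Big)\varphi_{i-p}(s)\\ \qquad + \xi_{i-p}\displaystyle\sum_{j=1}^{p}c_{(i-p)j}g_j(\varphi_j(s)) + \xi_{i-p}h_{i-p}(s)\Big]ds, & i = p+1, \ldots, 2p. \end{cases} \tag{33.5}$$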
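In the following, we will need the additional conditions:
(C2) the external inputs $h_i(t)$, $i = 1, \ldots, p$, in INNs (33.1) are Poisson stable with common sequence $t_n$, and they belong to $P$;
(C3) there exists a number $M_g > 0$ which satisfies $|g_i(s)| \le M_g$, $i = 1, 2, \ldots, p$, $|s| < H$;
(C4) $d_i > \dfrac{\zeta_i}{\xi_i} + \xi_i$, $\zeta_i > \xi_i > 1$, $i = 1, \ldots, p$;
(C5) $\Big(d_i - \dfrac{\zeta_i}{\xi_i}\Big) - \Big(\Big|\zeta_i\Big(d_i - \dfrac{\zeta_i}{\xi_i}\Big) - \xi_ib_i\Big| + \xi_i\Big) > 0$ for each $i = 1, \ldots, p$;
(C6) $\dfrac{\xi_iM_g\sum_{j=1}^{p}c_{ij}}{\big(d_i - \frac{\zeta_i}{\xi_i}\big) - \big(\big|\zeta_i\big(d_i - \frac{\zeta_i}{\xi_i}\big) - \xi_ib_i\big| + \xi_i\big)} < H$, $i = 1, \ldots, p$;
(C7) $\dfrac{1}{d_i - \frac{\zeta_i}{\xi_i}}\Big[\Big|\zeta_i\Big(d_i - \dfrac{\zeta_i}{\xi_i}\Big) - \xi_ib_i\Big| + L\xi_i\sum_{j=1}^{p}c_{ij}\Big] < 1$, $i = 1, \ldots, p$;
(C8) $\max_i\Big(\dfrac{1}{\xi_i},\ \Big|\zeta_i\Big(d_i - \dfrac{\zeta_i}{\xi_i}\Big) - \xi_ib_i\Big| + L\xi_i\sum_{j=1}^{p}c_{ij}\Big) < \min_i\Big(\dfrac{\zeta_i}{\xi_i},\ d_i - \dfrac{\zeta_i}{\xi_i}\Big)$, $i = 1, \ldots, p$.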
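Conditions (C4)–(C8) are plain inequalities in the parameters, so they can be evaluated mechanically for a candidate parameter set. The sketch below is an illustration under stated assumptions: the function name `check_conditions` is hypothetical, the inequalities are coded in the form in which they are used in the proofs of Lemma 33.1 and Theorem 33.1, the sums over $c_{ij}$ are taken as written (without absolute values), and the one-neuron parameter set is invented for the demonstration.

```python
def check_conditions(d, b, c, xi, zeta, L, M_g, H):
    """Evaluate conditions (C4)-(C8) for all neurons; returns a dict of booleans."""
    p = len(d)
    k = [d[i] - zeta[i] / xi[i] for i in range(p)]               # d_i - zeta_i/xi_i
    a = [abs(zeta[i] * k[i] - xi[i] * b[i]) for i in range(p)]   # |zeta_i(d_i - zeta_i/xi_i) - xi_i b_i|
    row = [sum(c[i]) for i in range(p)]                          # sum_j c_ij, as written in the text
    margin = [k[i] - (a[i] + xi[i]) for i in range(p)]           # left-hand side of (C5)

    c4 = all(d[i] > zeta[i] / xi[i] + xi[i] and zeta[i] > xi[i] > 1 for i in range(p))
    c5 = all(m > 0 for m in margin)
    c6 = c5 and all(xi[i] * M_g * row[i] / margin[i] < H for i in range(p))
    c7 = all(k[i] > 0 and (a[i] + L * xi[i] * row[i]) / k[i] < 1 for i in range(p))
    lhs8 = max(max(1 / xi[i], a[i] + L * xi[i] * row[i]) for i in range(p))
    rhs8 = min(min(zeta[i] / xi[i], k[i]) for i in range(p))
    return {"C4": c4, "C5": c5, "C6": c6, "C7": c7, "C8": lhs8 < rhs8}

# Hypothetical one-neuron parameter set chosen so that every inequality holds.
ok = check_conditions(d=[6.0], b=[6.75], c=[[0.1]], xi=[2.0], zeta=[3.0],
                      L=0.5, M_g=1.0, H=1.0)
# Lowering b breaks (C5): the term |zeta(d - zeta/xi) - xi*b| becomes too large.
bad = check_conditions(d=[6.0], b=[1.0], c=[[0.1]], xi=[2.0], zeta=[3.0],
                       L=0.5, M_g=1.0, H=1.0)
```

A check of this kind only confirms the algebraic inequalities. Conditions (C1)–(C3), which concern the activation functions and the Poisson stability of the inputs, have to be verified separately.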
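Lemma 33.1 $\Pi$ is a contraction operator.
Proof For $\varphi$ and $\psi$ belonging to $P$, we have for $i = 1, \ldots, p$ that
$$|\Pi_i\varphi_i(t) - \Pi_i\psi_i(t)| = \Big|\frac{1}{\xi_i}\int_{-\infty}^{t}e^{-\frac{\zeta_i}{\xi_i}(t-s)}(\varphi_{i+p}(s) - \psi_{i+p}(s))\,ds\Big| \le \frac{1}{\zeta_i}\|\varphi - \psi\|_1,$$
and for $i = p+1, \ldots, 2p$ that
$$\begin{aligned} |\Pi_i\varphi_i(t) - \Pi_i\psi_i(t)| &= \Big|\int_{-\infty}^{t}e^{-(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}})(t-s)}\Big[\Big(\zeta_{i-p}\Big(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}}\Big)-\xi_{i-p}b_{i-p}\Big)(\varphi_{i-p}(s) - \psi_{i-p}(s))\\ &\qquad + \xi_{i-p}\sum_{j=1}^{p}c_{(i-p)j}\big(g_j(\varphi_j(s)) - g_j(\psi_j(s))\big)\Big]ds\Big|\\ &\le \frac{1}{d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}}}\Big[\Big|\zeta_{i-p}\Big(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}}\Big)-\xi_{i-p}b_{i-p}\Big| + L\xi_{i-p}\sum_{j=1}^{p}c_{(i-p)j}\Big]\|\varphi - \psi\|_1. \end{aligned}$$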
The last inequality yields
$$\|\Pi\varphi - \Pi\psi\|_1 \le \max_i\Big(\frac{1}{\zeta_i},\ \frac{1}{d_i - \frac{\zeta_i}{\xi_i}}\Big[\Big|\zeta_i\Big(d_i - \frac{\zeta_i}{\xi_i}\Big) - \xi_ib_i\Big| + L\xi_i\sum_{j=1}^{p}c_{ij}\Big]\Big)\|\varphi - \psi\|_1.$$
According to conditions (C4), (C7) we can deduce that the operator $\Pi$ is contractive. □

Theorem 33.1 Suppose that conditions (C1)–(C8) hold. Then the model (33.1) has a unique Poisson stable oscillation that possesses the asymptotic property.
Proof Firstly, we will show that $\Pi(P) \subseteq P$. For fixed $\varphi(t) \in P$ and $i = 1, \ldots, p$, we obtain that
$$|\Pi_i\varphi_i(t)| = \Big|\frac{1}{\xi_i}\int_{-\infty}^{t}e^{-\frac{\zeta_i}{\xi_i}(t-s)}\varphi_{i+p}(s)\,ds\Big| \le \frac{1}{\zeta_i}\|\varphi\|_1 \le \frac{H}{\zeta_i},$$
and for $i = p+1, \ldots, 2p$ that
$$\begin{aligned} |\Pi_i\varphi_i(t)| &= \Big|\int_{-\infty}^{t}e^{-(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}})(t-s)}\Big[\Big(\zeta_{i-p}\Big(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}}\Big)-\xi_{i-p}b_{i-p}\Big)\varphi_{i-p}(s) + \xi_{i-p}\sum_{j=1}^{p}c_{(i-p)j}g_j(\varphi_j(s)) + \xi_{i-p}h_{i-p}(s)\Big]ds\Big|\\ &\le \int_{-\infty}^{t}e^{-(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}})(t-s)}\Big[\Big|\zeta_{i-p}\Big(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}}\Big)-\xi_{i-p}b_{i-p}\Big|H + \xi_{i-p}\sum_{j=1}^{p}c_{(i-p)j}M_g + \xi_{i-p}H\Big]ds\\ &\le \frac{1}{d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}}}\Big[H\Big(\Big|\zeta_{i-p}\Big(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}}\Big)-\xi_{i-p}b_{i-p}\Big| + \xi_{i-p}\Big) + \xi_{i-p}M_g\sum_{j=1}^{p}c_{(i-p)j}\Big]. \end{aligned}$$
Since $\zeta_i > 1$ by (C4), these estimates together with conditions (C5), (C6) imply that $|\Pi_i\varphi_i(t)| < H$ for each $i = 1, \ldots, 2p$, so that $\|\Pi\varphi\|_1 = \max_i|\Pi_i\varphi_i| < H$. The boundedness of the derivative of $\Pi\varphi(t)$ implies uniform continuity. Thereby, condition (L1) is satisfied.
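Next, for a fixed number $\varepsilon > 0$ and an interval $[a, b]$, $-\infty < a < b < \infty$, we will prove that $\|\Pi\varphi(t + t_n) - \Pi\varphi(t)\| < \varepsilon$ on $[a, b]$ for sufficiently large $n$. One can find that
$$|\Pi_i\varphi_i(t + t_n) - \Pi_i\varphi_i(t)| = \begin{cases} \Big|\dfrac{1}{\xi_i}\displaystyle\int_{-\infty}^{t}e^{-\frac{\zeta_i}{\xi_i}(t-s)}(\varphi_{i+p}(s + t_n) - \varphi_{i+p}(s))\,ds\Big|, & i = 1, \ldots, p,\\[2ex] \Big|\displaystyle\int_{-\infty}^{t}e^{-(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}})(t-s)}\Big[\Big(\zeta_{i-p}\Big(d_{i-p}-\dfrac{\zeta_{i-p}}{\xi_{i-p}}\Big)-\xi_{i-p}b_{i-p}\Big)(\varphi_{i-p}(s + t_n) - \varphi_{i-p}(s))\\ \quad + \xi_{i-p}\displaystyle\sum_{j=1}^{p}c_{(i-p)j}\big(g_j(\varphi_j(s + t_n)) - g_j(\varphi_j(s))\big) + \xi_{i-p}(h_{i-p}(s + t_n) - h_{i-p}(s))\Big]ds\Big|, & i = p+1, \ldots, 2p. \end{cases}$$
Numbers $c < a$ and $\xi > 0$ can be chosen so that the following inequalities hold for $i = 1, \ldots, p$:
$$\frac{2H}{\zeta_i}e^{-\frac{\zeta_i}{\xi_i}(a-c)} < \frac{\varepsilon}{2}, \tag{33.6}$$
$$\frac{2H}{d_i - \frac{\zeta_i}{\xi_i}}\Big(\Big|\zeta_i\Big(d_i - \frac{\zeta_i}{\xi_i}\Big) - \xi_ib_i\Big| + L\xi_i\sum_{j=1}^{p}c_{ij} + \xi_i\Big)e^{-(d_i-\frac{\zeta_i}{\xi_i})(a-c)} < \frac{\varepsilon}{2}, \tag{33.7}$$
$$\frac{\xi}{\zeta_i} < \frac{\varepsilon}{2}, \tag{33.8}$$
$$\frac{\xi}{d_i - \frac{\zeta_i}{\xi_i}}\Big(\Big|\zeta_i\Big(d_i - \frac{\zeta_i}{\xi_i}\Big) - \xi_ib_i\Big| + L\xi_i\sum_{j=1}^{p}c_{ij} + \xi_i\Big) < \frac{\varepsilon}{2}. \tag{33.9}$$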
For a sufficiently large number $n$ we can attain that $|\varphi_i(t + t_n) - \varphi_i(t)| < \xi$, $i = 1, \ldots, 2p$, and $|h_i(t + t_n) - h_i(t)| < \xi$, $i = 1, \ldots, p$, on $[c, b]$. Then, for all $t \in [a, b]$, it is true that
$$\begin{aligned} |\Pi_i\varphi_i(t + t_n) - \Pi_i\varphi_i(t)| &\le \Big|\frac{1}{\xi_i}\int_{-\infty}^{c}e^{-\frac{\zeta_i}{\xi_i}(t-s)}(\varphi_{i+p}(s + t_n) - \varphi_{i+p}(s))\,ds\Big| + \Big|\frac{1}{\xi_i}\int_{c}^{t}e^{-\frac{\zeta_i}{\xi_i}(t-s)}(\varphi_{i+p}(s + t_n) - \varphi_{i+p}(s))\,ds\Big|\\ &\le \frac{2H}{\zeta_i}e^{-\frac{\zeta_i}{\xi_i}(a-c)} + \frac{\xi}{\zeta_i}, \quad i = 1, \ldots, p, \end{aligned} \tag{33.10}$$
$$\begin{aligned} |\Pi_i\varphi_i(t + t_n) - \Pi_i\varphi_i(t)| &\le \Big|\int_{-\infty}^{c}e^{-(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}})(t-s)}\Big[\Big(\zeta_{i-p}\Big(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}}\Big)-\xi_{i-p}b_{i-p}\Big)(\varphi_{i-p}(s + t_n) - \varphi_{i-p}(s))\\ &\qquad + \xi_{i-p}\sum_{j=1}^{p}c_{(i-p)j}\big(g_j(\varphi_j(s + t_n)) - g_j(\varphi_j(s))\big) + \xi_{i-p}(h_{i-p}(s + t_n) - h_{i-p}(s))\Big]ds\Big|\\ &\quad + \Big|\int_{c}^{t}e^{-(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}})(t-s)}\Big[\Big(\zeta_{i-p}\Big(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}}\Big)-\xi_{i-p}b_{i-p}\Big)(\varphi_{i-p}(s + t_n) - \varphi_{i-p}(s))\\ &\qquad + \xi_{i-p}\sum_{j=1}^{p}c_{(i-p)j}\big(g_j(\varphi_j(s + t_n)) - g_j(\varphi_j(s))\big) + \xi_{i-p}(h_{i-p}(s + t_n) - h_{i-p}(s))\Big]ds\Big|\\ &\le \frac{1}{d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}}}\Big(2H\Big|\zeta_{i-p}\Big(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}}\Big)-\xi_{i-p}b_{i-p}\Big| + 2LH\xi_{i-p}\sum_{j=1}^{p}c_{(i-p)j} + 2H\xi_{i-p}\Big)e^{-(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}})(a-c)}\\ &\quad + \frac{1}{d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}}}\Big(\Big|\zeta_{i-p}\Big(d_{i-p}-\frac{\zeta_{i-p}}{\xi_{i-p}}\Big)-\xi_{i-p}b_{i-p}\Big|\xi + L\xi_{i-p}\sum_{j=1}^{p}c_{(i-p)j}\xi + \xi\xi_{i-p}\Big), \quad i = p+1, \ldots, 2p. \end{aligned} \tag{33.11}$$
Now inequalities (33.6)–(33.9) imply that $\|\Pi\varphi(t + t_n) - \Pi\varphi(t)\| < \varepsilon$ for $t \in [a, b]$. Since $\varepsilon$ is arbitrarily small, the assumption (L2) is true. Thus, the space $P$ is invariant under the operator $\Pi$. Lemma 33.1 then implies the existence of a unique Poisson stable oscillation, $\omega(t)$, for (33.1).
In the remaining part of the proof, we will show the stability of $\omega(t)$. Let us define the $2p$-dimensional function $w(t) = (u_1(t), \ldots, u_p(t), v_1(t), \ldots, v_p(t))$, and rewrite the INNs (33.3) as
$$\frac{dw}{dt} = Bw + G(t, w), \tag{33.12}$$
where $B = \mathrm{diag}\Big\{-\dfrac{\zeta_1}{\xi_1}, \ldots, -\dfrac{\zeta_p}{\xi_p}, -\Big(d_1 - \dfrac{\zeta_1}{\xi_1}\Big), \ldots, -\Big(d_p - \dfrac{\zeta_p}{\xi_p}\Big)\Big\}$ is a diagonal matrix and $G(t, w) = (G_1(t, w), G_2(t, w), \ldots, G_{2p}(t, w))$ is a vector-function such that
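$$G_i(t, w) = \begin{cases} \dfrac{1}{\xi_i}w_{i+p}(t), & i = 1, \ldots, p,\\[2ex] -\Big(\xi_{i-p}b_{i-p} - \zeta_{i-p}\Big(d_{i-p} - \dfrac{\zeta_{i-p}}{\xi_{i-p}}\Big)\Big)w_{i-p}(t) + \xi_{i-p}\displaystyle\sum_{j=1}^{p}c_{(i-p)j}g_j(w_j) + \xi_{i-p}h_{i-p}(t), & i = p+1, \ldots, 2p. \end{cases}$$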
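It is true that
$$\omega(t) = e^{B(t-t_0)}\omega(t_0) + \int_{t_0}^{t}e^{B(t-s)}G(s, \omega(s))\,ds.$$
Let $\bar\omega(t) = (\bar\omega_1, \bar\omega_2, \ldots, \bar\omega_{2p})$ be another oscillation of (33.1). We have that
$$\bar\omega(t) = e^{B(t-t_0)}\bar\omega(t_0) + \int_{t_0}^{t}e^{B(t-s)}G(s, \bar\omega(s))\,ds.$$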
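We denote $\lambda = \min_i\Big(\dfrac{\zeta_i}{\xi_i};\ d_i - \dfrac{\zeta_i}{\xi_i}\Big)$ and $L_G = \max_i\Big(\dfrac{1}{\xi_i};\ \Big|-\xi_ib_i + \zeta_i\Big(d_i - \dfrac{\zeta_i}{\xi_i}\Big)\Big| + \xi_iL\sum_{j=1}^{p}c_{ij}\Big)$, $i = 1, 2, \ldots, p$. Then
$$\|\omega(t) - \bar\omega(t)\| \le e^{-\lambda(t-t_0)}\|\omega(t_0) - \bar\omega(t_0)\| + \int_{t_0}^{t}e^{-\lambda(t-s)}L_G\|\omega(s) - \bar\omega(s)\|\,ds, \quad t \ge t_0.$$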
By using the Gronwall-Bellman Lemma, one can find
$$\|\omega(t) - \bar\omega(t)\| \le \|\omega(t_0) - \bar\omega(t_0)\|e^{(L_G-\lambda)(t-t_0)}, \quad t \ge t_0. \tag{33.13}$$
Consequently, $\|\omega(t) - \bar\omega(t)\| \to 0$ as $t \to \infty$, in accordance with condition (C8), and the oscillation $\omega(t)$ possesses the asymptotic property. □
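The estimate (33.13) gives an explicit exponential envelope once $\lambda$ and $L_G$ are known. The following sketch computes both constants from the parameters and evaluates the envelope. The function names are hypothetical, and the one-neuron parameter values are invented for illustration; they are not taken from the numerical example of the chapter.

```python
import math

def decay_constants(d, b, c, xi, zeta, L):
    """Return (lambda, L_G) as defined before estimate (33.13)."""
    p = len(d)
    k = [d[i] - zeta[i] / xi[i] for i in range(p)]
    lam = min(min(zeta[i] / xi[i], k[i]) for i in range(p))
    L_G = max(
        max(1 / xi[i],
            abs(-xi[i] * b[i] + zeta[i] * k[i]) + xi[i] * L * sum(c[i]))
        for i in range(p)
    )
    return lam, L_G

def gronwall_envelope(gap0, lam, L_G, elapsed):
    """Upper bound gap0 * exp((L_G - lambda) * elapsed) from (33.13)."""
    return gap0 * math.exp((L_G - lam) * elapsed)

# Hypothetical one-neuron parameters for which L_G < lambda, i.e. (C8) holds.
lam, L_G = decay_constants(d=[6.0], b=[6.75], c=[[0.1]], xi=[2.0], zeta=[3.0], L=0.5)
bound_after_2 = gronwall_envelope(1.0, lam, L_G, 2.0)
```

The envelope decays only when $L_G < \lambda$, which is exactly what condition (C8) requires. When $L_G \ge \lambda$ the estimate still holds but no longer implies convergence.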
33.3 Numerical Example
Consider the INNs
$$\frac{d^2u_i(t)}{dt^2} = -d_i\frac{du_i(t)}{dt} - b_iu_i(t) + \sum_{j=1}^{3}c_{ij}g_j(u_j(t)) + h_i(t), \tag{33.14}$$
Fig. 33.1 Coordinates of the oscillation ω(t), which asymptotically converge to the coordinates of the Poisson stable oscillation of the inertial neural network (33.14)
where $i = 1, 2, 3$, $d_1 = 5$, $d_2 = 6$, $d_3 = 7$, $b_1 = 4$, $b_2 = 5$, $b_3 = 8$, the activation is $f(u) = 0.45\arctan(u)$ for each $g_j$,
$$\begin{pmatrix} c_{11} & c_{12} & c_{13}\\ c_{21} & c_{22} & c_{23}\\ c_{31} & c_{32} & c_{33} \end{pmatrix} = \begin{pmatrix} 0.04 & 0.02 & 0.03\\ 0.02 & 0.04 & 0.02\\ 0.05 & 0.06 & 0.05 \end{pmatrix},$$
and $h_1(t) = -26\,\Theta^3(t) + 2$, $h_2(t) = 55\,\Theta^3(t) + 1$, $h_3(t) = 23\,\Theta(t) - 4$. The function $h(t) = (h_1(t), h_2(t), h_3(t))$ is Poisson stable [16, 28]. The assumptions (C1)–(C8) are satisfied for INNs (33.14) with $\zeta_1 = 3.2$, $\zeta_2 = 2.5$, $\zeta_3 = 4.4$, $\xi_1 = 0.8$, $\xi_2 = 1$, $\xi_3 = 3$, $L = 0.45$, $M_g = 0.71$, $H = 2.2$. Therefore, INNs (33.14) admit a unique Poisson stable oscillation, $\phi(t)$, according to Theorem 33.1. The oscillation is defined on the whole real axis, so its exact initial value cannot be specified. However, due to the asymptotic property, any solution $\omega(t)$ tends to the Poisson stable oscillation $\phi(t)$. Figures 33.1 and 33.2 show graphs of the solution $\omega(t)$ with initial values $\omega_1(0) = 0.523$, $\omega_2(0) = 0.361$, $\omega_3(0) = 0.375$, which is attracted by $\phi(t)$.
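The attraction of solutions to one another can be illustrated by integrating (33.14) from two different initial states and watching the gap between them shrink. The sketch below is only an illustration under loud assumptions: the input `h` is a hypothetical bounded stand-in built from sines and cosines, not the Poisson stable input of the chapter; the initial velocities are set to zero; the second initial state is invented; and a fixed-step fourth-order Runge-Kutta scheme is used. The coefficients $d_i$, $b_i$, $c_{ij}$ and the activation are those given in the text.

```python
import math

D = [5.0, 6.0, 7.0]
B = [4.0, 5.0, 8.0]
C = [[0.04, 0.02, 0.03],
     [0.02, 0.04, 0.02],
     [0.05, 0.06, 0.05]]

def g(s):
    """Activation 0.45 * arctan(u) from the example."""
    return 0.45 * math.atan(s)

def h(t):
    """HYPOTHETICAL stand-in input (assumption), not the chapter's h(t)."""
    return [2.0 + math.sin(t), 1.0 + math.cos(0.7 * t), -4.0 + math.sin(1.3 * t)]

def rhs(t, y):
    """y = (u1, u2, u3, u1', u2', u3'); returns dy/dt for (33.14)."""
    u, du = y[:3], y[3:]
    ht = h(t)
    ddu = [
        -D[i] * du[i] - B[i] * u[i]
        + sum(C[i][j] * g(u[j]) for j in range(3)) + ht[i]
        for i in range(3)
    ]
    return list(du) + ddu

def rk4_step(t, y, dt):
    """One classical Runge-Kutta step of size dt."""
    n = len(y)
    k1 = rhs(t, y)
    k2 = rhs(t + dt / 2, [y[i] + dt / 2 * k1[i] for i in range(n)])
    k3 = rhs(t + dt / 2, [y[i] + dt / 2 * k2[i] for i in range(n)])
    k4 = rhs(t + dt, [y[i] + dt * k3[i] for i in range(n)])
    return [y[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(n)]

def simulate(y0, t_end=12.0, dt=0.01):
    """Integrate from t = 0 to t_end and return the final state."""
    t, y = 0.0, list(y0)
    for _ in range(round(t_end / dt)):
        y = rk4_step(t, y, dt)
        t += dt
    return y

def gap(y, z):
    """Largest difference between the u-components of two states."""
    return max(abs(y[i] - z[i]) for i in range(3))

y0 = [0.523, 0.361, 0.375, 0.0, 0.0, 0.0]  # u(0) from the text; u'(0) = 0 is assumed
z0 = [-0.5, 1.0, -1.0, 0.0, 0.0, 0.0]      # second start, invented for comparison
initial_gap = gap(y0, z0)
final_gap = gap(simulate(y0), simulate(z0))
```

Because both runs use the same input, the shrinking gap reflects the contraction of the dynamics rather than any property of the stand-in input. Reproducing Figs. 33.1 and 33.2 would require the actual Poisson stable input from [16, 28].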
Fig. 33.2 The trajectory of the oscillation ω(t) of the inertial neural network (33.14), which approaches the Poisson stable motion
Acknowledgements MA and AZ are supported by 2247-A National Leading Researchers Program of TUBITAK, Turkey, N 120C138. MT and RS are supported by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (grant No. AP09258737 and No. AP08856170).
References
1. Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA 79:2554–2558
2. Bouzerdoum A, Pinter R (1993) Shunting inhibitory cellular neural networks: derivation and stability analysis. IEEE Trans Circ Syst I Fundam Theory Appl 40:215–221
3. Cohen MA, Grossberg S (1983) Absolute stability and global pattern formation and parallel memory storage by competitive neural networks. IEEE Trans Syst Man Cybern SMC-13 7:815–821
4. Babcock KL, Westervelt RM (1986) Stability and dynamics of simple electronic neural networks with added inertia. Phys D Nonlinear Phenom 23(1):464–469
5. Wheeler D, Schieve W (1997) Stability and chaos in an inertial two-neuron system. Phys D 105:267–284
6. Mauro A, Conti F, Dodge F, Schor R (1970) Subthreshold behavior and phenomenological impedance of the squid giant axon. J Gen Physiol 55:497–523
7. Koch C (1984) Cable theory in neurons with active, linearized membrane. Biol Cybern 50:15–33
8. Angelaki DE, Correia MJ (1991) Models of membrane resonance in pigeon semicircular canal type II hair cells. Biol Cybern 65:1–10
9. Ke YQ, Miao CF (2013) Stability and existence of periodic solutions in inertial BAM neural networks with time delay. Neural Comput Appl 23(3–4):1089–1099
10. Qi J, Li C, Huang T (2015) Stability of inertial BAM neural network with time-varying delay via impulsive control. Neurocomputing 161:162–167
11. Zhang Z, Quan Z (2015) Global exponential stability via inequality technique for inertial BAM neural networks with time delays. Neurocomputing 151:1316–1326
12. Ke YQ, Miao CF (2013) Stability analysis of inertial Cohen-Grossberg-type neural networks with time delays. Neurocomputing 117:196–205
13. Yu S, Zhang Z, Quan Z (2015) New global exponential stability conditions for inertial Cohen-Grossberg neural networks with time delays. Neurocomputing 151:1446–1454
14. Rakkiyappan R, Gayathri D, Velmurugan G, Cao J (2019) Exponential synchronization of inertial memristor-based neural networks with time delay using average impulsive interval approach. Neural Process Lett 50(3):2053–2071
15. Qin S, Gu L, Pan X (2020) Exponential stability of periodic solution for a memristor-based inertial neural network with time delays. Neural Comput Appl 32:3265–3281
16. Akhmet M, Fen MO, Tleubergenova M, Zhamanshin A (2019) Poincare chaos for a hyperbolic quasilinear system. Miskolc Math Notes 20(1):33–44
17. Akhmet M, Tleubergenova M, Zhamanshin A (2020) Quasilinear differential equations with strongly unpredictable solutions. Carpathian J Math 36:341–349
18. Akhmet M (2021) Domain structured dynamics: unpredictability, chaos, randomness, fractals, differential equations and neural networks. IOP Publishing
19. Shcherbakov BA (1962) Classification of Poisson-stable motions. Pseudo-recurrent motions. Dokl Akad Nauk SSSR 146:322–324 (Russian)
20. Shcherbakov BA (1969) Poisson stable solutions of differential equations, and topological dynamics. Differ Uravn 5:2144–2155 (Russian)
21. Cheban D, Liu Zh (2020) Periodic, quasi-periodic, almost periodic, almost automorphic, Birkhoff recurrent and Poisson stable solutions for stochastic differential equations. J Differ Equ 269:3652–3685
22. Cheban D, Liu Zh (2019) Poisson stable motions of monotone nonautonomous dynamical systems. Sci China Math 62(7):1391–1418
23. Akhmet M, Tleubergenova M, Fen MO, Nugayeva Z (2020) Unpredictable solutions of linear impulsive systems. Mathematics 8:1798
24. Akhmet M, Seilova RD, Tleubergenova M, Zhamanshin A (2020) Shunting inhibitory cellular neural networks with strongly unpredictable oscillations. Commun Nonlinear Sci Numer Simul 89:105287
25. Akhmet M, Tleubergenova M, Nugayeva Z (2020) Strongly unpredictable oscillations of Hopfield-type neural networks. Mathematics 8:1978
26. Akhmet M, Tleubergenova M, Zhamanshin A (2021) Modulo periodic Poisson stable solutions of quasilinear differential equations. Entropy 23(11):1535
27. Driver RD (2012) Ordinary and delay differential equations. Springer Science Business Media
28. Akhmet M, Fen MO (2017) Poincaré chaos and unpredictable functions. Commun Nonlinear Sci Numer Simul 48:85–94
Chapter 34
Poisson Stable Dynamics of Hopfield-Type Neural Networks with Generalized Piecewise Constant Argument Marat Akhmet, Duygu Aruğaslan Çinçin, Madina Tleubergenova, and Zakhira Nugayeva
34.1 Introduction
It is well known that there exist hybrid neural networks that are neither purely discrete-time nor purely continuous-time. Neural network systems with impulsive activities or with piecewise constant arguments can be considered among them. In this study, a novel Hopfield neural network model with a Poisson stable input and a piecewise constant argument of the generalized type is addressed. Neural networks of Hopfield class may commit the values of the phase variable at predetermined instants of time to memory and use these values during the process until the next such moment; in that case they are modeled by differential equations with a piecewise constant argument [1–6]. Most of the theory of differential equations with generalized piecewise constant argument was put forth and developed in the papers [7–12]. The proposals are the most general in modeling and are also powerful in the methodological sense, since equivalent integral equations have been suggested, which opens the way for methods of operator theory and functional analysis. They have been followed by extensive research on ordinary differential, impulsive differential, functional differential, and partial differential equations.

M. Akhmet (B): Department of Mathematics, Middle East Technical University, Ankara, Turkey
D. A. Çinçin: Department of Mathematics, Süleyman Demirel University, 32260 Isparta, Turkey
M. Tleubergenova · Z. Nugayeva: Department of Mathematics, K. Zhubanov Aktobe Regional University, 030000 Aktobe, Kazakhstan; Institute of Information and Computational Technologies CS MES RK, Almaty 050000, Kazakhstan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_34
In the papers [13–16] and books [17, 18], which discuss the existence of unpredictable solutions, a new method was created to confirm that Poisson stable solutions exist, using the fact that unpredictable functions form a subset of Poisson stable functions. This means that the Poisson stability property must be checked in order to verify unpredictability. The new approach to confirming Poisson stability differs significantly from the method of comparability by character of recurrence, first presented in the paper [19] and afterwards utilized in various studies [20–22]. In the present study, we analyze the existence of Poisson stable trajectories using the technique described in the papers [23–27]. This study is the first to consider Poisson stable motions for Hopfield-type neural networks with a generalized piecewise constant argument, which allows the argument to be both advanced and delayed and, accordingly, opens many possibilities for the analysis of neural networks in the theoretical and the applied sense.
34.2 Preliminaries
Throughout the present work, for a vector $u = (u_1, \ldots, u_m)$, $u_i \in \mathbb{R}$, $i = 1, \ldots, m$, and a square matrix $A = (a_{ij})_{m\times m}$, the norms $\|u\| = \max_{1\le i\le m}|u_i|$ and $\|A\| = \max_{1\le i\le m}\sum_{j=1}^{m}|a_{ij}|$ are utilized, respectively, where $|\cdot|$ is the absolute value.
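The two norms are the maximum norm for vectors and the induced maximum row-sum norm for matrices. A minimal sketch follows; the function names are assumptions chosen for illustration.

```python
def vector_norm(u):
    """Maximum norm: the largest absolute value among the coordinates."""
    return max(abs(x) for x in u)

def matrix_norm(A):
    """Maximum absolute row sum of a square matrix given as a list of rows."""
    return max(sum(abs(a) for a in row) for row in A)

example_vector_norm = vector_norm([1.0, -3.0, 2.0])
example_matrix_norm = matrix_norm([[1.0, -2.0], [3.0, 0.5]])
```

With this pairing the usual compatibility estimate $\|Au\| \le \|A\|\,\|u\|$ holds, which is what makes the two norms convenient to use together.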
Let two real-valued sequences $\theta_k$, $\xi_k$, $k \in \mathbb{Z}$, be fixed such that $\theta_{k+1} > \theta_k$ and $\theta_k \le \xi_k \le \theta_{k+1}$ for each integer $k$, and $|\theta_k| \to \infty$ as $|k| \to \infty$. It is assumed that there exists a positive number $\theta$ such that $\theta \ge \theta_{k+1} - \theta_k$ for every $k \in \mathbb{Z}$. The following Hopfield-type neural network system with a generalized piecewise constant argument is the main object of this research:
$$x_i'(t) = -a_ix_i(t) + \sum_{j=1}^{m}b_{ij}f_j(x_j(t)) + \sum_{j=1}^{m}c_{ij}g_j(x_j(\gamma(t))) + d_i(t), \tag{34.1}$$
where $t, x_i \in \mathbb{R}$, $i = 1, 2, \ldots, m$, and $\gamma(t) = \xi_k$ whenever $\theta_k \le t < \theta_{k+1}$, $k \in \mathbb{Z}$. Here, $\gamma(t)$ is the piecewise constant function. Moreover:
$a_i > 0$ is the rate at which the $i$th unit self-regulates or resets its potential when it is isolated from inputs and other units;
$m$ is the total number of neurons in the network;
$x_i(t)$ is the state of unit $i$ at time $t$;
$f_j$, $g_j$ are the activation functions that correspond to the incoming potentials of unit $j$;
$b_{ij}$, $c_{ij}$ are the weights of the synaptic connections of the $j$th unit on the $i$th unit;
$d_i(t)$ is the time-varying stimulus that corresponds to the external input to the $i$th unit from outside the neural network.
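The argument function $\gamma(t)$ is a lookup: find the interval $[\theta_k, \theta_{k+1})$ that contains $t$ and return $\xi_k$. The sketch below is a minimal illustration on a finite stretch of the sequences; the helper name `make_gamma` and the sample sequences are assumptions.

```python
from bisect import bisect_right

def make_gamma(theta, xi):
    """Build gamma(t) from finite lists theta[0] < ... < theta[n] and xi[0..n-1],
    where theta[k] <= xi[k] <= theta[k+1]."""
    if len(xi) != len(theta) - 1:
        raise ValueError("need exactly one xi per interval")
    for k in range(len(xi)):
        if not (theta[k] < theta[k + 1] and theta[k] <= xi[k] <= theta[k + 1]):
            raise ValueError("sequences violate theta_k <= xi_k <= theta_{k+1}")

    def gamma(t):
        k = bisect_right(theta, t) - 1   # index with theta[k] <= t < theta[k+1]
        if k < 0 or k >= len(xi):
            raise ValueError("t lies outside the covered range")
        return xi[k]

    return gamma

# Illustrative sequences (assumed): switching moments 0, 1, 2, 3.
gamma = make_gamma(theta=[0.0, 1.0, 2.0, 3.0], xi=[0.5, 1.0, 3.0])
values = [gamma(0.2), gamma(0.9), gamma(1.7), gamma(2.0)]
```

On the first interval the argument is advanced for $t < 0.5$ and delayed for $t > 0.5$, on the second it is purely delayed, and on the third purely advanced, which is the mixed behaviour that the generalized type allows.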
Fig. 34.1 Hopfield neural network system (34.1) as a block diagram
We suppose that $b_{ij}$ and $c_{ij}$ are real parameters and that $f_j, g_j : \mathbb{R} \to \mathbb{R}$, $j = 1, 2, \ldots, m$, are continuous activation functions. Additionally, we assume the existence of constants $\lambda > 0$ and $\bar\lambda > 0$ such that $\lambda \le a_i \le \bar\lambda$ for $i \in \{1, 2, \ldots, m\}$. The following is a vector representation of system (34.1):
$$x'(t) = Ax(t) + Bf(x(t)) + Cg(x(\gamma(t))) + D(t), \tag{34.2}$$
where $x = \mathrm{colon}(x_1, x_2, \ldots, x_m)$ denotes the neuron state vector, and $f(x) = \mathrm{colon}(f_1(x_1), \ldots, f_m(x_m))$, $g(x) = \mathrm{colon}(g_1(x_1), \ldots, g_m(x_m))$ are the activations. Furthermore, $A = \mathrm{diag}(-a_1, -a_2, \ldots, -a_m)$, $B = (b_{ij})_{m\times m}$ and $C = (c_{ij})_{m\times m}$ are matrices, while $D = \mathrm{colon}(d_1, d_2, \ldots, d_m)$ represents the input vector.
Definition 1 [28] A continuous, bounded function $v : \mathbb{R} \to \mathbb{R}^m$ is said to be Poisson stable if one can find a sequence $t_n$, divergent to infinity, for which $v(t + t_n)$ converges uniformly to $v(t)$ on bounded intervals of the real axis.
Figure 34.1 shows the block diagram of the Hopfield-type neural network system (34.1) with a piecewise constant argument. In addition, Table 34.1 lists the symbols used in this diagram.
34.3 Main Result
Denote by $P$ the space of $m$-dimensional functions $\varphi : \mathbb{R} \to \mathbb{R}^m$, $\varphi = (\varphi_1, \varphi_2, \ldots, \varphi_m)$, where the norm is given by $\|\varphi\|_1 = \sup_{t\in\mathbb{R}}\|\varphi(t)\|$. The following properties are assumed for the functions that belong to this space:
Table 34.1 Characterization of the components given in Fig. 34.1
Symbol          Description
(block icon)    Integrator block
(block icon)    Sum block
(block icon)    Gain blocks, with values A, B, C
(block icon)    Transfer function block, with functions f and g, which are nonlinear
(block icon)    MATLAB function block, with γ(t) as the piecewise constant function
D(t)            Input function
x(t)            Output function
(P1) they are uniformly continuous;
(P2) there exists $H > 0$ such that $\|\varphi\|_1 < H$ for each $\varphi$;
(P3) there exists a sequence $t_n$, diverging to $\infty$, such that for each $\varphi$, $\varphi(t + t_n)$ tends uniformly to $\varphi(t)$ on any bounded and closed interval of $\mathbb{R}$.
It is assumed for system (34.2) that the following conditions are met:
(B1) $\|f(u) - f(v)\| \le L\|u - v\|$ and $\|g(u) - g(v)\| \le \bar L\|u - v\|$ for all $u, v \in \mathbb{R}^m$, where $L$, $\bar L$ are positive numbers;
(B2) there are positive constants $m_f$, $m_g$ which satisfy $\sup\|f(x)\| \le m_f$, $\sup\|g(x)\| \le m_g$;
||x|| […] 0 one can find $\delta > 0$ such that the inequality $\|\varphi(t_1) - \varphi(t_2)\| < \varepsilon$ is valid provided $t_1$, $t_2$ are from the same continuity interval and $|t_1 - t_2| < \delta$;
β) for each $\varepsilon > 0$ one can find a relatively dense set $T$ of $\varepsilon$-almost periods $\tau$ such that $\|\varphi(t + \tau) - \varphi(t)\| < \varepsilon$ if $t \in \mathbb{R}$ and $|t - \theta_k| > \varepsilon$ for all integers $k$.
Definition 40.2 [39] Let $g_k(x)$, $k \in \mathbb{Z}$, be a sequence of functions with the common domain $D \subseteq \mathbb{R}^n$. If for arbitrary $\varepsilon > 0$ there is a relatively dense set of integers $Q$ satisfying $\|g_{k+q}(x) - g_k(x)\| < \varepsilon$ for all $q \in Q$, $x \in D$, $k \in \mathbb{Z}$, then the sequence $g_k(x)$ is said to be almost periodic uniformly in $x$.
Lemma 40.1 [39] Let $f_i(t)$, $i = 1, 2, \ldots, n$, be almost periodic functions such that the sequence $\theta_k$, $k \in \mathbb{Z}$, of discontinuity moments is common for all the functions, let the sequences $I_k^i$, $k \in \mathbb{Z}$, $i = 1, 2, \ldots, p$, be almost periodic, and let $g_k^i(x)$, $k \in \mathbb{Z}$, $i = 1, 2, \ldots, l$, be vector functions that are uniformly almost periodic on their domains. Moreover, let condition (C1) be valid for the system (40.6). Then for arbitrary positive $\varepsilon$ and $\nu < \varepsilon$ there exist relatively dense sets of real numbers, $R$, and of integers, $Q$, which satisfy the following inequalities:
1. $\|f_i(t + \tau) - f_i(t)\| < \varepsilon$ for all $t \in \mathbb{R}$, $|t - \theta_k| > \varepsilon$, $i = 1, 2, \ldots, n$,
2. $\|g_{k+q}^i(x) - g_k^i(x)\| < \varepsilon$ for all $x$ from the domain, for all $k \in \mathbb{Z}$, $i = 1, 2, \ldots, l$,
3. $\|I_{k+q}^i - I_k^i\| < \varepsilon$ for all $k \in \mathbb{Z}$, $i = 1, 2, \ldots, p$,
4. $\|\theta_k^q - \tau\| < \nu$ for all $k \in \mathbb{Z}$,
if $\tau \in R$ and $q \in Q$.
Let us introduce conditions that will be used for almost periodicity:
(A2) $a_i(t)$, $b_{ij}(t)$, $c_i(t)$ are (uniformly continuous) almost periodic functions for $i, j = 1, 2, \ldots, m$;
(A3) the sequences $d_{ik}$, $e_{ijk}$, $h_{ik}$ are almost periodic in $k$, where $i, j = 1, 2, \ldots, m$, $k \in \mathbb{Z}$.
Lemma 40.2 [39] Conditions (C1) and (A1)–(A2) imply that for each $\varepsilon > 0$ one can find a relatively dense set of almost periods $\tau$ of $A(t)$ satisfying the inequality
$$\|X(t + \tau, s + \tau) - X(t, s)\| < \varepsilon Le^{\frac{\gamma}{2}(t-s)}, \tag{40.8}$$
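where $t \ge s$, $|t - \theta_k| > \varepsilon$, and $L$ is independent of $\varepsilon$ and $\tau$. Let us denote
$$k(m_\tau, l_f, m_f) = 1 + m_\tau(\alpha + \beta l_f)e^{(\alpha+\beta l_f)m_\tau} + \frac{l_\tau e^{(\alpha+\beta l_f)m_\tau}(\alpha h + \beta m_f + \sigma)}{1 - l_\tau(\alpha h + \beta m_f + \sigma)},$$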
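$$\bar k(m_\tau, l_f, m_f) = m_\tau(\alpha + \beta l_f)\Big((1 + dk)e^{(\alpha+\beta l_f)m_\tau} + \frac{2 + dk + Ekl_I + l_\tau(\alpha h + \beta m_f + \sigma)}{1 - (\alpha h + \beta m_f + \sigma)}e^{2(\alpha+\beta l_f)m_\tau}\Big) + \frac{(2 + dk)(\alpha h + \beta m_f + \sigma)l_\tau e^{(\alpha+\beta l_f)m_\tau}}{1 - l_\tau(\alpha h + \beta m_f + \sigma)}.$$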
The following assertions are also needed throughout the remaining part of the paper:
(C8) $K\Big(\dfrac{m_fB + c}{-\gamma} + \dfrac{m_IE + d(\alpha h + \beta m_f + \sigma)}{1 - e^{\gamma\theta}}\Big) < H$;
(C9) $K\Big(\dfrac{l_fB}{-\gamma} + \dfrac{l_Ik(m_\tau, l_f, m_f)E + \bar k(m_\tau, l_f, m_f)}{1 - e^{\gamma\theta}}\Big) < 1$;
(C10) $\gamma + K\beta l_f + \dfrac{\ln\big(1 + K(l_Ik(m_\tau, l_f, m_f)E + \bar k(m_\tau, l_f, m_f))\big)}{\theta} < 0$.
Theorem 40.1 Conditions (C1)–(C10) and (A1)–(A3) imply the existence and uniqueness of asymptotically stable discontinuous almost periodic solution of (40.6).
40.3 Periodic Solutions
In this section, we focus on the periodic and asymptotically stable solution of the recurrent neural network with structured impulses at non-prescribed moments. The following conditions will be needed in what follows. There exist a number $\omega > 0$ and a positive integer $p$ such that:
(P1) the impulse moments are $(\omega, p)$-periodic, that is, $\theta_{k+p} - \theta_k = \omega$, $k \in \mathbb{Z}$;
(P2) the functions $a_i(t)$, $b_{ij}(t)$, $c_i(t)$ are $\omega$-periodic in $t$ for $i, j = 1, 2, \ldots, m$;
(P3) the sequences $d_{ik}$, $e_{ijk}$, $h_{ik}$ are $p$-periodic in $k$, where $i, j = 1, 2, \ldots, m$, $k \in \mathbb{Z}$;
(P4) the sequence of discontinuity surfaces is $p$-periodic in $k$, that is, $\tau_{k+p}(x) = \tau_k(x)$ for all $k \in \mathbb{Z}$ and $x \in \mathbb{R}^m$.
Owing to the $(\omega, p)$-periodicity of the sequence of impulse moments, one can find positive numbers $\underline\theta$ and $\bar\theta$ which satisfy $\underline\theta \le \theta_{k+1} - \theta_k \le \bar\theta$.
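The $(\omega, p)$-periodicity in (P1) means that one block of $p$ impulse moments is repeated with shift $\omega$. The sketch below is a small illustration; the helper names and the sample base block are assumptions. It builds such a sequence, checks the property, and computes the bounds $\underline\theta$ and $\bar\theta$ on consecutive gaps.

```python
def impulse_moments(base, omega, k_min, k_max):
    """theta_k for k_min <= k <= k_max, generated from one period
    base = [theta_0, ..., theta_{p-1}] contained in [theta_0, theta_0 + omega)."""
    p = len(base)
    # Python's floor division and modulo keep the indexing correct for k < 0.
    return {k: base[k % p] + (k // p) * omega for k in range(k_min, k_max + 1)}

def is_omega_p_periodic(theta, omega, p, tol=1e-12):
    """Check theta_{k+p} - theta_k = omega wherever both indices are available."""
    return all(abs(theta[k + p] - theta[k] - omega) <= tol
               for k in theta if k + p in theta)

def gap_bounds(theta):
    """Smallest and largest distance between consecutive impulse moments."""
    ks = sorted(theta)
    gaps = [theta[b] - theta[a] for a, b in zip(ks, ks[1:])]
    return min(gaps), max(gaps)

# Illustrative data (assumed): two impulses per period of length 1.
theta = impulse_moments(base=[0.0, 0.4], omega=1.0, k_min=-4, k_max=6)
lower, upper = gap_bounds(theta)
```

For this sample block the consecutive gaps alternate between 0.4 and 0.6, so the two bounds are attained and both are positive, as the text requires.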
Theorem 40.2 Conditions (C1)–(C10) and (P1)–(P4) imply the existence and uniqueness of a solution for (40.6) which is periodic and asymptotically stable.
40.4 Conclusion
In this paper, RNNs with properly structured impulses that occur at variable moments of time are investigated in detail. Specific and refined conditions on the coefficients have been newly developed. Our system is a structured one, since the impulsive part of the network has entirely the same form as the differential part. This is reasonable in applications, since the impacts can be considered as limits of their differential counterparts. The new system keeps the recurrent nature of the neural network in the impulsive part as well, because sudden noises or impact disturbances can affect the rates or the activation functions. Additionally, we presented the physical aspects of the impacts. As the impulsive actions are compatible with the differential equation of the model, the structured system covers all similar impulsive neural networks considered previously. The method of B-equivalence [19] is widely used in theoretical and practical research. One of the innovative parts of this paper is that the possibility of negative capacitance is not ignored, since the advantages of negative capacitance have been revealed. Hence, we have the most general system, which additionally covers systems whose differential equation parts are unstable but whose solutions are stable. We studied the existence, uniqueness, and stability of the almost periodic motion for the system with structured and non-fixed impulses. These methods can be extended to other types of neural networks, such as shunting inhibitory cellular neural networks, Hopfield neural networks, and cellular neural networks. The extended version of the presented research is to be published as [40] and [41], where the relation between the original system and the B-equivalent one is given explicitly and the RNNs with non-fixed moments of impulses are studied in detail.
Acknowledgements MA is supported by 2247-A National Leading Researchers Program of TUBITAK, Turkey, N 120C138, MT is supported by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (grant No. AP09258737 and No. AP08856170).
References
1. Yang W, Yu W, Cao J, Alsaadi FE, Hayat T (2017) Almost automorphic solution for neutral type high-order Hopfield BAM neural networks with time-varying leakage delays on time scales. Neurocomputing 267:06
2. Akhmet M, Fen MO, Kirane M (2016) Almost periodic solutions of retarded SICNNs with functional response on piecewise constant argument. Neural Comput Appl 27(8):2483–2495
3. Chen W, Luo S, Zheng WX (2016) Impulsive stabilization of periodic solutions of recurrent neural networks with discrete and distributed delays. In: 2016 IEEE international symposium on circuits and systems (ISCAS), pp 2286–2289
4. Chua LO, Yang L (1988) Cellular neural networks: applications. IEEE Trans Circ Syst 35:1273–1290
5. Chua LO, Yang L (1988) Cellular neural networks: theory. IEEE Trans Circ Syst 35:1257–1272
6. Wang L, Zou X (2002) Exponential stability of Cohen-Grossberg neural networks. Neural Netw 15(3):415–422
7. Jun Xiang H, Cao J (2009) Exponential stability of periodic solution to Cohen-Grossberg-type BAM networks with time-varying delays. Neurocomputing 72:1702–1711
8. Zhang Z, Zheng T (2018) Global asymptotic stability of periodic solutions for delayed complex-valued Cohen-Grossberg neural networks by combining coincidence degree theory with LMI method. Neurocomputing 289:02
9. Akhmet MU, Yılmaz E (2009) Hopfield-type neural networks systems with piecewise constant argument
10. Akhmet M, Fen MO (2013) Period-doubling route to chaos in shunting inhibitory cellular neural networks. In: 2013 8th International symposium on health informatics and bioinformatics, pp 1–5
11. Fečkan M (2000) Existence of almost periodic solutions for jumping discontinuous systems. Acta Mathematica Hungarica 86(4):291–303
12. Zhao H, Fečkan M (2017) Pseudo almost periodic solutions of an iterative equation with variable coefficients. Miskolc Math Notes 18:515–524
13. Wang J, Huang L (2012) Almost periodicity for a class of delayed Cohen-Grossberg neural networks with discontinuous activations. Chaos, Solitons Fractals 45(9):1157–1170
14. Qin S, Xue X, Wang P (2013) Global exponential stability of almost periodic solution of delayed neural networks with discontinuous activations. Inf Sci 220:367–378
15. Akça H, Alassar R, Covachev V, Covacheva Z, Al-Zahrani E (2004) Continuous-time additive Hopfield-type neural networks with impulses. J Math Anal Appl 290(2):436–451
16. Samoilenko AM, Perestyuk NA (1995) Impulsive differential equations. World Scientific
17. Vangipuram Lakshmikantham PSS, Bainov DD (1989) Theory of impulsive differential equations. World Scientific
18. Sun JQ, Xiong F, Schütze O, Hernández Castellanos C (2019) Global analysis of nonlinear dynamics, pp 203–210
19. Akhmet M (2010) Principles of discontinuous dynamical systems. Springer-Verlag, New York
20. Akhmet M, Alejaily EM (2019) Domain-structured chaos in a Hopfield neural network. Int J Bifurc Chaos 29(14):1950205:1–1950205:7
21. Liu Y, Huang Z, Chen L (2012) Almost periodic solution of impulsive Hopfield neural networks with finite distributed delays. Neural Comput Appl 21(5):821–831
22. Stamov GT, Stamova IM (2007) Almost periodic solutions for impulsive neural networks with delay. Appl Math Modell 31(7):1263–1270
23. Wang C (2014) Almost periodic solutions of impulsive BAM neural networks with variable delays on time scales. Commun Nonlinear Sci Numer Simul 19(8):2828–2842
24. Allegretto W, Papini D, Forti M (2010) Common asymptotic behavior of solutions and almost periodicity for discontinuous, delayed, and impulsive neural networks. IEEE Trans Neural Netw 21(7):1110–1125
25. Zhang X, Li C, Huang T (2017) Hybrid impulsive and switching Hopfield neural networks with state-dependent impulses. Neural Netw 93:176–184
26. Zhang X, Li C, Huang T (2017) Impacts of state-dependent impulses on the stability of switching Cohen-Grossberg neural networks. Adv Diff Equat 2017:1–21
27. Xia Y, Huang Z, Han M (2008) Existence and globally exponential stability of equilibrium for BAM neural networks with impulses. Chaos, Solitons Fractals 37(2):588–597
28. Yılmaz E (2014) Almost periodic solutions of impulsive neural networks at non-prescribed moments of time. Neurocomputing 141:148–152
29. Şaylı M, Yılmaz E (2014) Global robust asymptotic stability of variable-time impulsive BAM neural networks. Neural Netw 60:67–73
30. Khan A, Salahuddin S (2015) Negative capacitance in ferroelectric materials and implications for steep transistors. In: 2015 IEEE SOI-3D-subthreshold microelectronics technology unified conference (S3S), pp 1–3
31. Khan AI, Chatterjee K, Duarte JP, Lu Z, Sachid A, Khandelwal S, Ramesh R, Hu C, Salahuddin S (2016) Negative capacitance in short-channel FinFETs externally connected to an epitaxial ferroelectric capacitor. IEEE Electron Dev Lett 37:111–114
32. Si M, Su C-J, Jiang C, Conrad N, Zhou H, Maize K, Qiu G, Wu C-T, Shakouri A, Alam M, Ye P (2018) Steep-slope hysteresis-free negative capacitance MoS2 transistors. Nature Nanotechnol 13:01
33. Gopalsamy K, He X-Z (1994) Stability in asymmetric Hopfield nets with transmission delays. Physica D: Nonlinear Phenomena 76:344–358
34. Liu Y, You Z (2007) Multi-stability and almost periodic solutions of a class of recurrent neural networks. Chaos, Solitons and Fractals 33:554–563
35. Yang X, Li F, Long Y, Cui X (2010) Existence of periodic solution for discrete-time cellular neural networks with complex deviating arguments and impulses. J Franklin Inst 347(2):559–566
36. Shi P, Dong L (2010) Existence and exponential stability of anti-periodic solutions of Hopfield neural networks with impulses. Appl Math Comput 216(2):623–630
37. Pinto M, Robledo G (2010) Existence and stability of almost periodic solutions in impulsive neural network models. Appl Math Comput 217(8):4167–4177
38. Bohner M, Stamov GT, Stamova IM (2020) Almost periodic solutions of Cohen-Grossberg neural networks with time-varying delay and variable impulsive perturbations. Commun Nonlinear Sci Numer Simul 80:104952
39. Akhmet M (2020) Almost periodicity, chaos, and asymptotic equivalence
40. Akhmet M, Erim G Almost periodic solutions of recurrent neural networks with state-dependent structured impulses.
Discontinuity, Nonlinearity, and Complexity, (in press) 41. Akhmet M, Erim G Periodic oscillations of recurrent neural networks with state-dependent structured impulses. Discontinuity, Nonlinearity, and Complexity, (submitted)
Chapter 41
Topic Modeling Analysis of Tweets on the Twitter Hashtags with LDA and Creating a New Dataset
Çilem Koçak, Tuncay Yiğit, J. Anitha, and Aida Mustafayeva
Ç. Koçak (B)
Computer Programming Department, Yalvaç Vocational School of Technical Sciences, Isparta University of Applied Sciences, Isparta, Turkey
T. Yiğit
Department of Computer Engineering, Faculty of Engineering, Süleyman Demirel University, Isparta, Turkey
J. Anitha
Karunya Institute of Technology and Sciences, Coimbatore, India
A. Mustafayeva
Department of Information Technologies, Mingachevir State University, Mingachevir, Azerbaijan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_41

41.1 Introduction

Undoubtedly, the most important means of communication between people is language. Natural language processing is a field of science that deals with the design and implementation of computer systems that can understand, interpret and analyze the structure and meaning of a natural language that people use with each other, and that can produce output in that language. Although it was considered a sub-field of artificial intelligence in earlier years, after the successful results obtained by combining many theories, methods and technologies, it is now regarded as a discipline in its own right. The most common uses of natural language processing are analyzing natural languages and better understanding their structure, facilitating communication between computers and humans, and performing computer-aided translation.

Language is a means of expression that serves to communicate thoughts, feelings and motives, directly or indirectly. Human languages differ from one another in their roots and grammatical structure. Despite this, the steps of natural language processing follow similar paths, although the structure of the language in question must first be known very well. Therefore, it is necessary to study the structure of the language under four main headings [1].
• Phonology: considers how the sounds of the letters are used in the language. Although many languages share a common alphabet, the same letter can be pronounced differently in each language. The goal at this stage is therefore to convert the spoken language into the written language.
• Morphology: considers the structure of the words in the language. A word is analyzed by identifying its root, the suffixes it has taken, and any changes these have caused. The result of this examination defines the scope of the unit.
• Syntax: a different ordering of the words in a sentence can completely change its meaning and emphasis. The order of the words therefore matters both when interpreting a sentence and when generating a new one. The ordering of sentences within a large text is important in the same way.
• Semantics: meaning cannot be obtained by passing human language through a linear function. Although each word has its own meaning, it can add different meanings to the sentence in which it is used. At this point artificial intelligence comes into play, and mathematical approaches are being developed to use language as a means of communication [2–5].
41.2 Literature

41.2.1 The Problems Posed by Tweeting

Twitter is a social media platform where users take an active role, communicate easily, and freely convey their ideas and thoughts. The texts shared on Twitter, called tweets, are a convenient data source for many analyses. Twitter, also described as a micro-blog, is actively used to follow current events, to manage crises, and to make various announcements. Using different artificial intelligence methods, it is possible to determine subjective information such as the emotions, thoughts, opinions and attitudes that users express in the texts they share. Analyzing tweet texts makes it possible to gather users' opinions on a topic and to support decisions in areas such as strategic marketing, financial ratios, forecasting, customer analysis, identifying weaknesses, opportunities and threats in the industry, and anticipating the activities of competitors [6].

The result of a classification depends on the quality and the size of the dataset. In this study, the aim is to build a data warehouse from tweets sent via Twitter in order to create a dataset. The data are randomly selected from the trending topics every day and stored daily for data processing.
41.2.2 Studies Conducted with Artificial Intelligence on Twitter in the Literature

Because Twitter and other social media tools contain large amounts of up-to-date data, scientists use them for research, especially in natural language processing and in areas of computer science such as data mining, and they are also used in market studies. Examples include predicting epidemic diseases in advance [7], finding side effects of drugs [8], predicting how perception changes over time [9], and analyzing the perception expressed in tweets posted by tourists visiting a destination [10].

In some studies, emotion analysis has been treated as a natural language processing task, carried out at the document level or at the sentence level [11–13], and more recently at the phrase (expression) level [14, 15]. Other studies have used tweets to obtain emotional information [16–18].

Akbaş designed a system that performs subject-based emotion analysis. The study used Turkish data collected via Twitter and proposed that the data, grouped by subject, be generated automatically by applying a word selection algorithm together with a Turkish emotion word list to words with a determined emotion level [19]. Nizam and Akin conducted emotion analysis on Turkish Twitter data using unsupervised learning methods. The study examined how different distributions of data from three classes (positive, negative and neutral) affect classification success. The experiments showed that classification with an equally distributed dataset performed better than classification with an unbalanced dataset [20].
41.2.3 Studies Conducted with Artificial Intelligence on Twitter in the Literature

Doctors often rely on their personal knowledge, skills and experience when diagnosing a disease. However, no matter how skilled they are, doctors cannot claim that a diagnosis is absolutely correct, and misdiagnoses inevitably occur. At this point, artificial intelligence technologies come onto the agenda, because artificial intelligence can analyze large amounts of data, solve complex problems and make predictions with high accuracy. The deep neural network, one of today's most up-to-date artificial intelligence techniques, refers to a family of computer models that are used effectively to obtain information from images. Deep learning algorithms have been applied in many medical specialties, most commonly radiology and pathology. In addition, deep learning solutions have achieved high performance in cancer biology as well as in clinical imaging of several types [4]. Figure 41.1 shows artificial intelligence techniques and their relationship in estimating cancer diagnosis.
Fig. 41.1 The process of natural language processing and its basic elements
41.2.4 The Process of Natural Language Processing

With the development of natural language processing, more significant results have begun to be obtained in the discovery of information through text mining. Natural language processing is a field of computer science that studies the design of computer systems whose main function is to process natural language. Thanks to natural language processing studies, human–computer interaction has reached a certain level, and the success of these studies has increased over time [21].

In order to analyze documents in text mining studies, it is important to identify the concepts that carry the meaning of the text. These concepts are words or groups of words and are referred to as terms. Extracting the terms in a document is a separate research topic within natural language processing. The most important advantage of natural language processing is that, during document analysis, terms (that is, words) are separated from their suffixes and converted into their shortest forms without loss of meaning. Words used with the same meaning can appear in different forms because of grammatical rules. Unless these different forms are unified, they may be treated as terms with different meanings, which prevents the actual meaning of the documents from being reached. The activities carried out under natural language processing can be grouped into three groups [2].
41.2.4.1 Turkish Language
Turkish is one of the morphologically rich languages (MRL) because of its agglutinative structure. This means that most words are formed by adding suffixes to the roots of words. The morphology of a language determines the rules by which words are formed. From the same root, many words with different meanings can be formed. This structure allows a large number of words to be derived from a single root, which makes Natural Language Processing (NLP) more difficult than in other languages. Because a theoretically infinite number of words can be formed, it is not possible to collect all word forms into a dictionary for use in NLP for MRLs [22]. English, for example, is morphologically poor, so creating a dictionary is more practical.
Turkish requires a morphological parser to break words down into their components. Such a parser usually cannot return a single answer, because a word can have more than one possible structure. Each suffix of a word should be examined for the possibility that it changes the meaning of the word. Different meanings of the roots can change the emotion of an expression, so they should be studied carefully, but the morphological ambiguity of words makes this process even more difficult.

Computers are not yet at the level where they can learn to communicate in natural language by observation, as humans do. It is therefore necessary to teach computers some rules and, at the same time, to give them a dictionary to use and a database related to the domain they need to understand.

The parser is the most important element of the natural language processing pipeline. The given sentence is analyzed syntactically in this part, following the parsing steps shown in Fig. 41.1. The parser aims to divide sentences into phrases. According to this approach, the basic constituent unit of the language is the sentence. The sentence consists of two basic structures: the noun phrase and the verb phrase. These phrases are in turn divided into smaller phrases [23]. This step is also called tagging. After it is complete, the words whose structure has been determined pass through semantic analysis, and an output sentence is created according to the input sentence.

The dictionary is the structure the program needs in order to recognize words. The parser uses the dictionary to perform syntactic analysis. The dictionary also contains the roots and their meanings, so that the system can understand the language being processed.

The interpreter uses the database to find the meanings of sentences and words that have already been analyzed. This database may have a semantic structure.

The generator is the component that produces words according to given rules and patterns, gives feedback to the user, and communicates with the user [23]. Figure 41.1 shows in which processes the dictionary and the database are used.
41.3 Material-Method

41.3.1 Data Collection

Twitter is a social network and microblogging service that allows users to send messages, called tweets, in real time. Tweets are short messages, originally limited to 140 characters. Due to the nature of this microblogging service (fast and short messages), people use abbreviations, make typos, and use emoticons and other characters that carry special meanings. A brief terminology associated with tweets is given below.
• Emoticons: facial expressions represented pictorially using punctuation marks and letters; they express the mood of the user.
• Target: Twitter users use the "@" symbol to refer to other users on the microblog. Referring to other users in this way automatically alerts them.
• Hashtags: users often use hashtags to mark topics. This is done primarily to increase the visibility of their tweets [24].
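The three tweet elements listed above can be located with simple pattern matching. The following is a minimal sketch, not the method used in the study: the regular expressions are simplifying assumptions (they are not Twitter's official entity rules), and `extract_entities` is a hypothetical helper name.

```python
import re

# Assumed, simplified patterns for illustration only.
MENTION_RE = re.compile(r"@(\w+)")             # targets such as @user
HASHTAG_RE = re.compile(r"#(\w+)")             # hashtags such as #topic
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")  # a few common emoticons


def extract_entities(tweet):
    """Return the targets, hashtags and emoticons found in one tweet."""
    return {
        "targets": MENTION_RE.findall(tweet),
        "hashtags": HASHTAG_RE.findall(tweet),
        "emoticons": EMOTICON_RE.findall(tweet),
    }


if __name__ == "__main__":
    print(extract_entities("@ayse harika bir gün :) #Isparta #bahar"))
```

Keeping the patterns as named module-level constants makes it easy to tighten them later, for example to cover more emoticon shapes.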
Fig. 41.2 The process of data collection and preprocessing
There are many methods for compiling information that is shared publicly on Twitter, which is a social media application. Posts on a specific topic can be compiled simply by running a search in the browser. In that case, however, the policies set by Twitter do not allow access to enough posts for a comprehensive analysis. For this reason, Twitter offers a "Developer" account for non-commercial researchers. In this study, the necessary application was made for a developer account and a dedicated access service was obtained from Twitter. Thanks to this service, all posts shared on the desired topics can be included in our dataset. The Python programming language was used to collect, at regular intervals, the posts made accessible by this service and to add them to the dataset. Tweets about the trending agenda topics on Twitter between 08.03.2021 and 25.04.2021 were compiled. After retweets, quotes, blank tweets and duplicate tweets were deleted, 159,764 tweets remained in the dataset (Fig. 41.2).
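The filtering step described above can be sketched as follows. This is only an illustration under stated assumptions: the chapter does not describe how the collected tweets are stored, so each tweet is assumed to be a dictionary with the hypothetical keys `text`, `is_retweet` and `is_quote`, and `filter_tweets` is a hypothetical helper name.

```python
def filter_tweets(tweets):
    """Drop retweets, quotes, blank tweets and exact duplicates.

    Each tweet is assumed to be a dict with the hypothetical keys
    'text', 'is_retweet' and 'is_quote'.
    """
    seen = set()
    kept = []
    for tweet in tweets:
        text = (tweet.get("text") or "").strip()
        if not text:
            continue  # blank tweet
        if tweet.get("is_retweet") or text.startswith("RT @"):
            continue  # retweet
        if tweet.get("is_quote"):
            continue  # quote tweet
        key = text.casefold()
        if key in seen:
            continue  # duplicate of a tweet already kept
        seen.add(key)
        kept.append({**tweet, "text": text})
    return kept


if __name__ == "__main__":
    sample = [
        {"text": "Bugün hava çok güzel", "is_retweet": False, "is_quote": False},
        {"text": "RT @biri: Bugün hava çok güzel", "is_retweet": True, "is_quote": False},
        {"text": "   ", "is_retweet": False, "is_quote": False},
        {"text": "Bugün hava çok güzel", "is_retweet": False, "is_quote": False},
    ]
    for tweet in filter_tweets(sample):
        print(tweet["text"])
```

Duplicates are compared on case-folded text, which is a design choice of this sketch; a stricter or looser notion of "duplicate" may fit a real collection better.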
41.3.2 Data Processing

When creating datasets, the tweets should be cleaned of structures such as images, folksonomy (tags), meaningless words and emoticons (emoji). In order to deal with the spelling and grammar errors in short texts during preprocessing, the dataset is created with tools such as Zemberek, RapidMiner, NLTK and TurkishStemmer. The main goal of this process is to obtain the correct form of each sentence, that is, to reduce its words to their roots. To do so, the following operations must be performed.
• Removal of punctuation marks
• Removal of unnecessary spaces
• Conversion of non-standard expressions into their proper word forms
• Case normalization (conversion between uppercase and lowercase letters)
• Removal of URLs in plain text
• Removal of hashtags and emojis
• Removal of any images that are present
• Removal of the user name of the person who tweets, for maximum data privacy [25]

After the texts are extracted, the words are classified by calculating their weight values. The number of words used in the sentence, the frequency of each word, the case of the letters used in writing the word, and the number of repeated letters are evaluated. In general, the weight of a word is the number of times it is repeated in the same comment, and the weight of a comment is the sum of the weights of the words it contains. On this basis, the n-gram (bi-gram, tri-gram) frequency analysis is produced.
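The cleaning operations and the n-gram frequency analysis described above can be sketched with the Python standard library. This is a minimal illustration, not the pipeline used in the study: the regular expressions, the function names `clean_tweet`, `ngrams` and `ngram_counts`, and the order of the steps are all assumptions.

```python
import re
from collections import Counter

# Assumed, simplified patterns for illustration only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")        # user names
HASHTAG_RE = re.compile(r"#\w+")        # hashtags
NON_WORD_RE = re.compile(r"[^\w\s]|_")  # punctuation, emojis and other symbols
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Apply the cleaning steps to one tweet and return lowercase plain text."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    # Turkish-aware lowercasing: dotted and dotless i need explicit handling.
    text = text.replace("İ", "i").replace("I", "ı").lower()
    return SPACE_RE.sub(" ", text).strip()


def ngrams(tokens, n):
    """Return the list of consecutive n-token tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(docs, n):
    """Count n-grams over a list of already cleaned documents."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc.split(), n))
    return counts


if __name__ == "__main__":
    raw = "Bugün İstanbul'da hava ÇOK güzel!!! https://t.co/abc #hava @ali"
    cleaned = clean_tweet(raw)
    print(cleaned)
    print(ngram_counts([cleaned], 2).most_common(3))
```

URLs are removed before hashtags so that a "#" fragment inside a link is not mistaken for a hashtag.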
41.3.3 Creating a Dataset

Preprocessing. Each tweet is treated as a document, and the following steps are applied:
• All the words in the tweets are expressed with a number (tokenization).
• Words shorter than 3 characters are deleted.
• Words that are considered ineffective (stopwords) are removed, for example: I wonder if, in fact, most of them, again, like…
• Words in the third person are converted to the first person, and verbs in the past or future tense are converted to the present tense (lemmatization).
• All words are reduced to their roots (stemming).

The stopwords list is obtained from the gensim library. Zemberek and the WordNet resources in NLTK are used for root extraction. Zemberek is a free and open-source natural language processing library hosted on code.google.com. Although it is developed in Java, the libraries produced by the developer group can be used from many programming languages. Zemberek currently performs word-level operations: spell checking, spelling suggestions for misspelled words, ASCII-to-Turkish and Turkish-to-ASCII conversion, root finding and word generation can all be carried out with this library. Zemberek is the best-known NLP library for Turkish, and it is an open-source Java library [26]. It provides morphological analysis and spell checking functions for Turkish words.

With the Zemberek library, words are examined morphologically and negation suffixes are processed, while some important suffixes are retained because of their significant contribution to meaning. Since Turkish is an agglutinative language, a word can take on its full meaning through its suffixes. Studying the different meanings, and therefore the suffixes, is a very important task in order to avoid loss of information. Zemberek gives all the possible results for the morphological analysis of a word. In an agglutinative structure, more than one result (ambiguity) may appear for a word. The result set includes all possible root and suffix combinations in decreasing order of probability (Fig. 41.3).

Zemberek first tries to separate a word into its root and suffixes and decides whether the word is Turkish. It analyzes the word morphologically and identifies the candidates that may be the root of the word. Possible suffixes are then appended to each candidate root. If one of the resulting root and suffix combinations matches the word given at the beginning of the process, the appropriate root and suffixes have been found and, at the same time, the word is judged to be Turkish. Zemberek does this with the help of the dictionary of about 30 thousand roots included in the library. This root dictionary is also used to mark the roots that behave as special cases. To find the root candidates of a given word quickly, Zemberek places the roots in a special tree. Examples of this analysis are shown in Figs. 41.4 and 41.5.

Fig. 41.3 The principle of operation of Zemberek
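The first three preprocessing steps of this section (expressing words as numbers, dropping short words, removing stopwords) can be sketched in plain Python. This is an illustration under stated assumptions: the stopword set below is a tiny hypothetical list rather than the gensim list used in the study, the function names are hypothetical, and lemmatization and stemming are left out because they require a morphological tool such as Zemberek.

```python
from collections import Counter

# Hypothetical mini stopword list for illustration only.
STOPWORDS = {"acaba", "aslında", "çoğu", "yine", "gibi", "ve"}


def preprocess(text, stopwords=STOPWORDS, min_len=3):
    """Split cleaned text into tokens, dropping short words and stopwords."""
    return [t for t in text.split() if len(t) >= min_len and t not in stopwords]


def build_vocabulary(docs):
    """Give every distinct token an integer id in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(tokens, vocab):
    """Represent one document as sorted (token id, count) pairs."""
    return sorted(Counter(vocab[t] for t in tokens).items())


if __name__ == "__main__":
    tweets = ["acaba bugün maç var mı", "bugün hava gibi güzel ve sıcak"]
    docs = [preprocess(t) for t in tweets]
    vocab = build_vocabulary(docs)
    print(vocab)
    print([to_bow(d, vocab) for d in docs])
```

The (id, count) representation is the usual input format for topic models, which is why the vocabulary is built before any modeling step.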
Fig. 41.4 An example of Zemberek word analysis: “Ispartalısınız”
Fig. 41.5 An example of Zemberek word analysis: “seminerdedir”
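The idea of storing roots in a tree so that the candidate roots of a word can be found quickly can be illustrated with a prefix tree (trie). This sketch is not Zemberek's actual implementation: the class `RootTrie`, its methods and the three-word root list are hypothetical, and real root finding must also handle sound changes in the root.

```python
_END = object()  # sentinel key marking the end of a stored root


class RootTrie:
    """A prefix tree holding dictionary roots."""

    def __init__(self, roots=()):
        self._root = {}
        for root in roots:
            self.add(root)

    def add(self, root):
        node = self._root
        for ch in root:
            node = node.setdefault(ch, {})
        node[_END] = True

    def candidate_roots(self, word):
        """Return (root, remaining suffix string) pairs for every stored
        root that is a prefix of the word, shortest root first."""
        node = self._root
        found = []
        for i, ch in enumerate(word):
            if ch not in node:
                break
            node = node[ch]
            if _END in node:
                found.append((word[:i + 1], word[i + 1:]))
        return found


if __name__ == "__main__":
    trie = RootTrie(["göz", "gözlük", "seminer"])
    print(trie.candidate_roots("gözlükçü"))
    print(trie.candidate_roots("seminerdedir"))
```

A single pass over the letters of the word yields all candidate roots, which is why a tree is faster than checking every dictionary root separately. The ambiguity mentioned in the text shows up as more than one candidate for the same word.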
41.3.4 Analysis and Classification

Classification can be briefly described as the process of assigning predefined class tags to existing data. In supervised learning, a set of labeled data is used to train the classifier, and the resulting model is then used to label the unknown dataset. Depending on the number of classes, classification is either binary or multi-class. Text classification, in turn, is the process of assigning predefined class tags to an existing text or document. It is used in areas such as Web page classification, blog classification, author detection, emotion analysis and cyberbullying detection. Mechanical Turk is one of the popular tools for data labeling [27, 28].

Because of the particular structure of the Turkish language, working with textual data is quite difficult, and achieving meaningful results is time consuming. The difference between text mining and data mining also becomes apparent here, because creating a text mining dataset is time consuming and laborious. Most text mining studies have been conducted in English, which is a widely used language. Accurate results cannot be obtained by transferring the rules and algorithms designed for English directly to Turkish, because the grammar rules of Turkish differ. At this stage, the aim is to find and correct spelling errors using natural language processing techniques [28].

During the study, the data are divided into three groups: training data, test data and validation data. The training data are used to train the specified classification algorithm. The test data are used to test the classification algorithm being trained. The validation data are used to evaluate the results of the classification algorithm and to determine its accuracy percentage. The data obtained from the classification algorithm are divided into classes as bully and non-bully words depending on the designated groups. In the next classification stage, the aim is to classify the types of words that are difficult to determine. In this way, the words containing cyberbullying are detected in the data. In the studies conducted, different results were achieved by using different methods and changing the datasets.
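The three-way division of the data described above can be sketched as follows. The split ratios (70/15/15), the fixed random seed and the function name `split_dataset` are assumptions for illustration; the chapter does not state the proportions that were used.

```python
import random


def split_dataset(items, train=0.70, test=0.15, seed=42):
    """Shuffle the items and split them into training, test and
    validation parts. Whatever is left after the training and test
    parts becomes the validation part."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    training = shuffled[:n_train]
    testing = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return training, testing, validation


if __name__ == "__main__":
    training, testing, validation = split_dataset(range(100))
    print(len(training), len(testing), len(validation))
```

Shuffling before splitting avoids parts that only contain tweets from one collection day, and the seed keeps the split reproducible between runs.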
41.3.5 Topic Modeling

Topic modeling is a method used in different disciplines. It is a probabilistic model that summarizes a large number of documents by their words and allows these documents to be evaluated under certain headings [29]. In machine learning and natural language processing, topic modeling is a kind of statistical model that discovers the abstract topics in a collection of documents. In a sense, it is a text mining technique that explores the hidden semantic connections within texts. Topic modeling is an effective method for automatically understanding, investigating and summarizing large archives.

Advantages
• Uncovering hidden themes in the document
• Classification of discovered themes
• Deciphering the relationships between the themes (Fig. 41.6)

Fig. 41.6 Topic modeling working structure

41.3.5.1 Latent Dirichlet Allocation (LDA)

The assumption made in these studies is that the opinion contained in a text is reflected more clearly in its topic. The main difficulty of these approaches is that opinion and emotion are present in the topics at the same time. One of the best-known topic model-based approaches is the method referred to in the literature as Latent Dirichlet Allocation (LDA), which can be rendered in Turkish as the hidden Dirichlet allocation. In this approach, texts are treated as mixtures of hidden topics according to a statistical model, and this model is extracted from the word distributions in the texts [30].

Latent Dirichlet Allocation (LDA) is one of the most popular topic modeling methods. Each document whose topic is to be determined consists of different words, and the probability that each word belongs to a theme is calculated. The quantity of interest is the posterior distribution

$$P(\theta_{1:M}, z_{1:M}, \beta_{1:k} \mid D; \alpha_{1:M}, \eta_{1:k})$$

where
• α: the parameter that governs the distribution of themes over the documents
• θ: θ(i, j) is the probability that document i contains theme j
• η: the parameter that governs the distribution of words in each theme
• β: β(i, j) is the probability that theme i contains word j
• z: the theme assigned to each word (k denotes the number of themes and M the number of documents)
• D: all data

Because this posterior cannot be computed directly, it is approximated by a simpler distribution q. In this case, our optimization equation is as follows.
• γ, φ and λ denote the variational parameters.
• D(q||p) represents the KL divergence value.
• The aim is to find the variational parameters that minimize the KL divergence value.

$$(\gamma^{*}, \phi^{*}, \lambda^{*}) = \operatorname*{arg\,min}_{(\gamma, \phi, \lambda)} D\big(q(\theta, z, \beta \mid \gamma, \phi, \lambda) \,\|\, p(\theta, z, \beta \mid D; \alpha, \eta)\big)$$
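To make the roles of α, η, θ, β and z more concrete, the generative story behind LDA can be simulated with the Python standard library. This is a minimal sketch of the standard LDA generative process, not the inference procedure used in the study: the function names, the symmetric priors and the toy vocabulary are assumptions. A small helper for the KL divergence between two discrete distributions is included to illustrate the quantity D(q||p).

```python
import math
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(n_docs, doc_len, n_topics, vocab, alpha, eta, seed=0):
    """Simulate documents from the LDA generative process."""
    rng = random.Random(seed)
    # beta[i][j]: probability that theme i produces word j
    beta = [sample_dirichlet([eta] * len(vocab), rng) for _ in range(n_topics)]
    corpus, thetas = [], []
    for _ in range(n_docs):
        # theta[j]: probability that this document contains theme j
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)    # theme of this word
            w = sample_categorical(beta[z], rng)  # word drawn from that theme
            words.append(vocab[w])
        corpus.append(words)
        thetas.append(theta)
    return corpus, thetas, beta


def kl_divergence(q, p):
    """KL divergence D(q||p) between two discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


if __name__ == "__main__":
    vocab = ["maç", "gol", "hava", "yağmur"]  # toy vocabulary
    corpus, thetas, beta = generate_corpus(3, 6, 2, vocab, alpha=0.5, eta=0.5)
    for doc in corpus:
        print(doc)
    print(kl_divergence([0.9, 0.1], [0.5, 0.5]))
```

Inference runs this story in reverse: given only the words, it searches for the θ, β and z that most plausibly generated them, which is where the variational optimization above comes in.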
41.4 Research Findings

41.4.1 Bi-gram, Tri-gram Word Statistics

To determine the number of themes into which the data should be separated, coherence (consistency) and perplexity (confusion) values were measured by running LDA for different numbers of themes (Figs. 41.7, 41.8, 41.9 and 41.10). The highest coherence value, 0.6090206557004757, was found in the trials with 20 themes. The coherence and perplexity values indicate that with 20 themes the themes are more distinctive, more consistent and less mixed. Therefore, the data were divided into 20 themes, and the final themes were obtained by running LDA with a larger number of iterations (Fig. 41.11).
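The selection procedure described above can be sketched as follows. Two assumptions should be kept in mind: the chapter does not say which coherence measure was used, so the UMass co-occurrence measure below is only a stand-in (it yields values on a different scale from the one reported above), and the per-model scores passed to `pick_best_k` are made-up illustrative numbers, not results from the study. All function names are hypothetical.

```python
import math


def perplexity(log_likelihood, n_tokens):
    """Perplexity from the total log-likelihood of held-out tokens.
    Lower values mean the model is less 'confused' by the data."""
    return math.exp(-log_likelihood / n_tokens)


def umass_coherence(top_words, docs):
    """UMass coherence of one theme from document co-occurrence counts.
    top_words must be ordered from most to least probable."""
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            joint = doc_freq(top_words[m], top_words[l])
            score += math.log((joint + 1) / doc_freq(top_words[l]))
    return score


def pick_best_k(coherence_by_k):
    """Return the number of themes with the highest coherence."""
    return max(coherence_by_k, key=coherence_by_k.get)


if __name__ == "__main__":
    docs = [["maç", "gol", "takım"], ["maç", "gol"],
            ["hava", "yağmur"], ["hava", "yağmur", "gol"]]
    print(umass_coherence(["maç", "gol"], docs))
    print(umass_coherence(["maç", "yağmur"], docs))
    # Hypothetical coherence scores for four candidate theme counts.
    print(pick_best_k({5: 0.41, 10: 0.55, 20: 0.61, 30: 0.57}))
```

Words that tend to appear in the same documents give a higher coherence than words that never co-occur, which is the intuition behind preferring the theme count with the highest score.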
Fig. 41.7 Bi-gram and tri-gram results
Fig. 41.8 Word statistics
Fig. 41.9 Consistency analysis
Fig. 41.10 Perplexity analysis
Fig. 41.11 Identified themes
41.5 Conclusion and Suggestions

A review of the emotion analysis studies conducted on Turkish data in the literature shows that texts written in everyday colloquial language are especially difficult to classify. Recent studies focus more on classification methods than on attribute selection, and they obtain varying results with different techniques. At the preprocessing stage, we applied root extraction, removed the stop words, and examined several cases to make sure that they would not cause any loss of information. The morphological structure of Turkish was also discussed. The algorithm is able to separate the themes successfully, even though some themes contain irrelevant tweets. Accuracy can be improved by further optimizing the preprocessing steps and the LDA algorithm. As a result:
• A new dataset has been created for researchers who will work in the field of natural language processing.
• The generated dataset has been divided into 20 different themes with optimal coherence and perplexity values using the LDA algorithm.
• When the resulting themes are examined, it is seen that the national agenda can be successfully distinguished by artificial intelligence.
• In future studies, it is planned to include emotion analysis and to examine the emotions and their intensity within the themes.
• In order to separate the themes more accurately, the LDA algorithm will be further optimized with an additional clustering algorithm.

In future studies, the dataset will be used for the detection of cyberbullying. In the first stage, a Turkish dataset will be obtained and made available for use in different studies. In the second stage, the ISTE dataset will be used to identify the types of cyberbullying, which has seven different sub-branches, and the results will be shared.
References

1. Delibaş A (2008) Doğal dil işleme ile Türkçe yazım hatalarının denetlenmesi. Doctoral dissertation, Fen Bilimleri Enstitüsü
2. Özbilici A (2006) Türkçe Doğal Dili Anlamada İlişkisel Ayrık Bilgiler Modeli ve Uygulaması, Sakarya Üniversitesi FBE, Yüksek Lisans Tezi
3. Nabiyev VV (2010) Yapay Zeka: İnsan-Bilgisayar Etkileşimi, Seçkin Yayıncılık, 3. Baskı, Ankara
4. Kesgin F (2007) Türkçe Metinler için Konu Belirleme Sistemi. İstanbul Teknik Üniversitesi Fen Bilimleri Enstitüsü Yüksek Lisans Tezi
5. Say B (2003) Türkçe İçin Biçimbirimsel ve Sözdizimsel Olarak İşaretlenmiş Ağaç Yapılı Bir Derlem Oluşturma, TÜBİTAK EEEAG Projesi
6. Onan A (2017, Apr) Sarcasm identification on twitter: a machine learning approach. In: Computer science on-line conference. Springer, Cham, pp 374–383
7. Szomszor M, Kostkova P, De Quincey E (2010) #Swineflu: twitter predicts swine flu outbreak in 2009. In: International conference on electronic healthcare. Springer, Berlin, pp 18–26
8. Bian J, Topaloglu U, Yu F (2012) Towards large-scale twitter mining for drug-related adverse events. In: Proceedings of the 2012 international workshop on smart health and wellbeing. ACM, pp 25–32
9. Nguyen LT, Wu P, Chan W, Peng W, Zhang Y (2012) Predicting collective sentiment dynamics from time-series social media. In: Proceedings of the first international workshop on issues of sentiment discovery and opinion mining. ACM, p 6
10. Claster WB, Dinh H, Cooper M (2010) Naïve Bayes and unsupervised artificial neural nets for Cancun tourism social media data analysis. In: Nature and biologically inspired computing (NaBIC), 2010 Second world congress on IEEE, pp 158–163
11. Turney (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. ACL
12. Pang B, Lee L (2004) A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. ACL
13. Hu M, Liu B (2004) Mining and summarizing customer reviews. KDD
14. Wilson T, Wiebe J, Hoffman P (2005) Recognizing contextual polarity in phrase level sentiment analysis. AC
15. Agarwal A, Biadsy F, Mckeown K (2009) Contextual phrase-level polarity analysis using lexical affect scoring and syntactic n-grams. In: Proceedings of the 12th conference of the European chapter of the ACL (EACL 2009), Mar 2009, pp 24–32
16. Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision. Technical report, Stanford
17. Bermingham A, Smeaton A (2010) Classifying sentiment in microblogs: is brevity an advantage? ACM, pp 1833–1836
18. Pak A, Paroubek P (2010) Twitter as a corpus for sentiment analysis and opinion mining. In: Proceedings of LREC
19. Akbaş E (2012) Aspect based opinion mining on Turkish tweets, Yüksek Lisans Tezi, Bilkent Üniversitesi, Fen Bilimleri Enstitüsü, Ankara
20. Nizam H, Akın SS (2014) Sosyal Medyada Makine Öğrenmesi ile Duygu Analizinde Dengeli ve Dengesiz Veri Setlerinin Performanslarının Karşılaştırılması. XIX. Türkiye'de İnternet Konferansı, İzmir
21. Delibaş A (2008) Doğal Dil İşleme ile Türkçe Yazım Hatalarının Denetlenmesi, İstanbul Teknik Üniversitesi FBE, Yüksek Lisans Tezi
22. Boynukalın Z (2012) Emotion analysis of Turkish texts by using machine learning methods. MSc, Middle East Technical University, Ankara, Turkey
23. Yıldırım E, Çetin F, Eryiğit G, Temel T (2015) The impact of NLP on Turkish sentiment analysis. Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi 7(1):43–51
24. Agarwal A, Xie B, Vovsha I, Rambow O, Passonneau R (2011) Sentiment analysis of twitter data. In: Proceedings of the Workshop on Languages in Social Media, pp 30–38
25. Yazğılı E, Baykara M (2019, Nov) Cyberbullying and detection methods. In: 2019 1st International informatics and software engineering conference (UBMYK)
26. Yılmaz H, Yumuşak S. Açık Kaynak Doğal Dil İşleme Kütüphaneleri. İstanbul Sabahattin Zaim Üniversitesi Fen Bilimleri Enstitüsü Dergisi 3(1):81–85
27. Qi X, Davison BD (2009) Web page classification. ACM Comput Surv 41(2):1–31
28. Yüksel AS, Tan FG (2018) Metin madenciliği teknikleri ile sosyal ağlarda bilgi keşfi. Mühendislik Bilimleri ve Tasarım Dergisi 6(2):324–333
29. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022
30. Seker SE (2016) Duygu Analizi (Sentimental analysis). YBS Ansiklopedi 3(3):21–36
Chapter 42
Hopfield-Type Neural Networks with Poincaré Chaos
Marat Akhmet, Duygu Aruğaslan Çinçin, Madina Tleubergenova, Roza Seilova, and Zakhira Nugayeva
42.1 Introduction
The Hopfield model [1] is a type of artificial neural network that imitates functions of the human brain such as information processing, data storage and pattern recognition. In recent years, the theory of neural networks has attracted the attention of many researchers [2–5], including scientists from biology, engineering, mathematics, and physics. On the one hand, new ideas and theories from these sciences have proved fruitful when applied to neural networks; on the other hand, neural networks themselves are finding more and more applications in biology, medicine and technology.
M. Akhmet (✉)
Department of Mathematics, Middle East Technical University, Ankara, Turkey
D. A. Çinçin
Department of Mathematics, Süleyman Demirel University, Isparta 32260, Turkey
M. Tleubergenova · R. Seilova · Z. Nugayeva
Department of Mathematics, K. Zhubanov Aktobe Regional University, Aktobe 030000, Kazakhstan
Institute of Information and Computational Technologies CS MES RK, Almaty 050000, Kazakhstan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_42
Today, research on neural networks is developing rapidly, driven in part by the study of chaos. Neural networks with chaotic oscillations are therefore of great interest to many researchers [6–13]. Chaotic oscillations described by the recently introduced unpredictable functions [14] make it possible to study the behavior of signals in neural networks. Chaos based on the presence of an unpredictable trajectory is called Poincaré chaos. Unpredictable functions have been studied in the papers [14–19] and in the books [20, 21]. In the present research, a Hopfield-type model with unpredictable inputs is investigated, and the existence and uniqueness of an asymptotically stable strongly unpredictable oscillation are proved.
42.2 Preliminaries
This article focuses on the Hopfield-type model

$$x_i'(t) = -\alpha_i x_i(t) + \sum_{j=1}^{m} \beta_{ij}\, g_j(x_j(t)) + h_i(t), \tag{42.1}$$

where $t$ is the time variable and $\alpha_i > 0$. Here $x_i(t)$ is the state of the $i$th unit at time $t$; $m$ is the number of neurons in the network; $\alpha_i$ is the rate at which unit $i$ self-regulates or resets its potential when isolated from other units and inputs; $\beta_{ij}$ is the synaptic connection weight of unit $j$ on unit $i$; $g_j$ is the activation function of the incoming potentials of unit $j$; and $h_i(t)$ is the time-varying stimulus corresponding to the external input to unit $i$ from outside the network. We assume that the coefficients $\beta_{ij}$ are real numbers and that the activation functions $g_i : \mathbb{R} \to \mathbb{R}$ and the perturbations $h_i : \mathbb{R} \to \mathbb{R}$ are continuous. Let us introduce the norm $\|h\| = \max_{1 \le i \le m} |h_i|$, where $|\cdot|$ is the absolute value, $h = (h_1, \ldots, h_m)$ and $h_i \in \mathbb{R}$, $i = 1, \ldots, m$.

Definition 42.1 [16] A sequence $i_k$, $k \in \mathbb{Z}$, $i_k \in \mathbb{R}^m$, is called unpredictable if it is bounded and there exist a number $\varepsilon_0 > 0$ and sequences $\zeta_n$, $\eta_n$, $n \in \mathbb{N}$, of natural numbers, which diverge to infinity, such that $\|i_{k+\zeta_n} - i_k\| \to 0$ as $n \to \infty$ for each $k$ in a bounded interval, and $\|i_{\zeta_n+\eta_n} - i_{\eta_n}\| \ge \varepsilon_0$ for each $n \in \mathbb{N}$.

Definition 42.2 [14] A function $x : \mathbb{R} \to \mathbb{R}^m$ is unpredictable if it is uniformly continuous and bounded, and there exist numbers $e_0 > 0$, $\delta > 0$ and sequences $t_n$, $s_n$, $n \in \mathbb{N}$, of real numbers, which diverge to infinity, such that $x(t + t_n) \to x(t)$ as $n \to \infty$ uniformly on compact subsets of $\mathbb{R}$, and $\|x(t + t_n) - x(t)\| \ge e_0$ for each $t \in [s_n - \delta, s_n + \delta]$ and $n \in \mathbb{N}$.

Some coordinates of an unpredictable vector-function need not be unpredictable as scalar functions. When unpredictability is required in every coordinate, we consider strongly unpredictable functions.
Definition 42.3 [18] A function $x : \mathbb{R} \to \mathbb{R}^m$, $x = (x_1, \ldots, x_m)$, is strongly unpredictable if it is uniformly continuous and bounded, and there exist numbers $e_0 > 0$, $\delta > 0$ and sequences $t_n$, $s_n$, $n \in \mathbb{N}$, of real numbers, which diverge to infinity, such that $x(t + t_n) \to x(t)$ as $n \to \infty$ uniformly on compact subsets of $\mathbb{R}$, and $|x_i(t + t_n) - x_i(t)| \ge e_0$ for each $i = 1, \ldots, m$, $t \in [s_n - \delta, s_n + \delta]$ and $n \in \mathbb{N}$.
42.3 Main Result
Let $\mathcal{K}$ be the space of vector-functions $\nu : \mathbb{R} \to \mathbb{R}^m$, $\nu = (\nu_1, \ldots, \nu_m)$, with the norm $\|\nu\|_1 = \sup_{t \in \mathbb{R}} \|\nu(t)\|$, satisfying the following conditions:
(K1) there exists $H > 0$ such that $\|\nu\|_1 < H$ if $\nu(t) \in \mathcal{K}$;
(K2) $\nu(t)$ is uniformly continuous;
(K3) there exists a sequence $\{t_n\}$, $t_n \to \infty$ as $n \to \infty$, such that $\nu(t + t_n) \to \nu(t)$ uniformly on compact subsets of $\mathbb{R}$ for each $\nu(t) \in \mathcal{K}$.

We assume that the following conditions are valid:
(A1) $|g_i(u_1) - g_i(u_2)| \le L |u_1 - u_2|$, $i = 1, \ldots, m$, for all $u_1, u_2 \in \mathbb{R}$, where $L > 0$ is a constant;
(A2) the inequalities $1 < \gamma \le \alpha_i \le \bar{\gamma}$, $i = 1, \ldots, m$, are valid with positive numbers $\gamma$, $\bar{\gamma}$;
(A3) $|h_i(t)| \le H$ and $|g_i(t)| \le \bar{m}_i$, where $H$ and $\bar{m}_i$ are positive numbers, for all $i = 1, \ldots, m$ and $t \in \mathbb{R}$;
(A4) $\max_i \sum_{j=1}^{m} |\beta_{ij}| \bar{m}_j < H(\gamma - 1)$;
(A5) $L \max_i \sum_{j=1}^{m} |\beta_{ij}| < \gamma$.

A function $x(t) = \{x_i\}$, $i = 1, 2, \ldots, m$, bounded on $\mathbb{R}$, is a solution of (42.1) if and only if it satisfies the integral equation

$$x_i(t) = \int_{-\infty}^{t} e^{-\alpha_i (t-s)} \Big[ \sum_{j=1}^{m} \beta_{ij}\, g_j(x_j(s)) + h_i(s) \Big]\, ds.$$

Define on $\mathcal{K}$ the operator $\Pi\nu(t) = (\Pi_1\nu(t), \ldots, \Pi_m\nu(t))$ by

$$\Pi_i\nu(t) = \int_{-\infty}^{t} e^{-\alpha_i (t-s)} \Big[ \sum_{j=1}^{m} \beta_{ij}\, g_j(\nu_j(s)) + h_i(s) \Big]\, ds, \quad i = 1, \ldots, m. \tag{42.2}$$

Lemma 42.1 $\Pi\mathcal{K} \subseteq \mathcal{K}$.

Lemma 42.2 $\Pi : \mathcal{K} \to \mathcal{K}$ is a contractive operator.

The following assertion is proved by using Lemmas 42.1 and 42.2.
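The chapter states the two lemmas without proof. The following is only a sketch of how conditions (A1)–(A5) enter the two key estimates, writing $\Pi$ for the operator defined in (42.2) and $\tilde{\nu}$ for a second, arbitrary element of $\mathcal{K}$ (a symbol introduced here for illustration). It covers the bound (K1) and the contraction property only; conditions (K2) and (K3) require separate arguments and are not treated.

```latex
% Boundedness (part of Lemma 42.1), using (A2)-(A4):
\begin{align*}
|\Pi_i \nu(t)|
  &\le \int_{-\infty}^{t} e^{-\alpha_i (t-s)}
       \Big( \sum_{j=1}^{m} |\beta_{ij}|\,\bar{m}_j + H \Big)\, ds
   \le \frac{1}{\gamma} \Big( \sum_{j=1}^{m} |\beta_{ij}|\,\bar{m}_j + H \Big) \\
  &< \frac{1}{\gamma} \big( H(\gamma - 1) + H \big) = H .
\end{align*}

% Contraction (Lemma 42.2), using (A1), (A2) and (A5):
\begin{align*}
|\Pi_i \nu(t) - \Pi_i \tilde{\nu}(t)|
  &\le \int_{-\infty}^{t} e^{-\alpha_i (t-s)}
       \sum_{j=1}^{m} |\beta_{ij}|\, L\, |\nu_j(s) - \tilde{\nu}_j(s)|\, ds \\
  &\le \frac{L}{\gamma} \max_{i} \sum_{j=1}^{m} |\beta_{ij}|\;
       \|\nu - \tilde{\nu}\|_1 ,
  \qquad \frac{L}{\gamma} \max_{i} \sum_{j=1}^{m} |\beta_{ij}| < 1 .
\end{align*}
```

Both estimates use $\int_{-\infty}^{t} e^{-\alpha_i (t-s)}\, ds = 1/\alpha_i \le 1/\gamma$, which is where the lower bound in (A2) is needed.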
Theorem 42.1 If the conditions (A1)–(A5) are fulfilled and function h = (h 1 , . . . , h m ) in system (42.1) is strongly unpredictable, then the system (42.1) has a unique strongly unpredictable solution, which is asymptotically stable.
42.4 Examples
Example 42.1 We construct an example of an unpredictable function. We apply the result of [20], where it was proved that for each $\mu \in [3 + (2/3)^{1/2}, 4]$ there exists an unpredictable solution (an unpredictable sequence) of the logistic map

$$\chi_{i+1} = \mu \chi_i (1 - \chi_i), \quad i \in \mathbb{Z}. \tag{42.3}$$

Denote by $\tau_i$, $i \in \mathbb{Z}$, the unpredictable solution of Eq. (42.3) with $\mu = 3.9$ in $[0, 1]$. Then one can find a number $e_0 > 0$ and sequences $\zeta_n$, $\eta_n$ of natural numbers, diverging to infinity, such that $|\tau_{i+\zeta_n} - \tau_i| \to 0$ as $n \to \infty$ for each $i$ in a bounded interval, and $|\tau_{\eta_n+\zeta_n} - \tau_{\eta_n}| \ge e_0$ for $n \in \mathbb{N}$. Consider the function

$$U(t) = \int_{-\infty}^{t} e^{-2.5(t-s)}\, \Omega(s)\, ds, \tag{42.4}$$

where $\Omega(t) = \tau_i$ for $t \in [i, i+1)$, $i \in \mathbb{Z}$. It is bounded on $\mathbb{R}$ and satisfies $\sup_{t \in \mathbb{R}} |U(t)| \le 2/5$. We do not know the initial value of $U(t)$ and consequently cannot visualize it directly. Hence, we represent the function $U(t)$ in the form

$$U(t) = \int_{-\infty}^{t} e^{-2.5(t-s)}\, \Omega(s)\, ds = e^{-2.5t} U_0 + \int_{0}^{t} e^{-2.5(t-s)}\, \Omega(s)\, ds, \tag{42.5}$$

where $U_0 = \int_{-\infty}^{0} e^{2.5s}\, \Omega(s)\, ds$. Since $U_0$ is unknown, we instead plot a function $W(t)$ that approaches $U(t)$. Let

$$W(t) = e^{-2.5t} W_0 + \int_{0}^{t} e^{-2.5(t-s)}\, \Omega(s)\, ds, \tag{42.6}$$

where $W_0$ is a number that is not necessarily equal to $U_0$. Subtracting equality (42.6) from equality (42.5), we obtain $U(t) - W(t) = e^{-2.5t}(U_0 - W_0)$, $t \ge 0$. It follows that $W(t)$ approaches the unpredictable function $U(t)$ exponentially. Therefore, the graph of the approximating function $W(t)$ with initial value $W(0) = -0.7$ can be used to depict the unpredictable function $U(t)$ (Fig. 42.1).

Fig. 42.1 The simulation of U(t): the graph of W(t) against t for 0 ≤ t ≤ 100

Example 42.2 We consider the Hopfield model

$$\frac{dx(t)}{dt} = -\begin{pmatrix} 2 & 0 & 0 \\ 0 & 3.5 & 0 \\ 0 & 0 & 4 \end{pmatrix} \begin{pmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{pmatrix} + \begin{pmatrix} 0.04 & 0.03 & 0.05 \\ 0.01 & 0.03 & 0.02 \\ 0.03 & 0.01 & 0.06 \end{pmatrix} \begin{pmatrix} 0.04 \arctan(x_1(t)) \\ 0.04 \arctan(x_2(t)) \\ 0.04 \arctan(x_3(t)) \end{pmatrix} + \begin{pmatrix} -16U^3(t) + 2 \\ 4U(t) - 1 \\ 5U(t) + 4 \end{pmatrix}. \tag{42.7}$$
By Lemmas 1.4 and 1.5 [17], the perturbation function $(-16U^3(t) + 2,\ 4U(t) - 1,\ 5U(t) + 4)$ is unpredictable. The conditions of Theorem 42.1 hold with $\gamma = 2$, $L = 0.04$ and $H = 3$. According to Theorem 42.1, system (42.7) possesses a unique strongly unpredictable solution, which is asymptotically stable. The graph of the solution $\varphi(t)$ of (42.7) with the initial conditions $\varphi_1(0) = 3.393$, $\varphi_2(0) = 1.430$, $\varphi_3(0) = 3.345$ is shown in Figs. 42.2 and 42.3.
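The two examples can be explored numerically with a short script. The sketch below is not the authors' code. It assumes a forward-Euler scheme, an arbitrary seed `x0 = 0.4` for the logistic map (the true unpredictable sequence is not known explicitly), and it feeds W(t) into system (42.7) in place of U(t), which is justified only in the sense that W(t) approaches U(t) exponentially. The function names and step size are illustrative choices.

```python
import numpy as np

def logistic_sequence(n, mu=3.9, x0=0.4):
    """Iterate the logistic map (42.3); x0 is an assumed seed in (0, 1)."""
    tau = np.empty(n)
    tau[0] = x0
    for i in range(1, n):
        tau[i] = mu * tau[i - 1] * (1.0 - tau[i - 1])
    return tau

# Coefficients of system (42.7).
ALPHA = np.array([2.0, 3.5, 4.0])
BETA = np.array([[0.04, 0.03, 0.05],
                 [0.01, 0.03, 0.02],
                 [0.03, 0.01, 0.06]])

def simulate(t_end=100, steps_per_unit=200, w0=-0.7,
             phi0=(3.393, 1.430, 3.345)):
    """Forward-Euler integration of W' = -2.5 W + Omega(t) and of (42.7)."""
    dt = 1.0 / steps_per_unit
    n = t_end * steps_per_unit
    tau = logistic_sequence(t_end)
    t = np.arange(n + 1) * dt
    w = np.empty(n + 1)
    phi = np.empty((n + 1, 3))
    w[0], phi[0] = w0, phi0
    for k in range(n):
        omega = tau[k // steps_per_unit]  # Omega(t) = tau_i on [i, i + 1)
        u = w[k]                          # W(t) stands in for U(t) (assumption)
        h = np.array([-16.0 * u**3 + 2.0, 4.0 * u - 1.0, 5.0 * u + 4.0])
        g = 0.04 * np.arctan(phi[k])
        w[k + 1] = u + dt * (-2.5 * u + omega)
        phi[k + 1] = phi[k] + dt * (-ALPHA * phi[k] + BETA @ g + h)
    return t, w, phi

t, w, phi = simulate()
_, w_other, _ = simulate(w0=0.3)
print(abs(w[-1] - w_other[-1]))  # gap between two choices of W0 at t = 100
```

Differentiating (42.6) gives W'(t) = -2.5 W(t) + Ω(t) with W(0) = W0, which is the scalar equation integrated here. Running the script with two different values of `w0` illustrates the claim that the choice of W0 only affects a transient that decays like e^{-2.5t}.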
Fig. 42.2 The simulation of φ(t): the coordinates φ1, φ2 and φ3 plotted against t for 0 ≤ t ≤ 100

Fig. 42.3 The state-space simulation for φ(t), the trajectory of (42.7), in the coordinates (φ1, φ2, φ3)
Acknowledgements MA is supported by the 2247-A National Leading Researchers Program of TÜBİTAK, Turkey, N 120C138. MT, RS and ZN are supported by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (grants No. AP08856170 and No. AP09258737).
References
1. Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci U S A 79:2554–2558
2. Dong Q, Matsui K, Huang X (2002) Existence and stability of periodic solutions for Hopfield neural network equations with periodic input. Nonlinear Anal 49:471–479
3. Akhmet M, Karacaören M (2018) A Hopfield neural network with multi-compartmental activation. Neural Comput Appl 29:815–822
4. Akhmet MU, Arugaslan D, Yilmaz E (2010) Stability analysis of recurrent neural networks with piecewise constant argument of generalized type. Neural Netw 23:805–811
5. Jin D, Peng J (2009) A new approach for estimating the attraction domain for Hopfield-type neural networks. Neural Comput 21:101–120
6. Aihara K, Takabe T, Toyoda M (1990) Chaotic neural networks. Phys Lett A 6:333–340
7. Das A, Roy AB, Das P (2000) Chaos in a three dimensional neural network. Appl Math Model 24:511–522
8. Yuan Q, Li QD, Yang X-S (2009) Horseshoe chaos in a class of simple Hopfield neural networks. Chaos Solitons Fractals 39:1522–1529
9. Xu K, Maidana JP, Castro S et al (2018) Synchronization transition in neuronal networks composed of chaotic or non-chaotic oscillators. Sci Rep 8:8370
10. Sussillo D, Abbott LF (2009) Generating coherent patterns of activity from chaotic neural networks. Neuron 63:544–557
11. Liu Q, Zhang S (2012) Adaptive lag synchronization of chaotic Cohen-Grossberg neural networks with discrete delays. Chaos 22:033123
12. Ke Q, Oommen J (2014) Logistic neural networks: their chaotic and pattern recognition properties. Neurocomputing 125:184–194
13. He G, Chen L, Aihara K (2008) Associative memory with a controlled chaotic neural network. Neurocomputing 71:2794–2805
14. Akhmet M, Fen MO (2017) Poincaré chaos and unpredictable functions. Commun Nonlinear Sci Numer Simul 48:85–94
15. Akhmet M, Fen MO (2016) Unpredictable points and chaos. Commun Nonlinear Sci Numer Simul 40:1–5
16. Akhmet M, Fen MO (2018) Non-autonomous equations with unpredictable solutions. Commun Nonlinear Sci Numer Simul 59:657–670
17. Akhmet M, Fen MO, Tleubergenova M, Zhamanshin A (2019) Unpredictable solutions of linear differential and discrete equations. Turk J Math 43:2377–2389
18. Akhmet M, Tleubergenova M, Zhamanshin A (2020) Quasilinear differential equations with strongly unpredictable solutions. Carpathian J Math 36:341–349
19. Akhmet M, Tleubergenova M, Zhamanshin A (2019) Poincaré chaos for a hyperbolic quasilinear system. Miskolc Math Notes 20:33–44
20. Akhmet MU, Fen MO, Alejaily EM (2020) Dynamics with chaos and fractals. Springer, Switzerland
21. Akhmet MU (2021) Domain structured dynamics: unpredictability, chaos, randomness, fractals, differential equations and neural networks. IOP Publishing, UK
Chapter 43
Face Expression Recognition Using Deep Learning and Cloud Computing Services
Hilal Hazel Cumhuriyet, Volkan Uslan, Ersin Yavaş, and Huseyin Seker
43.1 Introduction
Emotional facial expressions are a communication tool between humans and are very important in daily social life. They can be regarded as a fast language that facilitates daily communication by revealing people's intentions, goals, ideas, behaviors, and interpretations of others. Emotions are therefore very important tools for social order. The main tool for reading emotions is usually the face: it is the primary center of emotions and can display more than ten thousand facial expressions. The interpretation of emotional facial expressions is an important research topic [1], and it is still unknown how it relates to the human visual system [2]. Ekman introduced six basic emotions: happiness, sadness, anger, disgust, fear, and astonishment. Together with the neutral emotion considered in most works, these emotions are regarded as universal among people [3]. Ekman stated that emotional expressions are important in the regulation and development of interpersonal relationships. When emotions are expressed on the face, three anatomically almost independent areas are used: the eyebrow and forehead region, the eyelids, and the lower part of the face (cheeks, nose, lips, mouth and chin). When an emotion is felt, these zones can be used alone or in double and triple combinations. Most people can accurately interpret facial expressions that appear all over the face for more than 1–2 s; however, expressions that quickly appear or disappear in certain areas of the face can be misleading.

H. H. Cumhuriyet (✉) · V. Uslan
Piri Reis University, Istanbul, Turkey
E. Yavaş
Bartın University, Ağdacı, Turkey
H. Seker
Birmingham City University, Birmingham, United Kingdom
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_43
Although humans occasionally misread emotional facial expressions, recognizing them is much more difficult for machines. Lately, image-based facial expression recognition has aroused interest in the scientific community [4, 5], and many efforts have been made to support psychological analyses of emotions with computer technology [6]. One technology that can be used for image recognition is deep learning, which has proven to be very good at studying complex multivariate data structures and has been used in many areas of science such as crowd analysis [7], radiology [8], and biomedicine [9]. The term deep learning came into frequent use in the literature after Hinton showed that multilayered neural networks can be trained efficiently [10]. Deep learning techniques have increased performance on many classification problems in computer vision. Facial expression recognition with deep learning methods has been an active research topic in recent years [11], and convolutional neural networks (CNN), one of the sub-branches of deep learning, have lately attracted the attention of researchers [12, 13] because they can work efficiently with large numbers of images.

The computational burden of facial expression analysis requires considerable resources, and more and more research studies benefit from cloud computing. Cloud computing is one of the most innovative and fastest-developing technologies; it makes it possible to work with different technological tools in various research contexts (e.g. genomic data analysis [14]) and to access, transfer and share data quickly, easily and practically. Hence, it has become very important to increase computing capability and to add facilities that run on advanced infrastructure, support up-to-date software, and let people learn up-to-date methods. Cloud computing provides both paid and subscription-based services and facilities, which support the rapid expansion of communication and information technology capacity.

In this study, deep learning was used to classify facial expression images. The generated model achieved a promising accuracy of 0.6932 in facial expression recognition on the FER-2013 dataset. In addition, cloud-based services were observed to be effective and efficient, both because of the free GPU and the resulting execution speed and for image classification on large data. The rest of the paper is organized as follows. Related work is presented in Sect. 43.2. The method and material are described in Sect. 43.3. The results and discussion are provided in Sect. 43.4. The conclusions are given in Sect. 43.5.
43.2 Related Works
Alenazy and Alqahtani [15] used a semi-supervised deep belief network for facial expression recognition, with some of the parameters of their method optimized by the gravitational search algorithm. Ly et al. proposed a deep learning based method that fuses 2D and 3D facial modalities to improve the efficiency of in-the-wild facial expression recognition [16]. Ruan et al. presented a feature decomposition and reconstruction learning method to enhance facial expression analysis for both in-the-lab and in-the-wild facial expression recognition [17]. Guha et al. took a computational approach to analyse the facial expressions of children with autism [18]. Yang et al. used a weighted mixture deep neural network based on double-channel facial images [19]. Kim et al. proposed a hierarchical deep neural network structure for facial expression recognition and reported accuracy improvements on two different facial expression recognition datasets [20].
43.3 Method and Material
43.3.1 Deep Learning
Deep learning is a powerful machine learning method. What distinguishes it from other machine learning methods is that a model can be built from many layers, so that interactions between those layers are captured to represent an object. A deep learning model maps inputs to outputs through numerous intermediate variables distributed as nodes over the layers. Each intermediate variable is the output of a function of a number of intermediate variables from the previous layer. The exception is the input layer, where the intermediate variables are the input features themselves. When the input is an image, a deep learning model often includes convolutional layers. The principle becomes easier to understand when the visual system of the human brain is examined closely. Basic features such as edges and corners are recognized first, where the signal hits the eye. In the following layers, these edges and corners are combined to recognize the shapes of the mouth and nose, then faces, and finally features of the whole image such as the placement of people and objects in the scene.
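The layer-by-layer mapping described above can be illustrated with a few lines of NumPy. This is a generic sketch rather than the model used in the study: the layer sizes, the random weights and the helper name `forward` are all assumptions made for the illustration.

```python
import numpy as np

def forward(x, layers):
    """Propagate x through a list of (weights, bias) pairs.

    Every layer except the last applies ReLU, so each intermediate variable
    is a nonlinear function of the variables in the previous layer.
    """
    a = x
    for i, (W, b) in enumerate(layers):
        z = W @ a + b
        a = z if i == len(layers) - 1 else np.maximum(z, 0.0)
    return a

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(8, 4)), np.zeros(8)),   # 4 inputs -> 8 hidden nodes
          (rng.normal(size=(3, 8)), np.zeros(3))]   # 8 hidden nodes -> 3 outputs
out = forward(np.ones(4), layers)
print(out.shape)  # (3,)
```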
43.3.2 Convolutional Neural Networks
Convolutional neural networks became prominent in research after the method achieved considerable success in the ImageNet classification task, in which 1.2 million high-resolution images are classified into one thousand class labels [21]. Since then, many advances have been made in convolutional neural networks [22, 23]. The basic building block of a convolutional network is the convolution itself, whose primary purpose is to extract the essential features of the input image. Each image can be represented as a set of pixels arranged as a matrix. In the convolution layer, the image data are scanned in order to find distinctive features or patterns. The convolution layer is the first layer that extracts features from the input image: the image matrix and a filter (kernel) matrix are combined to form an output matrix. In the pooling layer, max pooling and average pooling are generally used. No parameters are learned at this layer of the network. Pooling keeps the number of channels of the input matrix constant while reducing its height and width, and is used to reduce computational complexity. The main purpose of pooling, a sample-based discretization process, is to shrink the input representation, reduce its size, and allow assumptions to be made about the features contained in the pooled sub-regions. Next comes the fully connected (dense) layer. A fully connected layer follows the convolutional and pooling layers and maps their output to one of the class labels. Figure 43.1 illustrates the convolutional neural network pipeline. It should also be noted that the hyperparameters are important factors for the effective operation of convolutional neural networks [24].

Fig. 43.1 Illustration of the facial expression recognition pipeline. Image adapted from [14]
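The convolution and pooling steps can be made concrete with a minimal NumPy sketch. It is written for clarity rather than speed and is not taken from the study: the helper names, the synthetic 48 × 48 "image" and the vertical-edge kernel are assumptions. As in most deep learning libraries, the "convolution" below does not flip the kernel (strictly speaking it is a cross-correlation).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(48 * 48, dtype=float).reshape(48, 48)  # stand-in for a face image
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)           # assumed vertical-edge filter
features = conv2d_valid(image, edge_kernel)
pooled = max_pool2d(features)
print(features.shape, pooled.shape)  # (46, 46) (23, 23)
```

The printed shapes show the effect described in the text: a 3 × 3 filter without padding shrinks a 48 × 48 input to 46 × 46, and 2 × 2 max pooling halves the height and width again without learning any parameters.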
43.3.3 Cloud Computing Services
The National Institute of Standards and Technology (NIST) has proposed the following definition [25]: "Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models." The three cloud service models are Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). Cloud environments need less time and effort to complete large tasks, and the cloud is a continuously available computing and storage resource that lets users adapt their consumption to their needs. Cloud computing reduces costs, and it is an advantageous way to access resources that require substantial computing power. Graphics processing units (GPUs) are specialized hardware for parallel computing; by using many cores together, they can perform many operations much faster. As a result, cloud computing provides a working environment in which even an individual researcher can work with large-scale data at increased performance. It also keeps data on high-security cloud servers, which protects against data loss caused by security breaches or hardware failures within the network, and it allows information to be accessed at any time. Cloud-based applications and data can be accessed from virtually any device connected to the internet.
43.4 Results and Discussion
In this work, a CNN was used to classify emotional facial expressions. Each image (48 × 48) was transformed into a feature array for the input layer. We followed the sequential model approach, which allowed us to build the network as a stack of layers. A neural network created with the sequential function can only consist of layers that are connected to each other in sequence. So that deep learning methods can model nonlinear relationships, the result of the operations performed at each node is passed through a nonlinear activation function. The ReLU (Rectified Linear Unit) activation function maps values below zero to zero and leaves positive values unchanged. Categorical crossentropy, an objective function that is often used in multi-class classification problems, was used as the loss function in this study; it measures the distance between two probability distributions. The "Adam" optimization method was used, and accuracy was the metric tracked during training. While it is possible to keep track of multiple criteria, a single objective function is usually preferred. The final task is to train the model. At this stage, two concepts arise: epochs and batch size. The batch size designates the number of examples passed through the neural network at a time during training. One pass of the entire dataset through the neural network is called an epoch.

The dataset used in this paper is taken from the Kaggle Face Expression Recognition Challenge [26]. This challenge aimed at building and comparing models that could best predict a facial expression from human face photos [27]. The database consists of a total of 35,887 pre-cropped facial expression images. These images were split into 28,709 training, 3589 test, and 3589 validation images. Each sample is a 48 × 48 pixel grayscale image of a human face and is labeled with one of the 7 universally accepted emotion classes: anger, disgust, fear, happiness, sadness, surprise and neutral. Figure 43.2 illustrates the distribution of images over the 7 emotional facial expressions within the dataset. It should be noted that this dataset is imbalanced: happy images constitute 25% of the total, whereas disgust images constitute only 1.5%. This study implemented more than one CNN model, and different hyperparameters were used for each model. The models were tested with various kernel initializer values and activation, padding and optimization parameters. The deep learning model achieved an accuracy of 0.6932 with the ReLU activation function and "Adam" optimization. This is quite reasonable on this challenging dataset: our accuracy is slightly lower than that of the first-ranked model (0.7116) but higher than that of the second-ranked model (0.6926) among the 56 models that competed in the facial expression recognition challenge. The confusion matrix presented in Fig. 43.3 shows the performance of the implemented model in terms of true and predicted class labels.
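The quantities named in this section (ReLU, categorical crossentropy, batch size and epoch) can be written out in NumPy as follows. This is an illustrative sketch, not the training code of the study: the logits are made up, `softmax` is included only to turn scores into a probability distribution, and the batch size of 64 is an assumption, since the chapter does not report the value that was used.

```python
import math
import numpy as np

def relu(z):
    """Zero for negative inputs, identity for positive inputs."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Turn raw scores into a probability distribution over the classes."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-12):
    """Mean over the batch of -sum_k y_true[k] * log(y_prob[k])."""
    return float(-np.mean(np.sum(y_true * np.log(y_prob + eps), axis=-1)))

logits = np.array([[2.0, 0.5, 0.1, 0.0, -1.0, 0.3, 0.2]])  # made-up scores, 7 classes
y_true = np.eye(7)[[0]]                                     # one-hot label for class 0
probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)

batch_size = 64                                   # assumed; not reported in the chapter
steps_per_epoch = math.ceil(28709 / batch_size)   # batches needed for one epoch
print(steps_per_epoch)  # 449
```

With 28,709 training images, one epoch at this assumed batch size consists of 449 weight updates, the last of which uses a smaller, partial batch.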
Fig. 43.2 Image distribution of 7 emotional facial expressions

Fig. 43.3 Confusion matrix. 0: Anger, 1: Disgust, 2: Fear, 3: Happy, 4: Sad, 5: Surprise, 6: Neutral
Figure 43.4 illustrates the class predictions for some of the grayscale test images included in the dataset. Facial expressions can be tricky even for the human eye to recognize, and the task is also challenging for computer vision, because the true emotion expressed in some face photos is difficult to distinguish. Some sad people wear a neutral expression and do not show their sadness, so a sad face can resemble a neutral one. The same holds for fear and surprise. In both expressions the eyes widen, the eyebrows rise, the upper eyelids lift, and the upper section of the face wrinkles; in surprise the eyes open as much as possible. Because the upper section of the face is almost the same in fear and surprise, the two expressions are easily confused. Similarly, in both anger and sadness the lips turn downwards.

Figure 43.5 illustrates the predicted facial expressions for grayscale images of some movie artists. Three images that are not included in the dataset were taken from the internet to further test the CNN model that was built. Each image was transformed using NumPy array reshaping to the target size (48, 48). As before, the "Adam" method was the optimization method and accuracy was the metric followed during model building. The first image shows the famous Turkish actress Türkan Şoray. The generated CNN model recognized her facial expression in the image as happy. In the expression of happiness, both cheeks are raised, the edges of the eyes are wrinkled and the muscles around the mouth are prominent; our model therefore recognized the happy expression correctly. The second image shows the famous British actress Keira Knightley, and the generated CNN model recognized her facial expression in the image as neutral. Some people wear a neutral expression and do not show whether they are sad or happy. The third image shows Macaulay Culkin, known from the movie 'Home Alone'. The generated CNN model recognized his facial expression in the image as fear: the eyes are enlarged, the eyebrows are lifted, and the forehead is wrinkled.
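The reshaping step mentioned above, and the mapping from the model output back to an emotion label, might look like the sketch below. It is an assumption-laden illustration rather than the authors' code: the helper names are hypothetical, the trailing channel axis and the scaling to [0, 1] are common conventions that the chapter does not confirm, and the probability vector is made up. The label order follows the caption of Fig. 43.3.

```python
import numpy as np

# Label order as listed in the caption of Fig. 43.3.
EMOTIONS = ["Anger", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def to_model_input(pixels):
    """Reshape 2304 grayscale values to (1, 48, 48, 1) and scale them to [0, 1].

    The batch and channel axes and the division by 255 are assumptions.
    """
    arr = np.asarray(pixels, dtype=np.float32)
    if arr.size != 48 * 48:
        raise ValueError("expected 48 * 48 = 2304 pixel values")
    return arr.reshape(1, 48, 48, 1) / 255.0

def decode(probabilities):
    """Return the label with the highest predicted probability."""
    return EMOTIONS[int(np.argmax(probabilities))]

x = to_model_input(np.zeros(2304))
print(x.shape)  # (1, 48, 48, 1)
print(decode([0.05, 0.01, 0.04, 0.70, 0.10, 0.05, 0.05]))  # Happy
```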
Fig. 43.4 Facial expression prediction performance of CNN model on grayscale test images
Fig. 43.5 Prediction of facial expressions of movie artists that are not included in the dataset
Because tens of thousands of images are used to train the deep learning model, training demands high computing performance. A GPU saves considerable time when classifying images. Google Colab's free GPU service is beneficial in many respects, and we used Google Colab to accelerate the training task, with GPU selected as the runtime type. The GPU sped up the training of the CNN models and reduced the time we had to spend on training. Google Colab saved our work automatically and allowed us to share it with other users in a simple way. The Jupyter notebook read the images from the mounted drive and offered a very useful environment. Cloud computing clearly makes it easier for researchers to conduct their studies. Additionally, the free cloud environment offered by Google Colab will benefit deep learning enthusiasts who want to carry out similar studies. People interested in deep learning increasingly benefit from the higher performance and low prices that cloud computing offers. As a result of the increase in processing power, it has become possible to use deeper models in practice. To summarize, deep learning has attracted so much interest because models can grow to deeper and more complex levels, recently proposed algorithms can train them, these networks can be trained with huge amounts of data, and the whole process can be carried out using cloud services.
43.5 Conclusions and Future Work
Emotional facial expression recognition systems have attracted great interest because of their ability to detect and identify the emotional state of a person. The recognition of faces via computer vision has improved tremendously in recent years with the use of deep learning. In this study, a facial expression recognition system that identifies seven different emotions is proposed. The facial expression dataset was composed of 35,887 images, split into 28,709 training, 3589 test, and 3589 validation images. Our model achieved a test accuracy of 0.6932 with the ReLU activation function and “Adam” optimization. However, facial expression datasets may suffer from factors such as blurriness, pose, and varying light intensity in the images. The performance of models may be affected by these issues, and our model is no exception. As future work, the model can be designed as a real-time process that recognizes emotions from facial expressions in full-motion video streams.
Chapter 44
Common AI-Based Methods Used in Blood Glucose Estimation with PPG Signals
Ömer Pektaş and Murat Köseoğlu
Ö. Pektaş (corresponding author)
Department of Electrical and Energy, Vocational School of Technical Science, Karamanoglu Mehmetbey University, Karaman, Turkey
e-mail: [email protected]
M. Köseoğlu
Department of Electrical and Electronics Engineering, Faculty of Engineering, Inonu University, Malatya, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_44

44.1 Introduction
Diabetes is a painful and costly disease that arises when the pancreas does not produce enough insulin or when the body cannot use insulin effectively. A body with diabetes cannot use the glucose that passes from food into the blood, so the glucose level in the blood rises. Diabetes can lead to health complications such as kidney disease, eye problems, high blood pressure (hypertension), and skin and foot complications. It may also cause a stroke or heart attack. Careful management of diabetes and continuous monitoring can prevent most of these complications. Recent research based on data from the International Diabetes Federation shows that, as of 2019, approximately 8.8% of the world population has diabetes. The proportion has increased since 2010, when it was 6.4%. The number of people with diabetes is 415 million, and it is predicted to reach about 642 million by 2040 [1]. As of 2019, the prevalence of diabetes in women and men is 9.0% and 9.6%, respectively; the distribution of the disease across age groups is presented in Fig. 44.1. As seen in Fig. 44.1, the diabetes prevalence in people aged 65–79 years is about 19.9% [2].
Fig. 44.1 Diabetes prevalence by sex and age in 2019 [2]
Glucose monitoring is very important for patients with chronic diabetes. Monitoring methods are grouped as invasive and non-invasive. Invasive glucose monitoring requires taking a blood sample by puncturing the patient’s fingertip with a needle. This makes frequent monitoring inconvenient, painful, and costly for users. For this reason, non-invasive glucose monitoring, which is painless, convenient, and comfortable, has become the focus of current and future studies in this area. Non-invasive glucose monitoring methods differ in their sensor types, data processing, and prediction methods. Artificial intelligence is often used to make such glucose monitoring systems accurate and efficient [3].
In recent studies, a glucose monitoring method that analyses data extracted from the PPG signal with machine learning methods has attracted interest, since it performs well in BGL estimation. The term “PPG” is derived from the Greek words “plethysmos” and “graphein”, which mean “increasing” and “write”, respectively; hence the term “photoplethysmogram” [4]. PPG is a non-invasive electro-optical method used to obtain data indicating the volume of blood flowing at a test site of the body close to the skin. A PPG signal consists of a non-pulsatile component (DC) and a pulsatile component (AC). The DC component corresponds to light that passes through skin, muscle, and bone but not through blood vessels, whereas the AC component corresponds to light that passes directly through blood vessels [5]. The PPG signal obtained from individuals is recorded for three minutes to collect data for glucose estimation [6].
Recently, artificial intelligence techniques and algorithms have been employed in many areas and have become one of the approaches to blood glucose prediction. Data-based models and methods are applied to data received from actual patients in order to learn and extract patterns. These models can be used to monitor patients and to estimate a possible increase or decrease in blood glucose content.
Additionally, these patterns may also be used to generate advice that takes each user’s personal data into account [7]. In this paper, AI-based techniques for blood glucose estimation from PPG signals are studied. The methods are cepstral coefficients, random forest (RF), decision tree (DT), k-nearest neighbor (KNN), support vector machine (SVM), artificial neural network (ANN), and naive Bayes (NB). They are investigated and compared in terms of sensitivity, accuracy, and specificity, and their pros and cons in BGL estimation are determined.
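The three comparison criteria named above can be computed from the counts of a binary confusion matrix. The following is a minimal sketch; the function name and the sample labels are hypothetical and are not taken from any of the reviewed studies.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return sensitivity, specificity and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        # Share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of actual negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Share of all samples that were classified correctly.
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Hypothetical labels: 1 = diabetic, 0 = non-diabetic.
example = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                                 [1, 0, 1, 0, 0, 1, 0, 1])
```

Reporting sensitivity and specificity next to accuracy matters for medical data, where the two classes are often unbalanced.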
44.2 AI-Based Non-invasive BGL Methods
The algorithms used for estimating blood glucose level with machine learning methods are illustrated as a block diagram in Fig. 44.2.
44.2.1 Pulse Based Cepstral Coefficients
In this method, single pulse-based cepstral features extracted from the PPG signal are used for BGL estimation. To calculate the cepstral coefficients, the power spectrum of the PPG signal is estimated first. Cepstral analysis converts a convolved signal into an additive form. Although the PPG signal is not a convolved signal, it is decomposed by applying cepstral analysis. This process provides a compact representation of the signal with a reduced-size feature vector. For the pulse-based cepstral coefficients, each pulse is separated from the one-minute PPG signal with a fixed start point and endpoint. These points are located according to the pulse periodicity and the local minima. The number of heartbeats depends on heart health and varies between 72 and 100 beats per minute [8].
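As a rough illustration of the computation described above, the real cepstrum of one pulse can be obtained as the inverse Fourier transform of the logarithm of its power spectrum. The sketch below is not the implementation used in [8]; the function names, the small constant `eps`, and the naive O(n²) transform (chosen only to avoid external dependencies) are assumptions.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(pulse, n_coeffs, eps=1e-12):
    """First n_coeffs cepstral coefficients of one segmented pulse."""
    # Step 1: power spectrum of the pulse.
    power = [abs(s) ** 2 for s in dft(pulse)]
    # Step 2: logarithm (eps guards against log(0)).
    log_power = [math.log(p + eps) for p in power]
    # Step 3: inverse transform; the result is real for a real input.
    cepstrum = dft(log_power, inverse=True)
    return [c.real for c in cepstrum[:n_coeffs]]


# Hypothetical single pulse: one smooth bump sampled at 32 points.
pulse = [math.sin(math.pi * t / 32) ** 2 for t in range(32)]
features = real_cepstrum(pulse, n_coeffs=8)
```

Keeping only the first few coefficients is what yields the reduced-size feature vector mentioned in the text.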
44.2.2 Support Vector Machine (SVM)
SVM is a machine learning method that can be used for both regression and classification. In this method, each data point is represented as a point in an N-dimensional space. Many hyperplanes can separate these data points into two classes; among them, the one with the maximum margin should be chosen. The margin is the distance between the hyperplane and the closest data points of each class. A linear hyperplane is chosen between the two classes, and no features need to be added manually to obtain it. Instead, a technique referred to as the kernel trick is used: a kernel function takes a low-dimensional input space and converts it into a higher-dimensional space. It performs complex data transformations and then determines how to separate the data according to the defined outputs. The data points used here are the training data. The SVM classifier performs classification by mapping input vectors to decision values. When the training dataset is linearly separable, two parallel hyperplanes are chosen so that no points lie between them, and the distance between them is maximized. In short, SVM determines the optimal decision boundary separating the data points of different classes, and it estimates the class of a new data point from its position relative to this boundary [9]. A kernel is preferred for mapping the input data to a high-dimensional feature space in order to handle data that are not linearly separable [10].
Fig. 44.2 Block diagram of estimating BGL
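Two ideas from this section can be checked numerically: the margin of a candidate hyperplane, and the fact that a kernel evaluates an inner product in a higher-dimensional space without building that space explicitly. The sketch below uses a degree-2 polynomial kernel as an example; the function names and the toy points are illustrative assumptions, and no SVM training is performed.

```python
import math


def signed_distance(w, b, x):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin(w, b, points, labels):
    """Smallest label-corrected distance; positive if all points are separated."""
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))


def poly_kernel(x, z):
    """Degree-2 polynomial kernel computed in the original 2-D space."""
    return sum(xi * zi for xi, zi in zip(x, z)) ** 2


def lift(x):
    """Explicit 3-D feature map that corresponds to poly_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


# Toy data with labels +1 / -1 and two candidate separating hyperplanes.
points = [(2, 2), (3, 3), (0, 0), (-1, 0)]
labels = [1, 1, -1, -1]
wide = margin((1, 1), -2, points, labels)    # diagonal boundary
narrow = margin((1, 0), -1, points, labels)  # vertical boundary
```

Here both hyperplanes separate the data, but an SVM would prefer the first one because its margin is larger.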
44.2.3 Decision Tree (DT)
This model aims to learn the classification of objects based on a set of samples whose classes are known [11]. The decision tree is a supervised learning algorithm with a tree-like structure, and it mimics the way humans think when making decisions. A decision tree is a model that represents such mappings. A DT comprises test (feature) nodes, each attached to two or more sub-trees, and leaf (decision) nodes, each labeled with a class that represents the decision [12]. An internal node reflects an attribute of the dataset, a branch represents a decision rule, and a leaf node represents the result. A decision tree makes decisions by dividing nodes into sub-nodes, as shown in Fig. 44.3. During training, this process continues until only leaf nodes remain. The performance of a decision tree depends directly on this process [9]. The outcome of decision tree regression is a real number [6]. A decision tree is not difficult to implement and, in addition, it can predict outcomes accurately. New nodes are added repeatedly until a stopping condition is satisfied [13].
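The node-splitting step can be sketched for the regression case: for one feature, try every threshold and keep the one that minimizes the squared error of the two resulting sub-nodes. This is a simplified, single-split illustration under assumed names and data (a hypothetical PPG feature against BGL values), not the procedure of any cited study; a full tree would apply the same step recursively to each sub-node.

```python
def squared_error(values):
    """Sum of squared deviations from the mean of the values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) of the best single split.

    Returns None when all feature values are equal and no split exists.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = squared_error(left) + squared_error(right)
        if best is None or cost < best[0]:
            best = (cost, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    return None if best is None else best[1:]


# Hypothetical feature values and BGL readings (mg/dL).
feature = [1, 2, 3, 10, 11, 12]
bgl = [90, 92, 91, 150, 152, 151]
split = best_split(feature, bgl)
```

The leaf predictions are the means of the two groups, which is why the output of decision tree regression is a real number.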
44.2.4 Random Forest Regression (RFR)
Random forest is a supervised learning method that can be used for classification and regression. In RFR, each node is split at the best split point among a randomly selected subset of features at that node. The method belongs to the class of ensemble algorithms, which compute the result from a group of models. The final result is the simple average of the individual estimators. This model requires a large amount of memory to store the final model. Because the trees can be trained in parallel, the method trains quickly [6].
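The ensemble idea can be sketched with very small trees: each tree is a single split fitted on a bootstrap sample using one randomly chosen feature, and the forest prediction is the plain average of the tree predictions. All names, the fixed seed, and the toy data are assumptions made for illustration; practical random forests grow much deeper trees.

```python
import random


def fit_stump(rows, ys, feature):
    """Fit a one-split regression tree on a single feature."""
    mean_all = sum(ys) / len(ys)
    best = (float("inf"), None, mean_all, mean_all)
    for threshold in sorted(set(r[feature] for r in rows))[:-1]:
        left = [y for r, y in zip(rows, ys) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, ys) if r[feature] > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        cost = (sum((y - lm) ** 2 for y in left)
                + sum((y - rm) ** 2 for y in right))
        if cost < best[0]:
            best = (cost, threshold, lm, rm)
    return (feature,) + best[1:]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    if threshold is None:  # no split was possible: predict the sample mean
        return left_mean
    return left_mean if row[feature] <= threshold else right_mean


def fit_forest(rows, ys, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        # Random feature selection for the split.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx],
                                [ys[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Simple average of the individual tree predictions."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Hypothetical two-feature samples and BGL targets.
rows = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (6, 60), (7, 70), (8, 80)]
ys = [80, 85, 90, 100, 120, 140, 150, 160]
forest = fit_forest(rows, ys)
```

Every tree must be stored to make a prediction, which is the memory cost noted above, while the loop over trees has no dependencies between iterations and can therefore run in parallel.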
Fig. 44.3 Decision tree [9]
44.2.5 K-Nearest Neighbor (KNN)
KNN classification is based on the idea that similar objects lie close to each other, so that objects closest to each other have similar properties. KNN considers the classes of the k nearest neighbors of a data point and classifies the point based on this information. The simplest form of KNN is the nearest neighbor rule, obtained when K = 1 [14]. The KNN algorithm has two stages: training and classification. The training phase stores the training data and the associated class labels. In the classification phase, a test data point is assigned the label with the most votes among its k closest training samples. The Euclidean distance between data points is used to find the k nearest neighbors [15].
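The two stages described above translate almost directly into code. In the sketch below, "training" is simply keeping the labelled points, and classification ranks them by Euclidean distance and takes a majority vote. The function name, the feature values, and the class labels are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every stored training point.
    ranked = sorted((math.dist(x, query), label)
                    for x, label in zip(train_x, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples with their classes.
train_x = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_y = ["normal", "normal", "normal", "high", "high", "high"]
label = knn_classify(train_x, train_y, (2, 2), k=3)
```

Setting `k=1` gives the nearest neighbor rule mentioned in the text.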
44.2.6 Artificial Neural Network (ANN)
In addition to these methods, ANN is used to obtain maximum accuracy. ANN is a mathematical model based on the structure and functioning of biological neurons. The connectivity of the neurons in an ANN follows the connectionist approach. An ANN includes one or more hidden layers that process the information via neurons. Each node works as an activation node, which classifies the output of the artificial neurons to obtain a better result [13]. After the regression of the PPG signal, a hidden layer topology is used to build the network structure. To obtain maximum accuracy, preliminary experiments are conducted by varying the number of neurons in each hidden layer. Zhang et al. [16] used ANN for BGL estimation. An invasive blood glucose meter was used to measure the subject’s true BGL. These data were combined with a filtered PPG signal by an ANN integrated into a field programmable gate array (FPGA) and used to determine the final BGL value [16].
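To make the role of the hidden layer and the activation nodes concrete, the following sketch computes the forward pass of a network with one hidden layer and a single regression output. The ReLU activation, the weights, and the input values are arbitrary assumptions chosen so the result is easy to verify by hand; they do not come from [13] or [16].

```python
def relu(value):
    """Activation: pass positive values, clamp negative ones to zero."""
    return max(0.0, value)


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: input -> one hidden layer (ReLU) -> scalar output."""
    hidden = [relu(sum(w * xi for w, xi in zip(weights, x)) + bias)
              for weights, bias in zip(hidden_w, hidden_b)]
    return sum(w * h for w, h in zip(out_w, hidden)) + out_b


# Two inputs, two hidden neurons, one output (all values hypothetical).
output = forward([1.0, 2.0],
                 hidden_w=[[0.5, -0.5], [1.0, 1.0]],
                 hidden_b=[0.0, 0.0],
                 out_w=[2.0, 1.0],
                 out_b=0.5)
```

The first hidden neuron receives -0.5 and is switched off by the activation, the second passes 3.0, so the output is 2.0 · 0 + 1.0 · 3.0 + 0.5 = 3.5. Varying the number of hidden neurons, as in the preliminary experiments mentioned above, only changes the lengths of these weight lists.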
44.2.7 Naïve Bayes (NB)
This model is a probabilistic classifier based on Bayes’ theorem [17], which calculates the probability of a given hypothesis. Naive Bayes is a machine learning model that can be used for large volumes of data. The probability of membership is estimated for each class, that is, the probability that a data point belongs to a particular class. The class with the highest probability is taken as the most suitable class. NB classifiers assume that the variables (features) are independent of each other. Bayes’ theorem is:

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \tag{44.1} \]

where P(B|A) is the likelihood, P(A|B) is the posterior probability, P(B) is the marginal probability, and P(A) is the prior probability [9]. Like DT, NB is a classification algorithm used for prediction; the two differ only in how the result is represented. DT provides rules at the end, whereas NB also provides a conditional probability. The key advantage of NB is that it can handle a small dataset. Bayes’ theorem finds the probability of a property associated with an object using prior data [13].
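Equation (44.1) together with the independence assumption gives a very short classifier: multiply each class prior by the per-feature likelihoods and normalize by the evidence. The sketch below assumes the probabilities are already known; the class names and numbers are invented for illustration only.

```python
import math


def naive_bayes_posterior(priors, feature_likelihoods):
    """Posterior P(class | features) under the naive independence assumption.

    priors: {class: P(class)}
    feature_likelihoods: {class: [P(feature_i | class), ...]}
    """
    # Numerator of Bayes' theorem: P(features | class) * P(class).
    joint = {c: priors[c] * math.prod(feature_likelihoods[c]) for c in priors}
    # Denominator: marginal probability of the observed features.
    evidence = sum(joint.values())
    return {c: value / evidence for c, value in joint.items()}


# Hypothetical priors and likelihoods for two observed features.
posterior = naive_bayes_posterior(
    priors={"diabetic": 0.1, "healthy": 0.9},
    feature_likelihoods={"diabetic": [0.8, 0.5], "healthy": [0.2, 0.5]},
)
predicted = max(posterior, key=posterior.get)
```

With these numbers the joint terms are 0.04 and 0.09, so the posterior for "diabetic" is 0.04 / 0.13 ≈ 0.31 and the predicted class is "healthy". The classifier returns a probability along with the decision, which is the difference from DT noted above.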
44.3 Conclusion and Suggestions
Habbu et al. (2019) used single pulse-based and frame-based cepstral coefficients derived from PPG data for the estimation of BGL. The two feature sets were compared using a neural network on the basis of R2, the Pearson and Spearman correlation coefficients, and Clarke error grid analysis. They demonstrated that the single pulse technique overcomes the limitations caused by the time-varying nature of the PPG signal. Cepstral-domain single pulse features show a high correlation in BGL estimation in terms of R2 and the Pearson and Spearman correlation coefficients, as well as in Clarke error grid analysis. Therefore, this method can be considered a reliable feature set for BGL estimation. In the comparison, the single pulse-based coefficients obtained from PPG data yielded more accurate results, at a rate of 95%. Thus, non-invasive PPG-based continuous blood glucose monitoring can be implemented using the single pulse-based cepstral coefficients technique [8].
Table 44.1 Comparative study of related studies for diabetes prediction with Pima Indian Dataset [13]

Authors                   Methods       Accuracy obtained (in %)
Iyer et al. [18]          NB            79.56
Kumari and Chitra [19]    SVM           78
Naz and Ahuja [20]        ANN           87.46
Wang [21]                 KNN and DT    90.03
Naz and Ahuja (2020) used NB, DT, and ANN for diabetes prediction. The comparison of the results showed that DT has the best accuracy, at 96.62%. The other results were 90.34% for ANN and 76.33% for NB. Thus, the DT algorithm presents the highest accuracy among these methods for the prediction of diabetes [13].
A comparison of studies that use the same Pima Indian dataset for the detection of diabetes is shown in Table 44.1. NB, SVM, ANN, KNN, and DT are taken separately from each study for the comparison of accuracy. The results show that KNN and DT present the best accuracy for diabetes, at 90.03% [13].
In another study, Siam et al. (2021) used KNN, DT, and NB classifiers for the prediction of diabetes. They compared the results of these machine learning algorithms with the actual results of patients, which were recorded using the conventional invasive method. They concluded that the DT algorithm is considerably more successful than the other methods, with an accuracy of 89.97% for the prediction of diabetes mellitus [9].
As future work, we aim to develop a more precise and successful method by building on the existing methods, especially the DT and cepstral coefficient methods. We are therefore working on an algorithm to increase the performance and accuracy of BGL estimation. The developed algorithm is planned to be integrated into a smart watch through a simple and user-friendly interface for continuous monitoring of BGL.
References
1. World Population Review (2021) (Online). Available https://worldpopulationreview.com/country-rankings/diabetes-rates-by-country
2. Saeedi P et al (2019) Global and regional diabetes prevalence estimates for 2019 and projections for 2030 and 2045: results from the international diabetes federation diabetes atlas, 9th edition. Diabetes Res Clin Pract 157:107843. https://doi.org/10.1016/j.diabres.2019.107843
3. Zhang Y, Zhang Y, Siddiqui SA, Kos A (2019) Non-invasive blood-glucose estimation using smartphone PPG signals and subspace kNN classifier. Elektrotehniski Vestnik 86(1/2):68–74
4. Gupta SS, Kwon TH, Hossain S, Kim KD (2021) Towards non-invasive blood glucose measurement using machine learning: an all-purpose PPG system design. Biomed Signal Process Control 68:102706
5. Resit Kavsaoglu A, Polat K, Recep Bozkure M, Hariharan M (2013) Feature extraction for biometric recognition with photoplethysmography signals. IEEE Conference Publications
6. Priyadarshini RG, Kalimuthu M, Nikesh S, Bhuvaneshwari M (2021) Review of PPG signal using machine learning algorithms for blood pressure and glucose estimation. In: IOP conference series: materials science and engineering, vol 1084, no 1. IOP Publishing, p 012031
7. Muñoz-Organero M, Queipo-Álvarez P, García Gutiérrez B (2021) Learning carbohydrate digestion and insulin absorption curves using blood glucose level prediction and deep learning models. Sensors 21(14):4926
8. Habbu SK, Joshi S, Dale M, Ghongade RB (2019) Noninvasive blood glucose estimation using pulse based cepstral coefficients. In: 2019 2nd International Conference on Signal Processing and Information Security (ICSPIS). IEEE, pp 1–4
9. Siam AI et al (2021) PPG-based human identification using Mel-frequency cepstral coefficients and neural networks. Multimed Tools Appl 14(10):869–880. https://doi.org/10.1007/s12046019-1118-9
10. Nirala N, Periyasamy R, Singh BK, Kumar A (2019) Detection of type-2 diabetes using characteristics of toe photoplethysmogram by applying support vector machine. Biocybern Biomed Eng 39(1):38–51. https://doi.org/10.1016/j.bbe.2018.09.007
11. Zorman M, Podgorelec V, Kokol P, Peterson M, Šprogar M, Ojsteršek M (2001) Finding the right decision tree’s induction strategy for a hard real world problem. Int J Med Inform 63(1–2):109–121. https://doi.org/10.1016/S1386-5056(01)00176-9
12. Meng XH, Huang YX, Rao DP, Zhang Q, Liu Q (2013) Comparison of three data mining models for predicting diabetes or prediabetes by risk factors. Kaohsiung J Med Sci 29(2):93–99. https://doi.org/10.1016/j.kjms.2012.08.016
13. Naz H, Ahuja S (2020) Deep learning approach for diabetes prediction using PIMA Indian dataset. J Diabetes Metab Disord 19(1):391–403. https://doi.org/10.1007/s40200-020-00520-5
14. Tjahjadi H, Ramli K (2020) Noninvasive blood pressure classification based on photoplethysmography using K-nearest neighbors algorithm: a feasibility study. Inf 11(2). https://doi.org/10.3390/info11020093
15. Prabha A, Yadav J, Rani A, Singh V (2021) Non-invasive diabetes mellitus detection system using machine learning techniques. In: 2021 11th international conference on cloud computing, data science & engineering (confluence). IEEE, pp 948–953
16. Zhang G, Mei Z, Zhang Y, Ma X, Lo B, Chen D, Zhang Y (2020) A noninvasive blood glucose monitoring system based on smartphone PPG signal processing and machine learning. IEEE Trans Ind Inf 16(11):7209–7218
17. Kadar JA, Agustono D, Napitupulu D (2018) Optimization of candidate selection using naïve Bayes: case study in company X. J Phys Conf Ser 954(1):012028 (IOP Publishing)
18. Iyer A, Jeyalatha S, Sumbaly R (2015) Diagnosis of diabetes using classification mining techniques. arXiv preprint arXiv:1502.03774
19. Kumari VA, Chitra R (2013) Classification of diabetes disease using support vector machine. Int J Eng Res Appl 3(2):1797–1801
20. Naz H, Ahuja S (2020) Deep learning approach for diabetes prediction using PIMA Indian dataset. J Diabetes Metabolic Disorders 19(1):391–403
21. Wang Q (2017) License plate recognition via convolutional neural networks. In: 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS). IEEE, pp 926–929
Chapter 45
Capturing Reward Functions for Autonomous Driving: Smooth Feedbacks, Random Explorations and Explanation-Based Learning
M. Cemil Güney and Yakup Genç
M. C. Güney (corresponding author) · Y. Genç
Computer Engineering Department, Gebze Technical University, Kocaeli, Turkey
e-mail: [email protected]
Y. Genç
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_45

45.1 Introduction
Learning to drive is still a challenging and underexplored task, especially for a system fed with high-dimensional inputs. A driving policy must be capable of solving complex problems such as lane tracking, localization, drivable path recognition, and steering control. Deep reinforcement learning methods have been applied to various autonomous driving tasks to teach machines to control cars in a simulated environment. For example, in the lane following task, deep reinforcement learning agents can successfully learn autonomous maneuvering on complex road curves [14]. There are also successful applications of training a neural network with the Q-learning algorithm [17] for autonomous driving tasks. Sharifzadeh et al. [15] propose an inverse reinforcement learning (IRL) approach that uses Deep Q-Networks to extract the rewards in problems with large state spaces. They evaluate the approach in a 2D simulation-based autonomous driving scenario that they developed, and show that, after a few learning steps, their agent generates collision-free motions and performs human-like lane change behavior.
Humans are the experts of the driving policy. They know the environment and how to act, even in a simulation (which can be regarded as a game). They learn to drive from the feedback of other humans, improve steadily, and can drive safely without ever experiencing a crash. Human-in-the-loop reinforcement learning methods go by different names, such as reward shaping [19] and inverse reinforcement learning [1]. Reward shaping can be done with different kinds of feedback, such as grounded language [18] or scalar-valued feedback [9]. Scalar-valued (binary) feedback, such as positive and negative, allows feedback to be collected from humans who have no expertise in the task. Knox et al. [9] proposed the TAMER framework, which interactively shapes the agent’s policy with scalar-valued human feedback. They model the human feedback function and choose actions based on the modeled function. TAMER is more sample-efficient than the other methods. Warnell et al. [16] propose an extension of TAMER called Deep TAMER, which uses CNNs to learn directly from high-dimensional state spaces. It also uses a replay memory, as is common in deep reinforcement learning algorithms [11]. Deep TAMER learns a successful policy in a short amount of time on different tasks. Arakawa et al. [2] combine Deep TAMER and DQN by summing their weighted estimated functions, making efficient use of both human feedback and environment rewards. Christiano et al. [5] use human feedback to learn human preferences and then use the learned feedback function as the reward function to train a reinforcement learning algorithm.
In this work we aim to integrate human feedback into learning a driving policy that maps directly from a high-dimensional state space to actions efficiently. We assume that a human can demonstrate how to drive, monitors the agent’s actions, and provides scalar-valued feedback such as “good” and “bad”. The human may withhold feedback when he or she thinks the model is doing well or is unsure about the state. The overall flow of the proposed method is shown in Fig. 45.1. The proposed method extends the Deep TAMER and TAMER frameworks to work in a driving simulation.
We use “smooth feedbacks” instead of direct human feedback to make efficient use of the limited amount of feedback and to avoid giving sharp feedback in situations where the agent can take different actions for the same observation. Human demonstrations accelerate the learning process and let the agent see the proper actions in scenes that it could not experience itself. We also use random explorations to overcome bias toward specific actions and to let the agent fail and learn. In addition, we replay a scene by restarting the environment from the last position so that the agent can learn to overcome obstacles. We show that human feedback is enough to learn a basic driving policy and that, with human demonstrations, random explorations, and smooth feedbacks, the proposed method learns a better policy in less time than previous works.
45.2 Problem Formulation The problem is learning to drive directly from high dimensional state space efficiently using. We try to solve this problem with human feedback with extending Deep TAMER and TAMER frameworks. Differently from previous works we use techniques from successful reinforcement learning algorithms, we apply human feedback smoothly instead of weighting and we use the human demonstrations. We think that the agent life cycle is going through the sequences like a well known sequential decision-making problem. At every time step t the agent gets an observation ot ∈ O from the environment and takes an action at ∈ A from the action
45.2 Problem Formulation
595
Fig. 45.1 The figure of the proposed method in this work. The human plays the car as an agent and we collect its actions as positive feedbacks. Also, the human gives feedbacks as “good” and “bad” to the actions of agent and observations sequences shown in the interface. The feedback model Hˆ estimated from samples Ddemo and Dglobal . The policy is the maximum estimated feedback for an action
list and sends it to the environment. In classical reinforcement learning, the environment returns a reward $r_t \in \mathbb{R}$ that signals the agent's current status, and the agent aims to maximize the discounted sum of rewards over time [12]. Unlike in reinforcement learning algorithms, the observed reward is not used here to learn a policy. Instead, we assume that a human observes the same observations, together with the actions taken by the agent, and provides a scalar feedback $h_t$ for the current observation and action. The feedback is either positive or negative. The goal is to maximize the human feedback for the actions in $A$. Receiving feedback from a human at every time step of the environment is impractical and inefficient, so the human feedback is sparse and delayed. As in previous work [16], we assume there is a hidden function $H : O \times A \to \mathbb{R}$ that the human knows implicitly and that leads to a good driving policy. The optimal driving policy $\pi : O \to A$ is derived from the estimated feedback function $\hat{H}$ by choosing the action with the largest estimated feedback:
\[ \pi(o) = \arg\max_{a \in A} \hat{H}(o, a) \tag{45.1} \]
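As a minimal sketch of the policy in (45.1), the snippet below picks the action with the highest estimated feedback. The function `toy_h_hat` is a hypothetical stand-in for the learned model $\hat{H}$, and the scalar "lateral offset" observation is an assumption made only for illustration; neither comes from the chapter.

```python
ACTIONS = ("coast", "left", "right")

def greedy_policy(h_hat, observation, actions=ACTIONS):
    """Return the action with the highest estimated human feedback."""
    return max(actions, key=lambda action: h_hat(observation, action))

def toy_h_hat(observation, action):
    """Hypothetical feedback model, for illustration only.

    The observation is a lateral offset (positive means the car is to the
    right of the lane centre). Feedback is higher when the chosen action
    moves the car back toward the centre.
    """
    steer = {"left": -1.0, "coast": 0.0, "right": 1.0}[action]
    return -abs(observation + 0.5 * steer)

if __name__ == "__main__":
    for offset in (0.8, 0.0, -0.8):
        print(offset, greedy_policy(toy_h_hat, offset))
```

Because the policy only ranks actions, any monotone rescaling of $\hat{H}$ would give the same behaviour in this sketch.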
We can treat this as a supervised learning problem in which the human feedback is learned. Let $\mathbf{o}$ and $\mathbf{a}$ denote sequences of observations and actions. The loss function that assesses the quality of the estimated human feedback is:
\[ L(\hat{H}; \mathbf{o}, \mathbf{a}, h) = \sum_{o \in \mathbf{o},\, a \in \mathbf{a}} \left\| \hat{H}(o, a) - h \right\|^{2} \tag{45.2} \]
The problem of finding the best driving policy thus turns into minimizing the expected value of the loss function (45.2), which measures how well we estimate the human feedback function that produces a successful driving policy:
\[ \hat{H}^{*} = \arg\min_{\hat{H}} \; \mathbb{E}_{\mathbf{o}, \mathbf{a}} \left[ L(\hat{H}; \mathbf{o}, \mathbf{a}, h) \right] \tag{45.3} \]
In the next sections we describe how we solve this minimization problem, which in turn yields a solution to learning to drive in a simulation environment.
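The squared-error loss in (45.2) can be sketched as follows. This is an illustration under the assumption that every stored sample is a triple (observation, action, feedback) carrying its own feedback value; the sample layout and the constant predictors are hypothetical.

```python
def feedback_loss(h_hat, samples):
    """Sum of squared errors between estimated and observed human feedback.

    samples is an iterable of (observation, action, feedback) triples.
    """
    return sum((h_hat(o, a) - h) ** 2 for o, a, h in samples)

if __name__ == "__main__":
    batch = [(0, "left", 1.0), (1, "right", -1.0)]
    always_zero = lambda o, a: 0.0   # predicts neutral feedback everywhere
    always_good = lambda o, a: 1.0   # predicts positive feedback everywhere
    print(feedback_loss(always_zero, batch))
    print(feedback_loss(always_good, batch))
```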
45.3 Method
In this section we describe the techniques we use to obtain an optimal driving policy from human feedback. First, we solve the minimization problem (45.3) using stochastic gradient descent (SGD), where $\eta_k$ is the learning rate and $k$ is the update iteration:
\[ \hat{H}_{k+1}(o, a) = \hat{H}_{k}(o, a) - \eta_k \nabla_{\hat{H}} L(\hat{H}_{k}; \mathbf{o}, \mathbf{a}, h) \tag{45.4} \]
In Eq. (45.4), $h$ denotes the feedback collected over time for observations $\mathbf{o}$ and actions $\mathbf{a}$. The model is optimized gradually at every step $k$ by applying SGD with learning rate $\eta$, as in a supervised learning problem.

Autoencoder. In the original TAMER framework [9] the agent uses the latest model parameters for the policy and updates them immediately after receiving feedback. With neural networks over high-dimensional inputs, there are too many parameters to adjust quickly enough. We therefore use an autoencoder, as in previous work, to pretrain the CNN layers, which are then kept frozen when estimating $\hat{H}$.

Explanation-based Learning. Because learning is slow and data is limited, agents take too long to converge to a good policy. To increase the amount of human feedback data and let the agents collect the same experiences a human would, we collect human demonstrations while humans drive in the environment directly. The human actions $\mathbf{a}$ and observations $\mathbf{o}$ are stored as sequences in $D_{demo}$, assuming positive feedback for the actions taken and negative feedback for the actions not taken. Randomly sampled sets from $D_{demo}$ are used to pretrain, and thereby initialize, the parameters of the human feedback estimation model. Then, after every $j$ updates, we train the model with sets subsampled from $D_{demo}$.

Random Explorations. The environment tends to bias agents toward actions that make them go straight. When a policy is stuck at a local maximum, it is very hard to escape it even with the demonstrations. For better exploration we use the epsilon-greedy method, well known in reinforcement learning from DQN [12].
With probability epsilon, which decreases over time, we choose an action uniformly at random; otherwise we take the action given by the current policy. We also reset the exploration every $l$ steps to overcome the bias toward undesired states.

Feedback Replay Buffer. As shown with DQN [12], a replay buffer makes learning more robust and faster for reinforcement learning algorithms. As in previous work, we store all sequences of observations $\mathbf{o}$, actions $\mathbf{a}$ and feedback $h$ in the set $D_{global}$. We sample uniformly at random from $D_{global}$ while optimizing the policy.

Replay Scene. Because the agent can start from too many states, it is hard for it to overcome a problematic state after seeing it only once. We therefore start the next episode from the last positively rewarded state. This forces the agent to replay the same scene until it gets a positive reward $r$ from the environment. It helps humans give more feedback $h$ for problematic states and slightly overfits to the bad states.

Smooth Feedback. Human feedback decisions cannot be sharp in every state. For example, when transitioning from a left turn to a straight segment, the agent could reasonably take either the left action or the straight action. We assume that the action taken is also plausible, with lower probability, for the previous and the next state, and that the same holds for the human feedback. Under this assumption, when feedback is received we apply a smoothed version of it to the previous and next states. Suppose we get feedback $h$ at time step $t$ for action $a_t$ and observation $o_t$. We then also add samples for time steps $t-1$ and $t+1$ with feedback $N(h)$ for observations $o_{t-1}$, $o_{t+1}$ and actions $a_{t-1}$, $a_{t+1}$. In the end we obtain three feedback samples, smoothed by a Gaussian, from a single human feedback.

In summary, from start to end: we first pretrain the CNN layers using an autoencoder on environment states.
We collect human demonstrations as positive feedback sequences in $D_{demo}$ and use them as a regularizer, training on uniformly sampled demonstrations at every time step $j$. During training the agent always uses the latest parameters of the human feedback estimation function $\hat{H}$. We use epsilon-greedy exploration when choosing actions. The observation, action and feedback sequences $(o, a, h)$ are collected in $D_{global}$ and used for training at every feedback and at regular intervals. We always start the agent from the last positively rewarded state to slightly bias it toward problematic states. We apply a Gaussian filter to the received feedback to smooth it and to increase the number of collected sequences.
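Two of the steps summarized above, epsilon-greedy action selection and smoothing one feedback signal over neighbouring time steps, can be sketched as below. This is an illustration, not the authors' implementation: the unnormalised Gaussian weight `exp(-d^2 / (2 sigma^2))`, the default `sigma`, and the trajectory layout are all assumptions.

```python
import math
import random

def epsilon_greedy(policy_action, actions, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else follow the policy."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return policy_action

def smooth_feedback(trajectory, t, h, sigma=1.0):
    """Spread a single feedback h given at step t over steps t-1, t and t+1.

    trajectory is a list of (observation, action) pairs. The step that received
    the feedback keeps h unchanged; its neighbours get h scaled by an assumed
    Gaussian weight. Steps outside the trajectory are skipped.
    """
    samples = []
    for d in (-1, 0, 1):
        i = t + d
        if 0 <= i < len(trajectory):
            weight = math.exp(-(d * d) / (2.0 * sigma * sigma))
            observation, action = trajectory[i]
            samples.append((observation, action, h * weight))
    return samples

if __name__ == "__main__":
    traj = [("o0", "left"), ("o1", "left"), ("o2", "coast")]
    for sample in smooth_feedback(traj, t=1, h=1.0):
        print(sample)
    print(epsilon_greedy("left", ("coast", "left", "right"), epsilon=0.1))
```

The resulting triples could be appended to a replay set such as $D_{global}$, so one human signal yields up to three training samples.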
45.4 Experiments
In this section we describe the driving environment, the user interface used to collect feedback, how we evaluate the proposed method, and the results. We test the methods experimentally in a 2D self-driving car simulation. It is based on the CARLA simulator [7], which is a 3D simulator, but we interface with it through 2D scenes. The environment is compatible with OpenAI Gym [4], so it can be used with any method that runs on Gym environments.
We compare the proposed method with Deep TAMER [16] and DQN [12] in different scenarios in the driving environment. In summary, the proposed algorithm converges faster and to a better policy than Deep TAMER and DQN in most driving scenarios.
45.4.1 The Environment
Our environment is a driving simulation. Simulations are an important part of self-driving car research: they make it easy to develop and evaluate driving systems. Researchers use simulations to train reinforcement learning models to drive, to create driving datasets, and to transfer learned knowledge to the real world [8]. We build our environment on the CARLA simulator [7] because it has the realistic dynamics and sensors we need. It supports both an RGB camera and a semantic segmentation camera using the Cityscapes palette [6]. CARLA includes various maps, from simple towns to complex highways. We chose a simple town for this work; it includes junctions, curved roads and roads separated by lane lines.

The main idea for the environment is to keep it simple, because reinforcement learning models need a huge amount of training time to converge, and a complex simulation results in large, complex models that need more computation power. To achieve that simplicity, we build a 2D-like environment. CARLA does not support a 2D sensor, so we mount the camera sensor above the car to get a bird's-eye view. The area in front of the car takes up 2/3 of the image and the area behind it 1/3. The camera always follows the car, and the car's rotations are applied to the camera as well, so the car always faces the same direction in the image. Even with a 2D view, the camera image is not simple for a model: many objects need to be detected. We therefore use the semantic segmentation sensor at the same position as the camera sensor. The default sensor has 12 classes; we reduced them to 6: road, lines, out of road, car, pipes and unrecognized. The semantically segmented image is shown in Fig. 45.2.

The car is the agent of our models. It has only 3 actions: coast, left and right, where the turns steer by half a degree. The agent's velocity is fixed at 20 km/h. At this speed and steering angle it can easily turn corners and junctions. There is no brake action, so the car keeps moving. The environment does not contain any pedestrians or other cars. We do not enforce any traffic rules except keeping to the right lane.

Reward mechanisms are an important part of an environment. Our main goal is to keep the car in the center of the road. From the CARLA API we can get the closest waypoint in the center of the road. The agent gets a positive reward at the center of the road, and smaller or negative rewards depending on its distance from the center.
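The chapter does not give the exact reward formula, so the following is only one possible shape that matches the description: positive at the lane centre, decreasing with distance, and negative further away. The linear form, the `tolerance` parameter and the clipping at -1 are assumptions; in practice the distance would be measured to the closest centre waypoint reported by the simulator.

```python
def lane_center_reward(distance_to_center, tolerance=0.5):
    """Hypothetical reward: +1 at the lane centre, 0 at `tolerance` metres,
    negative beyond that, clipped at -1."""
    if distance_to_center < 0:
        raise ValueError("distance_to_center must be non-negative")
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    reward = 1.0 - distance_to_center / tolerance
    return max(-1.0, reward)

if __name__ == "__main__":
    for d in (0.0, 0.25, 0.5, 1.0, 5.0):
        print(d, lane_center_reward(d))
```

A small `tolerance` reproduces the strictness noted in the evaluation, where a car can receive a negative reward while still being inside the lane.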
Fig. 45.2 The user interface. The left side shows where the camera sensor is located; the right side shows the semantically segmented version of the sensor image that the models observe. At the bottom of the image, the interface shows the agent's current action and the number of feedback signals given
45.4.2 The Interface
There are two interfaces in this work: one in which a human drives to collect demonstrations, and another that collects human feedback on the agent's actions. The demonstration interface lets humans control the car with the keyboard arrow keys, as in a racing game. The human acts like an agent, with exactly the same actions available: coast, left and right. The car moves at its own speed; the human only controls the steering.

The feedback interface collects feedback from a human in parallel while the latest agent runs in the environment. The human watches the agent's actions through the camera sensor above the car, seeing both the RGB camera and the semantically segmented camera. The interface shows the action taken at each moment and how much feedback has been given. Feedback is not collected with two buttons for positive and negative, because people can often indicate the right action directly from their point of view. Instead, we ask the human for the action that should be taken at that step. If this action matches the agent's action, it is recorded as positive feedback; if it differs, it is recorded as negative feedback.
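The rule for turning a suggested action into a feedback signal can be sketched as below. The numeric values +1 and -1 are assumptions, since the chapter only states that the feedback is positive or negative.

```python
ACTIONS = ("coast", "left", "right")

def action_to_feedback(agent_action, human_action):
    """Return +1.0 if the human's suggested action matches the agent's, else -1.0."""
    for action in (agent_action, human_action):
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
    return 1.0 if human_action == agent_action else -1.0

if __name__ == "__main__":
    print(action_to_feedback("left", "left"))
    print(action_to_feedback("left", "coast"))
```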
45.4.3 Evaluation
We evaluate the performance of the proposed model on the environment described above and compare it with DQN, Deep TAMER and a random baseline. We used the Stable-Baselines3 [13] implementation of DQN. The learning parameters are taken from Leurent's 2D highway environment [10], in which DQN converges and learns a driving policy in a highway setting. An episode ends when the agent leaves the lane, so the number of steps per episode is a useful metric for evaluating the agents' performance. The reward is less useful: it is strictly tied to the center of the lane, so a car that is not at the center gets a negative reward even when it is still in the lane. We therefore do not use the reward to evaluate and compare the agents.

The task is very easy for humans, but it is challenging for agents to learn a policy in a short time. As Fig. 45.3 shows, and as we expected, DQN failed to learn a policy in the time in which the proposed solution learned a policy that can drive without crashing. DQN fails because we train the models for only 100 episodes, whereas it requires a huge number of trials to reach successful policies [3]. Deep TAMER shows signs of learning a policy, but it is not as good as the proposed method, because it does not have enough human feedback for each scenario and cannot explore as much as the proposed solution. Figure 45.4 shows that our method learns a policy that successfully turns left and right corners, while the others do not. On the straight road scene, all the methods
Fig. 45.3 Average number of steps per episode. Our method survives for enough steps to show that it has learned a useful driving policy. The others stay around the random baseline
Fig. 45.4 Average number of steps per episode for each scene. In the left and right scenes our method learns a useful policy that keeps going long enough. In the straight scene our method converges faster than the others. In the intersection scene none of the models finds an optimal policy
learn to go straight at some point, but ours learns faster than the other methods and retains the learned policy. In the intersection scene none of the methods manages to learn a successful policy.
45.5 Conclusion
In this work, we built a learning system that learns a policy to drive in a simulated environment using human feedback, as an extension of the Deep TAMER and TAMER frameworks. The proposed model learns to drive end to end from a high-dimensional state space using only human feedback, instead of an inefficiently large number of trials and a static reward function. We also make better use of human effort and use the human feedback and demonstrations efficiently. In summary, our extensions to previous work are as follows. We let humans drive in the environment and demonstrate successful driving. We use smoothed feedback instead of applying the feedback directly and sharply, which also lets us increase the number of feedback sequences. We use random exploration to prevent bias toward actions that occur more often than others. Finally, we apply a replay scene mechanism to partially overfit to scenarios for which the model could not learn a policy from little data.
We compare the proposed method with previous work and DQN in a 2D driving simulation. The environment is derived from the CARLA simulator, and the map contains left turn, right turn, straight and intersection scenarios. The proposed method learns a better driving policy faster and more robustly than the other methods.
References
1. Abbeel P, Ng AY (2010) Inverse reinforcement learning. Springer US, Boston, MA, pp 554–558. https://doi.org/10.1007/978-0-387-30164-8_417
2. Arakawa R, Kobayashi S, Unno Y, Tsuboi Y, Maeda S (2018) Dqn-tamer: human-in-the-loop reinforcement learning with intractable feedback. ArXiv: abs/1810.11748
3. Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: ICML
4. Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) Openai gym
5. Christiano P, Leike J, Brown TB, Martic M, Legg S, Amodei D (2017) Deep reinforcement learning from human preferences. In: NIPS
6. Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR) (Jun 2016). https://doi.org/10.1109/cvpr.2016.350
7. Dosovitskiy A, Ros G, Codevilla F, Lopez A, Koltun V (2017) CARLA: an open urban driving simulator. In: Proceedings of the 1st annual conference on robot learning, pp 1–16
8. Janai J, Güney F, Behl A, Geiger A (2020) Computer vision for autonomous vehicles: problems, datasets and state of the art. Foundat Trends® in Comput Graph Vis 12(1–3):1–308. https://doi.org/10.1561/0600000079
9. Knox WB, Stone P (2009) Interactively shaping agents via human reinforcement: the tamer framework. In: Proceedings of the fifth international conference on knowledge capture. K-CAP ’09, Association for Computing Machinery, New York, NY, USA, pp 9–16. https://doi.org/10.1145/1597735.1597738
10. Leurent E (2018) An environment for autonomous driving decision-making. https://github.com/eleurent/highway-env
11. Liu R, Zou JY (2018) The effects of memory replay in reinforcement learning. In: 2018 56th annual allerton conference on communication, control, and computing (Allerton), pp 478–485
12. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing atari with deep reinforcement learning
13. Raffin A, Hill A, Ernestus M, Gleave A, Kanervisto A, Dormann N (2019) Stable baselines3. https://github.com/DLR-RM/stable-baselines3
14. Sallab AE, Abdou M, Perot E, Yogamani S (2016) End-to-end deep reinforcement learning for lane keeping assist
15. Sharifzadeh S, Chiotellis I, Triebel R, Cremers D (2016) Learning to drive using inverse reinforcement learning and deep q-networks. arXiv preprint arXiv:1612.03653
16. Warnell G, Waytowich NR, Lawhern V, Stone P (2018) Deep tamer: interactive agent shaping in high-dimensional state spaces. ArXiv: abs/1709.10163
17. Watkins CJCH, Dayan P (1992) Q-learning. In: Machine learning, pp 279–292
18. Waytowich NR, Barton SL, Lawhern V, Warnell G (2019) A narration-based reward shaping approach using grounded natural language commands. ArXiv: abs/1911.00497
19. Wiewiora E (2010) Reward shaping. Springer US, Boston, MA, pp 863–865. https://doi.org/10.1007/978-0-387-30164-8_731
Chapter 46
Unpredictable Solutions of a Scalar Differential Equation with Generalized Piecewise Constant Argument of Retarded and Advanced Type
Marat Akhmet, Duygu Aruğaslan Çinçin, Zakhira Nugayeva, and Madina Tleubergenova
M. Akhmet (✉): Department of Mathematics, Middle East Technical University, Ankara, Turkey
D. A. Çinçin: Department of Mathematics, Süleyman Demirel University, Isparta 32260, Turkey
Z. Nugayeva · M. Tleubergenova: Institute of Information and Computational Technologies CS MES RK, Almaty 050000, Kazakhstan
Department of Mathematics, K. Zhubanov Aktobe Regional University, Aktobe 030000, Kazakhstan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_46

46.1 Introduction
Differential equations and their theory are of vital importance in modeling real-life problems. They make it possible to understand and interpret the dynamical properties of many real processes in biology, medicine, mechanics, physics, economics, electronics and so on. In line with the new needs of mathematical modeling, this theory has been developed intensively. Differential equations with discontinuous effects have arisen as a result of these needs, making it possible to describe and analyze more natural characteristics of real problems. Differential equations involving piecewise constant argument (PCA) [1] fall within the category of differential equations with discontinuities [2, 3]. This category has been generalized in [4, 5], where piecewise constant functions of any kind, referred to as piecewise constant argument of generalized type (PCAG), are taken as arguments. Although most of the existing results for differential equations having PCA were obtained using reduction to discrete equations or numerical methods [1, 6], equivalent integral equations were first used to examine differential equations with PCAG (EPCAG) in [4, 5]. With the help of equivalent integral equations, the qualitative behaviour of EPCAG has begun to be studied in a more general form. There exist several theoretical and applied studies on this subject [7–21].

Unpredictable functions were defined in [22] and developed in [23–32]. Once adapted to differential equations, they proved very useful for simplifying chaos analysis via differential equations. The theory of differential equations focuses mainly on oscillation phenomena, owing to the needs of science and technology. Accordingly, periodic, quasi-periodic and almost periodic solutions are studied extensively [19, 33–36]. In fact, unpredictable functions can be considered a new type of oscillation [22]. It was subsequently shown that the existence of an unpredictable solution of a differential equation proves the presence of Poincaré chaos, in the sense of a distinctive dynamic in a functional space [22, 24]. As a consequence of this important feature, investigating unpredictable solutions is as valuable as studying chaos. In the current work, we aim to study unpredictable motions of a scalar differential equation with mixed-type PCAG, being both retarded and advanced. Since differential equations with PCAG play a considerable role in applications such as population dynamics, neural networks and mechanics [3, 10, 11, 16–18, 37–39], it is worthwhile to combine these equations with the notion of chaos.
46.2 Preliminaries
Let $\mathbb{N}$, $\mathbb{R}$ and $\mathbb{Z}$ denote the sets of natural numbers, real numbers and integers, respectively. We fix real-valued sequences $\{\tau_i\}_{i \in \mathbb{Z}}$ and $\{\zeta_i\}_{i \in \mathbb{Z}}$ that satisfy $\tau_i \le \zeta_i \le \tau_{i+1}$ for all $i \in \mathbb{Z}$ and $|\tau_i| \to \infty$ as $|i| \to \infty$. The main object of our study is the following scalar differential equation with PCAG of mixed type:
\[ x'(t) = a x(t) + f(x(t)) + g(x(\gamma(t))) + h(t), \tag{46.1} \]
where $t, x \in \mathbb{R}$, $a$ is a negative constant, and $\gamma(t) = \zeta_i$ for $t$ in the half-open interval $[\tau_i, \tau_{i+1})$, $i \in \mathbb{Z}$; $f, g : D \to \mathbb{R}$ are continuous functions on an open interval $D = (-H, H)$, $H > 0$. In addition, the function $h : \mathbb{R} \to \mathbb{R}$ is assumed to be bounded and uniformly continuous. The definition below is central to our research: the main results depend on it, since it identifies a real-valued unpredictable function, and it can thus be regarded as the starting point of the present work [22].

Definition 46.1 A uniformly continuous and bounded function $\alpha : \mathbb{R} \to \mathbb{R}$ is unpredictable if there exist positive constants $\epsilon_0$, $\sigma$ and sequences $t_n$, $u_n$, both diverging to $\infty$, such that $\alpha(t + t_n) \to \alpha(t)$ as $n \to \infty$ uniformly on compact subsets of $\mathbb{R}$, and $\epsilon_0 \le |\alpha(t + t_n) - \alpha(t)|$ for each $t \in [u_n - \sigma, u_n + \sigma]$, $n \in \mathbb{N}$.
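To make the argument function $\gamma(t)$ concrete, here is a minimal sketch that evaluates it on a finite window of switching moments. Restricting the bi-infinite sequences $\{\tau_i\}$, $\{\zeta_i\}$ to finite Python lists, and the sample values themselves, are assumptions made for illustration.

```python
from bisect import bisect_right

def make_gamma(taus, zetas):
    """Build gamma(t) = zetas[i] for t in [taus[i], taus[i+1]).

    taus must be non-decreasing and one element longer than zetas, and each
    zetas[i] must satisfy taus[i] <= zetas[i] <= taus[i+1].
    """
    if len(taus) != len(zetas) + 1:
        raise ValueError("need len(taus) == len(zetas) + 1")
    for i, zeta in enumerate(zetas):
        if not (taus[i] <= zeta <= taus[i + 1]):
            raise ValueError(f"zetas[{i}] lies outside [taus[{i}], taus[{i + 1}]]")

    def gamma(t):
        if not (taus[0] <= t < taus[-1]):
            raise ValueError("t lies outside the tabulated window")
        i = bisect_right(taus, t) - 1  # index of the interval containing t
        return zetas[i]

    return gamma

if __name__ == "__main__":
    gamma = make_gamma([0.0, 1.0, 2.0, 3.0], [0.5, 1.5, 2.5])
    # gamma(0.2) = 0.5 > 0.2: the argument is advanced on [tau_i, zeta_i)
    # gamma(0.8) = 0.5 < 0.8: the argument is retarded on (zeta_i, tau_{i+1})
    print(gamma(0.2), gamma(0.8), gamma(1.0))
```

The two sample evaluations show why the equation is of mixed type: within a single interval the deviation $\gamma(t) - t$ changes sign at $t = \zeta_i$.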
The following conditions are assumed to hold throughout the study.
(A1) $|f(u) - f(v)| \le l_1 |u - v|$ and $|g(u) - g(v)| \le l_2 |u - v|$ for all $u, v \in D$, where $l_1$, $l_2$ are Lipschitz constants;
(A2) $\sup_{x \in D} |f(x)| \le m_f$ and $\sup_{x \in D} |g(x)| \le m_g$ for positive numbers $m_f$ and $m_g$;
(A3) $\sup_{t \in \mathbb{R}} |h(t)| \le m_h$ for a positive number $m_h$;
(A4) $-\dfrac{1}{a}(m_f + m_g + m_h) < H$;
(A5) $-\dfrac{1}{a}(l_1 + l_2) < 1$;
(A6) there exists a number $\tau > 0$ satisfying $\tau_{i+1} - \tau_i \le \tau$ for all $i \in \mathbb{Z}$.
We also adopt the following notation throughout this study:
\[ K = \frac{1}{1 - \tau\left((-a + l_1)(1 + l_2 \tau)e^{(-a + l_1)\tau} + l_2\right)}. \]
(A7) $a + l_1 + K l_2 < 0$;
(A8) $\tau\left((-a + l_1)(1 + l_2 \tau)e^{(-a + l_1)\tau} + l_2\right) < 1$;
(A9) there exists a sequence $\eta_n$ with $\eta_n \to \infty$ as $n \to \infty$ such that $\tau_{i - \eta_n} + t_n - \tau_i \to 0$ and $\zeta_{i - \eta_n} + t_n - \zeta_i \to 0$ as $n \to \infty$ on any finite interval of integers, for $t_n$ as in Definition 46.1.
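The numerical conditions (A4), (A5), (A7) and (A8), together with the constant $K$, are easy to check for concrete parameter values. The sketch below does so; the sample values in the usage part are hypothetical and are not taken from the chapter.

```python
import math

def check_conditions(a, l1, l2, m_f, m_g, m_h, H, tau):
    """Evaluate K and the inequalities (A4), (A5), (A7), (A8) for given constants."""
    if a >= 0:
        raise ValueError("a must be a negative constant")
    # Left-hand side of (A8); K is defined only when it is below 1.
    bracket = tau * ((-a + l1) * (1 + l2 * tau) * math.exp((-a + l1) * tau) + l2)
    a8 = bracket < 1
    K = 1.0 / (1.0 - bracket) if a8 else math.inf
    return {
        "A4": -(m_f + m_g + m_h) / a < H,
        "A5": -(l1 + l2) / a < 1,
        "A7": a + l1 + K * l2 < 0,
        "A8": a8,
        "K": K,
    }

if __name__ == "__main__":
    # Hypothetical sample values, chosen only to exercise the checks.
    print(check_conditions(a=-2.0, l1=0.1, l2=0.1,
                           m_f=0.2, m_g=0.2, m_h=0.5, H=1.0, tau=0.1))
```

In this sketch a larger bound $\tau$ on the interval lengths quickly violates (A8), which reflects that the estimate behind $K$ needs the switching moments to be sufficiently dense.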
46.3 Results on Unpredictable Solutions
Let $B$ denote the set of functions $\Upsilon : \mathbb{R} \to \mathbb{R}$, with the norm $\|\Upsilon\|_1 = \sup_{t \in \mathbb{R}} |\Upsilon(t)|$, that satisfy the following properties:
(B1) $\Upsilon$ is uniformly continuous;
(B2) $\|\Upsilon\|_1 < H$;
(B3) there exists a sequence $t_n$ with $t_n \to \infty$ as $n \to \infty$ such that $\Upsilon(t + t_n) \to \Upsilon(t)$ uniformly on any closed, bounded interval of $\mathbb{R}$.

The next assertion, which will be very useful in our work, can be stated along the lines of the results in [40].

Lemma 46.1 A function $x(t)$ that is bounded on $\mathbb{R}$ is a solution of (46.1) if it satisfies the equality
\[ x(t) = \int_{-\infty}^{t} \big( f(x(s)) + g(x(\gamma(s))) + h(s) \big) e^{a(t - s)}\, ds. \tag{46.2} \]

We define an operator $T$ on $B$ by
\[ T\Upsilon(t) = \int_{-\infty}^{t} \big( f(\Upsilon(s)) + g(\Upsilon(\gamma(s))) + h(s) \big) e^{a(t - s)}\, ds. \]
Next, we prove that this operator $T$ is invariant in $B$.

Lemma 46.2 $T$ is an invariant operator in the set $B$.

Proof We shall show that $TB$ is a subset of $B$. First, we calculate the derivative of $T\Upsilon(t)$ with respect to $t$:
\[ \frac{dT\Upsilon(t)}{dt} = f(\Upsilon(t)) + g(\Upsilon(\gamma(t))) + h(t) + a \int_{-\infty}^{t} \big( f(\Upsilon(s)) + g(\Upsilon(\gamma(s))) + h(s) \big) e^{a(t - s)}\, ds. \]
It follows from the last equality that
\[
\begin{aligned}
\left| \frac{dT\Upsilon(t)}{dt} \right| &\le |f(\Upsilon(t))| + |g(\Upsilon(\gamma(t)))| + |h(t)| - a \int_{-\infty}^{t} \big( |f(\Upsilon(s))| + |g(\Upsilon(\gamma(s)))| + |h(s)| \big) e^{a(t - s)}\, ds \\
&\le 2\left( m_f + m_g + m_h \right)
\end{aligned}
\]
for all $t \in \mathbb{R}$. We see that the derivative $\dfrac{dT\Upsilon(t)}{dt}$ is bounded, which in turn implies that $T\Upsilon$ is uniformly continuous. Hence, property (B1) holds for $T\Upsilon$. Furthermore, every $\Upsilon \in B$ satisfies the inequality
\[
\begin{aligned}
|T\Upsilon(t)| &\le \int_{-\infty}^{t} \big( |f(\Upsilon(s))| + |g(\Upsilon(\gamma(s)))| + |h(s)| \big) e^{a(t - s)}\, ds \\
&\le \int_{-\infty}^{t} \left( m_f + m_g + m_h \right) e^{a(t - s)}\, ds = -\frac{1}{a}\left( m_f + m_g + m_h \right).
\end{aligned}
\]
Due to the condition (A4), we have $\|T\Upsilon\|_1 < H$. Therefore, property (B2) is valid for $T\Upsilon$.
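Both bounds above rest on one elementary integral, which is spelled out here as a worked step. It uses only that $a < 0$ and the bounds from (A2) and (A3); the shorthand $M$ is introduced here for illustration and is not notation from the chapter.

```latex
% Since a < 0, e^{a(t-s)} -> 0 as s -> -infinity, so
\int_{-\infty}^{t} e^{a(t-s)}\,ds
  = \left[ -\frac{1}{a}\, e^{a(t-s)} \right]_{s=-\infty}^{s=t}
  = -\frac{1}{a}.
% With M := m_f + m_g + m_h this gives the two estimates used in the proof:
|T\Upsilon(t)| \le M \cdot \left(-\frac{1}{a}\right) = -\frac{M}{a},
\qquad
\left|\frac{dT\Upsilon(t)}{dt}\right| \le M + (-a)\cdot M \cdot \left(-\frac{1}{a}\right) = 2M.
```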
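Now we turn to property (B3). We need to prove that there is a sequence $t_n$ with $t_n \to \infty$ as $n \to \infty$ such that, for every $T\Upsilon \in B$, $T\Upsilon(t + t_n) \to T\Upsilon(t)$ uniformly on each closed and bounded interval of $\mathbb{R}$. We choose an arbitrary number $\varepsilon > 0$ and an interval $[a_0, b] \subset \mathbb{R}$ with $b > a_0$. It suffices to show that $|T\Upsilon(t + t_n) - T\Upsilon(t)| < \varepsilon$ for sufficiently large $n$ and $t \in [a_0, b]$. Let us take two constants $c$ and $\epsilon$ with $c < a_0$, $\epsilon > 0$ such that the following inequalities are satisfied:
\[ -\frac{2}{a}\left( l_1 H + l_2 H + m_h \right) e^{a(a_0 - c)} < \frac{\varepsilon}{4}, \tag{46.3} \]
\[ -\frac{\epsilon}{a}(1 + l_1) < \frac{\varepsilon}{4}, \tag{46.4} \]
\[ -\frac{2(p + 1) l_2 \epsilon}{a}\left( 1 - e^{a\tau} \right) < \frac{\varepsilon}{4}, \tag{46.5} \]
\[ -\frac{2 p l_2 H}{a}\left( e^{-a\epsilon} - 1 \right) < \frac{\varepsilon}{4}. \tag{46.6} \]
If we pick $n$ sufficiently large so that $|\Upsilon(t + t_n) - \Upsilon(t)| < \epsilon$ and $|h(t + t_n) - h(t)| < \epsilon$ for $t \in [c, b]$, and also $|\tau_{k - \eta_n} + t_n - \tau_k| < \epsilon$ for $\tau_k \in [c, b]$, $k \in \mathbb{Z}$, then the inequality
\[
\begin{aligned}
|T\Upsilon(t + t_n) - T\Upsilon(t)| &= \left| \int_{-\infty}^{t + t_n} \big( f(\Upsilon(s)) + g(\Upsilon(\gamma(s))) + h(s) \big) e^{a(t + t_n - s)}\, ds - \int_{-\infty}^{t} \big( f(\Upsilon(s)) + g(\Upsilon(\gamma(s))) + h(s) \big) e^{a(t - s)}\, ds \right| \\
&= \left| \int_{-\infty}^{t} \Big( \big( f(\Upsilon(s + t_n)) - f(\Upsilon(s)) \big) + \big( g(\Upsilon(\gamma(s + t_n))) - g(\Upsilon(\gamma(s))) \big) + \big( h(s + t_n) - h(s) \big) \Big) e^{a(t - s)}\, ds \right| \\
&\le \int_{-\infty}^{t} \big( l_1 |\Upsilon(s + t_n) - \Upsilon(s)| + l_2 |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| + |h(s + t_n) - h(s)| \big) e^{a(t - s)}\, ds
\end{aligned}
\]
is fulfilled. Splitting the last integral over $(-\infty, t]$ into a sum of two integrals over the intervals $(-\infty, c]$ and $[c, t]$, we obtain
\[
\begin{aligned}
|T\Upsilon(t + t_n) - T\Upsilon(t)| &\le \int_{-\infty}^{c} \big( l_1 |\Upsilon(s + t_n) - \Upsilon(s)| + l_2 |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| + |h(s + t_n) - h(s)| \big) e^{a(t - s)}\, ds \\
&\quad + \int_{c}^{t} \big( l_1 |\Upsilon(s + t_n) - \Upsilon(s)| + l_2 |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| + |h(s + t_n) - h(s)| \big) e^{a(t - s)}\, ds \\
&\le -\frac{2}{a}\left( l_1 H + l_2 H + m_h \right) e^{a(a_0 - c)} + \int_{c}^{t} (1 + l_1)\, \epsilon\, e^{a(t - s)}\, ds + \int_{c}^{t} l_2 |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| e^{a(t - s)}\, ds \\
&\le -\frac{2}{a}\left( l_1 H + l_2 H + m_h \right) e^{a(a_0 - c)} - \frac{1}{a}(1 + l_1)\, \epsilon + \int_{c}^{t} l_2 |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| e^{a(t - s)}\, ds.
\end{aligned}
\]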
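We fix $t \in [a_0, b]$ and assume without loss of generality that $\tau_k \le \tau_{k - \eta_n} + t_n$ and $\tau_k \le \tau_{k - \eta_n} + t_n = c < \tau_{k+1} < \tau_{k+2} < \cdots < \tau_{k+p} \le \tau_{k + p - \eta_n} + t_n \le t < \tau_{k + p + 1}$, so that the interval $[c, t]$ contains precisely $p$ discontinuity moments $\tau_{k+1}, \tau_{k+2}, \ldots, \tau_{k+p}$. We will estimate an upper bound for the integral
\[ I_1 := \int_{c}^{t} |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| e^{a(t - s)}\, ds. \]
Let us represent the integral $I_1$ over $[c, t]$ as a finite sum of integrals over the $2p + 1$ subintervals $[c, \tau_{k+1}]$, $[\tau_{k+1}, \tau_{k + 1 - \eta_n} + t_n]$, $\ldots$, $[\tau_{k + p - \eta_n} + t_n, t]$:
\[
\begin{aligned}
I_1 &= \int_{c}^{\tau_{k+1}} |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| e^{a(t - s)}\, ds + \int_{\tau_{k+1}}^{\tau_{k + 1 - \eta_n} + t_n} |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| e^{a(t - s)}\, ds \\
&\quad + \int_{\tau_{k + 1 - \eta_n} + t_n}^{\tau_{k+2}} |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| e^{a(t - s)}\, ds + \cdots + \int_{\tau_{k + p - \eta_n} + t_n}^{t} |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| e^{a(t - s)}\, ds \\
&= \sum_{i = k}^{k + p - 1} \int_{\tau_{i - \eta_n} + t_n}^{\tau_{i+1}} |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| e^{a(t - s)}\, ds + \sum_{i = k}^{k + p - 1} \int_{\tau_{i+1}}^{\tau_{i + 1 - \eta_n} + t_n} |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| e^{a(t - s)}\, ds \\
&\quad + \int_{\tau_{k + p - \eta_n} + t_n}^{t} |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| e^{a(t - s)}\, ds.
\end{aligned}
\]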
If we use, for convenience, the notations
\[ A_j := \int_{\tau_{j - \eta_n} + t_n}^{\tau_{j+1}} |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| e^{a(t - s)}\, ds \]
and
\[ B_j := \int_{\tau_{j+1}}^{\tau_{j + 1 - \eta_n} + t_n} |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| e^{a(t - s)}\, ds \]
for every $j \in \{k, k + 1, \ldots, k + p - 1\}$, then we can express the integral $I_1$ as
\[ I_1 = \sum_{j = k}^{k + p - 1} A_j + \sum_{j = k}^{k + p - 1} B_j + \int_{\tau_{k + p - \eta_n} + t_n}^{t} |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| e^{a(t - s)}\, ds. \]
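If $t$ belongs to the interval $[\tau_{j - \eta_n} + t_n, \tau_{j+1})$ for some $j \in \mathbb{Z}$, then $\gamma(t) = \zeta_j$. Moreover, the condition (A9) yields the equality $\gamma(t + t_n) = \zeta_{j + \eta_n}$. Hence we get
\[
\begin{aligned}
A_j &= \int_{\tau_{j - \eta_n} + t_n}^{\tau_{j+1}} |\Upsilon(\zeta_{j + \eta_n}) - \Upsilon(\zeta_j)| e^{a(t - s)}\, ds \\
&= \int_{\tau_{j - \eta_n} + t_n}^{\tau_{j+1}} |\Upsilon(\zeta_j + t_n + o(1)) - \Upsilon(\zeta_j)| e^{a(t - s)}\, ds \\
&= \int_{\tau_{j - \eta_n} + t_n}^{\tau_{j+1}} |\Upsilon(\zeta_j + t_n) - \Upsilon(\zeta_j) + \Upsilon(\zeta_j + t_n + o(1)) - \Upsilon(\zeta_j + t_n)| e^{a(t - s)}\, ds \\
&\le \int_{\tau_{j - \eta_n} + t_n}^{\tau_{j+1}} \big( |\Upsilon(\zeta_j + t_n) - \Upsilon(\zeta_j)| + |\Upsilon(\zeta_j + t_n + o(1)) - \Upsilon(\zeta_j + t_n)| \big) e^{a(t - s)}\, ds \\
&\le \int_{\tau_{j - \eta_n} + t_n}^{\tau_{j+1}} \big( \epsilon + |\Upsilon(\zeta_j + t_n + o(1)) - \Upsilon(\zeta_j + t_n)| \big) e^{a(t - s)}\, ds.
\end{aligned}
\]
By the uniform continuity of the function $\Upsilon$, for $\epsilon > 0$ and sufficiently large $n$ one can find a $\delta > 0$ such that $|\Upsilon(\zeta_j + t_n + o(1)) - \Upsilon(\zeta_j + t_n)| < \epsilon$ whenever $|\zeta_{j + \eta_n} - \zeta_j - t_n| < \delta$. As a consequence, we find that
\[ A_j \le 2\epsilon \int_{\tau_{j - \eta_n} + t_n}^{\tau_{j+1}} e^{a(t - s)}\, ds \le -\frac{2\epsilon}{a}\left( 1 - e^{a\tau} \right) \]
for each $j \in \{k, k + 1, \ldots, k + p - 1\}$. An approach similar to that used for estimating $A_j$ yields
\[ \int_{\tau_{k + p - \eta_n} + t_n}^{t} |\Upsilon(\gamma(s + t_n)) - \Upsilon(\gamma(s))| e^{a(t - s)}\, ds \le -\frac{2\epsilon}{a}\left( 1 - e^{a\tau} \right). \]
In addition, for each $j \in \{k, k + 1, \ldots, k + p - 1\}$ we obtain
\[ B_j \le 2H \int_{\tau_{j+1}}^{\tau_{j + 1 - \eta_n} + t_n} e^{a(t - s)}\, ds \le -\frac{2H}{a}\left( e^{-a\epsilon} - 1 \right) \]
by virtue of the assumption (A9). Thus, we have
\[ I_1 \le -\frac{2(p + 1)\epsilon}{a}\left( 1 - e^{a\tau} \right) - \frac{2pH}{a}\left( e^{-a\epsilon} - 1 \right). \]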
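After all of these estimations, it is true for every $t \in [a_0, b]$ that
\[ |T\Upsilon(t + t_n) - T\Upsilon(t)| \le -\frac{2}{a}\left( l_1 H + l_2 H + m_h \right) e^{a(a_0 - c)} - \frac{\epsilon}{a}(1 + l_1) - \frac{2(p + 1) l_2 \epsilon}{a}\left( 1 - e^{a\tau} \right) - \frac{2 p H l_2}{a}\left( e^{-a\epsilon} - 1 \right). \]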
Based on the inequalities (46.3)–(46.6), we conclude that $|T\Upsilon(t + t_n) - T\Upsilon(t)| < \varepsilon$ for all $t \in [a_0, b]$. This means that the function $T\Upsilon$ satisfies the property (B3), as desired. Since the properties (B1), (B2) and (B3) hold for $T\Upsilon$, we see that $T$ is an invariant operator in $B$. $\square$

Next, we verify that $T : B \to B$ is a contraction operator.

Lemma 46.3 The operator $T$ from $B$ to $B$ is contractive.

Proof For $\Upsilon_1, \Upsilon_2 \in B$ and $t \in \mathbb{R}$, one obtains
\[
\begin{aligned}
|T\Upsilon_1(t) - T\Upsilon_2(t)| &\le \int_{-\infty}^{t} \big( l_1 |\Upsilon_1(s) - \Upsilon_2(s)| + l_2 |\Upsilon_1(\gamma(s)) - \Upsilon_2(\gamma(s))| \big) e^{a(t - s)}\, ds \\
&\le \int_{-\infty}^{t} \big( l_1 \|\Upsilon_1 - \Upsilon_2\|_1 + l_2 \|\Upsilon_1 - \Upsilon_2\|_1 \big) e^{a(t - s)}\, ds \\
&\le -\frac{1}{a}(l_1 + l_2) \|\Upsilon_1 - \Upsilon_2\|_1
\end{aligned}
\]
for all $t \in \mathbb{R}$. Thus, we have
\[ \|T\Upsilon_1 - T\Upsilon_2\|_1 \le -\frac{1}{a}(l_1 + l_2) \|\Upsilon_1 - \Upsilon_2\|_1. \]
Since the condition (A5) gives $-\frac{1}{a}(l_1 + l_2) < 1$, the operator $T : B \to B$ is a contraction on $B$, and the proof is complete. $\square$
The following result provides useful information for the stability analysis.

Lemma 46.4 [2] Let the conditions (A1), (A6), (A8) be valid and let $z(t)$ be a continuous function with $\|z(t)\|_1 < H$. If $v(t)$ is a solution of
\[ v'(t) = a v(t) + f(v(t) + z(t)) - f(z(t)) + g(v(\gamma(t)) + z(\gamma(t))) - g(z(\gamma(t))), \tag{46.7} \]
then the inequality
\[ |v(\gamma(t))| \le K |v(t)| \tag{46.8} \]
is true for each real number $t$.

Proof Let $t$ belong to the interval $[\tau_i, \tau_{i+1})$ for some $i \in \mathbb{Z}$. We may have either $\tau_i \le \zeta_i \le t < \tau_{i+1}$ or $\tau_i \le t < \zeta_i < \tau_{i+1}$. In the case $\tau_i \le \zeta_i \le t < \tau_{i+1}$ we have
\[ v(t) = v(\zeta_i) + \int_{\zeta_i}^{t} \big( a v(s) + f(v(s) + z(s)) - f(z(s)) + g(v(\zeta_i) + z(\zeta_i)) - g(z(\zeta_i)) \big)\, ds, \]
and therefore
\[
\begin{aligned}
|v(t)| &\le |v(\zeta_i)| + \int_{\zeta_i}^{t} \big( -a |v(s)| + l_1 |v(s)| + l_2 |v(\zeta_i)| \big)\, ds \\
&\le |v(\zeta_i)| (1 + l_2 \tau) + \int_{\zeta_i}^{t} (-a + l_1) |v(s)|\, ds.
\end{aligned}
\]
The Gronwall-Bellman Lemma yields the inequality $|v(t)| \le |v(\zeta_i)| (1 + l_2 \tau) e^{(-a + l_1)\tau}$. Moreover, we have
\[ v(\zeta_i) = v(t) - \int_{\zeta_i}^{t} \big( a v(s) + f(v(s) + z(s)) - f(z(s)) + g(v(\zeta_i) + z(\zeta_i)) - g(z(\zeta_i)) \big)\, ds. \]
Thus,
\[
\begin{aligned}
|v(\zeta_i)| &\le |v(t)| + \int_{\zeta_i}^{t} \big( -a |v(s)| + l_1 |v(s)| + l_2 |v(\zeta_i)| \big)\, ds \\
&\le |v(t)| + \int_{\zeta_i}^{t} \big( (-a + l_1)(1 + l_2 \tau) e^{(-a + l_1)\tau} |v(\zeta_i)| + l_2 |v(\zeta_i)| \big)\, ds \\
&\le |v(t)| + \tau \left( (-a + l_1)(1 + l_2 \tau) e^{(-a + l_1)\tau} + l_2 \right) |v(\zeta_i)|.
\end{aligned}
\]
We deduce from the condition (A8) that $|v(\zeta_i)| \le K |v(t)|$ for each $t \in [\tau_i, \tau_{i+1})$, $i \in \mathbb{Z}$. That is, (46.8) holds whenever $\tau_i \le \zeta_i \le t < \tau_{i+1}$, $i \in \mathbb{Z}$. Treating the other case $\tau_i \le t < \zeta_i < \tau_{i+1}$, $i \in \mathbb{Z}$, in a similar way gives the same result. Thus, (46.8) is satisfied for each real number $t$. $\square$
The following theorem, which is the main result of this study, establishes the existence and uniqueness of an exponentially stable unpredictable solution.

Theorem 46.1 Let the conditions (A1)–(A9) be satisfied and let $h$ be an unpredictable function. Then, Eq. (46.1) has a unique unpredictable solution, and it is exponentially stable.

Proof The argument proceeds in several steps. First, we establish the completeness of the set $B$. Let $\phi_k(t)$ be a Cauchy sequence in $B$ whose limit as $k \to \infty$ is $\phi(t)$ on the real axis. The function $\phi(t)$ is uniformly continuous and bounded [40], which means that (B1) and (B2) are already satisfied by $\phi(t)$. It remains to prove that (B3) is satisfied by $\phi(t)$ as well. Note that we have the inequality
\[
|\phi(t + t_n) - \phi(t)| \le |\phi(t + t_n) - \phi_k(t + t_n)| + |\phi_k(t + t_n) - \phi_k(t)| + |\phi_k(t) - \phi(t)| .
\]
Consider a closed, bounded interval $I$ on the real axis. For a sufficiently small $\varepsilon > 0$ and $t \in I$, each term on the right-hand side can be made smaller than $\frac{\varepsilon}{3}$, so that $|\phi(t + t_n) - \phi(t)| < \varepsilon$ on the interval $I$. Hence, the sequence of functions $\phi(t + t_n)$ converges uniformly to the limit function $\phi(t)$ on $I$, which shows that the space $B$ is complete.

Recall that $T$ is an invariant and contractive operator in $B$ by Lemmas 46.2 and 46.3, respectively. By the Banach fixed point theorem, the operator $T$ has exactly one fixed point $z(t) \in B$. This fixed point is the unique solution of the scalar Eq. (46.1), which proves the uniqueness of the solution.

In the next step, we show that this unique solution is an unpredictable function. One can find numbers $n_1, n_2 \in \mathbb{N}$ and $\beta > 0$ satisfying the inequalities
\[
\beta < \sigma, \tag{46.9}
\]
\[
\beta \Bigl( (a - l_1)\Bigl( \frac{1}{n_1} + \frac{2}{n_2} \Bigr) - 2 l_2 + \frac{1}{2} \Bigr) \ge \frac{3}{2 n_1}, \tag{46.10}
\]
and
\[
|z(t + s) - z(t)| < e_0 \min\Bigl\{ \frac{1}{4 n_1}, \frac{1}{n_2} \Bigr\}, \quad t \in \mathbb{R},\ |s| < \beta. \tag{46.11}
\]
Let $\beta$, $n_1$, $n_2$ and a natural number $n$ be fixed. We define $\Delta = |z(u_n + t_n) - z(u_n)|$. In what follows, we consider the cases $\Delta \ge \frac{e_0}{n_1}$ and $\Delta < \frac{e_0}{n_1}$ separately.

(a) Consider first the case $\Delta \ge \frac{e_0}{n_1}$. Then, we get
\[
\begin{aligned}
|z(t + t_n) - z(t)| &\ge |z(u_n + t_n) - z(u_n)| - |z(u_n) - z(t)| - |z(t + t_n) - z(u_n + t_n)| \\
&> \frac{e_0}{n_1} - \frac{e_0}{4 n_1} - \frac{e_0}{4 n_1} = \frac{1}{2 n_1}\,e_0
\end{aligned}
\]
for $u_n - \beta \le t \le u_n + \beta$, $n \in \mathbb{N}$.

(b) Consider next the case $\Delta < \frac{e_0}{n_1}$. In this case, (46.11) leads to
\[
\begin{aligned}
|z(t + t_n) - z(t)| &\le |z(u_n + t_n) - z(u_n)| + |z(u_n) - z(t)| + |z(t + t_n) - z(u_n + t_n)| \\
&< \frac{e_0}{n_1} + \frac{e_0}{n_2} + \frac{e_0}{n_2} = \Bigl( \frac{1}{n_1} + \frac{2}{n_2} \Bigr) e_0
\end{aligned}
\]
for $u_n \le t \le u_n + \beta$. It is obvious that $z(t)$ satisfies the integral equation
\[
z(t) = z(u_n) + \int_{u_n}^{t} \bigl( a z(s) + f(z(s)) + g(z(\gamma(s))) + h(s) \bigr)\,ds .
\]
On the basis of this integral equation, we can also write
\[
z(t + t_n) = z(u_n + t_n) + \int_{u_n}^{t} \bigl( a z(s + t_n) + f(z(s + t_n)) + g(z(\gamma(s + t_n))) + h(s + t_n) \bigr)\,ds .
\]
Thus, we obtain the difference
\[
\begin{aligned}
z(t + t_n) - z(t) ={}& z(u_n + t_n) - z(u_n) + a \int_{u_n}^{t} \bigl( z(s + t_n) - z(s) \bigr)\,ds \\
&+ \int_{u_n}^{t} \bigl( f(z(s + t_n)) - f(z(s)) \bigr)\,ds \\
&+ \int_{u_n}^{t} \bigl( g(z(\gamma(s + t_n))) - g(z(\gamma(s))) \bigr)\,ds + \int_{u_n}^{t} \bigl( h(s + t_n) - h(s) \bigr)\,ds,
\end{aligned}
\]
which implies in turn that
\[
\begin{aligned}
|z(t + t_n) - z(t)| \ge{}& -|z(u_n + t_n) - z(u_n)| + a \int_{u_n}^{t} |z(s + t_n) - z(s)|\,ds - \int_{u_n}^{t} |f(z(s + t_n)) - f(z(s))|\,ds \\
&- \int_{u_n}^{t} |g(z(\gamma(s + t_n))) - g(z(\gamma(s)))|\,ds + \int_{u_n}^{t} |h(s + t_n) - h(s)|\,ds \\
\ge{}& -\frac{e_0}{n_1} + a \beta \Bigl( \frac{1}{n_1} + \frac{2}{n_2} \Bigr) e_0 - l_1 \beta \Bigl( \frac{1}{n_1} + \frac{2}{n_2} \Bigr) e_0 \\
&- l_2 \int_{u_n}^{t} |z(\gamma(s + t_n)) - z(\gamma(s))|\,ds + \frac{\beta}{2}\,e_0
\end{aligned}
\]
for $u_n + \frac{\beta}{2} \le t \le u_n + \beta$. Denote
\[
I_2 = \int_{u_n}^{t} |z(\gamma(s + t_n)) - z(\gamma(s))|\,ds .
\]
Let us fix $t \in [u_n + \frac{\beta}{2}, u_n + \beta]$. We can choose $\beta$ small enough that $\tau_{i - \eta_n} + t_n \le u_n < u_n + \frac{\beta}{2} \le t \le u_n + \beta < \tau_{i+1}$ for a certain integer $i$. Since $\gamma(t) = \zeta_i$ for the values of $t$ in the interval $[u_n + \frac{\beta}{2}, u_n + \beta]$, the equality $\gamma(t + t_n) = \zeta_{i + \eta_n}$ also holds due to the condition (A9). Note that the solution $z(t)$ is uniformly continuous since it lies in the set $B$. It can be inferred that for sufficiently large $n$ and $e_0 > 0$, there is a positive number $\delta$ such that
\[
|z(\zeta_{i + \eta_n}) - z(\zeta_i)| \le |z(\zeta_i + t_n) - z(\zeta_i)| + |z(\zeta_i + t_n + o(1)) - z(\zeta_i + t_n)| < 2 e_0
\]
whenever $|\zeta_{i + \eta_n} - \zeta_i - t_n| < \delta$. As a result, we find that $I_2 \le 2 \beta e_0$. Using inequality (46.10), we conclude that
\[
\begin{aligned}
|z(t + t_n) - z(t)| &\ge -\frac{e_0}{n_1} + a \Bigl( \frac{1}{n_1} + \frac{2}{n_2} \Bigr) \beta e_0 - l_1 \Bigl( \frac{1}{n_1} + \frac{2}{n_2} \Bigr) \beta e_0 - 2 l_2 \beta e_0 + \frac{\beta}{2}\,e_0 \\
&\ge -\frac{e_0}{n_1} + \frac{3 e_0}{2 n_1} \ge \frac{e_0}{2 n_1} .
\end{aligned}
\]
Hence, the solution $z(t)$ is unpredictable with $\bar{u}_n = u_n + \frac{3\beta}{4}$ and $\bar{\sigma} = \frac{\beta}{4}$.

Our last task is the stability analysis of the solution $z(t)$. Let $v(t) = y(t) - z(t)$, where $y(t)$ denotes any other solution of Eq. (46.1). One can check that $v(t)$ is a solution of Eq. (46.7). Therefore, the inequality
\[
|v(t)| \le e^{a(t - t_0)} |v(t_0)| + \int_{t_0}^{t} e^{a(t - s)} \bigl( l_1 |v(s)| + l_2 |v(\gamma(s))| \bigr)\,ds \tag{46.12}
\]
is valid. If we use the result of Lemma 46.4 in (46.12), we arrive at the inequality
\[
|v(t)| \le e^{a(t - t_0)} |v(t_0)| + \int_{t_0}^{t} e^{a(t - s)} (l_1 + K l_2)\,|v(s)|\,ds .
\]
Multiplying both sides by $e^{-at}$ gives
\[
e^{-at} |v(t)| \le e^{-a t_0} |v(t_0)| + (l_1 + K l_2) \int_{t_0}^{t} e^{-as} |v(s)|\,ds .
\]
Applying the Gronwall-Bellman Lemma [40], we obtain $|v(t)| \le |v(t_0)|\,e^{(a + l_1 + K l_2)(t - t_0)}$. In other words, we have
\[
|y(t) - z(t)| \le |y(t_0) - z(t_0)|\,e^{(a + l_1 + K l_2)(t - t_0)} . \tag{46.13}
\]
Finally, it follows from the assumption (A7) that $z(t)$, the unpredictable solution of (46.1), is exponentially stable. The proof is complete. $\square$
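The Gronwall step leading to (46.13) can be spelled out as follows. This is only a sketch: the auxiliary function $w$ is introduced here for illustration and is not part of the original argument, and the reading of (A7) as a sign condition on the exponent is an assumption based on how (A7) is used.

```latex
\[
w(t) := e^{-at}\,|v(t)|, \qquad
w(t) \le w(t_0) + (l_1 + K l_2)\int_{t_0}^{t} w(s)\,ds
\;\Longrightarrow\;
w(t) \le w(t_0)\,e^{(l_1 + K l_2)(t - t_0)},
\]
\[
|v(t)| = e^{at}\,w(t)
       \le e^{at}\,e^{-a t_0}\,|v(t_0)|\,e^{(l_1 + K l_2)(t - t_0)}
       = |v(t_0)|\,e^{(a + l_1 + K l_2)(t - t_0)}, \qquad t \ge t_0 .
\]
```

The bound decays exponentially as $t \to \infty$ exactly when $a + l_1 + K l_2 < 0$, which is presumably what (A7) requires.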
46.4 Example with a Numerical Simulation

To investigate the presence of an unpredictable solution, we consider the logistic map
\[
\lambda_{i+1} = \mu \lambda_i (1 - \lambda_i), \tag{46.14}
\]
where $i \in \mathbb{Z}$. In [22], it was proved that for each $\mu \in [3 + (2/3)^{1/2}, 4]$, the map (46.14) possesses an unpredictable solution. Let $\chi_i$, $i \in \mathbb{Z}$, denote an unpredictable solution of (46.14) with $\mu = 3.92$. In what follows, we use the unpredictable function
\[
\Theta(t) = \int_{-\infty}^{t} e^{-3(t - s)}\,\Omega(s)\,ds, \quad t \in \mathbb{R},
\]
with $\Omega(t) = \chi_i$ for $t \in [i, i + 1)$, $i \in \mathbb{Z}$. The function is bounded on the whole real axis, with $\sup_{t \in \mathbb{R}} |\Theta(t)| \le \frac{1}{3}$. The argument function $\gamma(t) = \zeta_k$ is defined by the sequences $\tau_k = k$, $\zeta_k = \frac{\tau_k + \tau_{k+1}}{2} + \chi_k = \frac{2k + 1}{2} + \chi_k$, $k \in \mathbb{Z}$.

Consider the following scalar differential equation with PCAG of mixed type:
\[
x'(t) = -0.4\,x(t) + 0.05 \tanh\Bigl( \frac{x(t)}{8} \Bigr) + 0.04 \tanh\Bigl( \frac{x(\gamma(t))}{6} \Bigr) + 12\,\Theta^3(t) - 0.2 . \tag{46.15}
\]
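The forcing term of (46.15) can be generated numerically along the following lines. This is a minimal sketch under stated assumptions, not the authors' simulation code: an ordinary numerical orbit of the logistic map stands in for the unpredictable solution $\chi_i$, the starting value `lam0` and the initial value `theta0` (which replaces the tail of the integral from $-\infty$) are arbitrary choices, and all function names are hypothetical. The sketch relies on the fact that the integrand is piecewise constant, so on each interval $[i, i+1)$ the integral satisfies $\Theta' = -3\Theta + \chi_i$ and can be advanced exactly.

```python
import math


def logistic_orbit(lam0, mu, n):
    """Return n iterates of the logistic map lam_{i+1} = mu*lam_i*(1 - lam_i)."""
    orbit = [lam0]
    for _ in range(n - 1):
        orbit.append(mu * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


def theta_samples(chi, theta0=0.0, steps_per_unit=100):
    """Sample Theta on [0, len(chi)) using the exact update on each [i, i+1).

    On [i, i+1) the integrand is the constant chi[i], so
    Theta(t) = exp(-3(t-i))*Theta(i) + chi[i]*(1 - exp(-3(t-i)))/3.
    theta0 is an assumed initial value standing in for the integral over (-inf, 0].
    """
    ts, vals = [], []
    theta_i = theta0
    for i, c in enumerate(chi):
        for j in range(steps_per_unit):
            dt = j / steps_per_unit
            decay = math.exp(-3.0 * dt)
            ts.append(i + dt)
            vals.append(decay * theta_i + c * (1.0 - decay) / 3.0)
        # Advance to the right end point of the unit interval.
        decay = math.exp(-3.0)
        theta_i = decay * theta_i + c * (1.0 - decay) / 3.0
    return ts, vals


def forcing(theta):
    """Forcing term of (46.15): 12*Theta^3 - 0.2."""
    return 12.0 * theta ** 3 - 0.2


chi = logistic_orbit(lam0=0.4, mu=3.92, n=50)  # lam0 is an arbitrary choice
ts, thetas = theta_samples(chi)
hs = [forcing(th) for th in thetas]
print(max(abs(th) for th in thetas), max(abs(h) for h in hs))
```

Because the orbit stays in $[0, 1]$, the sampled values respect the bound $\frac{1}{3}$ quoted for the integral, and the forcing stays within the bound $m_h = 0.64$ used in the example. Simulating the full equation additionally requires handling the advanced argument $\gamma(t)$, which this sketch does not attempt.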
Fig. 46.1 Graph of the function ψ(t), which exponentially converges to the unpredictable solution x(t) of the Eq. (46.15)
Moreover, $h(t) = 12\,\Theta^3(t) - 0.2$ is an unpredictable function in accordance with Lemmas 1.4 and 1.5 given in [25]. The conditions (A1)–(A9) are valid for Eq. (46.15) with $l_1 = 0.00625$, $l_2 = 0.00667$, $m_f = 0.05$, $m_g = 0.04$, $m_h = 0.64$ and $H = 1.8$. Thus, by Theorem 46.1, Eq. (46.15) has a unique exponentially stable unpredictable solution $x(t)$. To visualize the behavior of the unpredictable solution $x(t)$, we simulate another solution $\psi(t)$, with initial value $\psi(0) = 0.4956$, which approaches this unpredictable solution as time increases. Thus, instead of the curve describing the unpredictable solution $x(t)$, one can consider the graph of $\psi(t)$ (Fig. 46.1).
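The constants quoted for Eq. (46.15) can be checked against the quantities that appear in the proofs. The sketch below is hedged accordingly: the exact statements of (A5), (A7) and (A8) are given earlier in the chapter, so the forms used here are assumptions inferred from how the conditions are used, namely the contraction factor $-(l_1 + l_2)/a$ from Lemma 46.3, the constant $K$ read off from the last inequality in the proof of Lemma 46.4, and the exponent $a + l_1 + K l_2$ from (46.13). The bound $\tau = 1$ on the interval lengths is also an assumption, taken from $\tau_k = k$.

```python
import math

a, l1, l2 = -0.4, 0.00625, 0.00667  # values quoted for Eq. (46.15)
tau = 1.0  # assumed bound on tau_{k+1} - tau_k, since tau_k = k

# Lipschitz constants of 0.05*tanh(x/8) and 0.04*tanh(x/6), using |tanh'| <= 1.
lip_f = 0.05 / 8
lip_g = 0.04 / 6

# Contraction factor appearing in Lemma 46.3.
contraction = -(l1 + l2) / a

# Last inequality in the proof of Lemma 46.4: |v(zeta_i)| <= |v(t)| + q*|v(zeta_i)|,
# which gives |v(zeta_i)| <= K*|v(t)| with K = 1/(1 - q), provided q < 1.
q = tau * ((-a + l1) * (1 + l2 * tau) * math.exp((-a + l1) * tau) + l2)
K = 1.0 / (1.0 - q)  # assumed form of the constant K

# Exponent in the stability estimate (46.13).
rate = a + l1 + K * l2

print(f"Lipschitz constants: {lip_f:.5f}, {lip_g:.5f}")
print(f"contraction factor:  {contraction:.4f}")
print(f"q = {q:.4f}, K = {K:.4f}")
print(f"decay exponent:      {rate:.4f}")
```

Under these assumptions the contraction factor is below one, $q$ is below one so that $K$ is well defined, and the exponent is negative, which is consistent with the claim that the hypotheses of Theorem 46.1 hold for this example.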
46.5 Conclusion

In the present study, a scalar differential equation with piecewise constant argument of generalized type of both delayed and advanced forms is considered. The presence of a deviating argument $\gamma(t)$ makes the equation more realistic in several applications and more complex to analyze. Therefore, it is noteworthy to deal with such an equation together with the theory of unpredictable functions. In real-world problems, there are several models where one dependent variable changes with respect to the independent variable $t$. Hence, our findings, which consider a new type of oscillation in terms of unpredictable functions, contribute not only to the theory of differential equations but also to many real-world applications in various fields of science. Moreover, since the existence of an unpredictable solution implies Poincaré chaos, our results are also significant for the theory of chaos. The example and the simulation play an important role in the verification and visualization of the theoretical results.
Acknowledgements M. Akhmet has been supported by 2247-A National Leading Researchers Program of TÜBİTAK (The Scientific and Technological Research Council of Turkey), Turkey, N 120C138. M. Tleubergenova and Z. Nugayeva have been supported by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (grants No. AP09258737 and No. AP14870835).
References
1. Wiener J (1993) Generalized solutions of functional differential equations. World Scientific, Singapore
2. Akhmet M (2011) Nonlinear hybrid continuous/discrete-time models. Atlantis Press, Paris
3. Akhmet MU, Yilmaz E (2014) Neural networks with discontinuous/impact activations. Springer, New York
4. Akhmet MU (2005) On the integral manifolds of the differential equations with piecewise constant argument of generalized type. In: Agarwal RP, Perera K (eds) Proceedings of the conference on differential and difference equations and applications. Hindawi Publishing Corporation, Melbourne, Florida, pp 11–20
5. Akhmet MU (2007) Integral manifolds of differential equations with piecewise constant argument of generalized type. Nonlinear Anal 66:367–383
6. Cooke KL, Wiener J (1984) Retarded differential equations with piecewise constant delays. J Math Anal Appl 99:265–297
7. Akhmet MU (2008) Stability of differential equations with piecewise constant arguments of generalized type. Nonlinear Anal 68:794–803
8. Akhmet MU (2008) Almost periodic solutions of differential equations with piecewise constant argument of generalized type. Nonlinear Anal Hybrid Syst 2:456–467
9. Akhmet MU, Aruğaslan D (2009) Lyapunov-Razumikhin method for differential equations with piecewise constant argument. Discrete and Continuous Dyn Syst Ser A 25:457–466
10. Akhmet MU, Aruğaslan D, Yılmaz E (2010) Stability analysis of recurrent neural networks with piecewise constant argument of generalized type. Neural Netw 23:805–811
11. Akhmet MU, Aruğaslan D, Yılmaz E (2010) Stability in cellular neural networks with a piecewise constant argument. J Comput Appl Math 233:2365–2373
12. Akhmet MU, Aruğaslan D, Cengiz N (2018) Exponential stability of periodic solutions of recurrent neural networks with functional dependence on piecewise constant argument. Turk J Math 42:272–292
13. Aruğaslan Çinçin D, Cengiz N (2020) Qualitative behavior of a Liénard-type differential equation with piecewise constant delays. Iran J Sci Technol Trans Sci 44:1439–1446
14. Wu A, Liu L, Huang T, Zeng Z (2017) Mittag-Leffler stability of fractional-order neural networks in the presence of generalized piecewise constant arguments. Neural Netw 85:118–127
15. Xi Q (2018) Razumikhin-type theorems for impulsive differential equations with piecewise constant argument of generalized type. Adv Differ Equ 267:1–16
16. Xi Q (2016) Global exponential stability of Cohen-Grossberg neural networks with piecewise constant argument of generalized type and impulses. Neural Comput 28:229–255
17. Li X (2014) Existence and exponential stability of solutions for stochastic cellular neural networks with piecewise constant argument. J Appl Math 2014:1–11
18. Pinto M, Sepúlveda D, Torres R (2018) Exponential periodic attractor of impulsive Hopfield-type neural network system with piecewise constant argument. Electron J Qual Theory Differ Equ 34:1–28
19. Castillo S, Pinto M (2015) Existence and stability of almost periodic solutions to differential equations with piecewise constant argument. Electron J Differ Equ 58:1–15
20. Zou Ch, Xia Y, Pinto M, Shi J, Bai Y (2019) Boundness and linearisation of a class of differential equations with piecewise constant argument. Qual Theory Dyn Syst 18:495–531
21. Pinto M (2009) Asymptotic equivalence of nonlinear and quasi linear differential equations with piecewise constant arguments. Math Comput Model 49:1750–1758
22. Akhmet MU, Fen MO (2017) Poincaré chaos and unpredictable functions. Commun Nonlinear Sci Numer Simul 48:85–94
23. Akhmet MU, Fen MO (2017) Existence of unpredictable solutions and chaos. Turk J Math 41:254–266
24. Akhmet MU, Fen MO (2018) Non-autonomous equations with unpredictable solutions. Commun Nonlinear Sci Numer Simul 159:657–670
25. Akhmet M, Fen MO, Tleubergenova M, Zhamanshin A (2019) Unpredictable solutions of linear differential and discrete equations. Turk J Math 43:2377–2389
26. Akhmet M, Tleubergenova M, Zhamanshin A (2020) Quasilinear differential equations with strongly unpredictable solutions. Carpathian J Math 36:341–349
27. Akhmet M, Tleubergenova M, Fen MO, Nugayeva Z (2020) Unpredictable solutions of linear impulsive systems. Mathematics 8:1798
28. Akhmet M, Tleubergenova M, Nugayeva Z (2020) Strongly unpredictable oscillations of Hopfield-type neural networks. Mathematics 8:1791
29. Akhmet M, Seilova R, Tleubergenova M, Zhamanshin A (2020) Shunting inhibitory cellular neural networks with strongly unpredictable oscillations. Commun Nonlinear Sci Numer Simul 89:105287
30. Akhmet M, Tleubergenova M, Akylbek Z (2020) Inertial neural networks with unpredictable oscillations. Mathematics 8:1797
31. Akhmet M, Fen MO, Tleubergenova M, Zhamanshin A (2019) Poincaré chaos for a hyperbolic quasilinear system. Miskolc Math Notes 20:33–44
32. Akhmet M, Aruğaslan Çinçin D, Tleubergenova M, Nugayeva Z (2021) Unpredictable oscillations for Hopfield-type neural networks with delayed and advanced arguments. Mathematics 9:571
33. Farkas M (1994) Periodic motion. Springer, New York
34. Hino Y, Naito T, VanMinh N, Shin JS (2001) Almost periodic solutions of differential equations in Banach spaces. CRC Press
35. Corduneanu C (2009) Almost periodic oscillations and waves. Springer, New York
36. Akhmet MU (2020) Almost periodicity, chaos, and asymptotic equivalence. Springer, New York
37. Aruğaslan D, Cengiz N (2018) Existence of periodic solutions for a mechanical system with piecewise constant forces. Hacet J Math Stat 47:521–538
38. Aruğaslan D, Özer A (2014) Stability analysis of a predator-prey model with piecewise constant argument of generalized type using Lyapunov functions. Neliniini Koliv 16:452–459; (2013) trans: J Math Sci (NY) 203:297–305
39. Akhmet MU, Aruğaslan D, Liu X (2008) Permanence of nonautonomous ratio-dependent predator-prey systems with piecewise constant argument of generalized type. Dyn Continuous Discrete Impulsive Syst Ser A Math Anal 15:37–51
40. Hartman P (1982) Ordinary differential equations. Birkhäuser, Boston
Chapter 47
Classification of Naval Ships with Deep Learning
Onurhan Çelik and Aydın Çetin
O. Çelik (corresponding author)
HAVELSAN, Ankara 06510, Turkey
A. Çetin
Gazi University, Ankara 06500, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_47

47.1 Introduction

The identification and classification of ships play an important role in target detection analysis in areas such as port security, search and rescue operations, and situational awareness at sea. In the military field, cameras and electro-optical systems are installed on both coastal surveillance platforms and mobile platforms. The images taken by these systems must be processed, and the targets in them automatically detected and classified, in decision support systems. It is advantageous for decision support systems to have information about the identity of the target ships. Threat probabilities can be deduced from whether a detected target is military or civilian and from its platform type, weapons, and sensors. Therefore, classifying the images obtained from platform sensors, such as electro-optical sensors, will contribute to these analyses.

Today, autonomous systems are used in all areas of our lives. One of the most important areas of use is military platforms. Thanks to developing technologies on both unmanned and manned platforms, the requirements of autonomous systems have begun to be met. In this way, human intervention in decision stages and results is reduced. It has also become important to process and make sense of the increasing volume of data. Artificial intelligence technologies are used to extract information from these large data sets, and application solutions are created with them in many fields such as health, education, defense, and communication.

Deep learning methods, a sub-branch of artificial intelligence, are used for the classification of image data, and many image processing solutions are built with deep learning. In this study, the classification of naval ship images according to their military types is presented. The study was carried out using convolutional neural networks (CNNs), which have proved to be very successful in image classification. First, since datasets containing military information are not available as open source, a dataset was created from consistent data gathered from various online sources. Then, using this dataset, classification was performed with CNN models.

The remainder of the chapter is organized as follows: related studies on ship classification are presented in Sect. 47.2, dataset creation and preparation are explained in Sect. 47.3, and the classification with the CNN model and the results of the study are explained in Sects. 47.4 and 47.5, respectively.
47.2 Related Works

Premaratne et al. [9] classified ships by superstructure moment invariants. They used Inverse Synthetic Aperture Radar (ISAR) images, which rely on the superstructure of the ships, and used Hu moments to extract the superstructure moment invariant features. Chang et al. [2] conducted a ship detection study on satellite images from Synthetic Aperture Radar (SAR) sources with the YOLOv2 deep learning architecture. Because of the low resolution of SAR images and the incomplete appearance of the ships in them, they worked only on ship detection. They achieved 90.05% accuracy, and their detection was 5.8 times faster than the earlier Faster R-CNN model. Liu et al. [3] conducted ship detection and classification studies using a CNN model on a dataset they created from Google Earth. These images show ships from above, like SAR images, but have higher image quality and are therefore more suitable for classification. They achieved 99% detection accuracy and 95% classification accuracy. Dao-Duc et al. [1] created a dataset of ship images randomly selected from the online Shipspotting website. The dataset contains 130,000 images and 35 different classes. On this dataset, they classified ships by type with AlexNet, a CNN model. They created models with two different parameter configurations and achieved a success rate of 80.91% with the first and 95.43% with the second. Zhang et al. [4] carried out a study intended for autonomous sea surface vessels (ASVs). ASVs have to classify other ships according to international maritime traffic rules, both day and night. They first created a dataset from a 9-day recording, containing 6 ship types and 2865 images, and then classified the images in this dataset with a CNN. They achieved an accuracy of 87.4% on the dataset, which they separated into night and day images. Yüksel et al. [15] extracted ship signatures from silhouette images of three-dimensional ship models and performed ship recognition on optical images. They created a synthetic database and performed ship classification and recognition using segmentation and artificial neural networks.

Milicevic et al. [10] conducted ship classification with data augmentation and transfer learning techniques. They achieved an accuracy of 78% on the validation dataset with the VGG-19 network. Leclerc et al. [13] used pre-trained CNNs based on the Inception and ResNet architectures to perform ship classification. They performed transfer learning experiments on a limited ship image dataset, and their best solution achieved 78.73% accuracy. Gundogdu et al. [5] created a dataset of ship images called MARVEL. They generated this dataset from random images available on the online Shipspotting website. The dataset contains 140,000 images and 26 classes. Classifying with the CNN model AlexNet, they reached an accuracy of 73%.
47.3 Dataset

The main types of naval ships are shown in Fig. 47.1. Viewed from the side in this way, the ship types differ in the antennas, weapons, masts, radomes, funnels, etc. mounted on the hull. These structures are called superstructures. Naval ships can be classified by the number, location, and size of these superstructures.
Fig. 47.1 Naval ship types
While it is very difficult to distinguish a ship's baseline in SAR images, today's modern cameras and electro-optical systems make it much easier to detect superstructure features in detail. Previous datasets consisted predominantly of civilian ship categories, and classification studies were therefore performed on civilian ships. In this study, we focus instead on the classification of naval ships. For this purpose, we first created a new dataset consisting of naval ships. The dataset was mostly generated from the online Shipspotting website [6] and from images in the naval ship types section of the Wikipedia website [7]. The created dataset contains a total of 41,426 images for 7 different ship types: Aircraft Carrier, Destroyer, Frigate, Landing Ship, Mine Warfare, Patrol, and Submarine. Since these are the main ship types, they differ from each other in appearance. For sub-types, such as corvettes or gunboats, these differences are less pronounced. The type-based distribution of the data and sample images from the naval ship dataset are shown in Figs. 47.2 and 47.3, respectively. The images in Fig. 47.3 are prepared at a resolution of 256 × 256 pixels. The partitions of the main body of a ship are shown in Fig. 47.4.

Ship images in the dataset are taken from different angles. Since the heading of the ships is not included in the source from which the data were taken, the images cannot be converted to a common viewing angle. If the images were taken by sensors carried on military platforms, this additional information would accompany them, and the ship images could be converted according to angle. Figure 47.5 shows the flowchart of the dataset and classification processes. Because there is no angle information in the created dataset, a sub-dataset was prepared by dividing the dataset into two categories: images taken from the sides (port and starboard views) of the ships, and all others.
Fig. 47.2 Dataset of naval ship types
Fig. 47.3 Sample images from naval ship dataset
Fig. 47.4 Sides of ship
Fig. 47.5 Flowchart of the dataset and classification processes
We manually labeled some of the data and performed training for these two classes. As a result, we obtained 8000 images in 7 different categories for the classification.
47.4 Classification

We performed the classification with the VGGNet-16 model, whose layered architecture is shown in Fig. 47.6. Recent studies in the literature suggest that the VGGNet-16 model is the most effective for ship classification. VGGNet is a convolutional neural network model proposed by Simonyan and Zisserman. The model took first place in the localization task and second place in the classification task of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2014. ImageNet is a dataset of over 14 million images; the subset used in the challenge covers 1000 classes. VGGNet-16 has 16 layers with weights: 13 convolutional layers followed by 3 fully connected layers.

Experimental studies and training of the model were carried out on Google Colab, which provides a free cloud service with a free GPU (NVIDIA Tesla K80). We used the TensorFlow 2.7 library in a Python 3.7 programming environment. Following the hyperparameters used in previous studies, an Adam optimizer with a learning rate of 0.001 was used for the classification. The batch size and dropout value were set to 128 and 0.5, respectively, and each classification run was trained for 10 epochs.

The more consistent the dataset is in classification studies, the higher the success rate. For this reason, three classifications were performed on three different subsets of the same dataset. First, the entire dataset with 7 classes was classified, and a validation accuracy of 63% was obtained. Then, classification was carried out again with 7 classes on the sub-dataset containing side views of the ships. In this case, the validation accuracy increased to 79%. Finally, experiments were conducted on the part of the sub-dataset containing four classes with different structural properties (Aircraft Carrier, Destroyer, Patrol, Submarine), as seen in Fig. 47.3. In this case, a validation accuracy of 93% was obtained. The classification accuracy of the model is summarized in Table 47.1.
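The layer count quoted above can be reproduced with a short calculation. The sketch below is an illustration, not the chapter's model: it uses the standard 16-layer configuration from the original VGG paper (3 × 3 convolutions, 2 × 2 max pooling, 224 × 224 RGB input, 1000 output classes), whereas the chapter trains on 256 × 256 images with 7 or 4 classes and does not state how the classifier head was adapted. The names `VGG16_CONV`, `VGG16_FC` and `count_vgg16` are hypothetical.

```python
# Standard VGG-16 layout: integers are the output channels of 3x3 convolutions,
# "M" marks a 2x2 max-pooling layer, which has no weights.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_FC = [4096, 4096, 1000]  # 1000 outputs = ImageNet challenge classes


def count_vgg16(input_size=224, in_channels=3):
    """Return (conv layers, fully connected layers, trainable parameters)."""
    conv_layers, params = 0, 0
    size, channels = input_size, in_channels
    for item in VGG16_CONV:
        if item == "M":
            size //= 2  # pooling halves the spatial resolution
        else:
            params += 3 * 3 * channels * item + item  # kernel weights + biases
            channels = item
            conv_layers += 1
    features = size * size * channels  # flattened input of the first dense layer
    for width in VGG16_FC:
        params += features * width + width
        features = width
    return conv_layers, len(VGG16_FC), params


conv, fc, total = count_vgg16()
print(conv, fc, conv + fc, total)
```

For this configuration the count gives 13 convolutional and 3 fully connected layers, 16 weight layers in total, with roughly 138 million parameters, most of them in the first fully connected layer. Changing `input_size` or the last entry of `VGG16_FC` shows how the parameter count would shift for other input resolutions or class counts.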
Comparison of the accuracy of our study with previous studies (Table 47.2) reveals that the classification accuracy of VGGNet-16 trained on the sub-dataset with 4 classes is higher than the classification accuracy of both studies with AlexNet and of VGGNet-16 trained on the sub-dataset with 7 classes.

Fig. 47.6 VGGNet-16 architecture

Table 47.1 Classifying accuracy
Dataset    # of classes    Accuracy (%)
Full       7               63
Sub        7               79
Sub        4               93

Table 47.2 Comparisons of classification
Model          Accuracy (%)    Dataset
AlexNet [5]    73.14           Civilian
AlexNet [8]    88.22           Civilian + Military
VGGNet-16      79.95           Military (Subdataset with 7 classes)
VGGNet-16      93.36           Military (Subdataset with 4 classes)
47.5 Conclusions

In this study, a new dataset was created from open sources. Classification was then performed with VGGNet-16, a successful current CNN model, and the results were compared with previous studies in the literature. The source images should capture the actual structure of the ship. The results of the study reveal that the accuracy of the model is affected by the angle of the objects in the images. Accordingly, using images of ships taken from the port and starboard sides improved the training accuracy. Future studies will include the application of different deep learning models with augmentation of the dataset and real-time recognition of naval ships.
References
1. Dao-Duc C, Xiaohui H, Morère O (2015) Maritime vessel images classification using deep convolutional neural networks. In: Proceedings of the sixth international symposium on information and communication technology, SoICT 2015, ACM, New York, pp 276–281
2. Chang Y, Anagaw A, Chang L, Wang YC, Hsiao C, Lee W (2019) Ship detection based on YOLOv2 for SAR imagery. Remote Sensing 11(7):786
3. Liu Y, Cui HY, Kuang Z, Li GQ (2017) Ship detection and classification on optical remote sensing images using deep learning. In: ITM web of conferences 2017, vol 12
4. Zhang MM, Choi J, Daniilidis K, Wolf MT, Kanan C (2015) VAIS: a dataset for recognizing maritime imagery in the visible and infrared spectrums. In: Computer vision and pattern recognition workshops (CVPRW) 2015
5. Gundogdu E, Solmaz B, Yücesoy V, Koç A (2017) MARVEL: a large-scale image dataset for maritime vessels. In: Lai SH, Lepetit V, Nishino K, Sato Y (eds) Computer vision – ACCV 2016. ACCV 2016. Lecture notes in computer science, vol 10115. Springer, Cham
6. Shipspotting Ship Tracker. http://www.shipspotting.com/gallery/categories.php (Accessed on 13 Aug 2021)
7. Wikipedia https://en.wikipedia.org/wiki/List_of_naval_ship_classes_in_service (Accessed on 22 Aug 2021)
8. Atalar O, Bartan B (2017) Ship classification using an image dataset. Corpus ID: 29004678
9. Premaratne P, Safaei F (2009) Ship classification by superstructure moment invariants. In: Proceedings of the international conference on intelligent computing, Ulsan, Republic of Korea, pp 327–335
10. Milicevic M, Zubrinic K, Obradovic I, Sjekavica T (2018) Data augmentation and transfer learning for limited dataset ship classification. WSEAS Trans Syst Control 2018(13):460–465
11. Abadi M et al. (2015) TensorFlow: large-scale machine learning on heterogeneous systems. http://www.tensorflow.org
12. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large scale image recognition. arXiv 1409.1556
13. Leclerc M, Tharmarasa R, Florea M, Boury-Brisset A, Kirubarajan T, Duclos Hindie N (2018) Ship classification using deep learning techniques for maritime target tracking. In: 21st international conference on information fusion (FUSION), pp 737–744
14. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, pp 248–255
15. Yüksel GK, Yalıtuna B, Tartar ÖF, Adlı FC, Eker K, Yörük O (2016) Ship recognition and classification using silhouettes extracted from optical images. In: 24th signal processing and communication application conference (SIU), pp 1617–1620
Chapter 48
Investigation of Mass-Spring Systems Subject to Generalized Piecewise Constant Forces
Marat Akhmet, Duygu Aruğaslan Çinçin, Zekeriya Özkan, and Madina Tleubergenova
M. Akhmet (corresponding author)
Department of Mathematics, Middle East Technical University, Ankara, Turkey
D. A. Çinçin
Department of Mathematics, Süleyman Demirel University, Isparta 32260, Turkey
Z. Özkan
Ortaköy Vocational School, Aksaray University, Aksaray 68400, Turkey
M. Tleubergenova
Department of Mathematics, K. Zhubanov Aktobe Regional University, Aktobe 030000, Kazakhstan
Institute of Information and Computational Technologies CS MES RK, Almaty 050000, Kazakhstan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_48

48.1 Introduction and Preliminaries

In the literature, there are many mathematical models constructed through differential equations to examine real-life problems. Most of these models use only the present states of the process. However, in some cases such differential equations cannot describe real-life problems realistically, since previous states can significantly influence the present and future states. For this reason, the development and applications of the qualitative theory of differential equations with discontinuous effects have been, and continue to be, the focus of great attention [1–13]. The study of differential equations with piecewise constant arguments (EPCA) was started in [14–17]. In [18, 19], analytic solutions of first order differential equations with piecewise constant argument are studied, where the greatest integer function $[t]$ is taken as the argument. Hence, the difference between two adjacent arguments is always constant. In real-life problems, allowing the variation between arguments to be nonconstant gives more sensitive solutions. For this reason, a new type of piecewise constant argument, called piecewise constant argument of generalized type (PCAG), was introduced by Akhmet [2]. Differential equations with PCAG have been studied extensively in [1–10].

In the present paper, damped and undamped mass-spring systems whose external forces have piecewise constant argument of generalized type are examined. The paper is arranged as follows. Section 48.2 defines the parameters of mass-spring systems having external forces with piecewise constant argument of generalized type and gives the assumptions together with some notation. In Sects. 48.2.1 and 48.2.2, we investigate the solutions of mass-spring systems with generalized piecewise constant external forces using the Laplace transform, which converts differential equations into algebraic equations that can be solved easily. In Sect. 48.3, we give our conclusions.

Let $\mathbb{Z}$, $\mathbb{N}$ and $\mathbb{R}$ be the sets of all integers, natural numbers and real numbers, respectively. Fix two real-valued sequences $(\theta_i)$, $(\zeta_i)$, $i \in \mathbb{Z}$, such that $\theta_i < \theta_{i+1}$, $\theta_i \le \zeta_i < \theta_{i+1}$ for all $i \in \mathbb{Z}$, $|\theta_i| \to \infty$ as $|i| \to \infty$, and assume that there exists a number $\theta > 0$ such that $\theta_{i+1} - \theta_i \le \theta$, $i \in \mathbb{Z}$. The most general form of differential equations with piecewise constant argument of generalized type can be expressed as follows [2]:
\[
x'(t) = f(t, x(t), x(\gamma(t))), \tag{48.1}
\]
where $t \in \mathbb{R}$, $x \in \mathbb{R}^n$, and $\gamma(t) = \zeta_i$ for $t \in [\theta_i, \theta_{i+1})$, $i \in \mathbb{Z}$. It is clear that the piecewise constant argument function $\beta(t)$ is a special case of the argument function $\gamma(t)$, which is of alternate type (retarded and advanced). In fact, $\gamma(t) = \beta(t)$ for $\theta_i = \zeta_i$, $i \in \mathbb{Z}$. In the present paper, we consider the following undamped and damped mass-spring systems with piecewise constant argument of generalized type:

$$ m\ddot{x} + kx = A x(\beta(t)) \tag{48.2} $$

and

$$ m\ddot{x} + c\dot{x} + kx = A x(\beta(t)), \tag{48.3} $$

where $x \in \mathbb{R}$, $t \in \mathbb{R}$, and $\beta(t) = \theta_i$ if $t \in [\theta_i, \theta_{i+1})$, $i \in \mathbb{Z}$. The systems (48.2) and (48.3) have discontinuities at the moments $\theta_i$, $i \in \mathbb{Z}$, since the piecewise constant function $\beta(t)$ is not continuous at $\theta_i$. However, the solutions of the systems (48.2) and (48.3) describe a continuous and continuously differentiable motion within the intervals $t \in [\theta_i, \theta_{i+1})$, $i \in \mathbb{Z}$. A classical mechanical mass-spring system can be written as

$$ m\ddot{x} + c\dot{x} + kx = 0, \tag{48.4} $$
where $x(t)$ is the displacement, $m$ is the mass, $c > 0$ is the damping coefficient and $k > 0$ is the spring constant. The system (48.4) exhibits damped harmonic motion. In other words, when a friction force is present, the oscillation amplitude decays to zero. The discriminant of the characteristic equation

$$ \lambda^2 + \frac{c}{m}\lambda + \frac{k}{m} = 0 $$

is given by

$$ \Delta = \frac{c^2}{m^2} - 4\frac{k}{m}. $$

The sign of $\Delta$ characterizes the behavior of the system. If $\Delta > 0$, $\Delta = 0$ or $\Delta < 0$, then the system (48.4) is, respectively, overdamped, critically damped or underdamped. If the system is subject to an external force $F$, it is given by

$$ m\ddot{x} + c\dot{x} + kx = F. $$

Dai and Singh [20, 21] studied the system

$$ m\ddot{x} + c\dot{x} + kx = A x([t]), $$

where $A$ indicates the magnitude of the force. They take the greatest integer function as the deviating argument to describe the force and use a discretization method to obtain the solution. This system is discontinuous at the integer moments, since the piecewise constant force $F = A x([t])$ jumps whenever $[t]$ changes value. For the systems (48.2) and (48.3), we take external forces with the piecewise constant argument of generalized type $\beta(t)$ instead of the greatest integer function. Let $y_1 = x$, $y_2 = x'$ and $y = (y_1, y_2)^T$. Then the mass-spring system (48.3) can be reduced to the following first-order system of differential equations:

$$ y'(t) = A_0 y(t) + A_1 y(\beta(t)). \tag{48.5} $$
Here, the matrices $A_0$ and $A_1$, which depend on the parameters of the mass-spring system (48.3), are given by

$$ A_0 = \begin{bmatrix} 0 & 1 \\ -\dfrac{k}{m} & -\dfrac{c}{m} \end{bmatrix}, \qquad A_1 = \begin{bmatrix} 0 & 0 \\ \dfrac{A}{m} & 0 \end{bmatrix}. $$

We will examine both damped and undamped mass-spring systems with generalized piecewise constant forces. In doing so, we will not transform the system into discrete equations. We will also assume that the damped system is underdamped, that is, $\Delta = \left(\frac{c}{m}\right)^2 - 4\frac{k}{m} < 0$. Now, we state the definition given in [2].

Definition 1 A continuous function $y(t)$ is a solution of (48.5) on $\mathbb{R}$ if: (i) the derivative $y'(t)$ exists at each point $t \in \mathbb{R}$ with the possible exception of the points $\theta_i$, $i \in \mathbb{Z}$, where one-sided derivatives exist; (ii) equation (48.5) is satisfied for $y(t)$ on each interval $(\theta_i, \theta_{i+1})$, $i \in \mathbb{Z}$, and it holds for the right derivative of $y(t)$ at the points $\theta_i$, $i \in \mathbb{Z}$.

48.2 Dynamics of Mass-Spring Systems Subject to Generalized Piecewise Constant Forces
From now on, we assume without loss of generality that $\theta_0 = 0$, and we denote by $\delta(\theta_0, t)$ the number of points $\theta_i$ in the interval $(\theta_0, t]$, so that $\delta(\theta_0, t) = n$ for $t \in [\theta_n, \theta_{n+1})$. Then we can start to investigate mass-spring systems.
48.2.1 Undamped Spring-Mass System
Consider the following mechanical problem for the undamped mass-spring system:

$$ m\ddot{x}(t) + kx(t) = A x(\beta(t)), \qquad x(\theta_0) = d_0, \qquad \dot{x}(\theta_0) = v_0, \tag{48.6} $$

where $A$ is a positive real parameter. Before solving (48.6), let us state the definition of the solution $x(t)$ of this problem on the interval $[\theta_0, \infty)$ [20, 21].

Definition 2 A solution of Eq. (48.6) on $[\theta_0, \infty)$ is a function $x(t)$ that satisfies the following four conditions: (i) $x(t)$ and $\dot{x}(t)$ are continuous on $[\theta_0, \infty)$; (ii) $\ddot{x}(t)$ exists at each point $t \in [\theta_0, \infty)$, with the possible exception of the points $t = \theta_i \in [\theta_0, \infty)$, $i = 0, 1, 2, \ldots$, where one-sided derivatives exist; (iii) the solution $x(t)$ satisfies Eq. (48.6) on each interval $[\theta_i, \theta_{i+1}) \subset [\theta_0, \infty)$; (iv) there is a continuous linear dynamic system on each interval $[\theta_i, \theta_{i+1})$, corresponding to the piecewise constant system governed by Eq. (48.6).

Then, we give the following proposition for the solution of (48.6).

Proposition 1 The solution of Eq. (48.6) on the interval $[\theta_0, \infty)$ is given by

$$ x(t) = \begin{bmatrix} \left(1 - \dfrac{a}{\nu^2}\right)\cos(\nu(t - \beta(t))) + \dfrac{a}{\nu^2} & \dfrac{\sin(\nu(t - \beta(t)))}{\nu} \end{bmatrix} \prod_{k=\delta(\theta_0,t)}^{1} M_k \begin{bmatrix} d_0 \\ v_0 \end{bmatrix}, \tag{48.7} $$

where

$$ \nu^2 = \frac{k}{m}, \qquad a = \frac{A}{m}, \qquad M_k = \begin{bmatrix} \left(1 - \dfrac{a}{\nu^2}\right)\cos(\nu s_k) + \dfrac{a}{\nu^2} & \dfrac{\sin(\nu s_k)}{\nu} \\ -\nu\left(1 - \dfrac{a}{\nu^2}\right)\sin(\nu s_k) & \cos(\nu s_k) \end{bmatrix}, $$

$s_k = \theta_k - \theta_{k-1}$, $k = 1, 2, 3, \ldots$, and

$$ \prod_{k=\delta(\theta_0,t)}^{1} M_k = M_{\delta(\theta_0,t)} M_{\delta(\theta_0,t)-1} \cdots M_3 M_2 M_1. $$
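The formula of Proposition 1 can be evaluated numerically by applying the matrices $M_1, M_2, \ldots$ one switching moment at a time. The following Python sketch is only an illustration under stated assumptions: the function names, the finite list `thetas`, and the convention that the last listed interval extends to infinity are hypothetical choices, not part of the chapter. The code counts the switching moments $\theta_i$ with $\theta_0 < \theta_i \le t$, so that $\delta(\theta_0, t) = n$ on $[\theta_n, \theta_{n+1})$.

```python
import math

def step_matrix(nu, a, s):
    """Matrix M_k of Proposition 1 for an interval of length s."""
    r = a / nu**2
    cs, sn = math.cos(nu * s), math.sin(nu * s)
    return [[(1 - r) * cs + r, sn / nu],
            [-nu * (1 - r) * sn, cs]]

def solve_undamped(t, thetas, m, k, A, d0, v0):
    """Evaluate x(t) from (48.7).

    thetas = [theta_0, theta_1, ...] is increasing with thetas[0] <= t.
    Assumption: beyond the last listed moment no further switching occurs.
    """
    nu = math.sqrt(k / m)
    a = A / m
    d, v = d0, v0
    n = 0
    # Apply M_1, M_2, ... for every switching moment theta_i <= t.
    while n + 1 < len(thetas) and thetas[n + 1] <= t:
        M = step_matrix(nu, a, thetas[n + 1] - thetas[n])
        d, v = M[0][0] * d + M[0][1] * v, M[1][0] * d + M[1][1] * v
        n += 1
    # The row vector in (48.7) is the first row of the same matrix
    # evaluated at s = t - beta(t), where beta(t) = theta_n.
    row = step_matrix(nu, a, t - thetas[n])[0]
    return row[0] * d + row[1] * v
```

For example, `solve_undamped(1.7, [0.0, 0.5, 1.3, 2.0], 1.0, 4.0, 1.5, 1.0, 0.3)` evaluates the displacement on the interval $[1.3, 2.0)$ for unevenly spaced switching moments.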
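Proof On an arbitrary interval $[\theta_n, \theta_{n+1})$, let us assume that $x_n(t)$ is a solution of problem (48.6) with the initial conditions $x_n(\theta_n) = d_n$ and $\dot{x}_n(\theta_n) = v_n$. On this interval, after dividing both sides of Eq. (48.2) by $m$, the initial value problem (48.6) becomes

$$ \ddot{x}_n(t) + \nu^2 x_n(t) = a x_n(\theta_n) \tag{48.8} $$

with the initial conditions

$$ x_n(\theta_n) = d_n \quad \text{and} \quad \dot{x}_n(\theta_n) = v_n. \tag{48.9} $$

The solution of the initial value problem (48.8)–(48.9) on $[\theta_n, \theta_{n+1})$ is obtained as follows:

$$ x_n(t) = \begin{bmatrix} \left(1 - \dfrac{a}{\nu^2}\right)\cos(\nu(t - \theta_n)) + \dfrac{a}{\nu^2} & \dfrac{\sin(\nu(t - \theta_n))}{\nu} \end{bmatrix} \begin{bmatrix} d_n \\ v_n \end{bmatrix}. \tag{48.10} $$

In the same manner, on the interval $[\theta_{n-1}, \theta_n)$ we can write the solution as

$$ x_{n-1}(t) = \begin{bmatrix} \left(1 - \dfrac{a}{\nu^2}\right)\cos(\nu(t - \theta_{n-1})) + \dfrac{a}{\nu^2} & \dfrac{\sin(\nu(t - \theta_{n-1}))}{\nu} \end{bmatrix} \begin{bmatrix} d_{n-1} \\ v_{n-1} \end{bmatrix}, \tag{48.11} $$

where $d_{n-1}$ and $v_{n-1}$ are defined similarly to the initial conditions in Eq. (48.9). As stated in Definition 2, since the solution $x(t)$ and its derivative $\dot{x}(t)$ are continuous on the interval $[\theta_0, \infty)$, the following conditions are satisfied:

$$ x_n(\theta_n) = x_{n-1}(\theta_n) = d_n \quad \text{and} \quad \dot{x}_n(\theta_n) = \dot{x}_{n-1}(\theta_n) = v_n. \tag{48.12} $$

Using these equalities in Eqs. (48.10) and (48.11), we obtain a recurrence relation between $d_n$, $v_n$ and $d_{n-1}$, $v_{n-1}$ in matrix form:

$$ \begin{bmatrix} d_n \\ v_n \end{bmatrix} = \begin{bmatrix} \left(1 - \dfrac{a}{\nu^2}\right)\cos(\nu s_n) + \dfrac{a}{\nu^2} & \dfrac{\sin(\nu s_n)}{\nu} \\ -\nu\left(1 - \dfrac{a}{\nu^2}\right)\sin(\nu s_n) & \cos(\nu s_n) \end{bmatrix} \begin{bmatrix} d_{n-1} \\ v_{n-1} \end{bmatrix}. \tag{48.13} $$

For simplicity, let us denote

$$ M_k = \begin{bmatrix} \left(1 - \dfrac{a}{\nu^2}\right)\cos(\nu s_k) + \dfrac{a}{\nu^2} & \dfrac{\sin(\nu s_k)}{\nu} \\ -\nu\left(1 - \dfrac{a}{\nu^2}\right)\sin(\nu s_k) & \cos(\nu s_k) \end{bmatrix}. \tag{48.14} $$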
If we use an iterative procedure, we obtain $d_n$, $v_n$ in terms of $d_0$, $v_0$ as follows:

$$ \begin{bmatrix} d_n \\ v_n \end{bmatrix} = \prod_{k=n}^{1} M_k \begin{bmatrix} d_0 \\ v_0 \end{bmatrix}, \tag{48.15} $$

where

$$ \prod_{k=n}^{1} M_k = M_n M_{n-1} \cdots M_3 M_2 M_1. $$

Since $x_n$ represents the solution on an arbitrary interval $\theta_n \le t < \theta_{n+1}$ and both $x(t)$ and $\dot{x}(t)$ are continuous on the interval $[\theta_0, \infty)$, the solution on the interval $t \in [\theta_0, \infty)$ can be expressed by

$$ x(t) = \begin{bmatrix} \left(1 - \dfrac{a}{\nu^2}\right)\cos(\nu(t - \beta(t))) + \dfrac{a}{\nu^2} & \dfrac{\sin(\nu(t - \beta(t)))}{\nu} \end{bmatrix} \prod_{k=\delta(\theta_0,t)}^{1} M_k \begin{bmatrix} d_0 \\ v_0 \end{bmatrix}. \tag{48.16} $$
Hence, the proposition is proved.

Next, we give the solution of the mechanical problem (48.6) using the Laplace transform.

Theorem 1 (Laplace Solution) The solution of the initial value problem (48.6) on the interval $[\theta_0, \infty)$ is given by

$$ x(t) = d_0\cos(\nu t) + \frac{v_0}{\nu}\sin(\nu t) + \frac{a d_0}{\nu^2}\bigl(1 - \cos(\nu t)\bigr) + \frac{a}{\nu^2}\sum_{n=0}^{\infty} \Omega(n)\, u_{\theta_{n+1}}(t)\bigl[1 - \cos(\nu(t - \theta_{n+1}))\bigr], \tag{48.17} $$

where $\Omega(n)$ is defined using $x(t)$ in Eq. (48.16) as $\Omega(n) = x(\theta_{n+1}) - x(\theta_n)$. Here, we take the value of $x(\theta_{n+1})$ to be $x(\theta_{n+1}) = \lim_{t \to \theta_{n+1}^-} x(t)$.

Explicitly,

$$ \Omega(n) = \begin{bmatrix} \left(1 - \dfrac{a}{\nu^2}\right)\bigl(\cos(\nu s_{n+1}) - 1\bigr) & \dfrac{\sin(\nu s_{n+1})}{\nu} \end{bmatrix} \prod_{k=n}^{1} M_k \begin{bmatrix} d_0 \\ v_0 \end{bmatrix}, $$
where $M_k$ is the matrix given by Eq. (48.14).

Proof Let us divide both sides of Eq. (48.2) by $m$. Then, we get

$$ \ddot{x}(t) + \nu^2 x(t) = a x(\beta(t)). \tag{48.18} $$

Next, let us rewrite Eq. (48.18) using the following series representation of $x(\beta(t))$:

$$ x(\beta(t)) = x(0) + \sum_{n=0}^{\infty} \bigl(x(\theta_{n+1}) - x(\theta_n)\bigr) u_{\theta_{n+1}}(t). \tag{48.19} $$

Hence, we have

$$ \ddot{x}(t) + \nu^2 x(t) = a d_0 + a\sum_{n=0}^{\infty} \bigl(x(\theta_{n+1}) - x(\theta_n)\bigr) u_{\theta_{n+1}}(t), \tag{48.20} $$

where $u_{\theta_n}(t)$ is the unit step function defined by

$$ u_{\theta_n}(t) = \begin{cases} 0 & \text{if } t < \theta_n; \\ 1 & \text{if } t \ge \theta_n. \end{cases} $$

Now, we take the Laplace transform of Eq. (48.20) and get

$$ s^2 X(s) - s x(0) - \dot{x}(0) + \nu^2 X(s) = \frac{a d_0}{s} + a\sum_{n=0}^{\infty} \bigl(x(\theta_{n+1}) - x(\theta_n)\bigr)\frac{e^{-\theta_{n+1} s}}{s}. $$

With the equalities $x(0) = d_0$ and $\dot{x}(0) = v_0$, rearranging the last equality and solving for $X(s)$, we obtain

$$ X(s) = \frac{d_0 s + v_0}{s^2 + \nu^2} + \frac{a d_0}{s(s^2 + \nu^2)} + a\sum_{n=0}^{\infty} \bigl(x(\theta_{n+1}) - x(\theta_n)\bigr)\frac{e^{-\theta_{n+1} s}}{s(s^2 + \nu^2)}. \tag{48.21} $$

Applying the inverse Laplace transform to Eq. (48.21), we get the solution $x(t)$ of the mechanical problem (48.6) as follows:

$$ x(t) = d_0\cos(\nu t) + \frac{v_0}{\nu}\sin(\nu t) + \frac{a d_0}{\nu^2}\bigl(1 - \cos(\nu t)\bigr) + \frac{a}{\nu^2}\sum_{n=0}^{\infty} \Omega(n)\, u_{\theta_{n+1}}(t)\bigl[1 - \cos(\nu(t - \theta_{n+1}))\bigr]. \tag{48.22} $$

The theorem is proved.
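As a rough numerical illustration of the Laplace form of the undamped solution, the sketch below sums the step-function series term by term. It assumes $\theta_0 = 0$ as in the text and obtains $\Omega(n) = x(\theta_{n+1}) - x(\theta_n)$ from the matrix recurrence of the proof of Proposition 1. The function name and the finite list `thetas` are hypothetical, and the code is a sketch rather than the authors' implementation.

```python
import math

def laplace_undamped(t, thetas, m, k, A, d0, v0):
    """Evaluate the Laplace-form solution of the undamped problem.

    thetas = [0.0, theta_1, theta_2, ...] is increasing (theta_0 = 0 assumed).
    """
    nu = math.sqrt(k / m)
    r = (A / m) / nu**2                      # a / nu^2
    x = (d0 * math.cos(nu * t) + v0 / nu * math.sin(nu * t)
         + r * d0 * (1 - math.cos(nu * t)))
    d, v = d0, v0
    for n in range(len(thetas) - 1):
        if thetas[n + 1] > t:                # unit step is still zero
            break
        s = thetas[n + 1] - thetas[n]
        cs, sn = math.cos(nu * s), math.sin(nu * s)
        # One step of the recurrence (d_n, v_n) -> (d_{n+1}, v_{n+1}).
        d_next = ((1 - r) * cs + r) * d + sn / nu * v
        v_next = -nu * (1 - r) * sn * d + cs * v
        omega = d_next - d                   # Omega(n)
        x += r * omega * (1 - math.cos(nu * (t - thetas[n + 1])))
        d, v = d_next, v_next
    return x
```

Only finitely many terms of the series are nonzero at any fixed $t$, which is why the loop can stop at the first switching moment larger than $t$.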
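48.2.2 Damped Spring-Mass System
Consider the following mechanical problem for the damped mass-spring system:

$$ m\ddot{x}(t) + c\dot{x}(t) + kx(t) = A x(\beta(t)), \qquad x(\theta_0) = d_0, \qquad \dot{x}(\theta_0) = v_0, \tag{48.23} $$

where $A$ is a positive real parameter.

Proposition 2 The solution of the initial value problem (48.23) on the interval $[\theta_0, \infty)$ is given by

$$ x(t) = e^{-\alpha(t - \theta_0)} \begin{bmatrix} \left(1 - \dfrac{A}{k}\right)\left(\cos(\xi(t - \beta(t))) + \dfrac{\alpha}{\xi}\sin(\xi(t - \beta(t)))\right) + \dfrac{A}{k}e^{\alpha(t - \beta(t))} & \dfrac{\sin(\xi(t - \beta(t)))}{\xi} \end{bmatrix} \prod_{k=\delta(\theta_0,t)}^{1} N_k \begin{bmatrix} d_0 \\ v_0 \end{bmatrix}, \tag{48.24} $$

where $2\alpha = \dfrac{c}{m}$, $\xi = \sqrt{\dfrac{k}{m} - \alpha^2}$, and $N_k$, $k = 1, 2, 3, \ldots$, is the matrix given by

$$ N_k = \begin{bmatrix} \left(1 - \dfrac{A}{k}\right)\left(\cos(\xi s_k) + \dfrac{\alpha}{\xi}\sin(\xi s_k)\right) + \dfrac{A}{k}e^{\alpha s_k} & \dfrac{\sin(\xi s_k)}{\xi} \\ \left(1 - \dfrac{A}{k}\right)\left(-\dfrac{\alpha^2}{\xi} - \xi\right)\sin(\xi s_k) & \cos(\xi s_k) - \dfrac{\alpha}{\xi}\sin(\xi s_k) \end{bmatrix} $$

and

$$ \prod_{k=\delta(\theta_0,t)}^{1} N_k = N_{\delta(\theta_0,t)} N_{\delta(\theta_0,t)-1} \cdots N_3 N_2 N_1. $$

In the fractions $A/k$ and $k/m$, $k$ denotes the spring constant, whereas the subscript $k$ in $N_k$ and $s_k$ is the interval index.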
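The damped formula can be evaluated in the same step-by-step way as the undamped one. The sketch below is an illustration only: it assumes the underdamped case $k/m > \alpha^2$, uses hypothetical function names and a finite list `thetas`, and carries the decay factor $e^{-\alpha s_k}$ along with each matrix $N_k$, so that the accumulated factor is $e^{-\alpha(t - \theta_0)}$.

```python
import math

def damped_row(alpha, xi, r, s):
    """First row of N_k (without the decay factor) for length s; r = A/k."""
    cs, sn = math.cos(xi * s), math.sin(xi * s)
    return ((1 - r) * (cs + alpha / xi * sn) + r * math.exp(alpha * s),
            sn / xi)

def solve_damped(t, thetas, m, c, k, A, d0, v0):
    """Evaluate x(t) for the damped problem (48.23), underdamped case.

    thetas = [theta_0, theta_1, ...] is increasing with thetas[0] <= t.
    """
    alpha = c / (2 * m)
    xi = math.sqrt(k / m - alpha**2)      # requires k/m > alpha^2
    r = A / k
    d, v = d0, v0
    n = 0
    while n + 1 < len(thetas) and thetas[n + 1] <= t:
        s = thetas[n + 1] - thetas[n]
        cs, sn = math.cos(xi * s), math.sin(xi * s)
        decay = math.exp(-alpha * s)
        n11, n12 = damped_row(alpha, xi, r, s)
        n21 = (1 - r) * (-alpha**2 / xi - xi) * sn
        n22 = cs - alpha / xi * sn
        # (d_n, v_n) = e^{-alpha s_n} N_n (d_{n-1}, v_{n-1})
        d, v = decay * (n11 * d + n12 * v), decay * (n21 * d + n22 * v)
        n += 1
    tau = t - thetas[n]                   # t - beta(t)
    r1, r2 = damped_row(alpha, xi, r, tau)
    return math.exp(-alpha * tau) * (r1 * d + r2 * v)
```

With $A = 0$ the routine reduces to the classical underdamped response, which gives a quick sanity check of the matrices.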
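Proof To prove the proposition, we start on an arbitrary interval $[\theta_n, \theta_{n+1})$. After dividing both sides by $m$, the initial value problem (48.23) can be written for $t \in [\theta_n, \theta_{n+1})$ as follows:

$$ \ddot{x}_n(t) + 2\alpha\dot{x}_n(t) + \frac{k}{m}x_n(t) = \frac{A}{m}x_n(\theta_n) \tag{48.25} $$

with the initial conditions

$$ x_n(\theta_n) = d_n \quad \text{and} \quad \dot{x}_n(\theta_n) = v_n, \tag{48.26} $$

where $d_n$ and $v_n$ are defined as in the initial values of the problem (48.23). If we solve (48.25)–(48.26), we obtain the solution in matrix form:

$$ x_n(t) = e^{-\alpha(t - \theta_n)} \begin{bmatrix} \left(1 - \dfrac{A}{k}\right)\left(\cos(\xi(t - \theta_n)) + \dfrac{\alpha}{\xi}\sin(\xi(t - \theta_n))\right) + \dfrac{A}{k}e^{\alpha(t - \theta_n)} & \dfrac{\sin(\xi(t - \theta_n))}{\xi} \end{bmatrix} \begin{bmatrix} d_n \\ v_n \end{bmatrix}. \tag{48.27} $$

In the same manner, on the interval $[\theta_{n-1}, \theta_n)$ we can write the solution in matrix form as

$$ x_{n-1}(t) = e^{-\alpha(t - \theta_{n-1})} \begin{bmatrix} \left(1 - \dfrac{A}{k}\right)\left(\cos(\xi(t - \theta_{n-1})) + \dfrac{\alpha}{\xi}\sin(\xi(t - \theta_{n-1}))\right) + \dfrac{A}{k}e^{\alpha(t - \theta_{n-1})} & \dfrac{\sin(\xi(t - \theta_{n-1}))}{\xi} \end{bmatrix} \begin{bmatrix} d_{n-1} \\ v_{n-1} \end{bmatrix}, \tag{48.28} $$

where $d_{n-1}$ and $v_{n-1}$ are initial values defined similarly to the ones given in (48.26). Since the solution $x(t)$ and its derivative $\dot{x}(t)$ are continuous on the interval $[\theta_0, \infty)$, the following conditions are satisfied:

$$ x_n(\theta_n) = x_{n-1}(\theta_n) = d_n \quad \text{and} \quad \dot{x}_n(\theta_n) = \dot{x}_{n-1}(\theta_n) = v_n. \tag{48.29} $$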
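Continuing in the same manner, we obtain a recurrence relation between $d_n$, $v_n$ and $d_{n-1}$, $v_{n-1}$ given by the following matrix form:

$$ \begin{bmatrix} d_n \\ v_n \end{bmatrix} = e^{-\alpha s_n} \begin{bmatrix} \left(1 - \dfrac{A}{k}\right)\left(\cos(\xi s_n) + \dfrac{\alpha}{\xi}\sin(\xi s_n)\right) + \dfrac{A}{k}e^{\alpha s_n} & \dfrac{\sin(\xi s_n)}{\xi} \\ \left(1 - \dfrac{A}{k}\right)\left(-\dfrac{\alpha^2}{\xi} - \xi\right)\sin(\xi s_n) & \cos(\xi s_n) - \dfrac{\alpha}{\xi}\sin(\xi s_n) \end{bmatrix} \begin{bmatrix} d_{n-1} \\ v_{n-1} \end{bmatrix}. \tag{48.30} $$

For simplicity, let us denote

$$ N_k = \begin{bmatrix} \left(1 - \dfrac{A}{k}\right)\left(\cos(\xi s_k) + \dfrac{\alpha}{\xi}\sin(\xi s_k)\right) + \dfrac{A}{k}e^{\alpha s_k} & \dfrac{\sin(\xi s_k)}{\xi} \\ \left(1 - \dfrac{A}{k}\right)\left(-\dfrac{\alpha^2}{\xi} - \xi\right)\sin(\xi s_k) & \cos(\xi s_k) - \dfrac{\alpha}{\xi}\sin(\xi s_k) \end{bmatrix}, \tag{48.31} $$

$k = 1, 2, 3, \ldots$. If we use an iterative procedure and collect the exponential factors, $e^{-\alpha s_n} e^{-\alpha s_{n-1}} \cdots e^{-\alpha s_1} = e^{-\alpha(\theta_n - \theta_0)}$, we obtain $d_n$, $v_n$ in terms of $d_0$, $v_0$ as follows:

$$ \begin{bmatrix} d_n \\ v_n \end{bmatrix} = e^{-\alpha(\theta_n - \theta_0)} \prod_{k=n}^{1} N_k \begin{bmatrix} d_0 \\ v_0 \end{bmatrix}, \tag{48.32} $$

where

$$ \prod_{k=n}^{1} N_k = N_n N_{n-1} \cdots N_3 N_2 N_1. $$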
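Since $x_n$ represents the solution on an arbitrary interval $\theta_n \le t < \theta_{n+1}$ and both $x(t)$ and $\dot{x}(t)$ are continuous on the interval $[\theta_0, \infty)$, the solution on the interval $t \in [\theta_0, \infty)$ can be stated as

$$ x(t) = e^{-\alpha(t - \theta_0)} \begin{bmatrix} \left(1 - \dfrac{A}{k}\right)\left(\cos(\xi(t - \beta(t))) + \dfrac{\alpha}{\xi}\sin(\xi(t - \beta(t)))\right) + \dfrac{A}{k}e^{\alpha(t - \beta(t))} & \dfrac{\sin(\xi(t - \beta(t)))}{\xi} \end{bmatrix} \prod_{k=\delta(\theta_0,t)}^{1} N_k \begin{bmatrix} d_0 \\ v_0 \end{bmatrix}. \tag{48.33} $$

The proposition is proved.

In what follows, we obtain the solution of the damped mechanical system (48.23) using the Laplace transform.

Theorem 2 (Laplace Solution) The solution of the initial value problem (48.23) on the interval $[\theta_0, \infty)$ is given by

$$ x(t) = \frac{A}{m(\alpha^2 + \xi^2)}\left( d_0\left[1 - e^{-\alpha t}\left(\cos(\xi t) + \frac{\alpha}{\xi}\sin(\xi t)\right)\right] + \sum_{n=0}^{\infty} \varphi(n)\, u_{\theta_{n+1}}(t)\left[1 - e^{-\alpha(t - \theta_{n+1})}\left(\cos(\xi(t - \theta_{n+1})) + \frac{\alpha}{\xi}\sin(\xi(t - \theta_{n+1}))\right)\right] \right) + d_0 e^{-\alpha t}\cos(\xi t) + \frac{v_0 + \alpha d_0}{\xi} e^{-\alpha t}\sin(\xi t), \tag{48.34} $$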
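where $\varphi(n)$ is constructed using $x(t)$ in Eq. (48.33) and is given by $\varphi(n) = x(\theta_{n+1}) - x(\theta_n)$.

Proof Firstly, let us divide both sides of Eq. (48.3) by $m$ and use the series representation of $x(\beta(t))$. Then, applying the Laplace transform to Eq. (48.3), we get

$$ s^2 X(s) - s x(0) - \dot{x}(0) + 2\alpha\bigl(sX(s) - x(0)\bigr) + \frac{k}{m}X(s) = \frac{A}{m}\left(\frac{x(0)}{s} + \sum_{n=0}^{\infty} \bigl(x(\theta_{n+1}) - x(\theta_n)\bigr)\frac{e^{-\theta_{n+1} s}}{s}\right). $$

Using the initial conditions $x(0) = d_0$ and $\dot{x}(0) = v_0$, rearranging the last equality and solving for $X(s)$, we obtain

$$ X(s) = \frac{A}{m}\left(\frac{d_0}{s\bigl((s + \alpha)^2 + \xi^2\bigr)} + \sum_{n=0}^{\infty} \bigl(x(\theta_{n+1}) - x(\theta_n)\bigr)\frac{e^{-\theta_{n+1} s}}{s\bigl((s + \alpha)^2 + \xi^2\bigr)}\right) + \frac{v_0 + 2\alpha d_0 + d_0 s}{(s + \alpha)^2 + \xi^2}. \tag{48.35} $$

Then, writing $v_0 + 2\alpha d_0 + d_0 s = d_0(s + \alpha) + (v_0 + \alpha d_0)$ and applying the inverse Laplace transform to (48.35), we find that

$$ x(t) = \frac{A}{m(\alpha^2 + \xi^2)}\left( d_0\left[1 - e^{-\alpha t}\left(\cos(\xi t) + \frac{\alpha}{\xi}\sin(\xi t)\right)\right] + \sum_{n=0}^{\infty} \varphi(n)\, u_{\theta_{n+1}}(t)\left[1 - e^{-\alpha(t - \theta_{n+1})}\left(\cos(\xi(t - \theta_{n+1})) + \frac{\alpha}{\xi}\sin(\xi(t - \theta_{n+1}))\right)\right] \right) + d_0 e^{-\alpha t}\cos(\xi t) + \frac{v_0 + \alpha d_0}{\xi} e^{-\alpha t}\sin(\xi t), \tag{48.36} $$

where $\varphi(n)$ is defined as $\varphi(n) = x(\theta_{n+1}) - x(\theta_n)$ using the formula for $x(t)$ given by Eq. (48.33). Explicitly, it can be written as

$$ \varphi(n) = \left( e^{-\alpha(\theta_{n+1} - \theta_0)} \begin{bmatrix} \left(1 - \dfrac{A}{k}\right)\left(\cos(\xi s_{n+1}) + \dfrac{\alpha}{\xi}\sin(\xi s_{n+1})\right) + \dfrac{A}{k}e^{\alpha s_{n+1}} & \dfrac{\sin(\xi s_{n+1})}{\xi} \end{bmatrix} - e^{-\alpha(\theta_n - \theta_0)} \begin{bmatrix} 1 & 0 \end{bmatrix} \right) \prod_{k=n}^{1} N_k \begin{bmatrix} d_0 \\ v_0 \end{bmatrix}, $$

where $N_k$ is given in Eq. (48.31). The theorem is proved.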
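The Laplace form of the damped solution can be checked numerically in the same spirit. The sketch below is a hedged illustration, not the authors' code: it assumes $\theta_0 = 0$ and the underdamped case, uses hypothetical names, and relies on the standard inverse transform of $1/\bigl(s((s+\alpha)^2+\xi^2)\bigr)$, namely $\bigl[1 - e^{-\alpha t}(\cos \xi t + \tfrac{\alpha}{\xi}\sin \xi t)\bigr]/(\alpha^2+\xi^2)$. The jumps $\varphi(n)$ are computed from the recurrence of Proposition 2.

```python
import math

def laplace_damped(t, thetas, m, c, k, A, d0, v0):
    """Evaluate the Laplace-form solution of the damped problem (48.23).

    thetas = [0.0, theta_1, ...] is increasing (theta_0 = 0 assumed).
    """
    alpha = c / (2 * m)
    xi = math.sqrt(k / m - alpha**2)          # requires k/m > alpha^2
    r = A / k                                 # equals A / (m (alpha^2 + xi^2))

    def g(tau):
        # (alpha^2 + xi^2) times the inverse transform of 1/(s((s+alpha)^2+xi^2))
        return 1 - math.exp(-alpha * tau) * (
            math.cos(xi * tau) + alpha / xi * math.sin(xi * tau))

    x = math.exp(-alpha * t) * (
        d0 * math.cos(xi * t) + (v0 + alpha * d0) / xi * math.sin(xi * t))
    x += r * d0 * g(t)
    d, v = d0, v0
    for n in range(len(thetas) - 1):
        if thetas[n + 1] > t:                 # unit step is still zero
            break
        s = thetas[n + 1] - thetas[n]
        cs, sn = math.cos(xi * s), math.sin(xi * s)
        decay = math.exp(-alpha * s)
        d_next = decay * (((1 - r) * (cs + alpha / xi * sn)
                           + r * math.exp(alpha * s)) * d + sn / xi * v)
        v_next = decay * ((1 - r) * (-alpha**2 / xi - xi) * sn * d
                          + (cs - alpha / xi * sn) * v)
        phi = d_next - d                      # phi(n) = x(theta_{n+1}) - x(theta_n)
        x += r * phi * g(t - thetas[n + 1])
        d, v = d_next, v_next
    return x
```

At $t = 0$ the sketch returns $d_0$, and a finite-difference check of $m\ddot{x} + c\dot{x} + kx - Ax(\beta(t))$ inside any interval gives a residual close to zero.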
48.3 Conclusion
In the present paper, two mechanical problems are considered with generalized piecewise constant arguments. These problems are the damped and undamped mass-spring systems, which are modeled in a more general form by means of a piecewise constant argument of generalized type. Two methods are used to obtain the solutions of these mechanical systems: first the classical method, and second the Laplace transform method. This is the first time in the literature that the Laplace transform is used to investigate differential equations with piecewise constant argument. The Laplace transform method is very useful for analyzing problems, especially in engineering. Hence, we believe that our findings are noteworthy for applications. Notably, the solutions of the mechanical problems are obtained for arbitrary real initial values.

Acknowledgements M. Akhmet has been supported by 2247-A National Leading Researchers Program of TÜBITAK (The Scientific and Technological Research Council of Turkey), Turkey, N 120A138. M. Tleubergenova has been supported by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (grants No. AP09258737 and No. AP08856170).
References
1. Akhmet MU (2007) Integral manifolds of differential equations with piecewise constant argument of generalized type. Nonlinear Anal 66:367–383
2. Akhmet MU (2011) Nonlinear hybrid continuous discrete-time models. Atlantis Press, Amsterdam-Paris
3. Akhmet MU (2005) On the integral manifolds of the differential equations with piecewise constant argument of generalized type. In: Agarwal RP, Perera K (eds) Proceedings of the conference on differential and difference equations and applications. Hindawi Publishing Corporation, Melbourne, Florida, pp 11–20
4. Akhmet MU (2014) Quasilinear retarded differential equations with functional dependence on piecewise constant argument. Commun Pure Appl Anal 13(2):929–947
5. Akhmet MU (2008) Stability of differential equations with piecewise constant arguments of generalized type. Nonlinear Anal 68:794–803
6. Akhmet MU, Aruğaslan D (2009) Lyapunov-Razumikhin method for differential equations with piecewise constant argument. Discrete Continuous Dyn Syst 25(2):457–466
7. Akhmet MU, Aruğaslan D, Liu X (2008) Permanence of nonautonomous ratio-dependent predator-prey systems with piecewise constant argument of generalized type. Dyn Continuous Discrete Impulsive Syst Ser A Math Anal 15(1):37–51
8. Akhmet MU, Aruğaslan D, Yılmaz E (2010) Stability analysis of recurrent neural networks with piecewise constant argument of generalized type. Neural Netw 23:805–811
9. Akhmet MU, Aruğaslan D, Yılmaz E (2010) Stability in cellular neural networks with a piecewise constant argument. J Comput Appl Math 233:2365–2373
10. Aruğaslan D, Cengiz N (2018) Existence of periodic solutions for a mechanical system with piecewise constant forces. Hacet J Math Stat 47(3):521–538. https://doi.org/10.15672/HJMS.2017.469
11. Bainov DD, Simeonov PS (1995) Impulsive differential equations: asymptotic properties of the solutions. World Scientific, Singapore, New Jersey, London
12. Samoilenko AM, Perestyuk NA (1995) Impulsive differential equations. World Scientific
13. Lakshmikantham V, Bainov DD, Simeonov PS (1989) Theory of impulsive differential equations. World Scientific, Singapore, New Jersey, London, Hong Kong
14. Wiener J (1983) Differential equations with piecewise constant delays. In: Lakshmikantham V (ed) Trends in the theory and practice of nonlinear differential equations. Marcel Dekker, New York, pp 547–552
15. Wiener J (1984) Pointwise initial-value problems for functional differential equations. In: Knowles IW, Lewis RT (eds) Differential equations. North-Holland, New York, pp 571–580
16. Cooke KL, Wiener J (1984) Retarded differential equations with piecewise constant delays. J Math Anal Appl 99:265–297
17. Cooke KL, Wiener J (1991) A survey of differential equations with piecewise continuous arguments. In: Busenberg S, Martelli M (eds) Delay differential equations and dynamical systems (Lecture Notes in Math 1475). Springer, Berlin, pp 1–15
18. Shah SM, Wiener J (1983) Distributional and entire solutions of ordinary differential and functional differential equations. Inter J Math Math Sci 6(2):243–270
19. Cooke KL, Wiener J (1984) Distributional and analytic solutions of functional differential equations. J Math Anal Appl 98:111–129
20. Dai L, Singh MC (1994) On oscillatory motion of spring-mass systems subjected to piecewise constant forces. J Sound Vibration 173(2):217–231
21. Dai L, Singh MC (2008) Nonlinear dynamics of piecewise constant systems and implementation of piecewise constant arguments. World Scientific, Singapore
Chapter 49
Classification of High Resolution Melting Curves Using Recurrence Quantification Analysis and Data Mining Algorithms Fatma Ozge Ozkok and Mete Celik
F. O. Ozkok · M. Celik: Department of Computer Engineering, Erciyes University, Kayseri, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_49

49.1 Introduction
Identification of plant, animal, and microorganism species is essential in several application domains [1–3]. Although morphological differences are often used in species identification, they are sometimes not sufficient. Therefore, molecular methods such as PCR (Polymerase Chain Reaction) analysis have become popular recently [4–6]. The PCR method, which makes thousands or even millions of copies of nucleic acid sequences in a short time, is widely used in molecular biology. PCR products can be analyzed by post-PCR analysis methods such as gel electrophoresis. In the gel electrophoresis method, species comparison is performed according to the bands formed by nucleotides. The species of an unknown sample is found by comparing its band with those of known samples [7–9]. With this method, only a limited number of samples can be compared, and the results can be misleading due to the risk of contamination. For these reasons, high resolution melting (HRM) analysis methods have become popular. HRM analysis is fast, accurate, and cost-effective and has no risk of contamination since it takes place in a closed tube [10–12]. In real-time PCR, as the temperature increases, the DNA strands separate, which leads to the release of fluorescent dye. An HRM curve is formed from the increasing temperature and the decreasing amount of fluorescent dye. The shape of the HRM curve is unique to each species, since it depends on the sequence, length, and GC content of the DNA [13, 14].

Classification methods are mainly used for the identification of HRM curves. Since the shapes of the HRM curves are specific to each species, the distances between the curves provide information about the closeness or similarity of the species. Therefore, species having the same or similar HRM curves can be grouped together to differentiate them from the other species. Classification of HRM curves has been applied in many areas, such as food safety [15, 16] and the identification of disease-causing bacteria or viruses [17, 18], and successful results have been obtained. However, classification of HRM curves is challenging for several reasons. First, as the number of species or the number of samples increases, the analysis becomes difficult. Second, as the sampling resolution increases, the size of the HRM curve time series increases. Third, closely related species may not be distinguished because their HRM curves are very similar.

Recently, recurrence plots and recurrence quantification analysis (RQA) have become essential methods for the analysis of nonlinear time series. These methods are able to capture slight differences between samples and are therefore frequently used in the analysis of similar data. Moreover, data mining methods can analyze big datasets in many areas such as health [19–21], finance [22, 23], and marketing [24, 25]. When RQA and data mining algorithms are combined, time series classification performance increases significantly [26, 27]. For these reasons, we propose an HRM classification method based on RQA and data mining algorithms. The HRM dataset was generated using the Nucleotide database in the NCBI library and the DECIPHER package in the R library. Experimental studies show that the proposed method improves the classification performance on the HRM dataset. The paper is structured as follows. The HRM dataset and methods are explained in Sect. 49.2. Then, experimental results are presented in Sect. 49.3. Finally, conclusions and future works are presented in Sect. 49.4.
49.2 Materials and Methods
In this section, the dataset is introduced first. Then the basics of the recurrence plot and of data mining algorithms such as SVM and Naïve Bayes are explained, and finally the details of the proposed system are presented.
49.2.1 Dataset In this study, a dataset which contains 750 HRM curves of 15 yeast species was used. The dataset is generated using the DECIPHER package in R library [28] and the ITS2 DNA sequence of yeast samples. The yeast species are Candida albicans, Diutina catenulata, Candida dubliniensis, Pichia norvegensis, Candida tropicalis, Candida zeylanoides, Clavispora lusitaniae, Cryptococcus neoformans, Debaryomyces hansenii, Kluyveromyces lactis, Kluyveromyces marxianus, Meyerozyma caribbica, Pichia kudriavzevii, Rhodotorula mucilaginosa, and Yarrowia lipolytica (Fig. 49.1). The details of the dataset are given in Refs. [29, 30].
49.2.2 Methods
In this section, the basics of recurrence plot, SVM, and Naïve Bayes are explained.

49.2.2.1 Recurrence Plot
Recurrence plot is a nonlinear data analysis method that converts data to images. To generate a recurrence plot, first a distance matrix is created according to the distance between the points of the time series. Then, if the value in the distance matrix is less than the threshold value, black is assigned, and if it is greater, white is assigned. Figure 49.2 presents a sample recurrence plot of each species in the dataset [31, 32].
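The thresholding step described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that the chapter does not state: the series is treated as scalar (no time-delay embedding), the distance is the absolute difference, and the function name and the 1/0 encoding (1 for black, 0 for white) are hypothetical.

```python
def recurrence_matrix(series, threshold):
    """Binary recurrence matrix of a scalar time series.

    Entry (i, j) is 1 (black) if the distance between points i and j
    is below the threshold, and 0 (white) otherwise.
    """
    n = len(series)
    return [[1 if abs(series[i] - series[j]) < threshold else 0
             for j in range(n)]
            for i in range(n)]
```

The resulting matrix is symmetric with ones on the main diagonal, since every point is at distance zero from itself.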
Fig. 49.1 Sample HRM curves of each species
Fig. 49.2 Sample recurrence plots of each species
Table 49.1 RQA parameters of a sample of each species

| Species | Vmax | Lmean | ENTR | LAM | Lmax | DIV | TREND |
|---|---|---|---|---|---|---|---|
| Candida albicans | 187 | 89.5618 | 5.03418 | 0.999699 | 231 | 0.004329 | −0.0025 |
| Diutina catenulata | 149 | 74.59704 | 4.991686 | 1 | 346 | 0.00289 | −0.00243 |
| Yarrowia lipolytica | 231 | 100.953 | 5.362751 | 1 | 346 | 0.00289 | −0.00276 |
| Candida zeylanoides | 167 | 82.35163 | 5.100786 | 0.99975 | 175 | 0.005714 | −0.00284 |
| Candida dubliniensis | 189 | 88.32613 | 5.13084 | 0.999707 | 217 | 0.004608 | −0.00256 |
| Kluyveromyces marxianus | 200 | 82.80671 | 5.099242 | 0.999786 | 238 | 0.004202 | −0.00249 |
| Clavispora lusitaniae | 183 | 79.56046 | 5.079662 | 1 | 346 | 0.00289 | −0.00249 |
| Pichia kudriavzevii | 253 | 121.6949 | 5.230169 | 0.99978 | 285 | 0.003509 | −0.00302 |
| Pichia norvegensis | 152 | 136.4595 | 4.678915 | 0.999717 | 303 | 0.0033 | −0.00227 |
| Cryptococcus neoformans | 198 | 82.39479 | 5.109683 | 0.999732 | 248 | 0.004032 | −0.00241 |
| Kluyveromyces lactis | 197 | 82.17808 | 5.110514 | 0.999833 | 237 | 0.004219 | −0.00248 |
| Debaryomyces hansenii | 161 | 84.08253 | 5.072331 | 0.999726 | 172 | 0.005814 | −0.00264 |
| Candida tropicalis | 147 | 85.66242 | 5.122424 | 0.999703 | 185 | 0.005405 | −0.0025 |
| Meyerozyma caribbica | 209 | 88.7582 | 5.219226 | 0.999767 | 213 | 0.004695 | −0.00296 |
| Rhodotorula mucilaginosa | 157 | 74.77143 | 4.96556 | 0.999706 | 223 | 0.004484 | −0.00213 |
The recurrence quantification analysis (RQA) method was proposed for the analysis of recurrence plot patterns. The method uses several variables derived from the recurrence plot, such as the longest vertical line length (Vmax), the average diagonal line length (Lmean), entropy (ENTR), laminarity (LAM), the longest diagonal line length (Lmax), divergence (DIV), and trend (TREND) [33–35]. Table 49.1 gives the RQA parameters of a sample of each species.
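To make the diagonal-line measures concrete, the following Python sketch computes Lmax, Lmean and DIV from a binary recurrence matrix. It is a simplified illustration, not the chapter's implementation: it assumes a symmetric matrix, scans only the diagonals above the main diagonal, counts lines of at least `lmin = 2` points, and takes DIV as the reciprocal of Lmax (which is the relation that the values in Table 49.1 follow). All function names are hypothetical.

```python
def diagonal_line_lengths(R, lmin=2):
    """Lengths of diagonal lines (runs of ones) above the main diagonal."""
    n = len(R)
    lengths = []
    for offset in range(1, n):            # main diagonal is excluded
        run = 0
        for i in range(n - offset):
            if R[i][i + offset]:
                run += 1
            else:
                if run >= lmin:
                    lengths.append(run)
                run = 0
        if run >= lmin:                   # line reaching the matrix border
            lengths.append(run)
    return lengths

def rqa_diagonal_measures(R, lmin=2):
    """Return Lmax, Lmean and DIV = 1 / Lmax for a recurrence matrix R."""
    lengths = diagonal_line_lengths(R, lmin)
    if not lengths:
        return {"Lmax": 0, "Lmean": 0.0, "DIV": float("inf")}
    lmax = max(lengths)
    return {"Lmax": lmax,
            "Lmean": sum(lengths) / len(lengths),
            "DIV": 1.0 / lmax}
```

The vertical-line measures (Vmax, LAM) can be obtained analogously by scanning columns instead of diagonals.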
49.2.2.2 Support Vector Machine
Support vector machine (SVM) is one of the most widely used data mining algorithms for classification and regression. The algorithm was first introduced by Vapnik and has since been used in a variety of pattern recognition problems [36, 37]. It separates a dataset into classes using one or more hyperplanes. The primary purpose of the algorithm is to determine the hyperplane that gives the best boundary between the classes (Fig. 49.3).

Fig. 49.3 SVM algorithm

49.2.2.3 Naïve Bayes
Naïve Bayes is one of the most powerful data mining algorithms and is based on Bayes' theorem. This supervised classification algorithm is applied to many types of data, such as text, signal, or medical data [38–40]. The algorithm calculates the conditional probabilities of the data using Eq. 49.1. The test data is then assigned to the class with the highest probability. In Bayes' theorem, given in Eq. 49.1, $p(A \mid B)$ is the posterior probability, the probability of event A when event B occurs. $p(B \mid A)$ is the likelihood, the probability of event B when event A occurs. $p(A)$ is the prior probability, the probability of event A independent of event B. $p(B)$ is the marginal probability, the probability of event B independent of event A [41–43].

$$ p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)} \tag{49.1} $$
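As a small worked illustration of Eq. 49.1, the sketch below turns class priors and likelihoods into posterior probabilities and picks the most probable class. The class labels and numbers are invented for the example, and the function name is hypothetical. The marginal $p(B)$ is obtained by summing $p(B \mid A)\,p(A)$ over the classes.

```python
def posterior(likelihoods, priors):
    """Posterior p(class | observation) for each class via Bayes' theorem.

    likelihoods[c] = p(observation | c), priors[c] = p(c).
    """
    evidence = sum(likelihoods[c] * priors[c] for c in priors)  # p(B)
    return {c: likelihoods[c] * priors[c] / evidence for c in priors}

# Invented example: two equally likely species, and an observation that
# is four times more likely under species "s1" than under "s2".
post = posterior({"s1": 0.8, "s2": 0.2}, {"s1": 0.5, "s2": 0.5})
predicted = max(post, key=post.get)   # class with the highest posterior
```

Here the posterior of `"s1"` is 0.8 and that of `"s2"` is 0.2, so the observation is assigned to `"s1"`.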
49.2.3 Proposed Method In this study, we proposed a method based on RQA and data mining algorithms such as SVM and Naïve Bayes to classify HRM curves. This method consists of two steps: pre-processing and classification. In the pre-processing step, we extracted features from HRM curves with RQA. In the classification step, we classified these features with data mining algorithms. The flow of the proposed method is given in Fig. 49.4.
Fig. 49.4 The flow of the proposed method
49.3 Experimental Results In this section, the classification performance of RQA, HRM curve, and Tm input data using SVM and Naïve Bayes algorithms are compared. Experiments were performed using 10-fold cross validation. Experimental results were evaluated using accuracy, F1 score, precision, specificity, and recall values (Eqs. 49.2–49.6).
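$$ \text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{True Negative} + \text{False Positive} + \text{False Negative}} \tag{49.2} $$

$$ \text{F1 score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{49.3} $$

$$ \text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{49.4} $$

$$ \text{Specificity} = \frac{\text{True Negative}}{\text{True Negative} + \text{False Positive}} \tag{49.5} $$

$$ \text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{49.6} $$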
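Equations 49.2–49.6 can be applied to a multi-class problem by treating each class in turn as the positive class and averaging the per-class values (a macro average). The Python sketch below illustrates this under assumptions not spelled out in the chapter: the confusion matrix has true classes as rows and predicted classes as columns, every class has at least one true and one predicted sample, and the function names are hypothetical.

```python
def one_vs_rest_counts(cm, c):
    """TP, TN, FP, FN for class c; cm[i][j] = true class i predicted as j."""
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp
    fp = sum(row[c] for row in cm) - tp
    tn = total - tp - fn - fp
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, specificity and F1 (Eqs. 49.2-49.6)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {"accuracy": (tp + tn) / (tp + tn + fp + fn),
            "precision": precision,
            "recall": recall,
            "specificity": tn / (tn + fp),
            "f1": 2 * precision * recall / (precision + recall)}

def macro_average(cm):
    """Unweighted mean of each metric over all classes."""
    per_class = [metrics(*one_vs_rest_counts(cm, c)) for c in range(len(cm))]
    return {name: sum(m[name] for m in per_class) / len(per_class)
            for name in per_class[0]}
```

For the two-class matrix `[[8, 2], [1, 9]]`, class 0 has 8 true positives, 2 false negatives, 1 false positive and 9 true negatives, which gives a precision of 8/9 and a recall of 0.8.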
The macro average and standard deviation of the accuracy, F1 score, specificity, precision, and recall classification metrics are given in Table 49.2. When the RQA parameters of HRM and the SVM algorithm were used, the accuracy, F1 score, specificity, precision, and recall values were 95.2 ± 2.2, 0.95 ± 0.02, 1 ± 0, 0.96 ± 0.02, and 0.95 ± 0.02, respectively. When Tm and the Naïve Bayes algorithm were used, the corresponding values were 58.8 ± 3.74, 0.57 ± 0.03, 0.95 ± 0.01, 0.6 ± 0.04, and 0.59 ± 0.04. The best classification results were obtained when the RQA parameters of HRM and the SVM algorithm were used. The worst results were obtained when Tm and the Naïve Bayes algorithm were used.
Table 49.2 The classification results of the methods (accuracy in %)

| Metric | RQA, SVM | RQA, Naïve Bayes | HRM, SVM | HRM, Naïve Bayes | Tm, SVM | Tm, Naïve Bayes |
|---|---|---|---|---|---|---|
| Accuracy | 95.2 ± 2.2 | 94.53 ± 2.55 | 86.53 ± 3.47 | 92.27 ± 2.42 | 60 ± 3.08 | 58.8 ± 3.74 |
| F1 Score | 0.95 ± 0.02 | 0.94 ± 0.03 | 0.86 ± 0.04 | 0.91 ± 0.03 | 0.58 ± 0.03 | 0.57 ± 0.03 |
| Specificity | 1 ± 0 | 1 ± 0 | 0.99 ± 0 | 0.99 ± 0 | 0.96 ± 0.01 | 0.95 ± 0.01 |
| Precision | 0.96 ± 0.02 | 0.95 ± 0.03 | 0.89 ± 0.02 | 0.93 ± 0.04 | 0.61 ± 0.04 | 0.6 ± 0.04 |
| Recall | 0.95 ± 0.02 | 0.95 ± 0.03 | 0.87 ± 0.03 | 0.92 ± 0.02 | 0.6 ± 0.03 | 0.59 ± 0.04 |
Table 49.3 F1 scores of the methods based on species

| Species | RQA, SVM | RQA, Naïve Bayes | HRM, SVM | HRM, Naïve Bayes | Tm, SVM | Tm, Naïve Bayes |
|---|---|---|---|---|---|---|
| Candida albicans | 0.88 | 0.95 | 0.92 | 1 | 0.48 | 0.5 |
| Diutina catenulata | 0.96 | 1 | 0.92 | 1 | 0.53 | 0.43 |
| Yarrowia lipolytica | 0.95 | 0.95 | 1 | 0.93 | 0.95 | 0.95 |
| Candida zeylanoides | 0.94 | 0.86 | 0.74 | 0.73 | 0.29 | 0.36 |
| Candida dubliniensis | 0.95 | 0.97 | 0.94 | 1 | 0.25 | 0.25 |
| Kluyveromyces marxianus | 0.92 | 0.99 | 0.64 | 0.82 | 0.54 | 0.52 |
| Clavispora lusitaniae | 0.95 | 0.97 | 0.81 | 0.99 | 0.68 | 0.68 |
| Pichia kudriavzevii | 1 | 1 | 1 | 1 | 0.99 | 0.99 |
| Pichia norvegensis | 1 | 1 | 1 | 1 | 0.99 | 0.99 |
| Cryptococcus neoformans | 1 | 1 | 1 | 1 | 0.9 | 0.88 |
| Kluyveromyces lactis | 0.91 | 0.97 | 0.64 | 0.83 | 0.38 | 0.37 |
| Debaryomyces hansenii | 0.9 | 0.59 | 0.53 | 0.42 | 0.17 | 0.16 |
| Candida tropicalis | 0.94 | 0.87 | 0.94 | 1 | 0.73 | 0.76 |
| Meyerozyma caribbica | 0.95 | 1 | 0.84 | 1 | 0.27 | 0.25 |
| Rhodotorula mucilaginosa | 1 | 1 | 0.98 | 1 | 0.55 | 0.52 |
The F1 scores of the methods for each species are given in Table 49.3. As can be seen in the table, when the RQA parameters of HRM are used as input, both the Naïve Bayes and the SVM algorithms are more successful in classifying almost all species. Moreover, while the other methods classified the Debaryomyces hansenii species with a very low F1 score (0.16 to 0.59), the proposed method with the SVM algorithm classified it with an F1 score of 0.9. The lowest F1 scores were generally obtained when Tm and the SVM algorithm were used.
49.4 Conclusions
HRM is a powerful method for the molecular analysis of DNA, and various methods for analyzing HRM data exist in the literature. However, if the dataset contains closely related samples or the number of samples is high, errors may occur in the analysis. In this study, we proposed a method based on RQA and data mining to classify HRM curves. The method consists of two steps: pre-processing and classification. In the pre-processing step, features of the HRM curves are extracted with RQA. In the classification step, these features are classified using data mining algorithms. The method was applied to an HRM dataset that consists of 750 HRM curves. Experiments showed that the proposed method improves the classification results: with the SVM algorithm, accuracy rose from 86.53 on the HRM data and 60 on Tm to 95.2 with the RQA parameters (Table 49.2). In future research, we plan to develop new data mining algorithms and new feature extraction methods to improve the classification performance of HRM.
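The pre-processing step summarized above can be illustrated with a deliberately simplified sketch of one RQA feature, the recurrence rate. The code assumes a one-dimensional signal without time-delay embedding and a fixed distance threshold `eps`; the curve values, the threshold, and the function names are hypothetical, and the study's actual RQA settings are not reproduced here.

```python
def recurrence_matrix(signal, eps):
    """R[i][j] = 1 when samples i and j are at most eps apart, else 0."""
    return [[1 if abs(a - b) <= eps else 0 for b in signal] for a in signal]

def recurrence_rate(matrix):
    """Fraction of recurrent points in the recurrence matrix."""
    n = len(matrix)
    return sum(map(sum, matrix)) / (n * n)

# Hypothetical, heavily shortened melting curve (normalized fluorescence).
curve = [1.0, 0.98, 0.95, 0.60, 0.20, 0.05, 0.04]
R = recurrence_matrix(curve, eps=0.06)
rr = recurrence_rate(R)
```

In a full pipeline, several such RQA measures per curve would form the feature vector that is passed to the classifier.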
References 1. Madesis P, Ganopoulos I, Anagnostis A, Tsaftaris A (2012) The application of bar-HRM (barcode DNA-high resolution melting) analysis for authenticity testing and quantitative detection of bean crops (Leguminosae) without prior DNA purification. Food Control 25(2):576–582. https://doi.org/10.1016/j.foodcont.2011.11.034 2. Bowman S, McNevin D, Venables SJ, Roffey P, Richardson A, Gahan ME (2017) Species identification using high resolution melting (HRM) analysis with random forest classification. Australian J Forens Sci 1–16. https://doi.org/10.1080/00450618.2017.1315835 3. Paiva MHS, Guedes DRD, Leal WS, Ayres CFJ (2017) Sensitivity of RT-qPCR method in samples shown to be positive for zika virus by RT-gPCR in vector competence studies. Genet Molec Biol 40(3):597–599 4. Erlich HA et al (1989) PCR technology, vol 246. Springer 5. Bartlett JM, Stirling D (2003) PCR protocols, vol 226. Springer 6. Kubista M, Andrade JM, Bengtsson M, Forootan A, Jonák J, Lind K, Sindelka R, Sjöback R, Sjögreen B, Strömbom L et al (2006) The real-time polymerase chain reaction. Mol Aspects Med 27(2–3):95–125 7. Walter J, Tannock G, Tilsala-Timisjarvi A, Rodtong S, Loach D, Munro K, Alatossava T (2000) Detection and identification of gastrointestinal lactobacillus species by using denaturing gradient gel electrophoresis and species-specific pcr primers. Appl Environ Microbiol 66(1):297–303 8. Lambertz ST, Danielsson-Tham M-L (2005) Identification and characterization of pathogenic Yersinia enterocolitica isolates by pcr and pulsed-field gel electrophoresis. Appl Environ Microbiol 71(7):3674–3681 9. Watson JD (2012) The polymerase chain reaction. Springer Science & Business Media 10. Vossen RH, Aten E, Roos A, den Dunnen JT (2009) High-resolution melting analysis (HRMA)more than just sequence variant screening. Human Mutation 30:860–866. https://doi.org/10. 1002/humu.21019 11. Wittwer CT (2009) High-resolution DNA melting analysis: advancements and limitations. 
Human Mutat 30(6):857–859. https://doi.org/10.1002/humu.20951 12. Winchell JM, Wolff BJ, Tiller R, Bowen MD, Hoffmaster AR (2010) Rapid identification and discrimination of Brucella isolates by use of real-time PCR and high-resolution melt analysis. J Clin Microbiol 48(3):697–702
References
649
13. Zambounis A, Aliki X, Madesis P, Tsaftaris A, Vannini A, Bruni N, Tomassini A, Chilosi G, Vettraino A (2016) HRM: a tool to assess genetic diversity of phytophthora cambivora isolates. J Plant Pathol 98(3):611–616 14. Pereira L, Gomes S, Castro C, Eiras-Dias JE, Brazão J, Graça A, Fernandes JR, Martins-Lopes P (2017) High resolution melting (hrm) applied to wine authenticity. Food Chem 216(Supplement C):80–86 15. Kesmen Z, Büyükkiraz ME, Özbekar E, Çelik M, Özkök FÖ, Kılıç Ö, Çetin B, Yetim H (2018) Assessment of multi fragment melting analysis system (mfmas) for the identification of food-borne yeasts. Current Microbiol 75(6):716–725. https://doi.org/10.1007/s00284-0181437-9 16. Druml B, Cichna-Markl M (2014) High resolution melting (hrm) analysis of DNA—its role and potential in food analysis, Food Chemistry 158 (Supplement C) 245–254. https://doi.org/ 10.1016/j.foodchem.2014.02.111 17. Ashrafi R, Bruneaux M, Sundberg L-R, Pulkkinen K, Ketola T (2017) Application of high resolution melting assay (HRM) to study temperature-dependent intraspecific competition in a pathogenic bacterium. Sci Rep 7(1):1–8. https://doi.org/10.1038/s41598-017-01074-y 18. Cousins MM, Swan D, Magaret CA, Hoover DR, Eshleman SH (2012) Analysis of HIV using a high resolution melting (HRM) diversity assay: automation of HRM data analysis enhances the utility of the assay for analysis of HIV incidence. PLOS ONE 7(12):1–10. https://doi.org/ 10.1371/journal.pone.0051359 19. Jayasri N, Aruna R (2021) Big data analytics in health care by data mining and classification techniques. ICT Express 20. Abdulqadir HR, Abdulazeez AM, Zebari DA (2021) Data mining classification techniques for diabetes prediction. Qubahan Acad J 1(2):125–133 21. Devi RDH, Vijayalakshmi P (2021) Performance analysis of data mining classification algorithms for early prediction of diabetes mellitus 2. Int J Biomed Eng Technol 36(2):148–171 22. 
Gupta A, Dengre V, Kheruwala HA, Shah M (2020) Comprehensive review of text-mining applications in finance. Finan Innovat 6(1):1–25 23. Majumdar S, Laha AK (2020) Clustering and classification of time series using topological data analysis with applications to finance. Expert Syst Appl 162:113868 24. Kumar TS (2020) Data mining based marketing decision support system using hybrid machine learning algorithm. J Artif Intell 2(03):185–193 25. Amado A, Cortez P, Rita P, Moro S (2018) Research trends on big data in marketing: a text mining and topic modeling based literature analysis. Eur Res Manage Bus Econom 24(1):1–7 26. Silva L, Vaz JR, Castro MA, Serranho P, Cabri J, Pezarat-Correia P (2015) Recurrence quantification analysis and support vector machines for golf handicap and low back pain EMG classification. J Electromyogr Kinesiol 25(4):637–647 27. Acharya UR, Sree SV, Chattopadhyay S, Yu W, Ang PCA (2011) Application of recurrence quantification analysis for the automated identification of epileptic EEG signals. Int J Neural Syst 21(03):199–211 28. Wright ES (2016) Using decipher v2. 0 to analyze big biological sequence data in r. R J 8(1) (2016) 29. Ozkok FO, Celik M (2021) Convolutional neural network analysis of recurrence plots for high resolution melting classification. Comput Meth Prog Biomed 207:106139 30. Ozkok FO, Celik M (2022) A hybrid CNN-LSTM model for high resolution melting curve classification. Biomed Sig Proc Cont 71:103168 31. Zbilut JP, Webber CL Jr (1992) Embeddings and delays as derived from quantification of recurrence plots. Phys Lett A 171(3–4):199–203 32. Webber CL Jr, Zbilut JP (1994) Dynamical assessment of physiological systems and states using recurrence plot strategies. J Appl Phys 76(2):965–973 33. Zbilut JP, Thomasson N, Webber CL (2002) Recurrence quantification analysis as a tool for nonlinear exploration of nonstationary cardiac signals. Med Eng Phys 24(1):53–60 34. 
Zbilut JP, Webber CL Jr (2007) Recurrence quantification analysis: introduction and historical context. Int J Bifurcat Chaos 17(10):3477–3481
650
49 Classification of High Resolution Melting Curves Using Recurrence …
35. Marwan N, Webber Jr CL, Macau EE, Viana RL (2018) Introduction to focus issue: recurrence quantification analysis for understanding complex systems, Chaos: an Interdisc J Nonlinear Sci 28(8):085601 36. Vapnik V (1999) The nature of statistical learning theory. Springer Science and Business Media, 1999 37. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297 38. Wang S, Jiang L, Li C (2015) Adapting naive Bayes tree for text classification. Knowl Inf Syst 44(1):77–89 39. Bhakre SK, Bang A (2016) Emotion recognition on the basis of audio signal using naive bayes classifier. In: International conference on advances in computing, communications and informatics (ICACCI). IEEE 2363–2367 40. Vembandasamy K, Sasipriya R, Deepa E (2015) Heart diseases detection using naive Bayes algorithm. Int J Innovat Sci Eng Technol 2(9):441–444 41. Murphy KP et al (2006) Naive bayes classifiers. Univer British Columbia 18(60):1–8 42. Webb GI, Keogh E, Miikkulainen R (2010) Naïve bayes. Encyclopedia of machine learning 15:713–714 43. Berrar D (2018) Bayes’ theorem and naive Bayes classifier, encyclopedia of bioinformatics and computational biology: ABC of Bioinformat; Elsevier Science Publisher. The Netherlands, Amsterdam, pp 403–412
Chapter 50
Machine Learning Based Cigarette Butt Detection Using YOLO Framework
Hasan Ender Yazici and Taner Danişman
50.1 Introduction
The way data is collected and represented influences the outcomes of machine learning systems. As a result, extracting highly representative features from the available data receives specific attention as a way to improve the efficiency and classification performance of machine learning algorithms [1]. In the 1970s, manually selecting optimal features to create feature vectors and solve a problem became a complicated process. Because rule-based machine learning approaches rely on human design, deciding which information is relevant to the problem is hard. As a result, the features selected as optimal were sometimes not even important for classifying an object or entity. Deep learning is well suited to avoiding such situations and speeding up the process. It replaced the old method by bringing a different perspective. Deep learning is a form of machine learning that allows systems to learn from experience and understand the world in terms of a hierarchy of concepts [2]. It is used to learn from complex datasets that have not been pre-processed, and it is being utilized to develop AI technologies that can imitate the brain functions of humans [3]. As a representation (or unsupervised) learning method, it gives the machine a self-learning capability on raw data. Detection studies that use deep learning mostly depend on automatic detection. Because the learning stage requires less effort, deep learning is becoming more popular for studies that work on raw data.

H. E. Yazici (B): Computer Engineering Department, Graduate School of Natural Applied Sciences, Akdeniz University, Antalya, Turkey
T. Danişman: Computer Engineering Department, Faculty of Engineering, Akdeniz University, Antalya, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_50
Cigarette butts are the most common waste seen in our streets every day. Because of this daily waste, our streets fill with litter that makes a healthy life impossible for people. This waste is hard for people to collect, and it is hard to spot without close attention. Municipalities also set aside a budget for this issue and pay high fees for cleaning. In this study, we detect cigarette butts with machine learning techniques under the YOLO framework. To this end, an image dataset was created manually by taking images in real life. To build and train the model, every image was labeled manually, and all images were standardized to a resolution of 1000 × 800. At the end of the study, we obtained promising results: a mean average precision of 89%, a precision of 95%, and a recall of 93%. On the testing dataset, an F1 score of 98%, a precision of 96%, and an accuracy of 97% were calculated.
50.2 Related Works
Algorithms such as YOLO, trained on purpose-built datasets, are used to detect many kinds of objects, such as small arms, ceramics, glasses, smoke, and fire. In recent years, studies in object detection have been carried out with deep learning. Li et al. presented Agricultural Greenhouses Detection in High-Resolution Satellite Images Based on Convolutional Neural Networks: Comparison of Faster R-CNN, YOLO v3 and SSD, a study carried out in China. To detect agricultural greenhouses in satellite photos, they used Convolutional Neural Network (CNN) based models: Single Shot Multi-Box Detector (SSD), You Only Look Once (YOLOv3), and Faster R-CNN. The performance of these models, mainly accuracy and efficiency, was analyzed and assessed using test sets in a variety of settings. According to the mean average precision and frames per second metrics, the YOLOv3 network produced the best results. They concluded that while SSD and Faster R-CNN both show certain capabilities in terms of detection efficiency and accuracy, both struggle to meet the demands of quick and accurate object detection at the same time. In operational monitoring work, YOLOv3 can generally accomplish accurate and fast detection of agricultural greenhouses in satellite images, which can serve as a basis for planning and management by government [4].
Basaran et al. presented Determination of Tympanic Membrane Region in the Middle Ear Otoscope Images with Convolutional Neural Network Based YOLO Method. They found that LBP, HOG, IFT, and histogram-based algorithms such as k-Means are the most common methods for detecting objects, and that object detection is also practiced with computer vision and deep learning methods such as YOLO, R-CNN, Fast R-CNN, Faster R-CNN, and SSD. A review of these experiments shows that both traditional algorithms and deep learning algorithms that use object-region detection are efficient. In their research, they applied the YOLO object detection algorithm, which has been demonstrated in the literature to be an effective method [5].
Dikbayir and Bülbül, in their work Real-Time Vehicle Detection by Using Deep Learning Methods, found that adding a Faster R-CNN neural network before YOLO improved their results by 4.3% and that their input reached 60 fps. However, they observed that the performance and accuracy of the developed algorithm decreased as the input size increased [6]. Cagıl and Yıldırım presented Detection of an Assembly Part with Deep Learning and Image Processing. In their experiment, they used the YOLOv3 algorithm with the DarkNet neural network and found that it worked correctly at a rate of 84% [7]. Shahriar et al. discussed whether Mask R-CNN or YOLO is the better method. In experiments on their custom image dataset, YOLO achieved very good results in detecting human figures. They also found that Mask R-CNN takes much longer to detect human figures, whereas YOLO can detect any form of human figure. They concluded that YOLO can be used to detect any kind of item in real time and consider it the better model of the two [8].
50.3 Method and Material
50.3.1 Deep Learning
Deep learning, a powerful type of machine learning that allows computers to handle sensory problems such as image and speech recognition, is increasingly being used in research. Deep learning techniques, such as deep artificial neural networks, employ numerous processing layers to find structure and patterns in huge datasets. Each layer learns a concept from its input, which is then built upon by successive layers; the higher the layer, the more abstract the concepts learned. Deep learning does not require a separate preliminary data stage and extracts features automatically. For example, a deep neural network tasked with interpreting shapes would learn to detect simple edges in its initial layer, and then learn in the following layers to recognize more complicated shapes made of those edges. Although there is no strict rule regarding how many layers deep learning requires, most experts agree that more than two are required [9]. Figure 50.1 shows the structure of a deep neural network.
50.3.2 Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is a deep learning technique based on artificial neural networks (ANNs) that is used extensively in computer vision, object recognition, image processing, classification, and object detection [10]. Convolutional networks combine three architectural ideas to ensure some degree of shift, scale, and distortion invariance: local receptive fields, shared weights (or weight replication), and spatial or temporal subsampling. Each unit in a layer receives inputs from a set of units located in a small neighborhood in the previous layer [11].
Fig. 50.1 Deep neural network’s structure
A CNN does not need a separate feature extraction step; the network learns the features between its layers by itself. What made this practical is that computer hardware can now handle much heavier workloads. In particular, graphics processing units (GPUs) have made considerable progress in capacity and throughput. Because of these developments, CNNs with many layers are used in image and video classification.
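The three architectural ideas named in this section (local receptive fields, shared weights, and spatial subsampling) can be sketched in a few lines of plain Python. This is an illustrative toy, not a CNN implementation: the image, the hand-picked edge kernel, and the function names are assumptions, and in a real network the kernel weights would be learned.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation: one shared kernel slides over local patches."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling (spatial subsampling)."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in cols]
            for r in rows]

# Toy 5 x 5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked vertical-edge kernel; in a CNN these weights are learned.
kernel = [[-1, 1],
          [-1, 1]]
feature_map = conv2d(image, kernel)   # 4 x 4 response map
pooled = max_pool(feature_map)        # 2 x 2 map after subsampling
```

Each output value depends only on a 2 × 2 patch of the input (local receptive field), the same four weights are reused at every position (shared weights), and pooling halves the resolution while keeping the strongest response.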
50.3.3 You Only Look Once (YOLO)
You Only Look Once (YOLO) is a CNN-based method for real-time object detection. It divides the image into regions and predicts bounding boxes and probabilities for every region using a single neural network applied to the entire image. The predicted probabilities are used to weight these bounding boxes. A single convolutional neural network predicts multiple bounding boxes and the class probabilities for those boxes at the same time. YOLO improves detection performance by training on entire images. Compared with traditional approaches, this unified model has many advantages. The YOLO network is made up of 24 convolutional layers followed by 2 fully connected layers. Instead of the inception modules used by GoogLeNet, it simply uses 1 × 1 reduction layers followed by 3 × 3 convolutional layers [12]. Figure 50.2 shows the architecture of YOLO.
The network architecture of YOLOv5 is divided into three parts: CSPDarknet is the backbone, PANet is the neck, and the YOLO layer is the head. The data is first fed into CSPDarknet, which extracts features, and then into PANet, which fuses them. Finally, the YOLO layer outputs the detection results (class, score, location, size). In this study, we use YOLOv5, which is the newest version of the YOLO framework [13]. Figure 50.3 shows the architecture of YOLOv5.
Fig. 50.2 Architecture of YOLO
Fig. 50.3 Architecture of Yolov5 [13]
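The idea of dividing the image into regions and weighting boxes by the predicted probabilities can be sketched as follows. The code assumes a 7 × 7 grid, as in the original YOLO formulation, and treats the final score as box confidence multiplied by class probability; the function names and the numbers are hypothetical, and YOLOv5 itself uses a more elaborate multi-scale, anchor-based scheme.

```python
def responsible_cell(x_center, y_center, img_w, img_h, grid=7):
    """Return (row, col) of the grid cell that contains the box centre."""
    col = min(int(x_center / img_w * grid), grid - 1)
    row = min(int(y_center / img_h * grid), grid - 1)
    return row, col

def class_scores(box_confidence, class_probs):
    """Weight the class probabilities by the box confidence."""
    return [box_confidence * p for p in class_probs]

# Hypothetical detection in a 1000 x 800 image.
cell = responsible_cell(x_center=520, y_center=410, img_w=1000, img_h=800)
scores = class_scores(0.8, [0.9, 0.1])
```

The cell that contains the centre of an object is the one responsible for predicting it, which is why a single forward pass over the whole image is enough.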
50.3.4 Dataset
This section explains how the images were collected to create the dataset. The steps include preparing the dataset, labeling, and other pre-processing. To ensure that our learner can handle different kinds of streets and cigarette butts (orange, white), we searched the Internet for a dataset. Although some datasets exist, they consist either of synthetically composed images of cigarettes on the ground or of nearly identical pictures taken from video frames, so we decided not to use them. Every image in the dataset was therefore taken manually for the labeling process.
50.3.4.1 Taking Images
The image dataset was created manually by taking images in real life from different angles, slopes, and distances. The images needed to be taken in different places and at different times, since cigarette waste can be anywhere in public or agricultural areas. After the images were taken, they were all resized to the same size for processing. The image dataset consists of 2100 images at a resolution of 1000 × 800. To resize the images, we used Bulk Image Resizing Made Easy 2.0 (BIRME).
50.3.4.2 Labeling
We labeled every image that we had taken, using makesense.ai. The labeling procedure is shown in Fig. 50.4. At the end of this process, we obtain a txt file that contains the height, width, X, and Y of the cigarette butt in every image. The tool can also output the same values in .csv format.
Fig. 50.4 Labeling procedure with makesense.ai
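To make the label files described above more concrete, the sketch below converts a pixel bounding box into one line of the usual YOLO text format: class index followed by centre x, centre y, width, and height, all normalized by the image size. The function name, the box coordinates, and the use of class index 0 for the cigarette butt class are assumptions for illustration.

```python
def to_yolo_label(class_id, x_min, y_min, box_w, box_h, img_w, img_h):
    """Convert a pixel box (top-left corner, width, height) to a YOLO label line."""
    x_center = (x_min + box_w / 2) / img_w
    y_center = (y_min + box_h / 2) / img_h
    return (f"{class_id} {x_center:.6f} {y_center:.6f} "
            f"{box_w / img_w:.6f} {box_h / img_h:.6f}")

# Hypothetical 100 x 80 pixel box in a 1000 x 800 image.
line = to_yolo_label(0, x_min=450, y_min=360, box_w=100, box_h=80,
                     img_w=1000, img_h=800)
```

Because the values are normalized, the same label stays valid if the image is later rescaled without changing its aspect ratio.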
50.3.4.3 Setting Google Colab Environment
To speed up the training process, we needed a faster GPU and more RAM, so we used the Google Colab environment. This allowed us to work on the 2100 pictures in 18 batches on a GPU with 16 GB of RAM.
50.3.4.4 Dividing Dataset
Following the Pareto Principle, we split our dataset into training and validation sets at a ratio of 80/20 (0.8). The Pareto Principle states that around 80% of the consequences are caused by 20% of the causes. Accordingly, there are 1600 images for training and 400 for validation. In addition, 100 more images are reserved for testing after the weights have been obtained.
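A split with these proportions could be produced along the following lines. This is a sketch under stated assumptions: the file names are placeholders, the shuffle seed is arbitrary, and the study does not say how the images were actually assigned to each subset.

```python
import random

def split_dataset(items, n_test, train_fraction=0.8, seed=0):
    """Hold out n_test items, then split the rest into train and validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded for a reproducible split
    test = shuffled[:n_test]
    rest = shuffled[n_test:]
    n_train = round(len(rest) * train_fraction)
    return rest[:n_train], rest[n_train:], test

# Hypothetical file names standing in for the 2100 images.
files = [f"img_{i:04d}.jpg" for i in range(2100)]
train, val, test_set = split_dataset(files, n_test=100)
```

Shuffling before splitting helps avoid putting images taken at the same place and time into only one subset.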
50.3.4.5 Weights Process
First, we picked the pre-trained YOLOv5x model, the extra-large (XLarge) variant, for our study. Then, using the dataset we created manually, the network learns the features of cigarette butts by itself from the labels we made. In this case, 30–35 epochs are enough to train our model. However, we trained for 60 epochs to get better results.
50.3.4.6 Detecting Cigarette Butts
Using the weights obtained from training, we are able to detect cigarette butts in a given image, video, or live video stream, as shown in Fig. 50.5.
Fig. 50.5 Detecting in video, detecting in image
50.4 Results and Discussion
In this study, we experimented with detecting cigarette butts using the YOLO framework. Despite some limitations of the YOLO framework, especially in detecting small objects, we obtained good results. In the training process, the confidence of the validation batch predictions was between 0.3 and 0.9, and most of them were above 0.7. However, we set the confidence threshold to 0.33 so that predictions with a lower confidence would still be captured. We also set the Intersection over Union (IoU) threshold to 0.72 to get better results with 60 epochs. Training took 3.52 h to obtain the best weights from our dataset. The mean average precision (mAP) is used to evaluate models in the field of object detection. The mAP derives a score by comparing the ground-truth bounding box to the detected box. This metric measures how well the model detected a ground-truth bounding box; the higher the final score, the more accurate the model. In our study, we reached 0.8897 in metric/mAP_0.5, which is the most important metric, as seen in Fig. 50.6.
Fig. 50.6 Metric/mAP_0.5
Better mAP results depend on the loss function results. In the training process, our box_loss value decreased to 0.014897 and the obj_loss value decreased to 0.0072. In the validation process, box_loss was 0.026 and obj_loss was 0.0108. If an object is detected in the grid cell, the loss function penalizes the classification error [14]. The number of images used in the process also matters. The results for metric/precision and metric/recall are shown in Fig. 50.7.
Fig. 50.7 Metric/precision and metric/recall graph
Another important metric is the precision-recall curve, in which precision is expected to decrease as recall increases, because accuracy decreases as the number of positive samples increases. On the precision-recall curve, it is easy to determine the point with the highest precision and recall values. The precision-recall curve, the precision-confidence curve, the recall-confidence curve, and the F1 curve can be seen in Fig. 50.8.
After obtaining the weights and the results shown above, we tested our model on 100 more pictures. We took another 100 images, of which 90 contain cigarettes and 10 do not. Our model detected 90 of the test images as true positives. It also detected 3 images as false positives (a different object detected as a cigarette) and 7 as true negatives. None of the images was detected as a false negative. Table 50.1 shows the confusion matrix of our test images after the detection process, and Table 50.2 shows the results of our test.
Fig. 50.8 Metric graphs, a The precision-recall curve graph, b Precision-confidence graph, c Recall-confidence graph, d F1 curve graph
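The Intersection over Union used when comparing a detected box with a ground-truth box in this section can be computed as in the sketch below. The box coordinates are hypothetical, boxes are assumed to be given as corner coordinates (x1, y1, x2, y2), and the 0.5 cut-off corresponds to the mAP_0.5 metric reported in this chapter.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # zero when boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

ground_truth = (100, 100, 200, 200)   # hypothetical labelled box
detection = (110, 110, 210, 210)      # hypothetical predicted box
score = iou(ground_truth, detection)
is_match = score >= 0.5               # criterion behind mAP_0.5
```

For small objects such as cigarette butts, a shift of only a few pixels changes the IoU noticeably, which is one reason small-object detection is harder to score well.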
Table 50.1 Confusion matrix of test dataset (n = 100)

| Truth \ Prediction | Cigarette | Not cigarette |
|---|---|---|
| Cigarette | TP: 90 | FN: 0 |
| Not cigarette | FP: 3 | TN: 7 |
Table 50.2 Result of test dataset

| Accuracy | Precision | Recall (sensitivity) | Specificity | F1 score |
|---|---|---|---|---|
| 0.97 | 0.967 | 1 | 0.7 | 0.983 |
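The values in Table 50.2 follow directly from the counts in Table 50.1. The short calculation below reproduces them using the standard definitions of the metrics; only the variable names are introduced here.

```python
tp, fn, fp, tn = 90, 0, 3, 7   # counts from Table 50.1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)                  # also called sensitivity
specificity = tn / (tn + fp)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f} f1={f1:.3f}")
```

Precision is 90/93 and the F1 score is 180/183, which match the 0.967 and 0.983 in the table when the values are truncated rather than rounded to three decimals. The specificity of 0.7 rests on only 10 negative images, so it is a coarse estimate.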
50.5 Conclusions and Future Work
Cigarette butts are one of the most dangerous causes of environmental pollution. Public organizations try to prevent the pollution caused by cigarette butts by increasing the number of people in cleaning services, which significantly increases costs. In our study, a model was designed for cigarette butt detection. It has been shown that machine learning based cigarette butt detection using the YOLO framework enables us to detect cigarette waste at a good rate. Increasing the number of images in the dataset makes detection easier. This means that monitoring systems can be built even on live-stream cameras, which can lead to an autonomous system that detects cigarette butts in crowded streets and so makes the city environment healthier. Living in a non-polluted area is crucial for human beings and all living things. The approach can be extended to create autonomous systems that clean our streets and other areas in need. Also, by adding GPS location tags to the video taken by autonomous systems, it will be easier to determine which locations are more polluted. Moreover, labor can be saved by shifting the workforce to other areas, and by reducing the high budget of cleaning services, public organizations can shift their resources to other important needs.
References
1. Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828
2. Goodfellow I, Bengio Y, Courville A (2016) Deep learning, vol 1. MIT Press, Cambridge
3. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444
4. Li M, Zhang Z, Lei L, Wang X, Guo X (2020) Agricultural greenhouses detection in high-resolution satellite images based on convolutional neural networks: comparison of Faster R-CNN, YOLO v3 and SSD. Sensors 20(17):4938
5. Basaran E, Comert Z, Celik Y, Velappan S, Togacar M (2020) Determination of tympanic membrane region in the middle ear otoscope images with convolutional neural network based YOLO method. DEUFMD 22(66):919–928
6. Dikbayir HS, Bülbül Hİ (2020) Derin Öğrenme Yöntemleri Kullanarak Gerçek Zamanlı Araç Tespiti. Türk Bilim Araştırma Vakfı 13(3):1–14
7. Cagil G, Yıldırım B (2020) Detection of an assembly part with deep learning and image processing. Zeki Sistemler Teori ve Uygulamaları Dergisi 3(2):31–37
8. Shahriar SS, Junzo W, Anuvara R, Dra R (2020) J Phys: Conf Ser 1529:042086
9. Rusk N (2016) Deep learning. Nat Methods 13(1):35
10. Albawi S, Mohammed TA, Al-Zawi S (2017) Understanding of a convolutional neural network. In: International conference on engineering and technology (ICET). IEEE, pp 1–6
11. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
12. Abrol S, Mahajan R (2015) Artificial neural network implementation on FPGA chip. Int J Comput Sci Inf Technol Res 3:11–18
13. Xu R, Lin H, Lu K, Cao L, Liu Y (2021) A forest fire detection system based on ensemble learning. Forests 12(2):217
14. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788
Chapter 51
Securing and Processing Biometric Data with Homomorphic Encryption for Cloud Computing
Abdulrahim Mohamed Ibrahim and Alper Ozpinar
51.1 Introduction
With the rapid increase in the daily and widespread use of biometric technologies, analyzing the security of our biometric identity in case of a breach or leak has become a critical problem. The improvement of security through biometric technology has attracted research attention. There are several different but similar definitions of biometrics. Biometrics is the branch of science that deals with the identification and verification of a person based on physiological and behavioral characteristics. These characteristics or identifiers are permanent, unique, and can distinguish one person from another [1]. Biometrics is also described as the measurement and statistical analysis of a person's physical and behavioral characteristics [2]. The term is derived from the Greek words "bio," meaning life, and "metric," meaning to measure [3]. Biometric identifiers are usually divided into two main groups: physiological characteristics, which include a person's fingerprints, DNA, facial pattern, handprints, retina, ear patterns, and even smell; and behavioral characteristics, which include typing rhythm, gait, gestures, and voice [4, 5]. These characteristics are unique to each person and are therefore ideal for identification, access control, and security purposes [6, 7].
51.2 Related Works According to [8], for the implementation of SIFT on the UPOL dataset considering a new iris image after the triage of the competitor image, when comparing between the test image and the image from the dataset, all distances between the test feature A. Mohamed Ibrahim (B) · A. Ozpinar Istanbul Commerce University, Istanbul, Turkey e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_51
663
664
51 Securing and Processing Biometric Data with Homomorphic Encryption …
vector and the feature vector of the image from the dataset were calculated, where the test feature vector is considered to match with another feature vector of the image from the dataset if the distance to the feature is smaller than the distance to the next feature in percentage. Their study was based on the Aly matching scheme, which they improved to achieve equal or better results. According to [9], the main problem of image matching based on SIFT is to extract a set of keypoints to identify the image, and image matching becomes keypoint matching. Therefore, SIFT was introduced for face recognition, where firstly the face feature database is created by SIFT, secondly the input face features are extracted by SIFT, and thirdly the faces are recognized by feature matching. On the study [10] which was based on several methods. One of the methods is the triage of handprints to detect local image features that are invariant to the scale, translation, and rotation of the image. This was achieved by selecting key points at local maxima and minima of the differences of the Gaussian function in scale space. Around each detected local maximum or minimum, a 16 × 16 window is used to construct a histogram of the gradient alignment. SIFT Key points are adjusted if the distance between that point and its nearest neighbors is significantly greater than the distance between that point and the second nearest neighbors. According to [11], they wanted to ensure that the unencrypted privacy-sensitive biometric features were only available to the client capable of performing computations in the encrypted domain, thus preventing eavesdropping and malicious attacks on the database system. Since all database entries are stored in encrypted form, there are no valuable attack vectors in case of database intrusion or insider threats. 
To further strengthen the security of the proposed system, their methodology includes measures against active attacks on the transmitted feature vector and on the decision of the identification transaction. To prevent attackers from manipulating or penetrating the database, the database entries can be randomly merged at the time of each identification transaction. Neither measure affects the biometric performance.
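The distance-ratio matching rule described for [8] and [10] can be sketched in a few lines. This is only an illustration under stated assumptions: the descriptors are short tuples rather than real 128-dimensional SIFT vectors, and the function name `ratio_test_matches` and the ratio value 0.75 are hypothetical choices, not values taken from the cited studies.

```python
import math

def ratio_test_matches(test_desc, ref_desc, ratio=0.75):
    """Return (test_index, ref_index) pairs that pass the nearest/second-nearest ratio test."""
    matches = []
    for i, d in enumerate(test_desc):
        # Distances from this test descriptor to every reference descriptor, nearest first.
        dists = sorted((math.dist(d, r), j) for j, r in enumerate(ref_desc))
        if len(dists) < 2:
            continue  # the ratio test needs a second-nearest neighbor
        (best, j), (second, _) = dists[0], dists[1]
        # Accept only if the nearest neighbor is clearly closer than the runner-up.
        if best < ratio * second:
            matches.append((i, j))
    return matches

# Toy descriptors (assumed values for illustration only).
test_desc = [(0.0, 0.0), (5.0, 5.0)]
ref_desc = [(0.1, 0.0), (4.0, 4.0), (6.0, 6.0)]
print(ratio_test_matches(test_desc, ref_desc))
```

The first test descriptor has one clearly nearest neighbor and is matched; the second is equally far from two reference descriptors, so it is rejected as ambiguous.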
51.3 Method and Material

51.3.1 Biometric Identification

The identification mode establishes a person's identity from biometric measurements. The comparator matches the captured biometric data against the data stored in the database using a 1:N matching algorithm for recognition [12, 13]. In the identification mode, the system recognizes a person by searching the templates of all the users in the database for a match. The system therefore conducts a one-to-many comparison to establish an individual's identity (or fails if the subject is not enrolled in the database) without the subject having to claim an identity (e.g., “Whose biometric data is this?”). Identification is an essential part of negative recognition applications, where the system establishes whether the individual is who she (implicitly or explicitly) denies being. The purpose of negative recognition is to prevent a single individual from using multiple identities. Identification may also be used in positive recognition for convenience (the user is not required to claim an identity). While traditional personal recognition methods such as passwords, PINs, keys, and tokens may work for positive recognition, negative recognition can only be established through biometrics.

The identification system asks you to identify yourself by submitting a biometric measurement. The measurement you enter is then compared to the measurement you provided when you registered with the system to determine whether it matches. Biometric measurements are always fuzzy to some degree and vary over time and with the conditions of capture. If the presented measurement is sufficiently close to the enrolled one, you are assumed to be the person registered under the name you provided. If it is not, you will usually be allowed one repeat attempt. With multiple attempts, the number of incorrectly rejected users may be less than 1%.
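A minimal sketch of the 1:N search described above, assuming each enrolled template is a small feature tuple and that closeness is measured by Euclidean distance. The function name `identify`, the gallery contents and the threshold value are hypothetical and only serve the illustration.

```python
import math

def identify(probe, gallery, threshold):
    """1:N search: return the enrolled identity closest to the probe, or None if no one is close enough."""
    best_id, best_dist = None, float("inf")
    for identity, template in gallery.items():
        dist = math.dist(probe, template)
        if dist < best_dist:
            best_id, best_dist = identity, dist
    # The subject is reported as not enrolled when even the best match is too far away.
    return best_id if best_dist <= threshold else None

# Assumed enrollment data for illustration only.
gallery = {"alice": (0.1, 0.9, 0.3), "bob": (0.8, 0.2, 0.5)}
print(identify((0.12, 0.88, 0.31), gallery, threshold=0.1))
print(identify((0.5, 0.5, 0.5), gallery, threshold=0.1))
```

The threshold is what makes the search "fail if the subject is not enrolled": without it, every probe would be assigned to its nearest template.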
51.3.2 Homomorphic Encryption

Encryption refers to algorithms that convert plaintext into a coded, unreadable form in order to provide security. The recipient uses the “key” to decrypt the encrypted content. Encryption was long used to secure data that is extremely important for military and government operations, and it has since extended to the everyday lives of civilians. This includes online banking, the exchange of data between systems, and the exchange of important private data, all of which require encryption for security [14]. The evolution of ICT systems from on-premise and disconnected resources to cloud-based storage systems such as OneDrive, TheBox, Dropbox and GoogleDrive expands the daily use of these systems. Although users benefit from these services, the potential downsides of using cloud services are the loss of privacy and of the business value of confidential data. A practical way to solve these problems is to encrypt all data stored in the cloud and perform operations on the encrypted data. Homomorphic encryption is described in [14] as a form of encryption that allows specific types of computations, such as addition or multiplication, to be carried out on ciphertext. When the encrypted result is decrypted, it matches the result of the same operations performed on the plaintext. Homomorphic encryption passes through five stages:

• Setup: scheme, security parameter, functionality parameter.
• Key generation: secret key, public key, re-linearization key, Galois key.
• Encryption: number or vector of numbers (ciphertext).
• Evaluation.
• Decryption.
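The five stages can be walked through with a toy example. The sketch below uses the textbook Paillier scheme, which is only additively homomorphic and is not one of the schemes used in this chapter (BFV and CKKS are considerably more involved). The primes are tiny and insecure, and all function names are hypothetical; the point is only to show that a computation on ciphertexts decrypts to the result of the same computation on plaintexts.

```python
import math
import random

def keygen(p=61, q=53):
    """Setup and key generation from two tiny (insecure) primes."""
    n = p * q                                            # public key
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)    # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                 # modular inverse, valid for g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under the public key n."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    n2 = n * n
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def add_encrypted(n, c1, c2):
    """Evaluation: multiplying two ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def decrypt(n, secret, c):
    lam, mu = secret
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

n, secret = keygen()
c_sum = add_encrypted(n, encrypt(n, 17), encrypt(n, 25))
print(decrypt(n, secret, c_sum))
```

The party that calls `add_encrypted` never sees 17, 25 or their sum; only the holder of `secret` can read the result.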
51.3.3 Overview of SEAL and TENSEAL

The Simple Encrypted Arithmetic Library (SEAL) is a homomorphic encryption library from Microsoft that can be used to perform additions and multiplications on encrypted data, as shown in Fig. 51.1 [15]. Only certain privacy-critical parts of programs computed in the cloud should be implemented using Microsoft SEAL. SEAL provides two homomorphic encryption schemes with completely different properties: Brakerski/Fan-Vercauteren (BFV) [16] and Cheon et al. (CKKS) [17]. The BFV scheme allows modular arithmetic to be performed on encrypted data. The CKKS scheme [18] allows addition and multiplication on encrypted real or complex numbers, but only gives approximate results [17]. TENSEAL is a library for performing homomorphic encryption operations on tensors, built on top of Microsoft SEAL. It offers ease of use through a Python API while maintaining performance by implementing most of its operations in C++ [18, 19]. According to [20], TENSEAL is a library that combines classical machine learning frameworks with homomorphic encryption functions. It handles the complexity of executing tensor operations on encrypted data. The core API is designed around three main components: the context, the plain tensors, and the encrypted tensors.
51.4 Proposed Methodology and Algorithm

A detailed flowchart of the proposed methodology and algorithm is provided in Fig. 51.2. The proposed framework focuses on security and the use of HE and assumes that clean and unbiased biometric images are already available to the system.
Fig. 51.1 Microsoft SEAL cloud storage and computation
Fig. 51.2 Proposed algorithm
Many sampling methods and devices for biometric feature extraction have already been explained in the literature and also in the initial parts of this paper [21, 22]. The algorithm can be separated into two main stages: “Feature Extraction from Biometric Data” and working with this data with HE.
51.4.1 Experimental Dataset

The experiments were conducted on the 11K Hands dataset, which is publicly available for research. According to [23], this dataset contains 11,076 hand images, with full hands captured from each side. The subjects were between 18 and 75 years old. To vary the shapes of the photographed hands, each subject was asked to open and close the fingers of the right and left hands at random. Each hand was photographed from both the dorsal and palmar sides. Sample images can be seen in Fig. 51.3.

Fig. 51.3 Representative examples of the proposed dataset

Each hand image comes with a precise description of its metadata:

• 190 subjects,
• 1600 × 1200 pixels,
• Subject identifications,
• Gender details,
• Age information,
• Skin color of the hand.

In addition, each metadata record includes a set of data about the captured hand, such as whether it is the right or left hand (RH or LH), the hand side (dorsal or palmar), and logical indicators of whether the hand image contains accessories, irregularities, or nail polish. All irregularities, accessories, and image distortions are excluded from the sample dataset to focus on the main purpose of the study. These biometric variants are reserved for future work.
51.5 Discussion and Results

Google Colab (from Google Research) with the standard free configuration and Python 3 was used for the computations. The first stage of the study is to show that the chosen SIFT algorithm can be used for hand images. The idea is to check whether the hand images of different people have only limited similarity. For both left and right hands, the average similarity between different people is below 20%, as can be seen from Table 51.1.

Table 51.1 Implementing SIFT on people versus people palms (P vs P)
Samples | Type | Time (s) | Min | Max | Good match | Average % similarity
P versus P | Left | 35.8807 | 3.81 | 20.69 | 439 | 18.29
P versus P | Right | 34.3043 | 10 | 31 | 338 | 14.08
Table 51.2 Implementing SIFT on people versus themselves (verification)
Samples | Type | Time (s) | Min | Max | Good matches | Average % similarity
Verification | Left | 18.8149 | 35 | 141 | 910 | 75.83
Verification | Right | 15.5564 | 29 | 154 | 1041 | 86.75
Table 51.3 Results from total samples match
Hands samples | Type | Total sample | Average similarity (%)
Verification | Left and Right | 50 | 81.29
Different people | Left and Right | 50 | 16.185
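The two averages in Table 51.3 appear to be the means of the left-hand and right-hand averages reported in Tables 51.2 and 51.1. The chapter does not state how they were derived, so the short check below is an observation about the published numbers rather than the authors' stated procedure; the variable names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

# Average % similarity per hand, copied from Tables 51.2 and 51.1.
verification = [75.83, 86.75]        # left, right (same person)
different_people = [18.29, 14.08]    # left, right (different people)

print(round(mean(verification), 3))
print(round(mean(different_people), 3))
```

Both means agree with the values listed in Table 51.3 (81.29 and 16.185).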
As a result, images of different people show low SIFT similarity levels. The next part of the study checks whether the approach works for different poses of the same person. The dataset was searched for different samples of the same people, and a new dataset was created with 2 images of each person for the left and right hands. The processing time is not important, as it is relative and depends on the machine configuration. The results are shown in Table 51.2, and as expected, the similarities are above 75%. In summary, the results show that SIFT works well for matching and identification, with a clear gap between the verification and different-people cases, as can be seen in Table 51.3.

As part of our experiment, another comparison was made to see whether our SIFT implementation performed similarly to reference studies. For this purpose, a comparison was made with the recent approach of [24] for different types of images from different domains. As can be seen from Table 51.4, using SIFT for hand images gives better matching rates than comparing different images.

Table 51.4 Comparison between our proposed work and related work
Approach | Algorithm | Application | Match rate (%)
Proposed work | SIFT | Paper’s dataset | 81.29
Reference study [24] | SIFT | Image matching | 62.89

The next part of the research focuses on processing the features extracted from hand images under homomorphic encryption. Since basic mathematical operations can be performed on vectors with HE, distance-based similarity approaches such as Euclidean and Minkowski are implemented with SEAL. Encrypted feature vectors are thus compared based on distances to see how similar they are, as in neighborhood-based classification approaches. The results of these studies can be seen in Table 51.5.

Table 51.5 Comparison of Euclidean timing with and without homomorphic encryption
Case | Method | Time (s) | Average distance
People versus people differences | Euclidean distance | 0.368 | 0.71
People versus people differences | Euclidean over HE | 23.441 | 1.04
Self verification | Euclidean distance | 0.364 | 0.52
Self verification | Euclidean over HE | 23.434 | 0.86

When TENSEAL was used to encrypt the samples extracted by SIFT, the new size of the extracted sample was 1039 × 1384, which was smaller than the original. Running it through HE allowed us to encrypt the outputs to secure them and to compute a distance measure on them, evaluating the result together with the processing time, as can be seen in Table 51.5. We implemented a distance measure, Euclidean, over our data sample, which provided improved security. Measuring the Euclidean distance over homomorphically encrypted data takes more time (about 23.4 s) than computing the Euclidean distance directly on the unencrypted original 1600 × 1200 data (about 0.37 s). As expected, computation time has increased by working with HE, but this is a scaling problem that can be improved in the cloud without sacrificing security. In addition, the average distances of the data that are above a certain threshold can be used for identification.
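A distance measure fits the HE setting because the squared Euclidean distance needs only subtraction, multiplication and addition. The sketch below works on plain Python lists to show that structure; it does not use SEAL or TENSEAL, and the assumption that the square root is taken after decryption is ours, not a detail given in the chapter. The function names are hypothetical.

```python
def squared_euclidean(a, b):
    """Sum of squared differences: only subtraction, multiplication and addition are used."""
    total = 0.0
    for x, y in zip(a, b):
        diff = x - y
        total += diff * diff
    return total

def minkowski(a, b, p=2):
    """Minkowski distance of order p; p = 2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

a, b = [0.2, 0.4, 0.4], [0.5, 0.0, 0.4]
d2 = squared_euclidean(a, b)      # would be evaluated on ciphertexts in the HE setting
print(d2 ** 0.5)                  # square root assumed to be taken after decryption
print(minkowski(a, b, p=2))

# Overhead of HE in Table 51.5 (time over HE divided by plain time).
print(23.441 / 0.368, 23.434 / 0.364)
```

Both ratios come out at roughly 64, which quantifies the timing gap discussed above.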
51.6 Conclusions and Future Work

As a result of the study, our approach indicates that securing biometric identity requires both strong security through homomorphic encryption and identification management with SIFT. In our approach, the patterns are used without changing the actual data, by applying homomorphic encryption. The computation time of homomorphic encryption remains a gap, and in future studies we will study and improve the timing of homomorphic encryption when using biometric data. The authors also plan to extend this study with other biometric identification data such as vein, palm and iris identification.
References

1. Dargan S, Kumar M (2020) A comprehensive survey on the biometric recognition systems based on physiological and behavioral modalities. Exp Syst Appl 143:113114. https://doi.org/10.1016/j.eswa.2019.113114
2. Jain A, Bolle R, Pankanti S (1996) Introduction to biometrics. In: Biometrics. Springer, pp 1–41
3. Hadid A, Evans N, Marcel S, Fierrez J (2015) Biometrics systems under spoofing attack: an evaluation methodology and lessons learned. IEEE Sig Process Mag 32:20–30
4. Jain A, Hong L, Pankanti S (2000) Biometric identification. Commun ACM 43:90–98
5. Delac K, Grgic M (2004) A survey of biometric recognition methods. In: Proceedings. Elmar-2004. 46th international symposium on electronics in marine. IEEE, pp 184–193
6. Rafie R (2016) Implementation of biometric authentication methods for home based systems
7. Benantar M (2005) Access control systems: security, identity management and trust models. Springer Science & Business Media
8. Păvăloi I, Ignat A (2019) Iris image classification using SIFT features. Procedia Comput Sci 159:241–250. https://doi.org/10.1016/j.procs.2019.09.179
9. Yanbin H, Jianqin Y, Jinping L (2008) Human face feature extraction and recognition base on SIFT. In: Proceedings of international symposium on computer science and computational technology ISCSCT 2008, vol 1, pp 719–722. https://doi.org/10.1109/ISCSCT.2008.249
10. Chen J, Moon YS (2008) Using SIFT features in palmprint authentication. In: Proceedings of international conference on pattern recognition. https://doi.org/10.1109/icpr.2008.4761867
11. Drozdowski P, Buchmann N, Rathgeb C et al (2019) On the application of homomorphic encryption to face identification. In: Proceedings of 2019 international conference of the biometrics special interest group (BIOSIG)
12. Jorgensen Z, Yu T (2011) On mouse dynamics as a behavioral biometric for authentication. In: Proceedings of the 6th ACM symposium on information, computer and communications security, pp 476–482
13. Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y (2009) What is the best multi-stage architecture for object recognition? In: 2009 IEEE 12th international conference on computer vision. IEEE, pp 2146–2153
14. Bozduman HÇ, Afacan E (2020) Simulation of a homomorphic encryption system. Appl Math Nonlinear Sci 5:479–484. https://doi.org/10.2478/amns.2020.1.00046
15. Chen H, Laine K, Player R (2017) Simple encrypted arithmetic library-SEAL v2.1. In: International conference on financial cryptography and data security. Springer, pp 3–18
16. Fan J, Vercauteren F (2012) Somewhat practical fully homomorphic encryption. IACR Cryptol ePrint Arch 2012:144
17. Cheon JH, Kim A, Kim M, Song Y (2017) Homomorphic encryption for arithmetic of approximate numbers. In: International conference on the theory and application of cryptology and information security. Springer, pp 409–437
18. Mert AC, Öztürk E, Savaş E (2019) Design and implementation of encryption/decryption architectures for bfv homomorphic encryption scheme. IEEE Trans Very Large Scale Integr Syst 28:353–362
19. Halevi S, Polyakov Y, Shoup V (2019) An improved RNS variant of the BFV homomorphic encryption scheme. In: Cryptographers’ track at the RSA conference. Springer, pp 83–105
20. Benaissa A, Retiat B, Cebere B, Belfedhal AE (2021) TenSEAL: a library for encrypted tensor operations using homomorphic encryption. 1–12
21. Prabhakar S, Pankanti S, Jain AK (2003) Biometric recognition: security and privacy concerns. IEEE Secur Priv 1:33–42. https://doi.org/10.1109/MSECP.2003.1193209
22. Jain AK, Ross A, Prabhakar S (2004) An introduction to biometric recognition. IEEE Trans Circuits Syst Video Technol 14:4–20
23. Afifi M (2019) 11K hands: gender recognition and biometric identification using a large dataset of hand images. Multimed Tools Appl 78:20835–20854. https://doi.org/10.1007/s11042-019-7424-8
24. Karami E, Prasad S, Shehata M (2017) Image matching using SIFT, SURF, BRIEF and ORB: performance comparison for distorted images
Chapter 52
Automatic Transferring Data from the Signed Attendance Papers to the Digital Spreadsheets

Sefa Çetinkol, Ali Şentürk, and Yusuf Sönmez
S. Çetinkol (B), Electronics Engineering, Faculty of Engineering, Gebze Technical University, Kocaeli, Turkey, e-mail: [email protected]
A. Şentürk, Electrical and Electronics Engineering, Faculty of Technology Isparta, Isparta University of Applied Sciences, Isparta, Turkey, e-mail: [email protected]
Y. Sönmez, Faculty of Information and Telecommunication Technologies, Azerbaijan Technical University, Baku, Azerbaijan, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_52

52.1 Introduction

Students are obligated to attend the lectures in the education programs of universities. Attendance obligation rates are stated in the official regulations. In Turkey, this rate is generally 70% for theoretical courses and 80% for applied courses. In recent years, several different methods have been proposed to detect student attendance, including face detection and identification [1], QR codes [2], fingerprints and Bluetooth devices [3], and RFID [4]. Despite these proposed methods, the traditional method, in which students' signatures are taken on paper, is still commonly used. Students should regularly sign the attendance paper for each lecture. The attendance records are then submitted to online systems or entered into digital spreadsheets to calculate the attendance rates of the students. This is a time-consuming process for lecturers.

There are a few studies about detecting signatures on attendance papers. Weerasinghe and Sudantha used morphological operations and a Support Vector Machine classifier to recognize and extract signatures on the papers. The extracted signatures are verified by using the Kolmogorov Smirnov test [5]. In that study, the attendance paper is scanned, therefore the image is less noisy. Additionally, the text is followed by only one signature, hence signature detection is straightforward. Kurniawan et al. used many image processing methods to detect signature cells on the attendance papers. If the number of white pixels in a signature cell is greater than the threshold value, the cell is classified as containing a signature [6]. In that study, the signature paper is scanned and almost all the signatures fit in the cells. Therefore, simple thresholding is sufficient for detecting signatures in the cells.

In this paper, methods are proposed to extract attendance data from signed papers and to evaluate attendance ratios automatically by using image processing and machine learning algorithms. The attendance papers used in this study are photographed, therefore the images are noisier. The signature cells are smaller and many signatures overflow into the neighboring cells. The non-signed cells are filled with a board-marker or kept blank. Frequently, non-signed cell markings also overflow onto the signed cells.

The process is summarized as follows: The original attendance lists and the signed attendance lists are matched by using geometric transformations and template matching. The texts on the attendance lists are read by Optical Character Recognition (OCR). The signature cells of the students are detected. The proposed signature detection methods are used on these cells to classify whether the cells contain signatures or not. The first proposed method is named “The Edge and Skeleton Method”, and the second proposed method uses CNNs. The detection is visualized on the signed attendance paper and the mispredicted cells can be manually corrected. Then, the attendances of the students are tabulated in the digital spreadsheets.

The rest of the paper is organized as follows: Sect. 52.2 presents the computer vision algorithms used in the proposed methods. Section 52.3 covers the Convolutional Neural Network. Proposed methods are explained in Sect. 52.4. Results are given in Sect. 52.5. Finally, Sect. 52.6 presents the conclusion.
52.2 Computer Vision Methods

Computer vision and machine learning algorithms are used to transfer the attendance lists to digital spreadsheets. The programs are written in the Python programming language. The following libraries are used in the programs:

• Numpy (Numerical Python) is a mathematical library that provides fast calculations on numeric data [7].
• OpenCV (Open Source Computer Vision) library offers more than 2500 algorithms related to image processing and computer vision [8]. Images are stored in the form of Numpy arrays in the OpenCV library.
• Pytesseract library is “a wrapper for Google’s Tesseract-OCR Engine”. Tesseract-OCR has been developed by Google since 2006 for optical character recognition [9].
• Tensorflow is an open-source machine learning library developed by Google [10]. It is mainly focused on artificial neural network based machine learning algorithms.
• Openpyxl library is used for reading from and writing to digital spreadsheet files [11].

Computer vision methods and algorithms used in the Edge and Skeleton method are explained in the following subsections.
52.2.1 Canny Edge Detection

Canny edge detection is one of the most powerful algorithms for detecting edges in images [12]. The steps of the Canny edge detection algorithm are as follows:

• A Gaussian filter is applied to the image in order to remove the noise.
• Sobel kernels in vertical and horizontal directions are applied to find the intensity gradients of the image pixels. Then the magnitude and direction of the gradients are calculated.
• Non-maximum points in a neighborhood with the same directions are suppressed and the local maximum is selected as edge pixel candidate for the next step.
• Upper and lower threshold values are defined. A pixel with an intensity gradient greater than the upper threshold is considered an edge. A pixel with an intensity gradient below the lower threshold is considered a non-edge. A pixel with an intensity gradient between the upper and lower threshold values is considered an edge if it is connected to pixels which are above the upper threshold value. Otherwise, it is not considered an edge.
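The last step of the list above, double thresholding with connectivity, is the least obvious one, so here is a minimal sketch of it alone. It assumes the gradient magnitudes have already been computed and stored as a list of rows, that connectivity means 8-neighborhood, and that weak pixels may be linked to a strong pixel through other weak pixels. The function name `hysteresis` is hypothetical, and this is not the OpenCV implementation.

```python
from collections import deque

def hysteresis(grad, low, high):
    """Return a 0/1 edge map from a 2-D grid of gradient magnitudes."""
    rows, cols = len(grad), len(grad[0])
    edges = [[0] * cols for _ in range(rows)]
    queue = deque()
    # Pixels above the upper threshold are edges outright and seed the search.
    for r in range(rows):
        for c in range(cols):
            if grad[r][c] > high:
                edges[r][c] = 1
                queue.append((r, c))
    # Pixels between the thresholds become edges only if connected to a strong pixel.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not edges[nr][nc] and grad[nr][nc] >= low):
                    edges[nr][nc] = 1
                    queue.append((nr, nc))
    return edges

grad = [[0, 50, 120, 50, 0],
        [0,  0,   0,  0, 0],
        [60, 0,   0,  0, 0]]
for row in hysteresis(grad, low=40, high=100):
    print(row)
```

In the example, the two weak pixels next to the strong one are kept, while the isolated weak pixel in the bottom-left corner is discarded.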
52.2.2 Morphological Transformations

Erosion and dilation are two fundamental morphological transformation operations [13]. In the morphological transformation operations, an n × m kernel is used. In erosion, if any pixel under the kernel area has the minimum value, the pixel at the kernel position is set to the minimum value. In dilation, if any pixel under the kernel area has the maximum value, the pixel at the kernel position is set to the maximum value. Erosion and dilation operations are shown in Fig. 52.1.

\[ I' = A \oplus K = \{ (x, y) \in Z \mid \{ (k_x + x,\ k_y + y) : (k_x, k_y) \in K \} \cap A \neq \emptyset \} \tag{52.1} \]

\[ I' = A \ominus K = \{ (x, y) \in Z \mid \{ (k_x + x,\ k_y + y) : (k_x, k_y) \in K \} \subseteq A \} \tag{52.2} \]

Let A be a region in an image to be dilated or eroded and K be a kernel. The dilation operation is defined in Eq. 52.1 and the erosion operation in Eq. 52.2. In the equations, $k_x$ is the x coordinate and $k_y$ is the y coordinate of a kernel element.
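Equations 52.1 and 52.2 can be read directly as code for the binary case. The sketch below assumes a 0/1 image stored as a list of rows, a 3 × 3 square kernel given as a list of offsets, and background outside the image border; these are illustrative choices, and the names `dilate`, `erode` and `K3` are hypothetical (in practice a library such as OpenCV would be used).

```python
# 3x3 square kernel expressed as (row, column) offsets around the center.
K3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def _get(img, r, c):
    """Pixel value, with everything outside the image treated as background (0)."""
    return img[r][c] if 0 <= r < len(img) and 0 <= c < len(img[0]) else 0

def dilate(img, kernel=K3):
    """A pixel is set if the translated kernel touches the foreground (Eq. 52.1)."""
    return [[int(any(_get(img, r + dr, c + dc) for dr, dc in kernel))
             for c in range(len(img[0]))] for r in range(len(img))]

def erode(img, kernel=K3):
    """A pixel is set if the translated kernel lies entirely in the foreground (Eq. 52.2)."""
    return [[int(all(_get(img, r + dr, c + dc) for dr, dc in kernel))
             for c in range(len(img[0]))] for r in range(len(img))]

img = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]
print(erode(img))
print(dilate(img))
```

Erosion shrinks the 3 × 3 block to its center pixel, and dilation grows it to fill the whole 5 × 5 image.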
Fig. 52.1 Morphological transformation: (a) erosion
52.2.3 Shape Skeleton

The shape skeleton is the best-fit one-pixel-wide representation of an object in an image. It is useful for feature extraction and object definition. Images and their skeletal structures are shown in Fig. 52.2. The shape skeleton algorithm is given in Algorithm 1. Morphological transformations are the core operations of the shape skeleton algorithm.

Algorithm 1 Shape Skeleton
  kernel: K
  Z = {z_ij = 0}, Z ∈ {0, 255}^(m×n)
  for input image I ∈ {0, 255}^(m×n) do
    while ∃ E(x, y) ≠ 255 do
      E = I ⊖ K
      D = E ⊕ K
      S = I − D
      Z = Z + S
      I = E
    end while
    output image: Z
  end for
Fig. 52.2 Shape skeleton
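The loop body of Algorithm 1 (erode, dilate the eroded image, subtract, accumulate) can be sketched for a binary image as follows. This is a sketch under assumptions: foreground pixels are 1 instead of the 0/255 convention of the algorithm, the kernel is a 3 × 3 square, pixels outside the image count as background, and the loop simply runs until the image has been eroded away. All names are hypothetical.

```python
K3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]  # 3x3 square kernel offsets

def _get(img, r, c):
    return img[r][c] if 0 <= r < len(img) and 0 <= c < len(img[0]) else 0

def erode(img, kernel=K3):
    return [[int(all(_get(img, r + dr, c + dc) for dr, dc in kernel))
             for c in range(len(img[0]))] for r in range(len(img))]

def dilate(img, kernel=K3):
    return [[int(any(_get(img, r + dr, c + dc) for dr, dc in kernel))
             for c in range(len(img[0]))] for r in range(len(img))]

def skeleton(img, kernel=K3):
    """Morphological skeleton: accumulate what each opening step removes."""
    img = [row[:] for row in img]                 # work on a copy
    skel = [[0] * len(img[0]) for _ in img]       # Z in Algorithm 1
    while any(any(row) for row in img):
        eroded = erode(img, kernel)               # E = I eroded by K
        opened = dilate(eroded, kernel)           # D = E dilated by K
        for r, row in enumerate(img):
            for c, v in enumerate(row):
                if v and not opened[r][c]:        # S = I - D, added to Z
                    skel[r][c] = 1
        img = eroded                              # I = E
    return skel

# A 3-pixel-thick horizontal bar inside a 5 x 9 image.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
for row in skeleton(bar):
    print(row)
```

For the bar, the result is a single one-pixel-wide line along its middle row.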
52.3 Convolutional Neural Networks

A Convolutional Neural Network (CNN) is an artificial neural network model generally used for object classification, clustering and recognition in images. In a CNN, various filters are applied to the images. The filtered images are used for training the artificial neural network (ANN). The trained model is then used to classify the objects in other images. A CNN is considered a deep learning algorithm since it consists of multiple layers. These layers are explained in the following subsections. After images are predicted by using the CNN layers, the prediction error is propagated backward to update the weights of the neurons and the convolution filters in order to minimize the error. The general structure of a CNN is shown in Fig. 52.3.
52.3.1 Convolution Layer

Many filters with different sizes like 2 × 2, 3 × 3, or 5 × 5 are applied in the convolution layers to extract various features from the image.
52.3.2 Rectified Linear Unit (ReLU) Layer

In the Rectified Linear Unit (ReLU) layer, a non-negative input remains the same while a negative input value is set to 0, as shown in Eq. 52.3. The ReLU layers speed up learning.

\[ \mathrm{ReLU}(x) = \max(0, x) = \begin{cases} 0, & x < 0 \\ x, & x \ge 0 \end{cases} \tag{52.3} \]
52.3.3 Max-Pooling Layer

The max-pooling layer reduces the size of the data by taking the maximum value in a neighborhood. Although this process causes data loss, the remaining data is still sufficient. The max-pooling layer increases the performance by reducing the computations in the network. Furthermore, it helps to prevent over-fitting. Max-pooling is shown in Fig. 52.4.

Fig. 52.3 General structure of a CNN model

Fig. 52.4 Max-pooling
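Max-pooling as described above can be sketched in a few lines. The example assumes non-overlapping square windows (stride equal to the window size) and a feature map stored as a list of rows; the function name `max_pool` and the sample values are hypothetical.

```python
def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, cols - size + 1, size)]
            for r in range(0, rows - size + 1, size)]

fm = [[1, 3, 2, 4],
      [5, 6, 1, 2],
      [7, 2, 9, 1],
      [3, 4, 5, 6]]
print(max_pool(fm))  # [[6, 4], [7, 9]]
```

A 4 × 4 map becomes 2 × 2, so the following layers process a quarter of the values.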
52.3.4 Fully Connected Layer

The data from the previous stage is flattened and flows through the Fully Connected (FC) layers, which form a classical artificial neural network. Neurons in an FC layer have weighted connections to the neurons in the previous FC layers, as shown in Fig. 52.5. Equation 52.4 is used to calculate the values of neurons at FC layers,

\[ x_i^{(n+1)} = f\Big( \sum_j \big( w_{ij}^{(n)} \times x_j^{(n)} \big) + b_i^{(n+1)} \Big) \tag{52.4} \]

where
$x_j^{(n)}$ is the value of the jth neuron at the nth layer,
$w_{ij}^{(n)}$ is the weight between the jth neuron at the nth layer and the ith neuron at the (n + 1)th layer,
$b_i^{(n+1)}$ is the bias value of the ith neuron at the (n + 1)th layer,
$f$ is the activation function.

The values of the weights and the filters are updated by using the local gradient and the chain rule to minimize prediction errors in the back-propagation stage [14].
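Equation 52.4 amounts to one weighted sum per output neuron followed by the activation function. A minimal sketch, assuming plain Python lists, ReLU (Eq. 52.3) as the activation, and made-up weights and biases; the function names are hypothetical.

```python
def relu(x):
    """Eq. 52.3: negative inputs become 0, non-negative inputs pass through."""
    return max(0.0, x)

def fc_forward(x, weights, biases, activation=relu):
    """Eq. 52.4: weights[i][j] links neuron j of layer n to neuron i of layer n + 1."""
    return [activation(sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) + b_i)
            for w_i, b_i in zip(weights, biases)]

x = [1.0, 2.0]                      # values of layer n
W = [[0.5, -1.0], [1.0, 1.0]]       # assumed weights for illustration
b = [0.5, -1.0]                     # assumed biases for illustration
print(fc_forward(x, W, b))  # [0.0, 2.0]
```

The first neuron receives a negative weighted sum (-1.0) and is clipped to 0 by ReLU; the second passes its value of 2.0 through unchanged.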
Fig. 52.5 Fully connected layer
The CNN is trained by updating the network parameters in the back-propagation stage followed by the prediction calculations.
52.4 Proposed Methods

The block diagram of the procedure is shown in Fig. 52.6. In the first stage, the attendance paper is photographed. Then image thresholding and filters are applied to the attendance paper image. The signed attendance list area is extracted by detecting the corners of the largest rectangle. The corners can also be selected manually if automatic detection fails. The digital attendance list is also exported as a pdf file. Template matching and geometrical transformations are applied to match the signed attendance list and the digital attendance list. The steps of the matching process are as follows:

• A smaller template from the digital attendance list is searched in the signed attendance list by using template matching. Rotation by a small angle is applied and template matching is repeated.
• The angle which gives the highest template matching score is selected as the rotation angle.
• The signed image is rotated by the selected rotation angle.
• Shifting is applied to the rotated image in order to ensure that the template taken from the digital attendance list and the template detected in the signed attendance list are at the same coordinate.

In the next stage, the Canny edge detection method is used to find the edges in the attendance list. Edges are used to obtain contours of the rectangles. Edge and area relations of rectangles are used to detect cells.
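The matching steps listed above (try several rotation angles, keep the angle with the best template-matching score, then read off the shift) can be sketched as follows. This is a toy illustration under assumptions: the score is the negative sum of squared differences, the images are tiny lists of rows, and the rotation routine is passed in by the caller. The stand-in `rotate_0_or_180` supports only two angles so that the example stays short; a real pipeline would rotate by small angles with an image library. All names are hypothetical, and the chapter does not state which library calls were used.

```python
def ssd(image, template, top, left):
    """Sum of squared differences between the template and one image window."""
    return sum((image[top + i][left + j] - template[i][j]) ** 2
               for i in range(len(template)) for j in range(len(template[0])))

def best_shift(image, template):
    """Exhaustive translation search; returns (score, top, left) with score = -SSD."""
    th, tw = len(template), len(template[0])
    return max((-ssd(image, template, r, c), r, c)
               for r in range(len(image) - th + 1)
               for c in range(len(image[0]) - tw + 1))

def best_rotation(image, template, angles, rotate):
    """Repeat template matching for each angle and keep the highest score."""
    (score, top, left), angle = max((best_shift(rotate(image, a), template), a)
                                    for a in angles)
    return angle, (top, left), score

def rotate_0_or_180(image, angle):
    """Hypothetical stand-in for a real rotation routine (0 and 180 degrees only)."""
    if angle == 0:
        return image
    if angle == 180:
        return [row[::-1] for row in image[::-1]]
    raise ValueError("stand-in supports only 0 and 180 degrees")

template = [[1, 1],
            [1, 0]]
image = [[0, 0, 0, 0],      # contains the template rotated by 180 degrees
         [0, 0, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
print(best_rotation(image, template, angles=[0, 180], rotate=rotate_0_or_180))
```

The returned position is the shift that aligns the template with the rotated image, which corresponds to the last step of the list.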
Fig. 52.6 Block diagram of the program (stages: detection of the attendance list by finding corners automatically with filtering and thresholding or selecting corners manually; making the digital and signed attendance lists similar with template matching and geometrical transformations; obtaining cells at the attendance list with Canny edge detection, morphological transformation and the relation of edge and area; extracting lecture and student information with OCR; control of signatures with CNN and with edges and skeletons of signatures; tabulation of attendance information into the digital spreadsheets with Openpyxl)
Fig. 52.7 Edge and skeleton method configuration (signature cell; Canny edge detection gives the edges of the shapes in the cell; shape skeleton gives the skeletal structure of the shapes in the cell; numbers of horizontal and vertical scores of the images and thresholding; signature or not signature)
Morphological transformations are applied in order to reduce the noise, and OCR is applied to extract lecture and student information in the cells. The signature cells of each student are detected by using the coordinates of the cells that contain the student ID and name. Signature cell areas may differ according to the length of the student names. Scaling is applied to the cells to make the dimensions of the signature cells the same. The resized signature cells are used in the proposed methods.

The first proposed method to detect signatures in the cells is named “The Edge and Skeleton Method”. The block diagram of the method is shown in Fig. 52.7. In this method, first the edges of the shapes in the cell are detected using the Canny edge detection algorithm. The skeletal structures of the shapes are obtained as well. The non-empty pixels in each row and column are counted in both the edge image and the skeleton image. Rows (or columns) whose number of non-empty pixels is greater than the predefined threshold value contribute to the score of the image in the horizontal (or vertical) direction. Applying the procedure to the edge image and the skeleton image results in four scores for each signature cell. The procedure to obtain the score for the horizontal direction is shown in Algorithm 2.

Generally, signatures are located in the center of the cells. There may be signature overflows from the other cells on the edges. Therefore, the score is accumulated by using smaller coefficients for the edge pixels and larger coefficients for the center pixels. These coefficients are represented by the x vector in Algorithm 2.
Algorithm 2 Obtaining score for horizontal direction
  score = 0
  x = coefficient vector
  for i = 0 to n do
    num_of_pixels = 0
    for j = 0 to m do
      if I(i, j) is not empty then
        increase num_of_pixels by 1
      end if
    end for
    if num_of_pixels > threshold then
      increase score by x_k, where k is the quotient of i / (n/8)
    end if
  end for
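Algorithm 2 can be sketched in Python as below. The coefficient values, the thresholds and the toy cell are assumptions made for the illustration, since the chapter's actual coefficient vector is not reproduced here; the band index generalizes the quotient used in Algorithm 2 to a coefficient vector of any length. The decision helper follows the rule that all scores must exceed their thresholds, and treating a score equal to its threshold as unsigned is our choice. All names are hypothetical.

```python
def horizontal_score(cell, threshold, coeffs):
    """Weighted count of rows whose number of non-empty pixels exceeds the threshold."""
    n = len(cell)
    score = 0
    for i, row in enumerate(cell):
        num_of_pixels = sum(1 for v in row if v)
        if num_of_pixels > threshold:
            k = min(i * len(coeffs) // n, len(coeffs) - 1)  # band of the cell that row i falls in
            score += coeffs[k]
    return score

def vertical_score(cell, threshold, coeffs):
    """Same procedure applied to the columns of the cell."""
    columns = [list(col) for col in zip(*cell)]
    return horizontal_score(columns, threshold, coeffs)

def is_signed(scores, score_thresholds):
    """Signed only if every score is greater than its threshold."""
    return all(s > t for s, t in zip(scores, score_thresholds))

# Toy 6 x 4 binary cell; coefficients weight the middle third twice as much (assumed values).
cell = [[1, 1, 1, 0],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [1, 1, 1, 1]]
coeffs = (1, 2, 1)
print(horizontal_score(cell, threshold=2, coeffs=coeffs))
print(vertical_score(cell, threshold=2, coeffs=coeffs))
```

In the proposed method the two functions would be applied to both the edge image and the skeleton image of a cell, giving the four scores passed to `is_signed`.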
If all four scores are greater than their corresponding threshold values, the cell is classified as signed. If any of the scores is less than its threshold value, the cell is classified as unsigned.

The first signature control method is illustrated in Fig. 52.8. First, the numbers of non-empty pixels in the horizontal and vertical directions are counted. In this example, the threshold values for the horizontal and vertical directions are defined as 6 and 3, respectively. If the number of non-empty pixels is greater than the corresponding threshold value, the score is increased by 2 for the middle one-third of the image and by 1 for the other parts. In the figure, the number of non-empty pixels in the fourth row is 12. The fourth row is in the upper part, therefore the score is set to 1. The numbers for the next two rows are not greater than the threshold value, so the score is not increased. The seventh row is in the middle part and the number of non-empty pixels in the seventh row is 12, therefore the score is increased by 2, and so on. Finally, the horizontal score is obtained as 7. This procedure is also applied for the vertical direction and the vertical score is obtained as 10.

The second signature control method uses CNNs. Various CNN models are used for the detection of signatures in the cells of 9 attendance papers. These models contain 16, 32, 64 and 128 filters in the convolution layers. The models consist of two convolution layers, two max-pooling layers, two ReLU layers, two to four FC layers, and an output layer. These CNNs are trained with the signed and unsigned cell images of 8 attendance papers, and the cell images in the 9th attendance paper are used for testing. Thus, training and testing of the CNN models are performed with ninefold cross-validation.
After the proposed methods are applied for signature detection on the attendance lists, the backgrounds of the cells in which a signature is detected are colored green and the other cells are colored red. The mispredicted cells can then be corrected manually. The extracted attendance and absence data of the students are tabulated into a digital spreadsheet using the OpenPyXL library.
Fig. 52.8 The edge and skeleton method
52.5 Results The signature cells in an attendance paper may be signed in different styles. For example, a signature may fit in a cell or overflow onto other cells. Likewise, absences are indicated in different styles. For example, the signature cells may be left blank or filled horizontally or vertically with a board marker. Occasionally, while the empty cells are being filled with a board marker, the markings also overflow onto other cells. These situations are shown in Fig. 52.9.
Fig. 52.9 Styles of the signature cells: (a) empty cell, (b) horizontal marking, (c) vertical marking, (d) marking overflow, (e) signature fit in a cell, (f) signature overflow
The images used in the steps of the Edge and Skeleton method are shown in Figs. 52.10, 52.11 and 52.12. An unsigned cell is shown in Fig. 52.10a. The edge image and skeletal structure image obtained from Fig. 52.10a are shown in Fig. 52.10b, c. Algorithm 2 is applied to the edge image and the skeleton image. As shown in Fig. 52.10e, the horizontal and vertical scores for the edge image are calculated as 47.75 and 0. Likewise, the horizontal and vertical scores for the skeleton image are calculated as 34 and 17. Since the vertical scores for both images are lower than the corresponding threshold values, the cell is considered not to contain a signature and the background of the cell is colored red, as shown in Fig. 52.10d.
Fig. 52.10 The edge and skeleton method on an unsigned cell image: (a) signature cell, (b) edges of the cell, (c) skeleton of the cell, (d) coloring background, (e) the data and predictions
A signed cell image is shown in Fig. 52.11a. The edge image and skeletal structure image obtained from Fig. 52.11a are shown in Fig. 52.11b, c. Algorithm 2 is applied to the edge image and the skeleton image. The horizontal and vertical scores for the edge image are calculated as 103.5 and 216. Likewise, the horizontal and vertical scores for the skeleton image are calculated as 94.5 and 109, as shown in Fig. 52.11e. Since all scores are greater than the corresponding threshold values, the cell is considered to contain a signature and the background of the cell is colored green, as shown in Fig. 52.11d.
Fig. 52.11 The edge and skeleton method on a signed cell image: (a) signature cell, (b) edges of the cell, (c) skeleton of the cell, (d) coloring background, (e) the data and predictions
An unsigned cell, which is marked horizontally with a board marker and also contains a signature overflow, is shown in Fig. 52.12a. The edge image and skeleton image obtained from Fig. 52.12a are shown in Fig. 52.12b, c. The horizontal and vertical scores for the edge image are 68.75 and 37, and the horizontal and vertical scores for the skeleton image are 27.75 and 23, as shown in Fig. 52.12e. Since all scores are greater than the corresponding threshold values, the cell is considered to contain a signature and the background of the cell is colored green, as shown in Fig. 52.12d. However, because the signature overflow extends almost to the center and the marking is not in the center of the cell, the signature detection is inaccurate.
Fig. 52.12 The edge and skeleton method on an unsigned overflowed cell image: (a) signature cell, (b) edges of the cell, (c) skeleton of the cell, (d) coloring background, (e) the data and predictions
A CNN is used in the second signature control method. Various CNN models are used to predict signatures in the cells. The signature cells used in the 64 filtered CNN, the CNN predictions, and the prediction coloring are shown in Fig. 52.13. Predictions lower than 0.5 are evaluated as not a signature, and the other prediction values are evaluated as a signature. The prediction results are shown in Fig. 52.13b: the first and second predictions are accurate, whereas the third cell is mispredicted. Similar images are shown in Fig. 52.14 for the 128 filtered CNN. The third cell, which is mispredicted in Fig. 52.13, is predicted accurately by the 128 filtered CNN, as shown in Fig. 52.14.
Fig. 52.13 Signature control with 64 filtered CNN: (a) signature cell, (b) the data and predictions, (c) coloring background
Fig. 52.14 Signature control with 128 filtered CNN: (a) signature cell, (b) the data and predictions, (c) coloring background
Photographed attendance papers are shown in Fig. 52.15a. The images after corner detection, rotation, and shifting in the pre-processing step are shown in Fig. 52.15b. The signature cells of Fig. 52.15b are predicted by using the proposed methods. The final coloring, after the mispredicted cells are corrected, is shown in Fig. 52.15c. The Edge and Skeleton method and the CNN methods are applied to 9 attendance papers. The overall accuracy of the Edge and Skeleton method is 97.61%, as shown in Table 52.1. The 16, 32, 64 and 128 filtered CNN models are trained with the cell images from 8 of the 9 attendance papers and tested with the cells of the remaining attendance paper. Therefore, 9 trainings are performed for each CNN model. The best overall accuracy is 98.91%, which is obtained from the 64 filtered CNN model. The details of the prediction results of the 64 filtered CNN are shown in Table 52.2. CNN models with various numbers of FC layers and filters are trained and tested. The best and worst prediction accuracies of these models are shown in Table 52.3.
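The evaluation protocol described above (train on 8 papers, test on the remaining one, repeat for all 9) together with the 0.5 decision rule can be sketched as follows. This is only an outline of the fold logic; the function names are hypothetical, and the actual model training with TensorFlow is deliberately left out.

```python
def leave_one_paper_out(paper_ids):
    """Yield (training papers, held-out test paper) for each fold."""
    papers = list(paper_ids)
    for held_out in papers:
        training = [p for p in papers if p != held_out]
        yield training, held_out


def label_from_probability(probability, cutoff=0.5):
    """Predictions lower than the cutoff are read as 'not signature'."""
    return "signature" if probability >= cutoff else "not signature"


# Nine attendance papers give nine folds with eight training papers each.
folds = list(leave_one_paper_out(range(1, 10)))
```

Each fold would train one CNN from scratch, so every reported accuracy is measured on cells from a paper the model has not seen.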
Fig. 52.15 Attendance papers: (a) photographs, (b) rotated and shifted, (c) colored
Table 52.1 Prediction results of the edge and skeleton method
Paper   Number of cells   Accurate predictions   Inaccurate predictions   Ratio       Accuracy (%)
1       351               348                    3                        348/351     99.15
2       360               357                    3                        357/360     99.16
3       360               350                    10                       350/360     97.22
4       360               357                    3                        357/360     99.16
5       360               332                    28                       332/360     92.22
6       360               345                    15                       345/360     95.83
7       360               350                    10                       350/360     97.22
8       351               349                    2                        349/351     99.43
9       360               357                    3                        357/360     99.16
Total   3222              3145                   77                       3145/3222   97.61
The confusion matrix presents the relation between the actual and predicted classes of the cells. The confusion matrices of the Edge and Skeleton method and the 64 filtered CNN are shown in Table 52.4. The average execution times of the two methods for signature control are 64.24 and 82.07 s, respectively, as shown in Table 52.5. While the difference between the overall accuracies of the methods is around 1%, the Edge and Skeleton method outperforms the best CNN model by 27.76% in terms of execution time.
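The reported figures can be checked with a short calculation. The sketch below uses the counts from Table 52.4 and the execution times quoted in the text; it assumes that the rows of the confusion matrix are the predicted classes and the columns the actual classes (the accuracy does not depend on this choice), and that the 27.76% figure is the extra time of the CNN relative to the Edge and Skeleton method.

```python
def accuracy(tp, fp, fn, tn):
    """Overall accuracy in percent from the four confusion-matrix counts."""
    return 100.0 * (tp + tn) / (tp + fp + fn + tn)


# Counts from Table 52.4 (orientation of fp/fn is an assumption).
edge_skeleton_accuracy = accuracy(tp=2124, fp=64, fn=13, tn=1021)
cnn_64_accuracy = accuracy(tp=2126, fp=24, fn=11, tn=1061)

# Average execution times in seconds, as quoted in the text.
edge_skeleton_time = 64.24
cnn_64_time = 82.07
extra_time_percent = 100.0 * (cnn_64_time - edge_skeleton_time) / edge_skeleton_time
```

Rounded to two decimals, these expressions reproduce the 97.61%, 98.91% and 27.76% values given in the chapter.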
Table 52.2 Test results of the 64 filtered CNN model
Paper   Number of cells   Accurate predictions   Inaccurate predictions   Ratio       Accuracy (%)
1       351               344                    7                        344/351     98.00
2       360               358                    2                        358/360     99.44
3       360               352                    8                        352/360     97.78
4       360               357                    3                        357/360     99.17
5       360               358                    2                        358/360     99.44
6       360               354                    6                        354/360     98.33
7       360               355                    5                        355/360     98.61
8       351               350                    1                        350/351     99.72
9       360               359                    1                        359/360     99.72
Total   3222              3187                   35                       3187/3222   98.91
Table 52.3 Test results of the CNN models
Method                 Case    Number of FC layers   Accurate predictions   Inaccurate predictions   Accuracy (%)
CNN with 16 Filters    Best    3                     3181                   41                       98.73
                       Worst   4                     3122                   100                      96.90
CNN with 32 Filters    Best    4                     3160                   62                       98.08
                       Worst   3                     3131                   91                       97.18
CNN with 64 Filters    Best    4                     3187                   35                       98.91
                       Worst   2                     3140                   82                       97.46
CNN with 128 Filters   Best    4                     3170                   52                       98.38
                       Worst   2                     3089                   133                      95.87
Table 52.4 Confusion matrices of the methods
Method                  Prediction values   Actual: Signature   Actual: Not signature
The edge and skeleton   Signature           2124                64
                        Not signature       13                  1021
64 filtered CNN         Signature           2126                24
                        Not signature       11                  1061
Table 52.5 Execution times of the methods
Methods                 Average execution time (s)
The edge and skeleton   64.2
64 filtered CNN         82.07
52.6 Conclusion In this study, two methods are proposed to extract attendances and absences from attendance papers in order to transfer the attendance data automatically to digital spreadsheets. In the preprocessing step, the photographed attendance paper is first matched to the soft copy of the attendance paper. Then, the cells on the attendance paper are detected, and the signature cells of the students are located by using the coordinates of the cells that contain the student name and ID. The proposed methods are used to predict whether a cell is signed or not. The first method, named the Edge and Skeleton method, uses the edges and the skeletal structures of the shapes in the cells. A CNN is used in the second method. Various CNN models with 16, 32, 64 and 128 filters and 2–4 FC layers are evaluated for signature prediction. After the cells are predicted using the proposed methods, the predictions are visualized on the signed attendance paper and the mispredicted cells are corrected manually. Then, the attendances and absences are tabulated in digital spreadsheets. The overall accuracy of the Edge and Skeleton method is 97.61%. The best prediction score among the CNN models is obtained from the 64 filtered CNN, with 98.91% accuracy. While the difference between the overall accuracies of the methods is around 1%, the Edge and Skeleton method outperforms the best CNN model by 27.76% in terms of execution time. Thus, the proposed Edge and Skeleton method is a good alternative to CNN models.
Chapter 53
Boarding Pattern Classification with Time Series Clustering
Kamer Özgün, Baris Doruk Başaran, Melih Günay, and Joseph Ledet
K. Özgün (B) Department of Industrial Engineering, Antalya Bilim University, Antalya, Turkey
B. D. Başaran · M. Günay Department of Computer Engineering, Akdeniz University, Antalya, Turkey
J. Ledet Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, USA
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_53
53.1 Introduction Transportation is an indispensable part of human activities. The increasing need for mobility, along with ongoing developments in technology, has led to significant improvements in transportation infrastructures. Live public transport system data is collected through different technologies such as the Global Positioning System (GPS), Automatic Fare Collection (AFC), Automatic Passenger Counters (APC), and Automatic Vehicle Location (AVL) [7, 10, 21]. Such systems generate large amounts of data every day, which can be used to design comfortable and efficient public transport systems [16, 20, 22]. Transit planners require various types of performance measures for the different aspects of public transport systems, such as network, neighborhood, route, stop and passengers [17]. Performance indicators and quality attributes in transportation, assessed with big data analytics and visualization techniques, have recently been discussed in [4, 5]. The goal is to keep transportation systems optimized within the boundaries of the performance measures while meeting transit system users' needs in the most efficient way possible. As cities are constantly growing, studies conducted on transportation systems keep getting more complex and move further toward the micro level. In general, similarity and dissimilarity among the elements of the transportation system can be examined by clustering approaches. Clusters share significant similarities that are interesting to public transport authorities and thus speed up analyses and decisions, since fewer modifiers or parameters are involved. In this study, the focus is on revealing similar demand patterns, which can be a very powerful input in making routing and scheduling decisions. By clustering smart card data, travel patterns can be discovered and labelled based on their temporal and spatial characteristics. These clusters provide better knowledge of passengers' demands by time and location, which helps to optimize the public transportation system [1, 3, 6, 8, 11, 14, 15]. For an extensive literature review on the use of smart card data in strategic, tactical and operational aspects in recent years, readers may refer to [7, 9, 10, 16, 21].
Figure 53.1 displays the demand profiles of bus lines on an average day. Here, boarding counts on a particular bus line are aggregated in 30-min time intervals, so that there are 48 time points on the x-axis in a 24-h period starting at 03:00 and ending at 02:59. On the y-axis, the normalized boarding count is the average boarding count of that particular bus line over all trips belonging to a specific time interval, divided by the maximum demand observed on that bus line over all time intervals. In Fig. 53.1, the majority of bus lines present similar patterns with three significant peaks. Namely, a morning peak around 7:30, an uptrend to an afternoon peak around 16:00, and a downtrend starting in the afternoon with an evening peak around 22:00 are observable. However, as shown in Fig. 53.1, passenger counts may vary significantly over time. For example, the variations of boarding counts between 10:00 and 15:00 are obvious.
Figure 53.2 shows that the boarding data is not symmetrical but skewed (since the medians cut the boxes into unequal pieces) and is loosely grouped in half-hour intervals, as indicated by the varying interquartile ranges. Moreover, variations in each time interval are observable in Fig. 53.2. For example, between 7:00 a.m. and 7:30 a.m., one of the bus trips with 10 boardings is an outlier; in this period, the minimum, maximum and median boarding counts are observed as 20, 100 and 65, respectively.
Fig. 53.1 Demand profiles of all bus lines
Fig. 53.2 Average daily passenger demands of all transit lines compared in box plot
The objective of this study is to identify daily boarding activity patterns by revealing clusters of similar bus lines based on time series. In the literature, there are clustering studies based on the trip records of individuals [3, 6, 8, 11, 14] that create trip chains (i.e. sequences) of passengers. Unlike the trip chain approach, [12, 13, 19] perform clustering analysis of individual bus stops. To the best of our knowledge, time series clustering using smart card data has not yet been applied to characterize the daily demand profiles of bus lines for a bus network.
53.2 Methodology The source data set, provided by the Department of Transportation for the City of Antalya, consists of the boarding data for May and October, as well as for the individual days of 16 June and 19 December, in 2019. Knime software [2] and custom-developed Python scripts have been used for most of the analysis and clustering. There are 305 bus lines in the dataset, of which the 121 that are active in the city center have been selected. In order to prevent outliers, bus lines with fewer than 500 passenger counts in a day are eliminated. As a result, 82 of the 121 active bus lines are considered for boarding pattern classification. In this study, boarding counts on a specific bus line are grouped into 30-min time intervals. Average boarding counts are calculated for each of the 48 time intervals in a day. These averages are divided by their maximum in order to obtain a normalized data point (as a percentage of the maximum) for each interval. Consequently, each bus line has a data field in the form of a time series with 48 normalized data points, so that patterns can be distinguished from the effect of volumes while clustering. As an example, the daily boarding activity of the bus line KL08 in Antalya is shown with its data field in Fig. 53.3. For instance, the demand of KL08 at 10:00 is 60% of the maximum daily demand, which occurs at 15:30 in Fig. 53.3. As is apparent in Fig. 53.2, there is random variation inherent in the data collected over time. A technique often used with time series data to smooth short-term fluctuations and underline trends is the moving average. In this study, a simple (unweighted) moving average is applied over subsets of 5 consecutive data points.
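The preprocessing steps described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the scripts used in the study: the function names are hypothetical, the service day is assumed to start at 03:00 as in Fig. 53.1, and the moving average is assumed to be centered with a shortened window at the two ends of the series (the chapter does not state how the edges are handled).

```python
def half_hour_bin(hour, minute, day_start_hour=3):
    """Index (0..47) of the 30-min interval, counting from 03:00."""
    minutes_since_start = ((hour - day_start_hour) % 24) * 60 + minute
    return minutes_since_start // 30


def normalize(average_counts):
    """Divide every interval average by the daily maximum."""
    peak = max(average_counts)
    return [count / peak for count in average_counts]


def moving_average(values, window=5):
    """Simple unweighted, centered moving average (edges use a shorter window)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Tiny example with 5 intervals instead of 48.
profile = normalize([10, 20, 40, 20, 10])
smoothed_profile = moving_average(profile, window=5)
```

Normalizing before clustering means that two lines with the same daily shape but very different ridership end up with the same 48-point series.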
Fig. 53.3 Daily demand profile of the bus line KL08
While clustering the dataset into groups, the objective is to minimize the distance between the points included in a cluster and a point determined as the center of that cluster. Clustering algorithms differ from each other in the way they determine (1) the center and (2) the distance metric for a cluster. In this study, the following three clustering methods are applied to public transportation lines for the first time in the literature:
ED    Standard K-Means Clustering with Euclidean Distance Metric
DTW   Time Series K-Means Clustering with Dynamic Time Warping Metric
BD    K-Medoids Clustering with Band Distance Metric
In K-Means, the center of a cluster is the mean of the points in that set. Standard K-Means often requires the Euclidean distance for efficient solutions. For example, Tupper et al. offer Standard K-Means clustering for bus stops. In their study, the boarding counts by bus stop in each time interval are considered as the data fields. Similarly, in our study, boarding counts by bus line are represented as a series of 30-min time intervals as the data fields. Our first method, ED, denotes Standard K-Means. In our second method, DTW, the dynamic time warping metric is used for K-Means instead. Finally, the third method uses the K-Medoids clustering algorithm. Unlike K-Means, K-Medoids selects actual data points as centers. Furthermore, K-Medoids can be used with arbitrary dissimilarity measures, and it minimizes a sum of pairwise dissimilarities instead of a sum of squared distances. Similar to our third method, BD, Tupper et al. propose a band distance metric for clustering time series data of bus stops, where each time series pair represents a band [18].
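The difference between the distance metrics and between the two kinds of cluster center can be illustrated with a small sketch. The code below is a textbook dynamic-programming formulation of dynamic time warping and a brute-force medoid search, written here only for illustration; it is not the implementation used in the study, and the band distance metric of [18] is not reproduced.

```python
import math


def euclidean(a, b):
    """Point-by-point distance between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def medoid_index(series, distance):
    """Index of the series with the smallest total distance to all others."""
    totals = [sum(distance(s, t) for t in series) for s in series]
    return totals.index(min(totals))


# Two profiles with the same peak, shifted by one interval.
early_peak = [0, 1, 0, 0, 0]
late_peak = [0, 0, 1, 0, 0]
```

For the two shifted profiles, the Euclidean distance is positive while the DTW distance is zero, which is why DTW tends to group lines whose peaks have similar heights even when they occur at slightly different times.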
53.3 Results and Discussion In a classical clustering technique, it is assumed that the number of clusters is known a priori when partitioning a dataset. In this study, 4 clusters are assumed to be sufficient.
Fig. 53.4 Results of K-means clustering with euclidean distance metric, ED
Results of ED are presented in Fig. 53.4. The observations on Fig. 53.4 are summarized as follows:
• ED-0: The normalized passenger load at the morning peak around 7:30 is around 70%. The afternoon peak is at 16:00. The normalized load never drops below 65% between 7:30 and 18:30.
• ED-1: Here, 100% load is realized twice a day. The morning peak shifts forward half an hour and takes place at approximately 7:00. The demand dips to around 60% at 9:30, then starts increasing until the afternoon peak at 16:00. The normalized load never drops below 60% between 5:30 and 18:30.
• ED-2: This cluster represents a similar profile to ED-1, but with a lower morning peak load of around 80%. Activities are slightly higher in ED-2 than in ED-1 after the morning dip at around 9:00. As in ED-1, the normalized load never drops below 60% between 6:00 and 18:30.
• ED-3: The peak and dip times are similar to ED-1 and ED-2. With a decrease of up to 50%, the lowest demand is observed in this cluster from 7:00 to 13:00. It follows a pattern similar to the other clusters after 13:00. The normalized load never drops below 50% between 6:00 and 19:30.
Fig. 53.5 Results of time series K-means clustering with dynamic time warping metric, DTW
Results of DTW are given in Fig. 53.5. A comparison of Figs. 53.4 and 53.5 shows that the differences in the loads of the bus lines clustered by DTW, compared to ED, are significant during off-peak times. For example, in DTW-1, during the morning dip at around 9:30, the lowest load is around 50% but the highest load is around 80%. Similarly, for DTW-0, differences in line loads of around 20% are observed between 8:00 and 14:30. Figure 53.6 presents the results of BD. While DTW clustered bus lines with similar load rates at peak times but with high variations in up and down trends, BD combined bus lines with similar up and down trends but with high variations at peak times. Referring to Figs. 53.4, 53.5 and 53.6, important time intervals can be generalized. For most of the clusters, the dominant peak hours are around 07:30 and 15:30. A prominent dip is realized at around 10:00. Either rate changes or minor peaks are observed for some clusters around 17:00 and 18:00. The time intervals with high variations are 13:30 and 22:00. Table 53.1 presents the average absolute errors and average standard deviations, respectively, for those particular time intervals according to the clustering methods.
Fig. 53.6 Results of K-medoids clustering with band distance metric, BD
Table 53.1 Mean and standard deviation of clustering methods
Time window      ED mean   DTW mean   BD mean   ED std.   DTW std.   BD std.
07:30            31.84     32.72      63.31     4.64      4.60       8.79
10:00            33.82     49.27      36.07     4.71      6.68       4.29
13:30            36.57     38.61      30.62     5.17      5.17       4.00
15:30            11.23     11.14      12.03     1.36      1.31       1.60
17:00            39.28     39.52      29.65     4.74      5.36       3.81
18:00            52.74     52.80      40.71     6.92      7.27       5.70
22:00            40.40     42.64      36.91     5.96      5.79       4.32
Daily averages   34.98     38.10      35.61     4.78      5.17       4.65
Table 53.2 Bus lines included in same clusters by all methods
Core clusters                                        ED   DTW   BD
LC07, KC06, LC07A                                    0    1     2
VL13A, VS18, DC15A, DC15, FL82, 511, AF04, VC59      1    2     0
VF02, VL13, TCP45, TC16A, AF04A                      1    2     2
VF66, TB72, KPZ83                                    2    0     1
TC16, CV14                                           2    0     2
UC11, MF40, MC12, CV47                               2    3     1
KM61, TK36                                           2    3     2
LF09, LF10                                           2    3     3
ML22, KF52                                           3    0     2
KL21, TCD49                                          3    1     2
In Table 53.1, BD has the lowest errors at 13:30, 17:00 and 18:00. In the dominant peak time intervals, i.e. at 7:30 and 15:30, the performances of ED and DTW are close to each other and better than that of BD. In Table 53.1, DTW has the lowest variation at 07:30 and BD has the lowest variations in the other time windows. On the daily scale, BD has the lowest variation and error while DTW has the highest values. As a result, BD is better at determining general up and down trend similarities, while DTW is better at identifying peak hour similarities. ED falls between these two methods but is close to DTW. Table 53.2 shows the core clusters that are obtained by the various methods. When the clustered bus lines are analyzed with respect to the city demographics and geography, it is observed that these clusters tend to group together based on whether the trips are for leisure and/or business and whether they run from the outskirts to the center or from center to center.
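One way to obtain core clusters of this kind is to group the bus lines by their tuple of cluster labels across the three methods. The sketch below illustrates this idea; the function name and the line names are hypothetical, and the chapter does not state how the core clusters were actually extracted.

```python
from collections import defaultdict


def core_clusters(labels_by_line):
    """Group lines that share the same (ED, DTW, BD) label combination.

    `labels_by_line` maps a bus line name to its tuple of cluster labels.
    """
    groups = defaultdict(list)
    for line, labels in labels_by_line.items():
        groups[tuple(labels)].append(line)
    return dict(groups)


# Hypothetical lines and labels, for illustration only.
labels_by_line = {
    "line_a": (2, 3, 3),
    "line_b": (2, 3, 3),
    "line_c": (3, 0, 2),
    "line_d": (3, 0, 2),
}
cores = core_clusters(labels_by_line)
```

Lines that land in the same group were kept together by every method, regardless of how each method numbers its clusters.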
53.4 Conclusion Many of the bus lines tend to have similar daily demand characteristics regardless of their differences in popularity. In transportation planning, the daily activities of bus lines are frequently used. Planning calculations may be shortened and standardized with the use of a powerful tool such as time series clustering. This paper investigates the results of three possible clustering methods on smart card data provided by the Ulaşım Planlama ve Raylı Sistem Dairesi Başkanlığı (Department of Transportation Planning and Rail Systems), Antalya. Travel patterns are described by time in a bus line-level analysis. The methodologies discussed emphasize two aspects: peak identification and general trend identification. A future study or planning effort may therefore specialize the clustering in one of these two aspects and arrive at different transportation planning strategies.
References
1. Alsger A, Tavassoli A, Mesbah M, Ferreira L, Hickman M (2018) Public transport trip purpose inference using smart card fare data. Transp Res Part C Emerg Technol 87:123–137
2. Berthold MR, Cebron N, Dill F, Gabriel TR, Kötter T, Meinl T, Ohl P, Sieb C, Thiel K, Wiswedel B (2007) Knime: the konstanz information miner. In: Studies in classification, data analysis, and knowledge organization
3. Briand AS, Côme E, Mohamed K, Oukhellou L (2016) A mixture model clustering approach for temporal passenger pattern characterization in public transport. Int J Data Sci Anal 1(1):37–50
4. Daraio C, Diana M, Di Costa F, Leporelli C, Matteucci G, Nastasi A (2016) Efficiency and effectiveness in the urban public transport sector: a critical review with directions for future research. Eur J Oper Res 248(1):1–20
5. Dell'Olio L, Ibeas A, Cecin P (2011) The quality of service desired by public transport users. Transp Policy 18(1):217–227
6. Faroqi H, Mesbah M (2021) Inferring trip purpose by clustering sequences of smart card records. Transp Res Part C Emerg Technol 127:103131
7. Harrison G, Grant-Muller SM, Hodgson FC (2020) New and emerging data forms in transportation planning and policy: opportunities and challenges for "track and trace" data. Transp Res Part C Emerg Technol 117:102672
8. He L, Agard B, Trépanier M (2020) A classification of public transit users with smart card data based on time series distance metrics and a hierarchical clustering method. Transp A Transp Sci 16(1):56–75
9. Li T, Sun D, Jing P, Yang K (2018) Smart card data mining of public transport destination: a literature review. Information 9(1):18
10. Lu K, Liu J, Zhou X, Han B (2020) A review of big data applications in urban transit systems. IEEE Trans Intell Transp Syst
11. Ma X, Wu YJ, Wang Y, Chen F, Liu J (2013) Mining smart card data for transit riders' travel patterns. Transp Res Part C Emerg Technol 36:1–12
12. Matias L, Gama J, Mendes-Moreira J, De Sousa JF (2010) Validation of both number and coverage of bus schedules using AVL data. In: 13th International IEEE conference on intelligent transportation systems. IEEE, pp 131–136
13. Min M (2018) Classification of seoul metro stations based on boarding/alighting patterns using machine learning clustering. J Inst Internet Broadcast Commun 18(4):13–18
14. Mohamed K, Côme E, Oukhellou L, Verleysen M (2016) Clustering smart card data for urban mobility analysis. IEEE Trans Intell Transp Syst 18(3):712–728
15. Özgün K, Günay M, Bulut B, Yürüten E, Baysan MF, Kalemsiz M (2021) Analysis of public transportation for efficiency. In: Hemanth J, Yiğit T, Patrut B, Angelopoulou A (eds) Trends in data engineering methods for intelligent systems – ICAIAME 2020, vol 76. Springer International Publishing, Berlin
16. Pelletier MP, Trépanier M, Morency C (2011) Smart card data use in public transit: a literature review. Transp Res Part C Emerg Technol 19(4):557–568
17. Stewart C, Bertini R, El-Geneidy A, Diab E (2016) Perspectives on transit: potential benefits of visualizing transit data. Transp Res Rec J Transp Res Board 2544. https://doi.org/10.3141/2544-11
18. Tupper LL, Matteson DS, Anderson CL, Zephyr L (2018) Band depth clustering for nonstationary time series and wind speed behavior. Technometrics 60(2):245–254
19. Tupper LL, Matteson DS, Handley JC (2016) Mixed data and classification of transit stops. In: 2016 IEEE International conference on big data (big data). IEEE, pp 2225–2232
20. Van Oort N, Cats O (2015) Improving public transport decision making, planning and operations by using big data: cases from Sweden and The Netherlands. In: 2015 IEEE 18th International conference on intelligent transportation systems. IEEE, pp 19–24
21. Welch TF, Widita A (2019) Big data in public transportation: a review of sources and methods. Transp Rev 39(6):795–818. https://doi.org/10.1080/01441647.2019.1616849
22. Zhu L, Yu FR, Wang Y, Ning B, Tang T (2019) Big data analytics in intelligent transportation systems: a survey. IEEE Trans Intell Transp Syst 20(1):383–398
Chapter 54
Shipment Consolidation Practice Using Matlog and Large-Scale Data Sets
Michael G. Kay, Kenan Karagul, Yusuf Sahin, and Erdal Aydemir
M. G. Kay Edward P. Fitts Department of Industrial and Systems Engineering, NC State University, North Carolina, USA
K. Karagul (B) Logistics Department, Pamukkale University, Denizli, Turkey
Y. Sahin Business Administration Department, Burdur Mehmet Akif Ersoy University, Burdur, Turkey
E. Aydemir Industrial Engineering Department, Suleyman Demirel University, Isparta, Turkey
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_54
54.1 Introduction The coordination of activities from the point of origin of the material to the point of final consumption is within the scope of logistics management. With the addition of activities such as returns, packaging, reverse logistics, stock management, purchasing, and distribution, this scope has expanded and taken its current form. Transportation, storage, and packing constitute the logistics activities related to product flow. On the other hand, customs, insurance, stock management, order, and customer service management constitute the value-added activities in logistics [1]. Logistics management consists of functional areas that play an important role in the success of the company. The success of each functional area increases the quality of customer service while reducing the costs of the operation. Logistics is an important key to reaching customers spread across different points of the global market faster and more effectively than competitors operating at different points in the world. Transportation is a key component of logistics and, for most companies, the most important element of logistics cost. It has been observed that the cost of freight transport corresponds to between one-third and two-thirds of the total logistics cost [2]. With increased competition in international trade, numerous companies have begun to look for ways of moving their products at a low cost. As a result, the efficient transport of items has become a critical and difficult issue for many companies. Transportation is concerned with the movement of raw materials, supplies, and finished products between suppliers, manufacturing plants, distribution centers, and retailers. The use of faster modes of transport, such as air and road, reduces delivery times and increases reliability, despite higher transportation costs [3]. A well-designed transportation system can significantly reduce operating costs while improving service quality and logistics efficiency. Shipment consolidation is a good option in this case. Consolidation is the process of combining items manufactured and used in various locations and at various times so that they can be transported in a single vehicle [4]. Logistics managers have to choose between a truckload (TL) carrier (a private truck) and a less-than-truckload (LTL) carrier (an outside carrier). The selection of the right mode to transport shipments can provide significant cost savings. Consolidation policies, proposed by researchers and public agencies working in the field of transportation management, are an alternative for increasing the use of truckloads and reducing the externalities produced by freight transport [5]. Shipment consolidation, an application widely used in transportation, is the collection of many small shipments in one truck while fulfilling the constraints of the logistics system [6]. The shipment consolidation mechanism aims to define TL and LTL shipments by minimizing travel distance. With shipment consolidation, the shipments of two or more customers located in the same area can be carried together.
Consolidating multiple shipments into a single truckload offers a lower-cost alternative to transporting each shipment independently as a single-shipment, point-to-point truckload (P2P TL) or via a less-than-truckload (LTL) carrier. By taking advantage of economies of scale, the transportation cost of customer orders can be reduced significantly. In this paper, the shipment consolidation problem is addressed using data sets generated from the real coordinates and population densities of cities and towns in Turkey, and the economic effect of consolidation on transportation costs is examined. The problem is solved with the Matlog codes of the solution procedure proposed by Kay et al. [28]. Matlog consists of Matlab functions that either directly implement various solution techniques for logistics engineering tasks or script the pre- and post-processing needed to interface with solvers such as CPLEX [7]. The remainder of the paper is organized as follows. Section 54.2 presents the related literature on the consolidation problem. The problem is explained in Sect. 54.3. Section 54.4 presents the details of the solution procedure, and the computational experiments are described in Sect. 54.5. Finally, the paper is concluded in Sect. 54.6.
54.2 Background
To date, several studies have confirmed the effectiveness of shipment consolidation. In the following, we briefly review the shipment consolidation literature and then present the details of our work. Obtaining optimal solutions to the shipment consolidation problem is difficult and time-consuming because the traveling distance for a shipment depends on the other shipments assigned to the same route. Owing to this feature, few studies in the literature use analytical models. These include studies based on renewal theory [8], Markov decision processes [9, 10], marginal analysis [11], stochastic clearing system models [12, 13], a mixed-integer programming model [14], a branch-and-price algorithm [5], and linear optimization models [15–17].
Many heuristics and metaheuristics have been developed for the shipment consolidation problem. The Clarke and Wright savings algorithm (CW) is the most common heuristic used to solve this problem because of its easy implementation and high computational speed [18]. Pooley and Stenger [19] and Pooley [20] employed a modified version of Clarke and Wright's [21] Savings Algorithm to solve LTL, multi-stop, and one-way TL vehicle routing problems. Chu [22] developed a mixed-integer programming model for the mode selection problem between LTL and TL for outgoing shipments and proposed a Savings Algorithm based heuristic for its solution. A variant of the same problem with a homogeneous fleet is investigated in Bolduc et al. [23, 24], Côté and Potvin [25], and Potvin and Naud [26]. Bolduc et al. [23] used sequential and parallel versions of the Savings Algorithm to identify the customers to be assigned to external carriers and to route heterogeneous vehicle fleets. The initial solutions obtained were then improved by the 4-opt method. Bolduc et al. [24] proposed a metaheuristic that uses a perturbation procedure during the construction and improvement phases to exchange customers between the private fleet and the common carrier. Côté and Potvin [25] used a tabu search algorithm to solve the homogeneous capacity vehicle routing problem (VRPPC), in which each customer can be served by an internal fleet vehicle or by an external common carrier. Potvin and Naud [26], unlike the previous study, solved the heterogeneous capacity vehicle routing problem with a tabu search heuristic whose neighborhood structure is based on ejection chains. Nguyen et al. [27] suggested a heuristic for the shipment consolidation problem and compared its results with a variety of fundamental policies, a rolling horizon approach, and a stochastic dynamic programming model. Li et al. [14] and Cheong et al. [17] proposed solution methods in which Lagrangian relaxation is used together with a local branching heuristic and subgradient optimization algorithms to solve the shipment consolidation problem.
As the studies discussed in this section show, a significant number of papers have been published on the problem addressed in this study. In general, the benefits of shipment consolidation have been demonstrated, and efforts to develop different solution approaches continue. The solution method used in this study differs from the classical Clarke and Wright Savings Algorithm: it includes sub-procedures for improving the obtained routes and for assigning the shipments to the routes.
54.3 Shipment Consolidation Problem
Shipment consolidation is an effective approach to reducing total transportation costs. Consolidating many shipments into a single truckload is a cost-effective alternative to transporting each shipment separately as a single-shipment, point-to-point truckload (P2P TL) or via a less-than-truckload (LTL) carrier. Consolidation can save money through economies of scale in transportation when numerous smaller shipments are combined into a single payload. In this research, a procedure is used to determine consolidated route sequences that minimize the total logistics cost (TLC) of the shipments. Simple heuristics can offer good solutions to the route sequencing problem, which is identical to the traveling salesman problem. To make the route sequencing operation as simple and quick as possible, a new procedure is adopted that requires as input only the existing route sequence and the shipment to be added to it. Figure 54.1 illustrates the various routing options available for TL transportation operations.
54.3.1 TL Transport Charge
The mode of transport chosen for a specific item is determined by its value. The cost of transporting a single unit of an item via a specific mode (e.g., LCL, LTL) is determined by the item’s density. As indicated in Eq. (54.1), the shipment weight (q), maximum payload (q_max), revenue per loaded-truck-mile (r), and distance between the O-D pair (d) are used to calculate the TL transportation cost (c_TL).
\[ \text{TL Transport Charge (\$)} = c_{TL} = \left\lceil \frac{q}{q_{\max}} \right\rceil r\,d \tag{54.1} \]
Fig. 54.1 Routing alternatives
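As a minimal sketch of Eq. (54.1), the function below computes the TL charge as the number of truckloads times the rate times the distance. It assumes the brackets in the equation denote rounding up to a whole number of truckloads; the function name and the sample inputs are hypothetical and do not come from the chapter's data.

```python
import math

def tl_transport_charge(q, q_max, r, d):
    """TL charge in the spirit of Eq. (54.1): truckloads needed * rate * distance.

    q      shipment weight (tons)
    q_max  maximum payload of one truck (tons)
    r      revenue per loaded-truck-mile ($ per distance unit)
    d      distance between origin and destination
    Assumption: the bracket in Eq. (54.1) is a ceiling (whole truckloads).
    """
    if q <= 0 or q_max <= 0:
        raise ValueError("q and q_max must be positive")
    truckloads = math.ceil(q / q_max)
    return truckloads * r * d

# Hypothetical inputs: 30 tons need two 25-ton trucks over a distance of 500.
print(tl_transport_charge(q=30.0, q_max=25.0, r=2.0, d=500.0))  # 2000.0
```

Under this reading, a shipment that exactly fills one truck is charged for one truckload, and any excess weight triggers a second full charge.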
54.3.2 Total Logistics Cost
Both inventory carrying and shipment costs must be defined as functions of q when the size of a shipment is determined from its TLC. The total logistics cost as a function of q is calculated using Eq. (54.2).
\[ \mathrm{TLC}(q) = \mathrm{TC}(q) + \mathrm{IC}(q) = \frac{f}{q}\,c(q) + \alpha v h q \tag{54.2} \]
f: expected demand (tons/year)
q: the average size of shipment (tons)
c(q): transport charge as a function of shipment size ($)
k: shipment charge ($)
α: average inter-shipment inventory percentage at origin and destination
v: unit shipment value ($/ton)
h: inventory carrying rate, the cost per dollar of inventory per year (1/year)
The product vh denotes the cost of holding one ton of inventory for one year. The parameter α represents the average fraction of the shipment size q that is held across the origin and destination. If q is not provided for periodic shipments and must be calculated, f can be used in place of q for calculating the aggregate demand and density of several products transported together as part of a single load [28]. The maximum payload follows from the weight and cubic capacities of the trailer, as given in Eq. (54.3).
\[ q_{\max} = \min\left\{ K_{wt},\ \frac{s K_{cu}}{2000} \right\} \tag{54.3} \]
Here:
s = density of items (lb/ft³)
K_wt = truck trailer weight capacity (tons)
K_cu = truck trailer’s cubic capacity (ft³)
When numerous items are sent as part of a single load, the load must be treated as a single demand-weighted aggregate shipment, with a total weight and an aggregate density over the m items. Further details of the calculation are given in Kay et al. [28].
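The trade-off in Eqs. (54.2) and (54.3) can be sketched in a few lines. The example assumes a single truck per shipment (so c(q) = r·d for q ≤ q_max) and reads the inventory term as α·v·h·q. The shipment size that balances the two terms, sqrt(f·r·d / (α·v·h)), follows from setting the derivative of this simplified TLC to zero; it is an illustration and is not quoted from the chapter. The default trailer capacities and all sample numbers are hypothetical.

```python
import math

def q_max(s, k_wt=25.0, k_cu=2750.0):
    """Maximum payload (tons) as in Eq. (54.3); s is the density in lb/ft^3.

    k_wt (tons) and k_cu (ft^3) are hypothetical trailer capacities.
    """
    return min(k_wt, s * k_cu / 2000.0)

def tlc_tl(q, f, r, d, alpha, v, h):
    """Total logistics cost as in Eq. (54.2), assuming one truck per shipment."""
    transport = (f / q) * r * d      # shipments per year * charge per truckload
    inventory = alpha * v * h * q    # annual inventory carrying cost
    return transport + inventory

def best_tl_size(f, r, d, alpha, v, h, s):
    """Size minimizing tlc_tl, capped at the truck's maximum payload."""
    q_star = math.sqrt(f * r * d / (alpha * v * h))
    return min(q_star, q_max(s))

# Hypothetical shipment: 50 tons/year, rate 2 per distance unit, distance 500.
params = dict(f=50.0, r=2.0, d=500.0, alpha=0.5, v=1000.0, h=0.3)
q = best_tl_size(s=30.0, **params)
print(q, tlc_tl(q, **params))
```

When the cap is not binding, the transport and inventory terms are equal at the chosen size, which is a quick sanity check on any computed shipment size.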
54.4 Methodology
The aim of the routing procedure used in the study is to minimize the total cost of traveling from the loading of the first shipment on the route to the unloading of the last shipment. For this purpose, the order of the n shipments in a load, L = (y_1, …, y_n), is taken as input, and a routing array with 2n elements, R = (z_1, …, z_2n), is obtained as output. In route R, the first occurrence z_j = y_i indicates the loading of shipment y_i, and the second occurrence z_k = y_i, with j < k, indicates its unloading. Each element z_i of the route array R has a location x_i, with X = (x_1, …, x_2n), that represents the origin or destination of the shipment. A route R with 2n elements has 2n − 1 segments, each consisting of a consecutive pair of elements. The total cost of the route is the sum of the costs of its segments. Given the costs c_ij between all pairs of locations i and j, the cost of route R is calculated using Eq. (54.4), while Eq. (54.5) gives the pairwise saving for shipments i and j.
\[ c(R) = \sum_{i=1}^{2n-1} c_{x_i,\,x_{i+1}} \tag{54.4} \]
\[ c_{ij}^{\mathrm{sav}} = c_i^{0} + c_j^{0} - c(i, j) \tag{54.5} \]
Figure 54.2 shows the route sequencing procedure and its sub-procedures.
Fig. 54.2 Route sequence procedure and sub-procedures
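The route representation and Eqs. (54.4) and (54.5) can be illustrated with a short sketch. The function names, the toy cost function, and the sample savings inputs are hypothetical. The saving is read here as the two stand-alone costs minus the cost of serving both shipments together, which is an assumption about the symbols in Eq. (54.5). The shipment order [6 3 6 3] and its begin and end locations are the ones reported for Route 1 in Sect. 54.5.1.

```python
def route_locations(route, begin, end):
    """Map a routing array to locations: first occurrence loads, second unloads."""
    seen = set()
    locations = []
    for shipment in route:
        if shipment in seen:
            locations.append(end[shipment])    # second occurrence: unload
        else:
            locations.append(begin[shipment])  # first occurrence: load
            seen.add(shipment)
    return locations

def route_cost(locations, cost):
    """Sum of segment costs over consecutive location pairs, as in Eq. (54.4)."""
    return sum(cost(a, b) for a, b in zip(locations, locations[1:]))

def pairwise_saving(c0_i, c0_j, c_ij):
    """Pairwise saving as in Eq. (54.5): stand-alone costs minus joint cost."""
    return c0_i + c0_j - c_ij

# Route 1 of the worked example: shipment 3 goes 19 -> 10, shipment 6 goes 6 -> 13.
begin = {3: 19, 6: 6}
end = {3: 10, 6: 13}
locs = route_locations([6, 3, 6, 3], begin, end)
print(locs)  # [6, 19, 13, 10]

# Toy cost between location indices (hypothetical, for illustration only).
toy_cost = lambda a, b: abs(a - b)
print(route_cost(locs, toy_cost))            # 13 + 6 + 3 = 22
print(pairwise_saving(100.0, 80.0, 150.0))   # 30.0
```

A route with 2n entries always yields 2n − 1 segments, so the sum in `route_cost` runs over exactly the pairs described in the text.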
54.5 Computational Experiments
A basic numerical example of the two-step solution method is presented below. First, a data set consisting of 12 shipments was created. The details of these shipments are summarized in Table 54.1.
Table 54.1 Sample data set (DS17)
| idx | b | e | f | s | v | a | h | d | TLC1 | q1 | n1 | cu1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 11 | 16 | 72.56 | 20.05 | 251,305.03 | 0.5 | 0.3 | 710.15 | 65,233.48 | 0.25 | 293.43 | 24.67 |
| 2 | 8 | 17 | 88.27 | 11.37 | 26,541.59 | 0.5 | 0.3 | 370.70 | 35,271.36 | 4.43 | 19.93 | 779.30 |
| 3 | 19 | 10 | 30.20 | 15.89 | 2,550.08 | 0.5 | 0.3 | 813.24 | 9,471.87 | 12.38 | 2.44 | 1,558.31 |
| 4 | 9 | 3 | 15.28 | 10.79 | 12,850.47 | 0.5 | 0.3 | 480.66 | 11,627.43 | 3.02 | 5.07 | 559.31 |
| 5 | 9 | 6 | 22.36 | 2.24 | 5,745.51 | 0.5 | 0.3 | 91.32 | 4,099.83 | 2.38 | 9.40 | 2,120.04 |
| 6 | 13 | 5 | 14.38 | 7.53 | 9,649.17 | 0.5 | 0.3 | 1,078.97 | 14,645.56 | 5.06 | 2.84 | 1,344.27 |
| 7 | 9 | 1 | 61.74 | 9.46 | 2,492,234.15 | 0.5 | 0.3 | 283.24 | 90,385.74 | 0.08 | 822.73 | 15.87 |
| 8 | 12 | 2 | 24.38 | 10.43 | 14,779.57 | 0.5 | 0.3 | 438.69 | 15,049.33 | 3.39 | 7.18 | 651.15 |
| 9 | 14 | 10 | 25.08 | 9.35 | 637,052.29 | 0.5 | 0.3 | 381.51 | 35,160.65 | 0.08 | 334.26 | 16.06 |
| 10 | 2 | 18 | 29.51 | 2.69 | 2,945,769.42 | 0.5 | 0.3 | 191.41 | 106,438.03 | 0.08 | 393.16 | 55.81 |
| 11 | 4 | 7 | 64.09 | 23.92 | 26,939.59 | 0.5 | 0.3 | 967.32 | 46,195.49 | 1.56 | 41.17 | 130.17 |
| 12 | 15 | 8 | 33.76 | 33.67 | 1,003.40 | 0.5 | 0.3 | 770.38 | 6,113.81 | 20.31 | 1.66 | 1,206.56 |
54.5.1 The Second Phase: Determination of Consolidated Shipments and the Shipment Routes
In the second stage of the solution methodology, the consolidated shipments and their routes are determined. In this example, eight shipments are transported as direct shipments, while four shipments are transported in pairs as multi-stop consolidated shipments. Direct shipments are shown in Table 54.2 and consolidated shipments in Table 54.3. Consolidation saves $2,815.34 in total logistics cost, which corresponds to 0.64% of the total logistics cost and 7.07% of the logistics cost of the consolidated shipments. The solution yields two routes, one for each consolidated shipment group. For Route 1, the shipment index is [6 3 6 3] and the route sequence is [6 19 13 10]. For Route 2, the shipment index is [5 4 5 4] and the route sequence is [9 4 6 9]. The shipments after consolidation are shown on the map in Fig. 54.3.
Table 54.2 Direct shipments
| Index | Begin | End | Shipment mode | Distance (km) | Cost ($) |
|---|---|---|---|---|---|
| 1 | 11 | 16 | LTL | 710.15 | 65,233.48 |
| 2 | 8 | 17 | TL | 370.70 | 35,271.36 |
| 7 | 9 | 1 | LTL | 283.24 | 90,385.74 |
| 8 | 12 | 2 | TL | 438.69 | 15,049.33 |
| 9 | 14 | 10 | LTL | 381.51 | 35,160.65 |
| 10 | 2 | 18 | LTL | 191.41 | 106,438.03 |
| 11 | 4 | 7 | LTL | 967.32 | 46,195.49 |
| 12 | 15 | 8 | TL | 770.38 | 6,113.81 |
| Total | | | | 4,113.4 | 399,847.89 |
Table 54.3 Consolidated shipments
| Index | Begin | End | Shipment mode | Route no | Shipment index | Route index | Initial cost | Final cost |
|---|---|---|---|---|---|---|---|---|
| 3 | 19 | 10 | TL | 1 | 3−3 | 19−10 | 9,471.87 | 21,574.29 |
| 6 | 6 | 13 | TL | 1 | 6−6 | 6−13 | 14,645.56 | |
| 4 | 4 | 9 | TL | 2 | 4−4 | 4−9 | 11,627.43 | 15,455.06 |
| 5 | 9 | 6 | TL | 2 | 5−5 | 9−6 | 4,099.83 | |
| Total cost | | | | | | | 39,844.69 | 37,029.35 |
| Total savings | | | | | | | | 2,815.34 |
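The savings figures quoted for this example can be re-derived from the costs listed in Tables 54.2 and 54.3. The short check below is only an arithmetic illustration; the variable names are hypothetical.

```python
# Costs of the eight direct shipments (Table 54.2), in dollars.
direct_costs = [65233.48, 35271.36, 90385.74, 15049.33,
                35160.65, 106438.03, 46195.49, 6113.81]

# Consolidated shipments (Table 54.3): stand-alone cost per shipment index,
# and final cost per consolidated route.
initial = {3: 9471.87, 6: 14645.56, 4: 11627.43, 5: 4099.83}
final = {1: 21574.29, 2: 15455.06}

total_direct = sum(direct_costs)
initial_total = sum(initial.values())
final_total = sum(final.values())

savings = initial_total - final_total
tlc_before = total_direct + initial_total

print(round(savings, 2))                        # 2815.34
print(round(100 * savings / tlc_before, 2))     # 0.64
print(round(100 * savings / initial_total, 2))  # 7.07
```

The three printed values match the saving of $2,815.34 and the rates of 0.64% and 7.07% reported in the text.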
Fig. 54.3 Representation of shipments on the map after consolidation
54.5.2 Computational Experiments with Large-Scale Data Sets
The solution procedure was applied to fourteen experimental data sets for the Aegean Region of Turkey; the results are reported in Table 54.4. Where consolidation was possible, the procedure reduces the total logistics cost (TLC1) by 0.21–5.27%. Relative to the initial cost of the consolidated shipments, the reduction is even higher, reaching 32.52%. Table 54.4 also gives the calculation times for the pairwise savings calculation, the C-W Savings Algorithm, and the 2-opt heuristic. For example, the total calculation time for DS13 is 17.84 s. Table 54.4 further shows that consolidation was possible in every Aegean data set except DS2. The shipment information, cost reduction rates, and CPU times for these data sets are shown in Figs. 54.4, 54.5 and 54.6, respectively.
To analyze the solutions produced by the procedure, 14 further data sets covering the whole of Turkey were randomly generated. Shipment consolidation could not be achieved for 4 of these data sets. For the other ten, cost savings were achieved by consolidating different numbers of shipments: savings ranged from 0.01 to 4.80% of the total logistics cost and from 2.19 to 19.59% of the total cost of the consolidated shipments. Solutions for DS27 and DS28, each with 30 shipments, were obtained in 2.23 and 2.89 s, respectively. The shipment information, cost reduction rates, and CPU times for these data sets are shown in Figs. 54.7, 54.8 and 54.9, respectively. Some of the detailed route diagrams of the solutions presented in Tables 54.4 and 54.5 are shown in Appendix 1 and Appendix 2. These diagrams show the routes of independent (TL and LTL) and multi-stop consolidated shipments.
Table 54.4 Solutions for Aegean towns data sets (DS1-DS14), obtained with Pairwise savings + Two OPT
| Data sets | NS | NDS | NCS | TLC1 ($) | TLC2 ($) | CRTLC ($) | CRTLC (%) | CRCS (%) | CPU1 | CPU2 | CPU3 | CPUt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DS1 | 5 | 3 | 2 | 71,004.12 | 67,447.80 | 3,556.32 | 5.27 | 7.35 | 0.0619 | 0.0101 | 0.0119 | 0.08 |
| DS2 | 5 | 5 | 0 | 41,775.47 | 41,775.47 | 0.00 | 0.00 | 0.00 | 0.0552 | 0.0095 | 0.0039 | 0.07 |
| DS3 | 10 | 7 | 3 | 316,878.54 | 311,749.94 | 5,128.60 | 1.65 | 9.36 | 0.2900 | 0.1212 | 0.0263 | 0.44 |
| DS4 | 10 | 5 | 5 | 158,205.51 | 155,917.12 | 2,288.40 | 1.47 | 4.67 | 0.2454 | 0.0398 | 0.0440 | 0.33 |
| DS5 | 20 | 11 | 9 | 709,214.53 | 697,887.12 | 11,327.41 | 1.62 | 17.17 | 1.1060 | 0.3827 | 0.0487 | 1.54 |
| DS6 | 20 | 15 | 5 | 1,192,640.20 | 1,190,148.03 | 2,492.17 | 0.21 | 8.79 | 0.8668 | 0.1553 | 0.0182 | 1.04 |
| DS7 | 30 | 17 | 13 | 603,911.10 | 577,897.06 | 26,014.04 | 4.50 | 20.15 | 1.8453 | 2.6539 | 0.3145 | 4.81 |
| DS8 | 30 | 23 | 7 | 943,226.82 | 934,048.86 | 9,177.96 | 0.98 | 32.52 | 1.8156 | 0.3967 | 0.0355 | 2.25 |
| DS9 | 40 | 25 | 15 | 1,307,894.11 | 1,278,401.39 | 29,492.72 | 2.31 | 17.81 | 3.2687 | 0.6891 | 0.0529 | 4.01 |
| DS10 | 40 | 23 | 17 | 1,571,570.73 | 1,548,994.14 | 22,576.59 | 1.46 | 16.97 | 3.2510 | 1.4088 | 0.0772 | 4.74 |
| DS11 | 50 | 28 | 22 | 1,954,008.83 | 1,931,067.75 | 22,941.08 | 1.19 | 14.82 | 5.5148 | 4.6046 | 0.3102 | 10.43 |
| DS12 | 50 | 36 | 14 | 1,336,708.74 | 1,318,531.47 | 18,177.27 | 1.38 | 15.29 | 5.2370 | 0.9032 | 0.0579 | 6.20 |
| DS13 | 60 | 27 | 33 | 2,417,424.41 | 2,331,105.89 | 86,318.52 | 3.70 | 13.43 | 7.6524 | 9.8318 | 0.3566 | 17.84 |
| DS14 | 60 | 39 | 21 | 2,034,258.74 | 2,009,765.72 | 24,493.01 | 1.22 | 14.78 | 7.7094 | 2.2544 | 0.3056 | 10.27 |
Fig. 54.4 Shipment information for Aegean datasets (y-axis: Number of Shipments; x-axis: Data Set; series: NS, NDS, NCS)
Fig. 54.5 Cost reduction rates for Aegean datasets (%) (y-axis: Cost reduction rate (%); x-axis: Data Set; series: Cost Reduction in TLC (%), Cost Reduction in Consolidated Shipments (%))
Fig. 54.6 CPU times for Aegean datasets (sec) (y-axis: Calculation times (sec); x-axis: Data Sets; series: CPU Time for pairwise savings calculation, CPU Time for C-W Saving Algorithm, CPU Time for 2-Opt Algorithm, Total CPU Time)
Fig. 54.7 Shipment information for Turkey data sets (y-axis: Number of Shipments; x-axis: Data Set; series: NS, NDS)
Fig. 54.8 Cost reduction rates for Turkey data sets (%) (y-axis: Cost reduction rate (%); x-axis: Data Set; series: CRTLC (%), CRCS (%))
Fig. 54.9 CPU times for Turkey datasets (seconds) (y-axis: Calculation times (sec); x-axis: Data Set; series: CPU1, CPU2, CPU3, CPUt)
Table 54.5 Solutions for Turkey data sets (DS15-DS28), obtained with Pairwise savings + Two OPT
| Data sets | NS | NDS | NCS | TLC1 ($) | TLC2 ($) | CRTLC ($) | CRTLC (%) | CRCS (%) | CPU1 | CPU2 | CPU3 | CPUt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DS15 | 10 | 10 | 0 | 273,513.39 | 273,513.39 | 0.00 | 0.00 | 0.00 | 0.1970 | 0.0119 | 0.0012 | 0.21 |
| DS16 | 10 | 10 | 0 | 916,394.98 | 916,394.98 | 0.00 | 0.00 | 0.00 | 0.2007 | 0.0114 | 0.0002 | 0.21 |
| DS17 | 12 | 8 | 4 | 439,692.59 | 436,877.25 | 2,815.34 | 0.64 | 7.07 | 0.2841 | 0.0308 | 0.0123 | 0.33 |
| DS18 | 12 | 6 | 6 | 468,573.14 | 447,107.94 | 21,465.21 | 4.80 | 9.77 | 0.2826 | 0.0787 | 0.0173 | 0.38 |
| DS19 | 15 | 15 | 0 | 878,945.16 | 878,945.16 | 0.00 | 0.00 | 0.00 | 0.4559 | 0.0542 | 0.0001 | 0.51 |
| DS20 | 15 | 15 | 0 | 551,399.73 | 551,399.73 | 0.00 | 0.00 | 0.00 | 0.4496 | 0.0673 | 0.0001 | 0.52 |
| DS21 | 18 | 12 | 6 | 726,421.89 | 706,751.58 | 19,670.31 | 2.78 | 19.59 | 0.6594 | 0.1325 | 0.0248 | 0.82 |
| DS22 | 18 | 16 | 2 | 1,123,305.75 | 1,123,158.61 | 147.13 | 0.01 | 2.19 | 0.6676 | 0.0987 | 0.0080 | 0.77 |
| DS23 | 20 | 18 | 2 | 1,297,769.26 | 1,285,826.50 | 11,942.76 | 0.93 | 18.09 | 0.8020 | 0.2219 | 0.0075 | 1.03 |
| DS24 | 20 | 15 | 5 | 617,631.83 | 607,838.93 | 9,792.89 | 1.61 | 12.86 | 0.8023 | 0.1606 | 0.0179 | 0.98 |
| DS25 | 25 | 16 | 9 | 1,376,820.12 | 1,341,289.37 | 35,530.76 | 2.65 | 14.29 | 1.2687 | 0.2842 | 0.0379 | 1.59 |
| DS26 | 25 | 17 | 8 | 1,428,021.13 | 1,414,315.00 | 13,706.13 | 0.97 | 9.88 | 1.2720 | 0.2790 | 0.0221 | 1.57 |
| DS27 | 30 | 18 | 12 | 862,741.40 | 836,994.60 | 25,746.79 | 3.08 | 10.35 | 1.8660 | 0.3090 | 0.0501 | 2.23 |
| DS28 | 30 | 17 | 13 | 1,063,385.16 | 1,026,281.00 | 37,104.16 | 3.62 | 19.36 | 1.8392 | 0.9711 | 0.0798 | 2.89 |
NS: Number of shipments; NDS: Number of direct shipments; NCS: Number of consolidated shipments; TLC1: Total logistics cost before consolidation; TLC2: Total logistics cost after consolidation; CRTLC (%): Cost reduction in TLC (%); CRCS (%): Cost reduction in consolidated shipments (%); CPU1, CPU2, CPU3 (sec): Calculation times for pairwise savings, C-W Savings Algorithm, and 2-opt algorithm; CPUt: Total calculation time (1 + 2 + 3).
54.6 Conclusion
In this study, the shipment consolidation problem is addressed. In addition to TL and LTL shipments, the shipments to be consolidated to minimize transportation costs are determined. Data sets were created from the real locations and population densities of Turkey’s cities and towns to demonstrate the economic impact of consolidation on transportation costs. In the test sets where load consolidation could be carried out, savings of up to 5.27% of the total cost were achieved, and of up to 32.52% of the total cost of the consolidated shipments. Consolidation was possible in twenty-three of the twenty-eight data sets. The initial total transportation cost of the shipments in the twenty-eight data sets is 26,683,337.38 dollars; after consolidation it decreased to 26,241,431.80 dollars, a saving of $441,905.58. Across all data sets, this corresponds to a savings rate of 1.65%. The most significant benefit of the distribution approach is the limited amount of data that must be revealed publicly in order to participate in the consolidation process. This has the potential to greatly increase the number of deliveries that are available for consolidation. Moreover, with collaborative transportation planning, the need for third-party logistics providers can be significantly reduced.
54.7 Appendix 1 Some of the Solution Graphics for Aegean Town Data Sets
[Route diagrams for data sets DS5, DS6, and DS7; panels: Independent shipment, Multi-stop consolidated shipments]
54.8 Appendix 2 Some of the Solution Graphics for Turkey Data Sets
[Route diagrams for data sets DS19, DS20, and DS21; panels: Independent shipment, Multi-stop consolidated shipments]
References
1. Tanyaş M (2011) Lojistik Yönetimi. Lojistik Temel Kavramlar (Lojistiğe Giriş). In: Tanyas M, Hazır K (eds), Çağ University Press, Mersin
2. Ballou RH (1999) Business logistics management, 4th edn. Prentice-Hall International, US
3. Ravindran AR, Warsing DP (2013) Supply chain engineering: models and applications, 1st edn. CRC Press Taylor & Francis Group, New York
4. Hall RW (1987) Consolidation strategy: inventory, vehicles and terminals. J Bus Logist 8(2):57–73
5. Mesa-Arango R, Ukkusuri SV (2013) Benefits of in-vehicle consolidation in less than truckload freight transportation. Transport Res Part E: Logistics Transport Rev 60(2013):113–125
6. Baykasoglu A, Kaplanoglu V (2011) Evaluating the basic load consolidation strategies for a transportation company through logistics process modelling and simulation. Int J Data Anal Techniques Strategies 3(3):241–260
7. Kay MG (2016) Matlog: logistics engineering using MATLAB. J Eng Sci Design 4(1):15–20
8. Cetinkaya S, Bookbinder J (2003) Stochastic models for the dispatch of consolidated shipments. Transport Sci Part B 38(2002):747–768
9. Higginson JK, Bookbinder JH (1995) Markovian decision process in shipment consolidation. Transp Sci 29(3):242–255
10. Bookbinder JH, Cai Q, He QM (2011) Shipment consolidation by private carrier: the discrete-time and discrete quantity case. Stoch Model 27(4):664–686
11. Higginson JK (1995) Recurrent decision approaches to shipment-release timing in freight consolidation. Int J Phys Distrib Logist Manag 25(5):3–23
12. Gupta YP, Bagchi PK (1987) Inbound freight consolidation under just-in-time procurement: application of clearing models. J Bus Logist 8(2):74–94
13. Bookbinder JH, Higginson JK (2002) Probabilistic modeling of freight consolidation by private carriage. Transport Sci Part E 38(5):305–318
14. Li Z, Bookbinder JH, Elhedli S (2012) Optimal shipment decisions for an air freight forwarder: formulation and solution methods. Transp Res Part C 21(1):17–30
15. Klincewicz JG, Luss H, Pilcher MG (1990) Fleet size planning when outside carrier services are available. Transp Sci 24(3):169–182
16. Tyan JC, Wang FK, Du TC (2003) An evaluation of freight consolidation policies in global third party logistics. Omega 31(1):55–62
17. Cheong MLF, Bhatnagar R, Stephen CG (2007) Logistics network design with supplier consolidation hubs and multiple shipment options. J Indust Managem Optimization 3(1):51–69
18. Pichpibul T, Kawtummachai R (2012) An improved Clarke and Wright savings algorithm for the capacitated vehicle routing problem. ScienceAsia 38(3):307–318
19. Pooley J, Stenger AJ (1992) Modelling and evaluating shipment consolidation in a logistics system. J Bus Logist 13(2):153–174
20. Pooley J (1993) Exploring the effect of LTL pricing discounts in the LTL versus multiple stop TL carrier selection decision. Int J Logist Manag 4(2):85–94
21. Clarke G, Wright JW (1964) Scheduling of vehicles from a central depot to a number of delivery points. Oper Res 12(4):568–581
22. Chu CW (2005) A heuristic algorithm for the truckload and less-than-truckload problem. Eur J Oper Res 165(3):657–667
23. Bolduc MC, Renaud J, Boctor F (2007) A heuristic for the routing and carrier selection problem. Eur J Oper Res 183(2):926–932
24. Bolduc MC, Renaud J, Boctor F, Laporte G (2008) A perturbation metaheuristic for the vehicle routing problem with private fleet and common carriers. J Operational Res Soc 59(6):776–787
25. Côté JF, Potvin JY (2009) A tabu search heuristic for the vehicle routing problem with private fleet and common carrier. Eur J Oper Res 198(2):464–469
26. Potvin JY, Naud MA (2011) Tabu search with ejection chains for the vehicle routing problem with private fleet and common carrier. J Oper Res Soc 62(2):326–336
27. Nguyen C, Dessouky M, Toriello A (2014) Consolidation strategies for the delivery of perishable products. Transp Res Part E 69(2014):108–121
28. Kay MG, Karagul K, Şahin Y, Gunduz G (2021) Minimizing total logistics cost for long-haul multi-stop truck transportation. Transp Res Rec 2676(2):367–378
Chapter 55
The Imminent but Slow Revolution of Artificial Intelligence in Soft Sciences: Focus on Management Science
Samia Chehbi Gamoura, Halil İbrahim Koruca, and Ceren Arslan Kazan
S. Chehbi Gamoura (B) EM Strasbourg Business School, HuManiS (UR 7308), Strasbourg University, Strasbourg, France e-mail: [email protected]
H. İ. Koruca · C. Arslan Kazan Department of Industrial Engineering, Süleyman Demirel University, 32260 Isparta, Turkey e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_55
55.1 Introduction
Nowadays, we are not afraid of machines, as we use “smart” things every day: smartphones, smart TVs, GPS, and home assistants, in our offices, at home, in remote work, and in many other places [1]. In reality, however, Artificial Intelligence (AI) is more profound than something associated with those connected objects [2]. More than simple digital processing support in machines, AI techniques can make predictions autonomously and can take actions independently, without the involvement of humans [3]. Unlike in the technical sciences and engineering (the so-called “hard sciences”), in the “soft sciences” AI technology is often associated with “threats” and evokes a “reform” of traditional civilization [4]. For humanists, the very expression “Artificial Intelligence” contains the word “artificial,” which may suggest “un-humanization” [5]. For social scientists, AI is regarded as something to be rejected because it strips civilization of its social values, creates distance, and weakens inter-human interactions [6]. In organizational management research, AI applications have a reputation for generating job losses and inequality in workplaces [7]. These are a few of the motives that feed the movement of “fears” against AI progress in soft sciences. Research therefore discusses whether the increase in these fears is justified [8].
On the one hand, research studies and projects in soft and trans-disciplines that support accommodating the innovation by accepting AI positively are steadily increasing [9]. On the other hand, research works that view AI negatively are also growing in the social sciences [10]. However, an examination of the literature reveals that both research lines struggle to provide material proof. This paper therefore positions itself in this research gap with a twofold purpose. First, we try to provide concrete evidence of the paradoxical trend of AI in soft sciences, between imminence and slowness. Second, we intend to discern the key arguments that explain the confusing movement of AI intrusion in soft sciences. Many research works address these questions, but separately and, for most of them, in qualitative form. A quantitative treatment is thus missing, and this constitutes the main object of this research.
The paper is structured as follows. We first present a background of the most pertinent previous research in Sect. 55.2. We then present the problem and the research gap in Sect. 55.3. Section 55.4 focuses on our research methodology, including a qualitative-quantitative analysis and a related discussion through key statistical figures, illustrations, and outcomes. Finally, we conclude the paper in Sect. 55.5 with a summary and a brief discussion of the limits and future work.
55.2 Background and Related Works
55.2.1 Artificial Intelligence
The foundation of the concept of “Non-natural Intelligence” appeared for the first time with Alan Turing in 1936 [1]. Turing showed that any reasoning procedure could be replicated by a machine [2]. He stated later, in 1950, that we can talk about “Artificial Intelligence” when a human cannot tell whether the interlocutor he or she is interacting with is a human or a machine (the “Turing test”) [11]. The introduction of AI into soft sciences is not new [12]. However, its intensity has been increasing since the birth of generation Y, associated with the advent of the massive digitalization of the world and of societies [13]. E. Rich was one of the first scientists to claim, in [14], that AI approaches are more suitable to the humanities than to the exact sciences (i.e., mathematics). Rich argued that AI approaches are mainly based on symbolic computing and formal abstraction rather than on facts and numbers. Since the concepts handled in humanities studies are typically symbols, the two sciences may share common points here. If we look more closely at soft sciences and AI studies, we can find other common ground. For instance, historical and archaeological studies use relationships and causality to detect hidden patterns, which is familiar from Machine Learning (ML) approaches [15]. Likewise, organizational management theories try to extract concepts from collaborative environments, which is precisely the purpose of Knowledge Representation (KR), a sub-field of AI [16]. Furthermore, the tokenization technique plays a central role in language, translation, and interpreting research, where it forms the backbone of Natural Language Processing (NLP), one of the most prolific branches of AI [17]. The most common feature that brings AI close to the humanities and social sciences is “reasoning” [18, 19]. Social fields like sociology, psychology, education, and others actually need reasoning in addition to Expert Systems (ES), optimization meta-heuristics, and Machine Learning (ML) [20] (Fig. 55.1).
Fig. 55.1 Proposed visual nomenclature between several sciences to situate artificial intelligence (with its branches)
55.2.2 Soft Sciences In “soft science,” the term “soft” implies to differentiate a class of disciplines from the “hard” ones. In a few words, the hard sciences focus on the observable entities and rigorous norms sustained by the theoretical fundamentals and experimentations. The soft sciences, on the contrary, deal with the “living” and the “abstract mind” of these entities [21]. Nevertheless, sometimes the frontiers between the two classes of sciences are fuzzy [22], such as the multidisciplinary fields like management and economics that can embed hard sub-disciplines as statistical econometrics. Another example is psychology, which broadly uses mathematical modeling to profile characters [23]. Several scientists disapprove of this separation of opposing “soft” to “hard” in sciences [21]. The profitable placements of research projects and funding probably affect the investments in AI applications for soft sciences, compared to the vast amounts of funding spent in AI research for hard sciences [22]. However, the shared conviction that AI is part of technologic innovation makes sense to invest more in “hard sciences” like engineering in priority, more than the other sciences. A clear-cut definition of “soft science” does not exist, and several delineations contrast in the literature [22]. Regardless of the diversity of these definitions, they often refer to the perceived methodology of research and not the research material. Soft sciences cover many disciplines and sub-disciplines that are challenging to list
Fig. 55.2 Differentiation between soft-pure and soft-applied disciplines in soft sciences according to [24]
exhaustively in a single research work. Among them, we find psychology, history, archaeology, politics, anthropology, and so on among the “soft-pure” disciplines, and law, economics, education, management, and so forth among the “soft-applied” fields, following the classification in [24] (Fig. 55.2).
55.2.2.1 Focus on Management
Management theories and thoughts depend primarily on the human factors of organizations, despite the automation of several parts of these organizations [7]. Therefore, “management science” belongs to the soft-applied disciplines, according to [24]. The Nobel laureate in economics Herbert Simon stated in 1956 that machines would be able to perform any task a human being can do [25]. Yet, half a century later, we are still far from Simon’s scenario, and humans continue to perform the majority of processes in organizations themselves. The foremost need of practitioners and academics in management lies in decision-making [26]. With the integration of AI algorithms into processes, the automation of decision-making changes the way work is organized and performed [27]. The complexity of decisions mostly comes from uncertainty, which is to be expected, as the core of management is planning for future actions [10]. In addition to decision support, the integration of AI algorithms has spread to various levels: workflow management, to benefit from automation [28]; conflict management, to take advantage of adaptability [29]; leadership requirements, to find support from optimization facilities [30]; and so forth.
Over the short period of the last twenty years, the feeling that AI is disrupting organizations at an accelerating pace has grown [6]. However, this sentiment of acceleration has also generated a sentiment of losing control and a fear of the effects of AI [31]. Besides, the empowerment of Cloud Computing and Open Data by IT services has pushed AI applications into non-technical fields without any need for IT skills. This trend is the so-called “AI democratization” [13]. For example, Google’s Speech-to-Text solutions, used mainly in enterprises, rely on software robot interpreters for linguistic tasks and require no technical experts in AI.
55.3 Problem Position and Research Gap
The concepts of the soft sciences consist of theories and hypotheses related to human aptitudes and associated with mental perceptions (sentiments, opinions, convictions, beliefs) [21]. However, scholars in the soft sciences think about, abstract, and perceive the world from a different angle than those in the hard sciences [22]. For a sociologist, a humanist, or a manager, the first idea that comes to mind when thinking about AI is “replacing the mental perception and the human-thinking”, or in other words “the un-humanization of the society”. That perception is the primary impediment, since it implies replacing the mental part (sentiments), which is, in fact, the essence of the soft sciences. In academic management research, works arguing that AI destroys jobs are plentiful [6]. Some of them used forecasting and quantitative methods to assess the damage to labor markets [10]. Others go a step further and foresee a potential employment crisis that would cause a disaster in societies [8]. Likewise, many other researchers predict a real threat from automation and its impact on societies [32]. Still others push the “mechanization” of civilizations into extreme fantasy scenarios with the so-called “AI Takeover” movement [33]. Our transversal review of most of this existing research identifies three main drawbacks, which form the basis of the research gap that our work addresses:
(1) The holistic view: Some scientists suppose that the intensification of machine-made tasks is uniformly spread across all sectors, regardless of the nature and extent of the tasks being automated [4, 13, 26, 34, 35], and so forth.
(2) Dissolving assumptions: Some research works assume that when machines supplant humans in performing tasks, the whole place of the human dissolves instantly [36–40], among others.
(3) Inelastic changes: Some other academic publications adopt the idea that the changes due to AI integration are inelastic. They assume that any change in the skill-based profile necessarily leads to unemployment and affects working conditions [6, 7, 31, 41].
Nonetheless, the research works that focus specifically on the negative side of AI disruption omit staff preparation and training, the evolution of education programs, the mutation of society toward the digital generations, and many other
factors. Our analysis reveals that the research seems to oscillate between the accusers and the defenders of AI for soft sciences. No study tries to examine the paradoxical trend of AI evolution by analyzing the factors behind the overall movement on both sides, the positive and the negative. Doing so would be essential to evaluate more precisely the real effects of AI integration in soft sciences.
55.4 Proposed Approach and Methodology
Quantitative and qualitative investigations based on the examination of academic research are expected to clarify the trends of AI evolution and determine the limits of knowledge on this topic [3]. In this paper, we develop a quantitative–qualitative analysis starting with the early works in the field, from 2010 onward. However, due to the breadth of the soft science domains, a comprehensive analysis would be beyond the scope of a single article. Therefore, we focus on the management domain, although the same approach can be applied to any other domain. During the preliminary phases of literature scanning, we noted an apparent absence of balanced studies that gather both the positive and the negative effects of AI. We also noticed inconsistency in the use of AI techniques over time.
55.4.1 Investigating the Contrasted Investments on AI Research in Management Science: Visual Analysis
To understand the relationship between countries’ research cultures and their investment efforts, we magnified the figures by mapping our dataset of publications as a density map. We used VOSviewer® to map the intensity of co-occurrence between countries (Fig. 55.3). VOSviewer® [42] is an analytics and mapping software widely used by scientists to generate density and association maps. The VOSviewer® map in Fig. 55.3 shows a concentration on the countries usually known as leaders in public investment and spending on AI-related research projects in the last decade: the United States, the United Kingdom, and Russia. Links between these leading countries and others arise through co-authoring and research associations: South Africa, Singapore, India, Spain, and Saudi Arabia. Around those clusters lie micro-clusters of several countries whose edges are imprecise, suggesting that junctions between different emerging research threads are probably forming. This result is evidence that AI research depends on investment in research projects, which is unavoidably linked to the research culture of each country. Some European countries, like Germany and France, are missing from the map because their AI investments are mostly oriented toward hard-science domains such as industry, manufacturing, medical research, and biology.
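The co-occurrence intensity that such a map visualizes can be illustrated with a minimal sketch. This is not VOSviewer®’s own algorithm, only a plain pair count under stated assumptions: each publication is reduced to the list of countries of its authors, and the function name and the sample affiliations are hypothetical, not the chapter’s dataset.

```python
from collections import Counter
from itertools import combinations


def country_cooccurrence(publications):
    """Count how often each pair of countries appears on the same publication."""
    pair_counts = Counter()
    for countries in publications:
        # Deduplicate and sort so that (A, B) and (B, A) map to the same key.
        unique = sorted(set(countries))
        for pair in combinations(unique, 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical affiliation lists, one per publication (illustrative only).
sample_publications = [
    ["United States", "India"],
    ["United Kingdom", "South Africa", "United States"],
    ["United States", "India", "India"],
    ["Russia"],
]

cooccurrence = country_cooccurrence(sample_publications)
for (a, b), n in cooccurrence.most_common():
    print(f"{a} - {b}: {n}")
```

A single-country publication contributes no pair, which is why countries that rarely co-author sit at the periphery of a density map built from counts of this kind.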
Fig. 55.3 Density-based clustering map of AI in management research by using VOSViewer® [42]
55.4.2 Investigating AI Research in the Sub-fields of Management Science: Quantitative Analysis
Table 55.1 lists the most relevant references that have discussed or applied AI in the sub-fields of management, taken from our cross-sectional literature review. A meticulous analysis was conducted to match each contribution with the AI branches it uses, resulting in some key numbers (percentages). The matrix in Table 55.1 shows an unequal use of AI among the different sub-disciplines of organizational management. Machine learning (ML) alone accounts for more than a third of the publications (38.37%), which are heterogeneously distributed across sub-fields, while applications based on Representation and Knowledge (RK) rank second with 24.42%. This finding was expected, as management is familiar with ML algorithms for prediction, classification, and regression, particularly in marketing and finance. The most frequently addressed topics include decision-making [43], workforce [27], and recruitment processes [31]. In contrast, the AI branches of Expert Systems, Adaptive Systems, and Automated Optimization are less used. Nevertheless, the most important finding is the total absence of publications in two AI branches. The absence of Formal Computing (FC) was expected, because the organization of work is far from symbolization and computing. However, the absence of research in Natural Language Processing was not foreseen in our hypothesis. In addition, some researchers combine several AI techniques to tackle management problems, such as [36, 44, 45]. During the examination of the literature, we also noticed that many publications take the opposite direction and look for ways to “humanize” machines. Besides, AI ethics is becoming an independent transdisciplinary
Table 55.1 The most relevant references that discussed and applied AI in management sub-fields
Organizational management sub-fields (rows): Decision making in organizational management; Resources management and tasks allocation; Networking, partnership and structuration; Conflict and collaboration; Leadership and entrepreneurship; Organizational management theories; Organizational risk management; Labor force, workforce, and human resources; Recruitment and employment; Organization of work, workflow, and processes; Profit, productivity, and economics; Career, training and skills management; Wellbeing, safety, security at workplace; Images, communication, and branding; Competitiveness; Social, legal and psycho-social studies; Behavioral studies and cognitive studies
References: Jarrahi [25]; Metcalf et al. [43]; Turban et al. [50]; Huang et al. [51]; Bouajaja and Dridi [52]; E Silva and Costa [38]; Jin and Zhang [53]; Yao [54]; Paterakis et al. [55]; Ponsteen and Kusters [40]; Chiravuri et al. [36]; Zhen et al. [44]; Tkachenko et al. [56]; Lee et al. [57]; Kelly [39]; von Krogh [12]; Carley and Prietula [59]; Hecker [16]; Kou et al. [60]; Poh et al. [61]; Woolley et al. [62]; Sion [27]; Ajit [7]; Frey and Osborne [6]; Agrawal et al. [10]; Geetha and Bhanu [31]; Mohapatra and Sahu [63]; van Zoonen and Toni [64]; Grover et al. [3]; West et al. [65]; Jebelli et al. [66]; Facchinetti et al. [67]; Legg et al. [68]; Liu et al. [69]; Ryan et al. [70]; Ransbotham et al. [71]; Frank et al. [45]; Korinek [47]; Aghion et al. [13]; Brynjolfsson et al. [4]; Caron et al. [72]; Pinho et al. [73]; Malone and Bernstein [74]; Brougham and Haar [41]; Lee et al. [58]; Kim et al. [75]; Demir et al. [9]; Dirican [32]; Abubakar et al. [2]; Meihami and Meihami [76]; Shen [77]
Artificial intelligence branches (columns), with share of publications (%): ES 4.65; RK 24.42; FC 0.00; AS 15.12; NLP 0.00; ML 38.37; MV 2.33; AOPS 15.12
Note ES: expert systems, RK: representation and knowledge, FC: formal computing, AS: adaptive systems, NLP: natural language processing, ML: machine learning, MV: machine vision, AOPS: automated optimization, planning, and scheduling
discipline connecting AI expertise with ethics and law studies [46]. However, these publications were not considered and were consequently excluded from our analysis.
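The percentages of Table 55.1 can be read as shares of marks per AI branch. The sketch below illustrates that arithmetic under a loudly stated assumption: the chapter does not report the underlying counts, so the counts used here are hypothetical values chosen only because, out of a total of 86, they reproduce the reported percentages after rounding. The branch-to-percentage pairing follows the column order given in the table’s note.

```python
# Percentages reported in Table 55.1, per AI branch.
reported_share = {
    "ES": 4.65, "RK": 24.42, "FC": 0.00, "AS": 15.12,
    "NLP": 0.00, "ML": 38.37, "MV": 2.33, "AOPS": 15.12,
}

# ASSUMPTION: hypothetical counts, not stated in the chapter. They are used
# only because they reproduce the reported percentages when divided by 86.
assumed_counts = {
    "ES": 4, "RK": 21, "FC": 0, "AS": 13,
    "NLP": 0, "ML": 33, "MV": 2, "AOPS": 13,
}


def shares(counts):
    """Convert raw counts into percentage shares rounded to two decimals."""
    total = sum(counts.values())
    return {branch: round(100 * n / total, 2) for branch, n in counts.items()}


computed_share = shares(assumed_counts)
ranking = sorted(computed_share, key=computed_share.get, reverse=True)

for branch in ranking:
    print(f"{branch}: {computed_share[branch]:.2f}% (reported {reported_share[branch]:.2f}%)")
```

For instance, 33 / 86 ≈ 0.3837, which matches the 38.37% reported for ML, and 21 / 86 ≈ 0.2442 matches the 24.42% reported for RK.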
55.4.3 Investigating AI Impacts on Management Research: Qualitative Analysis
From the results of our analysis, we deduce five relevant factors that positively affect the revolution of AI in soft sciences and encourage its progress (Table 55.2). In management, many labor-market forecasters warn that AI integration over the next few years could lead to substantial job losses and thereby deepen social inequalities [10]. Some state that jobs will disappear and thus amplify social and income disparities [47]. A research report published in the well-known journal “Technological Forecasting & Social Change” [6] outlined the rise of social inequalities as an adverse effect of AI-driven automation. The authors published a quantified assessment approach and predicted that 35% of employees in the UK and 47% in the USA are at risk of being displaced by computerization over the following decades. The publication has since been cited more than 6982 times. On the organizational management side, data scientists often struggle with the qualitative data that mostly characterize soft sciences, because AI algorithms must be fed with quantitative data or with codified and symbolized data. Several techniques exist to work around the issue of quantification. Coding methods, such as Qualitative Comparative Analysis (QCA) [48], are the most used.
Table 55.2 The extracted relevant positive and negative factors that are affecting the revolution of AI in management (soft sciences)
Factors | Factors that affect positively AI in soft sciences | Factors that affect negatively AI in soft sciences
Factor 1 | The mutations of generations in societies toward the digital generations (generation Z) | The fear from human-free and automatic-behavior of AI algorithms
Factor 2 | The boom of data availability because of the big data and cloud computing paradigms | The alarming studies about job losses and displacement in the labor markets
Factor 3 | The ability of contents’ deduction and hermeneutics and knowledge mining in AI | The lack of quantitative data and the incessant need for quantification and transcoding
Factor 4 | The “Trivium” and “Quadrivium” dissections in soft sciences lead to adopting AI | The lack of the “intuitive” mode of decision in AI approaches
Factor 5 | The ubiquitous digital due to the proliferation of social media and smart mobiles | The resistance to changes in social sciences because of the non-digitalized generations
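The coding step mentioned above for Qualitative Comparative Analysis can be sketched as follows. This is a minimal illustration of the truth-table stage of a crisp-set QCA only, without the Boolean minimization that a full analysis would add. The cases, the condition names (`digital_workforce`, `data_available`), and the outcome (`adopts_ai`) are hypothetical and do not come from the chapter or from [48].

```python
from collections import defaultdict

# Hypothetical coded cases: 1 = condition (or outcome) present, 0 = absent.
cases = [
    {"digital_workforce": 1, "data_available": 1, "adopts_ai": 1},
    {"digital_workforce": 1, "data_available": 1, "adopts_ai": 1},
    {"digital_workforce": 1, "data_available": 0, "adopts_ai": 0},
    {"digital_workforce": 0, "data_available": 1, "adopts_ai": 1},
    {"digital_workforce": 0, "data_available": 1, "adopts_ai": 0},
    {"digital_workforce": 0, "data_available": 0, "adopts_ai": 0},
]


def truth_table(cases, conditions, outcome):
    """Group cases by configuration of conditions and compute consistency.

    Consistency is the share of cases in a configuration that show the outcome.
    """
    rows = defaultdict(lambda: [0, 0])  # configuration -> [n_cases, n_with_outcome]
    for case in cases:
        config = tuple(case[c] for c in conditions)
        rows[config][0] += 1
        rows[config][1] += case[outcome]
    return {
        config: {"n": n, "consistency": positives / n}
        for config, (n, positives) in rows.items()
    }


table = truth_table(cases, ["digital_workforce", "data_available"], "adopts_ai")
for config, row in sorted(table.items(), reverse=True):
    print(config, row)
```

Once qualitative observations are coded into set memberships in this way, they become countable, which is the kind of quantification that statistical algorithms require.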
Despite this range of approaches to quantifying data, quantification remains the most challenging drawback for statistics-based algorithms in the humanities and social disciplines. When situations are too uncertain, ambiguous, and without any precedent, the intuitive mode of decision may be more helpful [2]. The first enemy of AI approaches is the lack of historical data and information; without such precedents, AI is blind [49]. Unfortunately, problems arising from worldwide disasters can disrupt information-based decisions made through rationality, logic, and probability analysis. Therefore, algorithms are still ill-equipped to tackle this kind of problem. The COVID-19 pandemic is the best example of such conditions: all AI approaches proved useless and incapable of predicting the social impact and intensity of the spread during the first months of the pandemic, between January and July 2020, in Europe [37].
55.5 Discussion and Outcomes
Based on the study above, through our visual, quantitative, and qualitative analyses, we identify five main negative factors that slow the progress of AI in soft sciences. To better illustrate the interplay of positive and negative effects that influence the use of AI in soft sciences, we synthesize the evolution of works in Fig. 55.4. We shaped the trends according to the levels of maturity and intensity of AI approaches, going from descriptive analytics in AI (Level 1), where research in soft sciences is better prepared and more mature, toward the more advanced cognitive analytics of AI (Level 5), where research in soft sciences is struggling to find a place and lacks development.
55.6 Conclusion, Limitations, and Perspectives
The soft sciences are concerned with societies, people, and organizations, and deal with many issues such as representing knowledge, revealing hidden patterns, causality mapping, prediction, perspective, and others. These shared features create significant potential for making AI interact with the soft disciplines. This paper examines AI-based research in soft sciences, specifically in organizational management, using a quantitative–qualitative analysis approach. The proposed approach reveals that the growing body of work exploits the features shared with AI fields. Examples are information-based reasoning in Expert Systems, automatic self-evolution in Adaptive Systems, predictive methods in Machine Learning, and causality relationships in Formal Computing. The paper discusses figures and trends with argumentation and tries to explain AI tendencies. Future extensions of this research should investigate the micro-impacts of each of these branches (machine learning, expert systems, adaptive intelligence, etc.) and enable the measurement of these impacts in
Fig. 55.4 Explanative illustration of the AI evolution in soft sciences
an inclusive collection of fields. The objective would be to reveal how these techniques can generate value-added factors in soft sciences. The limits of this research are related to the wide range and length of the academic literature we had to examine. We could not cover all the disciplines, and thus could not present the trends of the entire field, because that would necessarily exceed the limits of one paper. Therefore, we explored only one meta-discipline, management science, to try to generate significant results. Another weakness is that our results are based on the academic literature, which changes and expands every day, especially in AI research. Given the rapidity of AI development and the volume of research projects, it is likely that the outcomes delivered in this study will be significantly less pertinent in a few years than they are now. Despite these limitations, the research in this paper brings valuable new insight for soft sciences. It offers a view through the binoculars of an AI expert, which might be biased with respect to the soft sciences because they certainly have particularities that differ from the engineering fields. The most important outcome of this research is that soft scientists and AI specialists should join forces and invest in AI capabilities while respecting the ethics of AI use. AI should be the enabler of thinking and creativity, not their killer.
References
1. Chankyu L, Hyeongjoo K (2020) Groundwork of artificial intelligence humanities. Jahr-Eur J Bioethics 11(1):189–207
2. Abubakar AM, Behravesh E, Rezapouraghdam H, Yildiz SB (2019) Applying artificial intelligence technique to predict knowledge hiding behavior. Int J Inf Manage 49:45–57
3. Grover P, Kar AK, Dwivedi YK (2020) Understanding artificial intelligence adoption in operations management: insights from the review of academic literature and social media discussions. Ann Oper Res 1–37
4. Brynjolfsson E, Rock D, Syverson C (2017) Artificial intelligence and the modern productivity paradox: a clash of expectations and statistics. National Bureau of Economic Research, No. w24001
5. Bostrom N (2005) Transhumanist values. J Philos Res 30:3–14
6. Frey CB, Osborne MA (2017) The future of employment: how susceptible are jobs to computerisation? Technol Forecast Soc Chang 114:254–280
7. Ajit P (2016) Prediction of employee turnover in organizations using machine learning algorithms. Algorithms 4(5):C5
8. Ernst E, Merola R, Samaan D (2019) Economics of artificial intelligence: implications for the future of work. J Labor Policy 9(1)
9. Demir KA, Döven G, Sezen B (2019) Industry 5.0 and human-robot co-working. Procedia Comput Sci 158:688–695
10. Agrawal A, Gans JS, Goldfarb A (2019) Artificial intelligence: the ambiguous labor market impact of automating prediction. J Econ Perspect 33(2):31–50
11. Kurzweil R (2004) The law of accelerating returns. In: Alan Turing: life and legacy of a great thinker. Springer, Berlin
12. von Krogh G (2018) Artificial intelligence in organizations: new opportunities for phenomenon-based theorizing. Academy of Management Discoveries
13. Aghion P, Jones BF, Jones CI (2017) Artificial intelligence and economic growth. National Bureau of Economic Research, 23928
14. Rich E (1985) Artificial intelligence and the humanities. Comput Humanit 117–122
15. Orengo HA et al (2020) Automated detection of archaeological mounds using machine-learning classification of multisensor and multitemporal satellite data. Proc Natl Acad Sci 117(31):18240–18250
16. Hecker A (2012) Knowledge beyond the individual? Making sense of a notion of collective knowledge in organization theory. Organ Stud 33(3):423–445
17. Aldabbas H, Bajahzar A, Alruily M, Qureshi AA, Latif RMA, Farhan M (2020) Google play content scraping and knowledge engineering using natural language processing techniques with the analysis of user reviews. J Intell Syst
18. Ramalingam VV, Pandian A, Chetry P, Nigam H (2018) Automated essay grading using machine learning algorithm. J Phys Conf Ser 1000:012030
19. Bang SH (2014) Thinking of artificial intelligence cyborgization with a biblical perspective (anthropology of the old testament). Eur J Sci Theol 10(3):15–26
20. Liu L, Silva EA, Wu C, Wang H (2017) A machine learning-based method for the large-scale evaluation of the qualities of the urban environment. Comput Environ Urban Syst 65:113–125
21. Simms JR (2011) Making the soft sciences hard: the Newton model systems. Res Behav Sci 28(1):40–50
22. Curado C, Henriques PL, Oliveira M, Matos PV (2016) A fuzzy-set analysis of hard and soft sciences publication performance. J Bus Res 69(11):5348–5353
23. Fulmer R, Joerin A, Gentile B, Lakerink L, Rauws M (2018) Using psychological artificial intelligence (Tess) to relieve symptoms of depression and anxiety: randomized controlled trial. JMIR Mental Health 5(4):e64
24. Dang TNY (2018) The nature of vocabulary in academic speech of hard and soft-sciences. Engl Specif Purp 51:69–83
25. Jarrahi MH (2018) Artificial intelligence and the future of work: human-AI symbiosis in organizational decision making. Bus Horiz 61(4):577–586
26. Coglianese C, Lehr D (2016) Regulating by robot: administrative decision making in the machine-learning era. Geo LJ 105:1147
27. Sion G (2018) How artificial intelligence is transforming the economy. Will cognitively enhanced machines decrease and eliminate tasks from human workers through automation? J Self-Gov Manag Econ 6(4):31–36
28. Zur Muehlen M (2004) Organizational management in workflow applications–issues and perspectives. Inf Technol Manage 5(3–4):271–291
29. Robertson CB (2010) Organizational management of conflicting professional identities 43:603
30. Naoum S (2001) People and organizational management in construction. Thomas Telford, New York
31. Geetha R, Bhanu SRD (2018) Recruitment through artificial intelligence: a conceptual study. Int J Mech Eng Technol 9(7):63–70
32. Dirican C (2015) The impacts of robotics, artificial intelligence on business and economics. Procedia Soc Behav Sci 195:564–573
33. Walsbergerová T (2018) Laughing at robots: synthesising humour and cyber-paranoia in portrayals of artificial intelligence in Welcome to Night Vale. Eur J Humour Res 6(3):1–12
34. Alaarj S, Mohamed ZA (2017) Do knowledge management capabilities reduce the negative effect of environment uncertainties on organizational performance? A study of public listed companies in Malaysia. Int J Econ Res
35. Biddle JB (2020) On predicting recidivism: epistemic risk, tradeoffs, and values in machine learning. Can J Philos 1–21
36. Chiravuri A, Nazareth D, Ramamurthy K (2011) Cognitive conflict and consensus generation in virtual teams during knowledge capture: comparative effectiveness of techniques. J Manag Inf Syst 28(1):311–350
37. Coombs C (2020) Will COVID-19 be the tipping point for the intelligent automation of work? A review of the debate and implications for research. Int J Inf Manag 55:102182
38. E Silva LC, Costa APCS (2013) Decision model for allocating human resources in information system projects. Int J Project Manag 31(1):100–108
39. Kelly R (2018) Constructing leadership 4.0: Swarm leadership and the fourth industrial revolution. Springer, Berlin
40. Ponsteen A, Kusters RJ (2015) Classification of human and automated resource allocation approaches in multi-project management. Procedia Soc Behav Sci 194:165–173
41. Brougham D, Haar J (2018) Smart technology, artificial intelligence, robotics, and algorithms (STARA): employees’ perceptions of our future workplace. J Manag Organ 24(2):239–257
42. VOSviewer, Leiden University. VOSviewer, 4 Nov 2021. https://www.vosviewer.com/. Accessed 11 Nov 2021
43. Metcalf L, Askay DA, Rosenberg LB (2019) Keeping humans in the loop: pooling knowledge through artificial swarm intelligence to improve business decision making. Calif Manag Rev 61(4):84–109
44. Zhen L, Huang GQ, Jiang Z (2010) An inner-enterprise knowledge recommender system. Expert Syst Appl 37(2):1703–1712
45. Frank MR et al (2019) Toward understanding the impact of artificial intelligence on labor. Proc Natl Acad Sci 116(14):6531–6539
46. McLennan S, Lee MM, Fiske A, Celi LA (2020) AI ethics is not a panacea. Am J Bioethics 20(11):20–22
47. Korinek A, Stiglitz JE (2017) Artificial intelligence and its implications for income distribution and unemployment. Natl Bur Econ Res:24174
48. Kan AKS, Adegbite E, El Omari S, Abdellatif M (2016) On the use of qualitative comparative analysis in management. J Bus Res 69(4):1458–1463
49. An Y, An J, Cho S (2020) Artificial intelligence-based predictions of movie audiences on opening Saturday. Int J Forecast 37(1):274–288
50. Turban E, Sharda R, Delen D (2010) Decision support and business intelligence systems
51. Huang Z, van der Aalst WM, Lu X, Duan H (2011) Reinforcement learning based resource allocation in business process management. Data & Knowl Eng 70(1):127–145
52. Bouajaja S, Dridi N (2017) A survey on human resource allocation problem and its applications. Oper Res 17(2):339–369
53. Jin XH, Zhang G (2011) Modelling optimal risk allocation in PPP projects using artificial neural networks. Int J Proj Manag 29(5):591–603
54. Yao JM (2013) Scheduling optimisation of co-operator selection and task allocation in mass customisation supply chain based on collaborative benefits and risks. Int J Prod Res 51(8):2219–2239
55. Paterakis NG, Erdinc O, Bakirtzis AG, Catalão JP (2015) Optimal household appliances scheduling under day-ahead pricing and load-shaping demand response strategies. IEEE Trans Ind Inform 11(6):1509–1519
56. Tkachenko V, Kuzior A, Kwilinski A (2019) Introduction of artificial intelligence tools into the training methods of entrepreneurship activities. J Entrep Educ 22(6):1–10
57. Lee A, Inceoglu I, Hauser O, Greene M (2020) Determining causal relationships in leadership research using Machine Learning: The powerful synergy of experiments and data science. Leadersh Q:101426
58. Lee VH, Foo AT, Leong LY, Ooi KB (2016) Can competitive advantage be achieved through knowledge management? A case study on SMEs. Expert Syst Appl 65:136–151
59. Carley KM, Prietula MJ (2014) Computational organization theory. Psychology Press
60. Kou G, Chao X, Peng Y, Alsaadi FE, Herrera-Viedma E (2019) Machine learning methods for systemic risk analysis in financial sectors. Technol Econ Dev Econ 25(5):716–742
61. Poh CQ, Ubeynarayana CU, Goh YM (2018) Safety leading indicators for construction sites: A machine learning approach. Autom Constr 93:375–386
62. Woolley AW, Aggarwal I, Malone TW (2015) Collective intelligence in teams and organizations. Handb Collect Intell:143–168
63. Mohapatra M, Sahu P (2017) Optimizing the recruitment funnel in an ITES company: An analytics approach. Procedia Comput Sci 122:706–714
64. van Zoonen W, Toni GL (2016) Social media research: The application of supervised machine learning in organizational communication research. Comput Hum Behav 63:132–141
65. West A, Clifford J, Atkinson D (2018) “Alexa, build me a brand” An Investigation into the impact of Artificial Intelligence on Branding. Bus & Manag Rev 9(3):321–330
66. Jebelli H, Khalili MM, Hwang S, Lee S (2018) A supervised learning-based construction workers’ stress recognition using a wearable electroencephalography (EEG) device. In: Construction research congress pp 43–53
67. Facchinetti G, Addabbo T, Pirotti T, Mastroleo G (2012) A fuzzy approach to face the multidimensional aspects of well-being. IEEE Annu Meet N Am Fuzzy Inf Process Soc: 1–6
68. Legg SJ, Olsen KB, Laird IS, Hasle P (2015) Managing safety in small and medium enterprises
69. Liu Y, Zhang L, Nie L, Yan Y, Rosenblum D (2016) Fortune teller: predicting your career path. In: Proceedings of Association for the Advancement of Artificial Intelligence Conference pp 201–207
70. Ryan P, Luz S, Albert P, Vogel C, Normand C, Elwyn G (2019) Using artificial intelligence to assess clinicians’ communication skills. Br Med J 364:l161
71. Ransbotham S, Kiron D, Gerbert P, Reeves M (2017) Reshaping business with artificial intelligence: Closing the gap between ambition and action. MIT Sloan Manag Rev 59(1)
72. Caron F, Vanthienen J, Baesens B (2013) Comprehensive rule-based compliance checking and risk management with process mining. Decis Support Syst 54(3):1357–1369
73. Pinho I, Rego A, e Cunha MP (2012) Improving knowledge management processes: a hybrid positive approach. J Knowl Manag
74. Malone TW, Bernstein MS (2015) Handbook of collective intelligence. MIT Press
75. Kim A, Cho M, Ahn J, Sung Y (2019) Effects of gender and relationship type on the response to artificial intelligence. Cyberpsychology, Behav, Soc Netw 22(4):249–253
76. Meihami B, Meihami H (2014) Knowledge Management a way to gain a competitive advantage in firms (evidence of manufacturing companies). Int Lett Soc Humanistic Sci 3(14):80–91
77. Shen W (2019) Multi-agent systems for concurrent intelligent design and manufacturing. CRC Press
Chapter 56
Multi-criteria Decision-Making for Supplier Selection Using Performance Metrics and AHP Software. A Literature Review Elisa Marlen Torres-Sanchez, Jania Astrid Saucedo-Martinez, Jose Antonio Marmolejo-Saucedo, and Roman Rodriguez-Aguilar
E. M. Torres-Sanchez (B) · J. A. Marmolejo-Saucedo
Facultad de Ingenieria, Universidad Panamericana, Augusto Rodin 498, Ciudad de Mexico 03920, Mexico

J. A. Saucedo-Martinez
Facultad de Ingeniería Mecánica y Eléctrica, Universidad Autónoma de Nuevo León, Ciudad Universitaria, San Nicolás de los Garza, Nuevo León, Mexico

R. Rodriguez-Aguilar
Facultad de Ciencias Economicas y Empresariales, Universidad Panamericana, Augusto Rodin 498, Ciudad de Mexico 03920, Mexico

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_56

56.1 Introduction

When COVID-19 arrived, a market that had been slowly moving into the online economy changed quickly. The world stopped, and people were forced to lock themselves down in their homes, which redirected their shopping behavior. With more than 3.9 billion people in quarantine, equivalent to half the planet, the e-commerce boom was consolidated (Enrico [14]). The change that other countries experienced also affected Mexico: a survey revealed that 40% of respondents were spending more time searching online for products and/or services after the COVID-19 outbreak (Chevalier [12]). The number of people who preferred shopping in physical stores (the traditional way) decreased drastically because of pandemic-associated factors such as fear of contagion from overcrowding. Alongside these changes in behaviour, customers became aware of online discounted sales and the practicality of having products delivered at home (Chevalier [11]). Before COVID-19, the top two reasons for buying online were changes, the practicality of the delivery and the time saving, according to a survey made by Daniela [13]. Once e-commerce companies could offer a variety of products and promotions together with the safety of staying at home, customers started to change the way they see buying online. The number of e-commerce users in Mexico is forecast to reach a total of 77.9 million people by 2025 (Rotar [31]).

With this panorama, it is more essential than ever that companies entering the world of e-commerce understand which criteria users consider when buying online. Companies need to understand consumer needs so that they can give the customer all the tools necessary to make the last click. According to a survey carried out in Mexico in June 2020, the four most important factors for Mexicans at the time of purchase were quality, price, delivery time and delivery cost (Mera [27]). The survey covered 600 Mexican internet users and also found that the least attractive characteristics of buying online were delivery cost and delivery time. When talking about product delivery to the final consumer in an urban area, it is essential to bring last-mile delivery into the conversation. The last mile has become a much more important issue in recent years and has brought certain challenges when delivering to the final customer (Boysen et al. [7]). It is extremely important that companies entering e-commerce, and those already working in this area, know the key criteria to consider when choosing a particular distribution channel.

The content of this paper is organized as follows. Section 56.2 details a literature review of the different types of models in which the key performance indicators can be structured, and Sub-sections 56.2.1.1 to 56.2.1.6 present the most important performance measurements used today across all types of industries. Section 56.3 covers the most popular and easiest-to-use analytic hierarchy processes for multi-criteria decisions, and Sub-section 56.3.2 shows some case studies for each multi-criteria process. Finally, Sect. 56.4 offers the conclusions and the steps to apply the knowledge obtained in this work to any later case study.
56.2 Methodology

As mentioned before, the constant growth of e-commerce pushes companies to seek better prices and alliances with strong suppliers that will help boost their performance. To support this, we carried out a literature review in which the key metrics essential for a top supplier were identified and categorized according to different authors. We also researched tools for multi-criteria decision-making in order to select the one that best fits those metrics.
56.2.1 Literature Review

With e-commerce growing constantly thanks to COVID-19, the selection of logistics providers has become crucial for any organization that wants to grow in the market, stay competitive and demonstrate better results. For this reason, this section catalogs the most popular performance indicators according to several authors, following a process of identification, understanding, and selection. A literature review was carried out to understand the most important metrics for different authors, and 28 papers were analyzed to find a hierarchy among them. The search was made on different platforms: most of the papers were found in Science Direct, followed by Scopus, Google Scholar, and SpringerLink, and the search covers the years 2000 to 2021. Based on this review, we divided all the performance metrics found into six criterion groups, and each group was broken down into key sub-criteria. The groups were defined as follows:

1. Cost
2. Intangible
3. Information Technology
4. Performance
5. Quality
6. Service

The groups listed above are not ordered from most to least important.
56.2.1.1 Cost Metrics
The authors listed below each mention cost as one of the most important metrics to consider when selecting a supplier. In this fast-paced world, many suppliers in the logistics industry can help a company become more efficient and more competitive, but some charge more than expected even while performing badly. That is why several authors do not rank cost as the number one metric: it needs to be accompanied by other key metrics. Some of the authors who mention cost as one of the most important metrics are: Hwang et al. [19], Karakaya et al. [21], Tu et al. [35], Sagar and Singh [32], Aguezzoul [1], Senthil et al. [34], Ayhan and Kilic [2], Chen [10], Ho et al. [18], Vaidyanathan [36], Garg [15], Luthra et al. [26], Golmohammadi and Mellat-Parast [16], Babbar and Amin [3], Yazdi et al. [38]. Appearance: 53.6%.
56.2.1.2 Intangible Metrics
Certain authors mention that important qualitative factors include the reputation (experience) of the supplier, its financial stability, and its geographical location. Some even note that being allied with supplier 'W' rather than supplier 'X' attracts more attention from the market. Some of the authors who mention an intangible metric as one of the most important are: Sagar and Singh [32], Aguezzoul [1], Senthil et al. [34], Liou [25], Bottani and Rizzi [6], Braglia and Petroni [8]. Appearance: 21.4%.
56.2.1.3 Information Technology Metrics
When talking about Information Technology, more commonly known as IT, some authors mention that 3PL suppliers with advanced IT are expected to reduce their logistics costs and increase the productivity of the supply chain [36]. The sub-criteria considered for the IT metric include data security, the ability to react to technology-related uncertainty, system stability, and others. Some of the authors who mention an information technology metric as one of the most important are: Hwang et al. [19], Prakash and Barua [29], Aguezzoul [1], Senthil et al. [34], Khaleie et al. [22], Vaidyanathan [36], Bottani and Rizzi [6], Braglia and Petroni [8], Golmohammadi and Mellat-Parast [16]. Appearance: 32.1%.
56.2.1.4 Performance Metrics
Over the years covered by the papers analyzed, demand for e-commerce has grown, and customers ask for the easiest ways to buy online and to receive everything they want, when they want it and wherever they are. As a result, performance metrics have become one of the most critical factors to consider when selecting a new supplier. Performance involves more than two or three sub-criteria. One of the most important is on-time delivery, which is the most mentioned in the papers and sometimes the most critical because of the complexity of delivering on time. Some of the authors who mention a performance metric as one of the most important are: Karakaya et al. [21], Tu et al. [35], Hwang et al. [19], Yu et al. [39], Chen and Yu [9], Ramlan and Qiang [30], Sarabi and Darestani [33], Aguezzoul [1], Senthil et al. [34], Chen [10], Ho et al. [18], Lau et al. [24], Garg [15], Gosling et al. [17], Golmohammadi and Mellat-Parast [16], Yazdi et al. [38]. Appearance: 57.1%.
56.2.1.5 Quality Metrics
Quality metrics refer to the quality of the service that the supplier can provide to the customer and to the continuous improvement of this service. Some of the authors who mention a quality metric as one of the most important are: Hwang et al. [19], Tu et al. [35], Sagar and Singh [32], Prakash and Barua [29], Aguezzoul [1], Ho et al. [18], Banaeian et al. [4], Babbar and Amin [3], Yazdi et al. [38]. Appearance: 32.1%.
56.2.1.6 Service Metrics
Service metrics cover what can be offered to the final client: the value-added service and the infrastructure of the supplier to handle any circumstance and give the best outcome to the client. However, the research shows service to be the least important metric, perhaps because some papers counted service under performance. Some of the authors who mention a service metric as one of the most important are: Bottani and Rizzi [6], Braglia and Petroni [8]. Appearance: 7.1%.

Figure 56.1 shows the percentage of papers in which each metric appears. It reaffirms that, for some authors, it is always important to consider several metrics as a group in order to obtain the best outcome. Even though some performance metrics, such as Cost or Performance, are common across industries, each industry focuses on different areas. The supply chain comprises all the stakeholders who participate in the delivery of a client's product. In this context, the supplier that delivers the product, the last link of the supply chain, is the most important one because it has direct contact with our customers, and we depend on its services to deliver what we have worked on from the beginning of the process. A good relationship with the delivery provider is fundamental, even more so if we are looking for a long relationship in which both parties can grow and improve for the client, so the selection of this supplier is of utmost importance.

Fig. 56.1 Frequency in literature review for each KPI
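The appearance percentages reported in Sects. 56.2.1.1 to 56.2.1.6 follow from the number of papers that mention each criterion group out of the 28 papers reviewed. The short Python sketch below is only an illustration: the names `MENTIONS` and `appearance_percentages` are our own, and the counts were obtained by counting the authors listed in each sub-section.

```python
# Number of reviewed papers, as stated in Sect. 56.2.1.
TOTAL_PAPERS = 28

# Papers mentioning each criterion group, counted from the author lists
# in Sects. 56.2.1.1 to 56.2.1.6.
MENTIONS = {
    "Cost": 15,
    "Intangible": 6,
    "Information Technology": 9,
    "Performance": 16,
    "Quality": 9,
    "Service": 2,
}


def appearance_percentages(mentions, total):
    """Return the share of papers (in %) mentioning each group, rounded to 0.1."""
    return {group: round(100 * count / total, 1) for group, count in mentions.items()}


if __name__ == "__main__":
    shares = appearance_percentages(MENTIONS, TOTAL_PAPERS)
    # Print the groups from most to least frequently mentioned.
    for group, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
        print(f"{group}: {share}%")
```

Sorting by share makes the ranking explicit: Performance and Cost lead, and Service comes last, which matches the discussion above.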
56.3 Results

56.3.1 Criteria for Selecting a Supplier

As mentioned before, the pressure from customers to receive products as fast as possible and at the lowest cost keeps increasing because some companies offer same-day delivery. That changes the rules of the game: it requires more vehicles, shorter delivery times, and an exceptional service level for the customer. To find a supplier that will work as a team with the company, it is necessary to have the right tools, namely the key performance indicators (KPIs) that show which supplier is the best match for the operational model. The extensive literature on this topic offers a wide variety of KPIs, so it is important to understand exactly what we are looking for when talking about performance metrics. Performance metrics, or KPIs, are quantifiable metrics that help us understand the performance of an organization and whether it is achieving its goals and objectives (Bauer [5]). There are different performance evaluation models in which the KPIs needed for a case study can be arranged and formulated. It is highly recommended that organizations use a performance measurement method and improve all projects involved in the achievement of their goals (Öztayşi and Sürer [28]). Some of the models that could be used are explained below:

• Balanced Score Card (BSC): examines the financial point of view as one of four criteria, namely the financial, customer, business, and innovation and learning views (Kaplan et al. [20]). In some studies, the BSC is used with other models to get the bigger picture through multi-dimensional performance indicators.
• Efficient Consumer Response (ECR): this model is a strategy for distributors and suppliers to maximize customer satisfaction and minimize costs. It seeks to transform a push system into a pull system.
• Supply Chain Operation Reference Model (SCOR): developed by the Supply Chain Council, this model helps to increase the supply chain effectiveness of any company. The SCOR model is designed as a tool to describe, measure, and evaluate any characteristic of the supply chain (Kurien and Qureshi [23]).
56.3.2 Tools for the Selecting Process

Multi-Criteria Decision-Making (MCDM) has become the fastest growing and most important subdivision of operations management and research for finding the best achievable solution. MCDM can be divided into two groups: MODM (Multi-Objective Decision-Making) and MADM (Multi-Attribute Decision-Making). For a supplier selection problem, the best tool would be MADM with a pairwise comparison method. This subdivision bases the decision-making process on a set of alternatives and criteria (Yalcin et al. [37]). The most popular method in this category is the Analytic Hierarchy Process (AHP), which was first proposed by Saaty in the 1970s. To structure the problem, the main objective is placed at the top, the criteria below it, and all the alternatives at the bottom. Besides the classic AHP, there are different fuzzy versions that can be applied where vague information needs to be analyzed. We searched for the most popular online AHP tools that can be applied in case studies:

• MakeItRational AHP Software
• BPMSG
• Expert Choice
• Transparent Choice
Each of the online tools above can help you and your organization understand and measure how projects are implemented to support the strategic objective the organization is trying to achieve, always seeking to improve decisions inside the company by prioritizing projects and making better choices about resources. Even though AHP is the most popular method, other methods could help in different situations and in different types of industries.
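To make the pairwise comparison idea behind AHP concrete, the following Python sketch derives criterion weights from a comparison matrix and checks its consistency. It is a minimal illustration under stated assumptions, not the procedure used by any of the tools listed above: the function name `ahp_weights`, the three criteria and the judgment values are hypothetical, the principal eigenvector is approximated by power iteration, and the random index values are those commonly attributed to Saaty.

```python
# Saaty's random consistency index (RI) for matrices of order 1 to 6.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24}


def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector of a pairwise comparison matrix
    by power iteration; return (weights, consistency_ratio)."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]  # keep the weights summing to 1
    # Estimate the principal eigenvalue from A w = lambda_max * w.
    product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    random_index = RANDOM_INDEX[n]
    consistency_ratio = consistency_index / random_index if random_index else 0.0
    return weights, consistency_ratio


# Hypothetical judgments for three of the criterion groups.
# Entry [i][j] states how much more important criterion i is than criterion j.
CRITERIA = ["Performance", "Cost", "Quality"]
JUDGMENTS = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]

if __name__ == "__main__":
    weights, ratio = ahp_weights(JUDGMENTS)
    for name, weight in zip(CRITERIA, weights):
        print(f"{name}: {weight:.3f}")
    print(f"Consistency ratio: {ratio:.3f}")
```

The judgments in this example are perfectly consistent, so the consistency ratio is essentially zero. With real expert judgments, a ratio below about 0.1 is the usual rule of thumb for accepting the matrix. The same weighting step would then be repeated to score each candidate supplier against every criterion.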
56.4 Conclusions

E-commerce has boomed around the world, and it is extremely important for a company to understand where it stands. Understanding where we stand, where we want to go, and with whom is even more important, because making this decision is not an easy job; that is why, for several years, different authors have looked for the best way to help workers make it. Using different performance metrics and the right approach and method, you can reach an answer to the problem presented and obtain the best possible outcome for the company without a large investment to bring in an expert in the field to perform the ranking calculation. Thanks to the high demand and the technology to which we now have access, it is easier to obtain this calculation through online tools that, with a few clicks, give the answer or insight needed to make a decision. Even with this technology within reach, we still need to research the performance metrics that affect us directly and that we consider important when selecting a new supplier, and to adapt everything to the scenario for which we are trying to find a solution.
References

1. Aguezzoul A (2014) Third-party logistics selection problem: a literature review on criteria and methods. Omega 49:69–78
2. Ayhan MB, Kilic HS (2015) A two stage approach for supplier selection problem in multi-item/multi-supplier environment with quantity discounts. Comput Ind Eng 85:1–12
3. Babbar C, Amin SH (2018) A multi-objective mathematical model integrating environmental concerns for supplier selection and order allocation based on fuzzy QFD in beverages industry. Expert Syst Appl 92:27–38
4. Banaeian N, Mobli H, Nielsen IE, Omid M (2015) Criteria definition and approaches in green supplier selection: a case study for raw material and packaging of food industry. Prod Manuf Res 3(1):149–168
5. Bauer K (2004) KPIs: the metrics that drive performance management. DM Review
6. Bottani E, Rizzi A (2006) A fuzzy TOPSIS methodology to support outsourcing of logistics services. Supply Chain Managem: An Int J 11(4):294–308
7. Boysen N, Fedtke S, Schwerdfeger S (2021) Last-mile delivery concepts: a survey from an operational research perspective. OR Spectrum 43(1):1–58
8. Braglia M, Petroni A (2000) A quality assurance-oriented methodology for handling trade-offs in supplier selection. Int J Phys Distrib Logistics Managem 30(2):96–112
9. Chen F, Yu B (2005) Quantifying the value of leadtime information in a single location inventory system. Manuf Serv Oper Manag 7(2):144–151
10. Chen Y-J (2011) Structured methodology for supplier selection and evaluation in a supply chain. Inf Sci 181(9):1651–1670
11. Chevalier S (n.d.-a) Reasons for buying online during the coronavirus pandemic in Mexico in April 2020. Statista
12. Chevalier S (n.d.-b) Share of consumers who spent more time either searching or buying products online after the COVID-19 outbreak in Mexico as of June 2020. Retrieved from https://www.statista.com/statistics/1135008/shareconsumers-increased-online-purchases-covid19-mexico/
13. Daniela O, L. E. S. O. F. V. A. G. P. O., Ricardo B (n.d.) Reporte 2.0. Impacto COVID-19 en venta online Mexico. Retrieved from https://www.amvo.org.mx/estudios/reporte-2-impactocovid-19-en-ventaonline-mexico/
14. Enrico C (n.d.) El efecto de COVID-19 en el ecommerce. Retrieved from https://www.forbes.com.mx/el-efecto-de-covid-19-en-el-ecommerce/
15. Garg RK (2021) Structural equation modeling of e-supplier selection criteria in mechanical manufacturing industries. J Cleaner Prod 311:127597
16. Golmohammadi D, Mellat-Parast M (2012) Developing a grey-based decision making model for supplier selection. Int J Prod Econom 137(2):191–200
17. Gosling J, Purvis L, Naim MM (2010) Supply chain flexibility as a determinant of supplier selection. Int J Prod Econ 128(1):11–21
18. Ho W, Xu X, Dey PK (2010) Multi-criteria decision making approaches for supplier evaluation and selection: a literature review. Eur J Oper Res 202(1):16–24
19. Hwang B-N, Chen T-T, Lin J (2016) 3PL selection criteria in integrated circuit manufacturing industry in Taiwan. Supply Chain Managem: An Int J 21:103–124. https://doi.org/10.1108/SCM-03-2014-0089
20. Kaplan RS, Norton DP et al (2005) The balanced scorecard: measures that drive performance. Harv Bus Rev 83(7):172
21. Karakaya S, Savasaneril S, Serin Y (2021) Pricing with delivery time information sharing decisions in service systems. Comput Ind Eng 159:107459
22. Khaleie S, Fasanghari M, Tavassoli E (2012) Supplier selection using a novel intuitionist fuzzy clustering approach. Appl Soft Comput 12(6):1741–1754
23. Kurien GP, Qureshi MN (2011) Study of performance measurement practices in supply chain management. Int J Business Managem Soc Sci 2(4):19–34
24. Lau HC, Lee CK, Ho GT, Pun K, Choy K (2006) A performance benchmarking system to support supplier selection. Int J Bus Perform Manag 8(2–3):132–151
25. Liou JJ (2012) Developing an integrated model for the selection of strategic alliance partners in the airline industry. Knowledge-Based Syst 28:59–67
26. Luthra S, Govindan K, Kannan D, Mangla SK, Garg CP (2017) An integrated framework for sustainable supplier selection and evaluation in supply chains. J Cleaner Prod 140:1686–1698
27. Mera I (2020) Mexicanos le 'agarran el gusto' a las compras por internet. El Financiero
28. Öztayşi B, Sürer Ö (2014) Supply chain performance measurement using a SCOR based fuzzy VIKOR approach. In: Kahraman C, Öztayşi B (eds) Supply chain management under fuzziness: Recent developments and techniques, Springer Berlin, Heidelberg, pp 199–224
29. Prakash C, Barua M (2016) An analysis of integrated robust hybrid model for third-party reverse logistics partner selection under fuzzy environment. Resour Conserv Recycl 108:63–81
30. Ramlan R, Qiang LW (2014) An analytic hierarchy process approach for supplier selection: a case study. In: 3rd International conference on global optimization and its application (ICOGOIA 2014), pp 9–12
31. Rotar A (n.d.) Number of e-commerce users in Mexico from 2017 to 2025. Retrieved from https://www.statista.com/forecasts/251662/e-commerce-users-in-mexico
32. Sagar MK, Singh D (2012) Supplier selection criteria: study of automobile sector in India. Int J Eng Res Developm 4(4):34–39
33. Sarabi EP, Darestani SA (2021) Developing a decision support system for logistics service provider selection employing fuzzy MULTIMOORA and BWM in mining equipment manufacturing. Appl Soft Comput 98:106849
34. Senthil S, Srirangacharyulu B, Ramesh A (2014) A robust hybrid multi criteria decision making methodology for contractor evaluation and selection in third-party reverse logistics. Expert Syst Appl 41(1):50–58
35. Tu L, Lv Y, Zhang Y, Cao X (2021) Logistics service provider selection decision making for healthcare industry based on a novel weighted density-based hierarchical clustering. Adv Eng Inform 48:101301
36. Vaidyanathan G (2005) A framework for evaluating third-party logistics. Commun ACM 48(1):89–94
37. Yalcin AS, Kilic HS, Delen D (2022) The use of multi-criteria decision-making methods in business analytics: a comprehensive literature review. Technol Forecasting and Soc Change 174:121193. https://doi.org/10.1016/j.techfore.2021.121193
38. Yazdi AK, Wanke PF, Hanne T, Abdi F, Sarfaraz AH (2021) Supplier selection in the oil gas industry: a comprehensive approach for multi-criteria decision analysis. Socio-Econom Plann Sci 101142
39. Yu Z, Yan H, Cheng TE (2002) Modelling the benefits of information sharing-based partnerships in a two-level supply chain. J Operat Res Soc 53(4):436–446
Chapter 57
PID Controller and Intelligent Control for Renewable Energy Systems Pedro Domínguez Alva, Jose Antonio Marmolejo Saucedo, and Roman Rodriguez-Aguilar
P. D. Alva (B) · J. A. M. Saucedo
Facultad de Ingenieria, Universidad Panamericana, Augusto Rodin 498, Ciudad de Mexico 03920, Mexico

R. Rodriguez-Aguilar
Facultad de Ciencias Economicas y Empresariales, Universidad Panamericana, Augusto Rodin 498, Ciudad de Mexico 03920, Mexico

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_57

57.1 Introduction

Before going into the main topic, "PID controller and intelligent control for Renewable Energy systems", we must understand the main concepts that it encompasses: PID controllers and renewable energy systems. Let's go from the simplest to the most complex. We have all heard of renewable energy, but have you ever stopped to think about the types of renewable energy and the different methods that exist to obtain it? There are different types of renewable energy, such as wind energy, geothermal energy, solar energy, etc. As shown in the paper of Pacesila [18], some of the main characteristics of renewable energies are:

• Reduction of climate change: since no coal or oil derivatives are burned, greenhouse gas emissions decrease.
• Inexhaustible: gas, coal, oil, etc., are energy sources that we use today and that will eventually run out, whereas renewable energies are considered infinite.
• Available to everyone: this type of energy is currently being used to provide electricity to areas that are difficult to access.

Now that we know a little about renewable energies, let's switch to PID controllers. As shown in the paper of Bansal [18], PID controllers date back to the 1890s, and most PID applications are industrial because of the controller's simplicity and ease of re-tuning.
In this chapter we discuss several aspects of PID controllers, namely their history, operation, and various applications, until we reach the main topic: the PID controllers used in renewable energy systems.
57.2 Brief History of PID Controllers and Their Operation

The proportional-integral-derivative (PID) controller was developed by Elmer Sperry, and approximately 90% of the closed-loop operations of the industrial automation sector use PID controllers [15]. The PID is a control mechanism that uses closed-loop feedback to regulate speed, temperature, pressure, flow, and other process variables. The PID controller calculates the difference between the actual value of the variable and the desired value. To design a PID control we need to talk about the P, I and D control modes. The following list is based on the articles by Borase and by Ang and Chong [4, 8].

• P controller: provides a control action proportional to the error signal through a gain factor. The desired point is compared with the actual value, and the resulting error is multiplied by a constant to provide the output. If the error value is zero, the controller output is also zero.
• I controller: the goal of this controller is to reduce the error left by the P controller, because a P controller alone leaves an offset between the process variable and the set point.
• D controller: the function of this controller is to improve the transient response through the high-frequency compensation of a differentiator. In other words, integral controllers are not able to predict a future error, so derivative controllers are intended to reduce or eliminate that future error.

From the above we can understand that the control signal is the sum of the three terms P, I and D. Ho and Lin [12] explain that the output of a PID controller, which is the input to the plant, is calculated from the feedback error as in the following equation.

$$u(t) = K_p e(t) + K_i \int e(t)\,dt + K_d \frac{de(t)}{dt} \tag{57.1}$$

For a PID controller working in a closed-loop system described by Eq. (57.1), the variable e represents the tracking error, that is, the difference between the desired output r and the actual output y. The error e is fed to the PID controller, which computes both its derivative and its integral. The control signal u sent to the plant equals the proportional gain K_p times the magnitude of the error, plus the integral gain K_i times the integral of the error, plus the derivative gain K_d times the derivative of the error.
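As an illustration of Eq. (57.1), the Python sketch below implements a discrete-time approximation of the PID law and closes the loop around a simple plant. Everything beyond the equation itself is an assumption made for the example: the class name `PIDController`, the gains, the sampling time, and the hypothetical first-order plant dy/dt = -y + u are not taken from the cited papers.

```python
class PIDController:
    """Discrete-time approximation of u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.previous_error = None  # no derivative action on the first sample

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement        # e = r - y
        self.integral += error * dt           # rectangular integration of the error
        if self.previous_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.previous_error) / dt  # backward difference
        self.previous_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(controller, setpoint=1.0, dt=0.01, steps=3000):
    """Close the loop around a hypothetical first-order plant dy/dt = -y + u."""
    y = 0.0
    for _ in range(steps):
        u = controller.update(setpoint, y, dt)  # control signal fed to the plant
        y += dt * (-y + u)                      # explicit Euler step of the plant
    return y


if __name__ == "__main__":
    # Illustrative gains only; real gains depend on the plant being controlled.
    pid = PIDController(kp=2.0, ki=1.0, kd=0.1)
    print(f"Output after 30 s: {simulate(pid):.4f}")
```

Skipping the derivative on the first sample avoids a large spike when the setpoint is applied as a step. The integral term is what removes the steady-state offset, so the simulated output settles at the setpoint.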
The control signal u is fed into the plant and the output y is obtained. This output is compared with the reference to find a new error signal e, and the controller repeats this process while in operation. Taking the Laplace transform of Eq. (57.1) gives the transfer function of a PID controller:

$$K_p + \frac{K_i}{s} + K_d s = \frac{K_d s^2 + K_p s + K_i}{s} \tag{57.2}$$
where K_d is the derivative gain, K_i is the integral gain, and K_p is the proportional gain.

The general effects on a closed-loop system of increasing each controller parameter (K_p, K_i, K_d) are summarized in the table below, which was compiled from [9, 13].

| Parameter | Rise time | Overshoot | Settling time | Steady-state error | Stability |
|---|---|---|---|---|---|
| K_p | Decrease | Increase | Small increase | Decrease | Degrade |
| K_i | Decrease | Increase | Increase | Decrease | Degrade |
| K_d | Small decrease | Decrease | Decrease | No change | Improve |
Drawing on different journals [1, 11], we can recapitulate. The PID controller reduces errors and provides greater accuracy and stability in the process. It achieves this through its integral and derivative actions; by eliminating control deviation errors, more efficient and faster processes are obtained. The PID controller, along with its inputs and outputs, is represented graphically in Fig. 57.1.
Fig. 57.1 Graphical representation of a PID controller
57.3 Application of the PID Controller

With the above explanation the following question may arise: when can I use a PID controller? As we have seen, PID controllers handle variables such as temperature, flow, and pressure, so they are found in industries with thermal processes; in the scientific field they are used to control furnace temperatures or the pressures and flows of gases required in some incubators. Before continuing, we must be aware that there are different kinds of controllers, namely ON/OFF, proportional, and standard-type PID controllers, which are explained below. The headings of the following list were taken from [1], and the definitions are a compilation of the articles by Bajpai, Moradi, Agarwal and Libretexts [5, 15, 17].

• ON/OFF Control: used in non-critical applications where the error between the setpoint and the plant output can vary by a relatively large amount. Examples are the refrigerators or air conditioners in our homes. This kind of control is the simplest one and is typically used to control temperature.
• Proportional Control: "Proportional controllers give an output to the actuator that is a multiple of (proportional to) the error; they respond to the size of the error". These controllers are designed to eliminate the cycling that the ON/OFF controller generates; the controller tries to reach the set point while maintaining a constant temperature. This can be achieved by switching the output on and off for short periods of time.
• Standard Type PID Controller: combines P control with I control and D control so that the unit automatically compensates for changes in the system. These controllers are the most precise and stable of the three types.
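The difference between the first two controller types can be sketched in a few lines of Python. This is an illustrative example only: the function names, the hysteresis band of the ON/OFF controller and the clamping limits of the proportional controller are assumptions added here and do not come from the cited articles.

```python
def on_off_control(setpoint, measurement, heater_on, band=0.5):
    """Thermostat-style ON/OFF action with a hysteresis band (an assumption)."""
    if measurement < setpoint - band:
        return True       # too cold: switch the heater on
    if measurement > setpoint + band:
        return False      # too warm: switch the heater off
    return heater_on      # inside the band: keep the previous state


def proportional_control(setpoint, measurement, kp, u_min=0.0, u_max=1.0):
    """Output proportional to the error, clamped to the actuator range."""
    u = kp * (setpoint - measurement)
    return max(u_min, min(u_max, u))


if __name__ == "__main__":
    # Hypothetical room at 19.5 degrees with a 20 degree setpoint.
    print(on_off_control(20.0, 19.5, heater_on=False))
    print(proportional_control(20.0, 19.5, kp=0.4))
```

The ON/OFF controller can only switch between two states, so the temperature cycles around the setpoint. The proportional controller scales its output with the size of the error, which is what removes that cycling.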
As we have seen in the list above, there are several types of controllers, so the applications are diverse. The most common application is temperature control, where the input is a temperature sensor and the output is the signal that drives a fan or air conditioner. The same principle is used in both industrial and technological applications. The following applications are identified in [1, 6, 8, 22]:
• Temperature control of a furnace
• MPPT charge controller
• Power electronics converters
• PID for robotic manipulators
• PID controller interfacing
• PID for biomedical applications
• PID for mechanical systems
57.4 Importance of Renewable Energy Systems
Renewable energy systems are becoming more and more popular because the general population wants to lower its carbon footprint by using clean, renewable energy sources. The most common renewable energy technologies are listed below; some of the items were taken from the research of Alrikabi, Amrouche and Zhao [2, 3, 23].
• Wind energy: This energy is obtained from the wind.
• Biodiesel: An organic fuel used in the automotive industry, obtained from vegetable oils.
• Solar energy: This energy is obtained from the sun and is captured by photovoltaic means.
• Tidal energy: This energy is obtained from the rise and fall of ocean tides.
• Hydropower or hydroelectric power: This energy is obtained from rivers and freshwater streams.
Using renewable energy systems brings the following benefits:
• Environmental: Renewable energy systems provide environmental benefits by reducing greenhouse gas emissions. Zhao et al. give the following equation, based on pollutant emissions, to calculate this benefit [23]:

$$EB = (TQ_R \cdot \rho_c) \cdot \sum_{i=1}^{n} (EV_i \cdot PE_i) \tag{57.3}$$

where $EB$ is the environmental benefit, $TQ_R$ is the total renewable energy power generation, $\rho_c$ is the share of coal-fired power plants in the national total of electricity generation, $EV_i$ is the environmental value of the $i$th pollutant, $PE_i$ is the emission amount of the $i$th pollutant discharged, and $i = 1, \dots, n$ indexes the pollutants.
• Reduction of dependence on fossil fuels: Internal combustion vehicles are gradually being phased out and electric vehicles are gaining momentum. This reduces the consumption of fossil fuels, which helps to reduce the carbon footprint.
• Technological innovation: New processes for obtaining renewable energy are being developed continually, which promotes all kinds of technological innovation. This kind of innovation helps to reduce the cost gap between renewable energy power and conventional power.
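As a worked illustration of Eq. (57.3), the short Python sketch below evaluates the environmental benefit for two pollutants. The function name, the units and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from Zhao et al. [23].

```python
def environmental_benefit(tq_r, rho_c, env_values, emissions):
    """Eq. (57.3): EB = (TQ_R * rho_c) * sum_i(EV_i * PE_i).

    tq_r       -- total renewable energy power generation
    rho_c      -- share of coal-fired generation in the national total (0..1)
    env_values -- environmental value EV_i of each pollutant
    emissions  -- emission amount PE_i of each pollutant
    """
    if len(env_values) != len(emissions):
        raise ValueError("one EV_i and one PE_i are needed per pollutant")
    return tq_r * rho_c * sum(ev * pe for ev, pe in zip(env_values, emissions))


# Hypothetical inputs: two pollutants, arbitrary consistent units.
eb = environmental_benefit(
    tq_r=1000.0,
    rho_c=0.5,
    env_values=[2.0, 4.0],
    emissions=[0.5, 0.25],
)
print(eb)
```

With these inputs the sum over pollutants is 2.0 × 0.5 + 4.0 × 0.25 = 2.0, so the benefit is 1000 × 0.5 × 2.0 = 1000 in the chosen units.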
57.4.1 Methods of Obtaining Renewable Energies
As we saw earlier, there are several renewable energy systems, but how is this type of energy obtained? This section describes how renewable energy is generated, which will be helpful for the next topic [19–21].
• Solar energy: The sun heats the earth by means of solar radiation, which travels through space as particles called photons. Photovoltaic solar energy consists of directly transforming solar radiation into electricity. Solar panels do this with devices called photovoltaic cells, which are light-sensitive semiconductors. How does it work? When photons hit the cells, electron–hole pairs are generated; an external circuit turns these into a flow of electrons, and it is from this flow that electric current is obtained.
• Hydroelectric power: Hydroelectric power is electricity generated by harnessing the energy of moving water. A classic hydroelectric power plant consists of three parts: a power plant in which electricity is produced, a dam that can be opened and closed to control the flow of water, and a reservoir in which water is stored. In operation, the dam releases a flow of water that drives a turbine, which in turn generates electricity.
• Wind energy: A renewable energy source obtained from the kinetic energy of the wind. The wind turbine rotates on its tower to make the most of the wind's energy. The wind spins the blades, which are connected to a rotor, which in turn is connected to a gearbox that raises the RPM. This kinetic energy is transferred to the generator, which converts it into electrical energy.
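To give a sense of scale for the wind and hydroelectric descriptions above, the sketch below uses two standard textbook relations that are not stated in this chapter: the power captured from wind, P = ½ ρ A v³ C_p, and the power of a hydro turbine, P = η ρ g Q H. The default power coefficient, efficiency and example sizes are illustrative assumptions.

```python
import math

AIR_DENSITY = 1.225      # kg/m^3, typical sea-level value
WATER_DENSITY = 1000.0   # kg/m^3
GRAVITY = 9.81           # m/s^2


def wind_power(rotor_diameter_m, wind_speed_ms, power_coefficient=0.4):
    """Power (W) captured by a rotor: 0.5 * rho * A * v^3 * Cp."""
    swept_area = math.pi * (rotor_diameter_m / 2.0) ** 2
    return 0.5 * AIR_DENSITY * swept_area * wind_speed_ms ** 3 * power_coefficient


def hydro_power(flow_m3s, head_m, efficiency=0.9):
    """Power (W) of a hydro turbine: eta * rho * g * Q * H."""
    return efficiency * WATER_DENSITY * GRAVITY * flow_m3s * head_m


# Hypothetical plant sizes, for illustration only.
print(f"wind:  {wind_power(80.0, 10.0) / 1e6:.2f} MW")
print(f"hydro: {hydro_power(10.0, 20.0) / 1e6:.2f} MW")
```

The cubic dependence on wind speed is one reason wind fluctuations matter so much for control: doubling the wind speed multiplies the available power by eight.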
57.5 How Can We Link PID Controllers with Renewable Energy Systems? and How Would This Benefit Us?
In previous topics we saw how both renewable energy systems and PID controllers work. In this topic we will see how they could be integrated to obtain an intelligent system.
57.5.1 Solar Energy
Some articles, such as the one by Mitra and Swain [16], analyze a photovoltaic converter regulated by a PID controller. The motivation is that most modules deliver a low output voltage that varies with temperature, so a voltage converter is needed, and a controller is necessary to keep the converter's output voltage constant.
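The idea of holding a converter's output voltage constant while the panel voltage drifts can be sketched with a very simplified simulation. This is not the model used by Mitra and Swain [16]: the plant below is an assumed averaged boost converter (ideal gain V_in / (1 − D) followed by a first-order lag), and the gains, time constant, duty limit and voltages are all hypothetical.

```python
def simulate_boost_pid(v_in_profile, v_ref=48.0, kp=0.002, ki=0.5, kd=1e-5,
                       dt=1e-3, tau=0.01, duty_max=0.8):
    """Regulate the output of an idealized boost converter with a PID loop.

    v_in_profile -- panel voltage at each time step (V)
    Returns the output voltage at each time step (V).
    """
    v_out = v_in_profile[0]
    integral = 0.0
    prev_error = v_ref - v_out
    history = []
    for v_in in v_in_profile:
        error = v_ref - v_out
        # Integral term, clamped so it cannot wind up past the duty limit.
        integral = min(max(integral + error * dt, 0.0), duty_max / ki)
        derivative = (error - prev_error) / dt
        prev_error = error
        duty = kp * error + ki * integral + kd * derivative
        duty = min(max(duty, 0.0), duty_max)
        # Assumed plant: ideal boost gain followed by a first-order lag.
        v_ideal = v_in / (1.0 - duty)
        v_out += (dt / tau) * (v_ideal - v_out)
        history.append(v_out)
    return history


# Panel voltage drops from 18 V to 15 V after 1 s (e.g. the module heats up).
panel_voltage = [18.0] * 1000 + [15.0] * 1000
output = simulate_boost_pid(panel_voltage)
print(f"before drop: {output[999]:.2f} V, after drop: {output[-1]:.2f} V")
```

In this sketch the integral term is what removes the steady-state error: after the panel voltage drops, the controller raises the duty cycle until the output returns to the reference.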
57.5.2 Hydroelectric Power
In the hydroelectric area, PID controllers are being developed for automatic control. An example can be found in the article by Khodabakhshian and Hooshmand [13], where the purpose of the PID controller is the automatic generation control of hydro power systems. The article shows that PID controllers can improve the damping of a power system, offering better performance than other types of controllers.
57.5.3 Wind Energy
In the case of wind energy, new types of PID controllers are also being developed, such as PID controllers based on meta-heuristic algorithms, whose aim is to damp deviations in the generator speed. Elsisi [10] proposes tuning the PID controller gains with an artificial intelligence (AI) technique, the whale optimization algorithm (WOA). This type of controller can cope with wind speed fluctuations and ensures system stability under uncertain load variations. Other applications of PID controllers are being developed in this area; another example is the paper “Fractional-order nonlinear PID controller based maximum power extraction method for a direct-driven wind energy system” by Behera [7], which proposes applying a fractional-order nonlinear PID controller in the machine-side converter control loop of a 2 MW grid-connected wind energy system.
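The general idea of tuning PID gains with a search algorithm can be sketched as follows. Note that this is plain random search, used here as a simple stand-in for a meta-heuristic; it is not the WOA procedure of Elsisi [10]. The first-order plant for the generator speed deviation, the ITAE cost, the gain ranges and the disturbance are all assumptions for illustration.

```python
import random


def itae(kp, ki, kd, dt=0.01, t_end=5.0, tau=0.5, disturbance=1.0):
    """Time-weighted absolute speed deviation after a step disturbance."""
    speed_dev = 0.0
    integral = 0.0
    prev_error = 0.0
    cost = 0.0
    for k in range(int(t_end / dt)):
        error = -speed_dev                       # target: zero deviation
        integral += error * dt
        derivative = (error - prev_error) / dt
        prev_error = error
        u = kp * error + ki * integral + kd * derivative
        # Assumed first-order plant driven by the controller and a gust.
        speed_dev += (dt / tau) * (-speed_dev + u + disturbance)
        cost += (k * dt) * abs(speed_dev) * dt
    return cost


def random_search(n_candidates=200, seed=0):
    """Keep the best of n randomly drawn (kp, ki, kd) candidates."""
    rng = random.Random(seed)
    best = (1.0, 0.0, 0.0)                       # initial guess: P-only
    best_cost = itae(*best)
    for _ in range(n_candidates):
        candidate = (rng.uniform(0, 5), rng.uniform(0, 5), rng.uniform(0, 0.1))
        cost = itae(*candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


baseline_cost = itae(1.0, 0.0, 0.0)
best_gains, best_cost = random_search()
print(f"baseline ITAE {baseline_cost:.3f}, tuned ITAE {best_cost:.3f}")
```

A population-based meta-heuristic would replace the independent random draws with candidates that are updated from one iteration to the next, but the cost function and the simulation loop would stay the same.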
57.6 Conclusion and Suggestions
Throughout this chapter we learned about the history and function of PID controllers, and we also saw the importance of renewable energy systems. We must remember that all these innovations in PID controllers are intended to improve the process and that, if the process improves, the efficiency of the different systems and the economic benefits will be greater. Take a developing country as an example: if we managed to improve the processes for obtaining solar energy, and the cost of acquiring this technology were lower, low-income people could benefit. PID controllers currently play an important role in automating different systems. In this chapter we focused on renewable energy systems, but new processes and new technologies still need to be developed to make energy production more efficient.
References
1. Agarwal T (2020, 11) PID controller: working, types, advantages and its applications. Retrieved from https://www.elprocus.com/the-working-of-a-pid-controller/
2. Alrikabi N (2014) Renewable energy types. J Clean Energy Technol 61–64
3. Amrouche SO, Rekioua D, Rekioua T, Bacha S (2016) Overview of energy storage in renewable energy systems. Int J Hydrogen Energy 41(45):20914–20927
4. Ang KH, Chong G, Li Y (2005) PID control system analysis, design, and technology. IEEE Trans Control Syst Technol 13(4):559–576
5. Bajpai P (2018) Chapter 24 - Process control. In: Bajpai P (ed) Biermann’s handbook of pulp and paper, 3rd edn. Elsevier, pp 483–492. Retrieved from https://www.sciencedirect.com/science/article/pii/B9780128142387000246, https://doi.org/10.1016/B978-0-12-814238-7.00024-6
6. Bansal HO, Sharma R, Shreeraman P (2012) PID controller tuning techniques: a review. J Control Eng Technol 2(4):168–176
7. Behera C, Banik A, Nandi J, Reddy GH, Chakrapani P, Goswami AK (2020) A probabilistic approach for assessment of financial loss due to equipment outage caused by voltage sag using cost matrix. Int Trans Electr Energy Syst 30(3):e12202
8. Borase RP, Maghade D, Sondkar S, Pawar S (2020) A review of PID control, tuning methods and applications. Int J Dynam Control 1–10
9. Control Tutorials for MATLAB and Simulink (n.d.) Introduction: PID controller design. Retrieved from https://ctms.engin.umich.edu/CTMS/index.php?example=Introduction&section=ControlPID
10. Elsisi M (2020) New design of robust PID controller based on meta-heuristic algorithms for wind energy conversion system. Wind Energy 23(2):391–403
11. Engineering O (2020, 10) What is a PID controller? Retrieved from https://www.omega.com/en-us/resources/pid-controllers
12. Ho M-T, Lin C-Y (2003) PID controller design for robust performance. IEEE Trans Autom Control 48(8):1404–1409
13. Khodabakhshian A, Hooshmand R (2010) A new PID controller design for automatic generation control of hydro power systems. Int J Electr Power Energy Syst 32(5):375–382
14. Knospe C (2006) PID control. IEEE Control Syst Mag 26(1):30–31
15. Libretexts (2021, 03) 9.2: P, I, D, PI, PD, and PID control. Retrieved from
16. Mitra L, Swain N (2014) Closed loop control of solar powered boost converter with PID controller. In: 2014 IEEE international conference on power electronics, drives and energy systems (PEDES), pp 1–5
17. Moradi M (2003) New techniques for PID controller design. In: Proceedings of 2003 IEEE conference on control applications, CCA 2003, vol 2, pp 903–908
18. Pacesila M, Burcea SG, Colesca SE (2016) Analysis of renewable energies in European Union. Renew Sustain Energy Rev 56:156–170
19. Sahin AD (2004) Progress and recent trends in wind energy. Prog Energy Combust Sci 30(5):501–543
20. Sakurai T, Funato H, Ogasawara S (2009) Fundamental characteristics of test facility for micro hydroelectric power generation system. In: 2009 International conference on electrical machines and systems, pp 1–6
21. Sampaio PGV, González MOA (2017) Photovoltaic solar energy: conceptual framework. Renew Sustain Energy Rev 74:590–601
22. Seraji H (1998) A new class of nonlinear PID controllers with robotic applications. J Robot Syst 15(3):161–181
23. Zhao H-R, Guo S, Fu L-W (2014) Review on the costs and benefits of renewable energy power subsidy in China. Renew Sustain Energy Rev 37:538–549
Chapter 58
Machine Learning Applications in the Supply Chain, a Literature Review Walter Rosenberg-Vitorica, Tomas Eloy Salais-Fierro, Jose Antonio Marmolejo-Saucedo, and Roman Rodriguez-Aguilar
58.1 Introduction
Supply chains (SC) have always been a subject of study aimed at improving and harmonizing their components, because of their critical impact on companies' operations. Bertolini et al. [3] define SC management as the process of planning, controlling, and executing all logistic flows, from the acquisition of raw materials to the delivery of the end product, in the most streamlined and cost-effective way.
Over time, the SC has grown to cover more than just the fabrication processes inside companies. Partnerships with suppliers, retailers and clients have been strengthened to gain better insight into how to optimize the whole chain and to define better key performance indicators that help monitor the processes. Indicators are fundamental for measuring the performance of every part of the SC, and they must be reliable if companies are to have better control of their operations. One of the biggest problems the industry faces is inaccurate forecasting and a lack of flexibility in responding to demand.
W. Rosenberg-Vitorica
Facultad de Ingenieria, Universidad Panamericana, Augusto Rodin 498, Ciudad de Mexico 03920, Mexico
T. E. Salais-Fierro · J. A. Marmolejo-Saucedo · R. Rodriguez-Aguilar
Facultad de Ingenieria Mecanica y Electrica, Universidad Autonoma de Nuevo Leon, Pedro de Alba S/N, Ciudad Universitaria, San Nicolas de los Garza, Nuevo Leon 66451, Mexico
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_58
58 Machine Learning Applications in the Supply …
On the other hand, in recent years Machine Learning (ML) has gained considerable relevance in the world of technology. Its use has been gaining ground in different industries because it offers new ways to understand data, analyse it and make it more useful for decision makers. ML is a mathematical tool with the capacity to analyze large volumes of data to identify useful information [27]. ML can learn from data, discover patterns and build a base of experience (historical data) to progressively improve performance on a specific task or to become more accurate in its calculations and predictions [21].
There are three types of ML techniques. Supervised learning, or predictive learning, uses a data set of independent variables to predict or approximate the value of a dependent variable. Unsupervised learning aims to detect and identify patterns in the data whose existence may be partially or completely unknown. Reinforcement learning does not work with preset data but with agents that are trained through trial-and-error events, so that in each event the machine can determine the best possible decision according to its previous experience [3].
The core steps of ML are data preparation, learning and evaluation. The first step covers all the processes needed to select, clean and transform the data so that it is usable for learning. The next step covers selecting the ML methods or algorithms and training to tune and optimize the model. The last step is to apply the test data set to the tuned model and assess its performance [22].
The application of ML in the SC includes, but is not limited to, demand forecasting, production plan scheduling, inventory management, logistics and delivery, procurement management, product replenishment and risk assessment [6]. In this paper we analyze the applicability of ML techniques in different parts of the SC.
This is not an exhaustive review of all existing ML algorithms, but a review of those that have been studied or implemented, whether empirically or non-empirically. The paper is structured as follows: Sect. 58.2 presents some of the studies that have been carried out; Sect. 58.3 describes the criteria used to find information, the databases searched and other details; Sect. 58.4 gives insights into the results; and Sect. 58.5 presents the conclusions of the study.
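The three core ML steps described in this introduction (data preparation, learning and evaluation) can be illustrated with a small supervised-learning sketch in plain Python. The weekly demand data are synthetic, and the linear model, the 48/12 train/test split and the error metric are assumptions chosen for brevity; none of them comes from the reviewed studies.

```python
import random


def make_demand_data(n_weeks=60, seed=42):
    """Synthetic weekly demand: linear trend plus Gaussian noise."""
    rng = random.Random(seed)
    return [(week, 100.0 + 2.5 * week + rng.gauss(0.0, 5.0))
            for week in range(n_weeks)]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, points):
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)


# 1) Data preparation: build the data set and hold out the last 12 weeks.
data = make_demand_data()
train, test = data[:48], data[48:]

# 2) Learning: fit the model on the training weeks only.
model = fit_line(train)

# 3) Evaluation: measure the forecast error on the unseen test weeks.
mae = mean_absolute_error(model, test)
print(f"slope={model[0]:.2f}, intercept={model[1]:.1f}, test MAE={mae:.2f}")
```

Holding out the most recent weeks, rather than a random sample, mirrors how a demand forecast is used in practice: the model is always evaluated on periods that come after the data it was trained on.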
58.2 Background
Since the SC has many processes, we use Fig. 58.1 as our reference. The arrows below the main parts of the SC represent the flow of information that is usually targeted by ML methods. This section gives a brief summary of the selected studies to provide an overall picture of the range of applicability of ML.
Carbonneau et al. [4] conducted research to predict the demand of a transformer manufacturer using various ML techniques. They found that some algorithms were very useful for one type of transformer (10 kVA), but for the 16 kVA and 25 kVA types the results were unreliable. They concluded that the results could improve significantly with a much larger data set, and they also recommended expanding the parameters of the analysis.
Fig. 58.1 Supply Chain and information flow
Moroff et al. [17] investigated the application of ML to demand forecasting, comparing ML with traditional forecasting methods. They found that the multilayer perceptron gave the best results, while the results of the random forest and extreme gradient boosting methods were less positive.
Punia et al. [19] proposed a cross-temporal forecasting framework based on ML for a multi-channel retail SC. They used ML algorithms to predict demand in online and offline retail, and calculated and compared three planning horizons. The results were positive: the proposed ML-based framework produced lower forecast errors than the direct forecasts.
Malviya et al. [15] analyzed seven supervised learning algorithms for predicting backorders, with the objective of improving the effectiveness of inventory management control. The results were favorable. They concluded that the occurrence of backorders is relatively low compared with other parts of the SC; essentially, backorders are the consequence of other processes not related to inventory management.
Islam and Amin [8] carried out a study to predict probable product backorders using Distributed Random Forest and Gradient Boosting Machine. They designed and tested their own model, running it with real and ranged data. They found that performance increased by twenty percent with the ranged data for both algorithms, and concluded that it is possible to predict product backorders using inventory, lead time, sales and forecasted sales as parameters.
Lauer et al. [12] applied random trees and regression ML methods to predict the master production plan of a semiconductor company. Although the results were positive, the most interesting finding was that the production plan shows the same unstable behavior when changes are made to the demand, which highlights the strong influence demand has on planning.
Loisel et al. [14] examined the use of ML to predict cold SC breaks, analyzing seven studies. They found that ML could make it possible to follow temperature in real time and to predict more precisely how product temperature evolves during cold chain breaks. They pointed out that more research must be done on predicting the optimal locations for temperature sensors.
Konovalenko and Ludwig [11] carried out a study of ML applied to a pharmaceutical SC. The objective was for the system to learn to distinguish false positives from true positives so that, when an alarm is triggered, it is easier to decide whether further analysis is required or operations can resume.
Abbasi et al. [1] carried out a study in which they applied supervised learning algorithms to a blood SC with the objective of creating a decision-making agent that optimizes the
inventory of blood in a hospital network. Their results showed that all the models were successful as optimization solutions. The multilayer perceptron model stood out as an efficient decision maker for ordering and transshipment.
Aboutorab et al. [2] proposed using a reinforcement learning algorithm for the proactive identification of operational risks outside the SC. They demonstrated that their approach was more effective and accurate than the manual approach in assisting risk managers to identify disruptive risk events. They also concluded that the approach can be extrapolated to other industries.
Vanvuchelen et al. [23] conducted an experiment to solve joint replenishment problems in SC shipping using proximal policy optimization, a reinforcement learning algorithm. They ran a small-scale scenario and a larger one. In the small scenario, the algorithm developed policies that were very close to optimal and outperformed heuristic techniques. In the larger scenario, for which no optimal policy is available, the heuristic and ML techniques gave similar results when the items had a similar cost structure, but when the cost structures were dissimilar the ML technique outperformed the heuristic one.
Meiners et al. [16] proposed using an artificial neural network algorithm in a two-stage batch control system in a manufacturing company. For a sintering process they measured the temperature behavior of the old model versus the ML model and compared both with the ideal value. They concluded that the ML model outperforms the old model by optimizing the whole process, and that the predicted temperature behaved better.
Nasurudeen Ahamed and Karthikeyan [18] proposed the use of ML integrated with a heuristic method for self-driving vehicles in an SC.
The objective was to improve the response time of self-driving vehicles by finding the optimal path to their customer, taking the path and the cost of the route as parameters. The results showed that the proposed method outperformed the existing heuristic method in service time.
Rodríguez et al. [20] conducted an experiment to solve a closed-loop SC problem by combining machine learning with fuzzy logic. The problem consisted of uncertainties that made the production planning of a hospital laundry very difficult.
Han and Zhang [5] built and evaluated an SC risk management model using machine learning neural networks to improve the efficiency of SC management, which had been ineffective at controlling risk.
Hathikal et al. [7] developed a predictive model to estimate ocean import shipment lead times with machine learning. The model considers all the SC stakeholders, in the hope that all of them benefit from improved visibility and predictability.
Kauten et al. [9] carried out a study to explore the application of machine learning to create a donor retention model. The best approaches proved to be gradient boosting and random forest, which can be used by blood centers for outreach programs.
Kim et al. [10] developed two machine learning action-reward models, centralized and decentralized, to improve a two-stage serial SC with non-stationary customer demand. Both models outperformed the traditional rolling horizon inventory control model in average inventory cost.
Liu et al. [13] designed a novel hybrid quantum chaos neural network algorithm to solve a resource allocation problem in a low-carbon SC. Their results were favorable: the model outperformed traditional models, and they concluded that it was successful and that further studies should be done to incorporate sustainable SC parameters.
Wang and Zhang [24] designed a machine learning model for power resource optimization in the SC. The model succeeded in reducing the unwanted resources involved in the development process.
Yalan and Wei [25] developed a machine learning model to optimize the time a user spends on an online transaction and to improve data dissemination while keeping the security of user data in mind. The model proved to be more efficient in areas such as customer satisfaction, error rate and data prediction, among others.
Yang [26] studied how machine learning can optimize an intelligent supplier management system for cross-border e-commerce platforms. The model optimizes supplier credit evaluation with a difference matrix and cloud model method, and it also helps with supplier selection and order allocation.
Zarandi et al. [28] proposed a flexible fuzzy reinforcement learning algorithm to solve the problem of inventory control in the SC by determining the amount to order for each retailer. Their results are better than those of typical reinforcement learning techniques.
Zhu et al. [29] conducted a study applying six machine learning methods to SC finance to predict the credit risk of SMEs and determine which method gives better results. They saw slight improvements in the accuracy of the predictions, and for this type of business any percentage of improvement can translate into substantial savings.
58.3 Methodology
The main objective of our review is to study the application of ML in the SC: to identify the industries in which it has been tested or implemented, the types of algorithms that have been used, and whether the results have been positive. The keywords used in the search were: “Machine learning” OR “Machine learning supply chain” OR “Machine learning shortage” OR “Machine learning backorder” OR “Learning systems” OR “Unsupervised learning” OR “Supervised learning” OR “Reinforcement Learning”. For definitions, the scope of the search was any document related to machine learning; for the review of SC applications, an article had to have at least machine learning and supply chain as keywords. The search was carried out in the ScienceDirect and Springer databases and yielded a total of 29 articles, 18 and 11 respectively (Fig. 58.2). Regarding the source, the study included 26 different journals.
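The search string and the inclusion criterion described above can be expressed as a short screening sketch. This is only an illustration of the logic: the helper names and the two example records are hypothetical, and the sketch does not reproduce the actual search interface of ScienceDirect or Springer.

```python
SEARCH_TERMS = [
    "Machine learning",
    "Machine learning supply chain",
    "Machine learning shortage",
    "Machine learning backorder",
    "Learning systems",
    "Unsupervised learning",
    "Supervised learning",
    "Reinforcement Learning",
]

REQUIRED_KEYWORDS = {"machine learning", "supply chain"}


def build_query(terms):
    """Join quoted search terms with OR, as in the keyword list above."""
    return " OR ".join(f'"{term}"' for term in terms)


def in_scope(article_keywords):
    """An article is kept only if it lists both required keywords."""
    lowered = {keyword.lower() for keyword in article_keywords}
    return REQUIRED_KEYWORDS <= lowered


# Hypothetical records, for illustration only.
records = [
    {"title": "Record A", "keywords": ["Machine learning", "Supply chain", "Forecasting"]},
    {"title": "Record B", "keywords": ["Machine learning", "Healthcare"]},
]
selected = [record["title"] for record in records if in_scope(record["keywords"])]
print(build_query(SEARCH_TERMS))
print(selected)
```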
Fig. 58.2 Articles per database
58.4 Review Findings
We identified thirteen industries that are already studying, testing or implementing different ML methods (Fig. 58.3). Ten studies did not specify the industry and were catalogued as “any”, on the assumption that the technique used can be applied to the SC of any industry. The most targeted SC process was demand forecasting, with five cases. This is not a surprise, since it is one of the processes that gives companies the most headaches because of its high volatility, and traditional forecasting methods are far from flawless. Regarding ML types, the most used was supervised learning with ten cases, followed by reinforcement learning with four. These results make sense, since demand forecasting is a prediction task, which is what supervised learning methods address. Unsupervised learning is not commonly used on its own but as a complement to the other two, as a tool for classifying data, which is why it does not appear in the counts.
58.5 Conclusions
ML is a relatively new technology that is receiving a lot of attention, which gives us a great chance to demonstrate its true potential. We noticed that in most cases the authors concluded that, although their results were favorable, they could be better; in other words, the results are promising. We are confident that ML could be much more useful in the SC than it is right now; it is a matter of sustaining the good experimental results and, above all, going deeper into its applications. We conclude that ML will be a valuable tool in any of the SC processes, but some considerations must be taken into account. Many of the studies agree on the importance of having good sources of data, which means having access to large volumes while always keeping in mind that data quality is a priority. Data is fundamental as the trigger for the rest of the ML process and, most importantly, for obtaining useful information as an output. Even though supervised learning was the most used type of ML in this review, it is important to note that the three types complement each other to some extent. With that in mind, we recommend experimenting with as many types and algorithms as possible, because the context of each industry, company and database is different.
Fig. 58.3 ML versus Technique versus SC process versus industry
It is worth considering that the dependency between the parts of the SC becomes very important, because the data from one process is used in the next. Our recommendation before using ML is to have a controlled SC: data analysis is easier with know-how of the operation and a rough idea of what to expect from the results of any experiment or implementation.
References
1. Abbasi B, Babaei T, Hosseinifard Z, Smith-Miles K, Dehghani M (2020) Predicting solutions of large-scale optimization problems via machine learning: a case study in blood supply chain management. Comput Operat Res 119:104941. https://doi.org/10.1016/j.cor.2020.104941
2. Aboutorab H, Hussain OK, Saberi M, Hussain FK (2022) A reinforcement learning-based framework for disruption risk identification in supply chains. Future Gener Comput Syst 126:110–122. https://doi.org/10.1016/j.future.2021.08.004
3. Bertolini M, Mezzogori D, Neroni M, Zammori F (2021) Machine learning for industrial applications: a comprehensive literature review. Expert Syst Appl 175:114820. https://doi.org/10.1016/j.eswa.2021.114820
4. Carbonneau R, Laframboise K, Vahidov R (2008) Application of machine learning techniques for supply chain demand forecasting. Eur J Operat Res 184(3):1140–1154. https://doi.org/10.1016/j.ejor.2006.12.004
5. Han C, Zhang Q (2021) Optimization of supply chain efficiency management based on machine learning and neural network. Neural Comput Appl 33:1419–1433
6. Hartley JL, Sawaya WJ (2019) Tortoise, not the hare: digital transformation of supply chain business processes. Bus Horiz 62(6):707–715. (Digital Transformation Disruption). https://doi.org/10.1016/j.bushor.2019.07.006
7. Hathikal S, Chung SH, Karczewski M (2020) Prediction of ocean import shipment lead time using machine learning methods. SN Appl Sci 2(7):1–20
8. Islam S, Amin SH (2020) Prediction of probable backorder scenarios in the supply chain using distributed random forest and gradient boosting machine learning techniques. J Big Data 7(1):1–22
9. Kauten C, Gupta A, Qin X, Richey G (2021) Predicting blood donors using machine learning techniques. Inf Syst Front 1–16
10. Kim CO, Kwon I-H, Baek J-G (2008) Asynchronous action-reward learning for nonstationary serial supply chain inventory control. Appl Intell 28(1):1–16
11. Konovalenko I, Ludwig A (2021) Comparison of machine learning classifiers: a case study of temperature alarms in a pharmaceutical supply chain. Inf Syst 100:101759
12. Lauer T, Legner S, Henke M (2019) Application of machine learning on plan instability in master production planning of a semiconductor supply chain. IFAC-PapersOnLine 52(13):1248–1253. (9th IFAC Conference on Manufacturing Modelling, Management and Control MIM 2019). https://doi.org/10.1016/j.ifacol.2019.11.369
13. Liu X-H, Shan M-Y, Zhang L-H (2016) Low-carbon supply chain resources allocation based on quantum chaos neural network algorithm and learning effect. Nat Hazards 83(1):389–409
14. Loisel J, Duret S, Cornuéjols A, Cagnon D, Tardet M, Derens-Bertheau E, Laguerre O (2021) Cold chain break detection and analysis: can machine learning help? Trends Food Sci Technol 112:391–399. https://doi.org/10.1016/j.tifs.2021.03.052
15. Malviya L, Chittora P, Chakrabarti P, Vyas RS, Poddar S (2021) Backorder prediction in the supply chain using machine learning. Mater Today Proc. https://doi.org/10.1016/j.matpr.2020.11.558
16. Meiners M, Mayr A, Thomsen M, Franke J (2020) Application of machine learning for product batch oriented control of production processes. Procedia CIRP 93:431–436. (53rd CIRP Conference on Manufacturing Systems 2020). https://doi.org/10.1016/j.procir.2020.04.006
17. Moroff NU, Kurt E, Kamphues J (2021) Machine learning and statistics: a study for assessing innovative demand forecasting models. Procedia Comput Sci 180:40–49. (Proceedings of the 2nd International Conference on Industry 4.0 and Smart Manufacturing (ISM 2020)). https://doi.org/10.1016/j.procs.2021.01.127
18. Nasurudeen Ahamed N, Karthikeyan P (2020) A reinforcement learning integrated in heuristic search method for self-driving vehicle using blockchain in supply chain management. Int J Intell Netw 1:92–101. https://doi.org/10.1016/j.ijin.2020.09.001
19. Punia S, Singh SP, Madaan JK (2020) A cross-temporal hierarchical framework and deep learning for supply chain forecasting. Comput Ind Eng 149:106796. https://doi.org/10.1016/j.cie.2020.106796
20. Rodríguez GG, Gonzalez-Cava JM, Pérez JAM (2020) An intelligent decision support system for production planning based on machine learning. J Intell Manuf 31(5):1257–1273
21. Sharma R, Kamble SS, Gunasekaran A, Kumar V, Kumar A (2020) A systematic literature review on machine learning applications for sustainable agriculture supply chain performance. Comput Operat Res 119:104926
22. Tandon N, Tandon R (2019) Using machine learning to explain the heterogeneity of schizophrenia. Realizing the promise and avoiding the hype. Schizophrenia Res 214:70–75. (Machine Learning in Schizophrenia). https://doi.org/10.1016/j.schres.2019.08.032
23. Vanvuchelen N, Gijsbrechts J, Boute R (2020) Use of proximal policy optimization for the joint replenishment problem. Comput Ind 119:103239. https://doi.org/10.1016/j.compind.2020.103239
24. Wang D, Zhang Y (2020) Implications for sustainability in supply chain management and the circular economy using machine learning model. Inf Syst e-Business Manage 1–13
25. Yalan Y, Wei T (2021) Deep logistic learning framework for e-commerce and supply chain management platform. Arab J Sci Eng 1–15
26. Yang Y (2020) Research on the optimization of the supplier intelligent management system for cross-border e-commerce platforms based on machine learning. Inf Syst e-Business Manage 18(4):851–870
27. Yang Y, Wu L (2021) Machine learning approaches to the unit commitment problem: current trends, emerging challenges, and new strategies. Electr J 34(1):106889
28. Zarandi MHF, Moosavi SV, Zarinbal M (2013) A fuzzy reinforcement learning algorithm for inventory control in supply chains. Int J Adv Manuf Technol 65(1–4):557–569
29. Zhu Y, Xie C, Wang G-J, Yan X-G (2017) Comparison of individual, ensemble and integrated ensemble machine learning methods to predict China’s SME credit risk in supply chain finance. Neural Comput Appl 28(1):41–50
Chapter 59
Machine Learning Applications for Demand Driven in Supply Chain: Literature Review

Eric Octavio Mayoral Garinian, Tomas Eloy Salais Fierro, José Antonio Marmolejo Saucedo, and Roman Rodriguez Aguilar

E. O. M. Garinian (✉) · J. A. M. Saucedo: Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin 498, Ciudad de México 03920, México
T. E. S. Fierro: Facultad de Ingeniería Mecánica y Eléctrica, Universidad Autónoma de Nuevo Leon, Pedro de Alba S/N, Ciudad Universitaria, San Nicolás de los Garza, Nuevo Leon 66451, México
R. R. Aguilar: Facultad de Ciencias Económicas y Empresariales, Universidad Panamericana, Augusto Rodin 498, Ciudad de México 03920, México
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_59

59.1 Introduction

Technological advances in computational science have contributed to the creation of modern tools and techniques that open opportunities for solving many diverse and complex problems from new methodological perspectives. Machine Learning (ML) has recently become a popular and well-regarded technique within the so-called Industry 4.0, a current trend that appears set to last, and it can be applied to an enormous variety of topics such as fraud detection, robotics, spam filtering, translation services, preventive healthcare, transportation, and many others. This has been made possible by the exponential growth of information in the form of data generated by electronic devices and by the expanding Internet of Things (IoT) [3]. According to [1], recent research predicted that IoT data would be an integral part of big data by the year 2030 and that the required quantity of intelligent agents would reach one trillion, thus making IoT data the most important part of Big Data, according to a forecast by HP. This data growth results from the diverse devices employed at the periphery of industrial enterprise Supply Chain (SC) networks, ranging from embedded sensors, smartphones, computer systems, computerized devices, and digital machines to B2B data exchanges; all of this data creates new opportunities to extract more value [1, 17].

Demand Driven Material Requirements Planning (DDMRP) is an innovative methodology for managing the flow of materials and inventories, increasing the level of service, and reducing the levels of stock by inventory replenishment [10].

The purpose of this research is to provide an overview of the most representative current publications that explain the advantages of implementing Machine Learning techniques for Demand Driven applications in the Supply Chain. This paper is organized as follows. Section 59.1, Introduction, presents the introductory ideas and content of this article. Section 59.2, Literature Review (LR), presents the findings on Machine Learning and related tools, techniques, and technologies for addressing inventory and supply chain problems that can be studied from a Demand Driven perspective, together with implementation cases and the results of the related investigations. Section 59.3 details the methodology followed to carry out the Literature Review. Finally, Sect. 59.4 presents the conclusions and future research.
59.2 Literature Review (LR)

This section gathers knowledge from journal articles and books published by reliable sources on the topics addressed in this publication. Demand Driven and Machine Learning concepts are explained in this context, and cases and studies in which these techniques were applied are cited to explain and document current developments on these issues.
59.2.1 Problem

Most demand forecasting systems offer limited insight to manufacturers because they fail to capture contemporary market trends, the seasonal availability of products, and changing customer choices. Their low forecast accuracy, which stems from traditional models based on simple analytical techniques, affects working capital and contributes to the bullwhip effect, which can lead to inefficiencies, excessive inventory, stockouts, and back-orders [9]. Regarding the uncertainty dimension, [5] lists the four types of variance considered in the literature: demand, supply, machine availability, and equipment performance. Demand is recognized as the most difficult source of uncertainty to control, while supply is uncertain both in quantity and over time.

Carbonneau et al. [4] state that the main challenge lies in the distortion of the demand signal as it travels through the extended supply chain (the bullwhip effect, a phenomenon that could be explained in terms of chaos theory and the behavior of output variations of a chaotic system). This distortion is amplified by power relationships in the chain and by today's more dynamic and agile e-business, which impedes collaboration. It can be combated through the integration of systems and the sharing of valuable information in more collaborative, dyadic relationships that improve the accuracy of forecasts.

Caridi and Cigolini [5] provide guidelines for approaching the problem of dimensioning, positioning, and managing safety stocks against demand uncertainty. To meet customers' needs rapidly and manage procurement activities effectively, companies have invested a conspicuous quantity of resources in implementing Enterprise Resource Planning (ERP) and/or advanced planning systems. These systems are affected by the consistency of company databases and by the uncertainty that plagues production systems, especially those managed via Material Requirements Planning (MRP) systems. The calculation of the safety stock and the reorder point becomes more complex when demand and lead time are probabilistic and highly uncertain, which leads to overstocking and understocking [8].
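To make the last point concrete, the following is a minimal sketch of the widely used textbook approximation for safety stock and reorder point when both demand and lead time vary. It is not necessarily the model used in [8]; it assumes that demand per period and lead time are independent and approximately normal, and all function and parameter names are illustrative.

```python
import math
from statistics import NormalDist


def safety_stock(service_level, mean_demand, sd_demand, mean_lead_time, sd_lead_time):
    """Safety stock under independent, roughly normal demand and lead time (assumption)."""
    # z is the standard normal quantile for the target cycle service level.
    z = NormalDist().inv_cdf(service_level)
    # Standard deviation of demand during lead time combines both uncertainty sources.
    sd_lead_time_demand = math.sqrt(
        mean_lead_time * sd_demand ** 2 + (mean_demand ** 2) * sd_lead_time ** 2
    )
    return z * sd_lead_time_demand


def reorder_point(service_level, mean_demand, sd_demand, mean_lead_time, sd_lead_time):
    """Expected demand during lead time plus the safety stock."""
    expected = mean_demand * mean_lead_time
    return expected + safety_stock(
        service_level, mean_demand, sd_demand, mean_lead_time, sd_lead_time
    )


if __name__ == "__main__":
    # Hypothetical item: 100 units/day (sd 20), lead time 4 days (sd 1), 95% service level.
    print(round(reorder_point(0.95, 100, 20, 4, 1), 1))
```

The lead-time variance term grows with the square of mean demand, which is why highly uncertain lead times quickly dominate the buffer and push planners toward overstocking.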
59.2.2 Demand Driven and Its Role in the Supply Chain and Operations Management

Logistics planning and management processes aim to deliver the right product, in the right quantity, in the right condition, to the right place, at the right time, and at the right cost [2]. Wang and Chen [15] address two important aspects: demand planning (DP), usually treated as the optimization of internal resources (mathematical programming and game-theoretic modelling), and sales forecasting (SF), which relates to the impact of external markets (statistical regression, time series, Machine Learning). Eruguz et al. [6] explain two approaches to real-world supply chains, which usually correspond to general-network multi-echelon systems consisting of several stages of procurement, manufacturing, and transportation: the stochastic-service model (SSM) and the guaranteed-service model (GSM). These approaches optimize inventory decisions by considering all stages in the supply chain simultaneously, which makes allocating safety stocks between the external supplier and the final, uncertain customer demand a significant computational challenge. Lo Schiavo et al. [10] examine research results on the safety stock of replenishment models and on the Theory of Constraints (TOC) replenishment method, which provides an opportunity to convert inventory management from forecast-driven to demand-driven, and propose a new formula for a safety stock model.

The Demand Driven Forecasting method aims to sense demand signals using timely information about customers' needs, and then to respond to and shape real future customer demand. It uses price, sales, the contributions of promotional marketing activities, historical demand taken from the actual planning databases, and other economic factors to predict and fulfill customers' production orders immediately. It relies on advanced data mining techniques and a Big Data-driven, fuzzy-classifier-based analytics framework to measure the success of marketing strategies such as demand shifting or demand orchestration, identifying patterns in consumer behavior to provide superior value to end users [9]. According to [11], coordinating the material and information flow across the entire supply chain creates a win–win situation for all players in the supply chain.
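As an illustration of how a demand-driven replenishment signal differs from a forecast-driven one, the sketch below sizes a DDMRP-style buffer and computes an order from the net flow position. The zone equations follow the commonly published DDMRP conventions rather than the specific safety stock formula proposed in [10]; the class, field names, and numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BufferProfile:
    adu: float                 # average daily usage
    dlt: float                 # decoupled lead time in days
    lead_time_factor: float    # typically between 0 and 1
    variability_factor: float  # typically between 0 and 1
    moq: float = 0.0           # minimum order quantity
    order_cycle_days: float = 0.0


def buffer_zones(p):
    """Return the cumulative tops of the red, yellow and green zones."""
    yellow = p.adu * p.dlt
    red_base = yellow * p.lead_time_factor
    red = red_base * (1 + p.variability_factor)  # red base plus red safety
    green = max(red_base, p.moq, p.order_cycle_days * p.adu)
    return {
        "top_of_red": red,
        "top_of_yellow": red + yellow,
        "top_of_green": red + yellow + green,
    }


def replenishment_order(p, on_hand, on_order, qualified_demand):
    """Order up to the top of green when net flow falls into the yellow zone or below."""
    zones = buffer_zones(p)
    net_flow = on_hand + on_order - qualified_demand
    if net_flow <= zones["top_of_yellow"]:
        return zones["top_of_green"] - net_flow
    return 0.0


if __name__ == "__main__":
    profile = BufferProfile(adu=10, dlt=5, lead_time_factor=0.5, variability_factor=0.5)
    print(buffer_zones(profile))
    print(replenishment_order(profile, on_hand=40, on_order=20, qualified_demand=10))
```

The order is triggered by actual stock, open supply, and qualified demand, so no explicit forecast enters the replenishment decision itself.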
59.2.3 Machine Learning (ML), Tools, Techniques, and Technologies

Machine Learning, as defined by Philip Chen and Zhang [12], is an important subject that has become a new research frontier of artificial intelligence. It aims to design algorithms that allow computers to evolve behaviors based on empirical data, and it is characterized by the ability to discover knowledge and make intelligent decisions automatically. Georgios [7] groups forecasting algorithms into seven categories: Baseline; Autoregression (AR); Exponential Smoothing; Linear; Non-Linear; Ensemble; and Deep Learning. Priore et al. [13] list the main ML techniques: inductive learning; artificial neural networks; case-based reasoning; support vector machines; and reinforcement learning. Pereira and Frazzon [11] corroborate that combining machine learning and simulation approaches assists supply chain decision-making. It is recommended to identify the key variables correctly in order to improve predictions, and then to explore which technique best suits each case. In real-world practice, experts apply two or more tools and methods for demand forecasting and decision making. Selecting the most appropriate method is one of the most challenging tasks in data analysis; it may depend on the mathematical complexity of each problem and on the required accuracy, sensitivity, and specificity. The methods used for forecasting have been grouped into three categories: traditional time series methods, machine learning methods, and fuzzy machine learning methods. They are cited and compared by Carbonneau et al. [4], Kumar et al. [9], Philip Chen and Zhang [12], Salais-Fierro et al. [14], Wang and Chen [15], Wenzel et al. [16], Zougagh et al. [3, 18].
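The simplest of the categories above can be written in a few lines. The sketch below shows one-step-ahead forecasts for a naive baseline, a moving average, and simple exponential smoothing; it is only an illustration of these traditional methods, and the function names, window size, and smoothing constant are assumptions rather than values taken from the cited works.

```python
def naive_forecast(history):
    """Baseline: the next value equals the last observed value."""
    return history[-1]


def moving_average_forecast(history, window=3):
    """Average of the most recent `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def exponential_smoothing_forecast(history, alpha=0.3):
    """Simple exponential smoothing, initialised with the first observation."""
    level = history[0]
    for observed in history[1:]:
        level = alpha * observed + (1 - alpha) * level
    return level


if __name__ == "__main__":
    demand = [120, 135, 128, 150, 142, 160]  # hypothetical weekly demand
    print(naive_forecast(demand))
    print(moving_average_forecast(demand, window=3))
    print(exponential_smoothing_forecast(demand, alpha=0.3))
```

These methods have no notion of external drivers such as price or promotions, which is one reason the literature turns to ML models when such variables matter.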
59.2.4 Implementation and Results of Machine Learning Cases and Applications

This section presents selected cases and applications. The smart use of data generated by technological devices brings extraordinary opportunities in different fields, pointing to ML implementations that can overcome the limited capabilities of traditional data analysis and address problems in diverse fields by adopting Demand-Driven theory in a Supply Chain and Operations Management environment.

Bas et al. [3] carried out a predictive analysis comparing several recent supervised ML models, which gave similar results, using training, validation, and testing subsets. The classification task used Support Vector Machine (SVM), Artificial Neural Network (ANN), and XGBoost techniques, and a cluster analysis was followed by an exploratory data analysis. The demand-driven perspective helps explain the most influential factors that turn individual potential consumers into potential buyers inclined to invest in a new technology, electric vehicles (EV). The study then identifies characteristics common to misclassified individuals from the two groups; such misclassification can severely impact expensive marketing campaigns, so understanding it increases their chances of success.

Kumar et al. [9] perform a contextual big data-driven investigation as an application of demand shaping, demand sensing, and accurate demand forecasting on a TV manufacturing supply chain data set to optimize ROI and profitability. Results are compared in terms of the predictive value of demand analysis using ARIMA, SVM, ANN, random forest, multiple linear regression (MLR), and fuzzy neural network methods.

Carbonneau et al. [4] compare ML methods using training and testing sets, evaluating actual versus forecast demand on foundries data provided by Statistics Canada and on simulations of an extended supply chain. The overall performance of recurrent neural networks (RNN), SVM, neural networks (NN), and MLR was significantly better than that of the simpler techniques, including moving average, naive, and trend methods.
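The studies above share a common evaluation protocol: fit each method on a training portion of the demand history and compare forecast against actual demand on a held-out portion. The sketch below illustrates that protocol only. It uses two simple stand-ins (a naive forecast and a least-squares linear trend) instead of the RNN, SVM, or ANN models of the cited works, and the data and function names are hypothetical.

```python
def mean_absolute_error(actual, forecast):
    """Average absolute difference between actual and forecast demand."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def fit_linear_trend(series):
    """Ordinary least squares fit of demand against the time index 0..n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    slope = numerator / denominator
    intercept = y_mean - slope * t_mean
    return intercept, slope


def compare_on_holdout(series, test_size):
    """Train on the first part of the series, score both methods on the last part."""
    train, test = series[:-test_size], series[-test_size:]
    naive = [train[-1]] * test_size
    intercept, slope = fit_linear_trend(train)
    trend = [intercept + slope * (len(train) + h) for h in range(test_size)]
    return {
        "naive": mean_absolute_error(test, naive),
        "linear_trend": mean_absolute_error(test, trend),
    }


if __name__ == "__main__":
    demand = [100, 104, 111, 115, 118, 126, 131, 134]  # hypothetical monthly demand
    print(compare_on_holdout(demand, test_size=3))
```

Keeping the test portion strictly after the training portion matters for demand data, since shuffling would let a model see the future it is asked to predict.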
59.3 Methodology for the Literature Review

In this study, the Literature Review (LR) is the main method of synthesis, aimed at identifying, evaluating, and interpreting the best-quality research studies on a specific topic, research question, or phenomenon of interest [2]. The scope of this research was to discover applications of the Demand Driven philosophy that use Machine Learning to solve Supply Chain problems. Following the methodological steps in [2], the most relevant scientific publications were selected in three phases: searching, selecting, and analyzing.
59.3.1 LR Searching Phase 1

The search for scientific publications was made in the major online scientific databases where most publications in peer-reviewed scientific journals are indexed: Science Direct, Elsevier Scopus, Springer Link, Inventio and Google Academic. The search was performed during September 2021, and the queries were defined by combining the main keywords: “Demand Driven”, “Machine Learning”, “Supply Chain”, “Data Science” and “Big Data”. Query results are listed in Table 59.1 and Fig. 59.1.
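The keyword combinations used in the search can be generated programmatically, which helps keep the queries identical across databases. This is a minimal sketch assuming plain Boolean AND/OR syntax with quoted phrases; the helper names are hypothetical, and each database may require its own field syntax.

```python
def phrase(term):
    """Wrap a keyword in double quotes so it is searched as an exact phrase."""
    return f'"{term}"'


def all_of(*parts):
    """Join query parts with AND."""
    return " AND ".join(parts)


def any_of(*parts):
    """Join query parts with OR inside parentheses."""
    return "(" + " OR ".join(parts) + ")"


def build_queries():
    dd = phrase("Demand Driven")
    ml = phrase("Machine Learning")
    sc = phrase("Supply Chain")
    data_terms = any_of(ml, phrase("Big Data"), phrase("Data Science"))
    return [
        dd,
        ml,
        sc,
        all_of(dd, ml),
        all_of(dd, ml, sc),
        all_of(data_terms, dd),
        all_of(data_terms, dd, sc),
    ]


if __name__ == "__main__":
    for query in build_queries():
        print(query)
```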
59.3.2 LR Selecting Phase 2

After obtaining the results in Table 59.1, articles were selected individually within the defined time frame. The selection screening criteria did not require a further refined search filter because of the limited number of papers resulting from the specific set of search parameters. All articles found were written in English. A prevailing criterion was that articles be indexed in scientific databases. The search was filtered to recent articles published between 2015 and 2021, a period of 7 years, and was not limited by article type, publication title, subject area, or access type. The title, abstract, and conclusions of each article were browsed for valuable content related to the keywords and criteria best suited to the context of this Literature Review (LR). Finally, publications that did not meet the research criteria or did not address the research problem were excluded. Additional publications were found through bibliographical recommendations encountered while reading the selected articles and through further queries.

Table 59.1 Query search results
Key word | Science Direct | Elsevier Scopus | Springer Link | Invention | Google Academic
“Demand driven” | 7405 | 3308 | 10,434 | 21,837 | 137,000
“Machine learning” | 160,322 | 293,590 | 266,744 | 306,256 | 3,440,000
“Supply chain” | 99,535 | 114,738 | 109,056 | 451,219 | 1,950,000
“Demand driven” AND “Machine learning” | 295 | 23 | 531 | 343 | 4970
“Demand driven” AND “Machine learning” AND “Supply chain” | 77 | 1 | 186 | 90 | 1240
(“Machine learning” OR “Big Data” OR “Data Science”) AND “Demand Driven” | 486 | 46 | 910 | 30 | 8920
(“Machine Learning” OR “Big Data” OR “Data Science”) AND “Demand Driven” AND “Supply Chain” | 150 | 3 | 344 | 23 | 2,580

Fig. 59.1 Distribution of publications over the years for the Science Direct query keywords of the research problem
59.3.3 LR Analyzing Phase 3

The most relevant articles were selected using the screening criteria, favoring those that best addressed Supply Chain inventory management problems and the use of Machine Learning in conjunction with the Demand Driven methodology. These articles then passed to the analyzing phase, in which they were read in full text, the passages of greatest interest were highlighted, and their information was analyzed, structured, and referenced as meaningful content within the context of this Literature Review (LR). The final selection includes 18 bibliographic references, which provide a substantial sample of the existing literature.
59.4 Conclusions and Future Research

In this paper, we reviewed the Demand Driven methodology for inventory management in the context of supply chain and operations management, implemented with Machine Learning, a Data Science technique, and supported by Big Data techniques when a large volume of data is analyzed. This topic has gained popularity in recent years among researchers, governments, and organizations; the number of publications on these topics has increased over the years, and the topic has large potential for applications that optimize inventory management and find new opportunities in it. Considering the presented literature review, the current study offers relevant insights regarding different contexts:

1. For the Supply Chain context, there is a substantial increase in devices that can collect data around the network for multiple purposes, leading to new opportunities and challenges that involve Industry 4.0 capabilities. Adopting strategies for better-informed, data-oriented decision making will lead to improvements in business competences and in companies' profitability, oriented to fulfilling the needs and expectations of markets and customers. Logistics planning and coordination gain opportunities when large volumes of data can be processed and analyzed in real time, turning them into valuable information for decision making and making every link interaction more agile and cooperative, which improves the flow of information and resources.
2. For the Demand Driven context, the methodology will continue to develop toward its full potential with the help of the evolving technologies mentioned in this study. By applying Machine Learning with specialized computing algorithms, inventories can be managed to address supply and demand uncertainties and control the bullwhip effect, and more efficient methodologies can be used for demand forecasting and planning. Computing technologies are emerging and developing in all fields of knowledge; in this study we found success cases in which demand-oriented inventory control was achieved thanks to these new technologies.
3. For the Machine Learning context, ML techniques provide better results, larger improvements, and more accurate forecasts of distorted demand signals than the simpler traditional forecasting techniques [4]. There is a growing list of mathematical algorithms and computer systems that must be selectively chosen or developed for each problem to be solved; this is a challenge but also a possible advantage for those who implement them.
4. For the Combined context, new technologies continue to develop by integrating Machine Learning, Big Data, or IoT into modern systems that can take advantage of artificial intelligence to discover knowledge and make intelligent decisions automatically. Such trends will transform the interactions between all stakeholders within the echelons of the supply chain, allowing an agile flow of materials, inventories, and information.

Future research may focus on additional practical cases of companies that implement Demand-Driven approaches, reporting their results with different Machine Learning (ML) techniques for demand forecasting and the accuracy of those forecasts when compared to real demand. New mathematical models can be developed to define exactly which inputs are required and to run them through newly developed Machine Learning (ML) algorithms whose output is an accurate Demand Driven forecast. Additional factors can be included so that the models predict realistic scenarios that consider risks and variability in the analysis.

New articles that compile the most recent literature, findings, and applications will aim to serve as foundation studies that help other researchers address the research gap and expand Machine Learning (ML) applicability in other important domains such as Supply Chain Management.
59.4.1 Declaration of Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References

1. Addo-Tenkorang R, Helo PT (2016) Big data applications in operations/supply-chain management: a literature review. Comput Indust Eng 101:528–543. https://doi.org/10.1016/j.cie.2016.09.023
2. Barros J, Cortez P, Carvalho MS (2021) A systematic literature review about dimensioning safety stock under uncertainties and risks in the procurement process. Operations Res Perspectives 8:100192. https://doi.org/10.1016/j.orp.2021.100192
3. Bas J, Cirillo C, Cherchi E (2021) Classification of potential electric vehicle purchasers: a machine learning approach. Technol Forecasting Soc Change 168:120759. https://doi.org/10.1016/j.techfore.2021.120759
4. Carbonneau R, Laframboise K, Vahidov R (2008) Application of machine learning techniques for supply chain demand forecasting. European J Operational Res 184(3):1140–1154. https://doi.org/10.1016/j.ejor.2006.12.004
5. Caridi M, Cigolini R (2002) Improving materials management effectiveness: a step towards agile enterprise. Int J Phys Distrib Logistics Managem 32(7):556–576. https://doi.org/10.1108/09600030210442586
6. Eruguz AS, Sahin E, Jemai Z, Dallery Y (2016) A comprehensive survey of guaranteed-service models for multi-echelon inventory optimization. Int J Prod Econom 172:110–125. https://doi.org/10.1016/j.ijpe.2015.11.017
7. Georgios M (2021) Machine learning applications in supply chain management
8. Ghafour KM (2018) Optimising safety stocks and reorder points when the demand and the lead-time are probabilistic in cement manufacturing. Int J Procurem Managem 11(3):387–398. https://doi.org/10.1504/IJPM.2018.091672
9. Kumar A, Shankar R, Aljohani NR (2020) A big data driven framework for demand-driven forecasting with effects of marketing-mix variables. Indust Market Managem 90:493–507. https://doi.org/10.1016/j.indmarman.2019.05.003
10. Lo Schiavo A, Lee C-J, Rim S-C (2019) A mathematical safety stock model for DDMRP inventory replenishment. Mathem Problems Eng 2019, 10. https://doi.org/10.1155/2019/6496309
11. Pereira MM, Frazzon EM (2021) A data-driven approach to adaptive synchronization of demand and supply in omni-channel retail supply chains. Int J Inform Managem 57:102165. https://doi.org/10.1016/j.ijinfomgt.2020.102165
12. Philip Chen C, Zhang C-Y (2014) Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inform Sci 275:314–347. https://doi.org/10.1016/j.ins.2014.01.015
13. Priore P, Ponte B, Rosillo R, de la Fuente D (2019) Applying machine learning to the dynamic selection of replenishment policies in fast-changing supply chain environments. Int J Prod Res 57(11):3663–3677. https://doi.org/10.1080/00207543.2018.1552369
14. Salais-Fierro TE, Saucedo-Martinez JA, Rodriguez-Aguilar R, Vela-Haro JM (2020) Demand prediction using a soft-computing approach: a case study of automotive industry. Appl Sci 10(3). https://doi.org/10.3390/app10030829
15. Wang C-H, Chen T-Y (2020) Combining biased regression with machine learning to conduct supply chain forecasting and analytics for printing circuit board. Int J Syst Sci: Operations Logistics 0(0):1–12. https://doi.org/10.1080/23302674.2020.1859157
16. Wenzel H, Smit D, Sardesai S (2019) A literature review on machine learning in supply chain management. In: Hamburg international conference of logistics (HICL) 2019, pp 413–441. http://hdl.handle.net/11420/3742, https://doi.org/10.15480/882.2478
17. Zhong RY, Newman ST, Huang GQ, Lan S (2016) Big data for supply chain management in the service and manufacturing sectors: challenges, opportunities, and future perspectives. Comput Indust Eng 101:572–591. https://doi.org/10.1016/j.cie.2016.07.013
18. Zougagh N, Charkaoui A, Echchatbi A (2020) Prediction models of demand in supply chain. Proc Comput Sci 177:462–467. (The 11th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN 2020)/The 10th International Conference on Current and Future Trends of Information and Communication Technologies in Healthcare (ICTH 2020)/Affiliated Workshops). https://doi.org/10.1016/j.procs.2020.10.063
Chapter 60
Dynamic Data-Driven Failure Mode Effects Analysis (FMEA) and Fault Prediction with Real-Time Condition Monitoring in Manufacturing 4.0

Ceren Arslan Kazan, Halil İbrahim Koruca, and Samia Chehbi Gamoura

C. Arslan Kazan (✉) · H. İ. Koruca: Department of Industrial Engineering, Suleyman Demirel University, 32260 Isparta, Turkey
S. Chehbi Gamoura: EM Strasbourg Business School, Strasbourg University, HuManiS (UR 7308), Strasbourg, France
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. J. Hemanth et al., Smart Applications with Advanced Machine Learning and Human-Centred Problem Design, Engineering Cyber-Physical Systems and Critical Infrastructures 1, https://doi.org/10.1007/978-3-031-09753-9_60

60.1 Introduction

Production is vital to both the economy and technology. To better meet the varying demands of customers in an increasingly competitive environment, it is necessary to meet the requirements of the entire supply chain effectively and efficiently by shortening production times, analyzing and reducing costs properly, and minimizing production errors. In such an environment, it is not enough for companies simply to produce; they must also meet customer expectations in the shortest possible time, in high quantities, and with high efficiency. The technological advances associated with the increasing use of computer numerically controlled (CNC) machines, together with constantly changing markets that demand more complex, specific products and more complex production processes, represent a real challenge in the machining sector today.

Machining is a production method that removes material in layers by turning or milling with cutting tools in order to achieve the desired properties (surface, size, shape) of the material. In short, it is the transformation of the material into the desired shape by means of computer-controlled cutting tools. Machining is a very advantageous production method because the desired properties of a material are calculated in a digital environment and products with precise and correct dimensions are obtained. However, compared to other production methods, it is expensive and material waste is quite high, while companies always want to meet customers' demands with high efficiency and low cost. For this reason, they try to develop various quality improvement policies and improve production control methods in machining. In order to remain competitive, they also try to prevent and eliminate errors by drawing on the experience of past errors through error analysis methods.

The remainder of the article is arranged as follows. The second part covers machine learning algorithm applications made with FPGA and a general literature review. The third section gives general information about Artificial Neural Networks and FPGAs, presents the findings obtained from the applications, and discusses them. The final section closes the article with the results and some possible future studies.

Failure Mode Effects Analysis (FMEA) is a preventive risk management method that covers, among other actions, error detection and avoidance in automated processes. FMEA systematically predicts potential failures and identifies problems and errors that may have a relevant impact, including human errors, equipment problems, personnel training issues, object misplacements, communication difficulties, and design problems. For these reasons, FMEA is one of the most relevant error analysis methods used in machining today.

In this study, we analyzed the error types and their effects in the machining unit of an industrial kitchen products manufacturing workshop. First, we identified these errors in order to reduce the costs they induce. To do this, we used the computer cards installed in the machines of the production line. During the process, it was found that the machines sometimes do not process parts when operators leave them running. As a precaution, the cards were also used to check whether the machines were working properly. The data collected instantly via these cards was listed on the Programmable Logic Controller (PLC) screen and then transformed into a more formal and understandable form by an algorithm developed in C#. This continuous data flow could thus be recorded through communication protocols, and the unit cost data was calculated correctly for the periods measured. Through the ERP interface on the screens placed near the CNC machines, which operators use and which collects the data transmitted by the PLC cards, a workstation job starts as soon as the operator starts the operation and ends when the process is finished. In case of faulty production, the amount of faulty product and the reason for the fault are easily detected. At this stage, the error analysis is carried out with the FMEA technique based on the data transmitted during the quality control operation. The types of errors are also evaluated using the connected ERP program. The outcome of this approach is the prevention of errors before they occur. The factors affecting personnel errors and total production costs were studied both before and after. As a last step, we compared the calculated unit cost database with the data obtained from the PLC cards.
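The idea of separating running-but-idle time from actual processing time before computing a unit cost can be sketched as follows. This is an illustration only, written in Python rather than the C# mentioned above; the sample structure, the two signals, and the cost model (machine hourly rate times productive hours, divided by good parts) are assumptions and not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp_s: float  # seconds since the start of the shift
    machine_on: bool    # hypothetical signal: machine is powered and running
    cutting: bool       # hypothetical signal: a part is actually being processed


def split_time(samples):
    """Return (productive_seconds, idle_seconds) from consecutive samples.

    Each sample's state is assumed to hold until the next sample arrives.
    """
    productive = 0.0
    idle = 0.0
    for current, following in zip(samples, samples[1:]):
        duration = following.timestamp_s - current.timestamp_s
        if current.machine_on and current.cutting:
            productive += duration
        elif current.machine_on:
            idle += duration  # running without processing a part
    return productive, idle


def unit_cost(productive_s, good_parts, hourly_rate):
    """Machine cost per good part, charging only productive time (assumed model)."""
    if good_parts <= 0:
        raise ValueError("good_parts must be positive")
    return hourly_rate * (productive_s / 3600) / good_parts


if __name__ == "__main__":
    shift = [
        Sample(0, True, True),
        Sample(1800, True, False),   # operator leaves the machine running
        Sample(2700, True, True),
        Sample(3600, False, False),
    ]
    productive_s, idle_s = split_time(shift)
    print(productive_s, idle_s, unit_cost(productive_s, good_parts=3, hourly_rate=60))
```

Charging idle time to the parts would inflate the unit cost, which is why the distinction between the two states matters for the cost comparison described above.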
60.2 Related Work
In order to better respond to customer demands that vary in an increasingly competitive environment, it is necessary to fulfill the requirements of supply chain management effectively and efficiently, to shorten production times, to analyze and reduce costs correctly, and to minimize production errors. The literature contains many studies on the application of the Failure Mode Effects Analysis (FMEA) method in the manufacturing industry.

Çevik and Aran [1] applied FMEA in a company that manufactures pistons. Ten types of errors were detected, and the Risk Priority Number (RPN) was above 100 for three of them. It was concluded that the three errors with high RPN values should be dealt with in the first stage and the remaining seven in the second stage. After the corrective and preventive actions determined for the first three errors, their RPN values fell below 100. As a result of the measures taken, customer complaints decreased by 47.4% [1].

Ghani et al. [2] investigated the failure effects caused by loading and unloading of the coated carbide cutting edge during milling. The milling parameters observed to influence the failure of coated carbide tools are cutting speed, feed rate, depth of cut, and coolant application. Cutting speed and depth of cut were identified as the main factors responsible for failure and fatigue of coated carbide tools during milling of titanium alloy. The carbide tools were observed to break due to brittleness. Because carbide is brittle by nature, the cutting speed should be set to 120 m/min to extend the tool life when this type of cutting tool is used. It was also suggested that better tool properties could be achieved by using ultra-fine carbide grades in combination with advanced sintering processes such as nitriding, and new coatings such as HT-Ti(C, N)/k-Al2O3, TiSiCN and CrSiCN [2].

Mariajayaprakash and Senthilvelan [3] focus on the power outage that occurs in a sugar factory in India. The biggest problem in the factory is the frequent failure of the boiler, which leads to loss of production. Faults occur often in the screw conveyor of the boiler fuel supply system and rarely in the boiler grille. The main purpose of the study is to detect the malfunctions that occur frequently in the boiler and to minimize them. The most important parameters that cause boiler failures are determined by applying Failure Mode and Effect Analysis. The results are as follows: (1) the quality of the drum feeder during the process can be improved with Taguchi’s method at the lowest possible cost; (2) the Ishikawa (cause and effect) diagram was found to be very effective in listing all the possible causes affecting the quality of the drum feeder during operation; (3) the fuel parameter significantly affects the quality of the drum feeder during the process; (4) the optimum fuel type, fuel moisture, engine load and silo level could be predicted; (5) the predicted error range of drum failure during operation is 1.37 < 1.86 < 2.35. The optimum level of boiler component failure was estimated with a 95% confidence interval [3].
Yee et al. [4] applied various statistical tools and techniques, such as histograms, control charts, Pareto charts, flow charts and cause-effect charts, to increase process performance and improve product quality in a car-door glass manufacturing company. The results show that process behavior and capability are strongly related, since the properties of a process are key in determining its capability. The tempering process was the most problematic process, and the rear glass was chosen as the product to be analyzed. It was concluded that the tempering process has a rejection rate of 9.49%, compared with 5.91% for the cutting process and 3.38% for the thermal printing process. Separate solutions, such as vocational training, maintenance, and pre-process control and adjustment, are offered for problems such as incorrect bending speed, poor-quality work preparation, inexperienced personnel and irregular heating [4].

Üngan [5] identified error types for raw material and auxiliary material acceptance, incoming quality control, storage and interim shipments, and packaging. These are sub-processes in the production of stamps, springs and derivative products at a metal processing company operating in the automotive sector. The study aimed to guide practitioners in taking precautions before they encounter such errors, and it determined measures to prevent possible error types and their effects. The findings suggest an improvement of 52.6% in raw material and auxiliary material acceptance, incoming quality control, stocking and intermediate shipments, and of 54.3% in the packaging process [5].

Mzougui and Felsoufi [6] propose a method that combines the advantages of the Anticipatory Failure Determination (AFD) and FMEA methods. With this approach, the fault detection process is improved, and FMEA has been applied despite the lack of information regarding cost. The proposed approach integrates the concepts of focal points, failure hypotheses and scenarios. Sustainability and cost were used as additional factors in the prioritization and classification of errors. Considering that the traditional Risk Priority Number (RPN) has been heavily criticized in many studies, the use of weighting factors was suggested, and their introduction improved the accuracy and sensitivity of the analysis. The use of AFD allows potential failures to be identified and helps engineers assess risks. With cost and sustainability as two additional factors, failure decisions improved and the effectiveness of the failure analysis and action plans increased. However, weighting the RPN calculation reduced the values obtained, so the threshold values need to be adapted to suit criticality analysis. In the traditional RPN calculation, the severity, probability and detection values are multiplied; in this article, weighted values are also included in the calculation. The weighted form of severity, probability and detection has not been taken into account [6].

In their study, Hung and Chen [7] proposed a new AlN (aluminum nitride) ceramic substrate metallization process using the laser-coated copper (LPC) technique instead of the current ceramic substrate production method. The root causes of process failure were considered in advance to prevent common process failures in the research and development phase of LPC. In this way, a defect analysis model was created
and focused on improving the root causes in order to reduce the cost and shorten the time required to solve product process problems effectively. Traditional AlN metallization was compared with the laser-coated copper method, and three variants were discussed: selective chemical copper, selective electroless nickel and electroless gold. The advantages cited are process simplification, precision design, easy production of three-dimensional extrusions and green manufacturing. Many types of defects were detected during the AlN metallization process of electroless copper plating, and factors such as top coating and pollution were found to cause scrap losses. As a result of the FMEA used to prevent these errors, better results were obtained by increasing the rinsing time with water and using an alcohol solution to remove the particulate contaminant [7].

Lo et al. [8] applied FMEA to increase the reliability of machine tools, to identify error types and to prevent risk. The results of the traditional FMEA method were compared with those of a hybrid FMEA-MCGDM model, where MCGDM stands for multi-criteria group decision making. First, the R-BWM (weighted best-worst method) was used to weight the criteria; it is described as a simple and effective way of determining RPN values. Second, the R-TOPSIS method was applied to rank the error types. The noise and vibration problem was found to be the most urgent element requiring improvement. The results are considered useful for developing and improving product design plans and defect prevention strategies. Although the hybrid model overcomes some of the limitations of the original FMEA, it also has limitations of its own that need to be addressed [8].
Dağcı [9] applied FMEA to the process from raw material procurement to product shipment in order to prevent errors in an enterprise operating in the machining industry. Eight separate processes were found to affect production directly: purchasing, raw material acceptance, raw material delivery, machine processing/manufacturing, leveling, measurement control, external process and shipping. Comparing the number of scrap products by defect type in 2018 with the number in the first four months of 2019, the following decreases were observed: 47.05% for an incorrect NC program given to the machine, 25% for incorrect information given to the operator, 87.01% for thickness out of tolerance (thinner), 87.09% for large hole diameter, 42.85% for incorrect hole position, 71.42% for burring on the part, 62.5% for size decrease, and 100% for running the old revision program [9].

Sanchez et al. [10] explore the adoption of an Activity Based Costing (ABC) system to provide relevant cost information to businesses that create and sell Internet of Things (IoT) based products. The study found the cost of a smart wind generator to be US$ 6091.76, and found that the cost would be 7.9% lower if indirect costs were not allocated to each product [10].
Li et al. [11] applied an AHP-FMEA methodology to analyze the failure causes of floating offshore wind turbines. Fifteen fault scenarios were identified, together with corrective and preventive actions to cut fault propagation paths and reduce fault effects. The wind turbine (with an RPN of 0.44) is the most critical system of the floating offshore wind turbine, followed by the mooring system (0.18), the floating foundation (0.17), and the tower and transition pieces (0.16). Broken mooring lines and 14 other failure modes are then identified as risky failures. The findings highlight that the RPNs derived by different experts are inconsistent, that is, FMEAs are subjective methods and personal judgments influence the results; that the choice of experts is rational because their backgrounds are different; and that the accuracy of the AHP-FMEA results is confirmed by the results of conventional methods [11].

Filz et al. [12] present a data-driven FMEA methodology that applies deep learning models to historical and operational data collected during the use of industrial investment goods. The methodology is examined in a case study from the aviation industry. Since the estimated probability of failure is not the only feature associated with the expected risk, FMEA was carried out with employees to evaluate the error types in more detail. In addition to saving time, the methodology makes risk assessment less subjective, since every employee obtains the same results. Owing to the high accuracy of data-driven error estimation, components are replaced only when necessary and can therefore be used for longer. With this methodology, failures of parts or components can be predicted from instantaneous RPN values [12].
60.3 Method and Material

In this study, the error types and their effects were analyzed in the machining unit of a workshop that manufactures industrial kitchen products. The study was carried out in a real-time production environment by monitoring computer-connected cards placed on the 12 CNC machines of the unit. Operators enter the errors that occur in the machining unit into the ERP program at kiosks located in the production area. To check the operators’ entries, PLC cards are placed on the 12 machines, and the instantaneous working times and downtimes of the machines are obtained. The data entered by the operators for each work order is compared with the data received from the PLC cards, so that the process is brought under control without depending on the user. Since the operators know that they are monitored, their performance has improved. The total production cost before and after the application was also compared.
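The comparison between operator entries and PLC card data can be illustrated with a minimal Python sketch. It is not the authors' C#/ERP implementation: the `Interval` type, the `reconcile` function, the 5-minute tolerance and the sample timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Interval:
    """Start and end time of one work order on one machine (hypothetical type)."""
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def reconcile(operator: Interval, plc: Interval,
              tolerance: timedelta = timedelta(minutes=5)) -> dict:
    """Compare the time booked by the operator with the time seen by the PLC card.

    The tolerance is an illustrative assumption, not a value from the study.
    """
    gap = operator.duration - plc.duration
    return {
        "operator_minutes": operator.duration.total_seconds() / 60,
        "plc_minutes": plc.duration.total_seconds() / 60,
        "gap_minutes": gap.total_seconds() / 60,
        "flagged": abs(gap) > tolerance,
    }


# Invented example: the operator books 08:00-10:00 in the ERP kiosk,
# while the PLC card records the machine running from 08:10 to 09:40.
operator_entry = Interval(datetime(2023, 1, 10, 8, 0), datetime(2023, 1, 10, 10, 0))
plc_record = Interval(datetime(2023, 1, 10, 8, 10), datetime(2023, 1, 10, 9, 40))

result = reconcile(operator_entry, plc_record)
print(result)  # gap_minutes is 30.0, so the work order is flagged
```

A flagged work order would then be a candidate for review, for example as unrecorded downtime or as an incorrect start or end time entered by the operator.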
60.3.1 Failure Mode Effects Analysis (FMEA) Method

FMEA is an analytical method that consists of determining known or potential errors in a product or process using previous experience or technologies, and planning to prevent these errors [13, 14]. It evaluates the effects of existing or possible errors on the system, defines the actions that reduce or prevent the occurrence of these errors, re-evaluates the possibility of errors after the actions are implemented, and documents the system, taking into account the experience gained [15]. The basic concepts used in FMEA applications are given below:

• Function: What the product or process is for. In other words, the function is the set of goals that the product or process is expected to meet.
• Customer: The end user who will be affected by the product or service. The customer can be any person, department or business.
• Failure: It happens if a function cannot be completed as planned.
• Failure Mode: The way in which a system, depending on its complexity, fails to perform the function fully as desired or as required.
• Failure Effect: The negative consequence experienced by the customer if the function in the system does not meet the expected goals.
• Failure Reason: Any factor that prevents a function within the system from meeting expectations.
• Available Controls: The actions taken to maintain the existing functioning of the system and to reduce or eliminate the risk associated with the possible cause of failure.
• FMEA Element: The system considered within the scope of FMEA.
• Severity (S): The degree of the effect of the error in the system on the customer.
• Occurrence (O): The degree value corresponding to the probability that the cause of the error produces the error type.
• Detection (D): The degree to which the existing controls in the system catch a possible error before it reaches the customer.
• Risk Priority Number (RPN): The product of the Severity (S), Occurrence (O) and Detection (D) degree values:

$$\mathrm{RPN} = S \times O \times D \tag{60.1}$$

With the RPN, errors in the system are ranked according to risk priority, and corrective and preventive actions are implemented in line with this priority.
• Criticality: The product of the probability of the error occurring and of detecting the error before it reaches the customer. It is used to prioritize faults that require additional quality planning.
• Critical Characteristics: Characteristics that can affect legal regulation or product/service safety. In general, critical characteristics are determined by the following factors [16, 17].
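As a worked instance of Eq. (60.1), consider the butt-weld cracking row of Table 60.1, which is rated S = 10, O = 8 and D = 5:

```latex
\mathrm{RPN} = S \times O \times D = 10 \times 8 \times 5 = 400
```

This is the largest value in the table. Assuming the customary 1 to 10 rating scale for each factor (the chapter does not state the scale explicitly, but all ratings in Table 60.1 are consistent with it), the RPN can range from 1 to 1000.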
60.3.1.1 Objectives of Failure Mode Effects Analysis
The objectives of the FMEA technique can be listed as follows:

• To determine the types of failures that may occur in the product or process, their effects and their criticality levels.
• To prevent potential errors in the product or process by identifying them in advance.
• To analyze a product’s design characteristics in conjunction with the planned manufacturing and assembly processes to ensure that the final product meets customer needs and expectations.
• To take corrective action, once potential defect types are identified, to eliminate them or continually reduce their likelihood of occurrence, thereby improving the product.
• To document the reasons and principles on which the assembly or manufacturing process is based [18, 19].

60.3.1.2 Failure Modes in Machining Processes
Machining is the shaping of parts to a specified volume, dimensions and surface quality by removing chips with cutting tools. The cutting tool works the workpiece at a given rotational speed and feed rate, and chips form as the tool repeatedly stresses a localized region of the workpiece. In other words, chip removal is a complex process that results from the mechanical effects of the tool on the workpiece (deformation, friction, heat generation, chip shrinkage, breakage, hardening of the workpiece surface and wear of the cutting tool). Mechanical energy is generally used in machining. Some newer manufacturing techniques (plunge erosion, laser cutting, water jet, etc.) also use chemical, electrical and water energy [20]. Machining methods and the cutting tools used in them show a wide variety. Some of the machining methods are shaping and planing, turning, drilling, milling, broaching, reaming, grinding, sawing and leveling, and boronizing [21]. After the machining method is chosen in planning, appropriate cutting speeds and feed rates should be determined for the tool and workpiece. Incorrect selection, use or adjustment of these factors carries risks such as tool breakage, employee injury, machine failure, faulty operation, part deterioration, rework, and delay in delivery [22]. Because information is accessed through work order tracking, the process depends heavily on operators. If the operators do not enter the start and end times of the work into the system correctly, none of the requested information can be obtained. For this reason, PLC cards placed on the CNC machines in the production units make it possible to obtain machine operation data without depending on the operators. In this way, the operators can be checked and the working times of the machines can be obtained accurately.
60.3.2 Programmable Logic Controller (PLC)

PLCs receive information from the sensors in the field and process it in accordance with the program loaded on them. Thus, they can control all devices in the field. PLCs are industrial computers equipped with input and output units to read information from and write information to field devices, and with communication units to communicate with those devices; they can work in harmony with SCADA (remote control and monitoring system) [23]. A PLC is designed to replace command and control elements such as auxiliary relays, time relays and counters. In these systems, operations such as counting, timing and sorting are provided by software. Therefore, fast and reliable results are achieved by using PLCs for all kinds of advanced automation problems.
60.3.3 Kitchen Equipment Manufacturing Company

The product range of the company that is the subject of this study covers cookers, open buffets, refrigerators, ice cream machines, dishwashers, cold rooms, furniture, laundry, and other kitchen products (workbench, washing bench, hood, wall shelf, service shelf, tray transport trolleys, floor grid, material cabinet, premix counters, temperature cabinet, etc.). The company can produce in-house all the products needed in a hotel kitchen. It serves leading companies with its 330 staff on an area of 76,000 m². There are 12 CNC machines in the machining unit:

• Horizontal lathes: DMG MORI CLX450, DMG MORI ECOTURN310, DMG MORI NLX2500/700, TAKISAWA EX710.
• Vertical lathes: VLGH950, HONOR VL86H, HONOR VL66M.
• 3-axis mills: FRONTIER MCV1166 and two WELE AQ1265.
• 5-axis mills: DMG MORI ECOMILL50, DMG MORI CMX50U.
60.3.4 System Integration: PLC, ERP, C# with WinProLadder

PLC cards have been placed on each machine in the unit, and the working times of the machines have been made instantly accessible with the help of Tibbo Ethernet converters attached to the CNC machines in the machining unit. To commission the Tibbo Ethernet converters, unused IP addresses were obtained from the company’s IT team; the communication interfaces used are described below. The IP definition is made with the VSP Manager interface,
and the machine is matched with the IP information given to the VSP Manager in the DS Manager interface. After communication with the machines was established, the necessary diagrams were created in the WinProLadder PLC program in order to receive the required data from the machines. The code for the communication protocols was written in C#. The data made comprehensible through these protocols was transferred to the ERP program: instead of being kept in a standalone executable (.exe) built from the C# program, the data was moved into the ERP software to make it accessible (Figs. 60.1 and 60.2).
Fig. 60.1 Tibbo VSP manager and DS manager
Fig. 60.2 PLC port opening codes
The BIT values can be defined as the connection data bits of the ports; communication takes place through these data bits. Registers must be defined in order to distinguish which machine the data comes from.
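The role of registers and data bits can be sketched in a few lines of Python. This is only an illustration of the idea, not the C# code used in the study: the register addresses, the choice of bit 0 as a "running" flag and the function name `decode` are all hypothetical.

```python
from typing import Tuple

# Hypothetical register map: one register per machine.
# The addresses are invented; only the machine names come from the chapter.
REGISTER_TO_MACHINE = {
    0: "DMG MORI CLX450",
    1: "DMG MORI ECOTURN310",
    2: "TAKISAWA EX710",
}

RUN_BIT = 0  # assumption: bit 0 of the register value is set while the machine runs


def decode(register: int, value: int) -> Tuple[str, bool]:
    """Return (machine name, running?) for one register reading."""
    machine = REGISTER_TO_MACHINE[register]
    running = bool((value >> RUN_BIT) & 1)
    return machine, running


print(decode(0, 0b1))  # ('DMG MORI CLX450', True)
print(decode(2, 0b0))  # ('TAKISAWA EX710', False)
```

Keeping the register-to-machine mapping in one table means that a reading can always be attributed to a machine, which is the purpose the text gives for defining registers.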
60.4 Discussion and Results

Five processes that directly affect production were taken into account: the raw material acceptance process, the process of giving the raw material to production, machines, the production process, and processes performed outside the unit. The raw material acceptance process, the process of giving the raw material to production, the malfunctions caused by the machines, and the other processes and operations are discussed in detail. Since errors arising from the transactions made by supplier companies are frequently encountered, they were also examined in detail. Evaluating only operational and machine failures when analyzing production problems leads to errors. For this reason, the raw material acceptance process and the process of giving the raw material to production were also examined within the scope of FMEA. The errors that occur in the processes are classified by process and listed in Table 60.1. The error types determined as a result of the FMEA application are listed in Table 60.2. The findings suggest that the most important possible error types (RPN ≥ 100) are the following:

• Cracking in the butt-weld due to milling cutting error in the cylinder material.
• Inability to supply CRNI rods of appropriate quality for the factory.
• Lack of weight in the premix counter load product.
• The lower outer thickness of the mixer shaft bushing is outside the tolerance values.
• Faulty production due to not machining the O-ring groove.
• Crushing in the produced ice cream machine chamber.
• Incorrect hole diameter position.
• Cylinder length is out of tolerance.
• Inputting the wrong CNC program to the CNC machine.
• The dyeing process is not done in accordance with the technical document.
• The width of the syrup connection head of the milkshake check valve is out of tolerance.
• Hole diameter is out of tolerance.
• The front sheet of the ice cream machine is not painted according to the customer’s request.
• Putting the old revised program into production.
• Mixing of raw material.
• Giving the wrong raw material to the production line.
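The ranking and the RPN ≥ 100 cut-off described above can be reproduced with a short Python sketch. The severity, occurrence and detection ratings are copied from a handful of rows of Table 60.1; the `FailureMode` type and the variable names are assumptions made for illustration, and the study itself performed this step in the ERP program.

```python
from typing import NamedTuple


class FailureMode(NamedTuple):
    """One FMEA row (hypothetical type for this sketch)."""
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        # Eq. (60.1): RPN = S x O x D
        return self.severity * self.occurrence * self.detection


# A few rows taken from Table 60.1 (S, O, D).
modes = [
    FailureMode("Butt-weld cracking due to milling cut error", 10, 8, 5),
    FailureMode("Inputting the wrong CNC program", 5, 5, 8),
    FailureMode("Motor (servo) failure", 8, 3, 4),
    FailureMode("Mixing of raw materials", 9, 2, 6),
    FailureMode("Burr in the produced part", 3, 2, 4),
]

THRESHOLD = 100  # priority cut-off used in the chapter (RPN >= 100)

ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
priority = [m for m in ranked if m.rpn >= THRESHOLD]

for m in ranked:
    flag = "PRIORITY" if m.rpn >= THRESHOLD else ""
    print(f"{m.rpn:>4}  {m.name}  {flag}")
```

With these five rows the RPN values come out as 400, 200, 108, 96 and 24, so three of them fall in the priority group. Applied to all rows of Table 60.1, the same filter yields the 16 error types listed above.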
Table 60.1 Possible types of faults and their main causes in the machining unit

| Process | Failure mode | Root cause of failure | Severity | Occurrence | Detection | RPN |
|---|---|---|---|---|---|---|
| Raw material acceptance process | Quality; Warehouse | Defective product received from the supplier company | 5 | 2 | 6 | 60 |
| | | Incorrect material order to supplier company | 4 | 1 | 3 | 12 |
| | | Failure of the raw material input quality control officer to fulfill his duty | 3 | 1 | 3 | 9 |
| | | Mixing of raw materials | 9 | 2 | 6 | 108 |
| | | Damage of the incoming material while placing it on the shelf in the warehouse | 5 | 2 | 8 | 80 |
| | | Incorrect raw material barcode labeling by the warehouse manager | 8 | 1 | 9 | 72 |
| Process of giving raw material to production | Production | Giving the wrong raw material to the production line | 9 | 2 | 6 | 108 |
| | | Giving the missing amount of raw material to the production line | 4 | 1 | 7 | 28 |
| | | Giving too much raw material to the production line | 2 | 1 | 7 | 14 |
| Machines | Failure of horizontal lathe machines line; Failure of vertical lathe machines line; Failure of 5-axes milling machines line; Failure of 3-axes milling machines line | Machine failure due to pneumatic pressure drop | 8 | 2 | 5 | 80 |
| | | Machine failure due to lack of cooling water | 7 | 1 | 4 | 28 |
| | | Smartkey device failure | 7 | 1 | 4 | 28 |
| | | Machine failure due to leakage in hydraulic oil pump | 7 | 3 | 4 | 84 |
| | | Motor (servo) failure | 8 | 3 | 4 | 96 |
| | | Machine failure due to pneumatic pressure drop | 8 | 2 | 5 | 80 |
| | | Machine failure due to lack of cooling water | 9 | 2 | 4 | 72 |
| | | Malfunction due to low slideway oil | 7 | 3 | 4 | 84 |
| | | C axis failure | 2 | 4 | 2 | 16 |
| | | Machine failure due to lack of cooling water | 9 | 2 | 4 | 72 |
| | | Machine failure due to pneumatic pressure drop | 8 | 2 | 5 | 80 |
| | | Machine failure due to low press oil | 7 | 1 | 4 | 28 |
| | | Machine probe read error | 2 | 2 | 4 | 16 |
| | | Malfunction due to clogged filters | 8 | 2 | 4 | 64 |
| | | Machine failure due to pneumatic pressure drop | 8 | 2 | 5 | 80 |
| | | Malfunction due to low slide way oil | 7 | 3 | 4 | 84 |
| Production process | Failure CNC program; Production; Quality | Inputting the wrong CNC program to the CNC machine | 5 | 5 | 8 | 200 |
| | | First piece trial error | 2 | 5 | 3 | 30 |
| | | Inability to supply appropriate quality CRNI rods for the factory | 10 | 7 | 5 | 350 |
| | | Giving the old revised program to production | 8 | 4 | 4 | 128 |
| | | Faulty production due to not machining the O-ring groove | 10 | 7 | 4 | 280 |
| | | The mixer shaft bushing’s lower outer thickness is out of the tolerance values | 10 | 8 | 4 | 320 |
| | | Crushing in the produced ice cream machine chamber | 9 | 7 | 4 | 252 |
| | | Butt-weld cracking due to milling cut error in cylinder material | 10 | 8 | 5 | 400 |
| | | Lack of weight in the premix counter load product | 10 | 7 | 5 | 350 |
| | | The width of the milkshake check valve syrup connection head part is out of tolerance | 8 | 5 | 4 | 160 |
| | | Incorrect hole diameter position | 9 | 7 | 4 | 252 |
| | | Hole diameter out of tolerance | 6 | 5 | 5 | 150 |
| | | Burr in the produced part | 3 | 2 | 4 | 24 |
| | | Cylinder length out of tolerance | 9 | 7 | 4 | 252 |
| Processes performed outside the unit | Painting; Marking | Inappropriate roller surface cleaning | 5 | 2 | 8 | 80 |
| | | Twisting error on front sheet of ice cream machine | 5 | 2 | 6 | 60 |
| | | Inadequate painting of the front sheet of the ice cream machine according to the customer’s request | 10 | 3 | 5 | 150 |
| | | Incorrectly conducted dyeing process | 9 | 3 | 6 | 162 |
| | | Incorrect branding of the company name of the ice cream machine | 10 | 2 | 4 | 80 |
| | | Incorrect marking process | 8 | 1 | 8 | 64 |
60.5 Conclusions and Future Work FMEA help manufacturing organizations designing and producing quality products with low cost, as well as preventing faulty products before they reach the customer. FMEA technique analysis can be easily integrated with the ERP program and gives accurate and fast results during the analysis phase. With the application of FMEA technique, it has been observed that the production costs have decreased. The constant parameter in cost comparison is the production amount information. In January, 3971 pieces were produced. While the number of faulty products was 129 in January, the number of faulty products was 49 in February and 39 in March. The number of defective products decreased by 62% in February compared to January and by 70% in March. Data flow was obtained from PLC cards and operators separately, and the data were compared with each other. The RPN values of the current process were calculated
788
60 Dynamic Data-Driven Failure Mode Effects Analysis (FMEA) …
Table 60.2 Ranking of RPN values from largest to smallest as a result of FMEA application

| Failure modes | RPN |
|---|---|
| Butt-weld cracking due to milling cut error in cylinder material | 400 |
| Unable to supply CRNI bar of suitable quality for the factory | 350 |
| Lack of weight in the Promix counterload product | 350 |
| The lower outer thickness of the mixer shaft bushing is outside the tolerance values | 320 |
| Incorrect production due to not machining the O-ring groove | 280 |
| Crush in the produced ice cream machine chamber | 252 |
| Incorrect hole diameter position | 252 |
| Cylinder length out of tolerance | 252 |
| Giving the wrong CNC program to the CNC machine | 200 |
| The dyeing process is not carried out in accordance with the technical document | 162 |
| The width of the milkshake check valve syrup connection head part is out of tolerance | 160 |
| Hole diameter out of tolerance | 150 |
| Inadequate painting of the front sheet of the ice cream machine according to the customer’s request | 150 |
| Giving the old revised program to production | 128 |
| Mixing of raw material | 108 |
| Giving the wrong raw material to the production line | 108 |
| Motor (servo) failure | 96 |
| C axis failure | 84 |
| Malfunction due to low slide way oil | 84 |
| Damage to the incoming material while placing it on the shelf in the warehouse | 80 |
| Machine failure due to pneumatic pressure drop | 80 |
| Inappropriate roller surface cleaning | 80 |
| Incorrect branding of the company name of the ice cream machine | 80 |
| Incorrect raw material barcode labeling by the warehouse manager | 72 |
| Machine failure due to lack of cooling water | 72 |
| Machine failure due to leakage in hydraulic oil pump | 64 |
| Malfunction due to clogged filters | 64 |
| The marking process is not carried out in accordance with the technical document | 64 |
| Defective product received from the supplier company | 60 |
| Bending error on the front sheet of the ice cream machine | 60 |
| First piece trial error | 30 |
| Giving the missing amount of raw material to the production line | 28 |
| Lubrication error | 28 |
| Smart key device failure | 28 |
| Machine failure due to low press oil | 28 |
| Burr in the produced part | 24 |
| C axis failure | 16 |
| Machine probe read error | 16 |
| Giving too much raw material to the production line | 14 |
| Incorrect material order to supplier company | 12 |
| Failure of the raw material input quality control officer to fulfill his duty | 9 |
and the values were ordered from largest to smallest; it was decided that the 16 items with an RPN value higher than 100 should be prioritized for improvement. Corrective and preventive actions have been implemented for these failure modes: butt-weld cracking due to milling cut error in the cylinder material; inability to supply CRNI bar of suitable quality for the factory; lack of weight in the Promix counterload product; lower outer thickness of the mixer shaft bushing outside the tolerance values; incorrect production due to not machining the O-ring groove; crushing in the produced ice cream machine chamber; incorrect hole diameter position; cylinder length out of tolerance; wrong CNC program given to the CNC machine; dyeing process not carried out in accordance with the technical document; width of the milkshake check valve syrup connection head part out of tolerance; hole diameter out of tolerance; front sheet of the ice cream machine not painted according to the customer’s request; old revised program given to production; mixing of raw material; and wrong raw material given to the production line. The effect of these 16 prioritized items on production cost was also analyzed, and the cost records showed a 6% reduction in production costs.
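The ranking and cut-off described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tooling used in the study: the function name `prioritize`, the constant `RPN_THRESHOLD`, and the choice of a six-row subset are assumptions made for the example, while the failure modes, their RPN values, and the threshold of 100 are taken from Table 60.2 and the text.

```python
# A handful of rows copied from Table 60.2 (subset chosen for brevity).
failure_modes = [
    ("Giving the wrong CNC program to the CNC machine", 200),
    ("Butt-weld cracking due to milling cut error in cylinder material", 400),
    ("Motor (servo) failure", 96),
    ("Mixing of raw material", 108),
    ("Hole diameter out of tolerance", 150),
    ("Lubrication error", 28),
]

# Cut-off stated in the text: items with RPN higher than 100 are prioritized.
RPN_THRESHOLD = 100


def prioritize(modes, threshold=RPN_THRESHOLD):
    """Rank failure modes by RPN (largest first) and keep those above the threshold.

    `prioritize` is a hypothetical helper name used only in this sketch.
    """
    ranked = sorted(modes, key=lambda mode: mode[1], reverse=True)
    return [mode for mode in ranked if mode[1] > threshold]


if __name__ == "__main__":
    for name, rpn in prioritize(failure_modes):
        print(f"{rpn:>4}  {name}")
```

Applied to the full table, the same filter would return the 16 failure modes listed in the text. The comparison is strict (`>`), following the wording "higher than 100"; a plant that wanted to include items at exactly 100 would need `>=` instead.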