Lecture Notes in Networks and Systems 772
Tawfik Masrour Hassan Ramchoun Tarik Hajji Mohamed Hosni Editors
Artificial Intelligence and Industrial Applications Algorithms, Techniques, and Engineering Applications
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and others.

Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them.

Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

For proposals from Asia please contact Aninda Bose ([email protected]).
Editors
Tawfik Masrour, Department of Mathematics and Computer Science, National Graduate School for Arts and Crafts, Meknes, Morocco
Hassan Ramchoun, National School of Business and Management - ENCG Meknes, Meknes, Morocco
Tarik Hajji, Department of Mathematics and Computer Science, National Graduate School for Arts and Crafts, Meknes, Morocco
Mohamed Hosni, Department of Mathematics and Computer Science, National Graduate School for Arts and Crafts, Meknes, Morocco
ISSN 2367-3370  ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-3-031-43519-5  ISBN 978-3-031-43520-1 (eBook)
https://doi.org/10.1007/978-3-031-43520-1

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Paper in this product is recyclable.
Preface
In today’s increasingly uncertain, complex, and ambiguous world, the transformative power of Artificial Intelligence (AI) becomes all the more crucial. With its immense potential, AI offers a powerful solution to tackle the formidable challenges encountered in our industrial landscape. From driving economic growth to revolutionizing health care and containing diseases, the application of AI holds immense promise for our communities. As we gaze into the future, we recognize that the challenges we face will only intensify. Thus, it is imperative that we direct our attention toward the industrial applications of AI. By harnessing its capabilities, we can unlock groundbreaking advancements in manufacturing, logistics, automation, and optimization, among others. This paradigm shift requires the relentless pursuit of innovative studies and dedicated research, undertaken collaboratively by both academics and practitioners.

In this rapidly evolving field, staying at the forefront of AI research is essential. By pushing the boundaries of knowledge and constantly exploring new frontiers, we can harness the true potential of AI in solving real-world industrial problems. It is through our collective efforts that we can revolutionize industries, reshape economies, and pave the way for a future that is empowered by the limitless possibilities of Artificial Intelligence.

With an unwavering focus on the industrial and engineering applications of Artificial Intelligence, we proudly present the second edition of A2IA—Artificial Intelligence and Industrial Applications, an esteemed international conference organized by ENSAM—Meknes at Moulay Ismail University. Recognizing the urgent need for collaboration, our conference serves as a prominent platform for experts, researchers, academics, and industrial practitioners to convene. At A2IA, our primary objective is to foster a collective effort toward addressing the challenges faced in industrial domains through the power of AI.
By bringing together leading minds from diverse backgrounds, we aim to facilitate in-depth discussions, propose innovative solutions, and exchange groundbreaking concepts and theories. This synergy of ideas will serve as the catalyst for future research, enabling us to chart a course toward transformative advancements. We firmly believe that the key to unlocking the full potential of AI lies in fostering robust connections between institutions and individuals. Through this interconnected network, we can enhance the productivity and effectiveness of our research endeavors, propelling the field of AI and its industrial applications to new heights.

As we embark on this shared mission, we invite you to join us at A2IA, where knowledge meets innovation, collaboration flourishes, and the future of industrial AI is shaped. Together, let us forge a path toward a world where AI revolutionizes industries, creates economic prosperity, and paves the way for a future of unparalleled possibilities.

The conference covered an extensive range of topics, emphasizing the diverse applications of Artificial Intelligence in industrial settings. The areas explored in depth during the conference included:

1. Smart Operation Management: Exploring how AI can optimize and enhance the management of industrial operations, improving efficiency and productivity.
2. Artificial Intelligence: Algorithms and Techniques: Delving into the algorithms and techniques that underpin AI systems, enabling intelligent decision-making and problem-solving.
3. Artificial Intelligence for Information and System Security in Industry: Addressing the critical role of AI in safeguarding information and system security within industrial environments.
4. Artificial Intelligence for Energy: Investigating the application of AI to revolutionize energy sectors, facilitating smarter energy generation, distribution, and consumption.
5. Artificial Intelligence for Agriculture: Highlighting the potential of AI to transform agricultural practices, from precision farming and crop monitoring to livestock management and yield optimization.
6. Artificial Intelligence for Health Care: Exploring the impact of AI on health care, including medical diagnosis, personalized treatment plans, drug discovery, and patient care.
7. Other Applications of Artificial Intelligence: Encompassing a wide array of industrial applications that leverage AI, such as manufacturing, logistics, supply chain management, robotics, and automation.

In the proceedings of A2IA, we received an impressive number of paper submissions from around the world. After a rigorous blind peer-review process, involving a panel of esteemed international experts in the conference’s subject areas, we have carefully selected and included papers for presentation and publication. The papers are organized into two volumes, focusing on different aspects of AI in industrial applications:

• Volume 1: Artificial Intelligence and Industrial Applications: Smart Operation Management (In: Lecture Notes in Networks and Systems) showcases research on leveraging AI for intelligent operation management within industrial settings.
• Volume 2: Artificial Intelligence and Industrial Applications: Algorithms, Techniques, and Engineering Applications (In: Lecture Notes in Networks and Systems) explores the development and engineering aspects of AI algorithms and techniques in various industrial domains.

Lastly, we extend our heartfelt gratitude to all the contributors who have enriched this publication with their valuable insights, ideas, and research. We acknowledge the efforts of those whose papers could not be included, as their contributions were integral to the scholarly discourse surrounding AI in industrial contexts.

We would like to express our sincere appreciation to the esteemed members of the program committee and the diligent reviewers who dedicated their expertise and time to ensure the quality and rigor of the selection process. Their invaluable contributions have greatly supported us in shaping this publication. We are immensely grateful for the unwavering support and organizational assistance provided by Moulay Ismail University. Their commitment to fostering intellectual growth and promoting research excellence has been instrumental in making this conference a success.

Finally, we extend our special appreciation to Springer Nature for their collaboration and support in bringing this publication to fruition. Their partnership has been invaluable in disseminating the latest advancements in AI and its practical applications to a wider audience.
With gratitude and enthusiasm, let us embark on this intellectual journey together, as we pave the way for remarkable advancements in Artificial Intelligence and its transformative industrial applications.
Organization
General Chair Tawfik Masrour
Department of Mathematics and Computer Science, Artificial Intelligence for Engineering Sciences Team (IASI), Laboratory of Mathematical Modeling, Simulation, and Smart Systems (L2M3S), ENSAM, My Ismail University, 50500 Meknes, Morocco [email protected], [email protected]
Co-chair Vincenzo Piuri
Department of Computer Science, University of Milan, Via Celoria 18, 20133 Milano (MI), Italy
Keynote Speakers
José Luis Verdegay, University of Granada, Granada, Spain
Jose A. Lozano, University of the Basque Country, Spain
Bouchra R. Nasri, Université de Montréal, Montreal, Canada
Gabriella Casalino, University of Bari, Italy
TPC Chairs Jose Arturo Garza-Reyes, UK Janusz Kacprzyk, Poland Noureddine Barka, Canada Ali Siadat, France
International Scientific Committee Abawajy Jemal H., Australia Aboulaich Rajae, Morocco Aghezzaf El-Houssaine, Belgium
Ahmadi Abdeslam, Morocco Ait Moussa Abdelaziz, Morocco Akhrif Iatimad, Morocco Aksasse Brahim, Morocco Al-Mubaid Hisham, USA Alali Abdelhakim, Morocco Alami Hassani Aicha, Morocco Ali Ahad, USA Ali Siadat, France Allahverdi Ali, Kuwait Aly Ayman A., Saudi Arabia Amri Samir, Morocco Arbaoui Abdelaziz, Morocco Azizi Abdelmalek, Morocco Azzeddine Mazroui, Morocco Babai Mohamed Zied, France Badie Kambiz, Iran Balas Valentina Emilia, Romania Bansal Jagdish Chand, India Batouche Mohamed Chawki, Saudi Arabia Behja Hicham, Morocco Belhadi Amine, Morocco Ben Abdllah Mohammed, Morocco Benabbou Rajaa, Morocco Benaissa Mounir, Tunisia Benghabrit Asmaa, Morocco Benoussa Rachid, Morocco Berrada Ilham, Morocco Berrada Mohamed, Morocco Bouhaddou Imane, Morocco Brouri Adil, Morocco Buccafurri Francesco, Italy Carrabs Francesco, Italy Castillo Oscar, Mexico Cerulli Raffaele, Italy Chaouni Benabdellah Abla, Morocco Chaouni Benabdellah Naoual, Morocco Charkaoui Abdelkabir, Morocco Chbihi Louhdi Mohammed Reda, Morocco Cherrafi Anass, Morocco Ciaramella Angelo, Italy Ciasullo Maria Vincenza, Italy D’Ambrosio Ciriaco, Italy Daoudi El-Mostafa, Morocco De Mello Rodrigo Fernandes, Brazil
Deep Kusum, India Dolgui Alexandre, France Ducange Pietro, Italy El Akili Charaf, Morocco El Bouanani Faissal, Morocco El Haddadi Anass, Morocco El Hammoumi Mohammed, Morocco Mohammed El Hamlaoui, Morocco El Hassani Ibtissam, Morocco El Jasouli Sidi Yasser, Belgium El Mazzroui Abb Elaziz, Morocco El Mghouchi Youness, Morocco El Ossmani Mustapha, Morocco Elbaz Jamal, Morocco Elfezazi Said, Morocco Elmazroui Azz Addin, Morocco En-Naimani Zakariae, Morocco Es-Sbai Najia, Morocco Ettifouri El Hassane, Morocco Ezziyyani Mostafa, Morocco Faquir Sanaa, Morocco Fassi Fihri Abdelkader, Morocco Fiore Ugo, Italy Fouad Mohammed Amine, Morocco Gabli Mohamed, Morocco Gaga Ahmed, Morocco Gao Xiao-Zhi, Finland Garza-Reyes Jose Arturo, UK Ghobadian Abby, UK Giuseppe Stecca, Italy Govindan Kannan, Denmark Grabot Bernard, France Hadda Mohammed, Morocco Hajji Tarik, Morocco Hamzane Ibrahim, Morocco Harchli Fidaa, Morocco Hasnaoui Moulay Lahcen, Morocco Herrera-Viedma Enrique, Spain Mohamed Hosni, Morocco Itahriouan Zakaria, Morocco Jaara El Miloud, Morocco Jaouad Kharbach, Morocco Jawab Fouad, Morocco Joudar Nour-eddine, Morocco Kacprzyk Janusz, Poland
Kaoutar Senhaji, Spain Kaya Sid Ali Kamel, Morocco Khadija Bouzaachane, Morocco Khalid Haddouch, Morocco Khireddine Mohamed Salah, Morocco Khrbach Jawad, Morocco Kodad Mohssin, Morocco Krause Paul, UK Kumar Vikas, UK Laaroussi Ahmed, Morocco Lagrioui Ahmed, Morocco Lasri Larbi, Morocco Lazraq Aziz, Morocco Lebbar Maria, Morocco Leung Henry, Canada Mahfoud Hassana, Morocco Manssouri Imad, Morocco Marcelloni Francesco, Italy Massoud Hassania, Morocco Medarhri Ibtissam, Morocco Mhada Fatima-Zahra, Morocco Mkhida Abdelhak, Morocco Mohiuddin Muhammad, Canada Moumen Aniss, Morocco Moussi Mohamed, Morocco Najib Khalid, Morocco Nee Andrew Y. C., Morocco Nfaoui Elhabib, Morocco Nguyen Ngoc Thanh, Poland Nouari Mohammed, France Noureddine Boutammachte, Morocco Novák Vilém, Czech Ouazzani Jamil, Morocco Ouerdi Noura, Morocco Oztemel Ercan, Turkey Palmieri Francesco, Italy Pesch Erwin, Germany Pincemin Sandrine, France Rachidi Youssef, Morocco Rahmani Amir Masoud, Iran Raiconi Andrea, Italy Ramchoun Hassan, Morocco Rocha-Lona Luis, Mexico Saadi Adil, Morocco Sabor Jalal, Morocco
Sachenko Anatoliy, Ukraine Sael Nawal, Morocco Saidou Noureddine, Morocco Sailesh Iyer, India Sekkat Souhail, Morocco Senhaji Salwa, Morocco Serrhini Simohammed, Morocco Sheta Alaa, USA Siarry Patrick, France Soulhi Aziz, Morocco Staiano Antonino, Italy Taakili Aziz, Morocco Tahiri Ahmed, Morocco Tarnowska Katarzyna, USA Tyshchenko Oleksii K., Czech Tzung-Pei Hong, Taiwan Zemmouri Elmoukhtar, Morocco Zéraï Mourad, Tunisia
Local Organizing Committee
Abou El Majd Badr, Med V University, Morocco
Ahmadi Abdessalam, ENSAM Meknes, Morocco
Amri Samir, ENSAM Meknes, Morocco
Benabbou Rajaa, ENSEM, Morocco
Benghabrit Asmaa, ENSMR, Morocco
Benghabrit Youssef, ENSAM Meknes, Morocco
Bouhaddou Imane, ENSAM Meknes, Morocco
Cherrafi Anass, ENSAM Meknes, Morocco
El Hassani Ibtissam, ENSAM Meknes, Morocco
El Ossmani Mustapha, ENSAM Meknes, Morocco
Fassi Fihri Abdelkader, ENSAM Meknes, Morocco
Hadda Mohammed, ENSAM Meknes, Morocco
Hajji Tarik, ENSAM Meknes, Morocco
Mohamed Hosni, ENSAM Meknes, Morocco
Masrour Tawfik, ENSAM Meknes, Morocco
Ramchoun Hassan, ENCG Meknes, Morocco
Sekkat Souhail, ENSAM Meknes, Morocco
Taakili Abdelaziz, ENSAM Meknes, Morocco
Publication Chairs Ibtissam El Hassani, Morocco Tarik Hajji, Morocco Hassan Ramchoun, Morocco Ercan Oztemel, Turkey
Poster Chairs Benabbou Rajaa, Morocco Souhail Sekkat, Morocco
Registration Chairs Abdessalam Ahmadi, Morocco Adil Saadi, Morocco
Web Chairs Ibtissam El Hassani, Morocco Zemmouri El Moukhtar, Morocco
Public Relations Chairs Abla Benabdellah Chaouni, Morocco Asmaa Benghabrit, Morocco Youssef Benghabrit, Morocco Imane Bouhaddou, Morocco Said Ettaqi, Morocco
Industrial Session Chairs Anass Cherrafi, Morocco Souhail Sekkat, Morocco
Ph.D. Organizing Committee

PhD Students
Amhraoui ElMehdi, ENSAM, Morocco
El Ouazzani Hind, ENSAM, Morocco
Quesdane Mohamed, ENSAM, Morocco
Khdoudi Abdelmoula, ENSAM, Morocco
Barbara Idriss, ENSAM, Morocco
Engineering Students (all ENSAM, Morocco)
Adikpeto Rivaldo, Akhatar Mourad, Baraghen Adnane, Bouchama Lyazid, Dkhissi Soufiane, El Alimi Sara, El Kadiri Hajar, El Yadari Moussa, El Kinani Ilyas, El Manja Bilal, Ezziyani Ilyass, Fakhech Taha, Ganlaky Farel, Jari Mohammed Amine, Kharbouch Youssef, Malmoum Chorouk, Mhamdi Oumkalthoum, Outatahoute Saadia, Saddik Imad, Zamzami Rim, Zarkane Issam
Contents
Learning to Irrigate - A Model of the Plant Water Balance . . . 1
Matthias Maszuhn, Frerk Müller-von Aschwege, Susanne Boll-Westermann, and Jan Pinski

Free and Unfree Weed Classification in Young Palm Oil Crops Using Artificial Neural Network . . . 12
Sophie Thelma Marcus Jopony, Fadzil Ahmad, Muhammad Khusairi Osman, Mohaiyedin Idris, Saiful Zaimy Yahaya, Kamarulazhar Daud, Ahmad Puad Ismail, Anwar Hassan Ibrahim, and Zainal Hisham Che Soh

A Method for Bengali Author Detection Using State of the Arts Supervised Machine Learning Classifiers . . . 21
Md. Abdul Hamid, Nusrat Jahan Marjana, Eteka Sultana Tumpa, Md. Rafidul Hasan Khan, Umme Sanzida Afroz, and Md. Sadekur Rahman
Robust State of Charge Estimation and Simulation of Lithium-Ion Batteries Using Deep Neural Network and Optimized Random Forest Regression Algorithm . . . 34
Saad El Fallah, Jaouad Kharbach, Abdellah Rezzouk, and Mohammed Ouazzani Jamil
Advancing Lithium-Ion Battery Management with Deep Learning: A Comprehensive Review . . . 46
Hind Elouazzani, Ibtissam Elhassani, and Tawfik Masrour
State of Charge Estimation of Lithium-Ion Batteries Using Extended Kalman Filter and Multi-layer Perceptron Neural Network . . . 59
Oumayma Lehmam, Saad El Fallah, Jaouad Kharbach, Abdellah Rezzouk, and Mohammed Ouazzani Jamil
Pavement Crack Detection from UAV Images Using YOLOv4 . . . 73
Mat Nizam Mahmud, Nur Nadhirah Naqilah Ahmad Sabri, Muhammad Khusairi Osman, Ahmad Puad Ismail, Fadzil Ahmad Mohamad, Mohaiyedin Idris, Siti Noraini Sulaiman, Zuraidi Saad, Anas Ibrahim, and Azmir Hasnur Rabiain
Genetic Algorithm for CNN Architecture Optimization . . . 86
Khalid Elghazi, Hassan Ramchoun, and Tawfik Masrour
Enhancing Comfort and Security: A Chatbot-Based Home Automation System with Integrated Natural Language Processing and IoT Components . . . 98
Tarik Hajji, Abdelkader Fassi Fihri, Ibtissam El Hassani, Salma Kassimi, and Chaima El Hajjoubi
The Impact of Systolic Blood Pressure Level and Comparative Study for Predicting Cardiovascular Diseases . . . 108
Kenza Douifir and Naoual Chaouni Benabdellah

Contribution to Solving the Cover Set Scheduling Problem and Maximizing Wireless Sensor Networks Lifetime Using an Adapted Genetic Algorithm . . . 123
Ibtissam Larhlimi, Maryem Lachgar, Hicham Ouchitachen, Anouar Darif, and Hicham Mouncif

A Multi-Agent System for the Optimization of Medical Waste Management . . . 134
Ahmed Chtioui, Imane Bouhaddou, Abla Chaouni Benabdella, and Asmaa Benghabrit

A Relaxed Variant of Distributed Q-Learning Algorithm for Cooperative Matrix Games . . . 150
Elmehdi Amhraoui and Tawfik Masrour

Remote Sensing Image Super-Resolution Using Deep Convolutional Neural Networks and Autoencoder . . . 161
Safae Belamfedel Alaoui, Hassan Chafik, Abdeslam Ahmadi, and Mohamed Berrada

Part of Speech Tagging of Amazigh Language as a Very Low-Resourced Language: Particularities and Challenges . . . 172
Rkia Bani, Samir Amri, Lahbib Zenkouar, and Zouhair Guennoun

Learning Sparse Fully Connected Layers in Convolutional Neural Networks . . . 183
Mohamed Quasdane, Hassan Ramchoun, and Tawfik Masrour

Using Crank-Nicolson Scheme for Continuous Hopfield Network Equilibrium . . . 201
Safae Rbihou, Nour-Eddine Joudar, Zakariae En-Naimani, and Khalid Haddouch

Fire and Smoke Detection Model for Real-Time CCTV Applications . . . 211
Tarik Hajji, Ibtissam El Hassani, Abdelkader Fassi Fihri, Yassine Talhaoui, and Chaimae Belmarouf

Vehicle Image Classification Method Using Vision Transformer . . . 221
Youssef Taki and Elmoukhtar Zemmouri
A Review of Variational Inference for Bayesian Neural Network . . . 231
Bakhouya Mostafa, Ramchoun Hassan, Hadda Mohammed, and Masrour Tawfik

Regression and Machine Learning Modeling Comparative Analysis of Morocco’s Fossil Fuel Energy Forecast . . . 244
Dalal Nasreddin, Yasmine Abdellaoui, Aymane Cheracher, Soumia Aboutaleb, Youssef Benmoussa, Inass Sabbahi, Reda El Makroum, Saad Amrani Marrakchi, Asmae Khaldoun, Aymane El Alami, Imad Manssouri, and Houssame Limami

Enhancing Brain Tumor Classification in Medical Imaging Through Image Fusion and Data Augmentation Techniques . . . 257
Tarik Hajji, Youssef Douzi, and Tawfik Masrour

Train a Deep Neural Network by Minimizing an Energy Function to Solve Partial Differential Equations: A Review . . . 272
Idriss Barbara, Tawfik Masrour, and Mohammed Hadda

Link Prediction Using Graph Neural Networks for Recommendation Systems . . . 287
Safae Hmaidi, Imran Baali, Mohamed Lazaar, and Yasser El Madani El Alami

Uncertainty Analysis of a Blade Element Momentum Model Using GSA and GLUE Methods . . . 299
Yassine Ouakki, Amar Amour, and Abdelaziz Arbaoui

Industrial Accident Prevention Based on Reinforcement Learning . . . 312
Tarik Hajji, Ibtissam El Hassani, Abdelkader Fassi Fihri, Taha Jakni, and Marouane Lahmamsi

A Comprehensive Analysis of Consumers Sentiments Using an Ensemble Based Approach for Effective Marketing Decision-Making . . . 323
Hicham Nouri, Karim Sabri, and Nassera Habbat

Study of the Optimization Control of Agricultural Greenhouse Climatic Parameters by the Integration of Machine Learning . . . 334
Jaouad Boudnaya, Nina Aslhey Huynh, Ouèdan Jhonn Gomgnimbou, Kechchour Aya, Ait Omar Yassir, and Abdelhak Mkhida

Comparative Study Between Double Vector Quantization Using SOM and GMM for Prediction Time Series Problem . . . 347
Hanae El Fahfouhi, Zakariae En-Naimani, and Khalid Haddouch
Towards Development of Synthetic Data in Surface Thermography to Enable Deep Learning Models for Early Breast Tumor Prediction . . . 356
Zakaryae Khomsi, Achraf Elouerghi, and Larbi Bellarbi

A Novel Model for Optimizing Multilayer Perceptron Neural Network Architecture Based on Genetic Algorithm Method . . . 366
Fatima Zahrae El-Hassani, Youssef Ghanou, and Khalid Haddouch

An Improved YOLOv5 Based on Attention Model for Infrared Human Detection . . . 381
Aicha Khalfaoui, Badri Abdelmajid, and El Mourabit Ilham

Pneumonia Classification Using Hybrid Architectures Based on Ensemble Techniques and Deep Learning . . . 389
Chaymae Taib, Elkhatir Haimoudi, and Otman Abdoun

ResNet-Based Emotion Recognition for Learners . . . 400
N. El Bahri, Z. Itahriouan, and A. Abtoy

Mining the CORD-19: Review of Previous Work and Design of Topic Modeling Pipeline . . . 411
Salah Edine Ech-chorfi and Elmoukhtar Zemmouri

Demand Forecast of Pharmaceutical Products During Covid-19 Using Holt-Winters Exponential Smoothing . . . 427
Anas Elkourchi, Moulay Ali El Oualidi, and Mustapha Ahlaqqach

Using ANN-MLP for Supervised Classification of the Hercynian Granitoids from Their Geochemical Characteristics at the Aouli Pluton (High Moulouya, Morocco) . . . 438
Taj Eddine Manssouri, Imad Manssouri, Abdellah El Hmaidi, and Hassane Sahbi

Overview of Artificial Intelligence in Agriculture: An Impact of Artificial Intelligence Techniques on the Agricultural Productivity . . . 447
Sara Belattar, Otman Abdoun, and El Khatir Haimoudi

RNN and LSTM Models for Arabic Speech Commands Recognition Using PyTorch and GPU . . . 462
Omayma Mahmoudi and Mouncef Filali Bouami

Author Index . . . 471
About the Editors
Dr. Tawfik Masrour is Full Professor of Applied Mathematics and Artificial Intelligence at the National High School for Arts and Crafts (ENSAM Meknes), My Ismail University (UMI), and a member of the research team Artificial Intelligence for Engineering Sciences (AIES) and of the Laboratory of Mathematical Modeling, Simulation, and Smart Systems (L2M3S). He graduated from Mohammed V University–Rabat with an MSc degree in Applied Mathematics and Numerical Analysis and from the Jacques-Louis Lions Laboratory, Pierre and Marie Curie University, Paris, with an MAS (DEA) in Applied Mathematics, Numerical Analysis, and Computer Science. He obtained his PhD in Mathematics and Informatics from École des Ponts ParisTech (ENPC), Paris, France. His research interests include Control Theory and Artificial Intelligence. Email: [email protected]

Dr. Hassan Ramchoun holds a doctorate in applied mathematics and computer sciences from the Laboratory of Modeling and Scientific Computing at the Faculty of Sciences and Technology of Fez, Morocco. He is a professor at the National School of Business and Management and a member of the Artificial Intelligence for Engineering Sciences group in the Laboratory of Mathematical Modeling, Simulation, and Smart Systems. He works on neural networks, probabilistic modeling, multiobjective optimization, and statistical learning methods. Email: [email protected]
Dr. Tarik Hajji is Professor of Computer Science and Artificial Intelligence at the National High School for Arts and Crafts (ENSAM-Meknes), My Ismaïl University (UMI), and a member of the research team Artificial Intelligence for Engineering Sciences (AIES) and of the Laboratory of Mathematical Modeling, Simulation, and Smart Systems (L2M3S). He graduated from Mohammed I University–Oujda with an MSc degree in Computer Engineering. He obtained his PhD in Computer Science and Artificial Intelligence from the Faculty of Science of Oujda (FSO), Oujda, Morocco. His research interests include Computer Science, Big Data, and Artificial Intelligence. Email: [email protected]

Dr. Mohamed Hosni received the Engineering Degree in Computer Science, Option Software Quality, from the National College of Applied Science, Mohammed First University (Morocco), in 2014, and a PhD in Software Engineering in 2018 from the National School of Computer Science and Systems Analysis, University Mohammed V in Rabat (Morocco). He is currently Assistant Professor at the College of Art and Crafts (ENSAM), University Moulay Ismail of Meknes. He performed a PhD research stay and a postdoc with the Software Engineering Research Group at the University of Murcia (Spain) in 2017 and 2019, respectively. He has published more than 29 papers in international journals and conferences. His research interests include software engineering, machine learning and its applications. Email: [email protected]
Learning to Irrigate - A Model of the Plant Water Balance

Matthias Maszuhn1(B), Frerk Müller-von Aschwege1, Susanne Boll-Westermann1, and Jan Pinski2

1 OFFIS - Institute for Information Technology, Oldenburg, Germany
[email protected]
2 Lehr- und Versuchsanstalt für Gartenbau Bad Zwischenahn, Bad Zwischenahn, Germany
Abstract. Nowadays, a wide range of soil sensor systems is cheaply available for tree nurseries, allowing them to monitor the condition of their plants. An automated, sensor-based irrigation and fertilization system promises many advantages over manual irrigation, such as a better plant growth rate and a lower water consumption. This paper proposes a workflow to create a plant water balance model based on recorded soil sensor data, weather data, and weather forecasts, and to relate it to the plant’s growth model. The model was implemented and used to train a reinforcement learning algorithm to control the irrigation so that the soil humidity would stay within appropriate thresholds. After training, we were able to keep a stable soil humidity for 19 of the 23 tested plants and react to changing weather conditions and soil temperatures.

Keywords: Machine Learning · Reinforcement Learning · Horticulture · Plant nursery · Artificial Intelligence · Automated Irrigation
1 Introduction
Growth and quality of plants depend directly on the continuous availability of water and the supply of nutrients. The water supply can therefore quickly become a limiting factor for plant growth, and a lack of water can cause irreparable damage to the plant. Needs-based irrigation of stock has become increasingly difficult over the past few years, with long periods of drought and an overall decreasing annual precipitation [3]. With ongoing climate change, such events are expected to become more common, and water will become an increasingly scarce resource in the future [10]. The domain of agriculture and horticulture is by far the biggest water consumer, covering nearly 70% of the total global water consumption, so saving water is crucial to this domain [1].

In practice, irrigation and fertilization in tree nurseries is mostly controlled manually based on empirical values and human experience, or, more rarely, conducted by threshold-based irrigation controllers [4]. However, tree nursery staff and current automation systems tend to irrigate their fields more than necessary to save time and avoid the risk of drought stress, at the cost of a higher water consumption. Weather forecasts and laboratory tests of the soil's nutritional values are mostly ignored in practice, but can have huge implications for the required irrigation and fertilization. Such imprecise measures imply that the plant is rarely in a perfect water balance, hindering the plant's growth process and in consequence reducing the tree nursery's economic returns.

Nowadays, the availability and relatively low cost of suitable sensor systems to measure the water-balance-related parameters would allow tree nurseries to utilize them in an automated irrigation system. In our view, existing automation methods are not precise enough, as they control the irrigation based on single parameters like the current soil humidity rather than a combination of all available parameters. Such systems are also usually developed in a perfectly homogeneous environment and do not consider the local differences in soil or weather conditions in larger tree nurseries.

In this work, we propose a method to include the measurable environment parameters in a plant water balance model. This model is fed to a reinforcement learning algorithm for automatically adapting the irrigation and fertilization measures based on changing weather and soil conditions. In summary, this paper has the following main contributions: (1) a concept for a system that can predict or autonomously take care measures in tree nurseries to reduce water usage or increase plant quality and growth; (2) a partial implementation of the system with a plant water balance model describing the correlation between irrigation, precipitation, temperatures and the observed soil humidity; and (3) the evaluation of a reinforcement learning agent trained in a simulation environment based on this model.

The paper is organized as follows: Sect. 2 introduces the water balance model and its components. In Sect. 3 we give an overview of related work in the field of artificial intelligence in horticulture and agriculture. Section 4 gives a more detailed explanation of the experimental setup and the sensor data we collected. Section 5 presents the process of turning the recorded sensor data into a water balance model of the relevant parameters, which is then used in Sect. 6 to train a reinforcement learning agent in that environment. Finally, in Sect. 7 the training results are presented. A conclusion discussing the results and future work follows.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 1–11, 2023. https://doi.org/10.1007/978-3-031-43520-1_1
2 Water Balance Model
A crucial prerequisite for learning to irrigate plants is to develop a reliable water balance model which allows evaluating the validity of proposed care measures. In the following we elaborate on the different parts of the model shown in Fig. 1 and explain their relations.

(1) In cooperation with our project partners, we chose “Thuja occidentalis” and “Rhododendron Robert Seleger” as our model plants, as both cultures are relatively robust to drought stress, allowing us to test plant conditions over a wide range of soil humidities. However, we expect the underlying methodology to be applicable to a wide range of plants by adjusting the parameters accordingly.
(2) We installed various sensor systems in test fields to keep track of the plant's water balance. The recorded parameters are divided into three categories: (a) Soil Sensor Data: the soil humidity, soil salinity and soil temperature in the pot next to the plant's roots; (b) Current Weather Data: the precipitation, air temperature, sun exposure, wind speed and wind direction measured by a weather station installed next to the test fields; (c) Weather Forecast Data: the hourly local weather forecast for the next day including the aforementioned weather parameters.
(3) The combined data from our sensor systems is used to create the plant's water balance model.
(4) Later we plan to expand the system with a “growth model” including data about the plant's growth, weight gain and overall quality.
(5) Finally, we will combine the water balance and growth models to train the reinforcement learning algorithm towards various learning goals. These could, for example, include maximizing the plant's growth rate, reducing the water usage as much as possible without causing drought stress to the plant, or targeting a certain plant size by a specific date when the plant will be sold. This characteristic of making the plant growth and quality in nurseries more predictable by automating care measures based on long-term environment conditions is why we call this the “predictive plant production” method.
Fig. 1. The figure shows the different types of recorded data and how they are used to create the plant’s water balance model and the plant’s growth model. Both models are then used to build a simulation environment where we train a reinforcement learning agent. The advantage of this approach is that various learning goals can be specified and combined depending on the water availability and plant type.
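The three data streams (2a–2c) that feed the water balance model (3) can be pictured as one combined observation per 30-min interval. The sketch below is purely illustrative: the record types and field names are our own assumptions, not the project's actual data schema.

```python
from dataclasses import dataclass, field

# Hypothetical record types mirroring the three data categories (2a-2c);
# field names are our own illustration, not the authors' schema.
@dataclass
class SoilSample:
    humidity: float       # volumetric soil humidity in %
    salinity: float       # electrical-conductivity proxy
    temperature: float    # soil temperature in deg C

@dataclass
class WeatherSample:
    precipitation: float  # mm per interval
    air_temperature: float
    sun_exposure: float
    wind_speed: float
    wind_direction: float

@dataclass
class WaterBalanceObservation:
    """One 30-min time step combining sensor, weather and forecast data (step 3)."""
    soil: SoilSample
    weather: WeatherSample
    forecast_next_24h: list = field(default_factory=list)  # hourly forecasts

obs = WaterBalanceObservation(
    soil=SoilSample(humidity=58.0, salinity=1.2, temperature=17.5),
    weather=WeatherSample(0.0, 21.0, 0.8, 3.2, 180.0),
)
print(obs.soil.humidity)  # 58.0
```

A record of this shape per sensor and interval would later be extended by the growth-model fields (step 4).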
3 Related Work
Research on the sustainability of irrigation methods in both agriculture and horticulture has become increasingly prominent over the past 20 years [11]. The classical approach consists of monitoring the water usage over a full season and building a model that compares the required and supplied water amounts [5,9]. Another seminal concept in the literature is the construction of a rule-based fuzzy logic controller that translates an expert's knowledge about plant conditions into a policy regulating the irrigation [2,4,6,7]. The IoT has also played a significant role in agriculture-related research topics recently, and some environment parameters such as soil measurements and climatic variables are already taken into account by modern irrigation systems [8]. However, a self-learning system proposing irrigation and fertilization measures has, to our knowledge, not yet been considered. Furthermore, the existing approaches focus on either saving excess water, increasing plant growth or preventing damage to the plant. The reinforcement learning approach, on the other hand, allows us to define and weigh multiple learning goals.
4 Experimental Setup
The task of training a reinforcement learning algorithm to irrigate and fertilize plants can be challenging, as there are many parameters to keep track of. In a first approach, we installed a total of 23 soil sensors in two larger test fields of two tree nurseries, providing realistic conditions, and a variety of different sensor systems in a 35 m² test field (Fig. 2) at the premises of a plant research institute to conduct laboratory tests. The sensor systems are installed in the pots close to the roots of our “Thuja occidentalis” and “Rhododendron Robert Seleger” plants. For economic reasons, not every single plant has its own sensor system; instead, the sensors are distributed over the fields, and the measured data is representative for an area with multiple plants. At the start of the season, the plants are all approximately the same size to ensure comparability.

The goal of this setup is to collect enough data to model the plant's soil humidity under the influence of precipitation and customary irrigation, considering the effect of the soil temperature. This model can then be used to train a reinforcement learning algorithm to irrigate the plants so that the soil humidity stays in a pre-defined range. A realistic water balance model ensures that the learned irrigation can be transferred back and applied to real plants. An overview of the sensor systems in use is presented in Fig. 3. For the water balance model, we monitor the soil humidity, soil temperature and the soil's electrical conductivity in 30-min intervals. Overall, we collect data from May to September, so the observation period covers most of the growing season. The data transmission is established using a LoRaWAN system with gateways installed close to the test fields to retrieve the sensor data, which is then fed to a server and presented in a dashboard. Additionally, hourly data about
air temperatures and precipitation at the nurseries is obtained from a locally installed weather station, and forecast data is retrieved from a publicly available weather API. To account for the imprecision of forecast data, we model these uncertainties with a probability distribution over certain weather events. The accuracy of the weather forecasts depends strongly on the location and is therefore a parameter that can be learned over multiple seasons by comparing weather forecasts with the observed actual weather conditions.

To get a more application-oriented setup, the test fields are located outdoors. However, this also implies additional error sources that we have to take into consideration. Generally, we experience two types of errors that require different measures. First, a sensor may send no data at all, either because its battery is empty or because heavy rainfall caused water damage to the sensor. Since such values obviously do not correspond to reality, we can simply remove missing values and all-zero records from our dataset. This way, we filter out around 7% of the total amount of sensor data. Second, a sensor can also send incorrect data, commonly caused by an error during the transmission to the server. This can, e.g., mean a sensor sending the same data multiple times or the gateway mixing up the values of two sensors. These errors are more severe, and detecting them required the implementation of a semi-automated feasibility check, since faulty data could drastically impair the training progress. We implemented different software solutions to handle these errors, which are presented in Sect. 5 in more detail.

Fig. 2. The test field at the premises of a plant research institute with different sensor systems installed, as seen in Fig. 3.
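The first error class (missing or all-zero records) lends itself to a simple filtering pass. The following is a minimal sketch; the record layout is a hypothetical simplification of the real sensor payload, and the roughly 7% filtering rate comes from the actual data, not from this toy example.

```python
# Drop records that are missing entirely or contain only zeros, i.e. clearly
# invalid readings. The dict layout is an illustrative stand-in for the
# real LoRaWAN payload.
def is_valid(record):
    """Reject missing records and records whose measurements are all zero."""
    if record is None:
        return False
    values = [record["humidity"], record["salinity"], record["temperature"]]
    if any(v is None for v in values):
        return False
    return any(v != 0 for v in values)

raw = [
    {"humidity": 55.2, "salinity": 1.1, "temperature": 16.0},
    None,                                                    # transmission lost
    {"humidity": 0.0, "salinity": 0.0, "temperature": 0.0},  # dead sensor
    {"humidity": 57.9, "salinity": 1.0, "temperature": 16.4},
]
clean = [r for r in raw if is_valid(r)]
print(len(clean))  # 2
```

The second error class (plausible-looking but incorrect values) cannot be caught this way and needs the semi-automated feasibility check described in the text.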
5 Model Construction
The water balance model is a crucial step towards teaching a reinforcement learning agent the correct irrigation measures based on the sensor data. Classical reinforcement learning involves an agent that observes and interacts with an environment and is rewarded based on the outcome of its actions. In our scenario, we require a water balance model that can simulate the effect of the measures proposed by the agent. Due to technical limitations, the agent can only decide to turn the irrigation completely on or off for intervals of 30 min. An example control timeline is shown in Fig. 4. The control decision must be made based on a simulation of the water balance by calculating the soil humidity for the next time interval.
For this calculation, our model has to describe three major correlations: (1) How much does the pot's soil humidity decrease over time depending on the temperature (either by evaporation or by water absorption through the plant)? (2) By how much and over which period of time does the soil humidity increase after the plant was artificially irrigated? (3) Equally, by how much and over which time span does the soil humidity increase after precipitation?

It should be noted, though, that our water balance model is a simplified representation of the processes taking place in the real world. In this first attempt to model the plant's water balance, we explicitly do not consider the plant size or weather effects like wind and sun exposure.

To investigate the first correlation, we filter the data for all time steps during which we do not supply water by irrigation or precipitation, obtaining the unadulterated water consumption rate of the plant. We then compute the difference in soil humidity every thirty minutes in relation to the soil temperature. From all these humidity-difference/temperature tuples, we approximate a linear function describing the plant's water consumption at different temperatures. An example of this is shown in Fig. 5. To account for the possible measurement errors mentioned in Sect. 4, the function was approximated based on just 80% of the tuples: 10% on both sides of the function were filtered out as outliers.

Fig. 3. The different soil sensor systems tested in this project, from left to right: Florja, Dragino, SMT 100, Tensiometer (electronic reading), Tensiometer (manual reading), Plantcare
Fig. 4. An example timeline with 30 min time steps. The agent can either decide to turn the irrigation on or off completely depending on the current parameters and the forecast data.
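The first-correlation fit (a linear function over humidity-difference/temperature tuples, with 10% of the points on each side discarded as outliers) could be sketched as follows. The data is synthetic, and the residual-based trimming is one plausible reading of the procedure, not the authors' exact implementation.

```python
import numpy as np

# Fit a linear function mapping soil temperature to the 30-min humidity
# change, trimming the 10% most extreme residuals on each side as outliers.
rng = np.random.default_rng(0)
temp = rng.uniform(5, 30, 200)                          # soil temperature, deg C
d_hum = -0.02 * temp - 0.1 + rng.normal(0, 0.05, 200)   # humidity change / 30 min

# Initial fit, then residual-based trimming (keep roughly the middle 80%).
slope, intercept = np.polyfit(temp, d_hum, 1)
residuals = d_hum - (slope * temp + intercept)
lo, hi = np.quantile(residuals, [0.10, 0.90])
keep = (residuals >= lo) & (residuals <= hi)
slope, intercept = np.polyfit(temp[keep], d_hum[keep], 1)

print(round(slope, 3))  # close to the true slope of -0.02
```

The resulting slope/intercept pair is what the simulation would use as the plant's temperature-dependent water consumption rate.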
The second correlation is more difficult to model, because the irrigation effect cannot be measured instantly. Instead, the time the water needs to reach the plant's sensor depends on the soil properties and the sensor placement. Additionally, due to the sensor's measuring method it can be assumed that the sensor over-modulates the soil humidity for a short duration after the water reaches the sensor. Rather than considering a single time interval, we model the soil humidity changes over the course of one day after each irrigation period. For this, we filter the data for all time steps where the artificial irrigation was turned on and observe the soil humidity changes in the next time interval, allowing us to model the long-term effect of an irrigation. An example is shown in Fig. 6. Since the plant also absorbs water from the soil in the observed time span, we add the approximated water consumption from our first correlation to calculate the net soil humidity gain.

For the third correlation, our approach is the same as for the irrigation. However, compared to the irrigation, which can only be turned on or off and therefore always supplies the same amount of water over the course of 30 min, there are varying precipitation intensities that can be considered as an additional parameter. Additionally, it takes longer for the precipitation water to reach the sensor than for the drip irrigation, which is installed in the soil, so the observed soil humidity increase appears later in the sensor data.

Fig. 5. The graphic shows the measured substrate moisture changes over 30 min in comparison to the soil temperature for a single sensor over one season. A linear function can be approximated to describe the relation between temperature and moisture decrease for a certain temperature range. Due to possible measurement errors, 10% of the data points are marked as outliers and are not used in the approximation (red dots).
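The second correlation (the day-long humidity response after an irrigation event, corrected by the plant's estimated consumption) could be approximated as below. The arrays and constants are synthetic placeholders, and `response_curve` is our own illustrative helper, not code from the project.

```python
import numpy as np

# Average the humidity trace over the 48 half-hour steps (one day) following
# each irrigation event, then add back the consumption estimated by the first
# correlation to obtain the net humidity gain.
STEPS_PER_DAY = 48

def response_curve(humidity, irrigation_on, consumption_per_step):
    """Mean humidity change relative to the start of each irrigation event."""
    events = np.flatnonzero(irrigation_on)
    traces = [humidity[t:t + STEPS_PER_DAY] - humidity[t]
              for t in events if t + STEPS_PER_DAY < len(humidity)]
    # Net gain: observed change plus what the plant consumed meanwhile.
    elapsed = np.arange(STEPS_PER_DAY)
    return np.mean(traces, axis=0) + consumption_per_step * elapsed

# Synthetic trace: humidity rises from 55% to 70% after one irrigation event.
hum = np.concatenate([np.full(10, 55.0), np.linspace(55, 70, 40), np.full(50, 70.0)])
on = np.zeros(100, dtype=bool)
on[10] = True
curve = response_curve(hum, on, consumption_per_step=0.05)
print(curve.shape)  # (48,)
```

For precipitation, the same averaging would be binned additionally by rainfall intensity.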
Although the derived model is fundamentally the same for all sensors, a single model with universally applicable parameters cannot yet be constructed due to large deviations in the soil properties, the sensor placement and the distance to the installed irrigation system. Instead, improving the model parameters is an ongoing task, and the accuracy can be increased by adding more data over multiple seasons. With the simplified model we can simulate the basic factors affecting the plant's water balance and train an agent to keep the soil humidity within a desired range.
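Putting the three correlations together, a single simulation step of the simplified model might look like the following sketch. All coefficients are illustrative stand-ins for the fitted per-sensor parameters, not values from the paper.

```python
# Advance the simulated soil humidity by one 30-min step, combining the three
# correlations of the simplified water balance model.
def step_humidity(humidity, soil_temp, irrigation_on, precipitation_mm,
                  consumption_slope=-0.02, consumption_intercept=-0.1,
                  irrigation_gain=0.4, precip_gain_per_mm=0.3):
    """Return the soil humidity (in %) expected for the next time interval."""
    # Correlation 1: temperature-dependent loss (evaporation + plant uptake).
    delta = consumption_slope * soil_temp + consumption_intercept
    # Correlation 2: fixed gain while the drip irrigation is on.
    if irrigation_on:
        delta += irrigation_gain
    # Correlation 3: precipitation-dependent gain.
    delta += precip_gain_per_mm * precipitation_mm
    return max(0.0, min(100.0, humidity + delta))

h = step_humidity(60.0, soil_temp=20.0, irrigation_on=True, precipitation_mm=0.0)
print(h)  # approximately 59.9 (60.0 - 0.4 - 0.1 + 0.4)
```

In the real model, the irrigation and precipitation gains would additionally be spread over the day-long response curve rather than applied instantly.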
6 Training

As described in Sect. 5, reinforcement learning involves an agent and its environment, which in our scenario is a simulation of the plant's water balance. Reinforcement learning requires the definition of three fundamental components.
(1) The action space, i.e. the actions an agent can take to interact with the environment: in our scenario a single boolean value indicating whether the irrigation should be turned on (1) or off (0) in the current time interval.
(2) The observation space, i.e. the information the agent can observe from the environment and use for its decisions: the normalized current soil humidity as calculated by our water balance model, the soil temperature, and the forecasted precipitation for the next 6, 12 and 24 h respectively.
(3) The reward function, which defines the states of the simulation that the agent should be rewarded or punished for.

The possibility to define reward functions for various purposes is the largest benefit of the reinforcement learning approach for plant irrigation, as it allows us to induce and combine different learning goals. For example, we could define a reward function where the agent is additionally rewarded based on the plant growth. For now, however, we reward the agent for keeping the soil humidity in a range between 50% and 70%, where we expect ideal growth conditions for the “Thuja occidentalis” and “Rhododendron Robert Seleger” plants. Higher or lower soil humidities are punished based on the deviation from the ideal range. The reward is also slightly decreased each time the agent chooses to turn the irrigation on, to encourage a preferably low water consumption. The training phase is divided into episodes, each simulating a plant for one week in 30-min time steps.

Fig. 6. The graphic shows the average humidity increase over the course of one day after an irrigation period of 30 min for a single sensor. It is noticeable that the water takes around eight hours to fully reach the sensor, causing a peak of 22% increased humidity. After that peak the humidity slowly drops again to an overall increase of 15% for an average irrigation period.
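The reward scheme described above (a 50–70% ideal humidity band, deviation-based punishment, and a small cost per irrigation action) could be sketched as follows; the exact weights are our own illustrative choices, not the values used in the experiments.

```python
# Reward staying in the 50-70% humidity band, punish deviation outside it,
# and apply a small cost whenever the irrigation is switched on.
IDEAL_LOW, IDEAL_HIGH = 50.0, 70.0
IRRIGATION_COST = 0.1  # illustrative water-usage penalty

def reward(soil_humidity, irrigation_on):
    if IDEAL_LOW <= soil_humidity <= IDEAL_HIGH:
        r = 1.0
    elif soil_humidity < IDEAL_LOW:
        r = -(IDEAL_LOW - soil_humidity) / IDEAL_LOW            # punish drought
    else:
        r = -(soil_humidity - IDEAL_HIGH) / (100.0 - IDEAL_HIGH)  # punish overwatering
    if irrigation_on:
        r -= IRRIGATION_COST  # discourage unnecessary water usage
    return r

print(reward(60.0, False))  # 1.0
print(reward(40.0, False))  # -0.2
print(reward(60.0, True))   # 0.9
```

A growth-based goal would simply add another term to this function, which is the flexibility the text highlights.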
To ensure a good generalization of the learned properties, we randomize the starting soil humidity and choose a random one-week time frame from our weather data for each new episode. Instead of training one generally applicable agent for all plants, we train one agent per set of parameters in our water balance model, as we noticed large differences caused by different soil properties and the sensor placement. Each agent was trained for 500 episodes
with the “Deep-Q-Learning” method, where a neural network is used to approximate the Q-function, i.e. the expected cumulative reward for an action taken in a given state.
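The Deep-Q-Learning update can be illustrated with a deliberately minimal sketch: a function approximator (here linear, for brevity, where the paper uses a neural network) predicts Q(s, a) and is nudged towards the TD target r + γ · max over a' of Q(s', a'). A practical agent would add a replay buffer, an ε-greedy policy and a target network; all names here are ours.

```python
import numpy as np

# A tiny linear Q-function over two actions (irrigation off/on) updated with
# the temporal-difference rule at the heart of Deep-Q-Learning.
N_FEATURES, N_ACTIONS = 5, 2
GAMMA, LR = 0.99, 0.01
W = np.zeros((N_ACTIONS, N_FEATURES))  # one weight row per action

def q_values(state):
    return W @ state

def dqn_update(state, action, r, next_state):
    """One TD step towards r + gamma * max_a' Q(s', a'); returns the TD error."""
    target = r + GAMMA * np.max(q_values(next_state))
    td_error = target - q_values(state)[action]
    W[action] += LR * td_error * state  # gradient step for the taken action only
    return td_error

s = np.array([0.6, 0.2, 0.0, 0.1, 1.0])   # e.g. humidity, temp, forecasts, bias
s2 = np.array([0.58, 0.2, 0.0, 0.1, 1.0])
err = dqn_update(s, action=0, r=1.0, next_state=s2)
print(round(err, 3))  # initially Q = 0 everywhere, so the TD error equals r = 1.0
```

Replacing the weight matrix with a small neural network, as the paper does, changes only how `q_values` and its gradient step are computed.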
7 Results
At the start of the training phase, the agent randomly chooses to turn the irrigation on or off in each step, resulting in a very high soil humidity: the random selection causes the irrigation to be turned on 50% of the time on average, while in reality a nursery would irrigate far less than that. After 150–200 episodes the agent begins to learn to keep an interval of approximately one day between irrigations, but does not yet consider the weather forecasts. The agent's reward increases as the irrigation is turned on less often and the soil humidity stays within the ideal range most of the time. After 300–350 episodes we notice that the agent starts to plan the irrigation periods proactively by taking the expected rainfall and temperatures into account. Overall, the agent is able to keep the soil humidity between 50% and 70% for 19 of the 23 plants over the course of one week. For the remaining four models, analyzing the data helped us detect errors such as a wrong placement of the irrigation system or connection issues with the gateway.

We also used the expert knowledge of our project partners in the tree nurseries to validate that the learned irrigation can realistically be applied in a tree nursery. However, practical use requires a method to defer a planned irrigation period manually (e.g. to allow working on that field) and should respect local laws and the tree nursery's habits in terms of irrigation times. By including the weather forecast in the model and allowing the agent to plan its irrigation ahead, we expect the overall water consumption to decrease, because the agent turns off the irrigation in expectation of rainfall. Validating that decrease in water consumption will require integrating the reinforcement learning system in a tree nursery over a full season.
8 Conclusion
We presented a workflow to train an AI to propose irrigation measures in tree nurseries based on available soil sensor systems and weather forecasts. A plant water balance model was constructed as the first major part of this workflow to describe the relations between irrigation, precipitation, soil temperatures and the soil humidity. This model was then used in a simulation to train a reinforcement learning agent that was finally able to keep the soil humidity between the thresholds of 50% and 70% for 19 of the 23 simulated plant models. We also discussed and implemented ways to discover and deal with possible error sources in such sensor systems. Compared to classical or timer-based irrigation methods, we expect a system using reinforcement learning to be a first step towards saving excess irrigation water in tree nurseries.

In future work we plan to expand our system by including the plant growth data to propose measures that can aim for different goals, like maximizing the plant growth or keeping the plant alive while using minimal water resources. In addition to the irrigation, we will aim to model the plant fertilization, as we see a lot of potential to further improve the plant growth this way. Ultimately, the goal is to create a parameterized model that can be generalized to propose irrigation and fertilization measures for most plants in tree nurseries. However, such a generally applicable model requires us to collect more sensor data over multiple seasons.

Acknowledgments. We would like to thank the “European Innovation Partnerships for Agricultural Productivity and Sustainability” (EIP-Agri) for financing this project and thereby giving us the chance to work on this important topic.
References

1. Valuing Water: The United Nations World Water Development Report, vol. 2021. UNESCO, Paris (2021)
2. Benyezza, H., Bouhedda, M., Rebouh, S.: Zoning irrigation smart system based on fuzzy control technology and IoT for water and energy saving. J. Clean. Prod. 302, 127001 (2021)
3. Caretta, M., et al.: Water. In: Pörtner, H.-O., Roberts, D., et al. (eds.) Climate Change 2022: Impacts, Adaptation, and Vulnerability. Contribution of Working Group II to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press (2022, in press)
4. Carrasquilla-Batista, A., Chacón-Rodríguez, A.: Proposal of a fuzzy logic controller for the improvement of irrigation scheduling decision-making in greenhouse horticulture. In: 2017 1st Conference on PhD Research in Microelectronics and Electronics Latin America (PRIME-LA), pp. 1–4 (2017). https://doi.org/10.1109/PRIME-LA.2017.7899172
5. Fereres, E., Goldhamer, D.A., Parsons, L.R.: Irrigation water management of horticultural crops. HortScience 38(5), 1036–1042 (2003). https://doi.org/10.21273/HORTSCI.38.5.1036
6. Gao, L., Zhang, M., Chen, G.: An intelligent irrigation system based on wireless sensor network and fuzzy control. J. Netw. 8(5), 1080 (2013)
7. Mendes, W.R., Araújo, F.M.U., Dutta, R., Heeren, D.M.: Fuzzy control system for variable rate irrigation using remote sensing. Expert Syst. Appl. 124, 13–24 (2019)
8. Navarro-Hellín, H., Martínez-del Rincón, J., Domingo-Miguel, R., Soto-Valles, F., Torres-Sánchez, R.: A decision support system for managing irrigation in agriculture. Comput. Electron. Agric. 124, 121–131 (2016). https://doi.org/10.1016/j.compag.2016.04.003
9. Saccon, P.: Water for agriculture, irrigation management. Appl. Soil Ecol. 123, 793–796 (2018). https://doi.org/10.1016/j.apsoil.2017.10.037
10. Turral, H., Burke, J.J., Faurès, J.M.: Climate Change, Water and Food Security. FAO Water Reports, vol. 36. Food and Agriculture Organization of the United Nations, Rome (2011). http://www.fao.org/docrep/014/i2096e/i2096e.pdf
11. Velasco-Muñoz, J.F., Aznar-Sánchez, J.A., Batlles-delaFuente, A., Fidelibus, M.D.: Sustainable irrigation in agriculture: an analysis of global research. Water 11(9), 1758 (2019). https://doi.org/10.3390/w11091758
Free and Unfree Weed Classification in Young Palm Oil Crops Using Artificial Neural Network

Sophie Thelma Marcus Jopony1, Fadzil Ahmad1(B), Muhammad Khusairi Osman1, Mohaiyedin Idris1, Saiful Zaimy Yahaya1, Kamarulazhar Daud1, Ahmad Puad Ismail1, Anwar Hassan Ibrahim2, and Zainal Hisham Che Soh1

1 Electrical Engineering Studies, Universiti Teknologi MARA, Cawangan Pulau Pinang, Pulau Pinang, Malaysia
[email protected]
2 Department of Electrical Engineering, College of Engineering, Qassim University, Buraydah, Saudi Arabia
Abstract. An automatic classification of whether the circle area of a young palm oil crop is free or unfree from unwanted weeds can be a crucial step towards improving the growth management of a palm oil plantation. However, few studies have utilized artificial intelligence techniques to address this issue: most previous work focused on leaf diseases, ripeness of the palm oil fruit bunch and counting of palm oil crops, rather than the ground cover management of the palm circle. Hence, this study proposes the development of an automatic and intelligent technique to classify the condition of a young palm oil crop based on the condition of its ground cover management. Images of different young palm oil conditions, in which the palm circles must be visible, act as the dataset for this system. Local Binary Pattern is implemented as the feature extraction method, and the classification result is compared with and without a feature selection technique. ReliefF was chosen as the feature selection technique for this system. The developed ANN model with feature selection produced the highest accuracy of 92.9% correct classification. Consequently, this system is suitable for the palm oil industries to monitor the health condition of palm oil plants.

Keywords: Weed Detection · Artificial Neural Network · Palm Oil Plantation · Feature Selection · Feature Extraction
1 Introduction

Major agricultural activities in Malaysia range from palm oil and paddy to rubber plantations, and have become one of the main sources of Malaysia's economic growth. For the past few years, the palm oil industry has become one of the major focuses of the Malaysian Government due to the increase in world demand for palm oil products. Various efforts and initiatives have been introduced to boost the palm oil industry. According to the Malaysian Palm Oil Council (MPOC), one of the biggest manufacturers

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 12–20, 2023. https://doi.org/10.1007/978-3-031-43520-1_2
and exporters of palm oil in the world is Malaysia. Currently, Malaysia accounts for 28% of world palm oil production and 33% of world exports, and thus holds an important part in meeting the global need for oils and fats sustainably. To maintain high yields and quality palm oil products that can compete with neighboring countries, ground management and the implementation of related technological advancement at the plantation level are crucial.

The growth management of a young palm oil plant, which starts in a poly bag before being transferred to the estate, is crucial. A young palm oil plant is characterized by a small structure and a slow growth rate, and its leaves grow close to the surface of the soil. This causes the young palm oil plant to be easily dominated by unwanted weeds and small plants growing around it. The presence of weeds obstructs the growth of the young palm oil plant by competing for nutrients, light and moisture [1]. Young palm oil is more prone to plant diseases and easily dominated by weeds. As a result, the young plant may lose the nutrients it needs to grow healthily, suffer broken branches and a slower rate of growth, or even die. Having the necessary information about the palm oil within the estate is important to monitor the yield for the season. Detecting nutrient deficiency in oil palm leaves is also vital to assess the quality of the fruit yield [2]. Hence, it is important for the palm circles to be clear of any unwanted weeds and obstructions, and to always monitor the growth condition of the young palm oil plants. However, intelligent techniques for monitoring and classifying young palm oil for ground cover management have not been a major focus or widely explored by previous researchers.
The use of Artificial Intelligence technology to address this issue can offer advantages by providing more accurate analysis while reducing inaccurate detection by humans [3]. Moreover, manual inspection of the condition of a palm oil farm is very tedious [4], partly because the geographical area of the farm makes it difficult to access its deeper parts. Flying a drone into the deeper parts of the farm and capturing images of the palm oil condition may help resolve this issue. However, a detailed study of the capability and accuracy of image processing and AI technology in performing this task is required. Several research works applying AI technology in palm oil plantations and industries have been reported in recent years. To the best knowledge of the authors, none of the previous studies focused on classifying the condition of young palm oil as either healthy or unhealthy based on the ground cover management of the palm oil. Implementing AI technology to discriminate between healthy palm oil without unwanted weeds and unhealthy palm oil dominated by weeds is a very challenging task, due to factors such as the near-identical color of palm oil leaves and weeds, the surrounding background, the brightness of the image and many more. Artificial Neural Networks (ANN) have been widely used throughout studies of agricultural activities and plant yields, such as oil palm nutrients [2], corn yield [5], plant classification [6] and more. The implementation of ANN in intelligent classification and identification systems shows a good success rate in terms of accuracy. For example, in [7], ANN showed the best accuracy, overtaking the Probability Neural Network (PNN) at an accuracy of
97.33%. Furthermore, an ANN has been demonstrated to be an effective classifier compared to k-nearest neighbor in classifying plant leaves [8]. A comprehensive review of computer-vision-based weed detection techniques that involve ANNs can be found in [9].
2 Methodology

Pattern recognition is one of the important fields of study in AI research. It aims to provide acceptable answers for all possible sets of input patterns. Classification is a typical example of pattern recognition: a machine- or computer-based process of assigning information, objects or, in general, input patterns to a given set of classes. In the context of this study, the task is to determine whether the circle area of a palm oil plant is dominated by unwanted weeds or not. Figure 1 shows the block diagram of the system to be implemented; it represents the general steps required for developing an intelligent classification system. The system starts with the input data and then proceeds to pre-processing, which is the feature extraction. Once feature extraction is carried out, feature selection is applied before proceeding to the classifier design. Decision making is the last step of the system and produces the output.
Input Data → Pre-Processing (Feature Extraction) → Feature Selection → Classifier Design (ANN) → Output Data

Fig. 1. Block diagram of an intelligent classification system.
Figure 2 illustrates the flow of the system in more detail. The process of classifying healthy and unhealthy young palm oil images using an ANN begins with gathering images of healthy and unhealthy young palm oil. These images act as the input data for the automatic system, and a smartphone camera is used to capture them. Next, the raw image data undergo pre-processing, in which features are extracted and analyzed. Feature selection follows, aiming to choose the most suitable feature subset for developing the neural network model. The dataset is then split into training and testing sets: the training set is used during the training process to develop the model, while the testing set is used during the testing phase to test the accuracy of the model. The classification accuracy obtained with and without feature selection is compared. Finally, the ANN classifies the healthy and unhealthy young palm oil, and the performance of the system is evaluated in terms of percentage accuracy and the confusion matrix.
Free and Unfree Weed Classification in Young Palm Oil Crops

[Flow chart: Start → Input Image → Pre-Processing (Feature Extraction) → Feature Selection → split into Train Dataset and Test Dataset → ANN Training, repeated until the stopping condition is satisfied → ANN Model → classification as Free Weed or Unfree Weed → Calculate Accuracy → End]

Fig. 2. Flow Chart of the ANN Classification System.
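The flow in Fig. 2 maps directly onto a standard machine-learning pipeline. A minimal Python sketch (scikit-learn standing in for the paper's MATLAB environment, and random numbers standing in for the extracted image features) might look like:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Stand-in for the extracted features: 96 images, 10 feature variables.
rng = np.random.default_rng(0)
X = rng.random((96, 10))
y = np.array([0] * 46 + [1] * 50)      # 0 = free weed, 1 = unfree weed

# Split off 14 test samples, train the ANN, then calculate accuracy.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=14, random_state=1)
model = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=1)
model.fit(X_tr, y_tr)
acc = accuracy_score(y_te, model.predict(X_te))
print(f"test accuracy: {acc:.3f}")
```

With random placeholder features the accuracy itself is meaningless; the point is only the ordering of the steps in the flow chart.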
2.1 Data Collection

The first part of the image data collection was carried out in February 2020 at Sapi Plantation and Kinabalu Plantations in Telupid, Sabah. The second part was obtained from smallholders' plantations in Kedah, Malaysia. The young palm oil plants are around 1 to 4 years old, which is the crucial age range in the development of the palm oil plant. As shown in Table 1, a total of 46 healthy and 50 unhealthy young palm oil images were obtained. The images were taken using a mobile phone. A palm oil plant with good ground cover management, i.e., not surrounded by unwanted weeds, is considered healthy, as in Fig. 3(a). A palm oil plant is considered unhealthy when uncontrolled weeds have taken over the palm circle, as in Fig. 3(b) and (c).
Fig. 3. Young Palm Oil without (a) and with (b and c) Weed in the Circle Area.
Table 1. The Number of Healthy and Unhealthy Palm Oil Plant Images.

Condition of Palm Oil Plant | Number of Palm Oil Images
Healthy (Free weed)         | 46
Unhealthy (Unfree weed)     | 50
Total                       | 96
2.2 Feature Extraction and Selection

Feature extraction is carried out to obtain relevant information from the pixel values of the images [10]. The feature extraction method used in the pre-processing stage of this system is the Local Binary Pattern (LBP). LBP is simple to implement yet a very effective texture operator; it labels the pixels of an image by thresholding the 3 × 3 neighborhood of each pixel against the center value [11]. Because of its discriminative property and low computational effort, the LBP texture operator has become a popular approach in many applications. Meanwhile, the ReliefF feature selection technique is applied for the feature selection process in this project. ReliefF is a modified version of the Relief algorithm: it overcomes Relief's limitation of handling only two-class classification problems, and it is more robust and able to work with noisy and incomplete data. The Relief family of algorithms (Relief, ReliefF and RReliefF) differs from heuristic measures in that it does not assume independence between attributes; the algorithms are therefore effective, aware of contextual information, and can estimate the quality of attributes precisely for the problems addressed [12]. ReliefF is utilized in this work to improve the classification accuracy of the ANN model by removing redundant features, which helps to improve both training time and output accuracy. The Relief algorithm framework can be found in Fig. 4 [13].
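The 3 × 3 thresholding that LBP performs can be sketched in a few lines. This is a minimal Python illustration of the operator only; the bit ordering is one common convention, and real pipelines typically feed a histogram of these codes, not the raw code image, to the classifier:

```python
import numpy as np

def lbp_3x3(img):
    """Label each interior pixel by thresholding its 8 neighbors against
    the center value and packing the comparison bits into a code 0..255."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # 8 neighbor offsets, clockwise from the top-left corner.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            center = img[i, j]
            code = 0
            for bit, (di, dj) in enumerate(offs):
                if img[i + di, j + dj] >= center:
                    code |= 1 << bit
            out[i - 1, j - 1] = code
    return out

patch = np.array([[5, 4, 3],
                  [4, 4, 2],
                  [9, 1, 0]], dtype=np.uint8)
print(lbp_3x3(patch))  # → [[195]]
```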
Fig. 4. Relief Algorithm Pseudocode [13].
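The pseudocode in Fig. 4 can be transcribed roughly as follows. This is a sketch of the core two-class Relief weight update, assuming features already scaled to [0, 1] so that diff() reduces to an absolute difference; ReliefF additionally averages over k nearest hits/misses and handles multiple classes:

```python
import numpy as np

def relief_weights(X, y, n_iter=200, seed=0):
    """Two-class Relief: for each sampled instance, move each feature's
    weight down by its distance to the nearest hit (same class) and up
    by its distance to the nearest miss (other class)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf                      # never pick the instance itself
        same = y == y[i]
        hit = np.where(same)[0][np.argmin(dist[same])]
        miss = np.where(~same)[0][np.argmin(dist[~same])]
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / n_iter
    return w

# Feature 0 separates the classes; feature 1 is noise.
X = np.array([[0.0, 0.5], [0.1, 0.4], [0.9, 0.5], [1.0, 0.6]])
y = np.array([0, 0, 1, 1])
w = relief_weights(X, y)
print(w)  # w[0] should clearly exceed w[1]
```

Discriminative features thus accumulate high weights, and the low-weight (redundant) features are the ones dropped before training the ANN.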
2.3 Neural Network

The neural network structure chosen to model the system is a multilayer perceptron feedforward artificial neural network. A feedforward neural network is a straightforward choice for the ANN: data are transmitted in one direction, from the input layer through the hidden layer to the output layer, with no backward connections between nodes. This study uses one of the most common network architectures, the Multilayer Perceptron (MLP), for the classification process. The MLP is known for its ability to solve classification problems, function approximation problems and pattern recognition [14]. It consists of an input layer, a hidden layer and an output layer. The MLP training algorithm implemented in this study is Levenberg-Marquardt. According to [15], this algorithm is reliable in terms of classification accuracy, as results obtained with Levenberg-Marquardt outperformed the other training algorithms.

2.4 Experimental Setup

The camera used for this research was that of an iPhone SE, with a 12-megapixel rear camera; the smartphone was used to capture the images of healthy and unhealthy palm oil plants. MATLAB R2017a was chosen as the software platform: the pre-processing stage, feature extraction and feature selection were all carried out in MATLAB R2017a. MATLAB was chosen because it offers specialized toolboxes for machine learning, neural networks, deep learning and more, which makes it easier to develop neural networks for tasks such as classification. Table 2 shows the ANN design parameters used in this system. The fully connected feedforward network implemented for this model is the Multilayer Perceptron (MLP), and its training algorithm is Levenberg-Marquardt.
This system has one hidden layer in addition to the input and output layers, as is usual for an MLP. Based on Table 2, the ANN was designed with 10 input, 20 hidden and 2 output nodes respectively. During the classification process, the dataset was separated into training and testing data, and the training dataset was further divided into training and validation sets. The validation set is used to stop the training process early and prevent the network from overfitting the data. The fitness of the ANN model is tested on the results of the testing data classification.
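The validation-based stopping rule can be reproduced in scikit-learn. This is a sketch only: scikit-learn's MLP does not implement Levenberg-Marquardt, so the optimizer here differs from the paper's MATLAB setup, and the data below are random placeholders:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((82, 10))              # ~85.4% of 96 samples: train + validation
y = rng.integers(0, 2, size=82)

model = MLPClassifier(
    hidden_layer_sizes=(20,),         # 20 hidden nodes, as in Table 2
    early_stopping=True,              # hold out part of the training data
    validation_fraction=0.15,         # ...as an internal validation set
    n_iter_no_change=6,               # stop after 6 validation checks w/o gain
    max_iter=1000,
    random_state=0,
).fit(X, y)
print(model.n_iter_)                  # iterations run before early stopping
```

The `n_iter_no_change=6` setting mirrors the "6 consecutive validations fail" stopping criterion in Table 2.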
3 Results and Discussion

The results were evaluated and analyzed based on the classification accuracy of the trained ANN. Feature extraction produced 10 variables during pre-processing, and these variables form the input layer of the ANN model. For the feature selection analysis, 8 of the 10 variables were selected to be fed into the model, with the most redundant variables removed. A comparison with and without the feature selection technique was carried out to discuss the differences in classification accuracy.
Table 2. ANN Design Parameters

ANN Design                  | Parameters/Values
Training Algorithm          | Levenberg-Marquardt
Input nodes                 | 10 (8 with Feature Selection)
Hidden nodes                | 20
Output nodes                | 2
Training Stopping Criterion | 6 consecutive validation failures
Fig. 5. Testing Confusion Matrix without Feature Selection (a) and with Feature Selection (b) Technique.
Figure 5 shows the testing confusion matrices for the classification by the ANN with and without the feature selection technique. 85.4% of the total dataset was allocated to the training and validation phase, while the remaining 14 data samples were assigned as testing samples; the default proportions for training and testing samples were set automatically by MATLAB. According to Fig. 5(a), 6 healthy and 5 unhealthy palm oil samples were classified correctly without feature selection. The unhealthy palm oil achieved 100% correct classification, but 3 healthy palm oil samples were wrongly classified as unhealthy. With feature selection, Fig. 5(b), 5 healthy and 8 unhealthy palm oil samples were classified correctly: all healthy palm oil was correctly classified, while only 1 unhealthy sample was wrongly classified as healthy. In accordance with Table 3, there is a slight difference in accuracy percentage between different executions of the model, which could be due to the different initial weights obtained in each execution. Without feature selection, the ANN model achieved a highest correct classification rate of 78.6%. With feature selection, however, the classification accuracy increases: the highest classification accuracy percentage obtained is 92.9% with feature
selection. This shows that the feature selection technique can improve the classification accuracy percentage.

Table 3. The lowest, highest and average classification accuracy of the ANN over 10 executions, with and without the feature selection technique.

Runs No | Without Feature Selection (%) | With Feature Selection (%)
1       | 57.1 | 85.7
2       | 71.4 | 64.3
3       | 78.6 | 78.6
4       | 71.4 | 64.3
5       | 78.6 | 71.4
6       | 64.3 | 71.4
7       | 71.4 | 71.4
8       | 71.4 | 71.4
9       | 57.1 | 92.9
10      | 64.3 | 78.6
Average | 68.6 | 75.0
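The reported accuracies follow directly from the confusion-matrix counts; a quick check, with the counts as read off Fig. 5 (rows = true class, columns = predicted class):

```python
import numpy as np

def accuracy(cm):
    return np.trace(cm) / cm.sum()    # correct predictions / all predictions

cm_without_fs = np.array([[6, 3],     # 6 healthy correct, 3 called unhealthy
                          [0, 5]])    # all 5 unhealthy correct
cm_with_fs = np.array([[5, 0],        # all 5 healthy correct
                       [1, 8]])       # 1 unhealthy called healthy
print(round(accuracy(cm_without_fs) * 100, 1))  # → 78.6
print(round(accuracy(cm_with_fs) * 100, 1))     # → 92.9
```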
The feature selection technique improves not only the highest accuracy but also the average accuracy. Over the 10 executions of the model, an average accuracy of 75% is achieved with feature selection, a 6.4-percentage-point increase over the average without feature selection. By eliminating the less important attributes, the complexity of the dataset is reduced, and the training process is able to produce an ANN model with better performance.
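The averages in Table 3 can be verified directly from the per-run figures:

```python
without_fs = [57.1, 71.4, 78.6, 71.4, 78.6, 64.3, 71.4, 71.4, 57.1, 64.3]
with_fs = [85.7, 64.3, 78.6, 64.3, 71.4, 71.4, 71.4, 71.4, 92.9, 78.6]

avg_without = sum(without_fs) / len(without_fs)
avg_with = sum(with_fs) / len(with_fs)
print(round(avg_without, 1), round(avg_with, 1))  # → 68.6 75.0
print(round(avg_with - avg_without, 1))           # → 6.4
```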
4 Conclusion

The classification of healthy and unhealthy palm oil images has been successfully implemented with an Artificial Neural Network model. The classification results are very positive for determining whether the palm oil is healthy or unhealthy based on the presence of unwanted weeds, with an accuracy of 92.9% when the ReliefF feature selection is applied. Without feature selection, the system still gives positive results, with a classification accuracy of 78.6%. In conclusion, this ANN model is suitable for classifying healthy and unhealthy palm oil images, and the feature selection technique should be applied to ensure better classification results. However, further analysis and future work are still required to improve the model where necessary: for example, more feature extraction techniques can be implemented to provide more detailed data, and more datasets can be prepared in the future to enhance the ANN model.
S. T. M. Jopony et al.
Acknowledgments. The authors would like to thank Universiti Teknologi MARA Cawangan Pulau Pinang for providing financial support and sufficient facilities to conduct this research.
References

1. Thongjua, J., Thongjua, T.: Effect of herbicides on weed control and plant growth in immature oil palm in the wet season, Nakhon Si Thammarat, Thailand. Int. J. Agric. Technol. 12(7.1), 1385–1396 (2016)
2. Jayaselan, H.A.J., Nawi, N.M., Ismail, W.I.W., Mehdizadeh, S.A.: Application of artificial neural network classification to determine nutrient content in oil palm leaves. Appl. Eng. Agric. 34(3), 497–504 (2018). https://doi.org/10.13031/aea.12403
3. Aji, A.F., Munajat, Q., Pratama, A.P., Kalamullah, H., Setiyawan, J., Arymurthy, A.M.: Detection of palm oil leaf disease with image processing and neural network classification on mobile device. Int. J. Comput. Theory Eng. 5(3), 528 (2013)
4. Fadilah, N., Mohamad-Saleh, J., Halim, Z.A., Ibrahim, H., Ali, S.S.S.: Intelligent color vision system for ripeness classification of oil palm fresh fruit bunch. Sensors (Switzerland) 12(10), 14179–14195 (2012). https://doi.org/10.3390/s121014179
5. Kross, A., et al.: Using artificial neural networks and remotely sensed data to evaluate the relative importance of variables for prediction of within-field corn and soybean yields. Remote Sens. (Basel) 12(14), 2230 (2020)
6. Azlah, M.A.F., Chua, L.S., Rahmad, F.R., Abdullah, F.I., Wan Alwi, S.R.: Review on techniques for plant leaf classification and recognition. Computers 8(4), 77 (2019)
7. Vardhan, J.V., Kaur, K., Kumar, U.: Plant recognition using HOG and artificial neural network. Int. J. Recent Innov. Trends Comput. Commun. 5, 746–750 (2017)
8. Sharma, P., Aggarwal, A., Gupta, A., Garg, A.: Leaf identification using HOG, KNN, and neural networks. In: Bhattacharyya, S., Hassanien, A., Gupta, D., Khanna, A., Pan, I. (eds.) International Conference on Innovative Computing and Communications, Volume 2. Lecture Notes in Networks and Systems, vol. 56, pp. 83–91. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-2354-6_10
9. Wu, Z., Chen, Y., Zhao, B., Kang, X., Ding, Y.: Review of weed detection methods based on computer vision. Sensors 21, 3647 (2021)
10. Zebari, R., Abdulazeez, A., Zeebaree, D., Zebari, D., Saeed, J.: A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction. J. Appl. Sci. Technol. Trends 1, 56–70 (2020)
11. Shan, C.: Learning local binary patterns for gender classification on real-world face images. Pattern Recognit. Lett. 33, 431–437 (2012)
12. Robnik-Šikonja, M., Kononenko, I.: Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 53, 23–69 (2003)
13. Urbanowicz, R.J., Meeker, M., La Cava, W., Olson, R.S., Moore, J.H.: Relief-based feature selection: introduction and review. J. Biomed. Inform. 85, 189–203 (2018)
14. Faris, H., Aljarah, I., Mirjalili, S.: Training feedforward neural networks using multi-verse optimizer for binary classification problems. Appl. Intell. 45, 322–332 (2016)
15. Du, Y.-C., Stephanus, A.: Levenberg-Marquardt neural network algorithm for degree of arteriovenous fistula stenosis classification using a dual optical photoplethysmography sensor. Sensors 18, 2322 (2018)
A Method for Bengali Author Detection Using State of the Arts Supervised Machine Learning Classifiers

Md. Abdul Hamid(B), Nusrat Jahan Marjana, Eteka Sultana Tumpa, Md. Rafidul Hasan Khan, Umme Sanzida Afroz, and Md. Sadekur Rahman

Daffodil International University, Dhaka, Bangladesh
{abdul15-12387,nusrat15-12999,eteka15-12121,umme15-11505}@diu.edu.bd
Abstract. Text classification is an important topic of study in the area of natural language processing. To identify the authorship of a given Bangla text, we create a model using state-of-the-art supervised methods. Because our work is a multi-class categorization, it can be used to determine who wrote articles, news, emails or messages, and also to find ghostwriters, identify anonymous authors, and detect plagiarism. This article focuses on the categorization of five Bengali authors, all well-known writers of Bengali literature and poetry: Humayun Ahmed, Rabindranath Tagore, Muhammad Zafar Iqbal, Kazi Nazrul Islam, and Sarat Chandra Chattopadhyay. Data were collected manually from various sources in the novels and books of these five writers, yielding over 4500 paragraphs; a completely new dataset was created for the experimental evaluation. We preprocess the Bengali text for training purposes. Seven classification methods are employed: logistic regression, naive Bayes, decision trees, support vector machines, random forests, XG-Boost, and k-nearest neighbor. In our experiments, the support vector machine produces the best classification report, with 82% model accuracy.

Keywords: Multi Class-Classification · Author-Identification · State-of-art · Supervised Model
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 21–33, 2023. https://doi.org/10.1007/978-3-031-43520-1_3

1 Introduction

Even though Bengali is one of the most commonly spoken languages in the world, there is currently a lack of study on Bengali author detection [1]. Other popular languages have seen sufficient study in the area of author attribution, which is why author detection in the Bengali language is significant. People are now more involved with the internet than ever before, thanks to electronic media. Individuals express their opinions on social media, blogs, forums and other platforms, but the problem is that people occasionally duplicate a specific line of someone else's poetry or novel that is enjoyable or appealing, and claim it as their own. The identification of authorship is a computational method that
deals with the evidence of authorship in a specific text [2]. In today's internet era, a lot of content in many genres of writing appears every day. Author identification is the process of attributing an unknown work to a writer based on some distinguishing attribute or meter. It may be employed when numerous persons claim to have produced a work, or when no one is able or willing to identify the true author. Stylometry [3] is commonly used to detect theft, but it is also used in criminal investigations concerning literary authorship and in forensic linguistics [4], such as discovering the authors of anonymously published books for the police. Stylometry is the study of various language styles as well as personal writing preferences; two of its legal uses are forensic linguistics and the identification of authentic confessions. In recent years, it has been applied effectively in a broader range of disciplines, including forums, email, blogs, chat and other digital media, as well as music and fine-art paintings. Binongo and Smith, Holmes et al., and Burrows [5] originally employed multivariate analysis (MVA) and principal components analysis (PCA) on a few function words to address various authorship problems. Stamatatos [6] uses support vector machines (SVMs) on character-level n-grams to categorize English and Arabic news corpora. Koppel et al. [7] report on authorship attribution with thousands of candidate authors. Author identification refers to the effort of identifying the author of each publication that is examined. This paper concerns a classification problem and describes how to obtain high classification accuracy easily using supervised machine learning models. More than 4500 data samples were collected and organized so that they could easily be split into training and testing sets: 80% of the data is used for training and 20% for testing. The model was first trained and its accuracy then tested, making it evident how much of the 4500+ data it can detect properly. Seven supervised machine learning techniques are used: logistic regression, naive Bayes, decision tree, support vector machine (SVM), random forest, k-nearest neighbor (KNN), and XG-Boost. These seven algorithms, which have simplified author identification, are the major source of the reported accuracy.
2 Literature Review

The authors of [8] (2019) presented an autonomous author identification strategy based on the analysis of personality profiles and Internet Relay Chat (IRC) messages. First, they use autonomous bots to watch IRC channels; then they construct a personality profile for each targeted author. The article outlines its goals, which include developing techniques for author identification in cyberspace and the capacity to group anonymous messages, linking those produced by the same writers as well as by known cyber criminals. In [9], the authors utilized the Automatic Author Identification and Characterization (AAIC) framework designed for the IRC environment. In their experiments they observed that KNN and SVM algorithms can provide an acceptable level of accuracy in author identification; when the number of unique classes (unique authors) is low, SVM produces much higher accuracy than k-NN. The authors of [10] (2017) illustrate an authorship attribution system for blog posts written in Bengali, creating a brand-new dataset
that consists of 3000 sample texts. Some work, such as [2] and [3], has been done for the Bengali language. They classified the dataset using four classifiers, including a machine learning (ML) strategy, and obtained results as high as 99% for character bigrams (tf, tf-idf), trigrams (tf, tf-idf), and unigrams (tf, tf-idf). It has also been noted that MLP performs well on large datasets. In [11] (2012), the authors proposed a technique for calculating a standard Z score that can characterize the specific vocabulary found in a text rather than an entire corpus; several works, such as [6, 12–14] and [4], address authorship attribution. Their approach depends mainly on the differences between the expected and observed occurrence frequencies between two disjoint subsets. In [15] (2011), the authors suggest ways to apply Latent Dirichlet Allocation (LDA) to authorship attribution (see also the related paper [16]). They studied authorship attribution with a few candidate authors and introduced a new method that achieves sophisticated performance in the latter case. Three datasets are used: Judgment, IMDb62, and Blog. A notable result is that LDAH-S provides high accuracy even when only a few topics are used, while LDAH-M requires about 50 topics to surpass LDAH-S. The authors of [17] (2007) introduced new distance measurements for the Common N-Grams (CNG) method for author identification. In comparison to conventional CNG, the suggested method offers a more stable solution for large profile lengths. All tests in that study were performed on character-based 3-grams, for which both the classification results and the length of the shortest (and longest) profile rose greatly. In [9] (2005), the authors proposed a framework for identifying the authorship of online messages.
They ran tests using online newsgroup messages in English and Chinese to evaluate this architecture. Three classification techniques were examined, namely decision trees (C4.5), back-propagation neural networks (NN), and support vector machines (SVM), together with the relative strengths of four different categories of attributes. On the English and Chinese datasets, these three classifiers achieved 90 to 97% and 72 to 88% accuracy, respectively. The study in [18] (2003) presented autonomous author detection for Turkish texts, using a new classification technique derived from known methods and comparing it with those techniques. Initially, 22 style identifiers were extracted and their success rates calculated with equal weights; artificial neural networks achieved a success rate of 60% using an MLP and 72% using a radial basis function. In phase 2, the success rate increased to 78% when 11 of the 22 style markers were chosen with equal weights; however, radial basis function success was then 61%, whereas MLP success was 60%. The study in [19] investigates how difficult it is to discern who composed brief historical Arabic writings by ten different authors. Several lexical and character components of each author's writing style are extracted using word-level and character-level n-grams as text representations. The naive Bayes classifier is then used to categorize the texts according to their authors. AAAT is a dataset that comprises three condensed texts for each book authored by a certain author; twenty texts are used for training, while ten are used for testing.
The study in [20] presents a methodology based on the dynamics of word co-occurrence networks representing eight authors and their 80 texts. After time series were created for 12 topological metrics, the texts were divided into sections with equal numbers of linguistic tokens. Using k-nearest neighbors, the authors successfully predict 71 of 80 paragraphs (88.75% accuracy), which is remarkable, although their dataset is very small. Previously, most authors worked on author detection in other languages, with only a handful working in Bengali; and although several writers did work on author detection in Bengali, their data was restricted and only a few classification models were employed (Table 1).

Table 1. Published work summary based on author detection

Ref | Year | Contribution | Dataset | Model | Results and Finding
[21] | 2015 | A new method introduced for authorship attribution based on function WANs | Own dataset | WANs, Naïve Bayes, 1-NN, 3-NN, DT-gdi, DT-ce, SVM | Naïve Bayes error rate 10.8% and Support Vector Machine error rate 11.5%
[22] | 2017 | Data was gathered through social media and Internet communication channels; a methodology for real-time analysis that can find information about threats being traded on IRC by hackers | Own dataset | Stanford CoreNLP, RNTN model | Detected threat information connected to the Shadow Brokers leaking vulnerabilities 28 days before the WannaCry ransomware assault
[23] | 2020 | Experiment with a multi-label classifier | Data set obtained with the Twitter API | Naive Bayes classifier | Kappa value of 67% and balanced accuracy of 76%
[24] | 2021 | One training set and one validation set with 11200 and 2400 problems each made up the PAN'21 SCD challenge's data set | Collected from an English-written Q&A forum | Hybrid algorithm | F1-scores of 86% for task one and 78% for task two on the validation set
[25] | 2021 | Make the DH (Digital Humanities) community more aware of the advantages of using deep learning models | Online dataset (Kaggle) | DNN | Analyzes multiple use-cases of DH studies in recent literature and their possible solutions, and lays out a practical decision model for DH experts for when and how to choose appropriate deep learning approaches for their research
3 Methodology

We have divided our methodology section into four parts, showing how we complete the task. Figure 1 illustrates the workflow followed in this paper.
Fig. 1. Workflow
Figure 1 shows our workflow. The methodology section is divided into four subsections: a description of the data, preprocessing, a brief discussion of the classifiers we applied, and the model evaluation. An extended description of each subsection is given below.

3.1 Description of the Data

We collect data from various sources, specifically novels by five authors: Humayun Ahmed, Rabindranath Tagore, Muhammad Zafar Iqbal, Kazi Nazrul Islam, and Sarat Chandra Chattopadhyay. The data come from a variety of sources, such as the Internet and books, and comprise more than 4500 paragraphs. We add two attributes to the XLS sheet: one is the author's paragraph and the other is the author's name. We attempted to collect paragraphs of the same length for every author.

3.2 Preprocessing and Applying Algorithms

The most crucial aspect of NLP is preprocessing the dataset so that it can be fit to the model. Data cleaning is an important part of the preprocessing procedure; we clean our data using a variety of cleaning methods, carried out in a few stages outlined below. In the preliminary stage, we dealt with null values and duplicate data by eliminating rows and deleting duplicate rows. The text contains many needless characters, for example "[", "/", "@", "|", ")", and so on. These extraneous characters are not appropriate for our dataset and must be deleted from the text in order to obtain a clean dataset, which we accomplish (Table 2). In the next step, we deal with stop words, because they create problems while training our various models. Stop words include "অই", "অতএব", "অবিধ", "হেল", etc. We create a stop-words corpus and, with its help, remove these stop words (Table 3).

Table 2. Removing Unnecessary Letters
Text | After Removing Unnecessary Text
না?' | না
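A hypothetical version of this cleaning step is sketched below. The exact rules used in the paper are not given; here every character outside the Bengali Unicode block (U+0980–U+09FF) and whitespace is dropped:

```python
import re

def clean(text):
    # Replace anything outside the Bengali block and whitespace...
    text = re.sub(r"[^\u0980-\u09FF\s]", " ", text)
    # ...then collapse the runs of whitespace left behind.
    return re.sub(r"\s+", " ", text).strip()

print(clean("না?' [@|)"))  # → না
```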
Stemming is known for obtaining root words and plays an important role in Bengali NLP. We apply various appropriate stemming rules for our work, obtain the root words, and ready our dataset. In the concluding stage of preprocessing, we apply the TF-IDF Vectorizer: since the model cannot read words directly, the TF-IDF Vectorizer gives each word a weight depending on its frequency. The TF-IDF Vectorizer formulas are:

TF = (Number of times the term appears in a document) / (Total number of words in the document)

IDF = log(Number of documents in the corpus / Number of documents in the corpus that contain the term)

TF-IDF = TF × IDF

Table 3. Removing Stop Words

Text | After Removing Stop Words
অতএব এখন কী কের আমার িদন কাটেচ | এখন কী কের আমার িদন কাটেচ
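The weighting above corresponds to scikit-learn's TfidfVectorizer (which uses a smoothed IDF variant rather than the textbook formula), followed by the 80/20 split; a sketch on a toy English stand-in corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy stand-in; the real corpus is 4500+ Bengali paragraphs.
docs = ["the cat sat", "the dog sat", "the cat ran", "the dog ran"]
labels = [0, 1, 0, 1]

vec = TfidfVectorizer()                 # assigns TF * IDF weights per term
X = vec.fit_transform(docs)             # sparse (n_docs, n_terms) matrix
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.2, random_state=42)   # 80% train, 20% test
print(X.shape, X_tr.shape[0], X_te.shape[0])
```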
We split our dataset into two parts: 80% of the data is used for training and the remaining 20% for testing. For our working environment, we chose Google Colab. To complete our objective, we make use of many built-in libraries, for example the pandas library for preprocessing, Matplotlib, Seaborn, NLTK, and others. Scikit-learn (sklearn) is one of the most usable and powerful machine-learning packages in Python; it offers a variety of effective machine learning and statistical modeling techniques, especially classification models, and we utilize it to create our classification models.

3.3 Discussion About Our Classifiers

Our method simply guesses the author based on a document paragraph. To complete our goal, we employ supervised learning strategies, which necessitate the usage of classification algorithms. Based on our dataset, the classification algorithms predict category labels for new observations. These methods work by training the model on training data; the primary purpose of classification algorithms is to detect the existing categories or labels in a dataset. We go through our classification techniques below.

In the classification field, naive Bayes classifiers can outperform more powerful alternatives, especially for small data sets [26]. The approach is founded on the Bayes theorem, and the word "naive" refers to the assumption that the features are mutually independent. Naive Bayes is a linear classifier that is typically robust, easy to use, and fast, with good accuracy [27]; the model performs effectively when the dataset is predefined and labeled. The naive Bayes formula is shown below.

P(x/y) = P(y/x) P(x) / P(y)
28
Md. Abdul Hamid et al.
P(y|x) is the posterior probability of hypothesis y given the evidence x, and P(x|y) is the likelihood of the evidence given the hypothesis. P(y) is the prior probability, whereas P(x) is the marginal probability of the evidence. The primary concept of the Support Vector Machine is that it is based on statistical learning theory [28, 29]. Support vectors are the points that lie on the marginal planes formed parallel to the identified hyperplane. Fig. 2 illustrates the SVM hyperplane concept.
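A quick numeric sketch of the Bayes rule above, with made-up probabilities:

```python
# P(y|x) = P(x|y) * P(y) / P(x)
p_y = 0.3          # prior: probability a paragraph is by author y
p_x_given_y = 0.8  # likelihood: probability of observing features x for author y
p_x = 0.4          # evidence: marginal probability of the features

p_y_given_x = p_x_given_y * p_y / p_x  # posterior
print(round(p_y_given_x, 2))  # 0.6
```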
Fig. 2. SVM
Because it does not build a predictive model but memorizes the training data, K-Nearest Neighbors (KNN) is sometimes known as a lazy algorithm. It uses Euclidean distances to classify new points based on their similarity to known points. The Euclidean distance between a test sample and the training samples is widely used by the k-nearest-neighbor classifier [30]. KNN is commonly utilized because of its clear interpretation and quick computation time. The formula for Euclidean distance is given below:

Euclidean Distance = sqrt((x2 − x1)^2 + (y2 − y1)^2)

The decision tree, also known as the tree-based structure, is a tree-structured classifier [31]. The inner nodes of the tree represent the attributes of the dataset, the branches represent decision rules, and the leaf nodes represent the outcomes. Random Forest is built on the concept of ensemble learning [32]. When several classifiers are combined to solve a hard problem and increase the performance of a model, we refer to the process as an ensemble. The Random Forest classifier builds several decision trees on subsets of the dataset and averages their predictions to improve accuracy. Logistic regression is one of the most widely used machine-learning techniques. It is used to model how a set of labeled, independent variables determines a categorical outcome [33]. The Logistic Regression method outputs a categorical dependent variable, so the outcome should be a discrete or categorical value; it might be binary, such as yes or no, or 0 or 1. Instead of delivering exact binary values of 0 and 1, it provides probabilistic outcomes between 0 and 1 via the sigmoid function, whose formula is provided below:

S(x) = 1 / (1 + e^(−x))
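Both formulas above can be checked with a short pure-Python sketch (2-D points and illustrative values):

```python
import math

def euclidean_distance(p, q):
    """Distance between 2-D points p and q: sqrt((x2-x1)^2 + (y2-y1)^2)."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def sigmoid(x):
    """Logistic sigmoid S(x) = 1 / (1 + e^(-x)), mapping any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
print(sigmoid(0))                          # 0.5
```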
XG-Boost is an implementation of gradient-boosted decision trees. Here, decision trees are created in sequential order: by assigning weights to each independent variable fed to a decision tree, it predicts the results and selects the best result among them.

3.4 Model Evaluation

The confusion matrix is generally used to evaluate the performance of multi-class or single-label classification models [34]. Sensitivity, specificity, accuracy, and the area under the ROC curve (or, equivalently, the c-index) are all common discrimination measurements. There are statistical tests for all these criteria to assess whether one model outperforms another in discrimination ability [35]. ROC curves measure a classifier's prediction quality by visualizing the tradeoff between the model's sensitivity and specificity. We evaluate our models through different evaluation metrics: precision, recall, F1 score, and model accuracy. The formulas are given below:

Accuracy = (TP + TN) / (TP + FN + FP + TN) × 100%

Recall = TP / (TP + FN) × 100%

Precision = TP / (TP + FP) × 100%

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
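These formulas can be sketched directly from confusion-matrix counts (the TP/FN/FP/TN values below are illustrative, not our experimental results):

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, recall, precision (all in %) and F1 score from counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn) * 100
    recall = tp / (tp + fn) * 100
    precision = tp / (tp + fp) * 100
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f1

acc, rec, prec, f1 = classification_metrics(tp=8, fn=2, fp=1, tn=9)
print(acc)  # 85.0
print(rec)  # 80.0
```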
4 Result and Discussion

This work is mainly a multiclass problem. Humayun Ahmed (H.A), Rabindranath Tagore (R.T), Muhammad Zafar Iqbal (M.J.I), Kazi Nazrul Islam (K.N.I), and Sarat Chandra Chattopadhyay (S.C) are the five classes. Hence, a 5 × 5 confusion matrix is produced, as we work on 5 authors. To evaluate the 7 classifier models, different performance metrics are used, such as model accuracy, precision, recall, and F1-score. These measurements are shown in Table 4. Among the 7 classifier models, SVM gives the best overall model accuracy. The SVM model provides the greatest precision, recall, and F1 score for the R.T, M.J.I, and K.N.I classes. For the H.A and S.C classes, however, the Logistic Regression model provides the greatest precision, recall, and F1 score. Table 4 contains the results.
Table 4. Classification report and accuracy
| Classifier | Class | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|---|
| Naive Bayes | H.A | 0.67 | 0.87 | 0.75 | 71% |
| | R.T | 0.00 | 0.00 | 0.00 | |
| | M.J.I | 0.00 | 0.00 | 0.00 | |
| | K.N.I | 0.91 | 0.07 | 0.13 | |
| | S.C | 0.73 | 0.97 | 0.84 | |
| Logistic Regression | H.A | 0.77 | 0.90 | 0.83 | 81% |
| | R.T | 0.86 | 0.26 | 0.40 | |
| | M.J.I | 0.97 | 0.37 | 0.53 | |
| | K.N.I | 0.79 | 0.58 | 0.67 | |
| | S.C | 0.84 | 0.92 | 0.88 | |
| KNN | H.A | 0.58 | 0.89 | 0.70 | 67% |
| | R.T | 0.62 | 0.25 | 0.35 | |
| | M.J.I | 0.79 | 0.34 | 0.48 | |
| | K.N.I | 0.74 | 0.19 | 0.30 | |
| | S.C | 0.81 | 0.70 | 0.75 | |
| Random Forest | H.A | 0.73 | 0.87 | 0.80 | 77% |
| | R.T | 0.85 | 0.15 | 0.26 | |
| | M.J.I | 0.95 | 0.21 | 0.35 | |
| | K.N.I | 0.84 | 0.42 | 0.56 | |
| | S.C | 0.79 | 0.95 | 0.86 | |
| Decision Tree | H.A | 0.64 | 0.66 | 0.65 | 64% |
| | R.T | 0.33 | 0.29 | 0.31 | |
| | M.J.I | 0.52 | 0.32 | 0.40 | |
| | K.N.I | 0.59 | 0.56 | 0.57 | |
| | S.C | 0.71 | 0.76 | 0.73 | |
| SVM | H.A | 0.78 | 0.91 | 0.84 | 82% |
| | R.T | 0.80 | 0.38 | 0.52 | |
| | M.J.I | 0.96 | 0.48 | 0.64 | |
| | K.N.I | 0.86 | 0.61 | 0.71 | |
| | S.C | 0.85 | 0.91 | 0.88 | |
| XG-Boost | H.A | 0.72 | 0.86 | 0.79 | 77% |
| | R.T | 0.68 | 0.32 | 0.43 | |
| | M.J.I | 0.95 | 0.40 | 0.56 | |
| | K.N.I | 0.78 | 0.50 | 0.61 | |
| | S.C | 0.81 | 0.88 | 0.84 | |
The ROC curve illustrates the true positive rate on the Y axis and the false positive rate on the X axis. As a rule, the higher the AUC, the better the model generally. As Fig. 3 shows, the AUC results for all our classes are excellent; these results are based only on the Logistic Regression model.
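For a single class, the AUC can also be computed directly as the fraction of positive–negative pairs that the classifier ranks correctly (ties count half); a minimal sketch with made-up scores:

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# a perfect ranking of positives above negatives gives AUC = 1.0
print(auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # 1.0
```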
Fig. 3. ROC curve and AUC score of the five classes.
5 Conclusion

Author detection is a feature that makes it simple to figure out which article belongs to which author. From the seven classification reports, the best result was found with the support vector machine, which gives 82% accuracy with 85% precision. The practical implementation of our work may be incorporated into the backend of social media, forums, blogs, or websites to identify the original author or detect plagiarism, among other uses. Author detection is a procedure that makes it simple to identify dishonest persons who try to pass off someone else's material as their own, which is illegal. This is something that everyone should be more aware of. In the future, more data might be used to improve the accuracy of this author recognition, making the work of author detection easier. For future work, pre-trained models such as BERT and XLNet may be applied to an enlarged dataset to provide a better model with higher accuracy.
References 1. Islam, N., Hoque, M.M., Hossain, M.R.: Automatic authorship detection from Bengali text using stylometric approach. IEEE Xplore (2017). https://ieeexplore.ieee.org/abstract/ document/8281793 2. Chakraborty, T.: Authorship Identification in Bengali Literature: a Comparative Analysis (2013). https://arxiv.org/abs/1208.6268 3. Das, S., Mitra, P.: Author identification in bengali literary works. In: Kuznetsov, S.O., Mandal, D.P., Kundu, M.K., Pal, S.K. (eds.) Pattern Recognition and Machine Intelligence, pp. 220– 226. Springer Berlin Heidelberg, Berlin, Heidelberg (2011). https://doi.org/10.1007/978-3642-21786-9_37 4. Rocha, A., et al.: Authorship attribution for social media forensics. IEEE Trans. Inf. Forensics Secur. 12(1), 5–33 (2017). https://doi.org/10.1109/TIFS.2016.2603960
5. Mosteller, F., Wallace, D.L.: Inference in an authorship problem. J. Am. Stat. Assoc. 58(302), 275 (1963). https://doi.org/10.2307/2283270 6. Stamatatos, E.: A survey of modern authorship attribution methods. J. Am. Soc. Inform. Sci. Technol. 60(3), 538–556 (2009). https://doi.org/10.1002/asi.21001 7. Koppel, M., Schler, J., Argamon, S., Messeri, E.: Authorship attribution with thousands of candidate authors. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR 2006 (2006). https://doi.org/10.1145/1148170.1148304 8. Shao, S., Tunc, C., Al-Shawi, A., Hariri, S.: Autonomic author identification in Internet Relay Chat (IRC). IEEE Xplore (2018). https://ieeexplore.ieee.org/abstract/document/8612780 9. Zheng, R., Li, J., Chen, H., Huang, Z.: A framework for authorship identification of online messages: writing-style features and classification techniques. J. Am. Soc. Inform. Sci. Technol. 57(3), 378–393 (2006). https://doi.org/10.1002/asi.20316 10. Phani, S., Lahiri, S., Biswas, A.: A machine learning approach for authorship attribution for Bengali blogs. IEEE Xplore (2016). https://ieeexplore.ieee.org/abstract/document/7875984 11. Savoy, J.: Authorship attribution based on specific vocabulary. ACM Trans. Inform. Syst. 30(2), 1–30 (2012). https://doi.org/10.1145/2180868.2180874 12. Koppel, M., Schler, J., Argamon, S.: Computational methods in authorship attribution. J. Am. Soc. Inform. Sci. Technol. 60(1), 9–26 (2009). https://doi.org/10.1002/asi.20961 13. Cyran, K.A., Stańczyk, U.: Machine learning approach to authorship attribution of literary texts. Int. J. Appl. Math. Inform. 1(4), 151–158 (2007) 14. Koppel, M., Schler, J., Argamon, S.: Authorship attribution in the wild. Lang. Resour. Eval. 45(1), 83–94 (2010). https://doi.org/10.1007/s10579-009-9111-2 15. Seroussi, Y., Zukerman, I., Bohnert, F.: Authorship attribution with topic models. Comput. Linguist. 40(2), 269–310 (2014).
https://doi.org/10.1162/coli_a_00173 16. Juola, P.: Authorship attribution. Found. Trends® Inform. Retrieval 1(3), 233–334 (2007). https://doi.org/10.1561/1500000005 17. Stamatatos, E.: Author identification using imbalanced and limited training texts. IEEE Xplore (2007). https://ieeexplore.ieee.org/abstract/document/4312893 18. Diri, B., Fatih Amasyali, M.: Automatic author detection for Turkish texts. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=f1422461024fcec79c94fe2671923ce79be0e4ef 19. Howedi, F., Mohd, M.: Text classification for authorship attribution using naive bayes classifier with limited training data. Comput. Eng. Intell. Syst. 5(4), 48 (2014). https://iiste.org/Journals/index.php/CEIS/article/view/12132. Accessed 27 Jan 2023 20. Akimushkin, C., Amancio, D.R., Oliveira, O.N.: Text authorship identified using the dynamics of word co-occurrence networks. PLoS ONE 12(1), e0170527 (2017). https://doi.org/10.1371/journal.pone.0170527 21. Segarra, S., Eisen, M., Ribeiro, A.: Authorship attribution through function word adjacency networks. IEEE Trans. Signal Process. 63(20), 5464–5478 (2015). https://doi.org/10.1109/tsp.2015.2451111 22. Shao, S., Tunc, C., Satam, P., Hariri, S.: Real-time IRC threat detection framework. IEEE Xplore (2017). https://ieeexplore.ieee.org/abstract/document/8064142. Accessed 27 Jan 2023 23. Abascal-Mena, R., López-Ornelas, E.: Author detection: analyzing tweets by using a Naïve Bayes classifier. J. Intell. Fuzzy Syst. 1–9 (2020). https://doi.org/10.3233/jifs-179894 24. Deibel, R., Löfflad, D.: Style change detection on real-world data using an LSTM-powered attribution algorithm. Notebook for PAN at CLEF 2021. https://ceur-ws.org/Vol-2936/paper-163.pdf 25. Suissa, O., Elmalech, A., Zhitomirsky-Geffet, M.: Text analysis using deep neural networks in digital humanities and information science. J. Am. Soc. Inf. Sci. 73(2), 268–287 (2021). https://doi.org/10.1002/asi.24544
26. Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 29(2/3), 103–130 (1997). https://doi.org/10.1023/a:1007413511361 27. Raschka, S.: Naive Bayes and text classification I - introduction and theory. arXiv.org (2014). https://arxiv.org/abs/1410.5329 28. Madigan, D., Genkin, A., Lewis, D., Argamon, S., Fradkin, D., Ye, L.: Author identification on the large scale. www.semanticscholar.org (2005). https://www.semanticscholar.org/paper/Author-Identification-on-the-Large-Scale-Madigan-Genkin/e1b1b12896525c747650fddb45bd0a81f798bb09. Accessed 27 Jan 2023 29. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995). https://doi.org/10.1007/bf00994018 30. Peterson, L.: K-nearest neighbor. Scholarpedia 4(2), 1883 (2009). https://doi.org/10.4249/scholarpedia.1883 31. Kotsiantis, S.B.: Decision trees: a recent overview. Artif. Intell. Rev. 39(4), 261–283 (2011). https://doi.org/10.1007/s10462-011-9272-4 32. Sagi, O., Rokach, L.: Ensemble learning: a survey. WIREs Data Mining Knowl. Discov. 8(4) (2018). https://doi.org/10.1002/widm.1249 33. Dreiseitl, S., Ohno-Machado, L.: Logistic regression and artificial neural network classification models: a methodology review. J. Biomed. Inform. 35(5–6), 352–359 (2002). https://doi.org/10.1016/s1532-0464(03)00034-0 34. Krstinić, D., Braović, M., Šerić, L., Božić-Štulić, D.: Multi-label classifier performance evaluation with confusion matrix. Comput. Sci. Inform. Technol. (2020). https://doi.org/10.5121/csit.2020.100801 35. DeLong, E.R., DeLong, D.M., Clarke-Pearson, D.L.: Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44(3), 837 (1988). https://doi.org/10.2307/2531595
Robust State of Charge Estimation and Simulation of Lithium-Ion Batteries Using Deep Neural Network and Optimized Random Forest Regression Algorithm Saad El Fallah1(B) , Jaouad Kharbach1 , Abdellah Rezzouk1 , and Mohammed Ouazzani Jamil2 1
Faculté des Sciences Dhar El Mahraz, Laboratoire de Physique de Solide, Université Sidi Mohamed Ben Abdellah, B.P. 1796, 30003 Fès-Atlas, Morocco [email protected] 2 Laboratoire Systèmes et Environnements Durables, Université Privée de Fès, Lot. Quaraouiyine Route Ain Chkef, 30040 Fès, Morocco
Abstract. The estimation of the state of charge (SoC) of lithiumion batteries is crucial for battery management systems. SoC is one of the most critical parameters that must be determined in real-time to ensure the reliable and safe operation of Li-ion batteries. SoC is a non-measurable parameter, but its value can be derived from other measurable parameters, such as current, voltage, and temperature. Unlike most studies available in the literature, this paper presents a comparative study between two machine learning methods: the Random Forest Regressor (RFR) and the Multi-layer Perceptron (MLP) to accurately estimate the SoC of lithium-ion batteries from data collected under Matlab/Simulink software from a LiCoO2 battery cell, taking into account the effect of the operating temperature on the battery, and under different current charge/discharge profiles. The results indicate that the Random Forest regressor model is reliable in estimating the SoC with a coefficient of determination of 0.99, a mean error value of less than 0.5%, and a maximum error value of less than 1.83%. In contrast, the MLP yields a mean error value of less than 0.8%, and a maximum error value of less than 1.87%, demonstrating the accuracy and robustness of the Random Forest regressor model for SoC estimation. Keywords: Lithium-ion battery · Battery management system · Extended kalman filter · battery modeling · State of charge · Electrical vehicle · Machine learning
1 Introduction
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 34–45, 2023. https://doi.org/10.1007/978-3-031-43520-1_4

Lithium-ion batteries are now widely used in our daily lives, whether in many portable electronic devices or electric vehicles. The latter has become one of the
best candidates to gradually replace internal combustion engine vehicles [1,2]. Lithium-ion technology offers many advantages, including high energy density, fast charging capability, longer life, and no memory effect [3]. Accurate estimation of the SoC, state of health (SOH), and state of function (SOF) ensures battery safety and reliability by preventing harmful behaviors such as overcharging or deep discharging of the battery [4]. Moreover, this estimation is essential for electric vehicles (EVs) to know the available energy in the battery and thus predict the remaining range of the electric vehicle [5,6]. Direct measurement of the SoC is not possible, due to the nonlinearity of the SoC and the complex dynamics of batteries. For this reason, advanced estimation techniques are used to estimate the SoC [7,8]. A battery management system (BMS) is a fundamental component of any application; this system allows for real-time monitoring of the various internal characteristics of the battery, including its SoC and state of health (SOH), among others [9]. In this way, the BMS can intervene and protect the battery under critical operating conditions to prevent unexpected failures and damage to the battery [10,11]. The main approaches and methods for SoC estimation can be classified into three categories, namely, model-based methods, conventional methods, and data-based methods. Conventional methods include methods based solely on electrical measurements, such as the Coulomb counting method and the open circuit voltage (OCV) method [12,13]. Other methods are mainly based on mathematical equation models and electrical models. Filters can be added to correct model uncertainties, such as the adaptive extended Kalman filter (AEKF) [14], the adaptive unscented Kalman filter (AUKF) [15], and the ARX-based Kalman filter [16].
Machine learning (ML) methods can also be used to generate a model based on the data and identify the non-linear relationship between SoC and measurable variables such as current, voltage, and temperature. Currently, several studies focus on machine learning models such as linear regression, the K-Nearest Neighbors regressor, the Random Forest regressor (RFR) [17], and the Decision Tree regressor (DTR), which have been mentioned in the literature for SoC estimation [18–20]. In addition to these models, the authors of [21] combined a long short-term memory recurrent neural network (LSTM-RNN) with an unscented Kalman filter (UKF) to estimate the SoC of an 18650 lithium iron phosphate battery. This model takes current, voltage, and temperature as inputs and produces the SoC estimate as output. The UKF was incorporated into the network to filter the output noise and improve the accuracy of the SoC estimation [21]. In this paper, we present a comparative study between two machine learning methods, the Random Forest Regressor (RFR) and the Multi-layer Perceptron (MLP), for accurate SoC estimation. The organization of this paper is as follows: In Sect. 2, we present the simulation methodology and the data manipulated in this study. In Sect. 3, we discuss the working principle of the different estimation methods. Section 4 contains the results and discussion. This manuscript concludes with a summary of the comparison and a conclusion.
2 Simulation Methodology
To simulate the charge/discharge scheme of the adopted battery, Matlab/Simulink software is employed in this paper. As the SoC depends on the charge/discharge current profile, two profiles have been selected. First, a constant current profile is applied to the battery until it is fully charged, representing charging under a constant current demand. For the discharge process, the battery is entirely discharged with current profiles of varying magnitude. Figure 2 presents the various current profiles used for training. The effect of temperature on the SoC is also taken into account in the simulation during the charging and discharging processes of the battery. To study the effect of operating temperature on the discharge process of a battery [22], Fig. 1 shows the SoC of a battery at different operating temperatures with a constant current. We notice that the SoC is influenced by the operating temperature: when the operating temperature is high, the discharge is deeper, and the discharge time decreases when the temperature decreases.
Fig. 1. State of charge during the discharge process for different temperature values
The obtained data is fed to the studied machine learning algorithms to estimate the battery SoC. To compare the two machine learning methods, several simulations were performed to simulate the charging and discharging pattern of a 2 Ah LiCoO2 battery cell. This type of battery is widely used in electrified vehicle applications. In addition, we tested several types of batteries to verify that the proposed algorithms provide accurate and robust estimation performance not only for the various conditions mentioned above but also for different batteries. The Scikit-learn library in Python helped us to develop the different machine learning algorithms proposed in this paper and to test them with data different from the training data. In this study, a performance comparison of the two regression models was performed to analyze the feasibility of using different machine learning methods in SoC estimation.
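As a sketch of this workflow (not our actual Simulink data), synthetic voltage/current/temperature samples with a hypothetical SoC relationship can be used to train a Random Forest regressor in scikit-learn:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
voltage = rng.uniform(3.0, 4.2, n)       # terminal voltage [V]
current = rng.uniform(-2.0, 2.0, n)      # charge/discharge current [A]
temperature = rng.uniform(5.0, 45.0, n)  # operating temperature [°C]
# hypothetical SoC roughly increasing with voltage, standing in for simulated data
soc = np.clip((voltage - 3.0) / 1.2 + 0.01 * rng.standard_normal(n), 0.0, 1.0)

X = np.column_stack([voltage, current, temperature])
X_train, X_test, y_train, y_test = train_test_split(
    X, soc, test_size=0.2, random_state=0)

rfr = RandomForestRegressor(n_estimators=100, random_state=0)
rfr.fit(X_train, y_train)
print(rfr.score(X_test, y_test))  # coefficient of determination on held-out data
```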
Fig. 2. Different current profiles used for training.
The simulation in MATLAB is performed to collect data during the charging and discharging of a battery to estimate the SoC using machine learning models, as it creates a virtual model of the battery and its behavior. This simulated data can then be used to train machine learning models, such as neural networks, to accurately predict the battery's SoC based on collected voltage, current, and temperature data. This approach can be useful for evaluating the performance of a battery before it is built and deployed in a real-world application and can also be used to optimize battery design and control algorithms. Simulating the charge/discharge behavior of a battery in MATLAB can be useful in evaluating the performance of a battery for reliable SoC estimation because it allows the user to test and analyze the behavior of the battery under a variety of conditions and inputs, such as different charge and discharge rates, temperatures, and initial conditions. This information can be used to develop and optimize battery SoC estimation algorithms, an important performance metric for battery-powered systems. In addition, the simulated data can be used to validate the experimental measurements and identify potential problems or sources of error in the battery performance. Overall, the use of simulation in this context provides an efficient and cost-effective way to evaluate battery performance for reliable SoC estimation. Figure 3 illustrates the schematic of the simulation.
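The charge bookkeeping behind such a simulation can be sketched with simple Coulomb counting in Python; this is only a stand-in for the Matlab/Simulink battery model, with illustrative cell capacity and current and no voltage or thermal dynamics:

```python
def simulate_discharge(capacity_ah=2.0, current_a=2.0, dt_s=1.0, soc0=1.0):
    """Coulomb counting: SoC decreases by I*dt relative to total capacity."""
    capacity_as = capacity_ah * 3600.0  # capacity in ampere-seconds
    soc, t = soc0, 0.0
    trace = [(t, soc)]
    while soc > 0.0:
        soc = max(0.0, soc - current_a * dt_s / capacity_as)
        t += dt_s
        trace.append((t, soc))
    return trace

trace = simulate_discharge()
print(trace[-1])  # a 2 Ah cell discharged at 2 A empties in roughly 3600 s
```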
Fig. 3. Simulation schematic
3 SoC Estimation Methods
An accurate SoC estimation is crucial because battery management systems (BMS) use it to tell the user how much capacity is left until the next recharge. The SoC is defined as the ratio between the charge of the cell at a certain time and its total capacity. Machine learning allows us to build a model capable of predicting the SoC of a battery with high accuracy [23,24]; it is therefore considered a modern evolution of the concept of predictive analysis. Two methods are applied in this work to estimate the SoC of a lithium-ion battery: the Random Forest Regressor (RFR) and the Multi-Layer Perceptron (MLP). The Scikit-learn library in Python helped us to develop the different machine learning algorithms proposed in this paper and to test them with data different from the training data. The original data is divided into a training data set (80%) and a test data set (20%). The working principle of these two approaches is summarized in the following subsections.

3.1 Random Forest Regression
Random Forest is an ensemble-based supervised learning algorithm used in classification to predict categorical variables and in regression to predict continuous variables. RF regression provides many decision trees for regression, and its output is computed by calculating the average of the output of all decision trees. The decision tree is a model that has no prior tree structure. The tree structure depends on the complexity of the trained data during the learning phase. The working principle of RF regression is illustrated in Fig. 4 [25].
Fig. 4. Working principle of RF regression
RF regression generates regression trees using the training data X = {x1, x2, x3, ..., xk}, which produces a forest. This method produces n outputs T1(x), T2(x), T3(x), ..., Tn(x), one for each tree. To compute the final output, all tree predictions are averaged using the following equation:

RF(x) = (1/n) × Σ_{i=1}^{n} T_i(x)    (1)
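Equation (1) above is just the mean of the individual tree outputs; a minimal sketch with stub trees (the constant-output lambdas stand in for trained regression trees):

```python
def rf_predict(trees, x):
    """RF(x) = (1/n) * sum of T_i(x) over all n trees."""
    outputs = [tree(x) for tree in trees]
    return sum(outputs) / len(outputs)

# three stub "trees" standing in for trained regression trees
trees = [lambda x: 0.2, lambda x: 0.4, lambda x: 0.6]
print(round(rf_predict(trees, x=None), 10))  # 0.4
```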
3.2 Multi-layer Perceptron
The MLP network is a type of feedforward neural network, which consists of an input layer, several hidden layers, and an output layer [26]. The nodes that make up each layer compute a weighted sum of the outputs of the nodes in the previous layer plus a bias, passed through a nonlinear activation function, as shown in Fig. 5. The advantage of choosing ReLU as the activation function is that it avoids the vanishing-gradient phenomena that are likely to occur with the sigmoid function; in addition, ReLU offers efficient network compression and computation. Nodes in each layer are connected to nodes in the next layer by a weight w_ij. The total input to a hidden neuron j is computed using the weights and biases of each layer as follows:

I_j = Σ_{i=1}^{n} x_i w_ij + b_j    (2)

where I_j is the total input of hidden-layer neuron j, x_i is the output of neuron i in the previous layer, w_ij is the weight connecting neuron i to hidden neuron j, and b_j is the bias of hidden-layer neuron j. It is possible to use different activation functions in the hidden layer. The activation function is applied to the total input and provides the output of the hidden neuron to the next layer as follows:

H_j = f(I_j) = 1 / (1 + exp(−I_j))    (3)
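Equations (2) and (3) for a single hidden neuron can be sketched directly in Python (the inputs, weights, and bias below are illustrative):

```python
import math

def hidden_output(x, w, b):
    """Eq. (2): total input I_j = sum_i x_i*w_ij + b_j; Eq. (3): H_j = sigmoid(I_j)."""
    total_input = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1.0 / (1.0 + math.exp(-total_input))

# zero weights and bias give total input 0, so the sigmoid output is 0.5
print(hidden_output(x=[1.0, 1.0], w=[0.0, 0.0], b=0.0))  # 0.5
```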
Fig. 5. Structure of basic multilayer perceptron neural network
4 Results and Discussion
It is interesting to note that the curves representing the SoC evolve linearly because the applied discharge current is constant during the discharge. We used only the discharge data to train our SoC estimation algorithm; however, the SoC can also be estimated from the charge data. Subsequently, we used a charge and discharge current profile to train our state of charge estimation algorithm. The indicators used to evaluate each model are the Root Mean Square Error (RMSE), the Mean Absolute Error (MAE), and the Coefficient
of Determination (COD). All three can be calculated from the predicted and actual values. An accurate and effective comparison of RF and MLP requires that both methods be subjected to the same current profile and tested under the same conditions. Both strategies were initially tested under the current profile shown in Fig. 2, representing a discharge/charge cycle. Initially, the battery is discharged using a variable-frequency impulse current and then recharged using a CC-CV profile, currently adopted in most battery chargers. Table 1 illustrates the performance comparison of the two methods, the Multi-layer Perceptron and the Random Forest Regressor, for a variable charge/discharge current profile. The results presented in Figs. 6, 7, 8 and 9 are obtained by both machine learning methods after several tests. It is important to note that the MLP used is a four-layer network where the first three layers contain 150 neurons each and the last layer contains one neuron. In addition to the accuracy of the algorithms, their learning and response times are other critical factors, since the intended application is real-time SoC estimation. Therefore, time is taken into account when comparing the performance of these methods. The results show that the Random Forest regression model has the minimum MAE of 0.0029 and a coefficient of determination of 0.99; it is also the fastest model, and therefore the best-performing model.
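The three indicators can be sketched in pure Python (scikit-learn's `mean_squared_error`, `mean_absolute_error`, and `r2_score` compute the same quantities); the sample values below are illustrative:

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def cod(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 0.8, 0.6, 0.4]
y_pred = [0.9, 0.8, 0.6, 0.5]
print(round(rmse(y_true, y_pred), 4))  # 0.0707
```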
Fig. 6. RF estimated SoC vs real SoC
Fig. 7. MLP estimated SoC vs real SoC
Fig. 8. RF estimated SoC vs real SoC
Fig. 9. MLP estimated SoC vs real SoC
Table 1. Performance comparison based on accuracy.

| Method | RMSE | MAE | COD |
|---|---|---|---|
| Random Forest Regressor | 0.0038 | 0.0029 | 0.9998 |
| Multi-layer Perceptron | 0.0055 | 0.0048 | 0.9994 |
MLPs are considered to be more complex models as they have multiple layers of artificial neurons and can require more computational resources to train and make predictions. They are also sensitive to the number of hidden layers and neurons, which can affect the model’s performance and complexity. Random forests, on the other hand, are relatively simpler models. They are composed of multiple decision trees, which are less complex than an artificial neural network. Random forests can be trained and make predictions more efficiently than MLPs, but they may not be able to achieve the same level of accuracy on certain problems. In terms of time consumption, MLPs generally take longer to train and make predictions than random forests because they have more parameters to optimize. However, the time required to train and make predictions will also depend on the specific problem, the size of the dataset, and the computational resources available.
5 Conclusion
An accurate SoC estimation is essential for battery management systems to protect electric vehicle batteries from overcharging or deep discharge. In this paper, a comparative study between two machine learning methods: Random Forest Regressor (RFR) and Multi-layer Perceptron (MLP) has been performed to achieve an accurate estimation of the SoC of Lithium-ion batteries. In this work, we observe a good correspondence between the reference SoC and the estimated SoC. Furthermore, the results show that the Random Forest Regressor model is reliable in estimating the SoC with a coefficient of determination of 99.98% and a mean error value of less than 0.5%, and a maximum error value of less than 1.83%. In future work, we plan to incorporate the effect of battery aging into the simulation to estimate the SoC with high accuracy.
References 1. Mwasilu, F., John, J.J., Eun-Kyung, K., Duc, D.T., Jin-Woo, J.: Electric vehicles and smart grid interaction: a review on vehicle to grid and renewable energy sources integration. Renew. Sustain. Energy Rev. 34, 501–516 (2014) 2. Scrosati, B., Garche, J.: Lithium batteries: status, prospects and future. J. Power Sour. 195(9), 2419–2430 (2010) 3. Lu, L., Han, X., Li, J., Hua, J., Ouyang, M.: A review on the key issues for lithiumion battery management in electric vehicles. J. Power Sour. 226, 272–288 (2013)
4. Wang, Z., Feng, G., Zhen, D., Gu, F., Ball, A.: A review on online state of charge and state of health estimation for lithium-ion batteries in electric vehicles. Energy Rep. 7, 5141–5161 (2021) 5. McCurlie, L., Preindl, M., Emadi, A.: Fast model predictive control for redistributive lithium-ion battery balancing. IEEE Trans. Industr. Electron. 64(2), 1350– 1357 (2016) 6. Malysz, P., Gu, R., Ye, J., Yang, H., Emadi, A.: State-of-charge and state-of-health estimation with state constraints and current sensor bias correction for electrified powertrain vehicle batteries. IET Elect. Syst. Transp. 6(2), 136–144 (2016) 7. Chen, X., Shen, W., Dai, M., Cao, Z., Jin, J., Kapoor, A.: Robust adaptive slidingmode observer using RBF neural network for lithium-ion battery state of charge estimation in electric vehicles. IEEE Trans. Veh. Technol. 65(4), 1936–1947 (2015) 8. Chen, Y., Li, C., Chen, S., Ren, H., Gao, Z.: A combined robust approach based on auto-regressive long short-term memory network and moving horizon estimation for state-of-charge estimation of lithium-ion batteries. Int. J. Energy Res. 45(9), 12838–12853 (2021) 9. El Fallah, S., Kharbach, J., Hammouch, Z., Rezzouk, A., Jamil, M.O.: State of charge estimation of an electric vehicle’s battery using deep neural networks: simulation and experimental results. J. Energy Stor. 62, 106904 (2023) 10. Tao, L., Ma, J., Cheng, Y., Noktehdan, A., Chong, J., Lu, C.: A review of stochastic battery models and health management. Renew. Sustain. Energy Rev. 80, 716–732 (2017) 11. Elouazzani, H., Elhassani, I., Ouazzani-Jamil, M., Masrour, T.: State of charge estimation of lithium-ion batteries using artificial intelligence based on entropy and enthalpy variation. In: Ben Ahmed, M., Boudhir, A.A., Santos, D., Dionisio, R., Benaya, N. (eds.) Innovations in Smart Cities Applications Volume 6. SCA 2022. LNNS, vol. 629, pp. 747–756. Springer, Cham (2023). https://doi.org/10. 1007/978-3-031-26852-6 69 12. 
Edi, L., Nashirul, H.I., Muhammad, I., Soelami, F.N., Merthayasa, I.G.N.: State of charge (SoC) estimation on LiFePO 4 battery module using Coulomb counting methods with modified Peukert. In: 2013 Joint International Conference on Rural Information Communication Technology and Electric-Vehicle Technology (rICT ICeV-T). IEEE (2013) 13. Hongwen, H., Xiaowei, Z., Xiong Rui, X., Yongli, G.H.: Online model-based estimation of state-of-charge and open-circuit voltage of lithium-ion batteries in electric vehicles. Energy 39(1), 310–318 (2012) 14. Rui, X., Hongwen, H., Fengchun, S., Kai, Z.: Evaluation on state of charge estimation of batteries with adaptive extended Kalman filter by experiment approach. IEEE Trans. Veh. Technol. 62(1), 108–117 (2012) 15. Jinhao, M., Luo, G., Gao, F.: Lithium polymer battery state-of-charge estimation based on adaptive unscented Kalman filter and support vector machine. IEEE Trans. Power Electron. 31(3), 2226–2238 (2015) 16. Shifei, Y., Hongjie, W., Yin, C.: State of charge estimation using the extended Kalman filter for battery management systems based on the ARX battery model. Energies 6(1), 444–470 (2013) 17. Hossain, L.M.S., et al.: Real-time state of charge estimation of Lithium-ion batteries using optimized random forest regression algorithm. IEEE Trans. Intell. Veh. 8, 639–648 (2022) 18. Chao, H., Gaurav, J., Puqiang, Z., Craig, S., Parthasarathy, G., Tom, G.: Datadriven method based on particle swarm optimization and k-nearest neighbor regression for estimating capacity of lithium-ion battery. Appl. Energy 129, 49–55 (2014)
State of Charge Estimation and Simulation
45
19. Hu, J.N., et al.: State-of-charge estimation for battery management system using optimized support vector machine for regression. J. Power Sour. 269, 682–693 (2014) 20. Hasan A.J., Yusuf, J., Faruque, R.B.: Performance comparison of machine learning methods with distinct features to estimate battery SOC. In: 2019 IEEE Green Energy and Smart Systems Conference (IGESSC). IEEE (2019) 21. Fangfang, Y., Shaohui, Z., Weihua, L., Qiang, M.: State of charge estimation of lithium-ion batteries using LSTM and UKF. Energy 201, 117664 (2020) 22. Jiazhi, M., Zheming, T., Shuiguang, T., Jun, Z., Jiale, M.: State of charge estimation of lithium-ion battery for electric vehicles under extreme operating temperatures based on an adaptive temporal convolutional network. Batteries 8(10), 145 (2022) 23. Youssef, H.Y., et al.: A machine learning approach for state-of-charge estimation of Li-ion batteries. In: Artificial Intelligence and Machine Learning for Multi-domain Operations Applications IV, vol. 12113. SPIE (2022) 24. Chandran, V., Patil, C.K., Karthick, A., Ganeshaperumal, D., Rahim, R., Ghosh, A.: State of charge estimation of lithium-ion battery for electric vehicles using machine learning algorithms. World Elect. Veh. J. 12(1), 38 (2021) 25. Niankai, Y., Ziyou, S., Heath, H., Jing, S.: Robust State of Health estimation of lithium-ion batteries using convolutional neural network and random forest. J. Energy Stor. 48, 103857 (2022) 26. Wang, D., Lee, J., Kim, M., Lee, I.: State of charge estimation using multi-layer neural networks based on temperature. In: 2022 22nd International Conference on Control, Automation and Systems (ICCAS). IEEE (2022)
Advancing Lithium-Ion Battery Management with Deep Learning: A Comprehensive Review

Hind Elouazzani1(B), Ibtissam Elhassani1,2, and Tawfik Masrour1,2

1 Laboratory of Mathematical Modeling, Simulation and Smart Systems (L2M3S), ENSAM-Meknes, Moulay Ismail University of Meknes, Meknes, Morocco
[email protected], {i.elhassani,t.masrour}@ensam.umi.ac.ma
2 University of Quebec at Rimouski, Rimouski, Canada

Abstract. Due to climate change and increasing global energy demands, lithium-ion batteries (LIBs) have recently attracted growing interest, particularly in electric vehicle (EV) and energy storage system (ESS) applications, owing to valuable features such as high energy density, fast-charging ability, and long lifespan. The battery management system (BMS) monitors battery states and provides optimal charging profiles to indicate battery status and optimize battery behavior. Machine learning (ML) methods, particularly deep learning ones, have seen consistent and significant growth in battery prognostic studies over the last few years. This paper therefore systematically reviews the commonly used deep-neural-network methods for battery state monitoring and fast charging. The details of the proposed algorithms, including the data source, input features, estimation error, real-time applicability, and field of application, are presented, compared, and summarized. Finally, the review identifies gaps in the reviewed methods and presents recommendations and future directions for next-generation BMS.
Keywords: Deep machine learning · Lithium Ion Battery (LIB) · Battery Management System (BMS)

1 Introduction
Due to climate change and increasing global energy demands, lithium-ion batteries (LIBs) have recently gained increasing interest, particularly in the electric vehicle (EV) market and energy storage systems (ESS). Compared to other types of rechargeable batteries, LIBs are known for valuable features such as high energy density, very high energy efficiency, no memory effect, low self-discharge rates, long cycle lives, low cost, and safety [11].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 46–58, 2023. https://doi.org/10.1007/978-3-031-43520-1_5
A significant amount of research is being conducted to improve battery performance, including the incorporation of new materials and the discovery of new chemistries that deliver higher energy densities. Other research focuses on performance management and internal state monitoring to ensure battery safety and longevity, both of which the BMS provides. The BMS is an electronic device that manages a rechargeable battery to meet requirements such as ensuring safety and thermal management, monitoring battery states, and providing optimal charging profiles.
Machine learning (ML) methods, particularly deep learning ones, have seen consistent and significant growth in battery prognostic studies over the last few years [8]. Also known as data-based methods, these techniques learn the nonlinear relationship between battery states and features from a large number of empirical observations [48]. They have the advantage of simplicity and do not require an explicit battery model or extensive knowledge of the battery's internal characteristics [47].
Therefore, this paper provides a systematic review of deep learning algorithms used in lithium-ion battery management, with an emphasis on state monitoring (specifically SOC, SOH, capacity, and RUL) as well as charging protocols. The paper classifies the proposed deep learning algorithms and details the data used, input features, real-time applicability, field of application, and estimation error to compare the different methods. Finally, we discuss the existing limitations of the DL algorithms used in each field before presenting future suggestions to pave the way for advanced DL algorithms in battery state estimation and charging-strategy optimization.
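The comparisons throughout this review are reported as scalar estimation errors, chiefly RMSE and MAE. As a point of reference, both metrics can be computed as follows (a minimal stdlib sketch; the sample SOC values are illustrative, not taken from any reviewed work):

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error of an estimator over a test sequence."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error of an estimator over a test sequence."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

# Illustrative true vs. estimated SOC values, in percent
soc_true = [100.0, 90.0, 80.0, 70.0]
soc_est = [99.0, 91.0, 78.0, 71.0]
print(round(rmse(soc_true, soc_est), 3))  # 1.323
print(round(mae(soc_true, soc_est), 3))   # 1.25
```

RMSE weights large deviations more heavily than MAE, which is why the two metrics can rank estimators differently.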
2 Surveying Methodology

The initial search was conducted in two scientific databases: Scopus and Web of Science.
1. The initial evaluation was carried out using the keywords ((“Reinforcement Learning” OR “Deep Learning”) AND (“Lithium Ion Batter*”) AND (“Battery management system” OR “*fast charg*”)), restricted to English-language journal articles; accordingly, 136 articles were identified and 20 duplicated papers were removed.
2. A second screening was conducted on the articles' titles and abstracts to ensure that the content of each article matched the objective of this paper and to eliminate review papers; accordingly, 76 articles were selected.
3. The focus of this paper is on reviewing only papers that propose deep learning methods for SOC or SOH estimation as well as fast charging. Furthermore, we narrowed the set of articles according to journal quality indicators (impact score, H-index, and SJR).
4. At this stage, 41 articles from various journals were chosen for review.
The PRISMA flow diagram (Fig. 1) describes the selection process of the papers reviewed in this work.
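The search-and-screening procedure above is essentially a boolean filter over bibliographic records. A toy sketch of such a filter (the record fields and the SJR threshold are illustrative assumptions, not the exact tooling used for this review):

```python
def matches_query(title, abstract):
    """Mimic the boolean keyword query used in the initial database search."""
    text = (title + " " + abstract).lower()
    method = "reinforcement learning" in text or "deep learning" in text
    battery = "lithium" in text and "batter" in text  # stands in for Batter*
    scope = "battery management system" in text or "fast charg" in text
    return method and battery and scope

def screen(records, min_sjr=0.5):
    """Keep journal articles that match the query and a quality bar."""
    kept = []
    for r in records:
        if r["type"] != "article":
            continue  # e.g. review papers are excluded
        if not matches_query(r["title"], r["abstract"]):
            continue
        if r["sjr"] < min_sjr:
            continue  # proxy for the journal-quality filters
        kept.append(r["title"])
    return kept

papers = [
    {"title": "Deep learning for lithium-ion battery management system",
     "abstract": "SOC estimation", "type": "article", "sjr": 1.2},
    {"title": "A review of fast charging", "abstract": "lead-acid batteries",
     "type": "review", "sjr": 2.0},
]
print(screen(papers))  # ['Deep learning for lithium-ion battery management system']
```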
3 SOC Estimation
State of charge (SOC) is defined as the ratio of the current remaining capacity to the maximum capacity, i.e., the remaining available capacity before the battery needs to be recharged [22]. Generally, SOC is an internal battery parameter related to the average lithium concentration in the electrodes and affected by cell temperature, electrochemical reactions, material degradation, and aging cycles [28]. As a result, the SOC cannot be measured directly; rather, it is estimated indirectly from the battery's externally measured variables in conjunction with a suitable SOC estimation algorithm [9]. Thus, considerable effort has been devoted to developing secure and reliable SOC estimation methods, which can be classified as look-up-table-based methods, model-based methods, and, more recently, data-driven methods.

Fig. 1. PRISMA flow diagram for the proposed systematic review

Data-driven methods, also known as machine learning methods or black-box models, learn the relation between the SOC and the observable variables with limited knowledge of the battery's internal characteristics, and they take less time to design than the significantly more complex model-based architectures [27]. Table 1 provides a detailed synopsis of each of the DL algorithms reviewed in this work for SOC estimation.

3.1 Convolutional Neural Network
Wang et al. [43] applied convolutional residual networks to estimate SOC, using measurable variables as inputs and exploiting process information and the interrelationships among them. The proposed network model performed well on SOC estimation, with an RMSE of 0.998% and an MAE of 1.260%. To reduce training time, Hannan et al. [18] built a multi-layer time-domain convolutional network with cyclically varying learning rates and optimized maximum and minimum values. Compared to the LSTM, GRU, and CNN models, the proposed architecture achieved an MSE of 0.85%.

3.2 Recurrent Neural Network
A stacked bidirectional long short-term memory (SBLSTM) framework for SOC estimation at various temperature settings was developed in [2]. The bidirectional LSTM layers have the advantage of capturing battery temporal information in both directions and modeling long-term dependencies in both the past and the future. On the experimental dataset, the proposed model enhanced SOC estimation accuracy, with MAEs of 0.46% and 0.73% at fixed and varying ambient temperatures, respectively. In another work [31], an LSTM-based network was proposed to co-estimate the SOC and SOE of lithium-ion batteries, with an MAE of 0.91%.
A two-hidden-layer stacked gated recurrent unit (GRU) model using a one-cycle learning rate policy was presented in [19]. The proposed model was tested on multiple drive cycles at varying temperatures and achieved minimum RMSEs of 0.52% and 0.65% on the train and test datasets, respectively. From the same perspective, Xiao et al. [45] applied GPR in conjunction with a gated recurrent unit kernel (GRU-GPR) to estimate SOC. The proposed deep recurrent kernel captures ordering and recurrent structure in sequentially measured data; it was capable not only of estimating the SOC from measured quantities but also of quantifying the uncertainty of the SOC estimates.

3.3 Hybrid Method
Authors typically combine more than one type of deep learning method to achieve better SOC estimation results. For instance, Huang et al. [21] proposed a hybrid convolutional gated recurrent unit (CNN-GRU) network for estimating SOC and validated its accuracy by training with DST and testing with FUDS. The CNN-GRU model was trained on data collected from battery-discharging processes and outperformed RNN, GRU, ELM, and SVM methods, with an RMSE, MAE, and testing time of 0.0167%, 0.0133%, and 0.0435 s, respectively.
Zhao et al. [50] proposed an RNNs-CNNs model to enhance the accuracy of SOC estimation by ensuring a reliable vector representation and sufficient parameter extraction. First, a recurrent neural network is employed as the data representation model to generate a well-trained vector representation, which is subsequently fed into a multi-channel extended convolutional neural network. This CNN then processes the optimized vector to fully extract feature information and accurately estimate SOC. Compared to the RNNs and Ah-counting methods, the proposed algorithm improved predictions by 4.3% and 11.3%, respectively.
In [15], a hybrid neural network model was proposed to estimate the SOC of lithium-ion batteries from voltage, current, and temperature measurements. The proposed framework consists of three modules: convolutional and ultra-lightweight subspace attention mechanism (ULSAM) modules that extract feature information from the sequence of sampling points and produce feature maps, and a gated recurrent unit (GRU) module that processes these feature-map sequences and predicts the SOC value. Similarly, Gong et al. [14] developed a DNN model for SOC estimation that included a convolution layer, an ultra-lightweight subspace attention mechanism (ULSAM) layer, a simple recurrent unit (SRU) layer, and a dense layer, with voltage, current, and temperature data as input.
Bian et al. [3] combined multichannel convolutional and bidirectional recurrent neural networks (MCNN-BRNN) for SOC estimation on measurement sequences with multiscale perturbations and their co-movements. To reduce estimation fluctuations and improve estimator robustness, the MCNN extracts multi-scale, locally robust features from different input paths, whereas the BRNN captures the effective time-varying information of intercorrelated features in both the forward and reverse directions to sequentially estimate SOC.
To find the best model hyperparameters, How et al. [20] presented a Bayesian optimization strategy known as the tree Parzen estimator (TPE) to optimize the hyperparameters of various DL models such as BGRU, GRU, LSTM, CNN, FCN, and DNN. The BGRU-TPE model produced the best results, with the lowest RMSE and MAE of 0.8% and 0.56%, respectively.

3.4 Transfer Learning
Bian et al. [4] proposed a deep transfer neural network (DTNN) with multiscale distribution adaptation (MDA) for SOC estimation. A fully connected layer was added after a bidirectional LSTM, with voltage, current, and temperature inputs from different datasets used for pre-training and training. Transfer learning was then used to carry features from the model trained on the pre-training dataset over to the model trained on the target dataset. SOC estimation using the DTNN achieved the lowest RMSE compared to the RNN, LSTM, and GRU single-network methods. Liu et al. [29] proposed a model based on a temporal convolutional network (TCN) to estimate the SOC of a LiFePO4 battery via transfer learning. The estimation had an MAE of 0.67% on average.
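The transfer idea used in these works (reuse what was learned on a source dataset and adapt only part of the model to the target) can be illustrated with a deliberately tiny linear model. This is a pure-stdlib sketch, and the "frozen slope, re-fitted bias" split is an illustrative assumption rather than the layer split used in [4] or [29]:

```python
def fit_linear(xs, ys, lr=0.01, epochs=2000):
    """Fit y = w*x + b to the source data by plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def adapt_bias(w, xs, ys):
    """Transfer step: freeze the learned slope w and re-fit only the
    bias on the (small) target dataset -- closed form for this model."""
    return sum(y - w * x for x, y in zip(xs, ys)) / len(xs)

# Source domain: y = 2x. Target domain: same slope, shifted offset y = 2x + 1.
w, b = fit_linear([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0])
b_new = adapt_bias(w, [0.0, 2.0], [1.0, 5.0])
print(round(w, 2), round(b_new, 2))  # 2.0 1.0
```

Only two target points suffice here because most of the model (the slope) is reused; the same economy of target data is what motivates transfer learning for batteries with scarce labels.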
4 Health Management
The concept of state of health (SOH) was originally introduced to evaluate the health status of a battery in comparison to a new battery. It is defined as the ratio of the existing capacity to the rated capacity, or of the present resistance to that of a fresh battery [32]. Remaining useful life (RUL) prediction, on the other hand, is one of the major strategies for battery health management. It provides early warning of faults, which enables necessary battery maintenance and replacement to be identified in advance. Several studies on battery health management have been conducted, categorized as model-based and data-driven methods. In the following section, we discuss deep learning approaches for estimating capacity, SOH, and RUL.

Table 1. Comparison of DL methods for battery SOC estimation

Method     | Ref  | Inputs              | Performance                             | Field of application
DNN        | [40] | I, V, T             | RMSE ≤ 2.03%                            | EV
LSTM       | [31] | I, V, T             | MAE = 0.91%                             | Not specified
SBLSTM     | [2]  | I, V, T             | MAE ≤ 0.73%                             | Not specified
GRU        | [19] | I, V, T (one cycle) | RMSE(train) = 0.52%, RMSE(test) = 0.65% | EV
GRU-GPR    | [45] | I, V, T             | MAE = 0.79%                             | Not specified
CNN        | [43] | I, V, T             | RMSE = 0.998%, MAE = 1.260%             | EV
CNN-GRU    | [21] | I, V, T             | RMSE = 0.0167%, MAE = 0.0133%           | EV
MCNN-BRNN  | [3]  | I, V, T             | RMSE (avr.) = 1.01%                     | Not specified
DTNN       | [4]  | I, V, T             | RMSE = 1.04%                            | Not specified
GRU-TL     | [42] | I, V, T             | RMSE ≤ 1.867%                           | Not specified
TCN        | [29] | I, V, T             | MAE (avr.) = 0.67%                      | EV

Note: the data sources include CCCV charging experiments, the CALCE dataset provided by the Center for Advanced Life Cycle Engineering [46], the Panasonic dataset provided by the University of Wisconsin-Madison (UW-Madison) [24], the NASA dataset provided by the National Aeronautics and Space Administration [36], and, for the CNN-GRU model [21], data collected with a battery testing system (Arbin BT2000 tester) under different drive cycles. Avr: average value.
4.1 DL for Battery Health Management
DL approaches have been increasingly adopted for battery SOH prediction because of their ability to automatically extract features from raw data. DNNs, CNNs, and RNNs are the most commonly used algorithms in battery health management. Table 2 provides a detailed synopsis of each of the DL algorithms reviewed in this work in terms of battery health management.
Gong et al. [13] proposed a deep learning-based encoder-decoder model for mapping the relationship between the battery charging curve and the corresponding SOH. The encoder, composed of a feature extractor and a simple recurrent unit (SRU), encodes the charging-curve sampling data and generates the encoding sequence. The decoder, based on a back-propagation (BP) neural network, decodes the encoding sequence and estimates the SOH.
In [38], Song et al. used data collected over the course of a year from 700 real-world battery and hybrid electric vehicles (BEVs and HEVs). The health features were first extracted based on the battery aging mechanism and then used as inputs to a feed-forward neural network to estimate pack capacity. The big-data platform's SOH estimation framework is built on the offline-trained model. The test results show that the SOH estimation has a maximum relative error of 4.5%.
Che et al. [6] proposed an SOH prognostic method for battery packs of various constructions and health states using universal health indicator (HI) generation and a deep learning framework. First, general HIs highly correlated with battery pack capacities were extracted. A universal deep-learning neural network was then built to predict SOH under various conditions. The MAE and RMSE are less than 2.0% and 2.3%, respectively, under constant-current conditions, and less than 2.5% and 3.1%, respectively, under dynamic working conditions.
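The SOH definition above, together with a fixed end-of-life threshold, also yields a simple notion of RUL in cycles. A minimal sketch (the 80% end-of-life threshold is a common convention assumed here, not taken from a specific reviewed paper):

```python
def soh(capacity_now, capacity_rated):
    """State of health as a capacity ratio, in percent."""
    return 100.0 * capacity_now / capacity_rated

def cycles_to_eol(capacity_series, capacity_rated, eol_threshold=80.0):
    """Index of the first cycle whose SOH falls below the end-of-life
    threshold, or None if EOL is not reached within the series."""
    for cycle, cap in enumerate(capacity_series):
        if soh(cap, capacity_rated) < eol_threshold:
            return cycle
    return None

rated = 2.0                        # rated capacity in Ah
fade = [2.0, 1.9, 1.8, 1.7, 1.55]  # measured capacity per cycle
print(round(soh(1.8, rated), 1))   # 90.0
print(cycles_to_eol(fade, rated))  # 4
```

The data-driven methods below replace the measured capacity series with a prediction of it, since capacity itself is not directly observable on board.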
Concerning RUL estimation, [49] adopted a hybrid parallel residual convolutional neural network (HPR CNN) model based on a cloud computing system for online RUL prediction, using sparse data uploaded by the BMS and corresponding to only 20% of the charging capacity.
Due to its high efficiency in capturing long-term dependencies, the long short-term memory (LSTM) network is widely adopted for capacity estimation, capacity being a fundamental indicator of the battery's degradation. Li et al. [26] used LSTM recurrent neural networks to develop a data-driven capacity estimation model for electric vehicle cells in real-world operation. The proposed network received voltage-time sensor data from the partial constant-current phase of the charging curve as input and achieved a best-case mean absolute percentage error of 0.76%. In [7], Chen et al. proposed a long short-term memory recurrent neural network (LSTM-RNN) model for predicting battery capacity. The experimental results show that the proposed architecture can track the nonlinear capacity degradation trend with a maximum error of 2.84%. Li et al. [25] also introduced the LSTM algorithm to predict the battery capacity degradation trajectory in one shot using 100 cycles of data.
The model was trained on whole-lifetime data based on sequence-to-sequence learning, taking the capacity time series as input. Processor-in-the-loop tests validated the proposed method's performance, with a 1.8% mean error in the best case and a 7.8% maximum error in the worst case.
Different methods have been used in conjunction with LSTM models to extract relevant health indicators for health management. In [16], a particle-swarm-optimized LSTM (PSO-LSTM) method is proposed for SOH estimation. First, four health factors were selected (constant-current charging time, discharge duration, IC-curve peak value, and battery temperature peak value), and their correlations with battery SOH were determined using grey correlation analysis. A PSO-LSTM estimation model was then developed, using PSO to optimize several important LSTM hyperparameters. The experimental results demonstrated that the proposed network outperformed the single LSTM model and increased prediction accuracy by at least 5%.
Ardeshiri et al. [1] proposed a multivariate stacked bidirectional long short-term memory (SBLSTM)-based framework for predicting the RUL of a lithium-ion battery, using the Pearson correlation and extreme gradient boosting (XGBoost) algorithms for feature extraction and yielding a low root mean square percentage error of 1.94%.
Another DL algorithm applied to battery SOH estimation is the GRU. Ungurean et al. [41] showed that the GRU model produced almost identical estimation errors to the LSTM model while requiring 25% fewer parameters.
Hybrid deep learning methods have also become popular in battery health management to improve estimation results. In [12], a hybrid gated recurrent unit-deep convolutional neural network model is presented that estimates SOH from the CCCV charging curves of lithium-ion batteries, with measured data such as voltage, current, and temperature as inputs. Compared to other learning methods, the proposed method achieved an overall MAE, with the best setting, of 4.11% on the NASA dataset and 1.62% on the Oxford dataset.

Table 2. Comparison of DL methods for battery health management. The methods reviewed are Encoder-Decoder [13], FFNN [38], DNN [6], ADNN [35], ODE [34], RNN [23], RNN [30], DGPR [39], LSTM [17], SBLSTM [1], PSO-LSTM [16], LSTM [25], LSTM [26], HPR CNN [49], GRU-CNN [12], LSTM-RNN [7], and GRU [41], compared by data source, cathode type, feature extraction, number of features, prediction task, performance, real-time applicability, and field of application.

Note: ME: mean error. RE: relative error. stdQ: standard deviation of the Q sequence. stddQ: standard deviation of the differential Q sequence. DIC: discharging incremental capacity. DTV: differential thermal voltammetry. CC: constant conditions. DC: dynamic conditions. The NASA dataset is provided by the National Aeronautics and Space Administration [36], the Oxford dataset by [5], and the battery degradation data by [37].
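The recurrent estimators discussed above all consume fixed-length windows of synchronized measurements. A sketch of the typical window construction from sampled signals (window length and example values are illustrative assumptions):

```python
def make_windows(voltage, current, temperature, window=3):
    """Slice synchronized V/I/T samples into overlapping fixed-length
    windows, as typically fed to LSTM/GRU state estimators."""
    samples = list(zip(voltage, current, temperature))
    return [samples[i:i + window] for i in range(len(samples) - window + 1)]

v = [3.6, 3.7, 3.8, 3.9, 4.0]       # voltage samples
i = [1.0, 1.0, 1.0, 0.5, 0.2]       # current samples
t = [25.0, 25.5, 26.0, 26.2, 26.3]  # temperature samples
windows = make_windows(v, i, t, window=3)
print(len(windows))   # 3
print(windows[0][0])  # (3.6, 1.0, 25.0)
```

Each window is paired with the SOC, SOH, or capacity label at its final sample during training; windowing is also what lets partial (rather than full) charge curves serve as model input.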
5 Charging Control
Fast charging has emerged as one of the most pressing issues confronting the lithium-ion battery community in recent years. To solve the fast-charging problem, the research community has developed several protocols, mostly mathematical optimization algorithms that rely on accurate electrochemical models. The effectiveness of such techniques, however, is severely limited by high computational cost and complexity. As a result, data-driven, and especially deep-learning-based, charging strategies are in high demand.
Wei et al. [44] proposed a DRL-based multi-physics-constrained framework for LIB fast charging with thermal and health constraints. A multi-objective optimization problem was formulated by penalizing LIB over-temperature and degradation. As a solution, the proposed strategy combines a model-based observer for estimating the state of charge (SOC) and internal temperature with a deep deterministic policy gradient (DDPG) algorithm, in order to achieve a trade-off between charging rapidity and physical-constraint compliance. Park et al. [33] proposed a model-free DRL framework for LIB fast charging without violating the safety constraints. An electrochemical model was used as a battery simulator along with a DDPG algorithm in order to deal with continuous state and action spaces. The charging-time optimization problem was addressed in two cases: a state-based learning policy, in which full state measurements are assumed, and an output-based learning policy, in which only output measurements are considered as the agent's observations. More recently, [10] used a deep reinforcement learning-based method to find an optimized multi-stage constant-current charging profile for extremely fast charging of lithium-ion batteries without violating the battery temperature and voltage constraints; simulation results show a charging time of 14 min for the thickest cathode and 6 min for the thinnest cathode.
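The constraint handling that these DRL chargers learn can be sketched as a penalized reward on a toy cell model (an illustrative environment only; the limits, penalty weights, and first-order SOC/thermal updates are assumptions, not the electrochemical simulators used in [44] or [33]):

```python
def charge_step(soc, temp, c_rate):
    """One step of a toy charging environment (illustrative dynamics only).

    SOC rises in proportion to the C-rate; temperature rises with the
    square of the current (Joule heating) and relaxes toward 25 C ambient.
    The reward pays for charge gained and penalizes constraint violations.
    """
    soc_next = min(1.0, soc + 0.01 * c_rate)
    temp_next = temp + 0.05 * c_rate ** 2 - 0.02 * (temp - 25.0)
    reward = soc_next - soc
    if temp_next > 45.0:  # thermal limit violated
        reward -= 1.0
    if soc_next >= 1.0:   # proxy for over-voltage near the top of charge
        reward -= 0.1 * c_rate
    return soc_next, temp_next, reward

# A fixed 3C policy for 10 steps: fast charging while temperature stays bounded
soc, temp = 0.2, 25.0
for _ in range(10):
    soc, temp, reward = charge_step(soc, temp, c_rate=3.0)
print(round(soc, 2), round(temp, 1))  # 0.5 29.1
```

A DRL agent such as DDPG would instead choose the C-rate at each step to maximize the cumulative reward, implicitly trading charging speed against the thermal and voltage penalties.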
6 Discussion and Research Trends
In terms of SOC estimation methods, the proposed algorithms are typically trained and tested on a variety of dynamic profiles, including the federal urban driving schedule (FUDS), the US06 test, and the dynamic stress test (DST). However, these standard profiles differ from actual EV driving profiles because of the variety of areas, drivers, and periods, which may make real-time SOC estimation inefficient.
In terms of health management methods, we noticed that some studies require the battery's historical capacity degradation in order to estimate SOH or RUL. However, in a real-time system, only measurable variables such as voltage, current, and temperature are accessible, not battery capacity. As a result, these methods may be more complicated to apply to online prediction. Also, some existing methods essentially require dense data over the entire discharge and/or charge curve, either as direct input or as the source of extracted health indicators; in practice, however, batteries are not necessarily fully charged or discharged, and the charging process may be random rather than constant, which makes complete charging and discharging data difficult to obtain. Furthermore, previous methods frequently assume that the working conditions of the battery remain constant during training and validation, which is rarely the case. As a result, some existing methods may be difficult to transfer to practical applications.
On top of that, there is a clear lack of experimental validation in the reviewed works. Because almost all methods are validated and tested using simulation, it may be worthwhile to develop test benches to validate the accuracy and efficiency of the proposed methods in real-world applications. Moreover, with the advancement of big data and cloud computing platforms, deep learning has emerged as a strong candidate for SOH estimation: cloud computing servers can handle computationally expensive and time-consuming problems and make large amounts of data available.
Finally, in terms of charging control strategies, there are not enough studies that use DL methods to find the optimal charging strategy, which could be a very promising field of research for future work.
7 Conclusion
Lithium-ion batteries are widely used in a variety of scenarios, such as electronic products, electric vehicles (EVs), and communication base stations, because of their high energy density and long life. A BMS is required for LIBs in order to evaluate their status and performance, as well as to maintain the batteries regularly and ensure long-term stability. With the development of big data and cloud computing, machine learning (ML) technology has great potential in battery management and has already shown impressive results.
This paper examines the use of deep learning (DL) in the management and development of LIBs, with a focus on methods for estimating SOC, battery health management, and charging control. The review covers the most popular deep learning models and algorithms (DNN, CNN, LSTM, GRU, TL, and so on) for battery management and provides a thorough account of each, including the dataset used, battery types, estimation errors, online or offline applicability, and field of application.
References 1. Ardeshiri, R.R., Liu, M., Ma, C.: Multivariate stacked bidirectional long short term memory for lithium-ion battery health management. Reliab. Eng. Syst. Saf. 224, 108481 (2022) 2. Bian, C., He, H., Yang, S.: Stacked bidirectional long short-term memory networks for state-of-charge estimation of lithium-ion batteries. Energy 191, 116538 (2020) 3. Bian, C., Yang, S., Liu, J., Zio, E.: Robust state-of-charge estimation of li-ion batteries based on multichannel convolutional and bidirectional recurrent neural networks. Appl. Soft Comput. 116, 108401 (2022) 4. Bian, C., Yang, S., Miao, Q.: Cross-domain state-of-charge estimation of li-ion batteries based on deep transfer neural network with multiscale distribution adaptation. IEEE Trans. Transp. Electrif. 7(3), 1260–1270 (2020) 5. Birkl, C.: Oxford battery degradation dataset 1 (2017) 6. Che, Y., et al.: State of health prognostics for series battery packs: a universal deep learning method. Energy 238, 121857 (2022) 7. Chen, Z., et al.: Capacity prediction and validation of lithium-ion batteries based on long short-term memory recurrent neural network. IEEE Access 8, 172783– 172798 (2020) 8. Dourhmi, M., Benlamine, K., Abouelaziz, I., Zghal, M., Masrour, T., Jouane, Y.: Improved hourly prediction of BIPV photovoltaic power building using artificial learning machine: a case study. In: Ben Ahmed, M., Abdelhakim, B.A., Ane, B.K., Rosiyadi, D. (eds.) Emerging Trends in Intelligent Systems & Network Security. NISS 2022. Lecture Notes on Data Engineering and Communications Technologies, vol. 147, pp. 270–280. Springer, Cham (2023). https://doi.org/10.1007/978-3-03115191-0 26 9. El Fallah, S., Kharbach, J., Hammouch, Z., Rezzouk, A., Jamil, M.O.: State of charge estimation of an electric vehicle’s battery using deep neural networks: Simulation and experimental results. J. Energy Stor. 62, 106904 (2023) 10. 
Elouazzani, H., Elhassani, I., Barka, N., Masrour, T.: Smart adaptive multi stage constant current fast charging for lithium ions batteries based on deep reinforcement learning. SSRN 4218801 11. Elouazzani, H., Elhassani, I., Ouazzani-Jamil, M., Masrour, T.: State of charge estimation of lithium-ion batteries using artificial intelligence based on entropy and enthalpy variation. In: Ben Ahmed, M., Boudhir, A.A., Santos, D., Dionisio, R., Benaya, N. (eds.) Innovations in Smart Cities Applications Volume 6. SCA 2022. LNNS, vol. 629, pp. 747–756. Springer, Cham (2023). https://doi.org/10. 1007/978-3-031-26852-6 69 12. Fan, Y., Xiao, F., Li, C., Yang, G., Tang, X.: A novel deep learning framework for state of health estimation of lithium-ion battery. J. Energy Stor. 32, 101741 (2020) 13. Gong, Q., Wang, P., Cheng, Z.: An encoder-decoder model based on deep learning for state of health estimation of lithium-ion battery. J. Energy Stor. 46, 103804 (2022) 14. Gong, Q., Wang, P., Cheng, Z.: A novel deep neural network model for estimating the state of charge of lithium-ion battery. J. Energy Stor. 54, 105308 (2022) 15. Gong, Q., Wang, P., Cheng, Z., et al.: A method for estimating state of charge of lithium-ion batteries based on deep learning. J. Electrochem. Soc. 168(11), 110532 (2021) 16. Gong, Y., et al.: State-of-health estimation of lithium-ion batteries based on improved long short-term memory algorithm. J. Energy Stor. 53, 105046 (2022)
ALIB Management with DL: A Comprehensive Review
17. Han, T., Wang, Z., Meng, H.: End-to-end capacity estimation of lithium-ion batteries with an enhanced long short-term memory network considering domain adaptation. J. Power Sour. 520, 230823 (2022)
18. Hannan, M.A., et al.: SOC estimation of li-ion batteries with learning rate-optimized deep fully convolutional network. IEEE Trans. Power Electron. 36(7), 7349–7353 (2020)
19. Hannan, M.A., How, D.N., Mansor, M.B., Lipu, M.S.H., Ker, P.J., Muttaqi, K.M.: State-of-charge estimation of li-ion battery using gated recurrent unit with one-cycle learning rate policy. IEEE Trans. Indus. App. 57(3), 2964–2971 (2021)
20. How, D.N., Hannan, M., Lipu, M.S.H., Ker, P.J., Mansor, M., Sahari, K.S., Muttaqi, K.M.: SOC estimation using deep bidirectional gated recurrent units with tree Parzen estimator hyperparameter optimization. IEEE Trans. Indus. App. 58(5), 6629–6638 (2022)
21. Huang, Z., Yang, F., Xu, F., Song, X., Tsui, K.L.: Convolutional gated recurrent unit-recurrent neural network for state-of-charge estimation of lithium-ion batteries. IEEE Access 7, 93139–93149 (2019)
22. Jiang, J., Zhang, C.: Fundamentals and Applications of Lithium-ion Batteries in Electric Drive Vehicles. John Wiley & Sons, New York (2015)
23. Kim, S.W., Oh, K.Y., Lee, S.: Novel informed deep learning-based prognostics framework for on-board health monitoring of lithium-ion batteries. Appl. Energy 315, 119011 (2022)
24. Kollmeyer, P.: Panasonic 18650PF li-ion battery data. Mendeley Data 1 (2018)
25. Li, W., Sengupta, N., Dechent, P., Howey, D., Annaswamy, A., Sauer, D.U.: One-shot battery degradation trajectory prediction with deep learning. J. Power Sour. 506, 230024 (2021)
26. Li, W., Sengupta, N., Dechent, P., Howey, D., Annaswamy, A., Sauer, D.U.: Online capacity estimation of lithium-ion batteries with deep long short-term memory networks. J. Power Sour. 482, 228863 (2021)
27. Lipu, M.H., et al.: Data-driven state of charge estimation of lithium-ion batteries: algorithms, implementation factors, limitations and future trends. J. Clean. Prod. 277, 124110 (2020)
28. Liu, K., Shang, Y., Ouyang, Q., Widanage, W.D.: A data-driven approach with uncertainty quantification for predicting future capacities and remaining useful life of lithium-ion battery. IEEE Trans. Indus. Electron. 68(4), 3170–3180 (2020)
29. Liu, Y., Li, J., Zhang, G., Hua, B., Xiong, N.: State of charge estimation of lithium-ion batteries based on temporal convolutional network and transfer learning. IEEE Access 9, 34177–34187 (2021)
30. Ma, B., et al.: Remaining useful life and state of health prediction for lithium batteries based on differential thermal voltammetry and a deep-learning model. J. Power Sour. 548, 232030 (2022)
31. Ma, L., Hu, C., Cheng, F.: State of charge and state of energy estimation for lithium-ion batteries based on a long short-term memory neural network. J. Energy Stor. 37, 102440 (2021)
32. Ng, M.F., Zhao, J., Yan, Q., Conduit, G.J., Seh, Z.W.: Predicting the state of charge and health of batteries using data-driven machine learning. Nat. Mach. Intell. 2(3), 161–170 (2020)
33. Park, S., et al.: A deep reinforcement learning framework for fast charging of li-ion batteries. IEEE Trans. Transp. Electrif. 8(2), 2770–2784 (2022)
34. Pepe, S., Liu, J., Quattrocchi, E., Ciucci, F.: Neural ordinary differential equations and recurrent neural networks for predicting the state of health of batteries. J. Energy Stor. 50, 104209 (2022)
35. Ren, L., Zhao, L., Hong, S., Zhao, S., Wang, H., Zhang, L.: Remaining useful life prediction for lithium-ion battery: a deep learning approach. IEEE Access 6, 50587–50598 (2018)
36. Saha, B., Goebel, K.: Battery data set (NASA Ames prognostics data repository, 2007) (2020)
37. Severson, K.A., et al.: Data-driven prediction of battery cycle life before capacity degradation. Nat. Energy 4(5), 383–391 (2019)
38. Song, L., Zhang, K., Liang, T., Han, X., Zhang, Y.: Intelligent state of health estimation for lithium-ion battery pack based on big data analysis. J. Energy Stor. 32, 101836 (2020)
39. Tagade, P., et al.: Deep Gaussian process regression for lithium-ion battery health prognosis and degradation mode diagnosis. J. Power Sour. 445, 227281 (2020)
40. Tian, J., Xiong, R., Shen, W., Lu, J.: State-of-charge estimation of LiFePO4 batteries in electric vehicles: a deep-learning enabled approach. Appl. Energy 291, 116812 (2021)
41. Ungurean, L., Micea, M.V., Carstoiu, G.: Online state of health prediction method for lithium-ion batteries, based on gated recurrent unit neural networks. Int. J. Energy Res. 44(8), 6767–6777 (2020)
42. Wang, Y.X., Chen, Z., Zhang, W.: Lithium-ion battery state-of-charge estimation for small target sample sets using the improved GRU-based transfer learning. Energy 244, 123178 (2022)
43. Wang, Y.C., Shao, N.C., Chen, G.W., Hsu, W.S., Wu, S.C.: State-of-charge estimation for lithium-ion batteries using residual convolutional neural networks. Sensors 22(16), 6303 (2022)
44. Wei, Z., Quan, Z., Wu, J., Li, Y., Pou, J., Zhong, H.: Deep deterministic policy gradient-DRL enabled multiphysics-constrained fast charging of lithium-ion battery. IEEE Trans. Indus. Electron. 69(3), 2588–2598 (2021)
45. Xiao, F., Li, C., Fan, Y., Yang, G., Tang, X.: State of charge estimation for lithium-ion battery based on Gaussian process regression with deep recurrent kernel. Int. J. Elect. Power Energy Syst. 124, 106369 (2021)
46. Xing, Y., Ma, E.W., Tsui, K.L., Pecht, M.: An ensemble model for predicting the remaining useful performance of lithium-ion batteries. Microelectron. Reliab. 53(6), 811–820 (2013)
47. Xiong, R., Cao, J., Yu, Q., He, H., Sun, F.: Critical review on the battery state of charge estimation methods for electric vehicles. IEEE Access 6, 1832–1843 (2017)
48. Yang, F., Song, X., Xu, F., Tsui, K.L.: State-of-charge estimation of lithium-ion batteries via long short-term memory network. IEEE Access 7, 53792–53799 (2019)
49. Zhang, Q., et al.: A deep learning method for lithium-ion battery remaining useful life prediction based on sparse segment data via cloud computing system. Energy 241, 122716 (2022)
50. Zhao, F., Li, Y., Wang, X., Bai, L., Liu, T.: Lithium-ion batteries state of charge prediction of electric vehicles using RNNs-CNNs neural networks. IEEE Access 8, 98168–98180 (2020)
State of Charge Estimation of Lithium-Ion Batteries Using Extended Kalman Filter and Multi-layer Perceptron Neural Network Oumayma Lehmam1 , Saad El Fallah1 , Jaouad Kharbach1(B) , Abdellah Rezzouk1 , and Mohammed Ouazzani Jamil2 1 Université Sidi Mohamed Ben Abdellah, Faculté des Sciences Dhar El Mahraz, Laboratoire de
Physique de solide, B.P. 1796, 30003 Fès-Atlas, Morocco [email protected] 2 Université Privée de Fès, Laboratoire Systèmes et Environnements Durables, Lot., Quaraouiyine Route Ain Chkef, 30040 Fès, Morocco
Abstract. Lithium-ion batteries used by electric vehicles (EVs) require close monitoring to ensure performance and user safety. Determining the State of Charge (SOC) and the State of Health (SOH) of cells is one of the major challenges of the battery management system (BMS). However, these states are not directly measurable by sensors. Thus, they can only be monitored and inferred from measured parameters such as internal resistance, voltage, current, and temperature. Due to the non-linear dynamic electrochemical characteristics inside the battery, accurate estimation of the battery SOC/SOH remains a difficult task. This paper presents a comparative study between the extended Kalman filter (EKF) algorithm and the multilayer perceptron (MLP), using different methods to estimate the SOC of a Li-ion battery of 2.6 Ah capacity. First, the battery model was established; then the EKF and MLP were applied to the model under the same current profiles. Finally, a comparative and critical study between the two methods is performed. Experimental results with different current profiles indicate that the EKF is effectively reliable in estimating the SOC, but not as reliable as the MLP, which achieves a maximum error of less than 2%. Keywords: Lithium-ion battery · Battery Management System · State of Charge · Extended Kalman filter · Battery modeling · Electrical vehicle · Machine learning
1 Introduction The transportation sector, which is responsible for nearly a quarter of global CO2 emissions, sees electric vehicles as a solution to curb global warming and reduce urban pollution [1, 2]. Electric vehicles (EVs) offer a way to improve air quality and reduce fuel costs, global warming, and climate change issues [2]. However, the transition to electric transport comes with a new challenge: vehicles equipped with lithium-ion batteries can be particularly dangerous when they catch fire. These accumulators require close © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 59–72, 2023. https://doi.org/10.1007/978-3-031-43520-1_6
monitoring to ensure their performance and the safety of users [3]. Nowadays, this function is performed by the BMS. The determination of the state of charge (SOC) and the state of health (SOH) of cells is one of the main challenges of the BMS. However, these states are not directly measurable by sensors. Thus, they can only be monitored and inferred from measured parameters such as voltage, current, temperature, and internal resistance [2]. Due to the non-linear dynamic electrochemical characteristics inside the battery, accurate estimation of the battery SOC/SOH remains a difficult task [4]. The SOC represents the ratio between the remaining capacity of the battery and the maximum available capacity, as shown by Eq. (1):

SOC(t) = SOC(0) - (1/C_n) ∫_0^t I(τ) dτ    (1)

where C_n is the nominal capacity of the battery and I(τ) is the instantaneous current, positive when discharging and negative when charging [1]. Several methods have been suggested to estimate the battery SOC, each with a different performance compared to the others. Among them is Coulomb counting (i.e., integration of the current), which is based on continuous measurement of the current and integration of the measured current over time. It is known for its simplicity, as it does not require a battery model [5]. However, it leads to high errors caused by the accumulation of integration errors in the current measurement and the difficulty of accurately determining the initial value of the SOC. Open circuit voltage (OCV) is also a conventional method. It consists of a voltage measurement when the cells are at rest and stabilized; this voltage is compared with a table directly giving the SOC as a function of the OCV [6]. This method has high accuracy, but it cannot be applied during operation and requires a long time to reach a stable state [7].
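As an illustration of Eq. (1), Coulomb counting can be sketched in a few lines of Python. This is a generic sketch (the function and variable names are my own, not from the paper); the main weakness of the method, the unknown initial SOC, enters as the `soc0` argument.

```python
import numpy as np

def coulomb_count_soc(current, dt, capacity_ah, soc0=1.0):
    """Coulomb-counting SOC estimate, a discrete form of Eq. (1).

    current: array of instantaneous currents in A (positive = discharge)
    dt: sampling period in seconds
    capacity_ah: nominal capacity C_n in Ah
    soc0: initial SOC, whose uncertainty is the method's main weakness
    """
    capacity_as = capacity_ah * 3600.0        # C_n converted to ampere-seconds
    charge_drawn = np.cumsum(current) * dt    # running integral of I(tau) dtau
    return soc0 - charge_drawn / capacity_as

# A 2.6 Ah cell discharged at a constant 2.6 A (1C) for one hour ends near SOC = 0
i = np.full(3600, 2.6)                        # one sample per second
soc = coulomb_count_soc(i, dt=1.0, capacity_ah=2.6)
```

Any bias in the current sensor or in `soc0` accumulates directly in the estimate, which is the drawback the paper points out.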
The second category, in contrast to the first, uses an electrical model to estimate the SOC and relies on a closed-loop online observer such as the Kalman filter. Using sampled data (temperature, voltage, current), it recursively computes the minimum mean square error of the true SOC. This solves the issues of the unknown initial SOC and the accumulated error. Nevertheless, the classic KF is only appropriate for linear systems. The EKF is therefore a nonlinear extension of the classic KF, designed for systems with nonlinear dynamic models by means of Taylor series expansions [8]. This approach is the most common and widely used in real-time battery management systems [5]. Other approaches for SOC estimation based on machine learning algorithms, such as the multilayer perceptron (MLP), are also suggested. It was found that a neural network (NN) is a sufficient tool for predicting any nonlinear system [9]. One benefit of the MLP is that predicting the SOC of a lithium-ion battery does not require any equivalent circuit model: the network is trained using the voltage, current, and temperature of the cells as input and gives the SOC as output. This paper presents a comparative study between the extended Kalman filter (EKF) algorithm and the Multi-layer Perceptron (MLP) using the same data, extracted from the battery simulation with current profiles. The remainder of this paper is organized as follows. Section 2 describes the electrical model used for the battery. Section 3 presents an overview of the EKF algorithm and the MLP model used to estimate the SOC. In Sect. 4, the simulation and results are described, and finally, the main conclusions are provided in Sect. 5.
2 Battery Modeling The most commonly used model types are equivalent circuit models (ECMs) and electrochemical models [4, 7]. Electrochemical models are based on equations for the electrodynamics and chemical reactions of battery cells; they are more complex than ECMs and can be difficult to use due to their intensive computational requirements. An ECM describes electrochemical phenomena using passive electrical components such as voltage sources, capacitors, and resistors. There are many ways to represent cell behavior using electrical components, for example the Rint model or Thevenin equivalent models. The Rint model consists of a voltage source (a function of the SOC) and a resistor. Thevenin equivalent models use the Rint model as a basis, with the addition of RC filters [11].
Fig. 1. Thevenin equivalent models
The Rint model consists of a voltage source OCV, which is a function of the SOC, combined with a resistor R0; it is regarded as the simplest ECM, as shown in Fig. 1(a). Thevenin equivalent models, Fig. 1(b) and Fig. 1(c), use the Rint model as a basis with the addition of RC filters [12]. It has been found that three RC filters improve the model and represent the cell dynamics with more accuracy [13], although a first-order model has lower computing complexity.
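A discrete-time simulation of the first-order Thevenin model (Rint model plus one RC branch) might be sketched as follows. The linear OCV curve is a placeholder assumption (real cells use a measured OCV-SOC table), the Coulombic efficiency is taken as 1, and the parameter values follow Table 1 later in the paper; this is an illustrative sketch, not the authors' Simulink model.

```python
import numpy as np

def simulate_thevenin(i_load, dt, r0, rp, cp, ocv_fn, cn_ah, soc0=1.0):
    """First-order Thevenin ECM: OCV source + series R0 + one parallel RC branch.

    i_load: currents in A (positive = discharge). Returns (soc, v_terminal).
    ocv_fn maps SOC -> open-circuit voltage (assumed known, e.g. a lookup table).
    """
    tau = rp * cp                      # RC time constant
    soc, vc = soc0, 0.0                # state: SOC and RC-branch voltage
    socs, vts = [], []
    for i in i_load:
        soc = soc - i * dt / (cn_ah * 3600.0)                  # Coulomb counting
        vc = vc * np.exp(-dt / tau) + rp * (1 - np.exp(-dt / tau)) * i
        vts.append(ocv_fn(soc) - vc - r0 * i)                  # terminal voltage
        socs.append(soc)
    return np.array(socs), np.array(vts)

# Illustrative linear OCV curve; parameters as in Table 1 of this paper
ocv = lambda s: 3.4 + 0.8 * s
soc, vt = simulate_thevenin(np.full(100, 2.6), dt=2.0, r0=0.24,
                            rp=0.1, cp=855.0, ocv_fn=ocv, cn_ah=2.6)
```

During discharge the terminal voltage sits below the OCV by the R0 drop plus the RC-branch voltage, which is the polarization effect the RC filter is there to capture.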
3 SOC Estimation 3.1 Extended Kalman Filter (EKF) The Kalman filter is a well-established algorithm used as an optimal estimator of the internal states of linear dynamic systems [14]. For nonlinear systems, there is an extended version called the extended Kalman filter. It consists of essentially the same recursive equations as the KF, comprising prediction and correction steps. The prediction step uses previous measurements to estimate the internal state of the system, and the correction step uses current measurements to update the estimated state [9, 10]. Figure 2 shows a synoptic diagram of the EKF process.
Fig. 2. Block Diagram of Extended Kalman Filter Process
To demonstrate how the EKF works, we first define the model of the nonlinear system [17]:

x_{k|k-1} = f(x_{k-1|k-1}, u_{k-1}) + w_k
z_{k|k-1} = h(x_{k-1|k-1}) + v_k    (2)
where w_k is the process noise, v_k is the measurement noise, x_k is the state vector of the system at discrete time k, and z_k is the output of the system. At each time step, the two functions f and h are linearized by a first-order Taylor series expansion. The steps and initialization values of the EKF are presented as follows.
– Linearization:

F_k = ∂f/∂x |_{x_{k-1|k-1}, u_{k-1}},   H_k = ∂h/∂x |_{x_{k-1|k-1}}    (3)
– Definition of the process noise and measurement noise covariance matrices [18]:

Q_k = E(w_k w_k^T),   R_k = E(v_k v_k^T)    (4)
– Initialization of the state estimate and error covariance:

x̂_0 = E[x_0],   P_0 = E[(x_0 - x̂_0)(x_0 - x̂_0)^T]    (5)
Adaptation of the filter dynamics begins with proper initialization of the measurement covariance matrix (R) in addition to the error covariance matrix P [1]. – Estimate the predicted state:
x̂_{k|k-1} = f(x̂_{k-1|k-1}, u_{k-1})
(6)
– Prediction of covariance matrix:
P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_k
(7)
– Update of the Kalman gain [2]:

K = P_{k|k-1} H_k^T (H_k P_{k|k-1} H_k^T + R_k)^{-1}    (8)
Kalman filters are characterized by a gain K that modifies the estimated state and allows tuning the dynamics and performance of the filter. This gain changes from iteration to iteration [13]. – Update of the state estimate:
x̂_{k|k} = x̂_{k|k-1} + K(y_k - ŷ_{k|k-1})
(9)
– Update of the estimate covariance:
P_{k|k} = (I - K H_k) P_{k|k-1}
(10)
The Kalman filter works similarly to other observers: using measurements of the system's inputs and outputs, it forms an estimate of the system state x̂ by minimizing the difference between the measured and predicted outputs [4].
Fig. 3. Equivalent circuit model (ECM)
The state of the ECM in Fig. 3 is set to be the SOC. The voltage of the RC circuit is V_c, and the current I_t is the input of the model; it is negative during charging and positive during discharging. The output is the terminal voltage V_t. The dynamic ECM can thus be represented in state space as [17]:

[SOC_{k+1}; V_{c,k+1}] = [[1, 0], [0, e^{-T/τ}]] [SOC_k; V_{c,k}] + [-ηT/C_n; R_c(1 - e^{-T/τ})] I_{t,k} + ω_k
V_{t,k} = V_oc(SOC_k) - V_{c,k} - R_0 I_{t,k} + ν_k    (11)

where η is the cell Coulombic efficiency, C_n is the cell capacity, T is the sampling period, and τ is the time constant of the R_c C branch. Both ω_k and ν_k are white Gaussian random processes and are assumed to be mutually uncorrelated. First, we define x_k = [SOC_k  V_{c,k}]^T as the state vector of the system. Then we linearize the nonlinear system equations around the current operating point through the Taylor series expansion, ignoring terms of degree two or higher [8]:

F_k = ∂f/∂x |_{x_{k-1|k-1}, u_{k-1}} = [[1, 0], [0, e^{-T/τ}]],   H_k = ∂h/∂x |_{x_{k-1|k-1}} = [∂V_oc(SOC_k)/∂SOC_k |_{SOC=SOC_{k|k-1}},  -1]    (12)
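The EKF recursion above, applied to the first-order ECM, can be sketched in Python as follows. The linear OCV curve and the noise covariances P_0, Q, R are illustrative demo values I chose so the short run converges (the paper tunes its own values, given later in Table 2); the Table 1 circuit parameters are used, with Coulombic efficiency taken as 1. This is a sketch of the technique, not the authors' MATLAB/Simulink implementation.

```python
import numpy as np

# EKF for SOC estimation on the first-order ECM of Eq. (11).
R0, RP, CP, DT, CN = 0.24, 0.1, 855.0, 2.0, 2.6 * 3600.0   # Table 1 values
TAU = RP * CP
ocv = lambda s: 3.4 + 0.8 * s          # assumed linear OCV(SOC) curve
docv_dsoc = lambda s: 0.8              # its derivative, used in H_k (Eq. (12))

F = np.array([[1.0, 0.0], [0.0, np.exp(-DT / TAU)]])       # state Jacobian F_k
B = np.array([-DT / CN, RP * (1.0 - np.exp(-DT / TAU))])   # input vector
Q = np.diag([1e-10, 1e-8])             # illustrative process noise covariance
R = 1e-4                               # illustrative measurement noise variance

def ekf_step(x, P, i_k, vt_meas):
    # Prediction, Eqs. (6)-(7)
    x = F @ x + B * i_k
    P = F @ P @ F.T + Q
    # Measurement Jacobian, Eq. (12): h(x) = OCV(SOC) - Vc - R0*I
    H = np.array([[docv_dsoc(x[0]), -1.0]])
    # Gain and update, Eqs. (8)-(10); scalar measurement, so inverse = division
    K = P @ H.T / (H @ P @ H.T + R)
    vt_pred = ocv(x[0]) - x[1] - R0 * i_k
    x = x + (K * (vt_meas - vt_pred)).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

# "True" battery simulated with the same model; the filter starts from a
# deliberately wrong initial SOC and recovers from noisy voltage measurements.
rng = np.random.default_rng(0)
x_true = np.array([1.0, 0.0])
x_est, P = np.array([0.8, 0.0]), np.diag([0.1, 1e-4])
for _ in range(500):
    i_k = 2.6                                        # constant 1C discharge
    x_true = F @ x_true + B * i_k
    vt = ocv(x_true[0]) - x_true[1] - R0 * i_k + rng.normal(0.0, 0.01)
    x_est, P = ekf_step(x_est, P, i_k, vt)
```

Because the measurement is a single voltage, the matrix inverse in Eq. (8) reduces to a scalar division, which keeps the per-step cost low enough for a real-time BMS.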
3.2 Multi-layer Perceptron The multilayer perceptron is a network of connected artificial neurons, inspired by biological neural networks, that is used for predicting the behavior of a dynamic system and is applicable to a wide variety of fields [8, 10]. An MLP network has three types of layers: an input layer that takes the raw input from the domain, hidden layers that take input from the previous layer and pass their output to the next layer, and an output layer that makes the prediction. Each layer, except the input layer, processes information coming from the nodes of the previous layer
and sends the result to the next layer [3]. Each node, or artificial neuron, is connected to other nodes and has associated weights and a threshold. Nodes communicate by sending signals to each other over weighted connections. When the output of an individual node exceeds a specified threshold (bias), the node is activated and sends data to the next layer of the network; otherwise, the data is not forwarded [11].
Fig. 4. Artificial neuron
To produce an output, each neuron (Fig. 4) sums its weighted inputs and applies an activation function:

z_i^{[k]} = Σ_j ω_{ij}^{[k]} a_j^{[k-1]} + b_i^{[k]}    (13)

a_i^{[k]} = σ(z_i^{[k]}) = 1 / (1 + e^{-z_i^{[k]}})    (14)

where z_i^{[k]} is the input of neuron i in layer k, a_i^{[k]} is the activation applied to z_i^{[k]}, defined here by the sigmoid function σ, a_j^{[k-1]} is the activation of neuron j of the previous layer k-1, ω_{ij}^{[k]} is the weight associated with input j of neuron i of layer k, and b_i^{[k]} is the bias of neuron i in layer k. The weights ω_{ij}^{[k]} of the MLP are computed during the training phase by minimizing the loss function, which is typically a quadratic function of the output error [15]. In this work, the structure of the MLP used (Fig. 5) consists of an input layer, three hidden layers with 150 nodes in each hidden layer, and an output layer. The input layer consists of three nodes: current, voltage, and temperature. The output layer contains only one node, representing the SOC [20]. The sigmoid function is used as the activation function of the nodes [21].
Fig. 5. Multi-Layer-Perceptron (MLP)
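A minimal NumPy sketch of the forward pass of Eqs. (13)-(14). The layer sizes here are scaled down from the paper's 3 x 150 x 150 x 150 x 1 topology and the weights are random, purely to illustrate the computation; a trained network would use learned weights.

```python
import numpy as np

def sigmoid(z):
    # Eq. (14): logistic activation
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, weights, biases):
    """Eqs. (13)-(14): for each layer k, a[k] = sigma(W[k] @ a[k-1] + b[k])."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)        # weighted sum of inputs, then sigmoid
    return a

# 3 inputs (current, voltage, temperature) -> two small hidden layers -> SOC
rng = np.random.default_rng(1)
Ws = [rng.standard_normal((4, 3)), rng.standard_normal((4, 4)),
      rng.standard_normal((1, 4))]
bs = [np.zeros(4), np.zeros(4), np.zeros(1)]
soc_pred = mlp_forward(np.array([2.6, 3.9, 25.0]), Ws, bs)
```

Because the output node also uses the sigmoid, the predicted value is naturally confined to (0, 1), which matches the physical range of the SOC.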
4 Results and Discussion This section presents a comparative study of the two approaches: EKF and MLP. First, to examine the accuracy of the EKF in following the actual SOC, a first-order Thevenin model and an EKF algorithm were developed in MATLAB/Simulink R2016a; the schematic is shown in Fig. 7. It is composed of two main subsystems. The first is a functional block that simulates the Li-ion battery with a capacity of 3.8 V/2.6 Ah; it is represented as a non-linear state space and uses the parameters given in Table 1. These parameters were determined by [1] for the same battery, but with current profile 2 (Fig. 11). The second subsystem is a functional block that computes the EKF algorithm. Table 2 contains the constants and the covariance matrices; they were chosen after several tests to obtain the most suitable values. Process noise and measurement noise were added to the current and voltage, respectively. Figures 8 and 11 show the two current profiles 1 and 2 used in this simulation. With profile 1, we obtain the result illustrated in Fig. 9, which shows how the estimated SOC tracks the actual SOC with an error not exceeding 3% (Fig. 10). Using the second profile, also used in [1], we obtain approximately the same results (Fig. 12), with an error not exceeding 4.5% (Fig. 13). The extended Kalman filter estimation results presented in this paper were obtained with the same model used in [1], where a constant charge/discharge current profile was used and the state of charge was estimated with a single model. In our case, we tested a variable charge/discharge current profile as well as the constant charge/discharge current to evaluate the performance of the EKF, and compared it to the results obtained with the MLP model. The MLP, on the other hand, presented different performance. To make an objective comparison, both the EKF and MLP approaches have to be subjected to the same current profile and the same conditions.
Therefore, the MATLAB/Simulink software is first used to simulate only the battery, without the EKF, in order to collect the data (current, voltage, temperature, and SOC). Figure 6 illustrates the schematic of the simulation. There are 350,000 entries assembled over the duration of the simulation. We reserved 10% of those entries for comparing the EKF and the MLP; the others are used by the scikit-learn library,
an open-source library for machine learning developed in Python. This library in turn randomly splits these data into two partitions for training and testing the MLP. Finally, the partition reserved above for comparison was used with both the EKF and the MLP to determine which one has the best accuracy. The results shown in Fig. 14 and Fig. 15 were obtained by the MLP model after several tests. It is important to note that the MLP gives a better estimate of the SOC than the EKF, with a maximum error of less than 2%.
Fig. 6. Simulation schematic
Fig. 7. Diagram of the battery model and EKF in Matlab/Simulink
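The workflow just described (simulate the battery, hold out 10% of the entries for the final EKF-vs-MLP comparison, let scikit-learn split the remainder for training and testing) might be sketched as follows. The synthetic arrays stand in for the Simulink export, and the column contents are assumptions; the architecture follows the 3 x 150 x 150 x 150 x 1 structure described in Sect. 3.2, but the iteration count is kept short for the sketch.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Stand-in for the data exported from the Simulink battery simulation:
# columns are current (A), voltage (V), temperature (deg C); target is SOC.
rng = np.random.default_rng(0)
X = rng.uniform([-3.0, 3.0, 20.0], [3.0, 4.2, 40.0], size=(2000, 3))
y = rng.uniform(0.0, 1.0, size=2000)   # placeholder SOC labels

# Hold out 10% of the entries for the final EKF-vs-MLP comparison
X_rest, X_holdout, y_rest, y_holdout = train_test_split(
    X, y, test_size=0.1, random_state=42)
# scikit-learn then randomly splits the remainder into train/test partitions
X_train, X_test, y_train, y_test = train_test_split(
    X_rest, y_rest, random_state=42)

mlp = MLPRegressor(hidden_layer_sizes=(150, 150, 150),
                   activation='logistic',   # sigmoid, as in Eq. (14)
                   max_iter=30)             # kept short for this sketch
mlp.fit(X_train, y_train)
max_err = np.max(np.abs(mlp.predict(X_holdout) - y_holdout))
```

On real simulation data, `max_err` over the held-out partition is the "maximum error" figure reported in the comparison; with the random placeholder labels above it is meaningless and serves only to exercise the pipeline.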
Table 1. Parameters in the battery model [1]

Parameter   Value
R0          0.24
Rp          0.1
Cp          855 F
T           2 s
Cn          2.6 Ah
SOCinit     1
Table 2. Parameters used for the simulation [1]

Parameter   Value
P           diag(3e-11, 3e-11)
Q           diag(1e-11, 1e-11)
R           1.52
Fig. 8. Current profile 1
In general, the MLP is more powerful than the EKF at modeling nonlinear relationships, because it can represent any function to arbitrary accuracy, whereas the EKF is limited by its linearization process. Additionally, the MLP can be trained on a large dataset and generalize well, whereas the EKF relies on precise knowledge of the system's dynamics and noise characteristics. Therefore, the MLP generally outperforms the EKF in estimating the SOC of a lithium-ion battery, especially when the relationship between inputs and outputs is nonlinear.
Fig. 9. EKF estimation of the SOC with current profile 1.
Fig. 10. Estimation error of SOC using EKF with current profile 1.
Fig. 11. Current profile 2
Fig. 12. Estimation of the SOC with current profile 2
Fig. 13. SOC estimation error with current profile 2
Fig. 14. MLP estimation of the SOC with current profile 1.
Fig. 15. Estimation error of SOC using MLP with current profile 1.
5 Conclusion The state of charge is an essential parameter for electric vehicles that determines the remaining charge of the lithium-ion battery. The SOC also provides information for charging and discharging strategies that protect the battery from overcharging or over-discharging [7]. This paper presented a comparative study between the extended Kalman filter (EKF) algorithm and the multilayer perceptron (MLP) for estimating the state of charge (SOC) of a Li-ion battery of 3.8 V/2.6 Ah capacity. First, a first-order Thevenin model was chosen to simulate the battery dynamics in MATLAB/Simulink. Then the EKF algorithm and the MLP were applied to the model under the same profiles. The results of the implementation and simulation show that the MLP is effectively reliable in estimating the SOC, with a maximum error of less than 2%; it presents the best performance and accuracy.
References

1. Mazzi, Y., Ben Sassi, H., Errahimi, F., Es-Sbai, N.: State of charge estimation using extended Kalman filter. In: 2019 International Conference on Wireless Technologies, Embedded and Intelligent Systems (WITS), Fez, Morocco, pp. 1–6 (2019). https://doi.org/10.1109/WITS.2019.8723707
2. Wang, Z., Feng, G., Zhen, D., Gu, F., Ball, A.: A review on online state of charge and state of health estimation for lithium-ion batteries in electric vehicles. Energy Rep. 7, 5141–5161 (2021). https://doi.org/10.1016/j.egyr.2021.08.113
3. Zhang, S., Zhai, B., Guo, X., Wang, K., Peng, N., Zhang, X.: Synchronous estimation of state of health and remaining useful lifetime for lithium-ion battery using the incremental capacity and artificial neural networks. J. Energy Storage 26, 100951 (2019). https://doi.org/10.1016/j.est.2019.100951
4. Taborelli, C., Onori, S.: State of charge estimation using extended Kalman filters for battery management system. In: IEEE International Electric Vehicle Conference (IEVC), pp. 1–8. IEEE, Florence (2014). https://doi.org/10.1109/IEVC.2014.7056126
5. Huria, T., Ceraolo, M., Gazzarri, J., Jackey, R.: High fidelity electrical model with thermal dependence for characterization and simulation of high power lithium battery cells. In: 2012 IEEE International Electric Vehicle Conference, Greenville, SC, pp. 1–8 (2012). https://doi.org/10.1109/IEVC.2012.6183271
6. Lebel, F.A., Messier, P., Sari, A., Trovão, J.P.F.: Lithium-ion cell equivalent circuit model identification by galvanostatic intermittent titration technique. J. Energy Storage 54, 105303 (2022). https://doi.org/10.1016/j.est.2022.105303
7. Lipu, M.S.H., Hannan, M.A., Ayob, A., Saad, M.H.M., Hussain, A.: Review of lithium-ion battery state of charge estimation methodologies for electric vehicle application. Int. J. Eng. Technol. 7, 3.17 (2018). https://doi.org/10.14419/ijet.v7i3.17.21909
8. Cheng, Z., Lv, J., Liu, Y., Yan, Z.: Estimation of state of charge for lithium-ion battery based on finite difference extended Kalman filter. J. Appl. Math. 2014, 1 (2014). https://doi.org/10.1155/2014/348537
9. Aydogmus, Z., Aydogmus, O.: A comparison of artificial neural network and extended Kalman filter based sensorless speed estimation. Measurement 63, 152–158 (2015). https://doi.org/10.1016/j.measurement.2014.12.010
10. Messier, P., Nguyễn, B.H., Lebel, F.A., Trovão, J.P.F.: Disturbance observer-based state-of-charge estimation for Li-ion battery used in light electric vehicles. J. Energy Storage 27, 101144 (2020). https://doi.org/10.1016/j.est.2019.101144
11. Pop, V.: Modeling battery behavior for accurate state-of-charge indication. J. Electrochem. Soc. 153(11), A2013 (2006). https://doi.org/10.1149/1.2335951
12. Li, A., Pelissier, S., Venet, P., Gyan, P.: Fast characterization method for modeling battery relaxation voltage. Batteries 2(2), Art. no. 2 (2016). https://doi.org/10.3390/batteries2020007
13. Lagraoui, M., Nejmi, A., Rayhane, H., Taouni, A.: Estimation of lithium-ion battery state-of-charge using an extended Kalman filter. Bull. EEI 10(4), 1759–1768 (2021). https://doi.org/10.11591/eei.v10i4.3082
14. Jiang, S.: A parameter identification method for a battery equivalent circuit model. Presented at the SAE 2011 World Congress & Exhibition, paper 2011-01-1367 (2011). https://doi.org/10.4271/2011-01-1367
15. Hussein, A.A.: Kalman filters versus neural networks in battery state-of-charge estimation: a comparative study. IJMNTA 03(05), 199–209 (2014). https://doi.org/10.4236/ijmnta.2014.35022
16. Wang, Z., Gladwin, D.T., Smith, M.J., Haass, S.: Practical state estimation using Kalman filter methods for large-scale battery systems. Appl. Energy 294, 117022 (2021). https://doi.org/10.1016/j.apenergy.2021.117022
17. Xile, D., Caiping, Z., Jiuchun, J.: Evaluation of SOC estimation method based on EKF/AEKF under noise interference. Energy Proc. 152, 520–525 (2018). https://doi.org/10.1016/j.egypro.2018.09.204
18. Paschero, M., Storti, G.L., Rizzi, A., Mascioli, F.M.F., Rizzoni, G.: A novel mechanical analogy-based battery model for SOC estimation using a multicell EKF. IEEE Trans. Sustainable Energy 7(4), 1695–1702 (2016). https://doi.org/10.1109/TSTE.2016.2574755
19. Najeeb, M., Schwalbe, U., Bund, A.: Development of a dynamic model of lithium ion battery pack for battery system monitoring algorithms in electric vehicles. In: 2021 23rd European Conference on Power Electronics and Applications (EPE 2021 ECCE Europe), pp. P.1–P.10 (2021). https://doi.org/10.23919/EPE21ECCEEurope50061.2021.9570709
20. Zhai, X., Ali, A.A.S., Amira, A., Bensaali, F.: MLP neural network based gas classification system on Zynq SoC. IEEE Access 4, 8138–8146 (2016). https://doi.org/10.1109/ACCESS.2016.2619181
21. Khalid, A., Sarwat, A.I.: Unified univariate-neural network models for lithium-ion battery state-of-charge forecasting using minimized Akaike information criterion algorithm. IEEE Access 9, 39154–39170 (2021). https://doi.org/10.1109/ACCESS.2021.3061478
Pavement Crack Detection from UAV Images Using YOLOv4 Mat Nizam Mahmud1 , Nur Nadhirah Naqilah Ahmad Sabri1 , Muhammad Khusairi Osman1(B) , Ahmad Puad Ismail1 , Fadzil Ahmad Mohamad1 , Mohaiyedin Idris1 , Siti Noraini Sulaiman1 , Zuraidi Saad1 , Anas Ibrahim2 , and Azmir Hasnur Rabiain3 1 Centre for Electrical Engineering Studies, Universiti Teknologi MARA, Cawangan Pulau
Pinang, Kampus Permatang Pauh, 13500 Bukit Mertajam, Pulau Pinang, Malaysia [email protected] 2 Centre for Civil Engineering Studies, Universiti Teknologi MARA, Cawangan Pulau Pinang, Kampus Permatang Pauh, 13500 Bukit Mertajam, Pulau Pinang, Malaysia 3 Thb Maintenance Sdn Bhd, 204 A, Jalan PSK 5, Pekan Simpang Kuala, 05400 Alor Setar, Kedah, Malaysia
Abstract. Cracking is a road-condition concern that can adversely affect safety, for example by causing accidents. Early detection of cracks improves the road maintenance process, so that undesirable problems can be avoided and prevented. In general, crack detection is accomplished by human examination. However, this manual technique is tedious, time-consuming, arduous, and risky. This study proposes an autonomous crack-detection system for Unmanned Aerial Vehicle (UAV) images using deep learning. It focuses on federal roads in Malaysia, the roads that connect state capitals. A deep learning approach called You Only Look Once (YOLO) is applied to perform the detection. Prior to detection, several image preprocessing techniques are implemented to prepare and enlarge the image dataset for training. These stages are crucial for data preparation in deep learning, boosting detection performance and generalization ability. A YOLOv4 network is constructed in the MATLAB environment and trained to detect cracks in the UAV images. Statistical measures such as precision, recall, F1-score, and Average Precision are used to evaluate the effectiveness of the proposed approach. Two altitudes, 10 m and 20 m, were used to maneuver the UAV. Simulation findings reveal that the proposed approach obtained an Average Precision (AP) of 82.02% at 10 m for YOLOv3 and 87.98% at 10 m for YOLOv4. Therefore, YOLOv4 outperformed YOLOv3 in detecting pavement cracks from UAV images at 10 m altitude. Keywords: Deep learning · UAV images · YOLOv3 · YOLOv4 · Pavement crack
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 73–85, 2023. https://doi.org/10.1007/978-3-031-43520-1_7
1 Introduction
Roads play a critical and significant role in bringing people from all over the world together, serving as the main medium for land transportation. Compared to water and air transport, land transport is the most frequently used in daily life. Buses, lorries, motorbikes, and cars are some of the most prevalent modes of ground transportation on roadways. As the road is an important asset, it is essential to maintain its condition so that it lasts long. Road defects are defined as conditions in which the structures of the road are unable to provide optimal service to the traffic above. Common types of road defects include cracks, potholes, and rutting. Cracking is one of the most common road concerns, and crack defects can decrease road performance and create road safety issues [1]. According to a study by Musa et al. [2], road defects are one of the major factors causing road accidents in Malaysia. Almost half of the respondents claimed accidents take place because drivers swerve to avoid potholes and road cracks so that their vehicles are not damaged [2]. Thus, road defects are indeed a serious problem and should not be taken lightly. Road maintenance and inspection are necessary to keep roads in good condition. Crack detection is a part of the road maintenance process, and it has attracted the attention of researchers in this field in recent years [1]. To date, crack detection is usually performed manually by human operators, which consumes a lot of time and energy [1]. It also requires accurate information about the type of road crack [3]. The procedure is less efficient because errors can occur during human inspection. These factors make manual monitoring of road cracks a complicated process.
Analyzing and monitoring road conditions is quite challenging for road surveyors and engineers. It requires continuous, consistent observation and evaluation of the functions and features of the roads [4]. Road surface monitoring, especially crack detection on the pavement surface, is a difficult job that is considered critical because of the essential role of roads in transportation [4]. Generally, the task is conducted manually, with road inspection done on location. Nonetheless, the method has several drawbacks that make it less efficient. For instance, the overall operation is expensive and consumes a lot of time and energy [1, 3]. Apart from that, the traditional approach is also unsafe. Furthermore, the information obtained about road cracks is less accurate with the manual method [5]. The accuracy of crack detection is very important for road surface maintenance [1]. Parameters frequently used in crack analysis and monitoring to evaluate road condition are the quality, depth, type, shape, and length of cracks [4]. Inaccurate information increases maintenance costs and poses a threat to road safety. An automated system for detecting pavement cracks is therefore crucial. In recent years, many efforts have been undertaken to build high-quality road infrastructure. Up to now, human inspection has been the common way to examine road quality. With the evolution of image processing techniques and computer technology, many automated approaches have been developed for road crack detection. Automatic detection systems can monitor road surface quality and facilitate prioritizing road network maintenance, extending the road's lifespan [6].
Automated pavement crack detection can be classified into three categories depending on how images are captured: using a specialized vehicle, a smartphone, or a UAV. A specialized vehicle is a vehicle mounted with a camera. Huyan et al. [5] used a vehicle-mounted action camera and presented a pixel-wise crack detection system called CrackU-net that utilizes a U-shaped deep neural architecture. The model was trained on 2400 crack images collected from vehicle-mounted action cameras and smartphones, and the best result achieved an accuracy of 99.01%, precision of 98.56%, recall of 97.98% and F-measure of 98.42%. It was also able to correctly extract crack details from road surface images, provided the crack could be identified by a human. Jo and Jadidi [7] proposed an autonomous crack detection system using a combination of multi-layered image processing and deep belief learning. They used an iPhone SE with a 12-megapixel RGB camera to collect the data. Fan et al. [8] proposed an approach for road crack detection based on a CNN, one of the deep learning algorithms. The architecture has four convolutional layers with two max-pooling layers and three fully connected (FC) layers. All images were taken with an iPhone 5. Yang et al. [9] developed an automatic technique for pavement crack detection known as the Feature Pyramid and Hierarchical Boosting Network (FPHBN). The technique implements pixel-wise binary classification and consists of four elements: a bottom-up architecture, a feature pyramid, side networks and hierarchical boosting. They used a smartphone to capture pavement images. The proposed method yields the best results when compared on different datasets with four other methods: Holistically-nested Edge Detection (HED), Richer Convolutional Features (RCF), Fully Convolutional Networks (FCN) and CrackForest.
Research on crack detection using UAV images is relatively new, and very little research was found in the literature, such as [4, 10]. Dadrasjavan et al. [4] introduced a method for automatic road crack detection using Unmanned Aerial Vehicle (UAV) images. UAV-based inspection can be regarded as a promising approach, as it is much safer than sending road engineers to collect data manually. The study applied an SVM classifier to determine the final cracks. Zhu et al. [10] also presented pavement crack detection based on a deep-learning approach using UAV images. Three object detection algorithms were compared. Among them, YOLOv3 recorded the best performance with a mean average precision (MAP) of 56.6%, while YOLOv4 achieved 53.3% and Faster RCNN 48.8%. In general, YOLOv4 is expected to outperform YOLOv3 because of several improvements made to its architecture; yet in that experiment YOLOv3 achieved a higher MAP than YOLOv4. Another study by Doshi and Yilmaz [11] used the state-of-the-art object detection algorithm YOLOv4 to detect road damage, namely cracks. The model was trained and tested on two image datasets. The F1-score recorded on dataset 1 was 0.628, while the F1-score for dataset 2 was 0.6358. Even though previous studies have utilised deep learning approaches for crack detection, the results are relatively low, especially for the YOLO algorithms. Improvements
are needed to properly tune the deep learning models so that better detection performance can be achieved. On top of that, no study has been conducted on crack detection for roads in Malaysia using UAVs. Therefore, this study proposes an automatic detection system using the deep learning approach known as YOLO with UAV images to help ensure the safety of the road network. Early crack detection methods mainly depend on manual inspection, which is time-consuming and requires human engagement. With an automated system, these issues can be addressed: the process becomes faster and needs little human involvement. Besides, the data provided by the system are more accurate than with the human inspection method. The system also enables early detection and helps reduce maintenance costs. Through early detection of cracks, road repair can be done in a short amount of time, and road accidents can be avoided.
2 Methodology
This section explains the methodology implemented to develop automatic road crack detection using a deep learning approach. The You Only Look Once (YOLO) object detection algorithm is used in this study. Figure 1 summarizes the road crack detection process, which consists of four main stages. The process begins with data collection and proceeds to data preparation, where image splitting, resizing and labelling are conducted. Lastly, crack detection is performed on the images and the performance is evaluated. Figure 2 displays the flowchart of the proposed method. First, data is collected by capturing images of the road using a UAV. Next, the captured images are prepared, which includes splitting, resizing and labelling cracks in the images. The dataset is then divided into two segments: training and testing. The testing process is carried out only when the performance of the developed YOLO model is satisfactory; several parameters are fixed or adjusted to optimize network performance. Lastly, the performance of the tested model is evaluated.
[Block diagram: image capturing and data collection → data preparation → pavement crack detection → performance assessment]
Fig. 1. Block diagram of the proposed pavement crack detection using UAV
[Flowchart: Start → image capturing and data collection → data preparation → development of YOLO → training process → performance satisfied? (No: adjustment of parameters, return to training; Yes: testing process) → performance evaluation → End]
Fig. 2. Detailed flowchart of the crack detection using YOLOv4
2.1 Image Capturing and Data Collection
In this study, the images for the dataset were captured using a UAV. The drone model used is the Mavic 2 Pro. It has a 1-inch CMOS sensor that produces a 5472 × 3648 image resolution. Figure 3 shows the drone used in this study. The available image formats are JPEG and DNG (RAW). The maximum flight time of the drone is 31 min at a fixed speed of 25 km per hour (kph), and the maximum hovering time is 29 min. The furthest distance it can travel, at a constant speed of 50 kph, is 18 km. The altitude of the drone above the ground surface was set to 10 and 20 m. This ensures that less background and fewer unwanted objects appear in the images; the higher the UAV altitude, the smaller the road appears in the image. A minimum height also helps avoid collisions with obstacles such as birds and other flying objects. The camera was positioned perpendicular (90°) to the road surface. Owing to UAV and camera limitations, the images were taken during the daytime and in good weather conditions in Perlis, Malaysia.
In Malaysia, the road network can mainly be classified into two categories: federal and state roads. A federal road is defined as a road that connects state capitals or leads to the country's entry and exit points. The primary roads that provide intra-state travel between district administrative centres are known as state roads. The type of road chosen for this study is federal roads. The data collection was conducted with the assistance of the road and highway maintenance and construction company THB Maintenance Sdn. Bhd. In total, 55 images with various pavement conditions (containing various noises such as grass, water stains and shadows) were captured using the UAV.
Fig. 3. The Mavic 2 Pro drone: (a) front view, (b) upper view, (c) remote controller
2.2 Image Preparation
The road images acquired by the UAV contain single-lane crack information. The resolution of the collected pavement images was 5472 × 3648 pixels, which would require significant computation time to train the network model. Image preparation comprises several procedures: image splitting, image resizing, image labelling, and data partitioning. Image Splitting. Each raw captured image in the dataset is split into 8 images. The purpose is to create a larger dataset and to produce clearer road images. Examples of road images and the result of image splitting are shown in Fig. 4.
Fig. 4. Image splitting process: (a) original image and (b) split sub-images
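The splitting step can be sketched in Python/NumPy; the paper's own pipeline uses MATLAB, and the 2 × 4 grid below is an assumption, since the exact tiling of the eight sub-images is not stated:

```python
import numpy as np

def split_into_tiles(img, rows=2, cols=4):
    """Split an H x W x C image into rows*cols equal, non-overlapping tiles."""
    h, w = img.shape[:2]
    th, tw = h // rows, w // cols
    return [img[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(rows) for c in range(cols)]

# One 5472 x 3648 UAV capture (NumPy arrays are height x width x channels)
capture = np.zeros((3648, 5472, 3), dtype=np.uint8)
tiles = split_into_tiles(capture)  # 8 tiles of 1824 x 1368 pixels
```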
Image Resizing. After the original images are split, the new images are downsized to 608 × 608 pixels to improve detection accuracy and to match the input size required by the Darknet. Image Labelling. Data annotation, also known as image labelling, is the process of labelling images to define the outcome to be predicted by the deep learning approach. The labelling task is conducted using the Image Labeler in MATLAB. A bounding box representing the region of interest is drawn around each crack in the image. As a result, the location of the crack in the road image can be determined from the coordinates of the bounding box. Image Partitioning. There are a total of 440 images in the dataset after data preparation. To validate the proposed method, the dataset is divided into training and testing datasets. Of the dataset, 106 images contain cracks; 70% of the crack images (74 images) are used for training, while the remaining 30% of the crack images and the 334 non-crack images are used for testing.
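The resizing and partitioning steps can be sketched the same way (illustrative Python/NumPy only; the study uses MATLAB's tooling, and the nearest-neighbour resize here is a stand-in for a proper resampler):

```python
import numpy as np

def resize_nearest(img, size=608):
    """Nearest-neighbour resize to size x size, matching the Darknet input."""
    h, w = img.shape[:2]
    rows = (np.arange(size) * h) // size
    cols = (np.arange(size) * w) // size
    return img[rows][:, cols]

tile = np.zeros((1824, 1368, 3), dtype=np.uint8)
net_input = resize_nearest(tile)  # 608 x 608 network input

# 70/30 partition of the 106 crack images: 74 for training, 32 for testing;
# the 334 non-crack images join the test set, as described above.
rng = np.random.default_rng(0)
perm = rng.permutation(106)
train_ids, test_ids = perm[:74], perm[74:]
```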
2.3 Pavement Crack Detection
You Only Look Once (YOLO) is a state-of-the-art single-stage object detection algorithm. The working principle of an object detector is to locate objects of interest in a given input image: it takes the input image and learns both the object classification and the bounding box coordinates. A multi-stage detector, also known as a two-stage detector, operates in two processes: region proposal and classification [12]. The proposal generator in a two-stage detector generates regions of interest and passes them to the detection generator for bounding box regression and object classification [13]. Each model has its advantages and disadvantages: the single-stage technique offers high speed but lower accuracy, while a multi-stage detector operates at lower speed yet provides higher detection accuracy.
Fig. 5. The YOLOv4 architecture for crack detection
Figure 5 shows the YOLOv4 architecture for UAV images. The model consists of a backbone, a neck, and a prediction part, also known as the head. YOLOv4 uses the CSPDarkNet53 framework as its backbone feature extractor, while YOLOv3 uses the DarkNet53 framework. In this study, MATLAB R2022a with the Deep Learning Toolbox was used to develop both YOLOv4 and YOLOv3.
2.4 Performance Evaluation
To evaluate the performance of the proposed system, several statistical parameters such as precision, recall and F1-score are obtained. Crack pixels are referred to as positive samples, whereas non-crack pixels are described as negative samples [14]. The results are classified into four categories: true positive (TP), true negative (TN), false positive (FP) and false negative (FN). Precision is the ratio of true positives to all predicted positives, as given in (1). Recall is the ratio of true positives to all actual positives, as in (2). The F1-score, defined in (3), is the harmonic mean of precision and recall and summarizes the model's accuracy; its maximum possible value is 1, indicating perfect precision and recall, while the lowest possible value is 0, indicating neither precision nor recall [15]. Moreover, the Average Precision (AP) summarizes the shape of the precision/recall curve to evaluate both measures jointly. The performance measures are given as follows:

Precision = TP / (TP + FP)   (1)

Recall = TP / (TP + FN)   (2)

F1 = (2 × Precision × Recall) / (Precision + Recall)   (3)

AP = Σ_{n=1}^{N} P(n) Δr(n)   (4)

where N is the number of all test images, P(n) is the precision at n images, and Δr(n) is the change in recall.
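Equations (1)–(4) translate directly into code; the following Python sketch computes the measures from detection counts (the counts in the example call are hypothetical):

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1-score from detection counts, Eqs. (1)-(3)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def average_precision(precisions, recalls):
    """AP, Eq. (4): sum of precision times the change in recall."""
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)  # P(n) * delta r(n)
        prev_r = r
    return ap

p, r, f1 = detection_metrics(tp=71, fp=14, fn=16)  # hypothetical counts
```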
3 Results and Discussion
The performance of the proposed YOLOv4 is evaluated in detecting pavement cracks and compared with the previous version, YOLOv3. A total of 440 pavement images from two drone maneuver heights were selected. Four assessment measures were chosen as indicators to evaluate YOLOv3 and YOLOv4. The number of epochs and the initial learning rate were set to 90 and 0.001 respectively. Both values were determined empirically through trial and error and were found suitable for all analyses.
Figure 6 shows examples of cracks detected by YOLOv4. The network scans the entire image and, when a crack is found, marks it with a yellow box together with the confidence score of the target. As shown, all cracks were successfully detected.
Fig. 6. Example of cracks that have been detected by YOLOv4 in various pavement conditions
Figure 7 shows the precision-recall curve of YOLOv4. The precision-recall curve demonstrates how precise the detector is at varying recall levels; the area under the curve gives the model's average precision. Ideally, precision would be 1 at all recall levels. The algorithm attained an acceptable result with an average precision of 88%.
Fig. 7. Precision-recall curve of YOLOv4
This research also evaluates the performance of YOLOv4 against the previous version of YOLO, YOLOv3. For a fair comparison, YOLOv3 was set to have the same tuning parameters as YOLOv4. Table 1 compares the performance of the two models. As can be seen in Table 1, YOLOv3 obtained 81.98%, 89.56% and 85.60% for recall, precision and F1-score respectively at an altitude of 10 m. Meanwhile, at an altitude of 20 m, YOLOv3 scored 76.92%, 81.07% and 78.94% for recall, precision and F1-score respectively. The findings demonstrate that the recall, precision, and F1-score of YOLOv3 at 10 m outperformed the results at 20 m, showing that the 10 m images provide better accuracy than the 20 m images for YOLOv3. Meanwhile, YOLOv4 achieved 89.87%, 94.44%, and 92.04% for recall, precision and F1-score respectively at an altitude of 10 m. At an altitude of 20 m, YOLOv4 scored 80.89%, 84.62% and 82.71% for recall, precision, and F1-score respectively.

Table 1. YOLOv3 and YOLOv4 crack detection performance comparison at various altitudes.

Detection method | Height | Recall | Precision | F1-score | AP
YOLOv3 | 10 m | 81.98 | 89.56 | 85.60 | 82.02
YOLOv3 | 20 m | 76.92 | 81.07 | 78.94 | 78.03
YOLOv4 | 10 m | 89.87 | 94.44 | 92.04 | 87.98
YOLOv4 | 20 m | 80.89 | 84.62 | 82.71 | 81.48
The results also show that the recall, precision, and F1-score of YOLOv4 at an altitude of 10 m are better than at 20 m. This indicates that a lower maneuver altitude produces better results, since the road cracks are seen more clearly in the images. YOLOv3 also demonstrates lower AP for both 10 m and 20 m (82.02% and 78.03%) as compared to YOLOv4 (87.98% and 81.48%). This reveals that the YOLOv4
AP at 10 m height exceeded that at 20 m by 6.50%. Meanwhile, the AP of YOLOv3 at an altitude of 10 m exceeded that at 20 m by 3.99%. In addition, the AP of YOLOv4 at 10 m exceeded the AP of YOLOv3 at 10 m by 5.96%. This finding is consistent with the results in [15], which showed that the DarkNet53 feature extractor in YOLOv3 has difficulty in recognizing small objects. Overall, YOLOv4 was found to be better than YOLOv3 in terms of AP, precision, recall, and F1-score.
4 Conclusion
In this paper, a deep learning approach using YOLO was proposed to detect cracks in UAV images of federal roads in Malaysia. The study investigated the effect of UAV altitude and YOLO model on detection performance. The study concludes that an altitude of 10 m provides a better crack appearance and hence improves the crack detection ability of the system. The newer version of YOLO, YOLOv4, also achieved better crack detection performance than the previous version, YOLOv3. This study provides an initial assessment of the applicability of deep learning for crack detection from UAV images. The study was limited to two variants of YOLO; future work can incorporate further evaluation using different deep learning object detection models, such as the Single Shot Detector (SSD), Region-based Convolutional Network (RCNN), and RetinaNet. Acknowledgement. The authors acknowledge with gratitude the support and involvement of all parties, especially Universiti Teknologi MARA, Shah Alam and Cawangan Pulau Pinang. The authors are also grateful to the Research Group Advanced Rehabilitation Engineering in Diagnostic and Monitoring (AREDiM) and the Advanced Control System and Computing Research Group (ACSCRG) for their contribution and support. This research is funded by the Geran Insentif Penyeliaan (GIP), UiTM Grant No: GIP5/3 (034/2022).
References 1. Fan, R., et al.: Road crack detection using deep convolutional neural network and adaptive thresholding. In: 2019 IEEE Intelligent Vehicles Symposium (IV), pp. 474–479. IEEE (2019) 2. Musa, M.F., Hassan, S.A., Mashros, N.: The impact of roadway conditions towards accident severity on federal roads in Malaysia. PLoS ONE 15(7), e0235564 (2020) 3. Mandal, V., Uong, L., Adu-Gyamfi, Y.: Automated road crack detection using deep convolutional neural networks. In: 2018 IEEE International Conference on Big Data (Big Data), pp. 5212–5215. IEEE (2018) 4. Dadrasjavan, F., Zarrinpanjeh, N., Ameri, A.: Automatic Crack Detection of Road Pavement Based on Aerial UAV Imagery (2019) 5. Huyan, J., Li, W., Tighe, S., Xu, Z., Zhai, J.: CrackU-net: a novel deep convolutional neural network for pixelwise pavement crack detection. Struct. Control. Health Monit. 27(8), e2551 (2020)
6. Fakhri, S.A., Saadatseresht, M.: Road crack detection using Gaussian/Prewitt filter. International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences (2019) 7. Jo, J., Jadidi, Z.: A high precision crack classification system using multi-layered image processing and deep belief learning. Struct. Infrastruct. Eng. 16(2), 297–305 (2020) 8. Fan, Z., Wu, Y., Lu, J., Li, W.: Automatic pavement crack detection based on structured prediction with the convolutional neural network. arXiv preprint arXiv:1802.02208 (2018) 9. Yang, F., Zhang, L., Yu, S., Prokhorov, D., Mei, X., Ling, H.: Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Trans. Intell. Transp. Syst. 21(4), 1525–1535 (2019) 10. Zhu, J., Zhong, J., Ma, T., Huang, X., Zhang, W., Zhou, Y.: Pavement distress detection using convolutional neural networks with images captured via UAV. Autom. Constr. 133, 103991 (2022) 11. Doshi, K., Yilmaz, Y.: Road damage detection using deep ensemble learning. In: 2020 IEEE International Conference on Big Data (Big Data), pp. 5540–5544. IEEE (2020) 12. Carranza-García, M., Torres-Mateo, J., Lara-Benítez, P., García-Gutiérrez, J.: On the performance of one-stage and two-stage object detectors in autonomous vehicles using camera data. Remote Sens. 13(1), 89 (2021) 13. Soviany, P., Ionescu, R.T.: Optimizing the trade-off between single-stage and two-stage deep object detectors using image difficulty prediction. In: 2018 20th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), pp. 209–214. IEEE (2018) 14. Qiao, W., Liu, Q., Wu, X., Ma, B., Li, G.: Automatic pixel-level pavement crack recognition using a deep feature aggregation segmentation network with a scSE attention mechanism module. Sensors 21(9), 2902 (2021) 15. Nepal, U., Eslamiat, H.: Comparing YOLOv3, YOLOv4 and YOLOv5 for autonomous landing spot detection in faulty UAVs. Sensors 22(2), 464 (2022)
Genetic Algorithm for CNN Architecture Optimization

Khalid Elghazi1(B), Hassan Ramchoun2, and Tawfik Masrour1,3

1 National School of Arts and Crafts, Moulay Ismail University of Meknes, Meknes, Morocco [email protected], [email protected] 2 National School of Business and Management, Moulay Ismail University of Meknes, Meknes, Morocco [email protected] 3 University of Quebec at Rimouski, Rimouski, Canada
Abstract. The Convolutional Neural Network (CNN) is a famous type of deep feed-forward network that has achieved significant success in the area of computer vision. Despite this success, the choice of the optimal architecture for a given problem remains a challenging issue, and the performance of a CNN is significantly impacted by a few of its hyper-parameters. In this paper, the Genetic Algorithm (GA) is used to explore the architecture design space of convolutional neural networks. Most existing studies focus only on CNN hyper-parameters such as the number and size of filters; the proposed algorithm extends existing research in this field by also including the type of pooling. The MNIST dataset was used to evaluate the proposed method, and the simulations show the ability of the GA to select the optimal CNN hyper-parameters and generate an optimal CNN architecture.

Keywords: Convolutional Neural Network · Genetic Algorithm · CNN hyper-parameters · CNN architecture optimization

1 Introduction
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 86–97, 2023. https://doi.org/10.1007/978-3-031-43520-1_8

Convolutional neural networks (CNNs) have provided state-of-the-art results in a wide range of applications such as object classification, object detection, and segmentation. One of the key factors behind this success is the creation of several CNN architectures, such as GoogLeNet [19], ResNet [6], and DenseNet [7], all within the framework of object classification. The majority of these architectures were manually created by experts in the field. As a result, some researchers have become interested in automating the creation of CNN architectures; this line of work includes automatically designing CNN architectures [18], the genetic CNN [21], Hierarchical Evolution [11], and the efficient architecture search method [4]. However, these networks' accuracy is strongly linked to their structure and design. In fact, some researchers prove
that selecting suitable values for these hyper-parameters [9] significantly improves the performance of CNNs. The list of hyper-parameters to optimize in the basic case comprises the depth of the CNN (the number of layers), the number and size of filters for each convolution layer, and the type of pooling in the subsampling layer. In [13], the authors aimed to optimize the number of filters and their sizes for each layer using the large Caltech-256 dataset. Furthermore, LEIC [15] used the GA to improve CNN architectures by simultaneously optimizing not only the number of kernels in each layer but also the depth. However, the crossover operator is discarded in the main part of LEIC, and without a crossover operator GAs typically require a sizable population to explore the search space. On the other hand, evoCNN [17] extended LEIC with a new gene encoding strategy and a specific crossover; additionally, a different representation strategy was created for efficiently initializing connection weights. Note that the number of possible network structures grows exponentially, so it is not practical to list all the candidates and choose the best one. For this reason, we can define this issue as an optimization problem in a large search space and apply genetic algorithms [1] to explore a sufficient number of architectures. In this study, we propose a strategy for using a genetic algorithm to explore the search space and discover the optimal CNN hyper-parameters for a given classification task. To do so, we suggest an encoding technique that encodes each network structure as a fixed-length string. Next, we introduce common genetic operators, such as selection, crossover, and mutation, which aim to improve population diversity and prevent premature convergence to local optima. Finally, we determine the quality of each individual by its accuracy on the MNIST dataset. The rest of this work is organized as follows.
First, an overview of CNNs is presented in Sect. 2. Next, we describe the process of designing network architectures using the genetic algorithm in Sect. 3. Experimental results are shown in Sect. 4. Finally, conclusions are given in Sect. 5.
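The fixed-length encoding and the genetic operators outlined above can be sketched as follows; the candidate value sets and mutation rate are hypothetical placeholders, and the fitness step (training each candidate CNN and measuring its MNIST accuracy) is omitted:

```python
import random

# Hypothetical search space: one (filters, filter size, pooling) gene per block
FILTER_COUNTS = [16, 32, 64, 128]
FILTER_SIZES = [3, 5, 7]
POOL_TYPES = ["max", "avg"]
N_LAYERS = 3  # fixed-length chromosome: N_LAYERS gene triples

def random_individual():
    return [(random.choice(FILTER_COUNTS),
             random.choice(FILTER_SIZES),
             random.choice(POOL_TYPES)) for _ in range(N_LAYERS)]

def crossover(a, b):
    cut = random.randrange(1, N_LAYERS)  # one-point crossover
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(ind, rate=0.1):
    out = []
    for nf, fs, pool in ind:
        if random.random() < rate:
            nf = random.choice(FILTER_COUNTS)
        if random.random() < rate:
            fs = random.choice(FILTER_SIZES)
        if random.random() < rate:
            pool = random.choice(POOL_TYPES)
        out.append((nf, fs, pool))
    return out

parent1, parent2 = random_individual(), random_individual()
child1, child2 = crossover(parent1, parent2)
child1 = mutate(child1)
```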
2 CNNs-An Overview
In the domain of image recognition, the CNN is the most popular and frequently utilized algorithm. A vital benefit of CNNs is their ability to detect significant features without human intervention. A typical CNN structure differs from the Multi-Layer Perceptron (MLP) by combining numerous locally connected layers that recognize features, while the final layers are fully connected layers that classify the image. Figure 1 shows an illustration of a CNN architecture for image classification. Convolution Layers: A convolution layer is the main component of a CNN architecture and contains a set of learnable filters, also known as "kernels". The objective of this layer is to extract image features. At the end of each convolution layer we obtain a matrix known as the feature map [5], produced by sliding the filter across the input, as expressed in (1):
88
K. Elghazi et al.
Fig. 1. Example of CNN architecture for image classification.
Y^l = K^l ⊗ X^{l−1}    (1)

with K^l being the weights of the filters in convolution layer l, ⊗ denoting the convolution operator, and X^{l−1} the output of the previous layer l − 1. Three factors determine the feature map's size: (A) Depth: the number of filters employed during the convolution process. (B) Stride: how many pixels the filter matrix slides across the input matrix. (C) Zero-padding: a method that keeps the original input size by adding zeros around the edges of the input matrix. Activation Function: After the convolution is finished, nonlinear features are extracted by applying an activation function to all values in the convolution output. The literature offers many activation functions, such as the ReLU [14], f(y) = max(0, y), i.e. the maximum of zero and the input value, as well as the tanh and sigmoid functions. The output of layer l is determined by its activation function and is described as follows:

S^l = F(Y^l)    (2)

Subsampling or Pooling Layers: A pooling layer, also named subsampling or downsampling, is one of the fundamental components of a CNN architecture. It aims to minimize the spatial size of the feature maps while retaining the most important information. The two most used pooling methods are average pooling and max pooling.
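As a concrete illustration of these layer types and size factors, the following NumPy sketch (illustrative only, not the paper's implementation) computes one feature map, applies the ReLU, and max-pools it:

```python
import numpy as np

def conv2d(x, k, stride=1, pad=0):
    """2-D cross-correlation of input x with kernel k, cf. Eq. (1)."""
    if pad:
        x = np.pad(x, pad)                    # zero-padding around the edges
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1      # feature-map height
    ow = (x.shape[1] - kw) // stride + 1      # feature-map width
    y = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            y[i, j] = np.sum(patch * k)
    return y

def relu(y):
    """f(y) = max(0, y), applied element-wise."""
    return np.maximum(0, y)

def max_pool(x, size=2, stride=2):
    """Keep the maximum of each size x size window."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    y = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            y[i, j] = x[i * stride:i * stride + size,
                        j * stride:j * stride + size].max()
    return y
```

For example, a 28 × 28 input convolved with a 3 × 3 filter at stride 1 and no padding yields a 26 × 26 feature map, which a 2 × 2 max pooling at stride 2 reduces to 13 × 13.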
Genetic Algorithm for CNN Architecture Optimization
89
Fully Connected Layers: After the convolution and pooling processes, the feature maps obtained in the last layer are flattened into a one-dimensional vector, which is the input of the fully connected layers. This part follows the basic design of the MLP: each neuron is connected to all neurons of the previous and the next layer. The output size of the last fully connected layer should match the number of classes in the problem. Finally, the softmax activation function can be used to compute the posterior classification probabilities.
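The posterior probabilities mentioned above come from the softmax; a small NumPy illustration (not tied to the paper's implementation):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last fully connected layer's outputs."""
    e = np.exp(z - np.max(z))      # subtract the max to avoid overflow
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # one raw score per class
probs = softmax(logits)              # posterior class probabilities, summing to 1
```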
3
Proposed Approach
This section presents a genetic algorithm to select the optimal CNN hyperparameters, allowing us to design an optimal CNN model on the MNIST dataset [10]. Firstly, we introduce a mechanism for representing individuals, containing their fixed and optimized parts. Following that, genetic operations such as selection, mutation, and crossover are developed to traverse the search space and identify the optimal solutions. In the proposed framework, each chromosome is a candidate solution representing a CNN architecture. The classification accuracy is chosen as the fitness function of a chromosome. The objective of the genetic algorithm is thus to compute the hyperparameter values that give the least error and consequently the highest classification accuracy. Algorithm 1 provides a summary of the proposed algorithm.

Algorithm 1: General framework of the proposed algorithm
Input: Training data Dtr, testing data Dts, number of epochs Np, max. number of generations G, population size N, crossover probability Pc, mutation probability Pm.
Output: The discovered optimal CNN architecture with its recognition accuracy (fitness).
1   P0 ← randomly generate an initial population of N chromosomes and compute their fitness;
2   g ← 0;  /* initialize a generation counter */
3   while g < G do
4       Selection: binary tournament selection produces a new generation from the previous one;
5       Crossover: for every pair of parent chromosomes, apply crossover with probability Pc;
6       Mutation: for every crossover offspring, apply mutation with probability Pm;
7       Evaluation: evaluate the fitness of each new individual;
8       g ← g + 1;
Return: the best individual.
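The overall loop of Algorithm 1 can be sketched as the following Python skeleton; the function arguments `random_individual`, `fitness`, `crossover`, and `mutate` are placeholders for the operators described in the following subsections, so this is an illustration rather than the authors' code:

```python
import random

def genetic_search(random_individual, fitness, crossover, mutate,
                   N=10, G=10, pc=0.7, pm=0.3):
    """GA loop following Algorithm 1, with binary tournament selection
    and elitism (the best individual of a generation is never lost)."""
    pop = [random_individual() for _ in range(N)]
    fits = [fitness(ind) for ind in pop]
    for g in range(G):
        elite = max(range(N), key=fits.__getitem__)
        elite_ind, elite_fit = pop[elite], fits[elite]
        # Binary tournament: the fitter of two random individuals is selected.
        selected = []
        for _ in range(N):
            a, b = random.randrange(N), random.randrange(N)
            selected.append(pop[a] if fits[a] >= fits[b] else pop[b])
        # Crossover on consecutive pairs with probability pc.
        children = []
        for i in range(0, N - 1, 2):
            p1, p2 = selected[i], selected[i + 1]
            if random.random() < pc:
                p1, p2 = crossover(p1, p2)
            children += [p1, p2]
        if len(children) < N:               # odd population size
            children.append(selected[-1])
        # Mutation with probability pm, then fitness evaluation.
        pop = [mutate(c) if random.random() < pm else c for c in children]
        fits = [fitness(ind) for ind in pop]
        # Elitism: if the previous best got lost, it replaces the current worst.
        if elite_fit > max(fits):
            worst = min(range(N), key=fits.__getitem__)
            pop[worst], fits[worst] = elite_ind, elite_fit
    best = max(range(N), key=fits.__getitem__)
    return pop[best], fits[best]
```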
3.1
Chromosome Encoding
In this work, each chromosome represents a candidate solution to the problem. Firstly, we fix the depth of our CNN architecture at two blocks. The first block has one convolution layer and one pooling layer; the second block contains two convolution layers and one pooling layer. The genetic algorithm inputs are therefore 2 × 3 + 2 variables to be optimized through the GA process. The 2 × 3 variables are the pairs of the filter number L and the filter size K used in each convolution layer. The remaining 2 values code the pooling type P of each pooling layer. Conceptually, our proposed encoding extends the existing encoding [8] by adding the type of pooling: max pooling is represented by 0 and mean pooling by 1. Figure 2 illustrates an example of a chromosome.
Fig. 2. Example of chromosome encoding.
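Concretely, the chromosome of Fig. 2 can be held as a flat list of eight genes; the sketch below (an illustration, with the value ranges taken from the initialization procedure of Sect. 3.2) also shows how a random individual could be sampled:

```python
import random

# Gene layout: [L1, K1, P1, L2, K2, L3, K3, P2]
#   Li = number of filters in convolution layer i  (6..16)
#   Ki = filter size in convolution layer i        (3..6)
#   Pj = pooling type of block j                   (0 = max, 1 = mean)

def random_chromosome():
    """Uniformly sample one fixed-length candidate architecture."""
    L1, K1 = random.randint(6, 16), random.randint(3, 6)
    P1 = random.randint(0, 1)
    L2, K2 = random.randint(6, 16), random.randint(3, 6)
    L3, K3 = random.randint(6, 16), random.randint(3, 6)
    P2 = random.randint(0, 1)
    return [L1, K1, P1, L2, K2, L3, K3, P2]

def init_population(N):
    """Generate N random chromosomes."""
    return [random_chromosome() for _ in range(N)]
```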
3.2
Population Initialization
As described in Sect. 2, a CNN is made up of convolution layers, pooling layers, and fully connected layers. However, the fully connected layers are dropped in our suggested encoding method; as a result, only the convolution and pooling layers are used to build a CNN in the proposed encoding. The parameters of a convolutional layer are the number of filters, their respective sizes, the stride size, and the convolutional operation type. In the proposed approach, we use the same settings for the stride sizes and the convolutional operation: the stride size is set to 1 × 1, and only one convolutional operation is employed. The parameters encoded for a convolution layer are therefore the number and the sizes of its filters. Additionally, the pooling layers used in the proposed encoding strategy have 2 × 2 kernel sizes and a 1 × 1 spatial stride, so the only parameter to optimize for a pooling layer is the pooling type.

3.3
Selection
We use a selection mechanism at the start of each generation. Prior to the t-th generation, a fitness value is assigned to the n-th individual P_{t−1,n}. This
Algorithm 2: Population Initialization
Input: The population size N
Output: The initialized population P0 (architectures)
1   P0 ← ∅;
2   while |P0| < N do
3       L ← 2;
4       Ind ← create a list containing L blocks;
5       foreach block in Ind do
6           if block.type = 1 then
7               FN ← uniformly generate an integer from 6..16;   /* filter number */
8               FS ← uniformly generate an integer from 3..6;    /* filter size */
9               Ind ← CONV1(FN, FS);
10              p ← uniformly generate an integer from 0..1;     /* randomly choose max or mean */
11              if p = 0 then
12                  Ind ← POOL(max)
13              else
14                  Ind ← POOL(mean)
15          else
16              FN1, FN2 ← uniformly generate two integers from 6..16;  /* filter numbers */
17              FS1, FS2 ← uniformly generate two integers from 3..6;   /* filter sizes */
18              Ind ← CONV1(FN1, FS1);
19              Ind ← CONV2(FN2, FS2);
20              p ← uniformly generate an integer from 0..1;
21              if p = 0 then
22                  Ind ← POOL(max)
23              else
24                  Ind ← POOL(mean)
25      P0 ← P0 ∪ Ind;
Return: P0
fitness function is defined as the accuracy obtained in the previous generation or at initialization. In our case, we use binary tournament selection to identify which individuals survive; this implies that the best individual has the greatest chance of being chosen, whereas the worst individual will always be removed. The next population Pt+1 is comprised of these selected individuals. Additionally, to preserve the best architecture of each generation, the best individual is identified and we verify whether it has been placed into the next population Pt+1; if not, it replaces the worst individual in Pt+1. Moreover, the number of
individuals N does not change, and each individual of the previous generation may be chosen more than once.

3.4
Crossover
The crossover operation exchanges genetic material between two individuals at once. It is a fundamental GA process that creates new offspring carrying part of each parent's genetic structure. The crossover in the proposed approach is closest to that of [18]: each parent is divided into two parts at a random pivot, and the parts from the two parents are exchanged to create two children. Figure 3 shows two examples of the crossover.
Fig. 3. Examples of the two kinds of crossover.
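Both kinds of crossover shown in Fig. 3 arise from a single random pivot over the eight-gene chromosome; a minimal sketch (illustrative, not the authors' code):

```python
import random

def crossover(p1, p2):
    """Split both parents at a random pivot and swap the tails (Sect. 3.4)."""
    pivot = random.randint(1, len(p1) - 1)   # pivot never at the ends
    c1 = p1[:pivot] + p2[pivot:]
    c2 = p2[:pivot] + p1[pivot:]
    return c1, c2
```

Depending on whether the pivot falls inside the first or the second hidden block, one of the two cases of Fig. 3 is produced.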
3.5
Mutation
To increase the population's diversity (obtaining various network architectures) and its ability to escape local maxima, we apply a mutation operator to the offspring created by the crossover mechanism. In the proposed algorithm, the mutation is a one-step operation: the parameter value of the chromosome gene at the selected position is changed to a random number chosen from its corresponding range, as shown in Fig. 4.
Fig. 4. Examples of mutation.
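The one-step mutation of Fig. 4 resamples a single gene from its own range; a sketch, with the gene ranges as in the encoding of Sect. 3.1:

```python
import random

# Valid range for each gene position: filter numbers, filter sizes, pooling types.
GENE_RANGES = [(6, 16), (3, 6), (0, 1), (6, 16), (3, 6), (6, 16), (3, 6), (0, 1)]

def mutate(chromosome):
    """Replace the gene at a random position with a fresh value from its range."""
    pos = random.randrange(len(chromosome))
    lo, hi = GENE_RANGES[pos]
    child = chromosome[:]                 # leave the parent untouched
    child[pos] = random.randint(lo, hi)
    return child
```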
3.6
Evaluation
Algorithm 3: Fitness Evaluation
Input: The individual (chromosome) ind, the number of epochs Np, the training data Dtr, the testing data (fitness evaluation data) Dts.
Output: The fitness and its corresponding individual.
1   CNN ← using the information encoded in ind, create a CNN with a classifier;
2   fits ← 0;
3   t ← 0;
4   while t < Np do
5       CNN ← train the CNN on Dtr;
6       t ← t + 1;
7   end
8   fits ← calculate the classification accuracy on Dts;
Return: ind, fits.

Algorithm 3 provides the details for assessing individual fitness. After decoding the CNN from the chromosome, we add a classifier according to the given image classification dataset. A softmax classifier is added [2], and the given image dataset determines the specific number of classes. Each convolution output is followed by a rectifier activation function. Moreover, the Stochastic Gradient Descent (SGD) algorithm [3] is used to train the CNN on the training data Dtr. When the training phase is over, the individual's fitness is set to the classification accuracy on the dataset Dts.
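Decoding the chromosome into layer specifications (step 1 of Algorithm 3) can be sketched as follows; these are hypothetical helpers, not the paper's code, but the `to_string` form matches the best-network notation used in Table 3:

```python
def decode(ch):
    """Map the 8-gene chromosome to an ordered list of layer specs."""
    pool = {0: "max", 1: "mean"}
    return [
        ("conv", ch[0], ch[1]),      # block 1: one convolution ...
        ("pool", pool[ch[2]]),       # ... followed by one pooling layer
        ("conv", ch[3], ch[4]),      # block 2: two convolutions ...
        ("conv", ch[5], ch[6]),
        ("pool", pool[ch[7]]),       # ... followed by one pooling layer
    ]

def to_string(ch):
    """Render a chromosome in the L1-K1-P1-L2-K2-L3-K3-P2 notation."""
    pool = {0: "max", 1: "mean"}
    return (f"{ch[0]}-{ch[1]}-{pool[ch[2]]}-"
            f"{ch[3]}-{ch[4]}-{ch[5]}-{ch[6]}-{pool[ch[7]]}")
```

Building and training the actual CNN from this layer list would then be done with a deep learning framework, as in Algorithm 3.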
4
Experiments
Due to its high computational resource requirements, the proposed genetic algorithm cannot be directly tested on large-scale datasets like ILSVRC2012 [16]. Instead, we evaluate the proposed algorithm on the small MNIST dataset [10]. The parameter settings for the proposed algorithm are shown in Table 1.
Table 1. Parameters settings

Categories       | Parameters             | Value ranges
-----------------|------------------------|------------------
Search space     | Filter number          | 6, 7, 8, ..., 16
                 | Filter size            | 3, 4, 5, 6
                 | Pooling                | max, mean
SGD              | Epochs                 | 2
                 | Batch size             | 128
                 | Learning rate          | 0.01
Genetic process  | Population size        | 10
                 | Max generation number  | 10
                 | Crossover probability  | 0.7
                 | Mutation probability   | 0.3
Table 3 summarizes the results obtained by our proposed approach. We observe that all the important statistics about classification accuracy grow from generation to generation. Additionally, we report the best network structure of each generation in the same Table 3. Figure 5 illustrates the evolutionary trajectory of the proposed method on the MNIST dataset; the maximal and average classification accuracies of each generation are linked by a dashed line and a solid line, respectively. To prove the effectiveness of the proposed algorithm and compare it with other methods, we adopted the same procedure used in [12]: we increase the number of epochs to 50 and use a batch size of 64 to retrain the best model. Furthermore, we add the fully connected part of LeNet-5 [10] and keep the same hyperparameters related to back-propagation. We compare our results to five different methods. The first three, LeNet-1, LeNet-4, and LeNet-5, were created by LeCun in 1998 and are dedicated to the problem of digit recognition. The last two are evoCNN [17] and IPPSO [20], which are population-based algorithms and therefore the most similar to our proposed algorithm (Table 2).

Table 2. Comparison with existing models on MNIST

Model          | Testing error %
---------------|----------------
LeNet-1 [10]   | 1.7
LeNet-4 [10]   | 1.1
LeNet-5 [10]   | 0.95
evoCNN [17]    | 1.18
IPPSO [20]     | 1.13
Our Approach   | 0.73
Based on the results shown in Table 2, we can say that our method is more effective for the MNIST dataset.

Table 3. Classification accuracy on the MNIST testing set

Gen | Max Acc | Min Acc | Avg Acc | STD     | Best Network Encoding
----|---------|---------|---------|---------|--------------------------
00  | 0.9685  | 0.9197  | 0.9573  | 0.01320 | 10-3-mean-16-6-15-3-mean
01  | 0.9715  | 0.9538  | 0.9629  | 0.00531 | 7-6-max-8-3-13-4-mean
02  | 0.9715  | 0.9538  | 0.9633  | 0.00477 | 7-6-max-8-3-13-4-mean
03  | 0.9715  | 0.9577  | 0.9666  | 0.00380 | 7-6-max-8-3-13-4-mean
04  | 0.9728  | 0.9660  | 0.9701  | 0.00263 | 7-6-mean-8-3-14-3-mean
05  | 0.9750  | 0.9728  | 0.9732  | 0.00091 | 9-4-mean-8-3-14-3-mean
06  | 0.9768  | 0.9728  | 0.9738  | 0.00163 | 10-3-mean-8-3-14-3-mean
07  | 0.9768  | 0.9718  | 0.9745  | 0.00199 | 10-3-mean-8-3-14-3-mean
08  | 0.9770  | 0.9768  | 0.9768  | 0.00009 | 10-3-mean-8-3-7-5-mean
09  | 0.9814  | 0.9727  | 0.9769  | 0.00194 | 11-4-mean-8-3-14-3-mean
Fig. 5. Evolution of the maximal and average classification accuracy over the generations. The bars indicate the maximal and minimal accuracies in the corresponding generation.
5
Conclusion
This study formulated the choice of CNN hyperparameters as an optimization problem in a large search space and used the Genetic Algorithm to solve it. Firstly, we suggest a way to encode each network structure; then, mutation and crossover operations are used to navigate the search space. In addition, binary tournament selection determines which individuals survive, and an efficient procedure is used to assess fitness. Our results demonstrate that the proposed method can discover an optimal CNN architecture for the MNIST dataset, and the comparison with results from the literature shows that our developments give satisfactory results.
References
1. Beheshti, Z., Shamsuddin, S.M.H.: A review of population-based meta-heuristic algorithms. Int. J. Adv. Soft Comput. Appl. 5(1), 1–35 (2013)
2. Bishop, C.M., Nasrabadi, N.M.: Pattern Recognition and Machine Learning, vol. 4. Springer, New York (2006)
3. Bottou, L.: Stochastic gradient descent tricks. In: Montavon, G., Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade. LNCS, vol. 7700, pp. 421–436. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35289-8_25
4. Cai, H., Chen, T., Zhang, W., Yu, Y., Wang, J.: Efficient architecture search by network transformation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)
5. Dumoulin, V., Visin, F.: A guide to convolution arithmetic for deep learning. arXiv preprint arXiv:1603.07285 (2016)
6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
7. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
8. Johnson, F., Valderrama, A., Valle, C., Crawford, B., Soto, R., Ñanculef, R.: Automating configuration of convolutional neural network hyperparameters using genetic algorithm. IEEE Access 8, 156139–156152 (2020)
9. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)
10. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
11. Liu, H., Simonyan, K., Vinyals, O., Fernando, C., Kavukcuoglu, K.: Hierarchical representations for efficient architecture search. arXiv preprint arXiv:1711.00436 (2017)
12. Liu, H., Simonyan, K., Yang, Y.: DARTS: differentiable architecture search. arXiv preprint arXiv:1806.09055 (2018)
13. Loussaief, S., Abdelkrim, A.: Convolutional neural network hyper-parameters optimization based on genetic algorithms. Int. J. Adv. Comput. Sci. Appl. 9(10), 1–15 (2018)
14. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
15. Real, E., et al.: Large-scale evolution of image classifiers. In: International Conference on Machine Learning, pp. 2902–2911. PMLR (2017)
16. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015)
17. Sun, Y., Xue, B., Zhang, M., Yen, G.G.: Evolving deep convolutional neural networks for image classification. IEEE Trans. Evol. Comput. 24(2), 394–407 (2019)
18. Sun, Y., Xue, B., Zhang, M., Yen, G.G., Lv, J.: Automatically designing CNN architectures using the genetic algorithm for image classification. IEEE Trans. Cybern. 50(9), 3840–3854 (2020)
19. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
20. Wang, B., Sun, Y., Xue, B., Zhang, M.: Evolving deep convolutional neural networks by variable-length particle swarm optimization for image classification. In: 2018 IEEE Congress on Evolutionary Computation (CEC), pp. 1–8. IEEE (2018)
21. Xie, L., Yuille, A.: Genetic CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1379–1388 (2017)
Enhancing Comfort and Security: A Chatbot-Based Home Automation System with Integrated Natural Language Processing and IoT Components Tarik Hajji(B) , Abdelkader Fassi Fihri, Ibtissam El Hassani, Salma Kassimi, and Chaima El Hajjoubi Laboratory of Mathematical Modeling, Simulation and Smart Systems (L2M3S), ENSAM-Meknes, Moulay Ismail University of Meknes, Meknes, Morocco [email protected]
Abstract. People are relying more on technology to run their homes because of the world's changing requirements for comfort and security, giving rise to the idea of domotics. The home automation system we propose in this paper is intended to control and monitor household items such as lighting systems, TVs, kitchen appliances, and security cameras; perform home surveillance; and provide home security. The main characteristic of this system is that it interacts with users via a chatbot that takes commands as text or voice messages and determines the actions to take using integrated natural language processing (NLP) offered by Google's TensorFlow. The actuations, which take the form of turning the devices on or off, are carried out by the IoT components. Keywords: Home Automation · Technology · Comfort · Security · Domotics · Control · Monitoring · Lighting Systems · TVs · Kitchen Appliances · Security Cameras · Home Surveillance · Home Security · Chatbot · Text Messages · Voice Messages · Natural Language Processing · NLP · Google's TensorFlow · Actuations · IoT Components
1 Introduction
Home automation conjures up images of the futuristic house of the 1970s: a house that runs itself and serves as an assistant [1]. With the development of technology and the increasing demand for efficiency, comfort, and security, that futuristic home, which previously seemed like a utopian fantasy, has become an attainable reality [2–4]. A wide range of people are drawn to home automation: some are fans of cutting-edge technology looking for the newest smart toy to play with, and some are in real need of help managing their busy and increasingly complex homes [5–7]. The goal of this IoT-based home automation chatbot project is to enable users to remotely control their electric home appliances, thereby enhancing comfort and
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 98–107, 2023. https://doi.org/10.1007/978-3-031-43520-1_9
convenience, safety, and security. Users will be able to check their security cameras and receive alerts when the sensors detect unusual activity. The system accepts user inquiries sent to the chatbot's own Telegram account as text or voice messages. TensorFlow, an integrated NLP system from Google, is used to process the input; the IoT components then translate the outputs it provides into actuation.
2 Literature Review
GSM, Bluetooth, and the internet are just a few of the control protocols that enable IoT devices to communicate with the internet and with one another in a home automation system [8]. Some of these technologies, however, have significant drawbacks. Bluetooth-based solutions require a 100-m range and a Bluetooth adaptor on each device. GSM-based systems, on the other hand, use microcontrollers to translate the primarily physical input that they get from sensors into commands that can be understood; however, several fuzzing attacks may happen during processing [9, 10]. Web browser-based Internet systems can also present significant security risks [11]. A chatbot, also known as a conversational agent, is a piece of software that mimics and processes human communication through text or voice interactions [12]. It is powered by AI, automated rules, Natural Language Processing (NLP), and machine learning. Currently, Industry 4.0, Medicine 4.0, Agriculture 4.0, and other domains are undergoing revolutionary changes as a result of the usage of AI [13–23]. An ideal chatbot
Fig. 1. Architecture of the proposed system
should be able to comprehend the conversation's context and the user's intents, learn from prior talks, and continuously improve. Voice assistants are chatbots that can converse verbally with human operators; using natural language speech recognition, they interpret spoken user commands and translate them into actuation [24]. The most widely used personal assistants today are Apple's Siri, Google Assistant, and Amazon's Alexa. Natural language processing (NLP) is a branch of linguistics, computer science, and artificial intelligence that studies how computers and human language interact, with a focus on how to train computers to process and analyze massive volumes of natural language data [25]. This project also uses the object-oriented programming (OOP) paradigm, which is based on "objects" that hold both data (fields) and code (procedures); objects frequently have procedures attached to them that can access and modify the object's data fields. In future work, we will use information coming from a big data context as in [26].
3 Proposed System
3.1 Architecture and Hardware Components
The architecture diagram in Fig. 1 illustrates how all the modules in our system work together to manage the many parts of the proposed chatbot for home automation. The Arduino Uno (Fig. 2) is a microcontroller board based on the ATmega328P. It contains 6 analog inputs, a 16 MHz ceramic resonator (CSTCE16M0V53R0), 14 digital input/output pins (of which 6 can be used as PWM outputs), a USB port, a power jack, an ICSP header, and a reset button.
Fig. 2. Arduino Uno card
The ESP32-CAM in Fig. 3 is a small module built around the OV2640 and ESP32 chips. It provides everything required to run, develop, and program on the ESP32. With two powerful 32-bit LX6 CPUs, a 7-stage pipeline design, a main frequency adjustable from 80 MHz to 240 MHz, and on-chip sensors including a Hall sensor and a temperature sensor, the ESP32 incorporates Wi-Fi, conventional Bluetooth, and BLE Beacon. The ESP32-CAM is broadly applicable in a variety of IoT applications, such as wireless positioning system signals, industrial wireless control, wireless monitoring, QR wireless identification, and smart home gadgets.
Fig. 3. ESP32-CAM
3.2 Software Components It’s a Pyfirmata Python Library: It is a Firmata protocol-based Python library. It eliminates the requirement for the Arduino language by enabling users to control an Arduino board from a Raspberry Pi or a computer using only Python. Natural language conversation-based human-computer interface technologies are created by DialogFlow, a Google-owned company. It’s used to build and include conversational user interfaces. Dialogflow in Fig. 4 is a tool for designing conversational experiences that are powered by AI, such as voice apps and chatbots. Intents are used by DialogFlow agents to categorize the user’s intentions. An end-intention users for one conversational turn is categorized by an intent. The following are components of a Basic Intent: These are a few examples of possible customer comments. For each intent, you can provide an action.
Parameters: when an intent matches at runtime, Dialogflow delivers the values extracted from the end-user expression as parameters. Responses: you specify the text, speech, or visual feedback to give the user. Contexts: Dialogflow uses context to properly match an intent with an end-user expression and manage the conversation. Events: instead of relying on what an end-user communicates, events allow you to invoke an intent based on something that has already occurred [4]. The Telegram Bot API is an HTTP-based interface made for developers interested in creating Telegram bots. The free, open-source Arduino IDE software is used to write code and run it on the device. Requests: an HTTP library for the Python programming language that aims to make HTTP requests simpler and more usable; the latest version is 2.27.1, released under the Apache License 2.0. Requests is one of the most popular Python libraries outside the standard library.
Fig. 4. Dialog Flow Agent Architecture
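As an illustration of the Telegram Bot API just mentioned, the following sketch builds a `sendMessage` call with the Requests library; the token string is a placeholder, and this is an assumption-laden example rather than the project's actual code:

```python
TOKEN = "<bot-token-from-BotFather>"   # placeholder; the real token stays secret
API = f"https://api.telegram.org/bot{TOKEN}"

def build_send_message(chat_id, text):
    """Prepare the URL and JSON payload of a Bot API sendMessage call."""
    return f"{API}/sendMessage", {"chat_id": chat_id, "text": text}

def send_message(chat_id, text):
    """Actually perform the HTTP call (requires the Requests library)."""
    import requests
    url, payload = build_send_message(chat_id, text)
    return requests.post(url, json=payload, timeout=10)
```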
3.3 Implementation
First, as we were using Telegram as our interface, we created a Telegram bot account with BotFather. We provided the username (@Housy housebot) and the name (Housy), and we were given an authentication token (Fig. 5). We built a DialogFlow agent and selected English as the medium of exchange. Then, using training phrases, we trained the entities and intents that we had created. The agent was able to respond to commands like turning appliances on and off. Nevertheless, because we wanted it to be able to converse like a human, we turned on the "Small Talk" feature, which responds to banter, as in Fig. 6.
Fig. 5. Flow chart of the system
Pyfirmata was installed, enabling us to link the Python IDE with the Arduino card: through Pyfirmata, we can control every pin of our Uno card, which was programmed using the Arduino IDE and C++ code. Installing the Telegram Bot API and DialogFlow Python libraries comes next. We then wired up the Arduino card with the appliances, as in Fig. 7. Since our software is object-oriented, we created a class called "house" that contains several methods that we utilize in the main program.
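The paper does not list the house class itself; the sketch below (hypothetical pin numbers and method names) shows the idea, with the board object injected so that it can be exercised without hardware. In the real setup the board would be a pyFirmata `Arduino` instance, whose `digital[pin].write()` interface drives the outputs:

```python
class House:
    """Maps named appliances to board pins and toggles them (sketch only)."""

    # Hypothetical pin assignment for the rooms described in Sect. 4.
    PINS = {"kitchen_lamp": 8, "bedroom_lamp": 9, "living_room_tv": 10, "oven": 11}

    def __init__(self, board):
        self.board = board            # e.g. pyfirmata.Arduino('/dev/ttyACM0')
        self.state = {name: 0 for name in self.PINS}

    def switch(self, appliance, on):
        """Write 1 or 0 to the appliance's pin and remember its state."""
        if appliance not in self.PINS:
            raise ValueError(f"no such appliance: {appliance}")
        self.board.digital[self.PINS[appliance]].write(1 if on else 0)
        self.state[appliance] = 1 if on else 0
        return self.state[appliance]
```

With real hardware this would be instantiated as `House(pyfirmata.Arduino('/dev/ttyACM0'))`.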
Fig. 6. Creating the Telegram bot with the Bot Father
Fig. 7. Home automation circuit
Fig. 8. Action flowchart
4 Result and Analysis
The system is currently customized for the kitchen, the bathroom, the bedroom, the living room, and the corridor. Each of these rooms has its own appliances:
• The corridor: a sound alarm.
• The bedroom: a lamp and a TV.
• The kitchen: a lamp and an oven.
• The bathroom: a lamp.
• The living room: a lamp, a TV, and a security camera.
• There is also a presence sensor at the doorsteps.
The model represents the TVs, lamps, and oven with LEDs to visualize the ON and OFF actions. The presence sensor, sound alarm, and security camera are all functional.
The input, which may be a voice message or a text message, is given to the model. Depending on this classification, it is handled either by the 'handle text message' function, which processes it with DialogFlow directly, or by the 'handle voice message' function, which converts the audio message to a wave file before processing it with DialogFlow. In both situations, the output is the response, which includes the state and the intent. The 'response' function, which takes the state and the intent as parameters, specifies which action has to be taken by the system and performs it when the intent is to conduct one of the planned actions. If the chatbot is instructed to carry out a task that does not exist, such as turning on the TV in the toilet, it refuses. The 'on door activate' function (Fig. 8) records the presence sensor's state change from false to true, snaps a picture with the security camera, and sends it to the user as a notification that an intruder has been found. The user can then either activate the sound alarm or simply disregard the warning by pressing the 'trigger alarm' or 'cancel' button that appears. When the user hits the 'trigger alarm' button, the system activates the alarm, and a 'stop alarm' button is displayed so that the user can deactivate the alert as needed. Each of these buttons was designed with the 'handle buttons click' function. The system is hardcoded so that the commands are only carried out when the appropriate user issues them; it can still carry on conversations with every other Telegram user, though. Responding to a discussion uses inputs for both text and voice. When instructed to switch on the lights, the chatbot prompted the user to specify the room, showing that it is aware of the context. When the kitchen was indicated, Housy confirmed with a message that the lights were on in the kitchen. Thanks to the small talk capability, the chatbot can also carry on a lighthearted discussion.
The presence sensor tripped, and a notification was sent to warn the user that there might be an intruder. Once the presence sensor is activated, a picture from the security camera is automatically delivered to the user. The chatbot then asks whether or not to sound the alarm.
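The dispatch behavior described above — prompting for a missing room, refusing impossible tasks, and otherwise acting on the intent — can be sketched as follows (room contents from the list in Sect. 4; the function and message strings are hypothetical, not the project's code):

```python
# Appliances actually present in each room (Sect. 4).
ROOMS = {
    "corridor": {"alarm"},
    "bedroom": {"lamp", "tv"},
    "kitchen": {"lamp", "oven"},
    "bathroom": {"lamp"},
    "living room": {"lamp", "tv", "camera"},
}

def response(intent, room, device):
    """Decide what the system should do for a matched intent (sketch)."""
    if room is None:
        return "Which room do you mean?"          # context prompt, as in the demo
    if device not in ROOMS.get(room, set()):
        return f"Sorry, there is no {device} in the {room}."   # refuse impossible tasks
    action = "on" if intent == "turn_on" else "off"
    return f"Turning {action} the {device} in the {room}."
```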
5 Conclusion
This research serves as an example of how AI can improve the comfort and security of human existence. The goal of this project was to deliver an effective and reasonably priced home automation system that uses several AI applications and is configured to accept commands via text and voice messages. As an improvement, the system can be modified to accommodate numerous houses: since anyone with a Telegram account can access Housy's account and communicate with it, a dictionary could link each user's account to the appropriate house, ensuring that when the system recognizes a user it carries out the actions in that particular house. The system could also be improved by using a Raspberry Pi, because it would not need a computer to function.
106
T. Hajji et al.
References 1. Inoue, M., Uemura, K., Minagawa, Y., et al.: A home automation system. IEEE Trans. Consum. Electron. 3, 516–527 (1985) 2. Gill, K., Yang, S.-H., Yao, F., et al.: A ZigBee-based home automation system. IEEE Trans. Consum. Electron. 55(2), 422–430 (2009) 3. Al-Ali, A.-R., Al-Rousan, M.: Java-based home automation system. IEEE Trans. Consum. Electron. 50(2), 498–504 (2004) 4. Malik, N., Bodwade, Y.: Literature review on home automation system. IJARCCE 6(3), 733–737 (2017) 5. Recek, B.: Razvoj wi-fi vhodno/izhodnih modulov za sisteme hišne avtomatizacije. Doctoral thesis (2017) 6. Machul, D.P.: Building automation demonstrator based on the Domoticz environment. Doctoral thesis. Instytut Elektrotechniki Teoretycznej i Systemów Informacyjno-Pomiarowych (2021) 7. Furmańczuk, K.: Implementation of an automation system for an apartment based on Domoticz system. Doctoral thesis. Instytut Sterowania i Elektroniki Przemysłowej (2022) 8. Al-Sarawi, S., Anbar, M., Alieyan, K., et al.: Internet of Things (IoT) communication protocols. In: 2017 8th International Conference on Information Technology (ICIT), pp. 685–690. IEEE (2017) 9. Deshmukh, S., Sonavane, S.S.: Security protocols for Internet of Things: a survey. In: 2017 International Conference on Nextgen Electronic Technologies: Silicon to Software (ICNETS2), pp. 71–74. IEEE (2017) 10. Attak, H., et al.: Shield: securing against intruders and other threats through an NFV-enabled environment. In: Zhu, S., Scott-Hayward, S., Jacquin, L., Hill, R. (eds.) Guide to Security in SDN and NFV. Computer Communications and Networks, pp. 197–225. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64653-4_8 11. Nguyen, H.Q.: Testing Applications on the Web: Test Planning for Internet-Based Systems. John Wiley & Sons (2001) 12. Lalwani, T., Bhalotia, S., Pal, A., et al.: Implementation of a chatbot system using AI and NLP. Int. J. Innov. Res. Comput. Sci. Technol. (IJIRCST) 6(3) (2018) 13. 
Tarik, H., Kodad, M., Miloud, J.E.: Digital movements images restoring by artificial neural networks. Comput. Sci. Eng. 10, 36–42 (2014) 14. Hajji, T., El Jasouli, S.Y., Mbarki, J., Jaara, E.M.: Microfinance risk analysis using the business intelligence. In: 2016 4th IEEE International Colloquium on Information Science and Technology (CiSt), pp. 675–680. IEEE (2016) 15. Tarik, H., Jamil, O.M.: Weather data for the prevention of agricultural production with convolutional neural networks. In: 2019 International Conference on Wireless Technologies, Embedded and Intelligent Systems (WITS), pp. 1–6. IEEE (2019) 16. Ouerdi, N., Hajji, T., Palisse, A., Lanet, J.L., Azizi, A.: Classification of ransomware based on artificial neural networks. In: Rocha, Á., Serrhini, M. (eds.) Information Systems and Technologies to Support Learning. EMENA-ISTL 2018. Smart Innovation, Systems and Technologies, vol. 111, pp. 384–392. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-03577-8_43 17. Hajji, T., Ouerdi, N., Azizi, A., Azizi, M.: EMV cards vulnerabilities detection using deterministic finite automaton. Procedia Comput. Sci. 127, 531–538 (2018) 18. Tarik, H., Mohammed, O.J.: Big data analytics and artificial intelligence serving agriculture. In: Ezziyyani, M. (ed.) Advanced Intelligent Systems for Sustainable Development (AI2SD’2019). AI2SD 2019. Advances in Intelligent Systems and Computing, vol. 1103, pp. 57–65. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-36664-3_7
Enhancing Comfort and Security: A Chatbot-Based Home Automation System
107
19. Tarik, H., Tawfik, M., Youssef, D., Simohammed, S., Mohammed, O.J., Miloud, J.E.: Towards an improved CNN architecture for brain tumor classification. In: Serrhini, M., Silva, C., Aljahdali, S. (eds.) Innovation in Information Systems and Technologies to Support Learning Research. EMENA-ISTL 2019. Learning and Analytics in Intelligent Systems, vol. 7, pp. 224–234. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-36778-7_24 20. Hajji, T., Hassani, A.A., Jamil, M.O.: Incidents prediction in road junctions using artificial neural networks. In: IOP Conference Series: Materials Science and Engineering, vol. 353, no. 1, p. 012017. IOP Publishing (2018) 21. Douzi, Y., Kannouf, N., Hajji, T., Boukhana, T., Benabdellah, M., Azizi, A.: Recognition textures of the tumors of the medical pictures by neural networks. J. Eng. Appl. Sci. 13, 4020–4024 (2018) 22. Benabdellah, M., Azizi, A., Masrour, T.: Classification and watermarking of brain tumor using artificial and convolutional neural networks. Artif. Intell. Ind. Appl. Artif. Intell. Tech. Cyber-Phys. Digital Twin Syst. Eng. Appl. 144, 61 (2020) 23. Hajji, T., Masrour, T., Ouazzani Jamil, M., Iathriouan, Z., Faquir, S., Jaara, E.: Distributed and embedded system to control traffic collision based on artificial intelligence. In: Masrour, T., Cherrafi, A., El Hassani, I. (eds.) Artificial Intelligence and Industrial Applications: Smart Operation Management, vol. 1193, pp. 173–183. Springer International Publishing (2021). https://doi.org/10.1007/978-3-030-51186-9_12 24. Feng, H., Fawaz, K., Shin, K.G.: Continuous authentication for voice assistants. In: Proceedings of the 23rd Annual International Conference on Mobile Computing and Networking, pp. 343–355 (2017) 25. Kang, Y., Cai, Z., Tan, C.-W., et al.: Natural language processing (NLP) in management research: a literature review. J. Manag. Anal. 7(2), 139–172 (2020) 26. 
Hajji, T., Loukili, R., El Hassani, I., Masrour, T.: Optimizations of distributed computing processes on apache spark platform. IAENG Int. J. Comput. Sci. 50(2), 422–433 (2023)
The Impact of Systolic Blood Pressure Level and Comparative Study for Predicting Cardiovascular Diseases Kenza Douifir(B) and Naoual Chaouni Benabdellah Software Project Management Research Team, ENSIAS, Mohammed V University of Rabat, Rabat, Morocco {kenza_douifir,naoual.chaouni_benabdellah}@ensias.um5.ac.ma
Abstract. Data mining is a powerful tool applied to various domains such as the e-health field. For instance, the prediction of cardiovascular diseases is a subfield where data mining has helped automate diagnosis and may sometimes be applied in the treatment stage of the disease. This paper aims to identify the most efficient classifier for a medical decision support system by comparing four classification algorithms (K-Nearest Neighbors, Support Vector Machines, Multilayer Perceptron, and the decision tree algorithm C4.5). These single techniques were then combined for better results. Indeed, it was found that ensemble learning (stacking and bagging) returns better results, based on training on two types of datasets. In addition, it is well known that blood pressure is one of the significant factors in cardiovascular diseases, as is the BMI (body mass index) for obesity, among others. The purpose is to specify the levels of the factors available in our database that most impact the prediction. The data are processed for systolic blood pressure. It was found that a systolic blood pressure above 129.5 mm Hg strongly impacts the development of a cardiovascular event. Below this value, other attributes are considered, such as age and LDL cholesterol level. Keywords: cardiovascular disease · K-Nearest Neighbors · Support Vector Machine · Multilayer Perceptron · Decision Tree algorithm · ensemble learning · systolic blood pressure
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 108–122, 2023. https://doi.org/10.1007/978-3-031-43520-1_10
1 Introduction The healthcare industry today generates large amounts of complex data about patients, hospital resources, and disease diagnoses. These large amounts of data are a key resource to be processed and analyzed for knowledge extraction, enabling support for decision making. Currently, doctors control most of the patient interaction and diagnosis process. A computer program can access vast quantities of information from huge databases of medical records for specific areas or even at a global level, and since it is a program, it can make consistent, unbiased decisions without other personal factors contributing to them. Looking at previous trends, it is possible to detect some conditions that a
doctor would most likely not have been exposed to or guessed at first sight. Doctors can use these programs to make the best decisions by supporting their choices with solid historical data. Data mining is a powerful technique to help doctors detect heart diseases and make the appropriate diagnosis. It combines statistical analysis, machine learning, and database technology to mine hidden patterns and relationships from large databases. In this paper, data mining classification techniques were compared and evaluated based on three performance criteria, and each technique was compared with the ensemble learning method. Authors in [1] applied four classification techniques (SVM, KNN, ANN, and Random Forest) to a real-time cardiovascular dataset collected from Jubilee Mission Medical College and Research Institute, Thrissur. The result is that ANN is the most efficient, with an accuracy of 92.21%. They should have added other performance metrics, such as sensitivity and specificity, to obtain more precise results. In addition, a study investigated the impact of the depth of neural networks on accuracy in the classification of heart disease patients. A comparison between the accuracy of a feed-forward network classifier and that of a deep feed-forward network classifier was carried out. An intelligent deep learning model was developed and trained with stochastic gradient descent using the backpropagation algorithm. The proposed model achieves a balanced accuracy of 91.43% and a high overall predictive value of 94.12% [2]. Other works compared heart disease prediction systems using data mining techniques, such as [3]. This paper compared two classification tools, MATLAB and WEKA, which have the same speed, to predict the presence of heart disease. The authors used the four SVM kernels (linear, polynomial, RBF, and sigmoid) and found that the linear one outperforms the others. 
In general, the SVM algorithms perform better in MATLAB than in WEKA. Concerning the Decision Tree, it has better accuracy in Weka (67.7%). Another work focused on several data mining techniques such as association, classification, and clustering. The authors first applied a genetic algorithm to obtain the optimal subset of attributes and reduce the size of the data, then found that the Decision Tree gives the best prediction using 14 attributes. This paper did not give the performance rates of each algorithm [4]. Authors in [5] used a heart disease dataset to test the utility of ML approaches to heart disease prediction, finding that three classification algorithms (KNN, Random Forest, and Decision Tree) performed extremely well, with 100% accuracy. In addition, feature importance scores were estimated for all the applied algorithms except MLP and KNN, and the features were ranked based on these scores. Furthermore, authors in [6] focused on the Indian population to build a machine learning based heart disease prediction system. The accuracy attained in this study is 93.8%. The prediction system developed in this research uses 13 clinical parameters and identifies a person's risk of having heart disease. Compared with the studies done so far, this study was carried out on the Indian population, and potential risk factors such as high body weight, lack of exercise, psychological stress, family history, and smoking and alcohol consumption habits were considered. Another work [7] carried out a mapping study of papers related to cardiology using data mining techniques; 142 empirical studies were selected after applying a selection process, with the aim of answering some research questions. The
results showed an increase in the work carried out on this topic of cardiology during the last decade. Around half of the selected studies used historical datasets for the evaluation of the models developed. Moreover, the most frequent medical task for which studies were conducted was the diagnosis of cardiovascular diseases. Many others proved that cholesterol and the BMI for obesity are two significant factors to consider in the prediction [8]. This paper is organized as follows. Section 1 is an introduction with an overview of some related works. Section 2 presents the methodology considered in this work. Section 3 describes the data preparation and the pattern of data obtained. Section 4 summarizes the results of the process and discusses them. Section 5 gives a conclusion and presents some future works.
2 Methodology Many works have been conducted to classify cardiovascular disease data in order to predict human sickness; for example, hybrid classifiers such as the Decision Tree Bagging Method (DTBM), Random Forest Bagging Method (RFBM), K-Nearest Neighbors Bagging Method (KNNBM), AdaBoost Boosting Method (ABBM), and Gradient Boosting Boosting Method (GBBM) have been developed by integrating the traditional classifiers with the bagging and boosting methods used in the training process [9]. In the present work, four algorithms are chosen. The K-Nearest Neighbors algorithm is a non-parametric classification method in machine learning. It works in a very simple way by taking into account the distances to the known data points: after gathering the K nearest neighbors, a majority vote is taken and the unknown data point is classified into that category. In other words, it estimates how likely a specific data point is to be a member of one group or the other depending on the group of the data points nearest to it [10]. A decision tree typically starts with a single node that branches into possible outcomes. Each of those outcomes leads to additional nodes, which branch off into other possibilities, giving it a tree-like shape. The path from the root to a leaf forms a classification rule. The Support Vector Machine is a supervised machine learning algorithm in which a hyperplane is selected to best separate the points in the input variable space by their class. The Multilayer Perceptron is a type of artificial neural network organized in several layers, in which information flows from the input layer to the output layer only; it is, therefore, a feed-forward network. The stacking method consists of combining the outputs of several learners applied to the same data, using a meta-classifier. The bagging method, or bootstrap aggregation, is designed to improve the stability and accuracy of machine learning algorithms; it also reduces variance and helps to avoid overfitting. 
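The bagging idea described above can be illustrated with a minimal pure-Python sketch. The base learner here is a simple one-dimensional threshold "stump", chosen only to keep the sketch self-contained; the paper's base learners are KNN, J48, SVM, and MLP, and all names and data below are illustrative.

```python
import random

def train_stump(xs, ys):
    """Pick the threshold that best separates the two classes (class = x > t)."""
    best_t, best_err = xs[0], len(xs) + 1
    for t in xs:
        err = sum((x > t) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_fit(xs, ys, n_learners=25, seed=0):
    """Train each stump on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(train_stump([xs[i] for i in idx],
                                  [ys[i] for i in idx]))
    return stumps

def bagging_predict(stumps, x):
    """Aggregate the base learners by majority vote."""
    votes = sum(x > t for t in stumps)
    return int(votes * 2 > len(stumps))

xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = bagging_fit(xs, ys)
print(bagging_predict(model, 12), bagging_predict(model, 2))
```

Averaging many learners trained on resampled data is what reduces the variance mentioned above: individual stumps may pick slightly different thresholds, but the vote is stable.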
In the Borda count method, points are assigned to candidates based on their ranking (1 point for last, 2 points for second-to-last, and so on). The point values from all ballots are totaled, and the candidate with the highest total is the winner. This method allows us to select the best technique based on three criteria. Moreover, Grid Search is applied as a hyperparameter optimization method that allows testing a series of parameters and comparing performance
to derive the best setting. Besides, the chosen classification tool is Weka, a Java-developed application for data mining used to process and analyze data. Many machine learning methods are integrated, and all the usual pre-processing, treatment, and visualization techniques are available. The four classification techniques previously mentioned and the two ensemble learning methods are applied to two datasets. All these classification techniques were first applied using the Weka default parameters and then using Grid Search. The Weka software provides the confusion matrix, from which three performance criteria are computed: accuracy, specificity, and precision. The objective is to compare the performances of the different techniques based on these three criteria. The confusion matrix provides the True Positive and False Positive counts (predicted positive results) and the False Negative and True Negative counts (predicted negative results). Accuracy is the proportion of correct classifications (true positives and true negatives) out of the overall number of cases. Precision is the proportion of true positive classifications (TP) among the cases predicted as positive. Specificity shows how well a binary classification test correctly identifies the negative cases.

Precision = TP / (TP + FP)  (1)

Accuracy = (TN + TP) / (TN + TP + FP + FN)  (2)

Specificity = TN / (TN + FP)  (3)
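The three criteria of Eqs. (1)–(3) can be computed directly from the four confusion-matrix counts; the counts in the example below are illustrative.

```python
# Performance criteria of Eqs. (1)-(3), from confusion-matrix counts.
def precision(tp, fp):
    return tp / (tp + fp)

def accuracy(tp, tn, fp, fn):
    return (tn + tp) / (tn + tp + fp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

# Example confusion matrix: 40 TP, 50 TN, 10 FP, 0 FN.
print(precision(40, 10), accuracy(40, 50, 10, 0), specificity(50, 10))
# -> 0.8 0.9 0.8333...
```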
To measure the stability of the performance of the proposed model, the data is divided into a training set and a testing set with 10-fold cross-validation, which is the method
Fig. 1. 10-fold cross-validation process
used by default in Weka. This method consists of dividing the original sample into 10 subsamples, then selecting one of the 10 as the validation set while the other 9 constitute the training set (Fig. 1).
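The 10-fold scheme of Fig. 1 can be sketched as follows; this is a simplified sequential split (Weka additionally shuffles, and stratifies the folds for classification).

```python
# Sketch of 10-fold cross-validation: the data is cut into 10 folds; each
# fold serves once as the validation set while the remaining 9 form the
# training set.
def ten_fold_splits(n_samples, k=10):
    indices = list(range(n_samples))
    fold_size = n_samples // k
    splits = []
    for i in range(k):
        val = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        splits.append((train, val))
    return splits

splits = ten_fold_splits(100)
train, val = splits[0]
print(len(splits), len(train), len(val))  # -> 10 90 10
```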
3 Data Description and Data Preprocessing
3.1 Data Description
The first database used in this work was obtained from the Cleveland Clinic Foundation. The dataset consists of 303 records. This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them, shown in Table 1. This dataset makes it possible to know the nature of the disease if it is present. Indeed, with the attributes concerning the electrocardiogram, and by evaluating the values of the ST-T waves, it is possible to detect whether it is a coronary disease or simple arterial hypertension. The second cardiovascular disease detection dataset has 13 attributes. There are 3 types of input features: objective input, which is factual information; examination input, which is the result of a medical examination; and subjective input, which expresses information given by the patient. All of the dataset values were collected at the moment of medical examination. The dataset is available at https://www.kaggle.com/datasets/sulianova/cardiovascular-disease-dataset.

Table 1. Attribute description of the first used dataset

No. | Attribute name | Attribute description
1 | Age | age in years
2 | Sex | sex (1 = male; 0 = female)
3 | Cp | chest pain type (1: typical angina; 2: atypical angina; 3: non-anginal pain; 4: asymptomatic)
4 | Trestbps | resting blood pressure (in mm Hg on admission to the hospital)
5 | Chol | serum cholesterol in mg/dl
6 | Fbs | fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7 | Restecg | resting electrocardiographic results (0: normal; 1: having ST-T wave abnormality; 2: showing probable or definite left ventricular hypertrophy by Estes’ criteria)
8 | Thalach | maximum heart rate achieved
9 | Exang | exercise induced angina (1 = yes; 0 = no)
10 | Oldpeak | ST depression induced by exercise relative to rest
11 | Slope | the slope of the peak exercise ST segment
12 | Ca | number of major vessels (0–3) colored by fluoroscopy
13 | Thal | 3 = normal; 6 = fixed defect; 7 = reversible defect
14 | Num | diagnosis of heart disease (angiographic disease status; 0: healthy; 1: presence of a heart disease)
Table 2. Attribute description of the second used dataset

No. | Attribute name | Attribute description
1 | Id | id of the patient
2 | Age | age in days
3 | Height | height in cm
4 | Weight | weight in kg
5 | Gender | gender (categorical code)
6 | ap_hi | systolic blood pressure
7 | ap_lo | diastolic blood pressure
8 | Cholesterol | 1: normal; 2: above normal; 3: well above normal
9 | Gluc | glucose (1: normal; 2: above normal; 3: well above normal)
10 | Smoke | smoking (binary)
11 | Alco | alcohol intake (binary)
12 | Active | physical activity (binary)
13 | Cardio | presence or absence of cardiovascular disease
The database in Table 2 contains the majority of cardiovascular risk factors that allow physicians to detect the presence of heart disease. The absence of a family history of
cardiovascular accidents is noted in this dataset. These factors are defined as clinical or biological conditions that increase the risk of a given cardiovascular event.
3.2 Data Integration and Merging Process
Since both datasets have significant attributes in common, namely age, gender, cholesterol level, systolic blood pressure, and blood sugar level, a new dataset containing these attributes was created by merging the two. Table 3 summarizes the chosen factors together with the data integration process carried out to homogenize the data and fuse the records into a third, new dataset.

Table 3. Attributes of the new third dataset

Attribute name in the first dataset | Attribute name in the second dataset | Attribute name in the third dataset | Description
Age | Age | ageindays | age of the patient in days
Sex | Gender | gender | sex (1 = male; 0 = female)
Trestbps | ap_hi | bpsys | systolic blood pressure (in mm Hg on admission to the hospital)
Chol | Cholesterol | chol | 1: normal; 2: above normal
Fbs | Gluc | gluc | glucose (1: normal; 2: above normal)
Num | Cardio | cardio | diagnosis of heart disease (0: healthy; 1: presence of a heart disease)
The age in the first dataset is given in years, while in the second dataset it is given in days; the decision is to convert all ages to days. To assess the level of risk for heart disease, doctors order a lipid panel, which is a blood test that targets the fatty compounds in the blood: cholesterol and triglycerides. A lipid panel is an examination of the different lipid compounds present in the blood, namely: • Total cholesterol (a fatty substance that is a component of cell membranes and is used in the synthesis of steroid hormones);
• LDL cholesterol, which is considered the “bad” cholesterol. This fatty substance is linked to transporters, the LDLs (low-density lipoproteins), which travel from the liver to the rest of the body;
• HDL cholesterol, known as the “good” cholesterol. It is linked to HDLs (high-density lipoproteins), which circulate towards the liver, where the cholesterol is stored;
• Triglycerides (a type of fat that constitutes an important energy reserve and comes essentially from sugars and alcohol ingested in large quantities).
The two datasets in our possession contain total cholesterol as an attribute. A total cholesterol level lower than 5.2 mmol/L, which is equal to 201 mg/dl, is considered normal. This information was used to homogenize the cholesterol values in the database and categorize them as in the second dataset. What will be considered is the numerical data related to each field, and the prediction will be compared to the overall results to analyze the impact.
3.3 Data Preprocessing
Concerning the first dataset, all the attributes are significant and give a major meaning in the prediction of coronary diseases; we therefore kept them all in this work. The second dataset holds no missing data. The Weka tool is used to count the missing values, as shown in Fig. 2 for the cardio attribute.
Fig. 2. Number of missing values concerning the cardio attribute in the second dataset
The “id” attribute is of little use in this database for predicting heart disease; we therefore prefer to delete it. A database whose 12 remaining attributes are all related to cardiovascular risk factors is obtained.
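The homogenization steps described above (ages converted to days, total cholesterol categorized with the 201 mg/dl cutoff, the uninformative "id" column dropped) can be sketched as follows. The record layout and helper names are illustrative, and the 365 days/year factor and the fbs-to-gluc mapping are our assumptions, not stated in the paper.

```python
def harmonize_cleveland(record):
    """Map a first-dataset (Cleveland) record onto the merged schema of Table 3."""
    return {
        "ageindays": record["age"] * 365,          # years -> days (assumed factor)
        "gender": record["sex"],
        "bpsys": record["trestbps"],
        "chol": 1 if record["chol"] < 201 else 2,  # 201 mg/dl = 5.2 mmol/L cutoff
        "gluc": 1 if record["fbs"] == 0 else 2,    # assumed binary-to-category map
        "cardio": record["num"],
    }

def harmonize_kaggle(record):
    """Second-dataset records already use days and categories; drop 'id'."""
    return {k: v for k, v in record.items() if k != "id"}

row = harmonize_cleveland(
    {"age": 63, "sex": 1, "trestbps": 145, "chol": 233, "fbs": 1, "num": 0})
print(row["ageindays"], row["chol"])  # -> 22995 2
```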
4 Results and Discussion 4.1 Comparison Between Single Techniques and Ensemble Learning Methods After applying the 19 classification techniques (the four single techniques, plus bagging and stacking using all the combinations of the four algorithms), first with the Weka default parameters and then using Grid Search, their performances were calculated. Figures 3, 4, 5 and 6 represent the performances of the different classification techniques applied to both datasets. For the first one, we notice that all the
Fig. 3. Performances of the 19 classifiers using WEKA default parameters applied to the first dataset
Fig. 4. Performances of the 19 classifiers using WEKA default parameters applied to the second dataset
techniques have a performance rate greater than 0.7 that does not exceed 0.85 when using the Weka default parameters. Concerning the second dataset, performances are between 0.698 and 0.87. The Grid Search technique gave different results, with performances higher than the first ones. Table 4 presents the ranking of the different classification techniques applied to the two datasets, based on three criteria (accuracy, precision, specificity).
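The grid-search idea can be sketched as an exhaustive loop over a small parameter grid. The scoring function below is a stand-in for running a classifier under 10-fold cross-validation, and the grid values are illustrative, not the paper's settings.

```python
from itertools import product

def grid_search(grid, score):
    """Evaluate every parameter combination and keep the best-scoring one."""
    best_params, best_score = None, float("-inf")
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Illustrative grid for a KNN-like classifier.
grid = {"k": [1, 3, 5, 7], "weighting": ["uniform", "distance"]}
# Stand-in score: pretend k = 5 with distance weighting performs best.
score = lambda p: 0.8 + (0.05 if p["k"] == 5 else 0) + \
                  (0.02 if p["weighting"] == "distance" else 0)
best, acc = grid_search(grid, score)
print(best, round(acc, 2))  # -> {'k': 5, 'weighting': 'distance'} 0.87
```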
Fig. 5. Performances of the 19 classifiers using GRID SEARCH applied to the first dataset
Fig. 6. Performances of the 19 classifiers using GRID SEARCH applied to the second dataset
Table 4. Classifiers ranked at the top five positions of the Borda count

Dataset 1 (Weka default parameters) | Dataset 1 (Grid Search) | Dataset 2 (Weka default parameters) | Dataset 2 (Grid Search)
Rank | Classifier | Rank | Classifier | Rank | Classifier | Rank | Classifier
1 | KNN/SVM | 1 | KNN/SVM | 1 | SVM | 1 | SVM/MLP
2 | SVM | 1 | SVM | 2 | BAGSVM | 2 | SVM
3 | J48/SVM | 2 | J48/SVM | 3 | SVM/KNN/J48 | 3 | SVM/J48/MLP/KNN
4 | MLP/SVM/J48 | 3 | BAGSVM | 4 | KNN/SVM | 4 | BAGSVM
5 | BAGSVM | 4 | MLP/SVM/J48 | 4 | SVM/J48 | 5 | SVM/KNN
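The Borda count used to produce rankings such as those in Table 4 can be sketched as follows; the three example ballots below are illustrative, not the paper's measured rankings.

```python
# Borda count: on each "ballot" (here, a criterion's ranking of classifiers)
# the last candidate gets 1 point, the second-to-last 2, and so on; points
# are summed over ballots and the highest total wins.
def borda(ballots):
    scores = {}
    for ranking in ballots:                 # ranking: best ... worst
        n = len(ranking)
        for pos, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (n - pos)
    return max(scores, key=scores.get), scores

accuracy_rank    = ["KNN/SVM", "SVM", "J48", "MLP"]
precision_rank   = ["SVM", "KNN/SVM", "MLP", "J48"]
specificity_rank = ["KNN/SVM", "SVM", "MLP", "J48"]
winner, scores = borda([accuracy_rank, precision_rank, specificity_rank])
print(winner, scores)  # -> KNN/SVM wins with 11 points
```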
For the stacking method, the four techniques (KNN, SVM, J48, and MLP) were combined two by two, three by three, and all four together. The SVM easily handles
nonlinear data points. Based on the results in this paper, the combinations of SVM with the other algorithms gave the best performances. For the first dataset, the best classifier remains the combination of KNN and SVM using the stacking method, either with the Weka default settings or using Grid Search, with a rate of 84% in accuracy and precision. Concerning the second dataset, the algorithm that outperforms the others is the SVM when using the Weka default parameters (with an accuracy of 86%), and the combination of SVM and KNN when using Grid Search (with an accuracy of 82%). In addition, in this work we notice that heterogeneous ensembles perform better than homogeneous ones. A homogeneous ensemble is a set of algorithms of the same type applied to different sets of data, while a heterogeneous ensemble combines, using a meta-classifier, several algorithms applied to the same data. It is found that no learner is absolutely better than another; there is no single good learner for all problems. If we have multiple classifiers, even weak ones, they may make independent errors, so learners that differ may each work well on different parts of the data, and by combining these weak learners it is possible to obtain a strong learner. Concerning bias and variance, ensembling offers the possibility of achieving both low bias and low variance. Bagging is known for reducing variance; for instance, KNN is a lazy and stable learner, which explains why bagging is not that efficient for this classifier.
4.2 The Impact of Systolic Blood Pressure Level in Predicting Cardiovascular Diseases
In the present work, in addition to applying the techniques to each dataset, it is also relevant to process the factors that the datasets have in common. 
After merging the two datasets to generate a third database whose attributes are those in common, this part aims to define a threshold above which a patient is more likely to be at risk of contracting a form of coronary disease, through one of the machine learning techniques. For this purpose, the Decision Tree algorithm is the most suitable to achieve this distinction. In order to generate clear information from the tree, its depth was set to 4, as shown in Fig. 7.
Fig. 7. Decision tree parameters
Fig. 8. Visualization of the top part of the decision tree
Note that gini here is the Gini impurity index displayed at each node: it measures how mixed the classes are at that node, with 0 for a pure node. According to the tree represented in Figs. 8 and 9, the threshold of systolic blood pressure above which a given patient is more likely to develop a cardiovascular disease is estimated at 129.5. Strictly above this value, the risk is high. Below it, a patient is still classified as ill (cardio class = 1) through further splits on the remaining attributes, in particular the age in days (ageindays) and the cholesterol level. This agrees with the values used by physicians. According to the US Centers for Disease Control, uncontrolled HTN (hypertension) rates are rising in the US, with nearly half of US adults (108 million, or 45%) having HTN, defined as a systolic BP ≥ 130 mm Hg or a diastolic BP ≥ 80 mm Hg, or taking medication for hypertension [11]. Furthermore, in order to determine the prevalence of hypertension (HT) and other cardiovascular risk factors, a study was conducted among the population of Brazzaville in Africa [12]. It was found that, for patients suffering from hypertension, the mean systolic blood pressure (BP) was 134.5 mm Hg, 131.8 mm Hg, and 130.6 mm Hg for measurements 1, 2, and 3, respectively. Another study aimed to determine the 24-h mean systolic BP (SBP) threshold at which cardiovascular risk increases in individuals intensively treated to control their cardiovascular risk factors in Israel [13]. The blood pressure threshold on 24-h ABPM (ambulatory blood pressure measurement) was established as 130/80 mm Hg for 24 h, 135/85 mm Hg for daytime, and 120/75 mm Hg for nighttime.
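How a depth-limited decision tree arrives at a split such as the 129.5 mm Hg threshold can be illustrated with a minimal Gini-based split search. The ten (bpsys, cardio) pairs below are synthetic, chosen so that the best split reproduces the paper's threshold; they are not the study's data.

```python
def gini(labels):
    """Gini impurity of a 0/1 label list (0 for a pure node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) ** 2

def best_split(xs, ys):
    """Score every candidate threshold by the weighted Gini impurity of
    the two resulting groups and keep the lowest-impurity threshold."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best_t, best_g = None, float("inf")
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2   # midpoint threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        g = (len(left) * gini(left) + len(right) * gini(right)) / n
        if g < best_g:
            best_t, best_g = t, g
    return best_t

bpsys  = [110, 115, 120, 125, 128, 131, 135, 140, 145, 150]
cardio = [0,   0,   0,   0,   0,   1,   1,   1,   1,   1]
print(best_split(bpsys, cardio))  # -> 129.5
```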
Fig. 9. Visualization of the bottom part of the decision tree
5 Conclusion and Future Works Considering the importance of cardiovascular diseases and the high mortality rate they cause, it is compelling to carry out studies to build intelligent systems that give doctors an effective solution to properly diagnose diseases and find a convenient treatment. Thanks to the second dataset used, it is possible to predict the presence or absence of heart disease from the cardiovascular risk factors known in the medical world, while the first dataset allows knowing the type of disease by analyzing the values related to the electrocardiogram. Indeed, in this work, a comparative study was carried out between
the different classification techniques, in particular the single techniques (KNN, J48, MLP, and SVM) and the ensemble learning algorithms (bagging and stacking). As a result of the comparison between the 19 classification techniques, the combination of SVM with various other algorithms proved to be the most efficient, compared to the other combinations of single techniques. As future work, the same strategy can be applied to a real database including all the cardiovascular risk factors as attributes, adding those relating to the ultrasounds made by the doctors and the body mass index. This work was applied to the overall factors in both datasets. The chosen technique (Decision Tree) was then applied to the common factors, such as systolic blood pressure. It would be possible to proceed factor by factor without merging the data, but the purpose of the fusion is to process a more significant amount of data for better results. It is found that the level of systolic blood pressure above which the risk of getting a heart disease is high is estimated at 129.5 mm Hg. As an extension of this work, the same approach may be applied to other factors, and a comparison in terms of precision, accuracy, and specificity will allow us to order the impact of each factor at our disposal.
References

1. Shaji, S.P.: Prediction and diagnosis of heart disease patients using data mining technique. In: International Conference on Communication and Signal Processing (ICCSP), pp. 0848–0852 (2019). https://doi.org/10.1109/ICCSP.2019.8697977
2. Kaddour, A.A., Elyassami, S.: Implementation of an incremental deep learning model for survival prediction of cardiovascular patients. Int. J. Artif. Intell. 10(1), 101–109 (2021). https://doi.org/10.11591/ijai.v10.i1.pp101-109
3. Erdogmus, P., Ekiz, S.: Comparative study of heart disease classification. In: EBBT (2017). https://doi.org/10.1109/EBBT.2017.7956761
4. Vivek, E.M., et al.: Heart disease diagnosis using data mining technique. In: International Conference on Electronics, Communication and Aerospace Technology (ICECA) (2017). https://doi.org/10.1109/ICECA.2017.8203643
5. Paul, B.K., Ahmed, K., Ali, M.M.: Heart disease prediction using supervised machine learning algorithms: performance analysis and comparison. Comput. Biol. Med. 136 (2021). https://doi.org/10.1016/j.compbiomed.2021.104672
6. Venkateswarlu, B., Maini, B., Marwaha, D., Maini, E.: Machine learning based heart disease prediction system for Indian population: an exploratory study done in South India. Med. J. Armed Forces India, 0377–1237 (2020). https://doi.org/10.1016/j.mjafi.2020.10.013
7. Kadi, I., Fernandez-Aleman, J.L., Idri, A.: Systematic mapping study of datamining-based empirical studies in cardiology. Health Inform. J. 25(3), 741–770 (2019). https://doi.org/10.1177/1460458217717636
8. da Silva, M.A.M., et al.: Frequency of cardiovascular risk factors. 4(59) (2013). https://doi.org/10.1016/j.ramb.2013.02.009
9. Ghosh, P.: Efficient prediction of cardiovascular disease using machine learning algorithms with relief and LASSO feature selection techniques (2021). https://doi.org/10.1109/ACCESS.2017
10. Boyd, C.: Machine learning quantitation of cardiovascular and cerebrovascular disease: a systematic review of clinical applications (2021). https://doi.org/10.3390/diagnostics11030551
K. Douifir and N. C. Benabdellah
11. German, C., Agarwala, A., Satish, P., Iluyomade, A., Bays, H.E.: Ten things to know about ten cardiovascular disease risk factors – 2022 (2022). https://doi.org/10.1016/j.ajpc.2022.100342
12. Bakekolo, R.P., et al.: Prevalence of arterial hypertension and others cardiovascular risk factors and their relationship with variations of systolic and diastolic blood pressure at Brazzaville (Republic of the Congo). Arch. Cardiovasc. Dis. Suppl. 12(1) (2019). https://doi.org/10.1016/j.acvdsp.2019.09.390
13. Leshno, M., Shlomai, G., Leibowitz, A., Sharabi, Y., Grossman, E., Rock, W.: The association between ambulatory systolic blood pressure and cardiovascular events in a selected population with intensive control of cardiovascular risk factors. J. Am. Soc. Hypertens. 8(7) (2014). https://doi.org/10.1016/j.jash.2014.03.331
Contribution to Solving the Cover Set Scheduling Problem and Maximizing Wireless Sensor Networks Lifetime Using an Adapted Genetic Algorithm

Ibtissam Larhlimi(B), Maryem Lachgar, Hicham Ouchitachen, Anouar Darif, and Hicham Mouncif

Laboratory of Innovation in Mathematics, Applications and Information Technology, Polydisciplinary Faculty, University Sultan Moulay Slimane, Beni-Mellal, Morocco
[email protected]
Abstract. Wireless sensor networks (WSNs) are a rapidly developing field of study with many uses. Sensor nodes typically have restricted energy in utility applications. To successfully increase the network lifetime, energy consumption must be managed. This paper aims at scheduling sensor availability and activity to enhance network coverage lifetime. The typical strategy is to consider subsets of nodes that continually cover each target. These subsets, often referred to as cover sets, are then shifted to active mode while the rest are in low-power or sleep mode. This problem is NP-hard and is known as the Maximum Coverage Set Scheduling Problem (MCSS). In this research, a genetic algorithm is adapted to prolong WSN lifetime. The proposed method was compared with the Greedy-MCSS and MCSSA algorithms. The simulation results demonstrate the importance and beneficial effects of using a genetic algorithm in our solution.

Keywords: Wireless Sensor Network · Network Lifetime · Coverage · Cover Set Scheduling · Genetic Algorithm

1 Introduction
Nowadays, innovation is progressing at a rapid pace. At first, sensors were used to measure and monitor a physical quantity locally. With the advent of the Internet and research into wireless technologies [1], these sensors were equipped with connectivity, giving rise to wireless sensor networks. The IoT is considered the next step in the evolution of the Internet. It can connect and communicate via the Internet with almost any heterogeneous object in the real world to facilitate information sharing, as shown in Fig. 1. Wireless sensor networks can collect, analyze, and deploy data, resulting in valuable information and knowledge from the Internet of Things.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 123–133, 2023. https://doi.org/10.1007/978-3-031-43520-1_11

IoT infrastructure depends heavily on wireless sensor
Fig. 1. WSN architecture.
networks (WSNs). In this field, the major challenge is optimizing energy consumption [2]. The energy optimization approach we consider is based on alternating inactivity and activity. In short, this paper proposes a method to form subsets of sensor nodes that will be successively activated for defined durations while the other nodes remain on standby, maximizing the total active duration. This problem, known as the Maximum Lifetime Coverage Problem (MLCP), is NP-hard and has been widely studied in the literature without taking into account the energy consumed by the sensors in standby mode [3,5]. To maximize the network lifetime, the first step must identify as many coverage sets as possible from the sensors in the network, and the second step must schedule those coverage sets in a way that maximizes their lifetime. Therefore, the maximum coverage set scheduling (MCSS) problem is essential in all coverage problems. Consider a collection of coverage sets C = {C1, ..., Cm}, in which each coverage set Cj ∈ C consists of a set of sensors with sufficient battery power that can fully cover the entire coverage area. MCSS aims to maximize network lifetime by finding the optimal scheduling strategy for the collection C [6]. The organization of this research paper is as follows: Sect. 2 discusses relevant literature. Section 3 presents the problem statement of this research and describes the proposed approach. In Sect. 4, we present and discuss the simulation results. Finally, the conclusion is stated in Sect. 5.
2 Related Works
The WSN’s coverage is frequently regarded as an actual performance and quality of service metric [7]. Therefore, in accordance with [8], it is vital to ensure that at least one specific sensor can detect an event that occurs at any location inside the region being watched by the sensors. In [9], the researchers suggest robust coverage-aware multiple path-planning algorithms (CAMP). If dying nodes cause coverage holes, CAMP can arrange effective routes for MCs and work with any coverage hole-repair algorithm to fill them.
The paper [10] presented an approach to optimize the network's lifetime by dividing the sensor nodes into non-disjoint subsets of sensors, or cover sets, and scheduling these covers with variable activation periods. This study deals with local (α) and global (β) coverage leveling thresholds under network connectivity constraints. The authors propose an exact Integer Linear Programming (ILP) formulation for the Column Generation (CG) subproblem, used when the Dedicated Heuristic (DH) fails to compute an attractive solution in an iteration of the CG computation process. In [11], the authors were interested in solving this problem using a dynamic planning algorithm in order to increase the coverage area and cover the hole positions; the dynamic planning algorithm is based on tracking the route to find the optimal path with the most negligible cost. The paper [12] presents a mathematical model to geometrically optimize the density of active sensor nodes in a wireless sensor network (WSN). It uses concentric hexagonal tessellations and the concept of coverage contribution area for randomly deployed nodes in the field of interest. An algorithm has been proposed to generate a maximum number of disjoint-independent subsets of sensor nodes as an optimized solution to the k-coverage problem. In [13], the authors suggested a recursive neighborhood-based estimation of distribution algorithm (NEDA) to deal with it. Each individual in NEDA represents a coverage strategy in which the sensors are only partially enabled to monitor all the targets. In order to optimize the network lifespan given the current population, a linear programming (LP) model is created to assign activation times to the population's schemes. The paper [14] aimed at presenting a WSN routing strategy employing a mix of the hybrid energy-efficient distributed (HEED) algorithm and a fuzzy approach to extend node lifetime and energy. There are two parts to the proposed FLH-P algorithm.
The clustering of WSNs is accomplished using the stable election procedure of the HEED method. Then, metrics including residual energy, minimal hops, and node traffic counts are considered using a combination of fuzzy inference and the low-energy adaptive clustering hierarchy (LEACH) method. In [15], an efficient Topology-Driven Cooperative Self-Scheduling (TDCSS) model is proposed. A hybrid strategy is suggested instead of centrally scheduling the network nodes. The suggested TDCSS technique executes scheduling in both ways, depending on the circumstances. The overhead during the transmission of control packets is decreased by periodically sharing the node statistics. The research in [16] focused on the Maximum α-Lifetime Problem, which aims to develop a heuristic solution for the maximum network lifetime while satisfying the coverage criterion. To increase the network lifetime, the authors turn on and off different subsets of sensors gradually while maintaining the required minimum coverage rate.
3 Modelling and Problem Formulation
As stated by [17], WSN lifetime is defined as:

Lifetime = tf − ts    (1)

where ts is the start-up period and tf is the exit period. It is easier to understand the coverage issue if it is divided into two phases: the objective of the first step is to find as many coverage sets as possible from the sensor network; the second step aims at arranging the obtained coverage sets to extend the network's lifetime. Therefore, the maximum coverage set scheduling problem (MCSS) is crucial to all coverage problems. For a given collection of coverage sets C = {C1, ..., Cm}, each coverage set Cj ∈ C has an initial monitoring time and can cover the coverage area.

3.1 Mathematical Model
Let’s S = {S1 , S2 · · · , Sn } is the set of n sensors, where each sensor si has bi active time slots, τj is defined as the active time of the cover set Cj , and the binary variable λi;j . The MCSS can be formulated as an integer linear programming (ILP) problem [6] as follows: m τj (2) max j=1
Subject to:
m
(λi,j τj ) ≤ bi , ∀si ∈ S
(3)
j=i
– i is the index of sensor, 1 ≤ i ≤ n. – j is the index of the cover sets, 1 ≤ j ≤ m. – λi,j is defined as follows:
λi,j
1, = 0,
if si ∈ Cj otherwise
We put Yj = τj , so the problem to be solved can be reformulated as follows: max g(Y ) =
m
Yj
(4)
j=1
Subject to:
m j=1
(λi,j Yj ) ≤ bi , ∀si ∈ S
(5)
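Constraint (3) is straightforward to check in code. The sketch below is a minimal illustration with hypothetical data (not from the paper): each cover set is represented as the set of sensor indices it contains, and a candidate schedule of active times τ is validated against every sensor's battery budget bi.

```python
def lifetime(schedule):
    """Objective (2): the network lifetime is the sum of active times tau_j."""
    return sum(schedule)

def is_feasible(schedule, covers, battery):
    """Constraint (3): for each sensor s_i, the total active time of the
    cover sets containing it must not exceed its battery budget b_i."""
    return all(
        sum(tau for tau, cover in zip(schedule, covers) if i in cover) <= b
        for i, b in enumerate(battery)
    )

# Hypothetical instance: 3 sensors with budgets b = (2, 3, 2), two cover sets
covers = [{0, 1}, {1, 2}]   # C1 = {s1, s2}, C2 = {s2, s3}, 0-based indices
battery = [2, 3, 2]
print(is_feasible([2, 1], covers, battery), lifetime([2, 1]))  # True 3
print(is_feasible([2, 2], covers, battery))  # False: s2 would be overdrawn
```

The same check reappears later as the fitness filter of the adapted genetic algorithm, which discards any chromosome violating (3).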
3.2 Genetic Algorithm
A Genetic Algorithm (GA) contains several important steps, as illustrated by Fig. 2. These steps can be briefly described as follows:

Step 1: A coding concept for the population's individuals; this step usually comes after the problem has been mathematically modeled. The effectiveness of the genetic algorithm depends on how well the data are coded.
Step 2: A system for creating the initial population. This method must be able to produce a population of individuals who will form the foundation of subsequent generations. The initial population selection affects the rate of convergence to a global optimum.
Step 3: Evaluation criteria, typically a "fitness" function. This function aims to assess a solution and contrast it with others.
Step 4: A selection process to choose the solutions for a potential coupling.
Step 5: The crossover and mutation operators, used to produce new solutions. The crossover operator recomposes the genes of existing individuals in the population, whereas the mutation operator seeks to ensure the exploration of the search space by introducing new individuals.
Step 6: An insertion mechanism that establishes a trade-off between the generated solutions (offspring) and the generating solutions (parents).
Step 7: A stopping criterion used to assess the quality of the individuals.
Fig. 2. Genetic algorithm steps.
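The loop below is a generic sketch of Steps 1–7, not the paper's adapted algorithm: it is exercised on the toy one-max problem, where fitness is simply the number of 1-bits, and all function and parameter names are our own illustration.

```python
import random

def genetic_algorithm(fitness, length, pop_size=30, generations=100,
                      cx_rate=0.9, mut_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Steps 1-2: binary coding, random initial population
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Steps 3-4: evaluate fitness, select the two best as parents
        pop.sort(key=fitness, reverse=True)
        p1, p2 = pop[0], pop[1]
        # Step 5: single-point crossover and bit-flip mutation
        children = []
        while len(children) < pop_size - 2:
            if rng.random() < cx_rate:
                cut = rng.randrange(1, length)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            children.append([g ^ 1 if rng.random() < mut_rate else g
                             for g in child])
        # Step 6: insertion with elitism, so the best fitness never decreases
        pop = [p1, p2] + children
    # Step 7: stop after a fixed number of generations
    return max(pop, key=fitness)

best = genetic_algorithm(sum, length=20)  # one-max: fitness = number of ones
```

With elitism, the best individual is carried over unchanged each generation, which is why the best fitness is monotonically non-decreasing over the run.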
3.3 Proposed Approach to Adapt GA
This subsection describes the different steps that we introduce to adapt the GA in order to solve the problem modeled in Subsect. 3.1. In effect, the network lifetime will be extended by devising a scheduling technique for the cover sets in C that ensures only one cover set is active during each time slot while maximizing their overall active time. In the following, we describe the steps to adapt the GA.

– Coding: The chromosome C can be represented as C = {C1, C2, ..., Cm}, where m is the number of genes, which is equal to the number of cover sets Cj ∈ C, and each gene represents a cover set Cj.
– Initialization: The initial population of a limited number of chromosomes is randomly generated. A gene represents the cover set Cj. Therefore, the chromosome represents the scheduling strategy of the collection of cover sets C, and the lifetime can be calculated as the sum of the τj of the genes in the chromosome.
– Fitness: The fitness must ensure that no sensor in the candidate solution exceeds the energy constraint. For a sensor si ∈ S, the sum of energy consumption across all covers in a schedule must not exceed the initial energy bi:

Σ_{j=1..m} (λi,j Yj) ≤ bi, ∀ si ∈ S    (6)

The fitness function will exclude from future GA processes any candidate schedule in the population that does not meet this constraint.
– Selection: From the members of the adapted population, select as parents the two best candidate chromosomes with the maximum lifetime:

max g(Y) = Σ_{j=1..m} Yj    (7)
– Crossover: To update the population, the crossover operator is applied to produce offspring for the next generation. Randomly chosen points are used in double-point crossover.
– Mutation: By applying the mutation operator, local optima are avoided while the children are improved. One or more genes are updated at random. Genes and chromosomes are considered more valuable when enhanced.
– Fitness of children: To make sure that no sensor exceeds its initial energy, the fitness function (Sect. 3.1, Eq. 3) is reapplied to the "new chromosome" children.

3.4 Explanatory Example
Thanks to the different steps mentioned in the previous subsection, an adapted genetic algorithm is obtained. To explain how this algorithm works, an explanatory example is presented. In effect, suppose that a set of four sensors S = {s1, s2, s3, s4} with corresponding active time slots {1, 4, 2, 5} is given, covering three targets. Let C1 = {s1, s4}, C2 = {s2, s3, s4}, C3 = {s2, s4}, C4 = {s1, s2, s3, s4}, C5 = {s1, s2, s4}, C6 = {s2, s3, s4}, C7 = {s1, s3, s4}, and C = {C1, C2, C3, C4, C5, C6, C7}. Since C1 ⊆ C5, C1 ⊆ C7, C2 ⊆ C4 and C2 ⊆ C6, the sets C4, C5, C6 and C7 are deleted from C, as shown in Fig. 3. Therefore, we have C = {C1, C2, C3}, in which each coverage set Cj ∈ C consists of a set of sensors that can fully cover all targets, with τj the active time of the cover set, where 1 ≤ j ≤ 3. The genetic algorithm presented in the previous subsection is applied to maximize the network lifetime, so the problem to be solved is defined as follows:

max g(Y) = Σ_{j=1..3} Yj = Y1 + Y2 + Y3    (8)

Subject to:

Σ_{j=1..3} (λi,j Yj) ≤ bi, ∀ si ∈ S  ⟹  Y1 ≤ b1; Y2 + Y3 ≤ b2; Y2 ≤ b3; Y1 + Y2 + Y3 ≤ b4    (9)
Fig. 3. Illustrative example
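For an instance this small, the optimum of (8)–(9) can be verified by exhaustive search. The sketch below is our own illustration (`max_lifetime_bruteforce` is a hypothetical helper, not from the paper): it enumerates all integer schedules for b = (1, 4, 2, 5) and confirms a maximum lifetime of 5, which is tight because s4 belongs to every cover set.

```python
from itertools import product

def max_lifetime_bruteforce(covers, battery):
    """Enumerate integer active times tau_j for every cover set, keep only
    schedules satisfying constraint (9), and return the best total lifetime."""
    # tau_j can never exceed the smallest battery among the set's sensors
    caps = [min(battery[i] for i in cover) for cover in covers]
    best, best_sched = 0, None
    for sched in product(*(range(c + 1) for c in caps)):
        feasible = all(
            sum(t for t, cover in zip(sched, covers) if i in cover) <= b
            for i, b in enumerate(battery)
        )
        if feasible and sum(sched) > best:
            best, best_sched = sum(sched), sched
    return best, best_sched

# Example of Subsect. 3.4: b = (1, 4, 2, 5) and C1={s1,s4}, C2={s2,s3,s4}, C3={s2,s4}
covers = [{0, 3}, {1, 2, 3}, {1, 3}]   # 0-based sensor indices
best, sched = max_lifetime_bruteforce(covers, [1, 4, 2, 5])
print(best)  # 5
```

Brute force is only viable for toy instances; the search space grows exponentially with the number of cover sets, which is why the paper resorts to a genetic algorithm at realistic scales.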
4 Simulation and Results
We model a network of T targets monitored by N sensors distributed randomly across an area of 1000 m × 1000 m. The network lifetime is calculated by applying the suggested technique to the generated coverage sets. Each test result is an average
Table 1. Simulation parameters

– Length of chromosome: the scheduling strategy of the collection of cover sets C
– R (sensing range of each sensor node): 50
– T (number of targets): 10 to 20
of 100 runs. The adapted genetic algorithm's performance was compared with the Greedy-MCSS and MCSSA algorithms provided in [6]. The parameters listed in Table 1 are taken into account in the simulation phase. As shown in Fig. 4, we used T = 10 targets and N = 300 sensors. We randomly assigned each sensor active time slots between 5 and 20, and varied the number of coverage sets from 10 to 50. The network lifetime obtained by the proposed algorithm increases proportionally to the number of coverage sets. The results show that the lifetime calculated by the adapted algorithm is better than those obtained by the Greedy-MCSS and MCSSA algorithms.
Fig. 4. Network lifetime vs coverage sets number
Using T = 20 targets randomly distributed, M = 100, and varying the number of sensors from 200 to 600 in increments of 200, we set the time slot to 10. In Fig. 5, we show our results compared with the Greedy-MCSS and MCSSA algorithms as a function of the number of sensors. We observe that the network lifetime increases in proportion to the number of sensors. Indeed, the fitness and selection part of the genetic algorithm selects the two best proposed solutions with the maximum lifetime as parents, so the possibility of combining the best parents to produce offspring with good genomes increases proportionally to the number of sensors. The results show that the lifetimes calculated by the proposed algorithm are better than those obtained by the Greedy-MCSS and MCSSA algorithms.
Fig. 5. Network lifetime vs sensors number.
Figure 6 shows the lifetime calculated by the adapted genetic algorithm as a function of the active time slots of the sensors, in comparison with the Greedy-MCSS and MCSSA algorithms. These results are obtained using T = 20, N = 100, M = 100 and by varying the active time slots of the sensors from [0, 10] to [50, 60], where [10, 20] means that the active time slots of each sensor are assigned randomly from 10 to 20. The results show that the lifetimes calculated by the adapted algorithm are better than those obtained with the other algorithms: each time the best candidates are selected and combined as parents, the algorithm obtains parents with maximum lifetimes and ultimately produces offspring with good genomes. This operation positively influences the network lifetime.
Fig. 6. Network lifetime vs time slots
5 Conclusion
In this paper, we studied the MCSS problem. This study aims at finding an optimal schedule and coverage strategy to maximize WSN lifetime. Due to its NP-hard nature, the MCSS problem is formulated as an integer linear programming problem. An adapted genetic algorithm is proposed as an efficient algorithm to increase the sensors' lifetime. The simulation results showed the performance of this algorithm in terms of prolonging the lifetime of the sensors. This performance is justified by a comparative study of the proposed algorithm against the Greedy-MCSS and MCSSA algorithms. The next step is to improve this study by introducing and analyzing additional adaptive parameters concerning critical operations such as selection, crossover and mutation.
References

1. Darif, A., Ouchitachen, H.: Performance improvement of a new MAC protocol for ultra wide band wireless sensor network. J. Theor. Appl. Inf. Technol. 100(4), 1015–1026 (2022)
2. Ouchitachen, H., Hair, A., Idrissi, N.: Improved multi-objective weighted clustering algorithm in wireless sensor network. Egypt. Inform. J. 18(1), 45–54 (2017)
3. Thai, M.T., Wang, F., Du, D.H., Jia, X.: Coverage problems in wireless sensor networks: designs and analysis. Int. J. Sens. Netw. 3(3), 191 (2008)
4. Fan, G., Jin, S.: Coverage problem in wireless sensor network: a survey. J. Netw. 5(9), 1033–1040 (2010)
5. Cardei, M., Thai, M.T., Li, Y., Wu, W.: Energy-efficient target coverage in wireless sensor networks. In: Proceedings IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies, Miami, FL, USA, pp. 1976–1984 (2005)
6. Chuanwen, L., Yi, H., Deying, L., Yongcai, W., Wenping, C., Qian, H.: Maximizing network lifetime using coverage sets scheduling in wireless sensor networks. Ad Hoc Netw. (2019). https://doi.org/10.1016/j.adhoc.2019.102037
7. Serper, E.Z., Altın-Kayhan, A.: Coverage and connectivity based lifetime maximization with topology update for WSN in smart grid applications. Comput. Netw. 109 (2022). https://doi.org/10.1016/j.comnet.2022.108940
8. Khoufi, I., Minet, P., Laouiti, A., Mahfoudh, S.: Survey of deployment algorithms in wireless sensor networks: coverage and connectivity issues and challenges. Int. J. Auton. Adapt. Commun. Syst. 10, 314–390 (2017)
9. Khalifa, B., Al Aghbari, Z., Khedr, A.M.: An optimization-based coverage aware path planning algorithm for multiple mobile collectors in wireless sensor networks. Wireless Netw. 28(5), 2155–2168 (2022)
10. Charr, J.-C., Deschinkel, K., Haj Mansour, R., Hakem, M.: Partial coverage optimization under network connectivity constraints in heterogeneous sensor networks. Comput. Netw. 210, 108928 (2022)
11. Alhaddad, Z.A., Manimurugan, S.: Maximum coverage area and energy aware path planner in WSN. Mater. Today Proc. (2021)
12. Dua, A., Jastrząb, T., Czech, Z.J., Krömer, P.: A randomized algorithm for wireless sensor network lifetime optimization. In: Q2SWinet 2022 – Proceedings of the 18th ACM International Symposium on QoS and Security for Wireless and Mobile Networks, pp. 87–93 (2022)
13. Chauhan, N., Chauhan, S.: A novel area coverage technique for maximizing the wireless sensor network lifetime. Arab. J. Sci. Eng. 46(4), 3329–3343 (2021). https://doi.org/10.1007/s13369-020-05182-2
14. Chen, Z.-G., Lin, Y., Gong, Y.-J., Zhan, Z.-H., Zhang, J.: Maximizing lifetime of range-adjustable wireless sensor networks: a neighborhood-based estimation of distribution algorithm. IEEE Trans. Cybern. 1–12 (2020). https://doi.org/10.1109/tcyb.2020.2977858
15. Jabbar, M.S., Issa, S.S., Ali, A.H.: Improving WSNs execution using energy-efficient clustering algorithms with consumed energy and lifetime maximization. Indones. J. Electr. Eng. Comput. Sci. 29(2), 1122–1131 (2023)
16. Brindha, G., Ezhilarasi, P.: Topology driven cooperative self scheduling for improved lifetime maximization in WSN. Comput. Syst. Sci. Eng. 45(1), 445–458 (2023)
17. Shi, T., Cheng, S., Cai, Z., Li, J.: Adaptive connected dominating set discovering algorithm in energy-harvest sensor networks. In: IEEE International Conference on Computer Communications (INFOCOM), pp. 1–9 (2016)
A Multi-Agent System for the Optimization of Medical Waste Management

Ahmed Chtioui1(B), Imane Bouhaddou1, Abla Chaouni Benabdella2, and Asmaa Benghabrit3

1 L2M3S Laboratory, ENSAM, Moulay Ismaïl University, Meknes, Morocco
[email protected]
2 Rabat Business School, International University of Rabat, Rabat, Morocco
3 LMAID Laboratory, ENSMR, Mohamed V University, Rabat, Morocco
Abstract. Medical Waste Management (MWM) is one of the most complex processes in any healthcare structure. This complexity was revealed during the Covid-19 crisis, as Medical Waste (MW) can be a vector of virus transmission if the process is not properly controlled. This fact calls into question the current models of MWM, especially in developing countries, where in the majority of cases MWM is operated in a hazardous manner, representing a real danger for patients, hospital staff, the population and especially the environment. The objective of this paper is to optimize the medical waste management process. To do this, we first analyze the current MWM model in a Moroccan public hospital and show its shortcomings. Second, we propose a model for MWM that meets the challenges of sustainable development and the circular economy, especially in terms of preserving the environment and reducing hospital costs. The proposed model is scrupulously described through a multi-agent approach, one of the successful paradigms for managing complex systems, in order to model and eventually automate the process.

Keywords: Medical Waste Management · Moroccan Public Hospital · Optimization · Complexity · Multi-Agent System
1 Introduction

As the world's population grows and the demand for medical services increases, Medical Waste Management (MWM) is one of the most complex and demanding challenges facing humanity [1]. MW consists of a wide range of hazardous, non-hazardous, and infectious wastes: sharps, chemical wastes, pharmaceutical wastes, pressurized containers, genotoxic wastes, radioactive wastes, and domestic wastes [2, 3]. According to a 2009 World Health Organization (WHO) study, 80% of MW is similar to domestic waste; the remaining 20% is considered hazardous (infectious, toxic, and/or radioactive). In most countries, medical wastes are still collected and disposed of along with other domestic wastes, and this poses serious health risks for healthcare personnel, the public and the environment [4]. Such concerns necessitate taking precautions regarding the collection, transportation and disposal of medical wastes [5–7].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 134–149, 2023. https://doi.org/10.1007/978-3-031-43520-1_12
The recent pandemic arising from the outbreak of COVID-19 has increased the demand for a timely and efficient waste management system. Pandemics boost the generation rate of infectious medical waste at hospitals and clinics. Designing an efficient and reliable collection, transportation and disposal system in this situation can help to provide on-time services and control the disease [8]. The objective of this paper is to optimize the medical waste management process. To do so, we first analyze the current MWM model in a Moroccan public hospital and show its shortcomings. Then, we propose a model for MWM that meets the challenges of sustainable development and the circular economy, especially in terms of preserving the environment and reducing hospital costs. The proposed model would allow the hospital to become increasingly independent from external suppliers, but also to develop this activity, which is one of the inescapable levers of the hospital supply chain and consequently of hospital management. In addition, as mentioned above, the management of medical waste is one of the most complex processes within a care structure; therefore, to describe the proposed model in an explicit and rigorous way, we chose the Multi-Agent System (MAS), one of the strongest paradigms for understanding complex systems, as a formalism for the design of the process before moving on to the modeling and simulation phase and eventually thinking about automating the process. The remainder of this paper is organized as follows. Section 2 defines the key concepts of our study, particularly the medical waste management process and multi-agent systems. Section 3 describes the current medical waste management model in the target hospital of the study and the proposed model. Section 4 presents the multi-agent system design of the proposed process. Section 5 concludes this work, recapitulating our contribution and highlighting future directions.
2 Theoretical Background

2.1 Medical Waste Management Process

Medical waste can be defined as "all waste materials generated at health care facilities, such as hospitals, clinics, physician's offices, dental practices, blood banks, and veterinary hospitals/clinics, as well as medical research facilities and laboratories" [9]. Medical waste materials can also be generated during healthcare at patients' homes. However, after providing medical services, the healthcare provider transports the generated medical waste to its health care facility for future proper disposal [10]. Medical waste includes "needles and syringes to soiled dressings, body parts, diagnostic samples, blood, chemicals, pharmaceuticals, medical devices and radioactive materials" [11–13]. Furthermore, the problem of medical waste and its disposal is growing rapidly throughout the world as a direct result of fast urbanization and population growth, requiring specialized treatment and management [14]. Also, medical waste can "potentially expose healthcare workers, waste handlers, patients and the community at large to infection, toxic effects and injuries" [13]. It can also lead to risks of environmental pollution and degradation [12, 15]. This danger is increased by the fact that improper medical waste management can cause numerous diseases (infectious and non-infectious) as well as injuries at work [16].
Based on the properties and the level of risk associated with individual groups of medical waste, the World Health Organization (WHO) divides them into seven basic categories, presented in Table 1.

Table 1. Medical waste categories established by WHO. Source: [17, 18].

– Infectious waste: waste contaminated with blood and other body fluids (from discarded diagnostic samples), cultures and stocks of infectious agents from laboratory work (waste from autopsies and infected animals from laboratories), or waste from patients in isolation wards and equipment (swabs, bandages and disposable medical devices).
– Pathological waste: human tissues, organs or fluids, body parts and contaminated animal carcasses.
– Sharps: syringes, needles, disposable scalpels and blades, etc.
– Chemicals: solvents used for laboratory preparations, disinfectants, and heavy metals contained in medical devices (mercury in broken thermometers) and batteries.
– Pharmaceuticals: expired, unused and contaminated drugs and vaccines.
– Genotoxic waste: highly hazardous, mutagenic, teratogenic or carcinogenic waste, such as cytotoxic drugs used in cancer treatment and their metabolites.
– Radioactive waste: products contaminated by radionuclides, including radioactive diagnostic material or radiotherapeutic materials; and non-hazardous or general waste: waste that does not pose any particular biological, chemical, radioactive or physical hazard.
However, there is no comprehensive and globally accepted definition of medical waste, which is problematic for comparative purposes, as changing definitions lead to a difference in understanding of the concept between countries, or even between regions within a country. In addition, the lack of a standard definition of medical waste has led to a lack of standardization of medical waste streams and disposal containers [19]. In the Moroccan context, Article 3 of Decree 2-09-139 governing the management of Medical and Pharmaceutical Waste (MPW) defines 4 categories of MPW according to their characteristics and nature:

Category 1:
– Waste with a risk of infection because it contains viable microorganisms or toxins that can cause disease in humans or other living organisms, as well as unidentifiable human or animal organs and tissues.
– Sharps intended to be used in large quantities, whether or not they have been in contact with a biological product.
– Incompletely used, spoiled or expired blood products and derivatives for therapeutic use.
A Multi-Agent System for the Optimization of Medical Waste Management
137
Category 2: Unused, spoiled or expired drugs, chemicals and biologicals, as well as cytostatic waste.
Category 3: Human or animal organs and tissues readily identifiable by a non-specialist.
Category 4: Waste equivalent to household waste, provided it is generated in a health establishment.

However, what interests us most in this work is the management of medical waste as a major process of the hospital supply chain, from its production in the care units to its treatment or elimination, either within the hospital or in specialized units, through the operations of collection, sorting, intermediate and final warehousing, transport, etc.
2.2 Multi-Agent System

In modern complex systems, dealing with huge volumes of data requires fast and powerful computing and simulation tools such as artificial intelligence. This trend in computing led to the evolution of Multi-Agent System (MAS) technology, a field that deals with agent design at the micro level and social system design at the comprehensive level. MAS approaches solve complex problems with entities called agents, exploiting their collaborative and autonomous properties [20, 21]. Still according to [20], various authors have defined multi-agent systems in many ways, and the essence of each definition is that a MAS comprises two or more independent agents sharing some information so as to achieve a set target. Table 2 lists various definitions of MAS found in the literature.

In order to operate as a MAS, agents have to fulfill several requirements:
– operate independently;
– represent their best interests;
– cooperate, negotiate and reach agreement.

According to Ferber [29], a multi-agent system is a system composed of the following elements:
1. An environment E, i.e. a space generally having a metric.
2. A set of objects O. These objects are situated, i.e. for any object it is possible, at a given time, to associate a position in E. They are passive, i.e. they can be perceived, created, destroyed and modified by the agents.
3. A set of agents A, which are particular objects representing the active entities of the system.
4. A set of relations R that link objects (and thus agents) together.
5. A set of operations Op allowing the agents of A to perceive, produce, consume, transform and manipulate objects of O.
6. Operators in charge of representing the application of these operations and the reaction of the world to this attempt at modification, which we will call the laws of the universe.
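As a minimal sketch, Ferber's elements (E, O, A, R, Op) can be mapped onto code as follows. All class and attribute names below are ours, purely illustrative, and not part of the cited definition:

```python
from dataclasses import dataclass, field

@dataclass
class Obj:                      # element of O: a passive, situated object
    name: str
    position: tuple             # position in the environment E

class Agent(Obj):               # elements of A are particular (active) objects
    def perceive(self, env): ...
    def act(self, env): ...

@dataclass
class Environment:              # E, holding O, A and the relations R
    objects: list = field(default_factory=list)
    agents: list = field(default_factory=list)
    relations: list = field(default_factory=list)   # R: links between objects/agents

    def apply(self, operation, agent, target):
        # Op: an operation by which an agent perceives/produces/consumes/
        # transforms an object; the environment mediates the world's reaction
        # ("laws of the universe").
        return operation(agent, target)
```

Here a concrete MAS would subclass `Agent` per role and register operations as callables applied through the environment.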
A. Chtioui et al.

Table 2. MAS definitions.
MAS definition – Reference
– "By an agent-based system, we mean one in which the key abstraction used is that of an agent" [22]
– "A MAS is a structure given by an environment together with a set of artificial agents capable to act on this environment" [23]
– "A MAS is a distributed and coupled network of intelligent hardware and software agents that work together to achieve a global goal" [24]
– "A MAS is defined as a loosely coupled network of problem solvers that work together to solve problems that are beyond the individual capabilities or knowledge of each problem solver" [25]
– "A MAS is simply a system comprising two or more agents or intelligent agents" [26]
– "A MAS consists of several interacting agents that cooperate, coordinate and negotiate" [27]
– "An MAS is a collection of independent computational entities (agents), which perform specified actions based on goals" [28]
Concerning the typology, we traditionally distinguish two types of agents, each located at one end of a complexity scale: the reactive agent and the cognitive agent. According to Ferber [29], the cognitive/reactive distinction depends on the ability of agents to accomplish complex tasks individually and to plan their actions.
3 From the Current MWM Process to the Proposed Process

3.1 The Current Process

Before beginning this section, we recall that the experimental field of our study is a public hospital located in Meknes, a city in Morocco. The Moulay Ismail Military Hospital is a medium-to-large care facility (370 beds) belonging to the University Hospital of Fez, which allows the results of this study to be generalized to a wide range of Moroccan and foreign hospitals. The logistic activity of MWM at the target hospital of this study is totally outsourced. It includes three essential steps: a summary sorting at the level of the care units, operated by the medical and paramedical staff; the internal transport; and the storage in a central warehouse by the agents of the service provider company. Carrying out these various stages requires human and material means:
– Personnel for the management of medical waste (collection, storage, …).
– Approved receptacles and containers necessary for the correct collection of waste.
– Means adapted for the transport of the waste from the places of production to the place of storage (central warehouse).
Figure 1 illustrates the current medical waste management process at the hospital:
[Fig. 1 flow: Production of medical waste + summary sorting (care staff in care units) → Collection + internal transport (agents of the provider company) → Storage (central warehouse) → Removal and transport to an external treatment unit (means of the provider company)]
Fig. 1. The current MWM process at the hospital.
As shown in Fig. 1, once the waste is generated in the care units, it is sorted in a summary way: the healthcare staff put the medical waste in yellow containers with red plastic bags, both adapted to receive medical waste, and indicate to the agents of the provider company the nature of the waste placed in the containers. From this stage on, the hospital has no further control over the waste it generated. In the following section, we will show the disadvantages and risks of this way of proceeding and how to remedy them.

3.2 The Proposed Process

The model we propose to the hospital is based on the principle of managing almost the entire process internally, with a view to possible automation in the future. We consider that such an approach would allow the hospital to control the waste it produces throughout the process, from its generation in the care units to its internal valorization or elimination in facilities dedicated to this purpose, and also to become increasingly autonomous from external suppliers. To do this, the hospital would have to invest in facilities and machines operating with technologies that treat medical waste efficiently and safely, while taking into account the ecological and environmental aspects.
In this context, we recall that the current model adopted in the hospital, which consists in transporting all the medical waste to external treatment units, could constitute a real danger for the environment if it is not rigorously controlled. From an economic point of view, and in light of the very high invoices inherent to the outsourcing of this logistical activity, we believe that in-house management would not only reduce the costs due to the dependence on external suppliers, but also generate profits by selling the residual materials after their valorization. Thus, the proposed model would allow the hospital to gain both ecologically and economically, in other words to position itself on the perspectives offered by sustainable development.

Figure 2 illustrates the proposed model for medical waste management in the target hospital of the study. As shown in Fig. 2, once the waste is produced in the care units, it is put in specific containers of different colors according to its category (cat. 1: red, cat. 2: green, cat. 3: yellow, cat. 4: blue); the categorization we have adopted is in accordance with Moroccan regulations. This step constitutes the first sorting, which must be done rigorously by the healthcare staff in the care units. The containers are then conditioned in an optimal way (closing, labeling, marking, security, temperature, …) and sent to intermediate warehouses; each warehouse receives the waste of the care units close to it according to a predefined plan, and at this stage the first sorting and the conditioning of the containers are checked. The anatomical parts (cat. 3) are then transported to the hygiene office for burial procedures, while the other categories are routed towards the final warehouse.
Afterwards, categories 1 and 2 are transferred to the treatment facilities inside the hospital; the materials resulting from the treatment are either valorized or sent with the household waste (cat. 4) to the public landfill.
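The category-to-destination routing just described can be encoded as a simple dispatch function. This is our own illustrative sketch of the proposed process; the category codes follow the text, but the location names and the API are hypothetical:

```python
# Hypothetical container colors per category (cat. 1-4), as in the text.
CONTAINER_COLOR = {1: "red", 2: "green", 3: "yellow", 4: "blue"}

def route(category: int, recoverable: bool = False) -> list[str]:
    """Return the path a container follows after the first sorting,
    per the proposed MWM model (illustrative location names)."""
    if category == 3:                      # anatomical parts: burial
        return ["intermediate_warehouse", "hygiene_office"]
    path = ["intermediate_warehouse", "final_warehouse"]
    if category in (1, 2):                 # infectious / pharmaceutical: treat on site
        path.append("treatment_facility")
        path.append("recovery" if recoverable else "public_landfill")
    else:                                  # category 4 joins household waste
        path.append("public_landfill")
    return path
```

A planning component could use such a table to generate the transport orders between warehouses.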
4 Design of the MAS for the Proposed MWM Model

4.1 Functional Analysis of the Proposed MAS

Decomposing a complex system or a problem into a multi-agent system comes down to knowing what exactly an agent represents in relation to the problem to solve. In other words, it is a question of defining the relationship between the productive function of the agent and the productive function of the organization to which it belongs. Indeed, at the stage of the functional analysis, we must answer the following questions:
– What are the limits of the space that represents the environment of the considered MAS?
– What should we consider as an agent in our system, and what should be its type, reactive or cognitive?
– What are the objects of our system, and what is the relationship between the agents and the tasks to be performed (the operations)?
[Fig. 2 flow: Production of medical waste (care units) → First sorting (healthcare staff in care units) into CAT 1 (infectious waste: sharp and prickly objects; blood products and derivatives), CAT 2 (unused, spoiled or expired drugs, chemicals and biologicals; cytostatic waste), CAT 3 (human organs and tissues readily identifiable by a non-specialist) and CAT 4 (household waste generated at the hospital) → Specified conditioning (specific containers) → Transfer + verification and confirmation of sorting and conditioning (intermediate warehouses) → CAT 3: transport and burial (hygiene office); CAT 1, 2 and 4: transport and storage (final warehouse) → CAT 1 and 2: transformation of medical waste into household waste (treatment facility in the hospital) → recoverable material? YES: recovery of residual material; NO: transport to public landfill]
Fig. 2. The proposed process for MWM in the target hospital of the study.
4.1.1 Environment Description

The environment of an agent is everything in a MAS except the agent itself; however, we must limit this space so that it does not include elements that are unnecessary or likely to hamper the development of agents and the proper functioning of the system. Thus, the environment (E) of our system, where the objects will be handled, the agents will act and the tasks will be executed, is the circuit that the medical waste follows from its generation in the care units to its final treatment within the hospital.

4.1.2 Identification and Description of Agents

The functional analysis carried out on the target hospital of the study and the model that we proposed allowed us to determine the functions that the various actors of our system must ensure. These functions are organically linked to personnel or machines. Thus, we can identify the following functions:
– A planning and control station, which, as its name indicates, is responsible for planning, controlling and monitoring all movements of waste containers within the hospital.
– The medical and paramedical staff who ensure the first sorting in the care units.
– Logistical staff in charge of conditioning and transporting the containers from the care units to the intermediate warehouses.
– Logistical staff in the intermediate warehouses in charge of checking the first sorting and the conditioning of the containers.
– Logistical staff in charge of loading and unloading the containers.
– Logistical staff in charge of routing the Cat. 1, Cat. 2 and Cat. 4 containers from the intermediate warehouses to the final warehouse.
– Logistical staff in charge of routing the Cat. 1 and Cat. 2 containers to the treatment facility within the hospital.
– A treatment facility responsible for converting infectious medical waste into household waste.

This determination of the main functions within our system allowed us to define the agents and their intelligence levels.
At first sight, we can say that our system is composed of a single cognitive agent who manages and gives instructions to the other agents, which are all reactive agents. In light of the above, our agents (Ai) can be identified and described as follows:

A1: Control and Planning Station (CPS): this agent, as its name indicates, is responsible for planning, commanding and controlling all operations in the process. In other words, it is the brain that makes the MAS run: it directs the other agents by virtue of the authority it has over them. It is thus the only purely cognitive agent of our MAS.

A2: Healthcare Staff (HS): the role of the healthcare staff in the medical waste management process is to ensure the first sorting in a rigorous manner as well as the pre-conditioning, that is, to follow the instructions of the CPS by putting each category of waste in the container reserved for it. The healthcare staff therefore act according to the instructions of the CPS, which is why they are reactive agents.
A3: Logistic Staff (LS1): this agent packs the containers according to the regulations in force and transports them in adapted carts from the care unit to its intermediate warehouse of attachment, according to the planning established by the CPS. It is therefore also a reactive agent.

A4: Logistic Staff (LS2): this agent, located in the intermediate warehouses, is in charge of checking the first sorting and conditioning of the containers coming from the care units and of preparing them for loading and transport to the final warehouse. It performs these tasks according to a plan established by the CPS. It is therefore another reactive agent.

A5: Logistic Staff (LS3): these agents are found in the intermediate warehouses; with their equipment, they load and unload the containers. They are also reactive agents.

A6: Logistic Staff (LS4): like LS3, they are agents in charge of loading and unloading containers, but in the final warehouse.

A7: Logistic Staff (LS5): like LS3 and LS4, they are agents in charge of loading and unloading containers, in the medical waste treatment facility.

A8: Logistic Staff (LS6): these agents transport category 1, 2 and 4 containers from the intermediate warehouses to the final warehouse using adapted vehicles and container trailers, following the CPS instructions. They are therefore reactive agents.

A9: Logistic Staff (LS7): these agents transport the category 1 and 2 containers from the final warehouse to the hospital's medical waste treatment facility. They are also reactive agents.

A10: Medical Waste Treatment Facility (MWTF): a set of machines using technologies that allow treating and valorizing medical waste into classic household waste. It is an agent that is controlled and monitored, and is therefore reactive.

Thus, our system is composed of ten (10) agents, one cognitive and nine (9) reactive, which makes it a hybrid MAS.
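The hybrid organization described above, one cognitive CPS with authority over reactive executors, can be sketched as follows. The agent names follow the text, but the classes, methods and competence labels are our own illustrative assumptions:

```python
class ReactiveAgent:
    """A reactive agent: it only executes tasks instructed by the CPS."""
    def __init__(self, name, competences):
        self.name = name
        self.competences = set(competences)

    def execute(self, task):
        assert task in self.competences, f"{self.name} cannot {task}"
        return f"{self.name}:{task}"

class CPS:
    """The single cognitive agent: plans, commands and controls the others."""
    def __init__(self, agents):
        self.agents = agents            # authority over all reactive agents

    def dispatch(self, task):
        # Pick the first subordinate holding the needed competence.
        for a in self.agents:
            if task in a.competences:
                return a.execute(task)
        raise RuntimeError(f"no agent can {task}")

agents = [ReactiveAgent("HS", {"sort"}),
          ReactiveAgent("LS1", {"pack", "route"}),
          ReactiveAgent("LS3", {"load", "unload"}),
          ReactiveAgent("MWTF", {"treat"})]
cps = CPS(agents)
```

Only a subset of the ten agents is instantiated here; a full model would add LS2, LS4-LS7 with the competences listed in Sect. 4.2.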
4.1.3 Determination of Objects and Operations

The final objective of our approach, through the design of the MAS, is to optimize the management of medical waste within the hospital. This objective can be achieved only if each agent performs its tasks correctly; this is a necessary but not sufficient condition, because the final objective is not the summation of the objectives of each agent, but above all the cooperation between agents to achieve a common objective. All the actions of all the agents of our system are manipulations of waste containers, whatever their category, which makes the containers the objects of the MAS. These manipulations are therefore the operations of our system, and can be listed as follows:
– Op1: Sorting is the action of agent A2 (HS) on the container.
– Op2: Routing is the action of agent A3 (LS1) on the container.
– Op3: Check is the action of agent A4 (LS2) on the container.
– Op4: Load is the action of agents A5 (LS3), A6 (LS4) and A7 (LS5) on the container.
– Op5: Unload is the action of agents A5 (LS3), A6 (LS4) and A7 (LS5) on the container.
– Op6: Transport is the action of agents A8 (LS6) and A9 (LS7) on the container.
– Op7: Treat is the action of agent A10 (MWTF) on the contents of category 1 and 2 containers.

4.2 Structural Analysis of the Proposed MAS

4.2.1 Subordination Structure of the MAS

The authority relationship between agents in our MAS follows a two-level subordination structure:
– A higher level, in which the CPS agent is located, which by its nature as a cognitive agent is able to plan, command and control the actions of all other agents in the system.
– A lower level, composed of the other agents (HS, LS1, LS2, LS3, LS4, LS5, LS6, LS7 and MWTF), all of which are subordinate to the CPS but maintain an equal relationship with each other.

Thus, we can say that our MAS is hierarchical.

4.2.2 Coupling and Constitution

In our MAS (macroscopic view), the tasks of each agent are defined according to a fixed-hierarchical-predetermined organization structure; this holds in the normal situation. But the MAS can sometimes face punctual situations at precise points of the system (microscopic view), which require, for example, a change in the planning or a response to critical situations (concentration of means). In this case, the number of agents at these points of the system changes according to the situation, and the structure of the organization at these points becomes variable-hierarchical-predetermined. This situation is illustrated in Fig. 3.
Fig. 3. Possible MASs in charge of intervening to face particular situations in the process.
4.3 The Concretization Parameters for the Proposed MAS

The question for a multi-agent system designer is the following: knowing the general task to be solved, how do we allocate the set of competences to the different agents in such a way that the job is actually accomplished? Do we prefer highly specialized agents that possess only one competence, or do we favor approaches with totipotent agents, in which each agent possesses all the desired competences, only their number making the difference? In the latter case, each agent can take the place of another agent if it fails, which increases the reliability of the system in case of failure [29].

To this end, if we call $C_a$ the set of competences available to an agent $a$, $C^P$ the set of competences needed to perform a job $P$, and $C_A$ the union of the competences of a set of agents $A$, then the job $P$ can be performed if $C^P \subseteq C_A$, i.e., if all the competences needed to perform the job are present in the set $A$. We will say that an agent $a$ is totipotent if it has all the necessary competences to perform the job by itself, that is, if $C^P \subseteq C_a$. We will denote by $C_a^P$ the set of competences useful to an agent $a$ to perform a job $P$: $C_a^P = C_a \cap C^P$.

It is therefore possible to characterize the way a job is distributed in a multi-agent system by two parameters: the degree of specialization and the degree of redundancy. The degree of specialization of an agent $a$ for a problem $P$ indicates the rate of competences that the agent possesses relative to the number of competences needed to solve the problem. It is defined as follows:

$$H_a^P = \frac{Card(C^P) - Card(C_a^P)}{Card(C^P) - 1}$$

If $H_a^P = 0$, then the agent is totipotent for the problem $P$, since it has all the required competences. Conversely, when it has only one competence, the rate is equal to 1; the closer this rate is to 1, the more specialized the agent is and the lower its contribution to the whole problem will be.
The degree of redundancy characterizes the number of agents with the same competences. If we call $P_c$ the number of agents possessing competence $c$, i.e. $P_c = Card(\{a \mid c \in C_a\})$, then the degree of redundancy for a competence, also between 0 and 1, is given by:

$$R_c = \frac{P_c - 1}{Card(C) - 1}$$

If $R_c$ is 0, there is only one agent able to perform a task requiring competence $c$. The degree of redundancy is 1 when all agents have this competence, the redundancy then being maximal [29].

From these definitions and the functional analysis explained above, the proposed MAS can be characterized in terms of specialization and redundancy, as shown in Tables 3 and 4. The calculation of the specialization rate for each agent and the redundancy rate for each competence allowed us to obtain the specialization and redundancy rates for the whole system; these rates are respectively the average of the specializations of the agents (0.94) and the average of the redundancies of the competences (0.035).
Table 3. Degree of specialization in the proposed MAS

Agent | Specialization rate | Degree of specialization
CPS   | 0.82 | Rate close to 1: specialized agent
HS    | 1    | Rate = 1: purely specialized agent
LS1   | 0.91 | Rate close to 1: specialized agent
LS2   | 0.91 | Rate close to 1: specialized agent
LS3   | 0.91 | Rate close to 1: specialized agent
LS4   | 0.91 | Rate close to 1: specialized agent
LS5   | 0.91 | Rate close to 1: specialized agent
LS6   | 1    | Rate = 1: purely specialized agent
LS7   | 1    | Rate = 1: purely specialized agent
MWTF  | 1    | Rate = 1: purely specialized agent
System specialization rate: 0.94
Table 4. Degree of redundancy in the proposed MAS

Competence      | Agents        | Hospital sites                     | Redundancy rate | Degree of redundancy
Plan            | CPS           | CPS post                           | 0    | No redundancy
Command         | CPS           | CPS post                           | 0    | No redundancy
Control         | CPS           | CPS post                           | 0    | No redundancy
Sort            | HS            | Care units (CU)                    | 0    | No redundancy
Pack            | LS1           | Care units (CU)                    | 0    | No redundancy
Route           | LS1           | CU -> Intermediate warehouse (IW)  | 0    | No redundancy
Check sorting   | LS2           | IW                                 | 0    | No redundancy
Prepare loading | LS2           | IW                                 | 0    | No redundancy
Load            | LS3, LS4, LS5 | IW, Final warehouse (FW), MWTF     | 0.17 | Very low redundancy
Unload          | LS3, LS4, LS5 | IW, FW, MWTF                       | 0.17 | Very low redundancy
Transport       | LS6, LS7      | IW -> FW, FW -> MWTF               | 0.08 | Very low redundancy
Treat waste     | MWTF          | MWTF                               | 0    | No redundancy
System redundancy rate: 0.035
After this analysis, we can say that the studied system presents a very specialized and weakly redundant organization; this is justified by the functional approach that we adopted, where in the majority of cases each function is represented in the form of an agent. These agents are either purely specialized or very specialized, i.e. their individual contributions to reaching the final goal (optimal management of hospital waste) are very low, so the notion of totipotency is completely absent from the system.

On the other hand, most of the competences have a redundancy equal or close to zero, i.e. in most cases there is only one agent able to perform a defined task (one agent per specialty). The exceptions are the competence "Load", which can be performed by
three (3) agents (LS3, LS4, LS5), the competence "Unload" (LS3, LS4, LS5) and the competence "Transport" (LS6, LS7). In concrete terms, if for example the agent LS3 fails for the competence "Load", the agent LS4 or LS5 can be called to replace it, and conversely. The same holds for the competence "Transport": if the agent LS6 is out of order, the agent LS7 can replace it temporarily and vice versa, which gives more adaptability and reliability to the system.
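This competence-based failover can be sketched as a simple availability-aware lookup. Names and API are illustrative only:

```python
# Agents holding the redundant competences (from Table 4).
AGENTS = {"LS3": {"load", "unload"}, "LS4": {"load", "unload"},
          "LS5": {"load", "unload"}, "LS6": {"transport"},
          "LS7": {"transport"}}

def dispatch(task: str, down: frozenset = frozenset()) -> str:
    """Return an available agent holding the competence `task`,
    skipping agents currently out of order."""
    candidates = [a for a, comps in AGENTS.items()
                  if task in comps and a not in down]
    if not candidates:
        raise RuntimeError(f"no available agent for {task!r}")
    return candidates[0]
```

With zero redundancy (e.g. "Sort", held only by HS), no such substitution is possible, which is the fragility the analysis above points out.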
5 Conclusion

Medical waste management is a major challenge facing healthcare structures around the world. In developing countries, this process, which is as complex as it is preponderant in the hospital supply chain, and consequently in hospital management, is often managed in a hazardous manner. This compromises the health of patients, healthcare staff and the population, and constitutes a real danger for the environment. This fact motivated us to look for ways to optimize the MWM process. To do so, we first analyzed a model of MWM from a Moroccan hospital, emphasizing its shortcomings; we then proposed a model able to overcome these shortcomings, i.e. to preserve the environment while reducing the bill inherent in the outsourcing of this logistic activity. To describe the proposed model explicitly, we chose multi-agent systems as a design paradigm, since it is a tool that has already proven itself for the apprehension of complex systems and especially their modeling, a step that we will take in our next work while keeping the automation of the process as a final objective. Moreover, the ultimate goal of our work is to present to the hospital a MWM project that reduces hospital costs while emphasizing the ecological aspect and the recovery of residual material, in accordance with the precepts of the circular economy. This objective can only be achieved if our approach is supplemented by a financial study justifying the cost savings that the hospital can generate by joining our project.
References 1. Windfeld, E.S., Brooks, M.S.: Medical waste management–a review. J. Environ. Manag. 163, 98−108 (2015). https://doi.org/10.1016/j.jenvman.2015.08.013 2. Komilis, D., Makroleivaditis, N., Nikolakopoulou, E.: Generation and composition of medical wastes from private medical microbiology laboratories. Waste Manage. 61, 539–546 (2017) 3. Hong, J., Zhan, S., Yu, Z., Hong, J., Qi, C.: Life-cycle environmental and economic assessment of medical waste treatment. J. Clean. Prod. 174, 65–73 (2018) 4. Dursun, M., Karsak, E.E., Karadayi, M.A.: Assessment of health-care waste treatment alternatives using fuzzy multi-criteria decision making approaches. Resour. Conserv. Recycl. 57, 98–107 (2011) 5. He, Z., Li, Q., Fang, J.: The solutions and recommendations for logistics problems in the collection of medical waste in China. Procedia Environ. Sci. 31, 447–456 (2016) 6. Patwary, M.A., O’Hare, W.T., Sarker, M.H.: Assessment of occupational and environmental safety associated with medical waste disposal in developing countries: a qualitative approach. Saf. Sci. 49, 1200–1207 (2011)
7. Su, E.C.Y., Chen, Y.T.: Policy or income to affect the generation of medical wastes: an application of environmental Kuznets curve by using Taiwan as an example. J. Clean. Prod. 188, 489–496 (2018)
8. Babaee Tirkolaee, E., Aydın, N.S.: A sustainable medical waste collection and transportation model for pandemics. Waste Manage. Res. 39, 34–44 (2021). https://doi.org/10.1177/0734242X211000437
9. US Environmental Protection Agency (US EPA): Medical Waste (2013). http://www.epa.gov/osw/nonhaz/industrial/medical/index.html. Accessed 10 Dec 2014
10. Makajic-Nikolic, D., Petrovic, N., Belic, A., Rokvic, M., Radakovic, J.A., Tubic, V.: The fault tree analysis of infectious medical waste management. J. Clean. Prod. 113, 365–73 (2016). https://doi.org/10.1016/j.jclepro.2015.11.022
11. Komilis, D., Fouki, A., Papadopoulos, D.: Hazardous medical waste generation rates of different categories of health-care facilities. Waste Manag. 32(7), 1434–1441 (2012)
12. Nwachukwu, N.C., Orji, F.A., Ugbogu, O.C.: Health care waste management – public health benefits, and the need for effective environmental regulatory surveillance in Federal Republic of Nigeria. Curr. Topics Public Health 2, 149–178 (2013). https://doi.org/10.5772/53196
13. World Health Organization (WHO): Medical Waste (2013). http://www.who.int/topics/medical_waste/en/. Accessed 25 May 2015
14. PRISM Bangladesh: Survey Report on Hospital Waste Management in Dhaka City. Unpublished Report, PRISM Bangladesh, Dhaka (2005)
15. Al-Habash, M., Al-Zu'bi, A.: Efficiency and effectiveness of medical waste management performance, health sector and its impact on environment in Jordan applied study. World Appl. Sci. J. 19(6), 880–893 (2012)
16. Hossain, M.S., Santhanam, A., Norulaini, N.N., Omar, A.M.: Clinical solid waste management practices and its impact on human health and environment – a review. Waste Manag. 31(4), 754–766 (2011)
17.
Rolewicz-Kalińska, A.: Logistic constraints as a part of a sustainable medical waste management system. In: 2nd International Conference Green Cities - Green Logistics for Greener Cities (2016). https://doi.org/10.1016/j.trpro.2016.11.044
18. Emmanuel, J., et al.: Safe Management of Wastes from Health-Care Activities, 2nd edn. World Health Organization (2014). http://www.healthcarewasteorg/fileadmin/user_upload/resources/Safe-Management-of-Wastes-from-Health-Care-Activities-2.pdf
19. Insa, E., Zamorano, M., López, R.: Critical review of medical waste legislation in Spain. Resour. Conserv. Recycl. 54(12), 1048–1059 (2010)
27. Wooldridge, M.: An Introduction to Multi-Agent Systems, 2nd edn. John Wiley & Sons, Chichester, UK (2009)
28. Colson, C.M., Nehrir, M.H.: A review of challenges to real-time power management of microgrids. In: 2009 IEEE Power & Energy Society General Meeting, pp. 1–8. IEEE (2009)
29. Ferber, J.: Multi-Agent Systems: Towards a Collective Intelligence. IIA, InterEditions (1995)
A Relaxed Variant of Distributed Q-Learning Algorithm for Cooperative Matrix Games

Elmehdi Amhraoui1(B) and Tawfik Masrour1,2

1 Laboratory of Mathematical Modeling, Simulation and Smart Systems (L2M3S), ENSAM-Meknes, Moulay Ismail University of Meknes, Meknes, Morocco
[email protected], [email protected]
2 University of Quebec at Rimouski, Rimouski, Canada
Abstract. The Distributed Q-learning algorithm is known to converge in the case of deterministic multiagent systems. However, the algorithm fails to converge in the presence of stochasticity due to the over-estimation of action values. In this article, we present a new relaxation of Distributed Q-learning by introducing a new update rule for the Q-function of each agent. We discuss the existing literature, then compare our algorithm with different algorithms found in the literature: the Decentralized Q-learning, Distributed Q-learning, Hysteretic Q-learning, and Lenient Q-learning algorithms. Experiments in popular matrix games demonstrate that our algorithm is very effective in terms of convergence.

Keywords: Cooperative multiagent systems · matrix games · independent learners · stochasticity problem · distributed Q-learning · game theory
1
·
Introduction and Related Works
Multiagent reinforcement learning (MARL) has recently received a lot of attention due to advances in single-agent reinforcement learning (RL) techniques [1–3]. Several studies have attempted to extend the existing algorithms and techniques of RL to MARL [4–6]. However, MARL still faces some challenging problems, including the curse of dimensionality, the non-stationarity problem, and the coordination problem [5]. MARL is widely studied using concepts of game theory, a sub-field of mathematics that studies the optimal decision-making of autonomous agents. In game theory, multiagent systems (MASs) are generally formalized as Markov games [7]. A particular case of Markov games is the fully cooperative Markov game, where the agents have similar reward functions and so share a common goal. Cooperative Markov games have applications in a wide range of real domains, including logistics and robotics [3,8].

E. Amhraoui and T. Masrour—These authors contributed equally to this work.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 150–160, 2023. https://doi.org/10.1007/978-3-031-43520-1_13
Many approaches have been proposed to find optimal policies in fully cooperative Markov games. An important approach is decentralized learning, where each agent tries to learn its individual optimal policy [9]. This approach is more realistic given that agents in multiagent systems are inherently independent and distributed [9,10]. Within the decentralized learning framework, many studies have focused on the case where each agent has no access to the actions performed by other agents; this is called the Independent Learners (ILs) approach [11]. This article focuses on the ILs approach because it is more practical and does not require the exchange of information between agents. The first ILs algorithm proposed in the literature is Decentralized Q-Learning, an intuitive extension of single-agent Q-learning to multiagent systems [11]. Despite its successful applications [8], Decentralized Q-Learning suffers from a problem called the relative over-generalization problem. In [12], the authors proposed the Distributed Q-Learning algorithm, which has been shown to converge to an optimal joint policy in Markov games with deterministic rewards and transitions. However, the algorithm is extremely susceptible to stochasticity in rewards and transitions. Therefore, the extension of Distributed Q-learning to stochastic domains is still an open question. In [13], the authors proposed the Hysteretic Q-learning algorithm to address the vulnerabilities of Distributed Q-learning. In contrast to Distributed Q-learning, which ignores negative experiences, Hysteretic Q-learning incorporates all negative experiences in the update of Q-values using a small learning rate. The algorithm shows some improvements in stochastic domains but still fails in domains with many mis-coordination factors. An important ILs algorithm that has been proposed in the literature is Lenient Q-learning [14–16].
The algorithm uses the concept of leniency to address the mis-coordination factors. Briefly, leniency consists of storing a temperature value for every state and action; this temperature defines the degree of leniency towards negative experiences. The temperature starts with high values and decreases on every visit of the state-action pair. The Lenient Q-learning algorithm is found to outperform other algorithms. However, it requires many steps to converge, and tuning its parameters is difficult. In this paper, we address the vulnerability of Distributed Q-learning to stochasticity by proposing a new update rule for the Q-values of each agent. In contrast to Distributed Q-learning, our algorithm does not neglect negative temporal difference (TD) errors but incorporates them using a learning rate that depends on the TD error. As a result, our algorithm can be considered a relaxed variant of the Distributed Q-learning algorithm. As we will show, our algorithm has the same strength as Distributed Q-learning in deterministic games and, in addition, performs better in stochastic problems. The performance of our algorithm is demonstrated on a variety of popular matrix games. This paper is structured as follows: Sect. 2 provides the basic background in multiagent reinforcement learning. In Sect. 3, we review some of the factors responsible for the lack of coordination in the ILs approach. In Sect. 4, we review and discuss the state-of-the-art ILs algorithms found in the literature. In Sect. 5,
we propose a relaxed variant of the Distributed Q-learning algorithm. Numerical experiments are presented in Sect. 6, and Sect. 7 concludes the paper.
2 Background: Multiagent Reinforcement Learning
This section provides some basic concepts in multi-agent reinforcement learning (MARL).

2.1 Markov Games
In MARL, MASs are commonly formalized as a Markov game [7], which is an extended model of the Markov decision process (MDP) [17].

Definition 1. A Markov game consists of the following elements:

– The number of agents, denoted by m,
– The set of all possible states, denoted by S,
– The set of actions available to agent i, denoted by $A_i$,
– The transition model, denoted by T, such that $T : S \times A \times S \to [0, 1]$ and $P(s_{t+1} = s' \mid s_t = s, a_t = a) = T(s, a, s')$,
– The reward function for agent i, denoted by $R_i$, such that $R_i : S \times A \to \mathbb{R}$.
A special case of Markov games is repeated matrix games. A repeated matrix game is a Markov game without a state variable, and thus without a transition model, that is, a tuple $\langle m, A_1, \cdots, A_m, R_1, \cdots, R_m \rangle$. An example of a matrix game is the penalty game (Fig. 1). Matrix games have been studied extensively in game theory because of their simplicity and because they are well understood. In the following, the individual elements of agent i, such as the individual action, the individual policy, and the individual Q-function, are indexed by i. Elements shared among agents are written in bold, e.g., the joint action a. We define an individual policy for agent i as a function $\pi_i : S \times A_i \to [0, 1]$ that associates to every state-action pair $(s, a_i)$ the probability of performing the action $a_i$ in the state s, that is

$$\pi_i(s, a_i) = P(a_i^t = a_i \mid s_t = s) \qquad (1)$$

2.2 Fully Cooperative Markov Game
An important subclass of Markov games is the fully cooperative Markov game, in which all agents have equal reward functions. This model is generally used for modeling real cooperative MASs, such as automated guided vehicle systems [18]. This article will focus on the fully cooperative Markov game. The advantage of fully cooperative Markov games is that agents have similar reward functions, hence a single value function can be defined. As a result, we
Fig. 1. Penalty game payoff matrix
can define an optimal joint policy as a policy $\pi^*$ that maximizes the expected discounted sum of rewards received by all agents, that is

$$\pi^* \in \arg\max_{\pi} \mathbb{E}_{\pi}\left[\sum_{j=0}^{\infty} \mu^j r_{t+j+1} \;\middle|\; s_t = s\right] \quad \forall s \in S. \qquad (2)$$
An intuitive framework for finding an optimal joint policy in fully cooperative Markov games is the centralized learning framework, where the multiagent system is reduced to a single big agent. In this case, the optimal joint policy can be found by learning the optimal joint-action Q-function using the single-agent Q-learning algorithm [19]. However, assuming the existence of a central entity that can observe the joint actions of all agents is unrealistic, given that multiagent systems are inherently distributed. In addition, the search space in the centralized learning approach grows exponentially with the number of agents. An alternative to centralized learning is the decentralized learning framework, where each agent tries to find its individual optimal policy in a completely independent learning process [20]. This framework is very interesting, especially in the case where each agent is blind to the actions executed by other agents (Independent Learners (ILs) [20]). As a result, each agent has its own Q-function that evaluates its individual actions in a given state. In this paper, we will focus on the ILs approach as it does not require the exchange of local information between agents; such an exchange might be expensive or even impossible.
3 Primary Factors of Non-coordination in ILs Approach
The primary challenge with the ILs approach is the coordination problem. In this section, we review some of the factors that contribute to non-coordination in ILs. A detailed analysis of these factors can be found in [10,21].

The non-stationarity problem: Since each agent is blind to the actions executed by other agents, the observed information is no longer stationary and Markovian from the local perspective of each agent. The stationarity and the
Fig. 2. The climbing game and its variants: (a) deterministic climbing game (D); (b) partially stochastic climbing game (PS); (c) fully stochastic climbing game (FS)
Markov property are fundamental hypotheses in the theory of single-agent reinforcement learning [22–24]. Learning optimal policies in a decentralized manner is extremely challenging due to the violation of these basic assumptions [4,5,10].

The miscoordination problem: Independent learners must learn to choose their individual actions from the same optimal joint policy. Since the optimal joint policy is not necessarily unique, it is possible that agents choose their actions from different optimal joint policies and the resulting joint policy is not optimal. To illustrate this issue, consider the penalty game in Fig. 1. For k = −100, the penalty game has exactly two optimal joint actions (< a, a > and < c, c >). Therefore, each agent can choose a or c as its individual optimal action. If one agent chooses action a and the other chooses action c, the resulting joint action is < a, c > or < c, a >. Unfortunately, these joint actions are the worst in terms of associated reward.

The stochasticity problem: A major problem that leads to non-coordination in ILs is the noise associated with rewards and transitions. For example, when an agent observes a change in the observed reward, it cannot pinpoint the source of the change, whether it is due to the actions of other agents or to environmental noise.

The relative over-generalization problem: This problem occurs when a sub-optimal policy is given a higher value than the optimal policy. It is common among algorithms that estimate action values based on the average observed rewards. To illustrate this problem, consider the deterministic climbing game in Fig. 2a. If agent 2 selects its actions randomly (with equal probabilities), the expected reward of agent 1 is:
$Q_1(a) = -6$, $Q_1(b) = -5$, $Q_1(c) = 1$.

Although a is the theoretically optimal individual action for each agent, c has the highest value while a has the lowest.

In this section, we looked at some factors that contribute to non-coordination in the ILs approach. An effective independent learners algorithm should address each of these factors.
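The relative over-generalization effect quoted above can be reproduced in a few lines. The payoff matrix below is the standard deterministic climbing game from the literature; the exact entries of Fig. 2a are not reproduced in this text, so treat this as an illustrative sketch rather than the paper's exact figures.

```python
# Illustrative sketch: payoff matrix of the (standard) deterministic climbing
# game; the exact values in Fig. 2a may differ, but the ordering is the point.
R = [
    [11, -30,  0],   # agent 1 plays a
    [-30,  7,  6],   # agent 1 plays b
    [0,    0,  5],   # agent 1 plays c
]
actions = ["a", "b", "c"]

# If agent 2 selects its actions uniformly at random, an average-based learner
# estimates each individual action of agent 1 by the mean of its payoff row.
expected = {actions[i]: sum(row) / len(row) for i, row in enumerate(R)}

# Relative over-generalization: c gets the highest average value even though
# the optimal joint action is <a, a>.
best = max(expected, key=expected.get)
worst = min(expected, key=expected.get)
```

With this matrix the averages are roughly −6.3, −5.7, and 1.7, matching the ordering (though not the rounding) of the values quoted above.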
4 The State-of-the-Art Independent Learners Algorithms
In this section, we review four algorithms from the literature of cooperative MARL: Decentralized Q-Learning, Distributed Q-Learning, Hysteretic Q-Learning, and Lenient Q-Learning. Because our experiments will only be conducted on repeated matrix games, we only present the versions of these algorithms for repeated matrix games.

4.1 The Decentralized Q-Learning Algorithm
Let $G = \langle m, A_1, \cdots, A_m, R \rangle$ be a cooperative repeated matrix game. Let $a^t = (a_1^t, \cdots, a_m^t)$ be the joint action at time step t, and $r^t$ be the gained reward. Decentralized Q-Learning [11] updates the Q-function of agent i by performing the update (3):

$$Q_i^{t+1}(a_i^t) = Q_i^t(a_i^t) + \alpha \left( r^t - Q_i^t(a_i^t) \right) \qquad (3)$$

Decentralized Q-learning is known to suffer from the relative over-generalization problem since it is an average-based algorithm.

4.2 The Distributed Q-Learning Algorithm
Let $G = \langle m, A_1, \cdots, A_m, R \rangle$ be a cooperative repeated matrix game. Let $a^t = (a_1^t, \cdots, a_m^t)$ be the joint action at time step t, and $r^t$ be the gained reward. The Distributed Q-Learning algorithm [12] updates the Q-function of agent i by performing the update (4):

$$Q_i^{t+1}(a_i^t) = \begin{cases} Q_i^t(a_i^t) + \alpha \delta_i^t, & \text{if } \delta_i^t > 0, \\ Q_i^t(a_i^t), & \text{otherwise,} \end{cases} \qquad (4)$$

where $\delta_i^t = r^t - Q_i^t(a_i^t)$. In the deterministic case, the Distributed Q-learning algorithm can easily overcome the relative over-generalization problem since it estimates the value of actions based on the maximum rewards observed during learning. However, the algorithm performs poorly in problems with stochastic rewards.
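Update (4) amounts to a couple of lines of Python. This is a hedged minimal sketch (function and variable names are ours), with the Q-function stored as a per-action dict:

```python
def distributed_q_update(q, a, r, alpha=0.1):
    """Distributed Q-learning update (Eq. 4): negative TD errors are
    discarded, so each Q-value can only increase and ends up tracking the
    maximum reward observed for that action."""
    delta = r - q[a]
    if delta > 0:
        q[a] += alpha * delta
    return q

q = {"a": 0.0}
distributed_q_update(q, "a", 10)   # positive TD error: incorporated
distributed_q_update(q, "a", 0)    # negative TD error: ignored, q unchanged
```

In a deterministic game this optimism is exactly right; with stochastic rewards it over-estimates, which is the failure mode the paper targets.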
4.3 The Hysteretic Q-Learning Algorithm
The Distributed Q-learning algorithm assumes that negative experiences (negative temporal difference errors) are a result of the other agents' bad behavior, and therefore ignores them. However, in stochastic cooperative games, negative experiences may be caused by environmental noise, and ignoring them may result in the over-estimation of action values. Hysteretic Q-Learning [13] suggests incorporating all negative experiences in the update of Q-values using a learning rate smaller than the one used for positive experiences. Let $G = \langle m, A_1, \cdots, A_m, R \rangle$ be a cooperative repeated matrix game. Let $a^t = (a_1^t, \cdots, a_m^t)$ be the joint action at time step t, and $r^t$ be the gained reward. The Hysteretic Q-Learning update for agent i is given in (5):

$$Q_i^{t+1}(a_i^t) = \begin{cases} Q_i^t(a_i^t) + \alpha \delta_i^t, & \text{if } \delta_i^t > 0, \\ Q_i^t(a_i^t) + \beta \delta_i^t, & \text{otherwise,} \end{cases} \qquad (5)$$

where $\delta_i^t = r^t - Q_i^t(a_i^t)$ and $\beta < \alpha$. Since Hysteretic Q-learning incorporates all negative experiences and has no means to differentiate between environmental noise and the behavior of other agents, the algorithm may still suffer from the relative over-generalization problem.

4.4 The Lenient Q-Learning Algorithm
In the Lenient Q-Learning algorithm [14,16], every agent maintains a temperature $T_i(a_i)$ for every action $a_i$. The temperature takes high values at the beginning of the learning process and is decayed on every visit of the action $a_i$. The temperature function is used by the agent to define the probability of ignoring or incorporating a negative experience in the update of its Q-function. The updates of the Q-function and the temperature function for agent i are given in (6) and (7), respectively:

$$Q_i^{t+1}(a_i^t) = \begin{cases} Q_i^t(a_i^t) + \alpha \delta_i^t, & \text{if } \delta_i^t > 0 \text{ or } u < 1 - \exp\left(-\frac{1}{\theta T_i^t(a_i^t)}\right), \\ Q_i^t(a_i^t), & \text{otherwise.} \end{cases} \qquad (6)$$

$$T_i^{t+1}(a_i^t) = v \times T_i^t(a_i^t) \qquad (7)$$

where $\delta_i^t = r^t - Q_i^t(a_i^t)$, $\theta$ is the leniency moderation factor, u is a draw from the uniform random variable, and v is the temperature decay factor. Parameter tuning is the major difficulty of the Lenient Q-Learning algorithm [10]. Additionally, the algorithm often requires more steps to converge than the alternative algorithms [16].
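The two update rules above, Eq. (5) and Eqs. (6)-(7), can be sketched side by side. This is a minimal illustrative implementation under our own naming, not the reference code of either paper:

```python
import math
import random

def hysteretic_q_update(q, a, r, alpha=0.1, beta=0.01):
    """Hysteretic update (Eq. 5): negative TD errors are incorporated,
    but with the smaller learning rate beta < alpha."""
    delta = r - q[a]
    q[a] += (alpha if delta > 0 else beta) * delta
    return q

def lenient_q_update(q, temp, a, r, alpha=0.1, theta=1.0, v=0.995, rng=random):
    """Lenient update (Eqs. 6-7): a negative TD error is applied only with
    probability 1 - exp(-1 / (theta * T(a))), which is tiny while the
    temperature T(a) is still high (high leniency early in learning)."""
    delta = r - q[a]
    if delta > 0 or rng.random() < 1 - math.exp(-1.0 / (theta * temp[a])):
        q[a] += alpha * delta
    temp[a] *= v  # Eq. (7): the temperature decays on every visit of the action
    return q, temp
```

With a very high temperature, the lenient agent almost surely ignores a negative TD error, exactly like Distributed Q-learning; as the temperature decays it gradually behaves like Decentralized Q-learning.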
5 Our Contribution
In deterministic problems, the Distributed Q-learning algorithm can easily overcome the relative over-generalization problem. Moreover, it can solve the mis-coordination problem if it is combined with an effective action-selection strategy. However, Distributed Q-learning typically fails in problems with stochastic transitions and rewards. Our main goal is to provide an algorithm that has the same strengths as Distributed Q-learning but also performs well in stochastic domains. For this reason, we construct an algorithm that relaxes the optimistic assumption (the maximum approach) in Distributed Q-learning by introducing a continuous learning rate: all agent experiences are incorporated in the update of Q-values, with learning-rate values calculated dynamically from the temporal difference error. We define $\alpha_{i,\theta}^t$ as in Eq. (8):

$$\alpha_{i,\theta}^t = \alpha \times f_{\theta}(\delta_i^t) \qquad (8)$$
where $\alpha \in \,]0, 1]$ and $f_{\theta}(x)$ is a function that can take many forms, for example

$$f_{\theta}(x) = \frac{\arctan(\theta x) + \pi/2}{\pi} \quad \text{or} \quad f_{\theta}(x) = \frac{1}{1 + \exp(-\theta x)}.$$
Let us consider the update Eq. (9):

$$Q_i^{t+1}(a_i^t) = Q_i^t(a_i^t) + \alpha_{i,\theta}^t \delta_i^t \qquad (9)$$
From Eqs. (8) and (9), we can easily notice that if $\theta$ is sufficiently large, the update rule (9) is equivalent to the Distributed Q-Learning update (4). Therefore, a relaxation of Distributed Q-learning can be obtained when $\theta$ is well chosen. Generally, $\theta$ should not be large (e.g., $\theta \in \,]0, 1]$). The relaxed version of Distributed Q-learning updates the Q-value function as in Eq. (10):

$$Q_i^{t+1}(a_i^t) = Q_i^t(a_i^t) + \alpha \times f_{\theta}(\delta_i^t) \times \delta_i^t \qquad (10)$$
where $\delta_i^t = r^t - Q_i^t(a_i^t)$.
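A minimal sketch of the proposed rule (Eq. 10), using the sigmoid form of $f_\theta$; function names are ours:

```python
import math

def f_theta(x, theta=0.1):
    """Sigmoid form of f_theta from Eq. (8); values lie in ]0, 1[ with
    f_theta(0) = 0.5. As theta grows it approaches a step function, and
    Eq. (10) degenerates to the Distributed Q-learning update (Eq. 4)."""
    return 1.0 / (1.0 + math.exp(-theta * x))

def relaxed_q_update(q, a, r, alpha=0.1, theta=0.1):
    """Relaxed update (Eq. 10): negative TD errors are not discarded but
    are scaled down by a learning rate that depends on the TD error itself."""
    delta = r - q[a]
    q[a] += alpha * f_theta(delta, theta) * delta
    return q
```

For example, with $\theta = 0.1$ a TD error of +10 is applied with effective rate $\alpha f_\theta(10) \approx 0.073$, while a symmetric −10 error only gets $\alpha f_\theta(-10) \approx 0.027$: the optimism of Distributed Q-learning is preserved without ignoring noisy outcomes entirely.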
6 Numerical Experiments
In this section, we will compare our algorithm with various independent learners algorithms discussed in Sect. 4. The experiment will be carried out on the cooperative repeated matrix games shown in Fig. 1 and Fig. 2. The community has paid close attention to these games [10,12–14,16]. Although they seem
Table 1. Number of trials (out of 1000) converging to an optimal joint action. The values of the parameters are listed in Table 2.

Algorithm | Penalty game, k = 0 | Penalty game, k = −100 | Climbing game, Deterministic | Climbing game, Partially Stochastic | Climbing game, Fully Stochastic
Our algorithm | 1000 | 1000 | 1000 | 999 | 1000
Lenient Q-learning | 1000 | 924 | 1000 | 990 | 888
Hysteretic Q-Learning | 1000 | 1000 | 1000 | 581 | 237
Decentralized Q-learning | 1000 | 998 | 275 | 313 | 283
Distributed Q-learning | 1000 | 1000 | 1000 | 0 | 0
Table 2. Parameters used for the results presented in Table 1.

Algorithm | Penalty game (k = 0 and k = −100) | Climbing game, D | Climbing game, PS | Climbing game, FS
Our algorithm | α = 0.1, θ = 0.1 | α = 0.1, θ = 0.05 | α = 0.1, θ = 0.5 | α = 0.1, θ = 0.1
Lenient Q-learning | mintemp = 2, maxtemp = 50, ω = 1, θ = 1, τ = 0.1, δ = 0.995 | mintemp = 2, maxtemp = 50, ω = 1, θ = 10^6, τ = 0.1, δ = 0.995 | mintemp = 2, maxtemp = 50, ω = 1, θ = 10^4, τ = 0.1, δ = 0.995 | mintemp = 2, maxtemp = 50, ω = 1, θ = 10, τ = 0.1, δ = 0.995
Hysteretic Q-learning | α = 0.1, β = 0.01 | α = 0.1, β = 0.01 | α = 0.1, β = 0.01 | α = 0.1, β = 0.05
Decentralized Q-learning | α = 0.1 | α = 0.5 | α = 0.5 | α = 0.5
Distributed Q-learning | α = 0.1ᵃ | α = 0.1ᵃ | α = 0.1ᵃ | α = 0.1ᵃ

ᵃ All other possible values give the same results.
straightforward, modern independent learner algorithms find them very challenging [16]. These games are also interesting since they make it simple to combine all non-coordination factors. Except for the Lenient Q-Learning algorithm, which has its own action-selection method, all algorithms in this experiment employ the Boltzmann selection method. The temperature of the Boltzmann method is the same for all games and all algorithms, given by τ(t) = 50 × exp(−0.001 × t), where t is the repetition number; the minimal temperature is 1. For all algorithms, each trial consists of N = 5000 repetitions of the game, and Q-values are always initialized to 0. In these experiments, the performance measure is the number of trials that converge to an optimal joint action. Therefore, when a trial (defined as N repetitions of the game) is completed, we extract the action that maximizes the individual Q-function of each agent and check whether the resulting joint action is optimal. The number of trials (out of 1000) converging to an optimal joint action for each game and algorithm is shown in Table 1.

From Table 1, we notice that all algorithms perform well in the penalty game with k = 0. As expected, Decentralized Q-learning does not perform well in the penalty game with k = −100 or in the deterministic climbing game; this is due to the presence of the relative over-generalization problem, which is very challenging for average-based algorithms, including Decentralized Q-learning. Other
Fig. 3. The Q-values of the three actions obtained by the distributed Q-learning algorithm in the partially stochastic climbing game: (a) Agent 1; (b) Agent 2
algorithms perform well in all deterministic games. However, in the presence of stochasticity, our algorithm outperforms all other algorithms. In fact, our algorithm is the only one that completely solves the stochastic variants of the climbing game. The Distributed Q-learning algorithm has the worst performance in the stochastic variants of the climbing game; this is due to the over-estimation of the Q-value of action b (see Fig. 3). Therefore, our algorithm can be seen as an extension of Distributed Q-learning to stochastic domains.
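The action-selection and evaluation protocol used above can be sketched as follows. This is a hypothetical minimal implementation of the stated schedule τ(t) = 50·exp(−0.001 t) with a floor of 1; the trial bookkeeping names are ours:

```python
import math
import random

def boltzmann_temperature(t):
    # tau(t) = 50 * exp(-0.001 * t), with a minimal temperature of 1.
    return max(1.0, 50.0 * math.exp(-0.001 * t))

def boltzmann_select(q_values, t, rng=random):
    """Sample an action index with probability proportional to exp(Q / tau)."""
    tau = boltzmann_temperature(t)
    prefs = [math.exp(q / tau) for q in q_values]
    u, acc = rng.random() * sum(prefs), 0.0
    for i, p in enumerate(prefs):
        acc += p
        if u <= acc:
            return i
    return len(prefs) - 1

def is_trial_optimal(q_agents, optimal_joint_actions):
    """End-of-trial check: the greedy individual actions of all agents must
    together form an optimal joint action."""
    greedy = tuple(max(range(len(q)), key=q.__getitem__) for q in q_agents)
    return greedy in optimal_joint_actions
```

Running 1000 such trials and counting how often `is_trial_optimal` returns True reproduces the performance measure of Table 1.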
7 Conclusion
In this paper, we presented a relaxed version of the traditional Distributed Q-learning algorithm. This algorithm is meant to conserve the strengths of Distributed Q-learning and to overcome some factors that lead to non-coordination in cooperative MASs, especially the stochasticity problem. We compared our algorithm against state-of-the-art algorithms from the literature. Our results show that our algorithm outperforms competing algorithms in terms of convergence. In addition, the algorithm has fewer parameters to tune. Although our approach can be extended to cooperative Markov games, this paper studies only cooperative repeated matrix games. In future work, we will extend the research to complex and practical domains that involve states and several agents.

Acknowledgments. The first author's work is supported by the National Center for Scientific and Technical Research, Morocco.
References

1. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
2. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
3. Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17(1), 1334–1373 (2016)
4. Panait, L., Luke, S.: Cooperative multi-agent learning: the state of the art. Auton. Agents Multi-Agent Syst. 11(3), 387–434 (2005)
5. Buşoniu, L., Babuška, R., De Schutter, B.: Multi-agent reinforcement learning: an overview. Innov. Multi-Agent Syst. Appl. 1, 183–221 (2010)
6. Hernandez-Leal, P., Kartal, B., Taylor, M.E.: A survey and critique of multiagent deep reinforcement learning. Auton. Agents Multi-Agent Syst. 33(6), 750–797 (2019)
7. Shapley, L.S.: Stochastic games. Proc. Natl. Acad. Sci. 39(10), 1095–1100 (1953)
8. Busoniu, L., De Schutter, B., Babuska, R.: Decentralized reinforcement learning control of a robotic manipulator, pp. 1–6. IEEE (2006)
9. Boutilier, C.: Planning, learning and coordination in multiagent decision processes, vol. 96, pp. 195–210. Citeseer (1996)
10. Matignon, L., Laurent, G.J., Le Fort-Piat, N.: Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. Knowl. Eng. Rev. 27, 1–31 (2012)
11. Tan, M.: Multi-agent reinforcement learning: independent vs. cooperative agents, pp. 330–337. Morgan Kaufmann (1993)
12. Lauer, M., Riedmiller, M.: An algorithm for distributed reinforcement learning in cooperative multi-agent systems. Citeseer (2000)
13. Matignon, L., Laurent, G.J., Le Fort-Piat, N.: Hysteretic Q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams, pp. 64–69. IEEE (2007)
14. Panait, L., Sullivan, K., Luke, S.: Lenient learners in cooperative multiagent systems, pp. 801–803 (2006)
15. Panait, L., Tuyls, K., Luke, S.: Theoretical advantages of lenient learners: an evolutionary game theoretic perspective. J. Mach. Learn. Res. 9, 423–457 (2008)
16. Wei, E., Luke, S.: Lenient learning in independent-learner stochastic cooperative games. J. Mach. Learn. Res. 17(1), 2914–2955 (2016)
17. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons (2014)
18. Rhazzaf, M., Masrour, T.: Deep learning approach for automated guided vehicle system. In: Masrour, T., Cherrafi, A., El Hassani, I. (eds.) A2IA 2020. AISC, vol. 1193, pp. 227–237. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-51186-9_16
19. Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)
20. Claus, C., Boutilier, C.: The dynamics of reinforcement learning in cooperative multiagent systems. AAAI/IAAI 1998(746–752), 2 (1998)
21. Fulda, N., Ventura, D.: Predicting and preventing coordination problems in cooperative Q-learning systems. 2007, 780–785 (2007)
22. Tuyls, K., Weiss, G.: Multiagent learning: basics, challenges, and prospects. AI Magazine 33(3), 41–41 (2012)
23. Bloembergen, D., Tuyls, K., Hennes, D., Kaisers, M.: Evolutionary dynamics of multi-agent learning: a survey. J. Artif. Intell. Res. 53, 659–697 (2015)
24. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction (2018)
Remote Sensing Image Super-Resolution Using Deep Convolutional Neural Networks and Autoencoder

Safae Belamfedel Alaoui(B), Hassan Chafik, Abdeslam Ahmadi, and Mohamed Berrada

Department of Mathematics and Computer Science, National High School of Arts and Crafts, Moulay Ismail University Meknes, Marjane 2, B.P., 15290 Al-Mansor, Meknes, Morocco
[email protected]
Abstract. Deep convolutional neural networks have become the most used architectures in learning-based super-resolution methods. They can extract the mapping relationship between low-resolution images and high-resolution ones. This paper explores deep convolutional auto-encoder-based super-resolution, which includes one auto-encoder unit with several convolutional blocks in both the encoding and the decoding sub-units. In this study, we used RGB slices composed of the red, green, and blue bands of Sentinel 2B MSI images with a spatial resolution of 10 m. After degrading their resolution, we introduced them into the model to restore their initial resolution. The model's performance is encouraging (PSNR: 31.94, SSIM: 0.906, MSE: 0.00637, training accuracy: 0.9055, validation accuracy: 0.90, and test accuracy: 0.89). Compared to other advanced methods such as SRCNN and VDSR on the same data, our proposed method achieves satisfactory results.

Keywords: Convolutional neural network · DCASR · Deep learning · Autoencoders · Image super-resolution · Satellite images
1 Introduction

Remote sensing technology enables humanity to acquire information on the Earth's surface for significant purposes such as weather, forestry, agriculture, surface changes, and biodiversity. The publicly available spatial remote sensing products have a spatial resolution of one or several meters. Such a resolution does not fulfil the increasing demand for high-resolution images and precise applications. In addition, purchasing commercial products of sub-meter resolution is costly, especially for larger areas. Image super-resolution (SR) consists of improving an image's resolution by restoring high-resolution images from one or more corresponding low-resolution images; in other words, increasing the number of pixels in the image while retaining finer spatial details than those in the original image. It is an important tool in computer vision and image processing. It is widely used in real-world applications such as medical imaging for diagnosis and monitoring, and satellite imaging, among others. In particular, it can also provide a better visual experience for watching movies and playing video games (Lugmayr, Danelljan, et al. 2021) [24].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 161–171, 2023. https://doi.org/10.1007/978-3-031-43520-1_14
Moreover, the great development of deep learning (DL) techniques has allowed the creation of models that deal with the problems of super-resolution of images and videos (Zhu et al. 2021) [26]. As a result, ANN architecture has improved from simple linear perceptron classifiers to deeper architectures called deep neural networks (DNNs), which can handle more complex models that extract more information from the image than a shallow ANN. Today, DNNs have demonstrated their high performance in solving super-resolution tasks (Peijuan et al. 2022) [25]. Image super-resolution (SR) provides a class of techniques to overcome the spatial resolution limitation of sensors (Yang et al. 2010) [1]. These techniques can enhance the image's quality and resolution by transcending the diffraction limit of the sensor or the lens. In this case, optical SR proceeds by multiplexing the spatial-frequency bands. Geometric SR is used when an image is degraded with noise or blurring artifacts (Chen et al. 2022) [27]. The literature groups SR methods into three classes (Lu et al. 2015) [14]: interpolation-based, reconstruction-based, and learning-based approaches. In this paper, we are interested in the last category. Learning-based image SR methods (Shao et al. 2019) [3] proceed by finding the relationship between a low-resolution image and its corresponding high-resolution image. They include deep-learning-based (DLSR) or neighborhood-embedding methods for such a task. In DLSR, SR uses informative features detected by the deep network structure (e.g., a convolutional neural network). The only complication of those methods resides in finding the best network architecture and supplying sufficient training samples. This work contributes by proposing a customized architecture of deep convolutional auto-encoder-based SR (DCASR) for metric-resolution satellite images. Deep learning (DL) is a breakthrough in machine learning technology (Yang et al. 2019) [4].
DL applications have demonstrated robust performance that outperforms other machine learning tools. Nowadays, one can refer to the literature to see how DL-based SR has been actively explored (e.g., Dong et al. 2014; Kim et al. 2016) [5, 6]. The architecture of the proposed model is relatively simple compared to what other studies have proposed (Zhu et al. 2021) [13], containing a series of convolution blocks alternating with max-pooling operations in both the encoding and the decoding parts. In this case study, we use an RGB image composed of the red, green, and blue bands of the Sentinel 2B MSI image. The image slices undergo resolution degradation to create low-resolution (LR) images. Since auto-encoder methods are unsupervised, we feed the proposed DCASR model with a subset of LR images as training samples. Another two subsets are reserved for the validation and test steps. We evaluated our model in terms of PSNR, SSIM, and accuracy metrics, then compared our results against two state-of-the-art super-resolution methods: SRCNN [20] and VDSR [19]. The model showed high accuracy: 0.90, 0.90, and 0.89 for training, validation, and test accuracy, respectively. Furthermore, a higher PSNR value indicates better image reconstruction quality.
2 Material and Methods

2.1 Data Preparation

This work uses data from the Sentinel 2 satellite of the European Space Agency. The Multi-Spectral Instrument (MSI) sensor provides 13 bands of different resolutions, from 10 to 60 m, covering the visible, shortwave infrared (SWIR), and near-infrared (NIR) spectra (Table 1). The images are available on the USGS Earth Explorer platform. For this study, we downloaded the tile number T30STC, captured above Meknes city, Morocco. The cloud cover of the scene is 2.40330%. Table 1 presents more details about the scene for reproduction. We used only 3 of the 13 bands, Red, Green, and Blue (RGB), with a resolution of 10 m (Table 2), to which we applied atmospheric and radiometric corrections. After that, we clipped a scene of (10793, 10793) pixels, then split the RGB image into small images of size (251, 251) (Fig. 1). Our dataset contains 1849 sample images. In order to increase the volume of data, we used a mirror method, a data augmentation tool that generates a mirror-like twin of an original image. With this procedure, we obtained 3698 images. The model is trained on 2898 samples, with 400 samples reserved for testing and another 400 for validation.

Table 1. Sentinel 2B MSI image characteristics.

Entity ID     L1C_T30STC_A020426_20210202T110316
Tile Number   T30STC
Platform      SENTINEL-2B
Pixel Depth   32 Bits
Cloud Cover   2.40330
Table 2. Corresponding band characteristics of Sentinel 2B MSI.

Band name   Resolution (m)   Central wavelength (nm)
Blue        10               492.1
Green       10               559.0
Red         10               665.0
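The tiling and mirror augmentation described above can be sketched with NumPy. This is an illustrative sketch: the function names are ours, and a small stand-in array replaces the real 10793 × 10793 scene.

```python
import numpy as np

def tile_image(img, tile=251):
    """Split a scene into non-overlapping tile x tile slices."""
    h, w = img.shape[:2]
    return [img[y:y + tile, x:x + tile]
            for y in range(0, h - tile + 1, tile)
            for x in range(0, w - tile + 1, tile)]

def mirror_augment(tiles):
    """Mirror augmentation: add a horizontally flipped twin of each tile."""
    return tiles + [np.fliplr(t) for t in tiles]

scene = np.zeros((1004, 1004, 3), dtype=np.uint8)  # small stand-in scene
tiles = tile_image(scene)        # 4 x 4 = 16 tiles here; 43 x 43 = 1849 on the full scene
dataset = mirror_augment(tiles)  # 32 samples here; 3698 on the full scene
```

On the full 10793 × 10793 scene, the same two steps yield the 1849 tiles and 3698 augmented samples reported in the text.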
164
S. B. Alaoui et al.
Fig. 1. On the left, the filter used for clipping; on the right, examples of the clipped images.
2.2 Convolutional Neural Network

Convolutional neural networks (CNN) are a type of deep neural network that has proven its performance in a variety of computer vision tasks such as object detection, face recognition, pan-sharpening [15–17], image classification [18], and image super-resolution (SR) [5]. A CNN is a mathematical assembly of four types of components: convolutional layers, pooling layers, activation functions, and fully connected layers. The first two, the convolution and pooling layers, perform feature extraction, while the last, the fully connected layer, maps the extracted features into the final output. A CNN consists of several convolution layers. Each layer is made up of a set of filters (called kernels) that are applied to an input image. For each layer, we define an activation function to introduce nonlinearity into the network, by which it learns and approximates continuous and complex relationships between variables. Several activation functions exist, e.g. ReLU, Softmax, tanh, and Sigmoid. The output of a convolutional layer is a feature map, which contains the most important features of the given input image, in a process called feature extraction. In this case, we built five different CNN layers, distributed in the DCASR architecture and sometimes repeated, with a doubling number of filters (64, 128, 256, 512, and 1024) and a stride of 1. All the filters are of the same size (3, 3). We defined the same activation function, the Rectified Linear Unit (ReLU), for all the convolution layers due to its simplicity and its performance in combating the vanishing gradient problem. The ReLU function formula is as follows: ReLU(x) = max(0, x).
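The ReLU formula above applies element-wise to a feature map, zeroing negative activations and passing positive ones through (a trivial NumPy sketch for illustration):

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: element-wise max(0, x)."""
    return np.maximum(0, x)

feature_map = np.array([[-2.0, 0.5], [3.0, -0.1]])
print(relu(feature_map))  # negatives become 0, positives are unchanged
```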
Pooling layers are typically used after a convolutional layer. They aim to reduce the size of the convolved feature map before it is fed into a fully connected layer, which reduces the training cost. There are two main types of pooling operations: max pooling and average pooling. In our case, the max-pooling layer is used to extract the maximum value from each feature map. The fully connected layer is the final layer of a convolutional neural network (CNN). Every neuron in a fully connected layer is connected to every neuron in the previous layer. As a result, it can be computed as a matrix multiplication followed by a bias offset; the fully connected layer assists in mapping the representation between the input and the output.

2.3 Autoencoders

Autoencoders [22] are a type of feed-forward generative model used for unsupervised learning. They aim to learn how to compress and encode data and then reconstruct the output from this reduced encoded representation, so as to obtain an output that is identical to the input and has the same dimensionality. The DCASR architecture (Fig. 2) consists of three parts:

The encoder is responsible for compressing the input into a lower-dimensional representation. In this work, the 256 × 256 pixel input image is processed by five CNN blocks. The first block contains two convolution layers; each defines 64 filters of size 3 × 3 with a ReLU activation function. The convolution layers are followed by a max-pooling layer, and the block ends with a dropout layer to reduce overfitting. The other four blocks are configured similarly but lack the dropout layer. Each block doubles the previous block's number of filters. The output is a feature map: a compressed image of size 8 × 8.

The hidden layer, also known as the bottleneck, is a critical component of the network, smaller than the input layer and the decoder, which contains the compressed input. We defined a convolutional layer with the same configuration and 1024 filters.
The decoder is the architecture component that decompresses and reconstructs data from the encoded representation. The decoder is a mirror-like architecture of the encoder, but it incorporates upsampling layers instead of max pooling. Hence, the decoder blocks begin with an upsampling layer followed by two convolution layers. As we move from one block to the next, the number of filters is divided by two. Unlike the encoder, the decoder recalls specific convolution layers from the encoder and joins them to the corresponding decoder blocks.
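To make the dimensions concrete, the tensor shapes flowing through DCASR can be traced with two small helpers. This is an illustrative sketch only: the encoder widths (64 to 1024) and the 8 × 8 bottleneck follow the text, but the exact decoder widths are assumptions.

```python
def down_block(shape, filters):
    """Two 3x3 same-padding convolutions (channels -> filters), then 2x2 max pooling."""
    h, w, _ = shape
    return (h // 2, w // 2, filters)

def up_block(shape, filters):
    """2x2 upsampling, then two 3x3 same-padding convolutions."""
    h, w, _ = shape
    return (h * 2, w * 2, filters)

shape = (256, 256, 3)                 # RGB input slice
for f in [64, 128, 256, 512, 1024]:   # encoder: five blocks, filters doubling
    shape = down_block(shape, f)
bottleneck = shape                    # the compressed 8 x 8 representation
for f in [512, 256, 128, 64, 32]:     # decoder: filters halved per block (assumed widths)
    shape = up_block(shape, f)
output = (shape[0], shape[1], 3)      # final convolution maps back to 3 channels
```

Five halvings take 256 down to 8, and five doublings restore 256, which matches the 256 × 256 input and the 8 × 8 bottleneck described above.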
Fig. 2. The DCASR architecture.
3 Results and Discussion

3.1 Data Preparation

The data preparation step resulted in 1849 slices of (251, 251) from the clipped scene of (10793, 10793). Data augmentation generated mirror-like images from the initial 1849 ones; in total, we had 3698 images. The dataset was then divided into 2898 samples for training, 400 samples for testing, and 400 samples for validation. Before using an input image in the model, we proceeded by lowering its resolution, without changing its size, to 1/4th of the original; we then scaled it back to the original size using bicubic interpolation (Fig. 3). We thus have the original high-resolution image, against which the output is compared, and the low-resolution images that are provided to the model after being resized to (256, 256) to be compatible with the model architecture (Fig. 2) (Figs. 4 and 5).
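The degradation step described above can be sketched with Pillow. This is an illustrative sketch (the function name is ours); a blank stand-in image replaces an actual Sentinel slice.

```python
from PIL import Image

def degrade(img, factor=4):
    """Lower the resolution to 1/factor, then scale back to the original
    size with bicubic interpolation, producing the LR counterpart."""
    w, h = img.size
    small = img.resize((w // factor, h // factor), Image.BICUBIC)
    return small.resize((w, h), Image.BICUBIC)

hr = Image.new("RGB", (251, 251))                # stand-in for one HR slice
lr = degrade(hr)                                 # blurred LR image, same size as HR
lr_input = lr.resize((256, 256), Image.BICUBIC)  # resized to match the model input
```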
Fig. 3. The original images on the left; the degraded images in the middle; our model's results on the right.
3.2 DCASR Model

For validation purposes, a quantitative evaluation of the results on our dataset is shown in Table 3 and Table 4. The accuracy is satisfying and the mean square error (MSE) is very low. We also compared our model with other well-known architectures: the SRCNN model [20], VDSR [19], and bicubic interpolation. All the models were run on the same dataset. The comparison was based on two metrics to judge the performance of the models: the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM). As shown in Table 3, our model recorded the highest PSNR and SSIM.
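PSNR, one of the two comparison metrics, follows directly from the mean square error between the reference and the reconstruction. A minimal NumPy sketch (SSIM is more involved and is typically taken from an image-processing library such as scikit-image):

```python
import numpy as np

def psnr(ref, out, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means better reconstruction."""
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.full((16, 16), 128, dtype=np.uint8)
out = np.full((16, 16), 138, dtype=np.uint8)  # uniform error of 10 -> MSE = 100
print(round(psnr(ref, out), 2))
```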
Fig. 4. Visual comparison between the degraded (LR) image and our HR result.
Fig. 5. Visual comparison between the degraded (LR) image and our HR result.
Table 3. Comparison of PSNR and SSIM values for the three implemented models and bicubic interpolation. In bold, the value indicating the best performing model.

Metric   SRCNN   Bi-cubic   VDSR    Ours
PSNR     30.98   29.56      31.68   31.94
SSIM     0.859   0.709      0.890   0.906
Table 4. Accuracy and MSE values for the DCASR model.

Accuracy   MSE
0.9055     0.00637
The DCASR model was trained on our dataset for 50 epochs. The accuracy and loss curves (Fig. 6 and Fig. 7) show a high accuracy and a low loss, confirming the performance of our proposed model.
Fig. 6. Training history for accuracy.
Fig. 7. Training history for loss.
We observe that 30 epochs are sufficient: the accuracy and the loss do not change significantly after 30 epochs (Fig. 6 and Fig. 7). As a result, reducing the number of epochs reduces the model's training cost (execution time).
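This observation can be automated with a patience-based early-stopping rule. The sketch below is illustrative only (the paper itself uses a fixed epoch budget; the function name and thresholds are assumptions):

```python
def early_stop_epoch(val_losses, patience=5, min_delta=1e-4):
    """Return the epoch at which training would stop: the first epoch reached
    after the validation loss has failed to improve by at least min_delta
    for `patience` consecutive epochs (or the last epoch otherwise)."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1

# A loss curve that improves for 30 epochs and then plateaus stops shortly after.
losses = [1.0 / (e + 1) for e in range(30)] + [0.032] * 20
print(early_stop_epoch(losses))
```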
4 Conclusion and Perspectives

The aim of this work was to explore the utility of super-resolution methods for enhancing the spatial resolution of satellite images. This work presented a customized deep convolutional auto-encoder-based SR (DCASR) architecture. DCASR proved its performance when applied to Sentinel 2B MSI image slices, as shown by the high accuracy and the low MSE (0.9055 and 0.00637, respectively). Besides, our model outperformed the other architectures (SRCNN, VDSR, and bicubic interpolation); hence, our model can enhance the quality of satellite images. The use of super-resolution in several domains brings challenges in improving performance and reducing learning time [27]; in this context, the perspectives proposed in this paper are as follows:

• Reducing the overall runtime of training is important for improving performance. Batch normalization is a technique that can achieve this goal; in this regard, batch normalization techniques for super-resolution should be further investigated.
• In super-resolution, network design architectures should be more thoroughly investigated, because they affect how well any SR model performs.
• Another area of network design for quick and precise reconstruction of the HR image is the simultaneous use of low- and high-level representations to speed up the SR process.
References

1. Yang, J., Wright, J., Huang, T.S., Ma, Y.: Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)
2. Lu, Z.W., Wu, C.D., Chen, D.Y., Qi, Y.C., Wei, C.P.: Overview on image super-resolution reconstruction. In: Proceedings of the 26th Chinese Control and Decision Conference, pp. 2009–2014 (2014)
3. Shao, Z., et al.: (2019). https://doi.org/10.1109/JSTARS.2019.2925456
4. Yang, W., Zhang, X., Tian, Y., Wang, W., Xue, J.-H., Liao, Q.: Deep learning for single image super-resolution: a brief review. IEEE Trans. Multimedia 21(12), 3106–3121 (2019)
5. Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 184–199. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10593-2_13
6. Kim, J., Lee, J.K., Lee, K.M.: Deeply-recursive convolutional network for image super-resolution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1637–1645 (2016)
7. Wang, Z., Chen, J., Hoi, S.C.H.: Deep learning for image super-resolution: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 43(10), 3365–3387 (2020)
8. Zou, W.W., Yuen, P.C.: Very low-resolution face recognition problem. IEEE Trans. Image Process. 21, 327–340 (2012)
9. Huo, S., Zhou, Y., Lei, J., Ling, N., Hou, C.: Iterative feedback control-based salient object segmentation. IEEE Trans. Multimedia 20, 1350–1364 (2018)
10. Zhou, Y., Mao, A., Huo, S., Lei, J., Kung, S.: Salient object detection via fuzzy theory and object-level enhancement. IEEE Trans. Multimedia 21, 74–85 (2019)
11. Lim, B., Son, S., Kim, H., Nah, S., Lee, K.M.: Enhanced deep residual networks for single image super-resolution. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 136–144 (2017). https://doi.org/10.1109/CVPRW.2017.151
12. Zhang, H., Wang, P., Zhang, C., Jiang, Z.: A comparable study of CNN-based single image super-resolution for space-based imaging sensors. Sensors 19(14), 3234 (2019). https://doi.org/10.3390/s19143234
13. Zhu, X., Li, Z., Lou, J., Shen, Q.: Video super-resolution based on a spatio-temporal matching network. Pattern Recogn. 110(2), 107619 (2021). https://doi.org/10.1016/j.patcog.2020.107619
14. Lu, J., Forsyth, D.: Sparse depth super resolution. In: The IEEE Conference on Computer Vision and Pattern Recognition, pp. 2245–2253 (2015)
15. Li, Z., Cheng, C.: A CNN-based pan-sharpening method for integrating panchromatic and multispectral images using Landsat 8. Remote Sensing 11(22), 2606 (2019). https://doi.org/10.3390/rs11222606
16. Syaffeza, A.R., Khalil-Hani, M., Liew, S.S., et al.: Convolutional neural network for face recognition with pose and illumination variation. Int. J. Eng. Technol. 6(1), 44–57 (2014)
17. Huang, K.Q., Ren, W.Q., Tan, T.N.: A review on image object classification and detection. Chin. J. Comput. 37(6), 1225–1240 (2014)
18. Jmour, N., et al.: Convolutional neural networks for image classification. In: International Conference on Advanced Systems and Electric Technologies (IC_ASET), 14 June 2018
19. Kim, J., Lee, J.K., Lee, K.M.: Accurate image super-resolution using very deep convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1646–1654, June 2016
20. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2016)
21. Liu, Z.S., Siu, W.-C., Chan, Y.-L.: Photorealistic image super-resolution via variational autoencoders. IEEE Trans. Circ. Syst. Video Technol. 2, 1 (2020)
22. Liu, Z.S., Siu, W.C., Wang, L.W., Li, C.T., Cani, M.P.: Unsupervised real image super-resolution via generative variational autoencoder. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 442–443 (2020)
23. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38, 295–307 (2014)
24. Lugmayr, A., Danelljan, M., Timofte, R.: NTIRE 2021 learning the super-resolution space challenge. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 596–612 (2021)
25. Wang, P., Bayram, B., Sertel, E.: A comprehensive review on deep learning based remote sensing image super-resolution methods. Earth-Sci. Rev. 104110 (2022)
26. Gao, G., Zhu, D., Lu, H., Yu, Y., Chang, H., Yue, D.: Robust facial image super-resolution by kernel locality-constrained coupled-layer regression. ACM Trans. Internet Technol. (TOIT) 21(3), 1–5 (2021)
27. Chen, H., He, X., Wu, Y., Ren, C., Zhu, C.: Real-world single image super-resolution: a brief review. Inf. Fusion 79, 124–145 (2022)
Part of Speech Tagging of Amazigh Language as a Very Low-Resourced Language: Particularities and Challenges Rkia Bani1(B), Samir Amri2, Lahbib Zenkouar1, and Zouhair Guennoun1 1 Smart Communication Research Team (SCRT), Mohammadia School of Engineering (EMI),
Rabat, Morocco [email protected] 2 ENSAM School of Engineering, Moulay Ismail University, Meknes, Morocco [email protected]
Abstract. The objective of this paper is to present the particularities and the challenges of the part of speech (hereafter POS) tagging task. Indeed, POS tagging is a crucial step before every natural language processing application (syntactic analysis, machine translation, autocorrection, etc.) because the performance of each application depends, inter alia, on the performance of the POS tagger used. Thus, in order to build an efficient POS tagger, we must take care to improve the quality of three phases: the segmentation phase, the organization of the lexical units, and the disambiguation phase.

Keywords: Morphosyntax · NLP · Corpus · POS tagging · Amazigh · Language
1 Introduction

The Amazigh language is among the low-resourced languages and the least used on the internet, hence the motivation for its computerization and for the development of its natural language processing (NLP). Besides, much research has been done in NLP, yielding multiple approaches and algorithms that lead to sophisticated applications and systems. From a general point of view, to implement NLP, researchers need:

– Base units for phrase and word segmentation and for morphological, syntactic, and semantic analysis.
– Linguistic resources (dictionaries and aggregates, lexical data, corpora, etc.).
– Linguistic or machine learning expertise.

In this article, we will focus on part of speech tagging, which is an essential step in realizing most NLP applications, as it determines the grammatical category of a text's words and the description of the different base units in mainstream applications such as syntactic analysis, automatic summarization, and information retrieval. It is also very useful in word processing, in performance optimization systems, and in speech © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 172–182, 2023. https://doi.org/10.1007/978-3-031-43520-1_15
recognition. Generally, POS tagging is a necessary and difficult step. Thus, we decided to highlight this problem, particularly for the Amazigh language. The rest of this article is structured as follows: Sect. 2 is dedicated to the state of the art of the Amazigh language and its natural language processing. Section 3 focuses on part of speech tagging. Section 4 is a discussion of the prerequisites and the points to improve in order to reach a complete and robust POS tagging system, especially an efficient one for the Amazigh language. Finally, we conclude with several perspectives for future work in the Amazigh NLP area.
2 Amazigh and NLP

2.1 An Overview of Amazigh Language

The Amazigh language is spoken in the form of different dialects and vernaculars. The latter are spoken across a large territory that covers many nations: Egypt, Libya, Tunisia, Morocco, Algeria, Mali, Niger, and Mauritania. However, Morocco and Algeria are the two nations where the greatest number of Imazighen is concentrated, in the sense that being Amazigh is to speak one of the Amazigh vernaculars. Depending on the region, these dialects take different names. In Algeria, we find the Kabyle, Mozabite, and Chaoui dialects. In Morocco, there are three main varieties: Tarifit in the north, Tachelhit in the south-west, and Tamazight in the center. Despite much research, Amazigh is considered a difficult language to master because of its morphological richness. Research in NLP has addressed various issues such as morphology, machine translation, and document indexing. In this section, we present the particularities of the Amazigh language as well as some of its morphological and syntactic properties. The creation of the Royal Institute of Amazigh Culture (IRCAM) in 2001 and the officialization of the Amazigh language in 2011 supported its promotion and established an official orthography (Ameur et al. 2004), an appropriate coding in the Unicode standard (Andries 2008; Zenkouar 2008), and linguistic structures (Ameur et al. 2004; Boukhriss et al. 2008). The Amazigh language has its own script, Tifinagh, a standard alphabetic system adequate for all current Amazigh dialects. Thus, in 2003, IRCAM developed an alphabet system under the name Tifinagh-IRCAM. The alphabet standardized by IRCAM is based on a graphical system with a phonological tendency; this alphabet comprises:

- 27 consonants, including: labials (ⴼ, ⴱ, ⵎ), dentals (ⵜ, ⴷ, ⵟ, ⴹ, ⵏ, ⵔ, ⵕ, ⵍ), alveolars (ⵙ, ⵣ, ⵚ, ⵥ), palatals (ⵛ, ⵊ), velars (ⴽ, ⴳ), labiovelars (ⴽⵯ, ⴳⵯ), uvulars (ⵇ, ⵅ, ⵖ), pharyngeals (ⵃ, ⵄ), and the laryngeal (ⵀ).
- 2 semi-consonants: ⵢ and ⵡ.
- 4 vowels: three full vowels ⴰ, ⵉ, ⵓ and the neutral vowel ⴻ, which has a special status in Amazigh phonology.
174
R. Bani et al.
Moreover, it is the transliteration into the Latin alphabet that is used in all the examples presented in this article. In the lexicon of the Amazigh language, there are three main categories of words: verbs, nouns, and particles (Boukhriss et al. 2008), which can be divided into different sub-categories: preposition, conjunction, pronoun, article, interjection, and adverb:

– The noun is either masculine or feminine, and plural or singular; the plural starts from two, as in French. The noun is either in the free state or in the annexation state. For example, for masculine nouns: afus/ifassn (hand/hands), igr/igran (field/fields); for feminine nouns: tuzzalt/tuzzalin (knife/knives), tasarut/tisura (key/keys).
– The verb is generally constructed by affixation and composition. Some verbs are derivations by affixation (prefixes, suffixes); other verbs are not necessarily derived from nouns but are composed either of a verb and a noun, or of two verbs, without forgetting, of course, the aspects of the conjugation, which sometimes impact the morphology of the verb in a significant way. Examples of verbs in Amazigh: sw (to drink), ddu (to go), rwl (to run).
– Pronouns are isolated from the words to which they refer. Pronouns in the Amazigh language are either demonstrative, exclamatory, indefinite, interrogative, personal, possessive, or relative.
– Adverbs are subdivided into adverbs of place, time, quantity, and manner, and interrogative adverbs.
– The prepositions are a set of characters independent of the noun they precede; however, if a preposition is followed by a personal pronoun, the preposition and the personal pronoun form a single string delimited by blanks, or else by a blank and a punctuation mark.
– The particles are always isolated; they are of several types:

• Aspectual particles such as "ar, ad".
• The negation particle "ur".
• The orientation particle "s".
• The predication particle "d".
– The determiners always take the form of a single word delimited by two spaces; they are divided into articles, demonstratives, exclamatives, indefinite articles, interrogatives, ordinal numbers, possessives, presentatives, and quantifiers.
– Moroccan Amazigh punctuation marks are similar to the punctuation marks adopted by international languages and have the same functions.

2.2 Amazigh NLP

Natural Language Processing (NLP) is generally divided into two parts:

• Language processing: concerns systems capable of behaving as readers/listeners.
• Language generation: concerns systems capable of behaving as writers/producers.
Following this subdivision, we can distinguish several levels in NLP:

– Phonological level: interpretation of speech through words.
– Morphological level: deals with the composition of words (prefix, suffix, radical, etc.).
– Lexical level: gives meaning to the word taken individually.
– Syntactic level: discovers the grammatical structure of the sentence.
– Semantic level: deals with the meaning of words and sentences.
– Conversational level: deals with the overall meaning of corpora. It does not consider a text as a concatenation of sentences, but as a whole endowed with meaning.
– Pragmatic level: explains the implicit meanings of sentences and words.

Regarding Amazigh NLP (NLPAM), the Amazigh language does not have sufficient linguistic resources and NLP tools (Outahajala 2015). However, we can list some works already done for NLPAM:

• The Tifinagh alphabet has been integrated into the Unicode standard, which allowed the development of tools adapted to the processing of this language (Rachidi and Mammass 2005).
• The creation of keyboards and typefaces dedicated to writing Tifinagh (IRCAM 2003b; IRCAM 2004).
• The transliteration of texts written in the Tifinagh alphabet into the Arabic or Latin alphabet (Ataa Allah 2013).
• The construction of a large annotated corpus for the Amazigh language (Outahajala 2014).
• The Tifinagh character recognition project carried out in 2009 (Ait Ouguengay and Taalabi 2009).
• The morphological analyzer for Amazigh nouns (Raiss and Cavalli-Sforza 2012).
• The conjugator of the verbs of the Amazigh language (Ataa Allah and Boulaknadel 2014).
• The pseudo-rooting approach (Ataa Allah and Boulaknadel 2010).
• The concordancer (Ataa Allah 2009), allowing the search for a word in a set of texts to study its use.

From the above, the Amazigh NLP field needs the vision and strategy of everyone (researchers, linguists, etc.) to bring this great project to success and to offer the scientific community and the general public relevant, high-value-added systems and projects.
3 POS Tagging Amazigh Language

POS tagging is the process of detecting the part of speech category of a word in context; this task is not trivial for the automatic processing of written language. Indeed, making a computer capable of knowing the grammatical category of a word requires the use of sophisticated methods, especially for ambiguous words. The automatic systems dedicated to this activity are part-of-speech taggers. They assign a specific part of speech tag to each word of each sentence of a text (grammatical category, morphological information such as gender, number,
condition, etc.). The correct tagging of the sentence (idda yidir s tmzgida) is as follows:

idda: Verb
yidir: Proper noun
s: Preposition
tmzgida: Noun

The main difficulty of part of speech tagging comes from the fact that the words of a language are ambiguous: several tags can be assigned to a given word of a sentence. A part of speech tagger must therefore perform a disambiguation phase in order to select a sequence of possible tags for the sequence of words in the sentence, and if possible the correct sequence. In fact, part of speech tagging has been widely studied in the past; it is now considered a relatively solved problem for some languages like English and French. The performance of current taggers for these languages is very high (around 97.50% of words correctly tagged). To initiate this discipline, several approaches have been proposed to automatically annotate the words of a text (Fig. 1). The mechanism of part of speech tagging is generally based on the assumption that the category of a word depends on its local context, which can be reduced, for example, to the word or two that precede it. In what follows, we present different part of speech tagging methods and carry out a brief census of the taggers that exist for the Amazigh language. There are two main families of taggers:

• Symbolic taggers are those that apply rules reported by human experts. In this type of tagger there is very little automation; it is the designer who crafts all the tagging rules and provides a list of morphemes if necessary. The conception is not automated: the tagger performs automatic tagging once its rules are developed. The conception of such a tagger is time consuming and expensive. Moreover, these taggers are not easily portable, i.e. they are only effective for a given language and a given field (for example: finance, politics, etc.).
• Taggers with machine learning, which we will focus on in the rest of this study. Among the taggers of this type, there are two main kinds: supervised taggers, which learn from pre-tagged corpora (Brill 1993; Khoja et al. 2001; Diab et al. 2004), and unsupervised taggers, which learn from raw corpora without additional information. Whether supervised or not, taggers with learning can be grouped into three families: rule-based, statistical, or neural systems.
Fig. 1. List of part-of-speech tagging methods with machine learning: a POS tagger is either supervised or unsupervised, and each family is subdivided into rule-based, statistical, and neural approaches.
4 POS Tagger

At the beginning of this section, we list some part of speech taggers that are available for scientific research (Table 1) and have the great advantage of being independent of the language: we only need a corpus for learning and another for testing, plus, for a few of them (TreeTagger), a lexicon. Unknown words seem to be a problem for all taggers based on learning algorithms that produce language models. However, some of the mentioned taggers can be modified to take lexical knowledge into account and to perform lemmatization, in particular Brill and CRF++. Stanford and MXPOST are extensible as well, but their code is rather complex, which probably makes the development of extensions difficult. As for Unsupos, the unsupervised learning approach remains an option if no annotated corpus is available for the language to be studied. In terms of performance, discriminative probabilistic models such as maximum entropy models (Ratnaparkhi et al. 1994; Toutanova and Manning 2000), support vector machines (Gimenez and Marquez 2004), or conditional random fields (Kudo and Matsumoto 2000) provide good results in part of speech tagging.

Table 1. Some available part of speech taggers for research, with reference and machine learning technique.

Tagger                                 Reference                     Used technique
TreeTagger (supervised)                Schmidt (1994)                Hidden Markov model and decision tree
Trigram 'n' Tags or TnT (supervised)   Brants (2000)                 Hidden Markov model
SVMtool (supervised)                   Gimenez and Marquez (2004)    Support vector machine
CRF++ (supervised)                     Lafferty and Pereira (2001)   Conditional random fields
Yamcha (supervised)                    Taku Kudo (2000)              Support vector machine
MXPOST (supervised)                    Ratnaparkhi (1994)            Maximum entropy
Stanford POS tagger (supervised)       Toutanova and Manning (2000)  Maximum entropy
Unsupos (unsupervised)                 Chris Biemann (2007)          Viterbi
Brill                                  Brill (1993)                  Lexical and contextual rules
4.1 Corpus and Tagset

A corpus is a collection of various materials gathered according to a set of criteria so that it is representative and balanced. The use of corpora is critical for NLP systems based on statistical methods (Habash and Rambow 2005).
The most popular English corpora are the Brown Corpus (Kučera and Francis 1967), which contains about a million words, and the Penn Treebank, a corpus distributed by the Linguistic Data Consortium (LDC). For the Arabic language, the first annotated corpus produced is that of Khoja and its co-authors; this corpus contains 50,000 annotated words (Khoja et al. 2001). Other corpora are used, such as the Penn Arabic Treebank (Maamouri 2004) and the Prague Arabic Dependency Treebank (Smrž and Hajič 2006). For languages with few electronic resources and little computerization, like the Amazigh language, the main motivation to have an annotated corpus is to obtain training data for part of speech taggers and to provide a basic tool for NLPAM applications. Despite the various research carried out on the natural language processing of Amazigh, it is difficult to find ready-made linguistic resources; we can cite the manually annotated corpus (Amri et al. 2017). This corpus contains 60k words and uses the tagset described in Table 2. This is an important step for a lexical tagging work, which must be based on the word classes of the language and must reflect all the part of speech relationships of the words of an Amazigh corpus:

Table 2. Amazigh Tagset

N°  TAG   Designation               N°  TAG   Designation
1   NN    Common noun               15  PROR  Particle, orientation
2   NNK   Kinship noun              16  PRPR  Particle, preverbal
3   NNP   Proper noun               17  PROT  Particle, other
4   VB    Verb, base form           18  PDEM  Demonstrative pronoun
5   VBP   Verb, participle          19  PP    Personal pronoun
6   ADJ   Adjective                 20  PPOS  Possessive pronoun
7   ADV   Adverb                    21  INT   Interrogative
8   C     Conjunction               22  REL   Relative
9   DT    Determiner                23  S     Preposition
10  FOC   Focalizer                 24  FW    Foreign word
11  IN    Interjection              25  NUM   Numeral
12  NEG   Particle, negative        26  DATE  Date
13  VOC   Vocative                  27  ROT   Residual, other
14  PRED  Particle, predicate       28  PUNC  Punctuation
In the case of the Amazigh language, the question of the classification of grammatical categories is a difficult task and is still under debate within IRCAM. The tagset must represent the wealth of lexical information, as well as the information needed for disambiguation.
4.2 The Automatic POS Tagging

Part of speech tagging is a process generally carried out in 3 steps:

• Segmentation of the text into lexical units.
• Tagging, which consists of assigning to each lexical unit the set of possible part of speech tags.
• Disambiguation, which makes it possible to attribute to each of the lexical units, according to its context, the most likely tag.

4.2.1 Segmentation

Part of speech tagging for Amazigh is still a topic of interest to many researchers because of its role as a basic building block in many NLP applications. Although many systems have been built using different methods, avenues for improvement are still very open. Before starting part of speech tagging, the input text must first be preprocessed: the text must be tokenized, i.e. segmented at the lexical level. Segmentation is a necessary process in the morphological processing of the language. The purpose of segmentation is to divide a text into a sequence of morphemes in order to prepare for POS tagging.

4.2.2 POS Tagging of Amazigh Language

Training is a necessary operation for a pattern recognition system (here, the POS tagging system): it makes it possible to estimate the parameters of the model. Incorrect or insufficient training decreases the performance of the labeling system. To prepare the learning corpus, we proceed by successive approximations. A first, relatively short, training corpus allows a much larger corpus to be labeled. The latter is corrected, which makes it possible to re-estimate the probabilities; it is then used for a second training, and so on. In general, there are three methods of estimating these parameters:

– Maximum likelihood estimation, carried out by the Baum-Welch algorithm (Baum 1972) or the Viterbi algorithm (Celux and Clairambault 1992).
– The maximum a posteriori estimate (A. Rice 2006).
– The maximum mutual information estimate (Kapadia et al. 1993).
In our case, we used the maximum likelihood estimate because it is the most widely used and the easiest to compute. If we take a training set R = {Ph_1, …, Ph_K} consisting of the manually labeled sentences Ph_1, …, Ph_K, the formulas for estimating the parameters of the model λ = (π, A, B) are given by:

$$a_{ij} = \frac{\sum_{n=1}^{K} \text{number of times the transition } et_i \to et_j \text{ occurs in sentence } Ph_n}{\sum_{n=1}^{K} \text{number of times state } et_i \text{ is reached along sentence } Ph_n}$$

$$\pi_i = \frac{\sum_{n=1}^{K} \delta[\text{the tag } et_i \text{ is an initial state in sentence } Ph_n]}{K}$$

$$b_i(w_t) = \frac{\sum_{n=1}^{K} \text{number of times the word } w_t \text{ carries the tag } et_i \text{ along sentence } Ph_n}{\sum_{n=1}^{K} \text{number of times state } et_i \text{ is reached along sentence } Ph_n}$$
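The count-based estimates above can be sketched in a few lines. This is an illustrative reimplementation, not the authors' system; the sample words and tag names are hypothetical.

```python
from collections import Counter

# Maximum likelihood HMM estimates from a hand-tagged corpus (illustrative).
# Each sentence is a list of (word, tag) pairs.
def estimate_hmm(tagged_sentences):
    trans, emit, starts, tag_counts = Counter(), Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        starts[sent[0][1]] += 1                     # initial tag of Ph_n
        for i, (word, tag) in enumerate(sent):
            tag_counts[tag] += 1                    # times state et_i is reached
            emit[(tag, word)] += 1                  # times w_t carries tag et_i
            if i + 1 < len(sent):
                trans[(tag, sent[i + 1][1])] += 1   # transition et_i -> et_j
    K = len(tagged_sentences)
    a = {k: c / tag_counts[k[0]] for k, c in trans.items()}   # a_ij
    b = {k: c / tag_counts[k[0]] for k, c in emit.items()}    # b_i(w_t)
    pi = {t: c / K for t, c in starts.items()}                # pi_i
    return pi, a, b
```

Each probability is a simple ratio of counts, mirroring the three formulas above term by term.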
R. Bani et al.
and δ[x] = 1 if the event x is true, 0 otherwise. In addition, Fig. 2 and Fig. 3 illustrate these two indispensable steps in the design and development of a part of speech tagger.
Fig. 2. Training section of part of speech tagging
Fig. 3. Tagging section of part of speech tagging
4.2.3 Disambiguation

Two major problems prevent POS taggers from achieving 100% accuracy: word ambiguity and unknown words. For example, the Amazigh word “tazla” can be a noun or a verb, depending on the context of use. Tagging systems implement algorithms to address this issue, but these algorithms are not always efficient; sometimes semantic knowledge is essential to resolve ambiguities. In POS tagging, however, it is the form of words that matters, not semantics, which is a separate field. Unknown (out-of-vocabulary) words are those that cannot be found in the corpus but which the system is nevertheless expected to handle. To be robust in the face of these problems, most taggers use statistical information. According to Manning and Schütze (1999), there are two possible sources of information for tagging: the categories of the surrounding words, and the probability of occurrence of a lexical category. We can calculate the probabilities of the tags that correspond to the current word by considering two (bigram) or three (trigram) categories and/or word values located before and/or after it. Trigrams are more efficient because they take more of the context into account.
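Trigram-based disambiguation as described above can be sketched by scoring each candidate tag with P(tag | two previous tags) · P(word | tag). The probability tables, tag names, and the `best_tag` helper below are hypothetical, not the authors' implementation.

```python
# Pick the tag t maximizing P(t | prev2, prev1) * P(word | t); unseen events
# get a small floor probability instead of zero.
def best_tag(word, prev2, prev1, candidates, trigram_prob, emit_prob):
    return max(
        candidates,
        key=lambda t: trigram_prob.get((prev2, prev1, t), 1e-9)
        * emit_prob.get((t, word), 1e-9),
    )

# "tazla" is ambiguous between noun (N) and verb (V); in this made-up context
# the trigram probabilities favor the noun reading.
trigram_prob = {("PREP", "DET", "N"): 0.6, ("PREP", "DET", "V"): 0.1}
emit_prob = {("N", "tazla"): 0.3, ("V", "tazla"): 0.4}
print(best_tag("tazla", "PREP", "DET", ["N", "V"], trigram_prob, emit_prob))  # N
```

Here the context score 0.6 × 0.3 for N beats 0.1 × 0.4 for V, even though the emission probability alone would favor the verb.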
5 Conclusion

POS tagging is the first building block of most NLP applications; the accuracy of any NLP application depends on the accuracy of the tagger.
Moreover, different approaches can be used by researchers for the development of Amazigh language taggers. In this article, it was found that Amazigh is a morphologically rich language, hence the need to develop a morphological analyzer while exploiting machine learning techniques to construct taggers whose precision is similar to that of taggers for European languages (English, French, German, …). In addition, limited work has been done on the Amazigh language for POS tagging; therefore, different approaches can be used for the development of a robust and efficient Amazigh POS tagger.
References

Ameur, M., Bouhjar, A., Boukhris, F., Boukouss, A., Boumalk, A.: Initiation à la langue Amazighe. Publications de l'IRCAM (2004)
Andries, P.: Demain, encore plus de tifinaghes sur Internet. Publication de l'IRCAM (2008)
Zenkouar, L.: Normes des technologies de l'information pour l'ancrage de l'écriture amazighe. Etudes et documents berbères, pp. 159–172 (2008)
Boukhris, F., Boumalk, A., El Moujahid, E., Souifi: La nouvelle grammaire de l'Amazighe. Publications de l'IRCAM (2008)
Outahajala, M.: Apprentissage supervisé d'un étiqueteur morphosyntaxique automatique de la langue Amazighe. PhD thesis, Ecole Mohammedia d'Ingénieurs, Université Mohamed V-Rabat (2015)
Rachidi, A., Mammass, D.: Vers un système d'écriture informatique Amazighe: méthodes et développements. RECITAL (2005)
Ataa Allah, F., Frain, J., Ait Ouguengay, Y.: Amazigh language desktop converter. In: Proceedings of the SITACAM (2013)
Outahajala, M., Zenkouar, L., Rosso, P.: Construction d'un grand corpus annoté pour la langue Amazighe. In: La revue Etudes et Documents Berbères n°33, pp. 57–74 (2014)
Ait Ouguengay, Y., Taalabi, M.: Elaboration d'un réseau de neurones artificiels pour la reconnaissance optique de la graphie amazighe: phase d'apprentissage. In: Proceedings of the Conference «Systèmes Intelligents: Théories et Applications» (2009)
Cavalli-Sforza, H.R.V.: ANMorph: amazigh nouns morphological analyzer. In: Proceedings of the 5th International Conference on Amazigh and ICT (2012)
Allah, F.A., Boulaknadel, S.: Amazigh verb conjugator. In: Proceedings of the Language Resources and Evaluation Conference (2014)
Ataa Allah, F., Boulaknadel, S.: Pseudo-racinisation de la langue Amazighe. In: Proceedings of TALN 2010, Montréal, pp. 19–23 (2010)
Brill, E.: Tagging an unfamiliar text with minimal human supervision. In: Proceedings of the Fall Symposium on Probabilistic Approach to Natural Language (1993)
Khoja, S., Garside, R., Knowles, G.: A tagset for the morphosyntactic tagging of Arabic. In: Proceedings of Corpus Linguistics, Lancaster, UK, pp. 341–353 (2001)
Diab, M., Hacioglu, K., Jurafsky, D.: Automatic tagging of Arabic text: from raw text to base phrase chunks. In: HLT-NAACL, pp. 149–152 (2004)
Schmid, H.: Probabilistic part-of-speech tagging using decision trees. In: Proceedings of the International Conference on New Methods in Language Processing, Manchester, pp. 44–49 (1994)
Brants, T.: TnT - a statistical part-of-speech tagger. In: Proceedings of the ANLP, Seattle (2000)
Giménez, J., Màrquez, L.: SVMTool: a general POS tagger generator based on support vector machines. In: Proceedings of the 4th International Conference on Language Resources and Evaluation, Lisbon, Portugal, May 2004, pp. 43–46 (2004)
Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of ICML-01, pp. 282–289 (2001)
Kudo, T., Matsumoto, Y.: Use of support vector learning for chunk identification. In: Proceedings of CoNLL-2000 and LLL-2000 (2000)
Ratnaparkhi, A., Reynar, J., Roukos, S.: A maximum entropy model for prepositional phrase attachment. In: Proceedings of the Human Language Technology Workshop (ARPA, 1994), pp. 250–255 (1994)
Toutanova, K., Manning, C.: Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In: Proceedings of the Joint SIGDAT Conference EMNLP/VLC, pp. 63–71 (2000)
Biemann, C., Giuliano, C., Gliozzo, A.: Unsupervised part-of-speech tagging supporting supervised methods. In: Proceedings of Recent Advances in Natural Language Processing (2007)
Habash, N., Rambow, O.: Part-of-speech tagging and morphological disambiguation in one fell swoop. In: Proceedings of the Association for Computational Linguistics (ACL) Conference, Short Papers, Michigan, USA (2005)
Kučera, H., Francis, W.N.: Computational Analysis of Present-Day American English. Brown University Press, Providence (1967)
Maamouri, M., Bies, A., Buckwalter, T., Mekki, W.: The Penn Arabic treebank: building a large-scale annotated Arabic corpus. In: Proceedings of the NEMLAR Conference on Arabic Language Resources and Tools (2004)
Smrž, O., Hajič, J.: The other Arabic treebank: Prague dependencies and functions. In: Arabic Computational Linguistics: Current Implementations. CSLI Publications (2006)
Amri, S., Zenkouar, L., Outahajala, M.: Build a morphosyntactically annotated Amazigh corpus. In: Proceedings of the 2nd International Conference on Big Data, Cloud and Applications (2017)
Baum, L.: An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities 3 (1972)
Celux, G., Clairambault, J.: Estimation de chaînes de Markov cachées: méthodes et problèmes. In: CNRS Thematic Days on Markovian Approaches in Images and Signal (1992)
Rice, J.: Mathematical Statistics and Data Analysis, pp. 511–540 (2006)
Kapadia, S., Valtchev, V., Young, S.J.: MMI training for continuous phoneme recognition on the TIMIT database. In: Proceedings of the ICASSP, Minneapolis, vol. II, pp. 491–494 (1993)
Manning, C., Schütze, H.: Foundations of Statistical Natural Language Processing (1999)
Learning Sparse Fully Connected Layers in Convolutional Neural Networks

Mohamed Quasdane¹(B), Hassan Ramchoun¹,², and Tawfik Masrour¹,³

¹ Laboratory of Mathematical Modeling, Simulation and Smart Systems (L2M3S), Moulay Ismail University of Meknes, ENSAM Meknes, Morocco
[email protected]
² National School of Business and Management, Moulay Ismail University of Meknes, Meknes, Morocco
³ University of Quebec at Rimouski, Rimouski, Canada
Abstract. Convolutional neural networks (CNNs) are a very powerful learning method in the deep learning framework. However, they contain a very large number of parameters, which restricts their use on resource-limited devices. Searching for an appropriate and simple convolutional neural network architecture with an optimal number of parameters is still a challenging problem. In fact, in many well-known convolutional neural networks like LeNet, AlexNet, and VGGNet, the percentage of weights in the fully connected layers exceeds 86% of the total weights in the network. Based on this remark, we propose a sparse regularization term, based on smoothing the L0 and L2 regularizers, to minimize the unnecessary neurons in the fully connected layers. This ensures neuron sparsity and effectively reduces the complexity of convolutional neural networks, as shown in many experiments.
Keywords: Convolutional Neural Networks · Sparsity · Overfitting · Smoothing L0

1 Introduction
Recently, Convolutional Neural Networks (CNNs) have achieved great success in various fields [1–3]. These successes are due to the ability of this kind of network to extract rules from available training datasets. Generally speaking, data availability is the key behind the increased use of deep neural networks, as it is the heart of artificial intelligence. Different works have shown that deep neural networks are more efficient than shallow ones [6,7]. That is why various architectures of convolutional neural networks have been developed, including AlexNet [8], VGGNet [9], ResNet [10], and DenseNet [11]. However, because of their enormous number of parameters, these models face heavy memory and computation costs, which make them inapplicable to resource-limited platforms [4]. Moreover, deep models are more easily prone to overfitting, especially when the amount of training data is insufficient [5].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 183–200, 2023. https://doi.org/10.1007/978-3-031-43520-1_16
M. Quasdane et al.
Overparameterization and overfitting are both affected by the amount of training data and the complexity of the network. Finding a trade-off between these two terms leads to good generalization and simplicity of the model. If we have enough training data, it is natural to search for an appropriate and simple convolutional neural network structure that approximates an ideal solution. However, building an adequate and compact one to solve a problem such as classification is a challenging task. Searching for the best model, or determining the best number of layers and the number of filters or neurons in each layer, has been studied in the literature. Generally, there are two classes of methods to find the best network: constructive approaches and destructive approaches. Constructive approaches consist of incrementally extending a network by adding additional neurons; they were successfully used in [9], where the final deep network was formed by adding new layers to the original shallower architecture. However, constructive approaches start with shallow networks, which are unable to learn non-linear rules effectively, leading to poor initialization for the constructive steps. On the other hand, destructive approaches start with a deep network and try to reduce it while maintaining or even improving its performance. The fundamental idea behind destructive methods is to examine the impact of individual parameters or neurons and eliminate those that have a negligible effect on the network's output. Furthermore, convolutional neural networks are known to have many redundant and unnecessary parameters, and thus more compact architectures could perform just as well as very deep ones. Promoting sparsity in deep neural networks is one of the most used ideas in the context of destructive methods. We can classify the methods that promote sparsity into three types: pruning, dropout, and sparse regularization based on a penalty term.
Pruning methods consist of removing parameters, filters, or neurons that have a slight effect on the network's accuracy, based on criteria such as the Hessian of the loss function in [12,13]. Dropout reduces the complexity of networks during training by randomly removing units from deep neural networks using the Bernoulli distribution. Many works have drawn their ideas from dropout, including DropConnect [14], DropBlock [15], ShakeDrop [16], and DropFilter [17]. These methods can prevent overfitting; however, they only reduce the size of the network during training, and the original size ultimately remains unchanged. Sparse regularization methods promote sparsity in networks by introducing a penalty term added to the error function. This kind of approach regularizes the model based on the fact that forcing weights or parameters to be small can prevent overfitting and improve generalization. In other words, small weights make the model more stable: a small change in the input leads to a small change in the network's output, resulting in generalization capability. Moreover, this kind of method forces some weights or parameters to be very close to zero, which indicates that they are unnecessary or redundant. In some well-known convolutional neural network architectures, we found that the number of weights in the fully connected layers is much larger than that in the convolutional layers. As an example, for both LeNet-5 and AlexNet, the number
of weights in the fully connected layers exceeds 95% of the total weights in the network. This means that reducing the number of weights and neurons in the fully connected layers can greatly reduce the complexity of the model. Based on this remark and the merits of sparse regularization methods, we propose a sparse regularization term based on smooth L0 and L2 regularizers to force the unnecessary neurons in the fully connected layers to be close to zero. This ensures sparsity at both the individual weight and neuron levels. The rest of the paper is organized as follows. Section 2 presents the structure of convolutional neural networks. Section 3 surveys existing sparse regularization methods based on a penalty term. Section 4 introduces the proposed regularization term, which is based on smooth L0 and L2 regularizers. Before concluding, we discuss the results of different numerical simulations in Sect. 5.
2 The Structure of Convolutional Neural Networks
Convolutional neural networks have shown significant performance in several computer vision and machine learning tasks, which has increased the quantity of research on convolutional networks. In this section, we describe in more detail the standard architecture of CNNs. Many convolutional neural network variants have been proposed in the literature. These variants differ in depth and structure; however, their basic components are very similar. LeNet-5 is the famous and traditional convolutional neural network. It consists of three types of layers: convolutional layers, pooling layers, and fully connected layers. In addition to these three principal layer types, GoogLeNet used the inception module, which helps to reduce the number of parameters in the network. To overcome the vanishing gradient problem caused by increasing network depth, the ResNet architecture kept the three basic components and introduced residual connections. The convolutional layer is an important component of CNNs; it consists of many filters used to compute different feature maps. The convolution operation is performed between the input and all the filters of the current layer. Various types of convolution have been proposed to decrease the number of parameters in the convolutional layers, including tiled convolution [42] and dilated convolution [43]. The output of the convolution operation is fed to a nonlinear activation function to compute the activation feature map. The activation function is a fundamental operation behind learning non-linear rules in deep learning. Many nonlinear activation functions have been used to improve the performance of networks, such as the sigmoid, ReLU, and Leaky ReLU.
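To make the convolutional layer's core computation concrete, here is a minimal sketch of a single-channel "valid" convolution (implemented, as in most deep learning frameworks, as cross-correlation) followed by a ReLU activation. The input, filter values, and helper names are illustrative only.

```python
# Valid 2D cross-correlation of an HxW input with a kh x kw filter, producing
# an (H-kh+1) x (W-kw+1) feature map.
def conv2d_valid(x, k):
    H, W, kh, kw = len(x), len(x[0]), len(k), len(k[0])
    return [
        [
            sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
            for j in range(W - kw + 1)
        ]
        for i in range(H - kh + 1)
    ]

# ReLU activation applied elementwise to the feature map.
def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

x = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]  # a 3x3 single-channel input
k = [[1, -1], [0, 2]]                  # one 2x2 filter
print(relu(conv2d_valid(x, k)))        # 2x2 activation feature map
```

A real convolutional layer applies many such filters in parallel (one feature map per filter) and adds a bias per filter; the sketch keeps only the essential sliding-window sum.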
The pooling layer uses a pooling operator, another important concept of CNNs, to reduce the dimensionality of the feature maps produced by convolutional layers. The pooling layer helps make the representations more invariant to small translations of the input. Various pooling methods have been proposed: Lp pooling [44] uses the Lp norm of all the components within the pooling region and generalizes max pooling and average pooling. Mixed pooling [45] uses a random value to combine max pooling and average pooling. Stochastic pooling [46] randomly chooses the activations based on a multinomial distribution. Spectral pooling [47] reduces the dimensionality of feature maps using their representation in the frequency domain. The important and optimal features of the input are extracted using many convolutional and pooling layers. These features are considered as the input vector of the fully connected layers. We can formulate the convolutional neural network as follows:

$$f_{\theta,w}(x) = \Phi_w \circ \Psi_\theta(x) \tag{1}$$

where
– Ψ_θ represents the composition of all convolution and pooling layers, which aims to extract the important features from the input x of the CNN, and θ contains all the filters of all convolutional layers.
– Φ_w represents the composition of the fully connected layers of the CNN, where w contains all their trainable weights.

We are interested in solving a given classification problem, where we seek to build an appropriate and simple convolutional neural network that approximates the ideal solution f : X → Y. In fact, this ideal solution f is unknown. Our main goal is to find a feasible solution f_{θ,w} with an optimal number of neurons in the fully connected layers using only the training data set D = {(x_i, y_i)}_{i=1}^N ⊂ X × Y. Training the network without any regularization term is equivalent to minimizing the following empirical loss on the training data set D:

$$E_D(w, \theta) = \frac{1}{N} \sum_{i=1}^{N} L(f_{\theta,w}(x_i), y_i) \tag{2}$$

where
– θ contains all trainable parameters of all convolutional layers.
– w contains all trainable parameters of the fully connected layers.
– L is the loss function that computes the gap between the prediction of the network f_{θ,w}(x_i) and the true target y_i.

Generally, the network obtained by minimizing the empirical loss function in (2) suffers from overfitting and overparameterization. That is to say, the trained network works very well on the training data set but is unable to generalize the learned rules to novel and unseen data. In addition, using this empirical loss function cannot lead to an optimal structure with a minimal
number of parameters. Regularization based on penalty term methods is effective to overcome these issues. The main idea behind this kind of regularization approach is to add penalty terms to the empirical loss function in order to control the magnitude of the weights during training, which makes the model more stable. Therefore, the regularized error function is given by:

$$E(w, \theta) = E_D(w, \theta) + \lambda \Omega(w, \theta) \tag{3}$$

where:
– Ω is the regularization term.
– λ > 0 is a regularization parameter used to find the right tradeoff between training accuracy and network complexity.
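How the objective in (3) is assembled can be sketched as follows, taking cross-entropy as an illustrative choice for the per-example loss L in (2); the probabilities and penalty value below are made up for the example.

```python
import math

# Eq. (2): mean negative log-likelihood of the true classes (cross-entropy).
def empirical_loss(probs_true_class):
    return -sum(math.log(p) for p in probs_true_class) / len(probs_true_class)

# Eq. (3): E = E_D + lambda * Omega, for any penalty value Omega.
def objective(probs_true_class, penalty, lam):
    return empirical_loss(probs_true_class) + lam * penalty

probs = [0.9, 0.8, 0.95]  # predicted probability of the true class per example
# A larger lambda weighs the same penalty Omega more heavily, trading training
# fit for model simplicity.
print(objective(probs, penalty=12.0, lam=1e-3))
print(objective(probs, penalty=12.0, lam=1e-2))
```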
3 Related Works
In this section, we introduce various approaches in the context of regularization based on penalty term methods, which are our main interest in this paper. We can classify the existing methods into two categories: regularization techniques that act at the individual weight level, and group regularization approaches that act at the level of groups of weights. L2 regularization [18] is considered one of the first and most well-known regularization methods to avoid overfitting and improve the generalization ability of deep neural networks. The secret behind its success is the reduction of weight magnitudes, which leads to stability and smoothness of the network. However, L2 regularization is not effective for obtaining sparse networks. Lasso, or L1 regularization [19], is the most common classical approach for learning sparse models. It was first introduced for shrinkage and selection in regression problems. This method has a notable disadvantage: while it produces a large number of weights that are close to zero, it nearly never produces weights that are exactly zero. Zou and Hastie [20] proposed the elastic net, which combines both the L1 and L2 regularizers. L1/2 regularization [21] was proposed to obtain more sparse solutions; it outperforms L1 regularization in terms of sparsity. The L1/2 regularizer can be taken as a representative of all Lp regularizers with p ∈ ]0, 1[ at the sparsity level: when p ∈ ]0, 1/2[, Lp regularizers produce results similar to the L1/2 regularization term, and when p ∈ [1/2, 1[, the L1/2 regularizer enjoys better sparsity than all Lp regularizers. The convergence of online and batch gradient methods with a smoothing L1/2 regularizer for feedforward neural networks has been proven in [22,23]. In [24], a modified L1/2 regularizer was used as a pruning method for convolutional neural networks. L0 regularization [25] aims to force the unimportant weights to be exactly zero, which makes it an ideal regularizer for obtaining sparse networks.
However, the L0 semi-norm, which is defined as the number of non-zero components of a given vector, leads to an NP-hard optimization problem. This is why it is difficult to use this regularizer directly. H. Zhang, Y. Tang, and X. Liu [26,28] established the convergence of online and batch gradient methods with a smoothing L0 regularizer for feedforward neural networks under mild conditions.
All the aforementioned methods act on and regularize the networks at the individual weight level, which yields unstructured sparsity. Instead of forcing individual weights to be close to zero, the same idea can be used to drive groups of weights, such as neurons or filters in CNNs, toward zero. Group lasso [29] was the first regularization method that used the idea of grouping to neglect a group of parameters. This method consists of dividing the set of all weights W into groups g ∈ G, where G is the set of groups. The group lasso regularization term is defined as:

$$\Omega_{GL}(W) = \sum_{g \in G} \sqrt{\sum_{w \in g} w^2} \tag{4}$$

Group lasso was used for pruning feedforward neural networks in [30], to automatically determine the number of neurons in each layer of the network in [4], and for feature selection in [31]. The convergence analysis of group lasso regularization on feedforward neural networks is proven in [32]. Group lasso can produce sparsity at the group level but not within the surviving groups. Sparse group lasso was proposed in [5,33] to guarantee sparsity at both the group and individual levels. The sparse group lasso approach combines group lasso and lasso. Exclusive sparse regularization was proposed in [34] to introduce competition within each group. The exclusive sparse regularization term is defined as:

$$\Omega_{ES}(W) = \sum_{g \in G} \Big( \sum_{w \in g} |w| \Big)^2 \tag{5}$$

Yoon and Hwang [35] proposed a combined group and exclusive sparse regularization approach for pruning convolutional neural networks. The group L1/2 regularization method makes the networks more sparse by forcing the unnecessary groups, and the unnecessary weights in the surviving groups, to be close to zero. The group L1/2 regularizer is defined as:

$$\Omega_{GL_{1/2}}(W) = \sum_{g \in G} \sqrt{\sum_{w \in g} |w|} \tag{6}$$

The group L1/2 regularization method was used for pruning input and hidden layer nodes of feedforward neural networks in [36,37]. Based on the group lasso and L0 regularizers, Bui, K. et al. [39] proposed a nonconvex sparse group regularization. Recently, theoretical results on the batch gradient training method with smoothing group L0 regularization for feedforward neural networks were established in [39]. Ramchoun and Ettaouil [40] demonstrated the convergence of the batch gradient algorithm with smoothing of the composition of the L0 and L1/2 regularizers for feedforward neural networks.
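For concreteness, the three group penalties in (4)–(6) can be sketched as follows, with each group g given as a flat list of its weights (an illustrative layout, not tied to any framework).

```python
import math

# Eq. (4): sum over groups of the L2 norm of each group.
def group_lasso(groups):
    return sum(math.sqrt(sum(w * w for w in g)) for g in groups)

# Eq. (5): sum over groups of the squared L1 norm of each group.
def exclusive_sparsity(groups):
    return sum(sum(abs(w) for w in g) ** 2 for g in groups)

# Eq. (6): sum over groups of the square root of each group's L1 norm.
def group_l_half(groups):
    return sum(math.sqrt(sum(abs(w) for w in g)) for g in groups)

groups = [[3.0, 4.0], [0.0, 0.0]]   # one live group, one fully pruned group
print(group_lasso(groups))          # 5.0
print(exclusive_sparsity(groups))   # 49.0
print(group_l_half(groups))         # sqrt(7)
```

Note that the pruned group contributes nothing to any of the three penalties, which is why minimizing them drives whole groups (neurons or filters) to zero.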
4 Algorithm Description
Many well-known architectures of convolutional neural networks, like LeNet, AlexNet, and VGGNet, contain most of their trainable weights in their fully connected layers, as shown in Table 1. This indicates that the complexity of convolutional neural networks is dominated by the fully connected part.

Table 1. Percentage of weights in both convolutional layers (CLs) and fully connected layers (FCLs) in various CNN architectures.

| Architecture                  | LeNet-5 | AlexNet    | VGGNet-19   | GoogLeNet  |
|-------------------------------|---------|------------|-------------|------------|
| Percentage of weights in CLs  | 4.15%   | 4.04%      | 13.94%      | 44.04%     |
| Percentage of weights in FCLs | 95.85%  | 95.96%     | 86.06%      | 55.96%     |
| Total number of weights       | 61,470  | 61,090,496 | 143,652,544 | 12,984,768 |
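As a sanity check on the LeNet-5 column of Table 1, the weight counts can be recomputed from the classical LeNet-5 layer shapes (weights only, biases excluded): two convolutional layers with 6 filters of 5×5×1 and 16 filters of 5×5×6, and fully connected layers 400 → 120 → 84 → 10.

```python
# Reproduce the LeNet-5 column of Table 1 from the layer dimensions.
conv_weights = 6 * 5 * 5 * 1 + 16 * 5 * 5 * 6   # 150 + 2400 = 2550
fc_weights = 400 * 120 + 120 * 84 + 84 * 10     # 48000 + 10080 + 840 = 58920
total = conv_weights + fc_weights               # 61470, matching Table 1
print(round(100 * conv_weights / total, 2))     # 4.15
print(round(100 * fc_weights / total, 2))       # 95.85
```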
Based on this observation, we propose a sparse regularization term to force the unimportant and unnecessary neurons in the fully connected layers to be close to zero. Our main idea is to use a composition of the L0 and L2 regularizers to force all the incoming and outgoing connections of unimportant neurons to be zero. The structure of the convolutional neural network is mathematically formulated in (1) as f_{θ,w}(x) = Φ_w ∘ Ψ_θ(x), where w represents the vector of all weights in the L fully connected layers Φ_w of the CNN. Each fully connected layer l ∈ {1, …, L} consists of n_l neurons. Let w^{l⁻,n} ∈ R^{n_{l−1}} and w^{l⁺,n} ∈ R^{n_{l+1}} represent respectively the incoming and outgoing weight vectors of neuron n = 1, …, n_l in the hidden layer l ∈ {2, …, L−1}. Therefore, we can write w^{l,n} = (w^{l⁻,n}, w^{l⁺,n}) ∈ R^{n_{l−1}+n_{l+1}}, where l = 2, …, L−1 and n = 1, …, n_l. The L0 regularization term has been considered one of the most ideal ways to obtain sparse solutions. For w ∈ R^d, the L0 regularizer counts the number of nonzero components of w as follows:

$$\Omega_{L_0}(w) = \sum_{i=1}^{d} I_{\{w_i \neq 0\}} \tag{7}$$

Our main idea is to use the L0 regularizer to penalize the number of nonzero neurons in each hidden fully connected layer of the CNN. Each neuron n in the l-th hidden layer is defined by its vector of incoming and outgoing weights w^{l,n}. Therefore, forcing all the components of w^{l,n} to be close to zero is equivalent to driving the associated neuron to zero. As shown in Fig. 1, our proposed method applies the L2 regularizer to the incoming and outgoing weight vector w^{l,n} of each neuron n. Then, the L0 regularizer is applied to the resulting vector to distinguish the important neurons from the unimportant ones in each layer l ∈ {2, …, L−1}. Therefore, we can formulate our proposed regularization term
as follows:

$$\Omega_{L_0-L_2}(w) = \sum_{l=2}^{L-1} \sqrt{n_l} \sum_{n=1}^{n_l} I_{\{\|w^{l,n}\|_2 \neq 0\}} \tag{8}$$
Our proposed regularization term in (8) is discontinuous and non-differentiable due to the discontinuity of the L0 regularizer. On the other hand, during training, the L0 regularizer tries to force ‖w^{l,n}‖₂ to be zero, which means that all the components of w^{l,n} will be zero. That makes ‖w^{l,n}‖₂ non-differentiable after a certain number of epochs. These constraints cause difficulties in the training process when we use the gradient descent-based backpropagation method as a learning algorithm. To overcome these issues, we propose to approximate the L0 regularizer by a smooth one. To this end, we use the following smoothing function:

$$h_\epsilon(x) = \frac{x^2}{x^2 + \epsilon^2} \tag{9}$$

As shown in Fig. 2, $\lim_{\epsilon \to 0} h_\epsilon(x) = I_{\{x \neq 0\}} = \begin{cases} 1 & \text{if } x \neq 0 \\ 0 & \text{else} \end{cases}$. Therefore, we can approximate the L0 regularizer in (7) by:

$$H_\epsilon(w) = \sum_{i=1}^{d} h_\epsilon(w_i) \tag{10}$$

where $\lim_{\epsilon \to 0} H_\epsilon(w) = \Omega_{L_0}(w)$.

To prevent the non-differentiability of ‖w‖₂ when all the components of w ∈ R^d are zero, we use the following simple approximation:

$$G_\eta(w) = \sqrt{\sum_{i=1}^{d} w_i^2 + \eta} - \sqrt{\eta} \tag{11}$$

where $\lim_{\eta \to 0^+} G_\eta(w) = \|w\|_2$.

Using the smooth approximations of both the L0 and L2 regularizers, the smooth version of our proposed term in (8) is written as:

$$\Omega_{SL_0-L_2}(w) = \sum_{l=2}^{L-1} \sqrt{n_l} \sum_{n=1}^{n_l} h_\epsilon\big(G_\eta(w^{l,n})\big) \tag{12}$$

Therefore, the proposed regularized error function is defined as:

$$E(w, \theta) = E_D(w, \theta) + \lambda \sum_{l=2}^{L-1} \sqrt{n_l} \sum_{n=1}^{n_l} h_\epsilon\big(G_\eta(w^{l,n})\big) \tag{13}$$
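A plain-Python sketch of the smoothed penalty (9)–(12) follows; the paper's experiments use PyTorch, so this standalone version is illustrative only. The defaults ε = 1 and η = 10⁻¹⁰ follow the settings reported in Sect. 5, and the layer layout (a list of per-neuron weight vectors w^{l,n}, incoming and outgoing weights concatenated) is an assumption for the example.

```python
import math

# Eq. (9): smooth surrogate of the indicator I{x != 0}.
def h(x, eps):
    return x * x / (x * x + eps * eps)

# Eq. (11): smooth surrogate of the L2 norm of w, differentiable even at w = 0.
def g(w, eta):
    return math.sqrt(sum(wi * wi for wi in w) + eta) - math.sqrt(eta)

# Eq. (12): sum over hidden FC layers of sqrt(n_l) * sum_n h_eps(G_eta(w^{l,n})).
def smooth_l0_l2(layers, eps=1.0, eta=1e-10):
    return sum(
        math.sqrt(len(layer)) * sum(h(g(w, eta), eps) for w in layer)
        for layer in layers
    )

# One layer with a dead neuron (all-zero weights) and a live neuron.
print(smooth_l0_l2([[[0.0, 0.0, 0.0], [1.0, -2.0, 0.5]]]))
```

For a dead neuron, G_η is exactly zero, so h_ε contributes 0; a neuron with a large weight norm contributes a value approaching 1. The penalty therefore approximately counts the surviving neurons in each layer, scaled by √n_l.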
Fig. 1. Illustration of the proposed regularization method on standard convolutional neural networks
Fig. 2. Smoothing functions h_ε approximate the indicator function I{x ≠ 0} for different values of ε ∈ {0.05, 0.5, 1}
5 Numerical Experiments
To evaluate the effectiveness of our proposed approach, we have performed various experiments using the MNIST dataset, which is one of the classical and most often used datasets for image classification. The MNIST database consists of 70,000 grayscale images of handwritten digits: 60,000 images are reserved for training and 10,000 for testing. All these images are divided uniformly into 10 classes, which means 6,000 training images and 1,000 testing instances per class. We use this dataset to train the LeNet-5 architecture from scratch. We chose PyTorch as our base deep learning framework. The weights of the network are initialized using the Kaiming uniform method [41], and all biases are initialized to 0. The network is trained using the Adam optimizer with an initial learning rate of 0.001, decayed by 0.9 after every 20 epochs. We set the batch size to 128 and the hyperparameter η in (12) to 10⁻¹⁰. We trained the network for 700 epochs, and we repeat this learning process five times to calculate the means of the test accuracy and neuron sparsity results. Under all these setups, we have compared our proposed approach to both the lasso (L1) and group lasso (GL) regularization methods. The comparisons are based on test accuracy and neuron sparsity NS_τ, defined as follows:

$$NS_\tau = \frac{\sum_{l=2}^{L-1} \sum_{n=1}^{n_l} I_{\{\|w^{l,n}\|_2 < \tau\}}}{\sum_{l=2}^{L-1} n_l} \tag{14}$$
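The metric in (14) can be sketched as follows; the layer layout (per-neuron weight vectors of concatenated incoming and outgoing weights) is an illustrative assumption.

```python
import math

# Fraction of hidden FC neurons whose weight vector has L2 norm below tau.
def neuron_sparsity(layers, tau):
    total = sum(len(layer) for layer in layers)
    small = sum(
        1
        for layer in layers
        for w in layer
        if math.sqrt(sum(wi * wi for wi in w)) < tau
    )
    return small / total

# One hidden layer; two of its three neurons are effectively pruned.
layers = [[[1e-9, 0.0], [0.0, 0.0], [0.3, -0.7]]]
print(neuron_sparsity(layers, tau=1e-5))  # 2/3
```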
NS_τ represents the percentage of neurons whose incoming and outgoing weight vector is below a threshold τ in terms of the L2 norm. In our experiments, we chose three threshold values τ ∈ {10⁻³, 10⁻⁵, 10⁻⁸} to evaluate the sparsification strength of each approach. The regularization parameter λ is an important factor that affects the neuron sparsity and the accuracy of each approach. This is why we compared our proposed approach with the aforementioned methods under the same regularization parameters λ. Our proposed regularization term in (12) depends on the smoothing L0 parameter ε. In fact, this hyperparameter influences the sparsification ability of the proposed approach. To analyse this influence, we chose three values of λ ∈ {10⁻⁶, 2.5 × 10⁻⁵, 10⁻⁴}, and for each λ we trained the network using the proposed term with different values of ε in the interval from 0.05 to 1.25. For each pair (λ, ε), we calculated the achieved mean test accuracy and neuron sparsity results. Figure 3 shows that, for λ ∈ {2.5 × 10⁻⁵, 10⁻⁴}, the neuron sparsity NS_τ increases with ε and reaches a stable state when ε is close to 1. However, for λ = 10⁻⁶, the sparsification capability of the proposed term is weak. Furthermore, if ε is close to 1 and λ is not very small, then the values of NS_τ for τ ∈ {10⁻³, 10⁻⁵, 10⁻⁸} are approximately equal.
Fig. 3. Neuron sparsity evolution NS_τ of the proposed approach for each pair (λ, ε)
194
M. Quasdane et al.
Fig. 4. Test accuracy evolution of the proposed approach for each pair (λ, ε)
From Fig. 4, we observe the evolution of test accuracy for different values of λ and ε. We lose a little accuracy when ε is close to 0.5, but achieve approximately the best test accuracy when ε is in the neighborhood of 1. This is why we set the smoothing parameter ε to 1 in all the remaining experiments. To observe the effect of the regularization parameter on each method, we experimentally set different values of λ in the range from 10⁻⁷ to 5 × 10⁻⁴. Figure 5 shows the evolution of neuron sparsity NSτ as a function of λ. For the threshold τ = 10⁻³, we found that for all λ, our proposed approach achieved roughly the same neuron sparsity results as group lasso, and better results than lasso. When the threshold τ = 10⁻⁵, our proposed approach outperforms both the lasso and group lasso methods. For group lasso, we observe that NS₁₀₋₅ decreases as λ increases. The lasso approach acts at the individual weight level, which is why it fails to force the neuron norms below τ = 10⁻⁵. Moreover, for the threshold τ = 10⁻⁸, our proposed method achieved the best neuron sparsity results, whereas for group lasso we found NS₁₀₋₈ = 0 for all λ. That is to say, as opposed to the lasso and group lasso methods, the proposed approach efficiently forces the unnecessary neurons to be very close to zero. From Fig. 6, we can see that our proposed term yields roughly the same neuron sparsity for the different threshold values τ ∈ {10⁻³, 10⁻⁵, 10⁻⁸}, especially when λ is not very small. This means that our proposed approach forces neurons to be either very close to zero or far from zero, which is consistent with the behavior of the L0 regularizer. In terms of test accuracy, Fig. 7 shows that,
Fig. 5. Evolution of Neuron Sparsity N Sτ produced by the compared methods for different λ, where τ ∈ {10−3 , 10−5 , 10−8 }
Fig. 6. Evolution of Neuron Sparsity N Sτ produced by the proposed method for different λ, where τ ∈ {10−3 , 10−5 , 10−8 }
Fig. 7. Evolution of test accuracy of all the compared methods for different λ
for all λ, group lasso achieved the best results, while the proposed method loses a little accuracy, and this loss increases as λ increases. Without loss of generality, we chose five values of the regularization parameter λ ∈ {10⁻⁷, 10⁻⁶, 10⁻⁵, 10⁻⁴, 10⁻³} and observed the evolution of the neuron sparsity produced by the compared methods over each learning process of 700 epochs. As shown in Fig. 8, for the different λ, the proposed method converges faster in terms of NSτ. In contrast, there is no evolution of NS₁₀₋₈ for group lasso, nor of NS₁₀₋₅ for the lasso regularization method. In fact, the regularization parameter influences the stability of the neuron sparsity results: we see more oscillation when λ ∈ {10⁻⁴, 10⁻³}, which indicates less stable results than when λ is small.
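Once training drives the unnecessary neurons' weight vectors below a threshold τ, the corresponding fully connected units can be removed outright. A minimal pruning sketch in plain Python (the weight layout and helper names are illustrative, not taken from the paper):

```python
import math

def l2_norm(vec):
    """Euclidean norm of a neuron's concatenated in/out weight vector."""
    return math.sqrt(sum(w * w for w in vec))

def prune_neurons(neuron_weights, tau):
    """Keep only neurons whose weight-vector L2 norm is at least tau.

    neuron_weights: list of per-neuron weight vectors (lists of floats).
    Returns (kept_weights, kept_indices).
    """
    kept = [(i, w) for i, w in enumerate(neuron_weights) if l2_norm(w) >= tau]
    indices = [i for i, _ in kept]
    weights = [w for _, w in kept]
    return weights, indices

# Toy layer: neurons 1 and 3 were driven to (near) zero by the regularizer.
layer = [[0.8, -0.3], [1e-9, 2e-9], [0.5, 0.1], [0.0, 1e-8]]
pruned, kept_idx = prune_neurons(layer, tau=1e-5)
print(kept_idx)  # -> [0, 2]
```

Because the regularizer pushes neurons either very close to zero or far from zero, the pruning result is largely insensitive to the exact choice of τ, which is what Fig. 6 suggests.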
Fig. 8. Evolution of Neuron Sparsity N Sτ produced by the compared methods in the learning process for λ ∈ {10−7 , 10−6 , 10−5 , 10−4 , 10−3 }, where τ ∈ {10−3 , 10−5 , 10−8 }
6 Conclusion and Future Work
In this paper, we have observed that a large percentage of the weights in many well-known convolutional neural network architectures are concentrated in their fully connected layers. This encouraged us to propose a sparser regularizer that forces the unimportant neurons in the fully connected layers to be very close to zero. Our main idea is based on the composition of the L0 and L2 regularizers. However, the L0 semi-norm is discontinuous and non-differentiable,
which causes problems in the learning process. To overcome this drawback, we used a smoothing indicator function for the L0 regularization term. We evaluated the effectiveness of our proposed approach through different experiments, which demonstrate its superiority over the lasso and group lasso methods in terms of neuron sparsity. Finally, we aim to extend our idea to the convolutional layers and to prove its convergence theoretically in convolutional neural networks.
References

1. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
2. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
3. Liu, W., et al.: A survey of deep neural network architectures and their applications. Neurocomputing 234, 11–26 (2017)
4. Alvarez, J.M., Salzmann, M.: Learning the number of neurons in deep networks. In: Advances in Neural Information Processing Systems, vol. 29 (2016)
5. Scardapane, S., et al.: Group sparse regularization for deep neural networks. Neurocomputing 241, 81–89 (2017)
6. Neyshabur, B., et al.: Towards understanding the role of over-parametrization in generalization of neural networks. arXiv preprint arXiv:1805.12076 (2018)
7. Zhou, D.-X.: Universality of deep convolutional neural networks. Appl. Comput. Harmon. Anal. 48(2), 787–794 (2020)
8. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, vol. 25 (2012)
9. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
10. He, K., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
11. Huang, G., et al.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
12. Liu, C., Zhang, Z., Wang, D.: Pruning deep neural networks by optimal brain damage. In: Fifteenth Annual Conference of the International Speech Communication Association (2014)
13. Hassibi, B., Stork, D.G., Wolff, G.J.: Optimal brain surgeon and general network pruning. In: IEEE International Conference on Neural Networks. IEEE (1993)
14. Wan, L., et al.: Regularization of neural networks using DropConnect. In: International Conference on Machine Learning. PMLR (2013)
15. Ghiasi, G., Lin, T.-Y., Le, Q.V.: DropBlock: a regularization method for convolutional networks. In: Advances in Neural Information Processing Systems, vol. 31 (2018)
16. Yamada, Y., Iwamura, M., Akiba, T., Kise, K.: ShakeDrop regularization for deep residual learning. IEEE Access 7, 186126–186136 (2019)
17. Pan, H., Niu, X., Li, R., Shen, S., Dou, Y.: DropFilter: a novel regularization method for learning convolutional neural networks. Neural Process. Lett. 51, 1285–1298 (2020)
18. Krogh, A., Hertz, J.: A simple weight decay can improve generalization. In: Advances in Neural Information Processing Systems, vol. 4 (1991)
19. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58(1), 267–288 (1996)
20. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 67(2), 301–320 (2005)
21. Xu, Z., et al.: L1/2 regularization. Sci. China Inf. Sci. 53(6), 1159–1169 (2010)
22. Fan, Q., Zurada, J.M., Wu, W.: Convergence of online gradient method for feedforward neural networks with smoothing L1/2 regularization penalty. Neurocomputing 131, 208–216 (2014)
23. Wu, W., et al.: Batch gradient method with smoothing L1/2 regularization for training of feedforward neural networks. Neural Netw. 50, 72–78 (2014)
24. Jing, C., Sha, J.: Prune deep neural networks with the modified L1/2 penalty. IEEE Access 7, 2273–2280 (2018)
25. Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)
26. Zhang, H.S., Tang, Y.L., Liu, X.D.: Batch gradient training method with smoothing L0 regularization for feedforward neural networks. Neural Comput. Appl. 26(2), 383–390 (2015)
27. Zhang, H., Tang, Y.: Online gradient method with smoothing L0 regularization for feedforward neural networks. Neurocomputing 224, 1–8 (2017)
28. Louizos, C., Welling, M., Kingma, D.P.: Learning sparse neural networks through L0 regularization. arXiv preprint arXiv:1712.01312 (2017)
29. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68(1), 49–67 (2006)
30. Wang, J., et al.: A novel pruning algorithm for smoothing feedforward neural networks based on group lasso method. IEEE Trans. Neural Netw. Learn. Syst. 29(5), 2012–2024 (2017)
31. Zhang, H., et al.: Feature selection for neural networks using group lasso regularization. IEEE Trans. Knowl. Data Eng. 32(4), 659–673 (2019)
32. Wang, J., et al.: Convergence analyses on sparse feedforward neural networks via group lasso regularization. Inf. Sci. 381, 250–269 (2017)
33. Simon, N., et al.: A sparse-group lasso. J. Comput. Graph. Stat. 22(2), 231–245 (2013)
34. Zhou, Y., Jin, R., Hoi, S.C.-H.: Exclusive lasso for multi-task feature selection. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings (2010)
35. Yoon, J., Hwang, S.J.: Combined group and exclusive sparsity for deep neural networks. In: International Conference on Machine Learning. PMLR (2017)
36. Alemu, H.Z., et al.: Group L1/2 regularization for pruning hidden layer nodes of feedforward neural networks. IEEE Access 7, 9540–9557 (2019)
37. Li, F., Zurada, J.M., Wu, W.: Smooth group L1/2 regularization for input layer of feedforward neural networks. Neurocomputing 314, 109–119 (2018)
38. Bui, K., et al.: Structured sparsity of convolutional neural networks via nonconvex sparse group regularization. Front. Appl. Math. Stat., 529564 (2021)
39. Zhang, Y., et al.: Batch gradient training method with smoothing group L0 regularization for feedforward neural networks. Neural Process. Lett. 55, 1–17 (2022)
40. Ramchoun, H., Ettaouil, M.: Convergence of batch gradient algorithm with smoothing composition of group L0 and L1/2 regularization for feedforward neural networks. Progr. Artif. Intell. 11, 1–10 (2022)
41. He, K., et al.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision (2015)
42. Ngiam, J., et al.: Tiled convolutional neural networks. In: Advances in Neural Information Processing Systems, vol. 23 (2010)
43. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)
44. Hyvärinen, A., Köster, U.: Complex cell pooling and the statistics of natural images. Netw. Comput. Neural Syst. 18(2), 81–100 (2007)
45. Yu, D., Wang, H., Chen, P., Wei, Z.: Mixed pooling for convolutional neural networks. In: Miao, D., Pedrycz, W., Ślęzak, D., Peters, G., Hu, Q., Wang, R. (eds.) RSKT 2014. LNCS (LNAI), vol. 8818, pp. 364–375. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11740-9_34
46. Zeiler, M.D., Fergus, R.: Stochastic pooling for regularization of deep convolutional neural networks. arXiv preprint arXiv:1301.3557 (2013)
47. Rippel, O., Snoek, J., Adams, R.P.: Spectral representations for convolutional neural networks. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
Using Crank-Nicolson Scheme for Continuous Hopfield Network Equilibrium Safae Rbihou1(B) , Nour-Eddine Joudar2 , Zakariae En-Naimani3 , and Khalid Haddouch1 1
Engineering, Systems and Applications Laboratory, ENSA Fez, Sidi Mohamed Ben Abdellah University, Fez, Morocco {safae.rbihou,khalid.haddouch}@usmba.ac.ma 2 Research Center STIS, M2CS, Department of Applied Mathematics and Informatics, ENSAM, Mohammed V University in Rabat, Rabat, Morocco [email protected] 3 Signaux Syst`emes Distribu´es et Intelligence Artificielle Laboratory, ENSET Mohammedia, Hassan II University, Casablanca, Morocco [email protected]
Abstract. The Continuous Hopfield Network (CHN) remains a concrete example of recurrent neural networks. The CHN has been used in various applications, generally with good results. The continuous Hopfield network has an associated energy function, and the network is guaranteed to converge if the activity of each neuron with respect to time is given by a differential equation. In most works, this differential equation of the CHN is solved using the Euler method. In this work, we use another method, the Crank-Nicolson method, to solve the CHN differential equation, with the aim of improving the performance of the CHN. For this purpose, we have carried out an analytical and comparative study between the two methods of solving the CHN differential equation, Euler and Crank-Nicolson, applied to the task assignment problem.

Keywords: Continuous Hopfield network · Crank-Nicolson · Euler · Equilibrium points · Combinatorial optimization problems

1 Introduction
The Hopfield neural network (HNN), invented by John J. Hopfield in 1982 [3,7], is a fully connected recurrent model. Its use has spread to various fields such as operations research [14,15], artificial intelligence [16], and optimization [9,17,18]; it has given good results in most cases and is efficient compared to other conventional methods [8]. The Hopfield network is defined by a Lyapunov function. The HNN is classified into two types, discrete-time and
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 201–210, 2023. https://doi.org/10.1007/978-3-031-43520-1_17
202
S. Rbihou et al.
continuous-time models; the continuous Hopfield network (CHN) is a generalization of the discrete case [9]. The CHN minimizes a function called the energy function [4,9], which is expressed in terms of the threshold attached to each neuron and the interconnection weight between each pair of neurons [4]. If the weight matrix is symmetric with a zero diagonal, the energy function guarantees a stable state; it is monotonically decreasing with time, which guarantees convergence to a local minimum. The dynamics of the CHN are described by a first-order differential equation giving the state of each CHN neuron [3,4]. Many works use Euler's method to approximate the Hopfield network differential equation because it is the simplest numerical method for solving differential equations [4]. The drive to improve the results and efficiency of the CHN has prompted some authors to consider other methods for solving first-order differential equations. One such work used a multi-step scheme called Adams-Bashforth, which showed good results and better performance than the classical Euler method [10]. In order to increase the performance of the CHN, in this work we use another method for solving its differential equation while staying within the framework of one-step methods based on the forward and backward Euler schemes: the Crank-Nicolson method. The Crank-Nicolson scheme is more accurate than the two Euler schemes (forward and backward); its advantage is that it reduces the error on the time derivative to O(Δt²) instead of O(Δt) [12,13].
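The accuracy gap can be checked numerically on the scalar test problem du/dt = −u, u(0) = 1, whose exact solution is e^{−t}. For this equation the implicit Crank-Nicolson update has the closed form u_{n+1} = u_n(1 − Δt/2)/(1 + Δt/2). Halving Δt should roughly halve the explicit Euler error (first order) and quarter the Crank-Nicolson error (second order). A small sketch, not from the paper:

```python
import math

def integrate(dt, steps, scheme):
    """Integrate du/dt = -u from u(0) = 1 with the given one-step scheme."""
    u = 1.0
    for _ in range(steps):
        if scheme == "euler":
            u = u + dt * (-u)                      # explicit Euler
        else:
            u = u * (1 - dt / 2) / (1 + dt / 2)    # Crank-Nicolson, closed form
    return u

exact = math.exp(-1.0)
for scheme in ("euler", "cn"):
    err_coarse = abs(integrate(0.1, 10, scheme) - exact)
    err_fine = abs(integrate(0.05, 20, scheme) - exact)
    print(scheme, round(err_coarse / err_fine, 2))
# Euler ratio is about 2 (O(dt)); Crank-Nicolson about 4 (O(dt^2)).
```

The observed error ratios match the stated orders of accuracy of the two schemes.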
In this work, we use the Crank-Nicolson scheme based on the algorithm proposed in [1] to obtain the equilibrium point of the CHN, with modifications to improve and develop its performance. These modifications concern the stopping test, the scheme used to solve the differential equation associated with the CHN, and the type of selection used for the penalty parameters of the CHN energy function. To assess the performance of the proposed system, we compare the classical CHN algorithm [4] with our system on a combinatorial optimization problem, the task assignment problem (TAP). The paper is organized as follows: the continuous Hopfield network is described in Sect. 2. Section 3 presents the equilibrium point search algorithm using the Crank-Nicolson method. Section 4 shows the analytical and experimental study of the proposed algorithm applied to solve the TAP.
2 Continuous Hopfield Networks (CHN)

John Hopfield in 1982 [7] proposed a model based on neural networks to facilitate the solution of NP-complete optimization problems and for use in various other domains. The continuous Hopfield network is a network of n neurons. These neurons are connected to each other by a weight matrix T and a
Using Crank-Nicolson Scheme for Continuous Hopfield Network Equilibrium
203
bias vector I. The dynamics of the CHN are described by the following differential equation [3]:

du/dt = T v + I    (1)

where u and v are respectively the vectors of neuron states and outputs. The CHN is defined by an energy function E, inspired by magnetic systems, given by [6]:

E(v) = −(1/2) v^t T v − I^t v    (2)

Definition 1. A point u^e is called an equilibrium point of the system (1) if, for an input vector u⁰, u^e satisfies u(t) = u^e ∀t ≥ t^e for some t^e ≥ 0.

Talaván in 2005 [4] used Euler's method to solve Eq. (1); this is a classical one-step method that, starting from the initial condition, computes the values of the solution at each step. It has been used in most CHN work. Given a time step Δt > 0, the explicit Euler scheme is given by the recurrence relation [26]:

u_i(t + Δt) = u_i(t) + Δt (du_i/dt)    (3)

In this work, we treat optimization problems expressed as [5,25]:

(P)    Min f(x)  s.t.  h(x) = 0,  x ∈ {0, 1}    (4)

The energy function (2) associated with the optimization problem (P) is [6]:

E(v) = E⁰(v) + E^R(v)    (5)

where:
– E⁰(v) = αf(x), with α ≥ 0, is directly proportional to the objective function;
– E^R(v) is a quadratic constraint-collecting function that not only penalizes the violated constraints of the problem but also guarantees the feasibility of the solution obtained by the CHN.

Solving the optimization problem (P) with the CHN amounts to minimizing the energy function (5) using the algorithm proposed in [4], which yields a CHN equilibrium point. That algorithm uses the Euler method to solve the differential Eq. (1). In the next section, we use the Crank-Nicolson method to solve the differential equation of the CHN in order to increase its performance.
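As a quick sanity check of (2), the quadratic energy is easy to evaluate directly. A sketch in plain Python (the toy T and I below are chosen here for illustration; T is symmetric with zero diagonal, as the stability condition requires):

```python
def chn_energy(T, I, v):
    """E(v) = -1/2 v^t T v - I^t v, from Eq. (2)."""
    n = len(v)
    quad = sum(T[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
    lin = sum(I[i] * v[i] for i in range(n))
    return -0.5 * quad - lin

# Symmetric weight matrix with zero diagonal, and a bias vector.
T = [[0.0, -1.0], [-1.0, 0.0]]
I = [0.5, 0.5]
print(chn_energy(T, I, [0.5, 0.5]))  # -> -0.25
```

Evaluating E along a trajectory of (1) should show a monotonically decreasing value, which is the convergence guarantee mentioned above.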
3 The Proposed Algorithm
We stay with one-step schemes and use the Crank-Nicolson (CN) method, introduced by Crank and Nicolson (1947) [19] for simulating diffusion systems. It is similar to the trapezoidal method in the domain of ordinary differential equations (Lapidus and Seinfeld, 1971; Jain, 1984) [20]. In general, finite difference methods are characterized by their convergence to the exact solution. In this work, we use the Crank-Nicolson method to solve the CHN differential Eq. (1). Given a time step Δt > 0, the CN scheme is as follows [12,27]:

u_i(t + Δt) = u_i(t) + (Δt/2) (du_i(t)/dt + du_i(t + Δt)/dt)    (6)

We rely on the algorithm for minimizing the CHN energy function (2), using the CN scheme to solve the differential Eq. (1). This algorithm is based on the principle of the equilibrium-point algorithm of [4], with the following set of modifications:
– We use the CN scheme (6) to solve the CHN differential Eq. (1).
– To obtain better solutions and reduce the execution time, we changed the stopping criterion from the one used in [4].
– We chose the continuous type for the selection of the energy function parameters (5).
Algorithm 1. Obtaining the equilibrium point of the CHN by the Crank-Nicolson method

Input: problem data; ε
Initialization: t ← 0; v(t = 0) = v⁰ (starting point); u(t = 0) = (u₀/2) ln(v(t=0)/(1 − v(t=0)))
while max_i |v_i(t + Δt) − v_i(t)| ≥ ε do
    Generate the hyper-parameters of the energy function (5)
    Calculate the time step Δt
    Compute u_i(t + Δt) = u_i(t) + (Δt/2)(f(u_i(t)) + f(u_i(t + Δt)))
    Compute v_i(t + Δt) = (1/2)(1 + tanh(u_i(t + Δt)/u₀))
    t ← t + Δt
end while
Output: the equilibrium point
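Algorithm 1 can be exercised on a tiny two-neuron system. The sketch below (plain Python; the toy T, I, u₀, and fixed Δt are illustrative choices, and the inner fixed-point loop is one standard way to handle the implicit u(t + Δt) term, not necessarily the authors' implementation) iterates the CN update until the stopping test is met:

```python
import math

def rhs(u, T, I, u0):
    """Right-hand side of Eq. (1): du/dt = T v(u) + I, v = (1 + tanh(u/u0))/2."""
    v = [0.5 * (1.0 + math.tanh(ui / u0)) for ui in u]
    n = len(u)
    return [sum(T[i][j] * v[j] for j in range(n)) + I[i] for i in range(n)]

def cn_equilibrium(T, I, v0, u0=1.0, dt=0.05, eps=1e-7, max_iter=100000):
    """Iterate the CN update (6) until max_i |v_i(t+dt) - v_i(t)| < eps."""
    u = [(u0 / 2.0) * math.log(vi / (1.0 - vi)) for vi in v0]
    v_new = list(v0)
    for _ in range(max_iter):
        fu = rhs(u, T, I, u0)
        u_next = list(u)
        for _ in range(20):  # fixed-point iterations for the implicit term
            fn = rhs(u_next, T, I, u0)
            u_next = [u[i] + 0.5 * dt * (fu[i] + fn[i]) for i in range(len(u))]
        v_old = [0.5 * (1.0 + math.tanh(ui / u0)) for ui in u]
        v_new = [0.5 * (1.0 + math.tanh(ui / u0)) for ui in u_next]
        u = u_next
        if max(abs(a - b) for a, b in zip(v_new, v_old)) < eps:
            return v_new
    return v_new

# Toy system whose symmetric equilibrium is v = (0.5, 0.5): T v + I = 0 there.
T = [[0.0, -1.0], [-1.0, 0.0]]
I = [0.5, 0.5]
v = cn_equilibrium(T, I, v0=[0.8, 0.8])
print([round(x, 3) for x in v])  # -> [0.5, 0.5]
```

The stopping test on max_i |Δv_i| is the modified criterion mentioned above; tightening eps trades extra iterations for a more precise equilibrium point.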
4 Using the Crank-Nicolson Scheme for Solving the CHN Applied to the TAP

We conducted an analytical and comparative study in order to compare the performance of the two schemes (Euler and CN). For this experiment, we used the combinatorial optimization problem known as the task assignment problem (TAP) [5].
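Before comparing solvers, it helps to pin down what a feasible TAP solution looks like. In one common formulation, a solution is a binary matrix x in which each task (row) is assigned to exactly one processor (column). A minimal feasibility-and-cost check in plain Python (the cost matrix c and assignment x below are hypothetical toy data, not values from the TAP challenge instances):

```python
def is_feasible(x):
    """Each task (row) is assigned to exactly one processor (column)."""
    return all(sum(row) == 1 for row in x)

def assignment_cost(x, c):
    """Linear part of the cost: sum of c[task][proc] over chosen assignments."""
    return sum(c[i][j] * x[i][j]
               for i in range(len(x)) for j in range(len(x[0])))

# 3 tasks, 2 processors (toy data).
c = [[4, 2], [3, 5], [1, 6]]
x = [[0, 1], [1, 0], [1, 0]]
print(is_feasible(x), assignment_cost(x, c))  # -> True 6
```

In the CHN formulation below, this same constraint is folded into the energy function as a quadratic penalty rather than checked explicitly.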
4.1 Brief Description of TAP
The TAP entails assigning tasks to processors as efficiently as possible, i.e., minimizing the total production cost under the following conditions: each processor can perform a unique task and each task must be performed by a unique processor [23]. For optimization problems in general, the energy function (5) adds together the objective function of the optimization problem and its constraints; for the TAP, the energy function is defined as follows [5]:

E(v) = α((1/2) x^t Q x + q^t x) + (Φ/2) v^t x + βv + γv^t(1 − v)    (7)
with α ∈ ℝ⁺, (β, γ, Φ) ∈ ℝ³, and Q the weight matrix. The values of the parameters (γ, β) of the energy function (7) are defined as a function of (α, ε) by the hyperplane method [5].

4.2 Experimental Setup
The objective of this study is to compare the algorithm for obtaining the equilibrium point of the CHN by the Euler method [4] with the algorithm proposed in this work. The parameter values used in the classical CHN algorithm and the proposed algorithm are presented in Table 1:

Table 1. The parameter values used in the two experiments.

Method             | Type       | α                 | φ            | ε            | Starting point
CHN-Euler          | Discrete   | Incremental value | 0.1          | 10⁻⁴         | [11]
CHN-Crank-Nicolson | Continuous | [0.01, 0.1]       | [0.001, 0.1] | [10⁻⁶, 0.1]  | 0.5, [0, 1], [11], [6]
Here, the starting point v = 0.5 means that each vij = 0.5, the starting point [0, 1] means that each vij ∈ [0, 1], and [11] means that the selected starting point is the one presented in that reference. All experiments were performed on a personal computer equipped with an Intel Core i3-4005U @ 1.70 GHz processor and 4 GB of RAM. The solver is implemented in Python.

4.3 Results

Tables 2 and 3 present the results of the two methods, Euler and Crank-Nicolson, on the 20 instances given in the TAP challenge [2]. The results of this experiment are given in terms of: optimal value (OV) from the TAP challenge [2], best feasible solution obtained (BFS), average of the feasible solutions obtained (Mean), mode of the feasible solutions obtained (Mode), sum of the iterations of the feasible solutions (SI), and execution time in seconds (ET).
Table 2. The result of each TAP by the CHN-Euler

Instance       | OV      | BFS      | Mean    | Mode   | SI       | ET
tassnu 10 3 1  | -719    | -719     | -319.55 | -253   | 27524    | 19.83
tassnu 10 3 2  | -790    | -790     | 304.34  | 129    | 27277    | 37.97
tassnu 10 3 3  | -624    | -614     | -256.73 | -362   | 28993    | 32.21
tassnu 10 3 4  | -734    | -709     | -349.06 | -524   | 30755    | 44.01
tassnu 10 3 5  | -871    | -825     | -357.68 | -168   | 31215    | 26.87
tassnu 10 3 6  | -677    | -677     | -196.02 | -236   | 30343    | 36.07
tassnu 10 3 7  | -613    | -613     | -214.35 | 145    | 29111    | 41.56
tassnu 10 3 8  | -495    | -479     | -54.01  | 323    | 31611    | 26.15
tassnu 10 3 9  | -750    | -730     | -310.64 | -34    | 27809    | 22.45
tassnu 10 3 10 | -486    | -452     | -81.63  | -129   | 28640    | 20.31
tassnu 15 5 1  | -1985   | -1905    | -844.38 | -311   | 54827    | 100.9
tassnu 15 5 2  | -1568   | -1484    | -556.56 | 43     | 53883    | 141.48
tassnu 15 5 3  | -1892   | -1565    | -874.74 | -259   | 49768    | 169.93
tassnu 15 5 4  | -1806   | -1539    | -711.81 | -156   | 55381    | 112.15
tassnu 15 5 5  | -1881   | -1796    | -828.1  | 132    | 53673    | 103.36
tassnu 15 5 6  | -1950   | -1822    | -881.98 | -141   | 53139    | 102.36
tassnu 15 5 7  | -1893   | -1817    | -720.73 | -213   | 54866    | 143.94
tassnu 15 5 8  | -1733   | -1694    | -616.27 | -174   | 51382    | 101.99
tassnu 15 5 9  | -1798   | -1512    | -791.53 | -531   | 56952    | 131.43
tassnu 15 5 10 | -1763   | -1481    | -774.23 | -1118  | 54644    | 134.98
Mean           | -1251.4 | -1161.15 | -502.22 | -192.3 | 41589.65 | 77.49
Table 3. The result of each TAP by the CHN-Crank-Nicolson

Instance       | OV      | BFS      | Mean     | Mode   | SI       | ET
tassnu 10 3 1  | -719    | -719     | -82.11   | -620   | 20490    | 26.3
tassnu 10 3 2  | -790    | -790     | 72.45    | 753    | 15974    | 226.54
tassnu 10 3 3  | -624    | -624     | -154.52  | -354   | 23284    | 431.25
tassnu 10 3 4  | -734    | -671     | -285.45  | -567   | 25194    | 451.66
tassnu 10 3 5  | -871    | -871     | -353.33  | -704   | 22721    | 109.71
tassnu 10 3 6  | -677    | -677     | -241.15  | -677   | 21451    | 112.41
tassnu 10 3 7  | -613    | -613     | -200.12  | -613   | 29241    | 104.74
tassnu 10 3 8  | -495    | -493     | 82.32    | -20    | 22000    | 126.41
tassnu 10 3 9  | -750    | -730     | -116.26  | -669   | 21697    | 125.13
tassnu 10 3 10 | -486    | -486     | 119.54   | 461    | 23845    | 133.43
tassnu 15 5 1  | -1985   | -1985    | -1300.83 | -1250  | 48466    | 151
tassnu 15 5 2  | -1568   | -1515    | -237.73  | 381    | 17671    | 667.65
tassnu 15 5 3  | -1892   | -1781    | -522.29  | -149   | 18103    | 626.77
tassnu 15 5 4  | -1806   | -1659    | -290.38  | 214    | 18057    | 689.97
tassnu 15 5 5  | -1881   | -1796    | -88.51   | -3     | 16595    | 567.19
tassnu 15 5 6  | -1950   | -1922    | -647.46  | -466   | 15518    | 564.38
tassnu 15 5 7  | -1893   | -1818    | -42.01   | -340   | 15923    | 566.6
tassnu 15 5 8  | -1733   | -1711    | -154.128 | 376    | 21276    | 731.78
tassnu 15 5 9  | -1798   | -1560    | -308.11  | 416    | 24555    | 835.37
tassnu 15 5 10 | -1763   | -1762    | -474.33  | 213    | 16508    | 461.92
Mean           | -1251.4 | -1209.15 | -263.02  | -180.9 | 21928.45 | 385.51
4.4 Performance Comparison of the Two Methods
From the results presented in the previous subsection, we can group the different results into the following three figures (Figs. 1, 2 and 3). The measure adopted in the figures is the performance ratio:

Performance ratio = (obtained solution value) / (optimal solution value)
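Computed per instance, this ratio compares each method's best feasible solution to the known optimum. A sketch using a few (OV, BFS) values from Table 2 (function and variable names are illustrative):

```python
def performance_ratio(obtained, optimal):
    """Obtained objective value divided by the known optimal value."""
    return obtained / optimal

# (OV, BFS) pairs for tassnu 10 3 1 and tassnu 10 3 3 from Table 2 (CHN-Euler).
pairs = [(-719, -719), (-624, -614)]
ratios = [performance_ratio(bfs, ov) for ov, bfs in pairs]
print([round(r, 4) for r in ratios])  # -> [1.0, 0.984]
```

A ratio of 1.0 means the optimal value was reached; values below 1.0 quantify how far the best feasible solution falls short.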
Fig. 1. The average Performance ratio of CHN-Euler, CHN-CN.
Fig. 2. The sum of iteration of CHN by the two methods Euler and CN.
The results obtained in the experimental phase, reported in Tables 2 and 3, correspond respectively to the application of the equilibrium-point algorithm with the Euler and Crank-Nicolson schemes for solving the differential equation. The three figures (Figs. 1, 2 and 3) summarize the results of the two tables: the minimum obtained for each instance by our algorithm is clearly better than with the classical CHN, whose percentage of
Fig. 3. Average execution time of each method.
variance is estimated at 40%. The percentage of instances for which the optimal point is obtained is 40% for CHN-Crank-Nicolson versus 20% for CHN-Euler. On the other hand, in terms of the sum of iterations, our method requires more iterations, with a variance estimated at 28%; this is expected, since the Crank-Nicolson method combines the explicit and implicit Euler methods, which also leads to a higher execution time. There are many criteria for measuring the performance of a method for solving differential equations. In this study, we compare the two methods (Euler and Crank-Nicolson) on three important elements: first, the quality of the CHN output solution; second, the number of iterations required for each execution; and finally, the execution time required.
5 Conclusion

In this work, the Crank-Nicolson one-step method was used to calculate the equilibrium point of the CHN, and its effectiveness was confirmed by a comparative and analytical study against the classical CHN. To this end, we applied the method to the task assignment problem. The results presented in this work show that using the Crank-Nicolson method is better and more efficient than the classical CHN approach, which depends on the Euler method. This is what we seek in our research: developing the Hopfield network to improve its results across its different areas of use.
References

1. Oishi, C.M., Yuan, J.Y., Cuminato, J.A., Stewart, D.E.: Stability analysis of Crank-Nicolson and Euler schemes for time-dependent diffusion equations. BIT Numer. Math. 55(2), 487–513 (2015)
2. The task assignment problem, a library of instances. http://cedric.cnam.fr/oc/TAP/TAP.html (2004)
3. Hopfield, J.J., Tank, D.W.: "Neural" computation of decisions in optimization problems. Biol. Cybern. 52(3), 141–152 (1985)
4. Talaván, P.M., Yáñez, J.: A continuous Hopfield network equilibrium points algorithm. Comput. Oper. Res. 32(8), 2179–2196 (2005)
5. Ettaouil, M., Loqman, C., Hami, Y., Haddouch, K.: Task assignment problem solved by continuous Hopfield network. Int. J. Comput. Sci. Iss. (IJCSI) 9(2), 206 (2012)
6. Talaván, P.M., Yáñez, J.: The generalized quadratic knapsack problem. A neuronal network approach. Neural Netw. 19(4), 416–428 (2006)
7. Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. 79(8), 2554–2558 (1982)
8. Rbihou, S., Haddouch, K.: Comparative study between a neural network, approach metaheuristic and exact method for solving traveling salesman problem. In: 2021 Fifth International Conference on Intelligent Computing in Data Sciences (ICDS), pp. 1–5. IEEE (2021)
9. Wen, U.P., Lan, K.M., Shih, H.S.: A review of Hopfield neural networks for solving mathematical programming problems. Eur. J. Oper. Res. 198(3), 675–687 (2009)
10. El Alaoui, M., El Moutaouakil, K., Ettaouil, M.: A multi-step method to calculate the equilibrium point of the continuous Hopfield networks: application to the max-stable problem. Int. J. Comput. Sci. Inf. Secur. (IJCSIS) 14(6), 216–221 (2016)
11. Haddouch, K., El Moutaouakil, K.: New starting point of the continuous Hopfield network. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds.) BDCA 2018. CCIS, vol. 872, pp. 379–389. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96292-4_30
12. Crank-Nicolson method. https://en.wikipedia.org/wiki/Crank-Nicolson_method
13. Euler method. https://en.wikipedia.org/wiki/Euler_method
14. Smith, K.A., Gupta, J.N.: Neural networks in business: techniques and applications for the operations researcher. Comput. Oper. Res. 27(11–12), 1023–1044 (2000)
15. Seidl, P., et al.: Improving few- and zero-shot reaction template prediction using modern Hopfield networks. J. Chem. Inf. Model. 62(9), 2111–2120 (2022)
16. Ramsauer, H., et al.: Hopfield networks is all you need. arXiv preprint arXiv:2008.02217 (2020)
17. Kasihmuddin, M.S.M., Mansor, M.A., Sathasivam, S.: Hybrid genetic algorithm in the Hopfield network for logic satisfiability problem. Pertanika J. Sci. Technol. 25(1), JST-0599-2016 (2017)
18. Tan, K.C., Tang, H., Ge, S.S.: On parameter settings of Hopfield networks applied to traveling salesman problems. IEEE Trans. Circuits Syst. I Regul. Pap. 52(5), 994–1002 (2005)
19. Crank, J., Nicolson, P.: A practical method for numerical evaluation of solutions of partial differential equations of the heat-conduction type. In: Mathematical Proceedings of the Cambridge Philosophical Society, vol. 43, no. 1, pp. 50–67. Cambridge University Press (1947)
20. Lapidus, L., Seinfeld, J.H.: Numerical Solution of Ordinary Differential Equations. Academic Press (1971)
21. Chaudhary, V., Aggarwal, J.K.: A generalized scheme for mapping parallel algorithms. IEEE Trans. Parallel Distrib. Syst. 4(3), 328–346 (1993)
22. Norman, M.G., Thanisch, P.: Models of machines and computation for mapping in multicomputers. ACM Comput. Surv. (CSUR) 25(3), 263–302 (1993)
210
S. Rbihou et al.
Fire and Smoke Detection Model for Real-Time CCTV Applications Tarik Hajji(B) , Ibtissam El Hassani, Abdelkader Fassi Fihri, Yassine Talhaoui, and Chaimae Belmarouf Laboratory of Mathematical Modeling, Simulation and Smart Systems (L2M3S), Artificial Intelligence for Engineering Sciences Team, ENSAM-Meknes, Moulay Ismail University of Meknes, Meknes, Morocco [email protected]
Abstract. Wildfires, whether they occur in a town or elsewhere, are among the most expensive and deadly disasters, so a great deal of research is devoted to fire detection, in particular to ensure people's safety: fire endangers not only lives but also long-term wellbeing. The fastest possible detection and notification of firefighters, operational forces, and nearby people remains the top priority, and spotting smoke even before disastrous flames develop makes such a system especially worthwhile. To be useful, the model should also discriminate between smoke and fog. This study addresses these concerns, including the potential use of detection based on infrared images produced by public or private cameras, in order to maximize the utility of this instrument. Real-time detection is also a crucial requirement for a system capable of sending warnings with evidence drawn from RGB or infrared video camera monitoring, expanding the possibilities investigated by this methodology. Keywords: CNN · fire detection · infrared image · real-time · artificial intelligence · deep learning · YOLOv5 · wildfires
1 Introduction Fire incidents can result not only in financial losses but also in environmental catastrophes and severe harm to living things. Wildfires and industrial fires in particular can seriously harm cities and forests and have long-lasting repercussions. These worrying statistics encourage experts to look for creative approaches to early fire detection and management. Recent developments in aerial monitoring systems can give operational forces and first responders more precise information on the behavior of fires for improved fire management. To provide direct detection and prevention of severe outcomes, sensors are placed in environments that closely resemble industrial settings. In traditional fire monitoring methods, people are usually stationed in lookout towers to watch for flames using visual and infrared imaging.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 211–220, 2023. https://doi.org/10.1007/978-3-031-43520-1_18
212
T. Hajji et al.
Deep learning for fire detection is a relatively young subject, and assessment frameworks and large-scale datasets are still lacking. This work contributes to the area by considering the fundamental characteristics of fire. It builds on the monitoring of urban, industrial, and public spaces using security-driven CCTV video systems, with automatic real-time fire (or flame) detection performed by analyzing video sequences, as a first line of defense against fire (in addition to heat- and smoke-based detection). The requirement for autonomous fire detection from such platforms is further increased by the ongoing consideration of remote vehicles for automatic fire detection and monitoring activities [1, 2]. Among other object classification challenges, detecting fire in photos and videos is difficult due to the irregularity of its shape and pattern, which vary depending on the underlying material composition. Vision-based research in subjects like image processing and computer vision has seen a fair share of positive results thanks to advances in many fields of artificial intelligence. In particular computer vision applications, such as image classification, various deep learning (DL) models have successfully surpassed human-level performance. These ground-breaking improvements have also proved useful for the visual-based fire identification method. Compared to conventional fire detection systems, visual-based fire detection systems offer a number of benefits: they are preferable to hardware-based alarm systems in terms of price, accuracy, robustness, and dependability.
2 Related Works In our current research, we build upon our previous experience in artificial intelligence (AI) and computer vision. In earlier work we applied AI in various domains: deep learning for image classification in the medical field, achieving high accuracy in identifying different types of medical images; a machine learning algorithm for detecting fraud in financial transactions with a high level of precision [3–8]; and computer vision techniques for object detection and tracking in the security domain, as well as facial recognition for access control in public spaces [9–13]. In our current study, we focus on fire detection using the YOLOv5 model, leveraging our expertise in these areas to develop a robust and efficient approach that can be easily deployed on embedded devices. Fire detection is a crucial concern, and there is an urgent need for a suitable detection system to minimize the harm caused by the numerous fire mishaps that occur every day [14]. Previous research has utilized different strategies to obtain effective methods, such as an auto-adaptive edge detection technique for flame detection put forth by Qiu et al. [15], and a methodology that focused on the motion and color characteristics of
flame detection by Thou-Ho et al. [16]. Rossi et al. [17] proposed a technique to derive geometric fire features from stereoscopic videos, while Thomson et al. [18] proposed a small, low-complexity CNN architecture (ShuffleNetV2-OnFire) that reduces the complexity of CNN architectures while preserving accuracy and maximizing computational efficiency through experimental analysis and filter pruning. Our method, which utilizes the YOLOv5 model, expands upon earlier research by effectively utilizing public cameras and taking smoke detection into account as a significant and relevant parameter. In future studies, we will focus on additional training with synthetic image data generated by generative adversarial networks in a big data context, as in [19].
3 Dataset In [19], Wu et al. achieved H. Fire detection using pre-trained ANN architectures, MobileNet and AlexNet, with a publicly available fire dataset that only contained images captured on land. However, to train our model for emergency situation detection, we collected a diverse set of photographs from the internet depicting emergency scenarios in various environments such as buildings, industrial areas, roadways, and outdoors like in Fig. 1. To improve the quality and quantity of our dataset, we applied various data augmentation techniques such as flipping, rotation, zooming, and cropping. By augmenting our dataset in this way, we were able to increase its diversity and size, which helped our model to better generalize to new and unseen images. Additionally, we normalized the images in our dataset to ensure consistent scaling and color values across the entire dataset. Our final dataset consisted of 226 images of flames, IR fires, smoke, and fog with different resolutions and sizes, which we divided into two categories with 56 images for validation and 80 for training.
Fig. 1. The dataset’s many classes.
The YOLO labeling format provides one text file per image and is widely supported by annotation tools. Each text file includes bounding-box annotations for every object in the image, with coordinates normalized to the range 0–1 relative to the image width and height. We manually annotated the images using Make Sense, a free online image annotation application. This dataset contains RGB photos of fire, smoke, and fog, as well as
infrared images of fire, with annotations that can be readily used with current machine learning techniques. It can be a valuable resource for other researchers working on fire detection, offering a diverse range of scenarios, such as forests, agricultural land, towns, mountains, and more, making it a challenging benchmarking objective.
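A YOLO-format label line as described above can be decoded as follows. This is a generic sketch of the format; the example class index and image size are chosen purely for illustration.

```python
def yolo_to_pixels(line: str, img_w: int, img_h: int):
    """Parse 'cls x_center y_center w h' (all coordinates normalized to [0, 1])."""
    cls, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    # Convert center/size form to a pixel-space (x_min, y_min, x_max, y_max) box.
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return int(cls), (round(x_min), round(y_min), round(x_max), round(y_max))

# Example: class 1, centered box covering 1/4 of the width and 1/2 of the height.
cls_id, box = yolo_to_pixels("1 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
```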
4 Methodology Our methodology involved several steps as shown in the Fig. 2.
Fig. 2. From Collection to Integration: A Methodology for Developing an Effective Fire Detection System
The steps of our methodology are:
1. Data collection: collect images of different types of fires, including forest fires, building fires, chimney fires, etc.
2. Data annotation: use annotation tools to mark the locations of fires in each image.
3. Model training: use YOLOv5 to train a fire detection model on the annotated data.
4. Model evaluation: evaluate the model's performance using test images and metrics such as precision and recall.
5. Integration: integrate the trained model into a fire detection system that can be used in real-world scenarios, such as early fire detection and prevention in buildings, forests, and other areas.
We selected YOLO models (Fig. 3) as they are known for their efficiency and accuracy in object detection, and fire detection requires a high level of effectiveness. YOLO, or "You Only Look Once," uses a grid structure to divide images into cells, with each cell responsible for detecting objects within it. Among the various versions of YOLO, we opted for YOLOv5, which was introduced by Glenn Jocher using the PyTorch framework. YOLOv5 employs advanced optimization techniques, such as auto-learning bounding-box anchors, mosaic data augmentation, and a cross-stage partial network, to improve its detection capabilities. The YOLOv5 architecture consists of four components: input, backbone, neck, and output. Preprocessing, including mosaic data augmentation and adaptive image filling, is mainly performed at the input component. YOLOv5 also incorporates adaptive anchor
frame calculation to automatically adjust the anchor frame size for different datasets. The backbone network uses multiple convolution and pooling layers to extract feature maps of different sizes from the input image, using the cross-stage partial network and spatial pyramid pooling. The neck network employs feature pyramid architectures such as FPN and PAN to transmit semantic and localization features between feature maps, further enhancing detection. Finally, in the last detection phase, the head predicts targets of various sizes on the feature maps. Overall, YOLOv5's architecture and optimization techniques make it a suitable choice for fire detection, particularly given its ability to adapt to different datasets and its enhanced feature extraction and detection.
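The grid idea described above can be made concrete: the cell containing an object's normalized center coordinates is the one responsible for predicting it. The grid size `s` below is an illustrative assumption, not a value from the paper.

```python
def responsible_cell(x_center: float, y_center: float, s: int = 13):
    """Map normalized center coordinates in [0, 1] to the (row, col) grid cell
    responsible for detecting that object, on an s x s grid."""
    col = min(int(x_center * s), s - 1)  # clamp so x_center == 1.0 stays in-grid
    row = min(int(y_center * s), s - 1)
    return row, col
```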
Fig. 3. YOLOv5 is an object detection algorithm that uses a deep neural network to localize objects in images. Its architecture consists of a backbone network, neck network, and head network that work together to produce bounding boxes and class probabilities for each object in the image. YOLOv5 is designed to be fast, accurate, and efficient.
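Detectors in the YOLO family reduce their raw, overlapping box predictions to a final set with non-maximum suppression (NMS). The sketch below shows the standard greedy form of that step with an assumed IoU threshold; it is a minimal illustration, not YOLOv5's internal implementation.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_thresh=0.45):
    """Keep the highest-scoring box in each cluster of overlapping boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest remaining score wins
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```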
The four architectural variants of YOLOv5, shown in Fig. 4, are YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. They differ mainly in the number of feature extraction modules and convolution kernels at particular points in the network. The figure depicts the YOLOv5 network structure. Here's a brief overview of each model: • YOLOv5s: This is the smallest model, with 12.7 million parameters and a speed of 63 FPS on a Tesla V100 GPU. It is suitable for real-time applications and can detect objects accurately, but it may struggle with smaller objects or in cluttered scenes.
• YOLOv5m: This is a medium-sized model, with 35.9 million parameters and a speed of 40 FPS on a Tesla V100 GPU. It has a good balance between speed and accuracy and can handle a wider range of object sizes and cluttered scenes. • YOLOv5l: This is a larger model, with 63.3 million parameters and a speed of 27 FPS on a Tesla V100 GPU. It has higher accuracy than YOLOv5m and can handle more complex scenes, but it is slower and requires more computational resources. • YOLOv5x: This is the largest and most complex model, with 87.5 million parameters and a speed of 17 FPS on a Tesla V100 GPU. It has the highest accuracy and can detect small objects in cluttered scenes, but it requires significant computational resources and may not be suitable for real-time applications.
Fig. 4. The YOLOv5 architecture has four pre-trained models with different sizes and complexities, which are commonly referred to as YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x.
5 Training and Results In this stage, we train our YOLOv5 model using hyperparameter evolution: starting from the initial values described in Sect. 1 and maximizing the fitness defined in Sect. 2, we use YOLOv5's Evolve utility to search for the optimum hyperparameters for this scenario. Then, training our model with those hyperparameters for 150 epochs with a batch size of 4, we obtain the results in Fig. 5. This figure shows an exponential decline in the cls loss and validation cls loss, while precision and mAP (mean average precision) grow. The cls loss (classification loss) measures the error in the model's classification predictions, while the validation cls loss measures the error on a validation set of images that the model has not seen during training. When training starts, the model has high cls loss and validation cls loss, indicating that it makes many classification errors. However, as the model is trained on more data, it learns to detect and classify objects more accurately, leading
Fig. 5. Improvement of YOLOv5 Object Detection Model Performance Over Training Epochs.
to a gradual decrease in cls loss and validation cls loss over time. This decrease in loss indicates that the model is improving its classification accuracy. At the same time, as the model becomes more accurate in its predictions, the precision (a measure of the model's ability to correctly identify true positives) and mAP (mean average precision, a measure of the overall accuracy of the model) increase. This means that the model is better at detecting and classifying objects, leading to better overall performance. The exponential decline in cls loss and validation cls loss, combined with the increase in precision and mAP, is a good indication that the YOLOv5 model's accuracy improves as it is trained on more data. Figure 6 also shows that there is no confusion between the classes: thanks to the quality of the data, each class is clearly distinguished from the others and from the background.
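The metrics discussed above can be written out explicitly. This is a generic sketch of precision, recall, and the mean over per-class average precisions, not YOLOv5's internal computation.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted detections that are correct (true positives)."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    """Fraction of ground-truth objects the model actually found."""
    return tp / (tp + fn) if tp + fn else 0.0

def mean_average_precision(per_class_ap: list[float]) -> float:
    """mAP: the mean of the per-class average-precision values."""
    return sum(per_class_ap) / len(per_class_ap)
```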
Fig. 6. Class separation evaluation in our ambiguous object detection data using YOLOv5 Model
We created an interactive web interface with Streamlit that lets users upload photos or videos, or work in real time. The platform's key goals are to let the user deploy the model, visualize the location of the camera or the data source, and receive real-time alerts that include incident photos. Our experiments demonstrate the effectiveness of the YOLOv5 object detection method for fire detection in a diverse range of scenarios. We observed high precision and recall in detecting flames, smoke, and fog in the test dataset. The model also accurately identified the different classes of fire-related objects with no confusion between them. Additionally, our analysis revealed that the YOLOv5 model has a fast processing time, allowing real-time detection of fires in images and videos. The use of mosaic data augmentation and adaptive anchor frame calculation contributed to the model's robustness and adaptability to different datasets. Overall, our results suggest that the YOLOv5 method can be a valuable tool for fire detection applications.
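The per-frame alert rule such a platform might apply can be sketched as below. The class names and the confidence threshold are assumptions made for illustration, not values reported in the paper.

```python
# Hypothetical set of classes that should trigger an alert (assumed labels).
ALERT_CLASSES = {"fire", "smoke", "ir_fire"}

def should_alert(detections, min_conf: float = 0.5) -> bool:
    """detections: iterable of (class_name, confidence) pairs for one frame.
    Alert only if an alert-worthy class is detected with enough confidence."""
    return any(cls in ALERT_CLASSES and conf >= min_conf
               for cls, conf in detections)
```

In the deployed system, a `True` result would be forwarded to the Streamlit front end together with the incident photo.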
6 Conclusion In this work, we have shown that the YOLOv5 object detection model can be successfully used for fire detection in a variety of scenarios. However, the use of traditional supervised learning techniques has some limitations, especially when it comes to detecting rare or unusual events. One promising direction for future research is the application of reinforcement learning to fire detection. Reinforcement learning is a type of machine learning that involves training an agent to make decisions based on rewards and punishments. In the context of fire detection, reinforcement learning could be used to train an agent to identify subtle changes in temperature or other environmental factors that could indicate the presence of a fire. By incorporating reinforcement learning into our detection system, we could improve its ability to detect fires in challenging scenarios and reduce the risk of false alarms. While reinforcement learning is a relatively new area of research, it has shown promise in a variety of applications, including robotics, gaming, and natural language processing. We believe that it could be a valuable tool for fire detection, and we look forward to exploring this possibility in future work. Overall, our results demonstrate the effectiveness of the YOLOv5 object detection model for fire detection, and we believe that the application of reinforcement learning could further improve its performance in challenging scenarios. We hope that our work will inspire further research in this area and contribute to the development of more effective fire detection systems.
References 1. Bradshaw, A.: The UK security and fire fighting advanced robot project. In: IEE Colloquium on Advanced Robotic Initiatives in the UK, pp. 1–114 (1991) 2. Martinez-de Dios, J.R., Merino, L., Caballero, F., Ollero, A., Viegas, D.: Experimental results of automatic fire detection and monitoring with UAVS. Forest Ecology Manag. 234(1), S232 (2006). https://doi.org/10.1016/j.foreco.2006.08.259
3. Tarik, H., Kodad, M., Miloud, J.E.: Digital movements images restoring by artificial neural networks. Comput. Sci. Eng. 10, 36–42 (2014) 4. Hajji, T., El Jasouli, S.Y., Mbarki, J., Jaara, E.M.: Microfinance risk analysis using the business intelligence. In: 2016 4th IEEE International Colloquium on Information Science and Technology (CiSt), pp. 675–680. IEEE (2016) 5. Tarik, H., Jamil, O.M.: Weather data for the prevention of agricultural production with convolutional neural networks. In: 2019 International Conference on Wireless Technologies, Embedded and Intelligent Systems (WITS), pp. 1–6. IEEE (2019) 6. Ouerdi, N., Hajji, T., Palisse, A., Lanet, J.-L., Azizi, A.: Classification of ransomware based on artificial neural networks. In: Rocha, Á., Serrhini, M. (eds.) EMENA-ISTL 2018. SIST, vol. 111, pp. 384–392. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-03577-8_43 7. Hajji, T., Ouerdi, N., Azizi, A., Azizi, M.: EMV cards vulnerabilities detection using deterministic finite automaton. Procedia Comput. Sci. 127, 531–538 (2018) 8. Tarik, H., Mohammed, O.J.: Big data analytics and artificial intelligence serving agriculture. In: Ezziyyani, M. (ed.) AI2SD 2019. AISC, vol. 1103, pp. 57–65. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-36664-3_7 9. Tarik, H., Tawfik, M., Youssef, D., Simohammed, S., Mohammed, O.J., Miloud, J.E.: Towards an improved CNN architecture for brain tumor classification. In: Serrhini, M., Silva, C., Aljahdali, S. (eds.) EMENA-ISTL 2019. LAIS, vol. 7, pp. 224–234. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-36778-7_24 10. Hajji, T., Hassani, A.A., Jamil, M.O.: Incidents prediction in road junctions using artificial neural networks. In: IOP Conference Series: Materials Science and Engineering, vol. 353, no. 1, p. 012017. IOP Publishing (2018) 11. Douzi, Y., Kannouf, N., Hajji, T., Boukhana, T., Benabdellah, M., Azizi, A.: Recognition textures of the tumors of the medical pictures by neural networks. J. Eng.
Appl. Sci. 13, 4020–4024 (2018) 12. Benabdellah, M., Azizi, A., Masrour, T.: Classification and watermarking of brain tumor using artificial and convolutional neural networks. In: Artificial Intelligence Techniques for Cyber-Physical, Digital Twin Systems and Engineering Applications, vol. 144, p. 61 (2020) 13. Hajji, T., Masrour, T., Ouazzani Jamil, M., Iathriouan, Z., Faquir, S., Jaara, E.: Distributed and embedded system to control traffic collision based on artificial intelligence. In: Masrour, T., Cherrafi, A., El Hassani, I. (eds.) A2IA 2020. AISC, vol. 1193, pp. 173–183. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-51186-9_12 14. Hamburger, C.: Quasimonotonicity, regularity and duality for nonlinear systems of partial differential equations. Annali di Matematica 169(1), 321–354 (1995). https://doi.org/10.1007/BF01759359 15. Qiu, T., Yan, Y., Lu, G.: An autoadaptive edge-detection algorithm for flame and fire image processing. IEEE Trans. Instrum. Meas. 61(5), 1486–1493 (2012). https://doi.org/10.1109/TIM.2011.2175833 16. Chen, T.-H., Wu, P.-H., Chiou, Y.-C.: An early fire-detection method based on image processing. In: 2004 International Conference on Image Processing, 2004. ICIP '04., vol. 3, pp. 1707–1710 (2004). https://doi.org/10.1109/ICIP.2004.1421401 17. Rossi, L., Akhloufi, M., Tison, Y.: On the use of stereovision to develop a novel instrumentation system to extract geometric fire fronts characteristics. Fire Saf. J. 46(1), 9–20 (2011). https://doi.org/10.1016/j.firesaf.2010.03.001 18. Thomson, W., Bhowmik, N., Breckon, T.P.: Efficient and Compact Convolutional Neural Network Architectures for Non-temporal Real-time Fire Detection. arXiv (2020). https://doi.org/10.48550/ARXIV.2010.08833. https://arxiv.org/abs/2010.08833
19. Wu, H., Li, H., Shamsoshoara, A., Razi, A., Afghah, F.: Transfer learning for wildfire identification in UAV imagery. In: 2020 54th Annual Conference on Information Sciences and Systems (CISS), pp. 1–6 (2020). https://doi.org/10.1109/CISS48834.2020.1570617429 20. Hajji, T., Loukili, R., El Hassani, I., Masrour, T.: Optimizations of distributed computing processes on apache spark platform. IAENG Int. J. Comput. Sci. 50(2), 422–433 (2023)
Vehicle Image Classification Method Using Vision Transformer Youssef Taki(B) and Elmoukhtar Zemmouri ENSAM Meknes, Moulay Ismail University, Meknes, Morocco [email protected], [email protected]
Abstract. Vehicle detection and classification have become critical tasks of the Advanced Driver Assistance System (ADAS) and Intelligent Transportation System (ITS), with the goal of increasing road safety and saving lives. Advances in image processing, pattern recognition, and deep learning techniques, especially convolutional neural networks, have overcome many obstacles to achieving this goal and still do. However, most object detection models cannot achieve satisfactory performance under conditions such as low-quality images, nighttime, and other insufficient illumination. In this paper, we propose to increase the effectiveness of ADAS by applying the pre-trained Vision Transformer, the latest model for computer vision tasks, to the vehicle classification problem. The dataset we used is a small dataset of tiny, low-resolution vehicle images. The proposed model is based on the attention mechanism and pre-training on the ImageNet-21k dataset, and it achieves good performance compared with state-of-the-art methods. Keywords: Advanced Driver Assistance System · Vehicle classification · Vision Transformer · Deep Learning
1 Introduction Nowadays, with the world's population growing, the roads are increasingly crowded with cars and other vehicles, and the number of accidents is rising sharply: in just 48 hours (July 26–27, 2022), the Moroccan Ministry of Transport recorded 199 traffic accidents, in which 15 people were killed and 313 injured; these numbers do not differ significantly from the data recorded during the same period in previous years [1]. This terrible situation is a global phenomenon with the same causes and the same consequences everywhere, and one we need to stand against. For example, according to the Traffic Accident Analysis System of the Korea Road Traffic Authority [4], traffic accidents increased over the period from 2010 to 2019. The number of injured has also been alarming in the past four years in Bangladesh: as reported by the Bangladesh Ministry of Transport, compared to 2018, vehicle accidents increased by more than 51% and the death rate by more than 17%; in 2019, a total of 5,227 lives were lost in road accidents in Bangladesh, nearly one thousand more than in 2018 [5]. According to the analysis of traffic accident types, the vehicle-to-vehicle accident rate was above 88%. This means that if we can detect and classify the types of
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 221–230, 2023. https://doi.org/10.1007/978-3-031-43520-1_19
222
Y. Taki and E. Zemmouri
vehicles on the road, and pay particular attention to the most dangerous types with the highest probability of being involved in a traffic accident, we can significantly reduce the number of traffic accidents and save lives. Classifying vehicles on the road is an important challenge in Intelligent Transportation System (ITS) and Advanced Driver Assistance System (ADAS) applications, whose main role is to reduce the number of car accidents and save lives, as is the case with many other applications such as human path prediction [6] and traffic sign detection and recognition [7]. In recent years, deep learning has attracted many researchers to this area of improvement, due to its importance and sensitivity. Recently, methods for vehicle classification have received a lot of attention and made great progress, thanks to the development of machine learning and to the various technologies added to smart vehicles such as sensors and cameras. In this paper, we use a small dataset of low-quality images, collected by [8], aiming to normalize the use of low-cost standard cameras in intelligent systems such as ITS and ADAS applications. We first propose a direct application of the Vision Transformer (ViT), based on [3]: without any pretraining or fine-tuning, this model achieves good accuracy and minimal loss on the dataset collected by [8], and we compare it with previous well-known models. Second, we fine-tune a pre-trained ViT model on the target dataset; this model achieves state-of-the-art results compared with existing models. The paper's contributions can be summarized as follows: 1) We propose a first application of ViT to the vehicle classification problem (both from scratch and as a pre-trained model). 2) Experiments on a small dataset of low-quality images show that ViT significantly improves vehicle type recognition accuracy, reaching more than 99.3% with minimal loss.
The rest of our paper is organized as follows: a brief overview of related work is provided in Sect. 2, and the proposed model architecture is discussed in Sect. 3. Then, the training details and experimental comparisons with previous methods on the same dataset are presented in Sect. 4. Finally, some closing discussions are provided in Sect. 5.
2 Related Work Many researchers have been interested in vision-based vehicle classification methods that use machine learning (ML), because it offers an effective and flexible approach to meeting the growing demands of ITS applications. In this context, several works have been proposed using ML methods for vision-based vehicle classification [9, 10]. [9] describes a method for classifying vehicle images that uses a neural network (NN) with a conditional adaptive distance. [11] proposes a vehicle classification method based on a multi-class support vector machine (SVM). In [12], a vehicle classification method based on a fuzzy support vector machine is provided. In [13], the AdaBoost method is used for vehicle classification. In recent years, deep neural networks have been used in several fields such as computer vision, machine translation, speech recognition, and so on. Revolutionary results
have been achieved in the computer vision field thanks to the convolutional neural network (CNN). CNNs have demonstrated better performance in image classification and have been largely applied to object detection [14], video classification [15], and segmentation [16]. But in the ADAS field, there are still many challenges, such as low-quality images, nighttime, and other insufficient illumination. To deal with these challenges, researchers in ITS and ADAS have begun to examine the implementation of deep learning approaches. Among the first papers to use CNNs for this task is [17], from 2015, which uses a semi-supervised convolutional neural network for vehicle classification. In [18], vehicle classification is accomplished using a progressive CNN architecture. In a similar manner, [19] proposes a CNN-based vehicle type classification system. Furthermore, [20] presents vehicle classification using a pre-trained CNN. In addition, [21] provides a simple algorithm for vehicle classification based on a deep CNN model. [22] presents a framework for vehicle classification using a deep learning neural network (Inception-v3 model). [23] proposes yet another study that employs CNNs for vehicle classification. Furthermore, [24] presents a real-time vehicle type classification system based on Faster Region-based convolutional neural networks (Faster R-CNN). Aside from that, [25] also presents an improved Faster R-CNN-based vehicle classification method. [26] proposes a vehicle classification and counting method based on CNN models. Vehicle classification using a stacking ensemble of three deep neural networks is presented in [27]. [10] also proposes a deep learning-based object detection algorithm (SSD: Single Shot MultiBox Detector) for vehicle classification. It is important to note that, despite the use of various classifiers in these works, CNNs remain popular due to their superior performance on large-scale image datasets.
In 2017, the research team at Google [2] proposed a new, attention-based approach to Natural Language Processing (NLP) tasks. It is based on the idea that what we really need is to decide what to pay attention to and what not to. This paper, "Attention Is All You Need," lit the fuse for an explosion of research into the self-attention mechanism and transformers in various domains. In 2021, the Google team again proposed a model called the Vision Transformer (ViT) [3], which is a direct application of the attention mechanism and transformers to the image classification task. First, however, the image must be transformed into a sequence of word-like tokens by embedding; the paper is titled "An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale." In the literature, there are not many papers using transformers for the image classification task, especially for the problem of classifying vehicle images. In this work, we propose to deal with this problem using ViT on a small, low-resolution dataset, which is, to our knowledge, the first attempt to apply this new technique to this problem.
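The self-attention operation at the heart of the papers cited above is Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal single-head NumPy sketch (for illustration only; production models use batched, multi-head implementations):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q, k, v: (seq_len, d_k) matrices. Returns the attention output."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                 # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # weighted mix of values
```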
3 Background on Vision Transformer

3.1 Model Architecture

Figure 1 shows the architecture of the ViT model in detail. It is based on the model proposed by Dosovitskiy et al. [3]. The structure of the ViT model is similar to that of the original transformer by Vaswani et al. [2], created primarily for NLP tasks. The distinction is that only the encoder component is used and, instead of word embeddings, the encoder accepts an input image $x_{img} \in \mathbb{R}^{H \times W \times C}$, where the image has spatial dimensions $H \times W$ and $C$ channels. Considering that the
Y. Taki and E. Zemmouri
Transformer encoder can only process the raw image as a series of input tokens, each image is decomposed into 2D patches $x_i \in \mathbb{R}^{P^2 C}$, $i = 1, \dots, N$, where $N = HW/P^2$ is the number of patches, each with a fixed size of $(P, P)$. After that, the patches are flattened into vectors via a linear projection to $D$-dimensional vectors, which are the patch embeddings; $D$ is the hidden size of the Transformer through all of its layers. Each embedding receives a distinct position encoding of the same dimension $D$. The input for the transformer encoder is the resultant vector sum. The input sequence is processed by a series of identical encoder layers. Layer Normalization (LN) is applied to each input. The relationships between feature vectors are determined by the Multi-head Self-Attention layer (MSA). Vaswani et al. [2] found that by using multiple heads instead of just one, the model can jointly attend to information from different representation subspaces at different positions. The Multilayer Perceptron (MLP) is used to extract features; its input is also normalized. The MLP is composed of two layers that are activated by GELU [28]. A residual connection is made between the LN output and the MSA/MLP output.
Fig. 1. We insert the input image into the CNN layers, then divide the output into fixed-size patches, linearly project them, add position embeddings, and feed the resulting vector sequence to the standard transformer encoder, then into the MLP block; finally, we obtain the class of the input image. The illustration is inspired by [3].
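The patch-embedding step described above can be illustrated with a minimal numpy sketch: split the image into non-overlapping patches, flatten each, and apply a linear projection plus position embeddings. The randomly initialized matrices here are stand-ins for weights that are learned in a real ViT:

```python
import numpy as np

def patch_embed(img, P, D, rng):
    """Split an H x W x C image into N = HW/P^2 patches of size P x P x C,
    flatten each patch, and project it to a D-dimensional embedding."""
    H, W, C = img.shape
    assert H % P == 0 and W % P == 0
    N = (H // P) * (W // P)
    # extract non-overlapping P x P patches and flatten them to rows
    patches = img.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(N, P * P * C)
    W_proj = rng.standard_normal((P * P * C, D)) * 0.02  # learned in practice
    pos = rng.standard_normal((N, D)) * 0.02             # position embeddings
    return patches @ W_proj + pos                        # shape (N, D)

rng = np.random.default_rng(0)
img = rng.standard_normal((224, 224, 3))
tokens = patch_embed(img, P=16, D=768, rng=rng)
print(tokens.shape)  # (196, 768): N = (224*224)/16^2 = 196 tokens
```

For a 224 × 224 image with P = 16, this yields the familiar sequence of 196 D-dimensional tokens fed to the encoder.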
3.2 Transformer Encoder

The transformer encoder [2] consists of four components: multi-headed self-attention, multilayer perceptron (MLP) blocks, a normalization layer, and residual connections [29, 30] (Fig. 2).
Vehicle Image Classification Method Using Vision Transformer
Fig. 2. The transformer encoder (figure credit [3])
3.3 Attention Mechanism

The attention mechanism describes a weighted average of (sequence) elements, where the weights are dynamically computed using the keys of the elements and an input query. The objective is to average out the characteristics of various elements. Instead of giving each element equal weight, we want to give them different weights based on their actual values. In other words, we want the ability to dynamically select which inputs to "attend" to more than others. To be more specific, we must define the following four components of the attention mechanism: the query, the keys, the values, and the score function. The result is calculated as a weighted sum of the values, with each value's weight determined by how well the query matches its corresponding key. Our objective is to develop an attention mechanism that allows any element in a sequence to pay attention to any other element while maintaining computational efficiency. The fundamental idea behind this is the scaled dot product attention.

3.4 Scaled Dot Product

Having an attention mechanism that allows any element in a sequence to attend to any other while remaining computationally efficient is what we aim to achieve. The central idea of this mechanism is the scaled dot product attention. It takes as input a set of queries $Q \in \mathbb{R}^{T \times d_k}$, keys $K \in \mathbb{R}^{T \times d_k}$, and values $V \in \mathbb{R}^{T \times d_v}$, where $T$ is the sequence length, and $d_k$ and $d_v$ are the hidden dimensionalities of the queries/keys and values, respectively. The dot product attention is calculated as follows (Fig. 3):

$$\mathrm{Att}(Q, K, V) = \mathrm{SoftMax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
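A direct numpy transcription of the scaled dot product attention formula reads as follows (a sketch, not the papers' implementation; batching and masking are omitted):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax: subtract the row-wise max before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Att(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (T, T): query-key similarities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted average of the values

rng = np.random.default_rng(0)
T, d_k, d_v = 5, 8, 8
Q, K, V = (rng.standard_normal((T, d)) for d in (d_k, d_k, d_v))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (5, 8)
```

The division by √d_k keeps the dot products from growing with the key dimensionality, which would otherwise saturate the softmax.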
3.5 Multi-Head Attention

The scaled dot product attention allows a network to attend over a sequence. However, a sequence element frequently wants to focus on several distinct aspects, so a single
Fig. 3. Scaled dot product (figure credit [2])
weighted average is not the best solution for it. For this reason, we expand the attention mechanism to include multiple heads, i.e., multiple different query-key-value triplets on the same features. Essentially, given a query, key, and value matrix, we transform those into h sub-queries, sub-keys, and sub-values, which we independently pass through the scaled dot product attention. Following that, we concatenate the heads and combine them with a final weight matrix. This operation can be expressed mathematically as:

$$\mathrm{Multihead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^O$$

where $\mathrm{head}_i = \mathrm{Att}(QW_i^Q, KW_i^K, VW_i^V)$, with $W_{1 \dots h}^Q \in \mathbb{R}^{D \times d_k}$, $W_{1 \dots h}^K \in \mathbb{R}^{D \times d_k}$, $W_{1 \dots h}^V \in \mathbb{R}^{D \times d_v}$, and $W^O \in \mathbb{R}^{h \cdot d_v \times d_{out}}$ ($D$ is the input dimension) (Fig. 4).
Fig. 4. Multi-head attention layer (figure credit [2])
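The multi-head computation above can be sketched as follows; randomly initialized matrices stand in for the learned projections $W_i^Q$, $W_i^K$, $W_i^V$, $W^O$, and a loop over heads is used for clarity rather than batched tensor operations:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot product attention from Sect. 3.4
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1) @ V

def multi_head_attention(X, h, d_k, d_v, rng):
    """Project X into h (Q, K, V) triplets, apply attention per head,
    then concatenate the heads and mix them with W_O."""
    T, D = X.shape
    heads = []
    for _ in range(h):
        W_q = rng.standard_normal((D, d_k))
        W_k = rng.standard_normal((D, d_k))
        W_v = rng.standard_normal((D, d_v))
        heads.append(attention(X @ W_q, X @ W_k, X @ W_v))  # (T, d_v) per head
    W_o = rng.standard_normal((h * d_v, D))
    return np.concatenate(heads, axis=-1) @ W_o             # (T, D)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 64))  # T = 5 tokens of dimension D = 64
out = multi_head_attention(X, h=8, d_k=8, d_v=8, rng=rng)
print(out.shape)  # (5, 64)
```

Choosing $d_k = d_v = D/h$, as here, keeps the total cost of the h heads comparable to a single full-dimensional head.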
Unlike CNN-extracted features, Transformer models do not have image-specific inductive biases and process the image as a sequence of patches, allowing the model to pay different attention to either the whole image or the independent patches [31].
4 Experiments and Results

4.1 Dataset

This dataset contains 4800 tiny, low-resolution vehicle images. The vehicles in the images are grouped into six classes: Bike, Car, Juggernaut, Minibus, Pickup, and Truck. Each class contains 800 vehicle images of 100 × 100 pixels at 96 dpi resolution [33]. Figure 5 depicts a sample of images from this dataset.
Fig. 5. Classes of vehicles: (a) bike, (b) car, (c) juggernaut, (d) minibus, (e) pickup, and (f) truck.
4.2 Training

Prior to the experiments, the dataset was divided into train, validation, and test sets. The train set was used to train the model, that is, to fit the network parameter values (weights and biases); the validation set was used to tune the model's hyperparameters; and the test set was used to assess the model's forecasting performance, i.e., how well the trained model generalizes to never-before-seen data. In the experiments, the test and validation sets each contained 480 vehicle images (10% of the total), while the train set contained the remaining 3840 vehicle images (80%).

4.3 Results and Discussion

In the first part, the model explained in Sect. 3 was trained from scratch and tested on the created dataset [32] in order to assess its effectiveness. An AdamW optimizer was used to train the models, because it achieved more stable training performance in initial experiments compared to the SGD and Adam optimizers. Our model achieved an accuracy of 94.7% and a loss of 27.3%.
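The 80/10/10 split described above can be sketched in a few lines (index-based splitting only; image loading and class labels are omitted, and the seed is an illustrative assumption):

```python
import numpy as np

def split_indices(n, train=0.8, val=0.1, seed=0):
    """Shuffle n sample indices and split them into train/val/test subsets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = int(n * train), int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(4800)
print(len(train_idx), len(val_idx), len(test_idx))  # 3840 480 480
```

With 4800 images this reproduces the 3840/480/480 counts used in the experiments.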
In the second part, we used a model pre-trained on the ImageNet-21k dataset and fine-tuned it on our new dataset; this model achieved state-of-the-art results with an accuracy of over 99.3%. Table 1 below summarizes a comparison of our models with previous architectures used for the same problem on the same dataset. Table 1. Comparison of the test accuracy and loss.
                   Model                          Accuracy (%)   Loss (%)   Paper
Pretrained models  VGG16 Pre-trained              96             24.7       [33]
                   VGG16 Fine-tuning Pre-trained  99.2           7.7        [33]
                   ViT-Base-in21k                 99.3           3.07       ours
From scratch       CNN                            92             30.3       [3]
                   ViT                            94.7           27.3       ours
The results listed in Table 1 demonstrate that our Vision Transformer-based models achieve good accuracy and outperform existing models on this vehicle classification problem. Our from-scratch model is superior to the basic CNN model, and our pre-trained model is better than the pre-trained CNN models. Note that we used a model pre-trained only on the comparatively small ImageNet-21k dataset; the original ViT was additionally pre-trained on the much larger JFT-300M dataset and obtained even better results.
5 Conclusion

In this paper we addressed the problem of classifying vehicles in low-resolution images collected by a standard camera. For this, we proposed a Vision Transformer based classification model. According to the test results, our model achieves good accuracy although trained on a very small dataset. Vision Transformer models need a big dataset to obtain better results and beat state-of-the-art CNN models such as VGG, ResNet, and others. We also note that we used only the base pre-trained model, which was pre-trained on the small ImageNet-21k dataset, unlike the original paper, which achieved state-of-the-art results by pre-training the ViT model on the JFT-300M dataset and then fine-tuning it on the targeted dataset. This paper is also an introduction of the Vision Transformer to the field of intelligent transport systems; the results of this approach in computer vision encouraged us to make use of it in ITS. As future work, we will improve the ViT model by adding convolution layers to the Transformer encoder to help the attention mechanism focus on important features and deal well with image scaling and distortion.
References

1. Taha Mebtoul (July 29, 2020, 12:44 p.m.). Morocco Records 199 Traffic Accidents on July 26 and 27. https://www.moroccoworldnews.com/2020/07/312891/morocco-records-199-traffic-accidents-on-july-26-and-27
2. Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
3. Dosovitskiy, A., et al.: An image is worth 16 × 16 words: transformers for image recognition at scale. In: ICLR (2021)
4. Korea Traffic Accident Analysis Systems. http://taas.koroad.or.kr/. Accessed 10 Sept 2020
5. Bangladesh: Alarming rise in road crashes. https://www.aa.com.tr/en/asia-pacific/bangladesh-alarming-rise-in-road-crashes/1692643. Accessed 25 May 2020
6. Mohamed, A., Qian, K., Elhoseiny, M., Claudel, C.: Social-STGCNN: a social spatio-temporal graph convolutional neural network for human trajectory prediction. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 14424–14433 (2020)
7. Taki, Y., Zemmouri, E.: An overview of real-time traffic sign detection and classification. In: Proceedings of the 5th International Conference on Smart City Applications, October 7–9, 2020, Safranbolu, Türkiye (2020). https://doi.org/10.1007/978-3-030-66840-2_26
8. Tas, S., Sari, O., Dalveren, Y., Pazar, S., Kara, A., Derawi, M.: A dataset containing tiny and low-quality images for vehicle classification. Zenodo (2022)
9. de Matos, F.M.S., de Souza, R.M.C.R.: Hierarchical classification of vehicle images using NN with conditional adaptive distance. In: Lee, M., Hirose, A., Hou, Z.-G., Kil, R.M. (eds.) ICONIP 2013. LNCS, vol. 8227, pp. 745–752. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-42042-9_92
10. Yang, Y.: Realization of vehicle classification system based on deep learning. In: Proceedings of the 2020 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS), pp. 308–311. Shenyang, China, 28–30 July 2020
11. Ng, L.T., Suandi, S.A., Teoh, S.S.: Vehicle classification using visual background extractor and multi-class support vector machines. In: Mat Sakim, H., Mustaffa, M. (eds.) The 8th International Conference on Robotic, Vision, Signal Processing & Power Applications. LNEE, vol. 291. Springer, Singapore. https://doi.org/10.1007/978-981-4585-42-2_26
12. Chen, Y., Qin, G.: Video-based vehicle detection and classification in challenging scenarios. Int. J. Smart Sens. Intell. Syst. 7, 1077–1094 (2014)
13. Wen, X., Shao, L., Xue, Y., Fang, W.: A rapid learning algorithm for vehicle classification. Inf. Sci. 295, 395–406 (2015)
14. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
15. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732 (2014)
16. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
17. Dong, Z., Wu, Y., Pei, M., Jia, Y.: Vehicle type classification using a semisupervised convolutional neural network. IEEE Trans. Intell. Transp. Syst. 16, 2247–2256 (2015)
18. Cao, J., Wang, W., Wang, X., Li, C., Tang, J.: End-to-end view-aware vehicle classification via progressive CNN learning. In: Yang, J., et al. (eds.) CCCV 2017. CCIS, vol. 771, pp. 729–737. Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-7299-4_61
19. Hicham, B., Ahmed, A., Mohammed, M.: Vehicle type classification using convolutional neural network. In: Proceedings of the 2018 IEEE 5th International Congress on Information Science and Technology (CiSt), pp. 313–316. Marrakech, Morocco, 21–27 October 2018
20. Jo, S.Y., Ahn, N., Lee, Y., Kang, S.-J.: Transfer learning-based vehicle classification. In: Proceedings of the 2018 International SoC Design Conference (ISOCC), pp. 127–128. Daegu, Korea, 12–15 November 2018
21. Chang, J., Wang, L., Meng, G., Xiang, S., Pan, C.: Vision-based occlusion handling and vehicle classification for traffic surveillance systems. IEEE Intell. Transp. Syst. Mag. 10, 80–92 (2018)
22. Cai, J., Deng, J., Khokhar, M.S., Aftab, M.U.: Vehicle classification based on deep convolutional neural networks model for traffic surveillance systems. In: Proceedings of the 2018 15th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), pp. 224–227. Chengdu, China, 14–16 December 2018
23. Maungmai, W., Nuthong, C.: Vehicle classification with deep learning. In: Proceedings of the 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS), pp. 294–298. Singapore, 23–25 February 2019
24. Wang, X., Zhang, W., Wu, X., Xiao, L., Qian, Y., Fang, Z.: Real-time vehicle type classification with deep convolutional neural networks. J. Real-Time Image Process. 16, 5–14 (2019)
25. Mittal, U., Potnuru, R., Chawla, P.: Vehicle detection and classification using improved faster region based convolution neural network. In: Proceedings of the 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), pp. 511–514. Noida, India, 4–5 June 2020
26. Chauhan, M.S., Singh, A., Khemka, M., Prateek, A., Sen, R.: Embedded CNN based vehicle classification and counting in non-laned road traffic. In: Proceedings of the 10th International Conference on Information and Communication Technologies and Development, pp. 1–11. Ahmedabad, India, 4–7 January 2019
27. Hedeya, M.A., Eid, A.H., Abdel-Kader, R.F.: A super-learner ensemble of deep networks for vehicle-type classification. IEEE Access 8, 98266–98280 (2020)
28. Hendrycks, D., Gimpel, K.: Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415 (2016)
29. Wang, Q., et al.: Learning deep transformer models for machine translation. In: ACL (2019)
30. Baevski, A., Auli, M.: Adaptive input representations for neural language modeling. In: ICLR (2019)
31. Li, M., Lv, T., Cui, L., Lu, Y.: TrOCR: transformer-based optical character recognition with pre-trained models. arXiv preprint arXiv:2109.10282 (2021)
32. Tas, S., Sari, O., Dalveren, Y., Pazar, S., Kara, A., Derawi, M.: A dataset containing tiny and low-quality images for vehicle classification. https://doi.org/10.5281/zenodo.6634554
33. Tas, S., Sari, O., Dalveren, Y., Pazar, S., Kara, A., Derawi, M.: Deep learning-based vehicle classification for low quality images. Sensors 22, 4740 (2022). https://doi.org/10.3390/s22134740
A Review of Variational Inference for Bayesian Neural Network

Bakhouya Mostafa1(B), Ramchoun Hassan1,2, Hadda Mohammed1, and Masrour Tawfik1

1 National School of Arts and Crafts, My Ismail University, Meknes, Morocco
[email protected], [email protected]
2 Laboratory of Mathematical Modeling, Simulation and Smart Systems, National School of Business and Management, My Ismail University, Meknes, Morocco
Abstract. Deep learning has made significant progress in artificial intelligence, providing new solutions to a variety of previously challenging problems. However, standard deep learning algorithms only offer point estimates of the model, which fail to express the model uncertainty, leading to overconfident decisions. To address these issues, the Bayesian approach has been adopted in deep learning, which provides a probabilistic interpretation of deep learning models. However, inferring the Bayesian posterior is frequently challenging. Therefore, approximations of the true posterior with another simple approximate distribution are often employed; this technique is known as the variational inference method. This paper presents an overview of Bayesian Neural Networks (BNNs) using current variational inference methods and attempts to explore the tools necessary to create, apply, train, and evaluate neural networks in a Bayesian framework.
Keywords: Bayesian Neural Network · Variational Inference · Uncertainty

1 Introduction
Deep learning has enabled important advances in artificial intelligence by providing efficient alternative solutions to several previously challenging issues in machine learning. Deep learning is based on a computational system consisting of a collection of interconnected nodes, called an artificial neural network, which enables the machine to process incoming data by learning from previous examples and experiences, much like human learning [12]. Deep learning is most commonly used in computer vision (image classification [37,39], object detection [15,34], face recognition, and medical diagnosis), speech recognition [30], and natural language processing, as well as in economic applications such as marketing and financial forecasting. However, deep learning algorithms cannot overcome the overfitting problem [40], especially when there is a lack of data, where the model becomes unable

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 231–243, 2023. https://doi.org/10.1007/978-3-031-43520-1_20
to generalize the learning to unseen data. Furthermore, deep learning is unable to quantify the model's uncertainty; that is, it cannot measure confidence in the model's predictions, because artificial neural networks are deterministic approaches that provide only point estimates for the model's output. This may lead to undesirable outcomes, particularly in delicate tasks where decision-making is critical, such as medical diagnosis [21] and self-driving [33]. To address these issues, the Bayesian method may be used to offer a probabilistic treatment of deep learning algorithms. In contrast to the frequentist technique, which offers point, deterministic estimates of the model, the Bayesian method treats all components of artificial neural networks as random variables generated from probability distributions, allowing for the measurement of model uncertainty and giving more reliability to the network outputs even in the case of a lack of data [9,13,20,24]. Employing Bayesian statistics in artificial neural networks requires first proposing a prior distribution over the parameters or hidden variables of the network, expressing beliefs or observations about the model formulated before seeing the data by experts on the topic to be solved. After collecting the data and defining the network architecture of the model using specific parameters, we determine the conditional probability of the data given the network architecture, known as the likelihood, which expresses the cost function of the model. Bayesian neural networks are trained by finding the posterior distribution of the parameters after seeing the data, based on the prior distribution and the likelihood, through Bayes' theorem [16,19,28,41].
Bayesian neural networks can be considered an inverse problem in which we search for the distribution of parameters given the data, which is solved by Bayes' theorem, unlike standard neural networks, in which we search for the distribution of data given the parameters, known as the likelihood. At test time, Bayesian neural networks determine the model's predictions for unseen data by the predictive distribution, determined by computing the expectation of the network outcomes over the posterior distribution [16,42]. Bayesian statistics provides a probabilistic approach that enables the detection of two forms of uncertainty in neural networks: the first emerges from the data or the likelihood, called the aleatoric uncertainty, and the second comes from the model or the posterior distribution, called the epistemic uncertainty, obtained by computing the variance of the predictive distribution over the posterior distribution [9]. However, the analytical computation of the posterior distribution of neural networks using Bayes' theorem is problematic due to the high dimensionality of these networks. To address this issue, many estimation approaches have been adopted to approximate the posterior distribution, the most prominent of which are variational inference [5] and Markov Chain Monte Carlo (MCMC) sampling [1,8,17,31]. In this work, we will try to explain the principle and features of the variational inference method for solving Bayesian inference, and we will present the modern variational methods used to train Bayesian neural networks by showing the features of each of them separately, namely: Bayes by backprop [6] and Monte Carlo dropout [13,14]. The remaining sections of this paper are arranged as follows. Section 2 provides a quick review of stochastic approximate inference methods, especially MCMC sampling methods. Section 3 presents a brief overview of deterministic approximate inference methods, including the Laplace approximation and variational inference. Section 4 explains the modern variational methods used to train Bayesian neural networks and their features and limitations. Finally, Sect. 5 is a summary of the paper.
2 Stochastic Approximate Inference Methods: MCMC Sampling
MCMC sampling is one of the most prominent stochastic approximation methods for estimating intractable integrals or expectations over complicated distributions that model a wide range of problems in physics, mathematical modelling, decision analysis, data science, and Bayesian statistics. The main idea of MCMC techniques is to generate a series of randomly selected samples $\{\theta_1, \dots, \theta_M\}$ from a specific distribution according to a Markov process; that is, the generation of a sample $\theta_i$ at iteration $i$ depends only on the previous sample $\theta_{i-1}$, in order to produce approximate numerical values of intractable expressions. The Metropolis-Hastings technique is one of the most famous MCMC methods [2]. The implementation of Bayesian neural networks using the Metropolis-Hastings technique is achieved by simulating the posterior distribution of the network parameters, $p(\theta|\mathcal{D})$. As there is no analytical expression for the posterior distribution $p(\theta|\mathcal{D})$ to be sampled, since it contains the evidence term $p(\mathcal{D})$, it is possible to sample the function $p(\theta, \mathcal{D}) = p(\mathcal{D}|\theta)p(\theta)$, which is proportional to the posterior distribution, without considering the normalization constant. The algorithm starts by assigning initial values to the parameters, $\theta_0$, and then iterates the process of generating a new sample of parameters, $\theta_i$, using a proposal distribution $q(\theta_i|\theta_{i-1})$ conditional on the previous sample $\theta_{i-1}$, and finally evaluates $\theta_i$ based on a procedure that determines the probability of its acceptance or rejection (see Algorithm 1). At test time, we estimate the predictive distribution by averaging the network outputs over the simulated posterior samples from each iteration, as shown below:

$$p(y^*|x^*, \mathcal{D}) = \int p(y^*|x^*, \theta)\, p(\theta|\mathcal{D})\, d\theta \approx \frac{1}{T}\sum_{t=1}^{T} p(y^*|x^*, \theta^{(t)}) \qquad (1)$$

where $\theta^{(t)} \sim p(\theta|\mathcal{D}) \propto p(\theta, \mathcal{D})$, and $T$ is the number of samples.
Algorithm 1. Metropolis-Hastings algorithm [35]
1: Select randomly an initial value $\theta^{(0)}$
2: for $t = 1, \dots, T$ do
3:   Draw a candidate $\theta^* \sim q(\theta^*|\theta^{(t-1)})$
4:   $\alpha = \min\left(\frac{p(\theta^*, \mathcal{D})\, q(\theta^{(t-1)}|\theta^*)}{p(\theta^{(t-1)}, \mathcal{D})\, q(\theta^*|\theta^{(t-1)})},\ 1\right)$
5:   $\theta^{(t)} = \theta^*$ with probability $\alpha$; $\theta^{(t)} = \theta^{(t-1)}$ with probability $1 - \alpha$
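Algorithm 1 can be sketched as follows on a one-dimensional toy problem. The target here, a standard normal, is an illustrative stand-in for the true log p(θ, D); with a symmetric Gaussian proposal the q terms in α cancel:

```python
import numpy as np

def metropolis_hastings(log_joint, theta0, n_steps, step=0.5, seed=0):
    """Sample from p(theta|D), proportional to exp(log_joint(theta)), using a
    symmetric Gaussian random-walk proposal q(theta*|theta) = N(theta, step^2)."""
    rng = np.random.default_rng(seed)
    theta, chain = theta0, []
    for _ in range(n_steps):
        cand = theta + step * rng.standard_normal(np.shape(theta))
        # symmetric proposal: acceptance ratio reduces to the joint ratio
        log_r = log_joint(cand) - log_joint(theta)
        if np.log(rng.random()) < log_r:  # accept with probability min(1, e^log_r)
            theta = cand
        chain.append(theta)
    return np.array(chain)

# toy target: a standard normal posterior, log p(theta, D) = -theta^2 / 2
samples = metropolis_hastings(lambda t: -0.5 * t**2, theta0=3.0, n_steps=5000)
print(samples[1000:].mean())  # close to 0 after discarding the burn-in
```

Discarding the first samples as burn-in removes the influence of the arbitrary starting point θ₀.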
Despite the solutions provided by the Metropolis-Hastings technique in many problems, it cannot control the random walk behaviour caused by the sampling criterion, leading to slow convergence to the desired distribution, especially for large-scale models. Fortunately, the Hamiltonian Monte Carlo (HMC) algorithm has been proposed as an improvement of the Metropolis-Hastings algorithm that reduces random walk behaviour using Hamiltonian dynamics [3,11,31]. The HMC method uses gradient information to suggest states away from the current state for parameter sampling with a high probability of acceptance. The HMC algorithm starts with a specified initial value for the parameters θ and a momentum variable M that sets a direction for moving in the parameter space. The current value of the parameters and momentum is updated using the "leapfrog" method, which provides a numerical trajectory of θ and M according to Hamiltonian dynamics. Then, a Metropolis acceptance step is applied. However, the HMC method requires a set of hyperparameters to be defined manually, which can lead to unsatisfactory results if these hyperparameters are inappropriately selected, especially for large-scale models.
3 Deterministic Approximate Inference Methods

3.1 Laplace Approximation

The typical Laplace approximation is one of the most fundamental estimation methods in Bayesian inference. It uses a Gaussian distribution to estimate the posterior distribution by computing a second-order Taylor expansion of the log-joint about the mode. To apply the Laplace approximation to neural networks, we first determine the maximum a posteriori (MAP) estimator, which satisfies the following equation:

$$\omega_{MAP} = \operatorname*{argmax}_{\omega \in \Omega}\, p(\mathcal{D}|\omega)\, p(\omega) = \operatorname*{argmax}_{\omega \in \Omega}\, p(\mathcal{D}, \omega) \qquad (2)$$
where ω are the network parameters, p(D|ω) is the likelihood distribution, and p(ω) is the prior distribution.
Using Bayes' theorem, the posterior distribution of the network parameters is represented as:

$$p(\omega|\mathcal{D}) = \frac{p(\mathcal{D}|\omega)\, p(\omega)}{p(\mathcal{D})} = \frac{p(\mathcal{D}, \omega)}{\int p(\mathcal{D}, \omega)\, d\omega} \qquad (3)$$

where $p(\mathcal{D})$ is the evidence (the normalization constant) and is often difficult to calculate. Then we apply a second-order Taylor expansion to $\log p(\mathcal{D}, \omega) = \log h(\omega)$ around $\omega_{MAP}$, and we get the following equation:

$$\begin{aligned}
\log p(\mathcal{D}, \omega) = \log h(\omega) &\approx \log h(\omega_{MAP}) + \nabla \log h(\omega_{MAP})(\omega - \omega_{MAP}) + \frac{1}{2}(\omega - \omega_{MAP})^T \nabla^2 \log h(\omega_{MAP})(\omega - \omega_{MAP}) \\
&= \log h(\omega_{MAP}) - \frac{1}{2}(\omega - \omega_{MAP})^T \underbrace{\left(-\nabla^2 \log h(\omega_{MAP})\right)}_{H} (\omega - \omega_{MAP})
\end{aligned} \qquad (4)$$

with $\nabla \log h(\omega_{MAP}) = 0$ because $\omega_{MAP}$ is a local maximum of the function $h$, which is also a local maximum of the function $\log h$, and $\nabla^2 \log h$ is the Hessian matrix of $\log h$, which is a negative semidefinite matrix since $\log h$ is a concave function; thus $H$ is a positive semidefinite matrix. Then, we obtain:

$$p(\mathcal{D}, \omega) = h(\omega) \approx h(\omega_{MAP}) \exp\left(-\frac{1}{2}(\omega - \omega_{MAP})^T H (\omega - \omega_{MAP})\right) \qquad (5)$$

As a result, we can approximate the posterior distribution by a Gaussian distribution with mean equal to the MAP estimator $\omega_{MAP}$ and covariance matrix equal to the inverse of the Hessian matrix of the negative log-joint at the mean, as shown below:

$$p(\omega|\mathcal{D}) \approx \frac{h(\omega_{MAP}) \exp\left(-\frac{1}{2}(\omega - \omega_{MAP})^T H (\omega - \omega_{MAP})\right)}{\int h(\omega_{MAP}) \exp\left(-\frac{1}{2}(\omega - \omega_{MAP})^T H (\omega - \omega_{MAP})\right) d\omega} = \frac{|H|^{1/2}}{(2\pi)^{D/2}} \exp\left(-\frac{1}{2}(\omega - \omega_{MAP})^T H (\omega - \omega_{MAP})\right) = \mathcal{N}(\omega \mid \omega_{MAP}, H^{-1}) \qquad (6)$$

where $D$ is the dimension of $\omega_{MAP}$ and $|H|$ is the determinant of $H$. At test time, the predictive distribution for unseen data $x^*$ is estimated as follows:

$$p(y^*|x^*, \mathcal{D}) = \int_{\Omega} p(y^*|x^*, \omega)\, p(\omega|\mathcal{D})\, d\omega \approx \int_{\Omega} p(y^*|x^*, \omega)\, \mathcal{N}(\omega \mid \omega_{MAP}, H^{-1})\, d\omega \qquad (7)$$

We conclude that the Laplace approximation estimates the posterior distribution with a Gaussian distribution centred on a single mode, which makes it difficult to determine the closest distribution to the posterior in the multi-modal case, since different modes will produce different estimates [4]. Furthermore, computing the inverse of the Hessian matrix of the negative log-likelihood $H$ is complicated, especially for deep neural networks.
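The procedure above can be illustrated in one dimension with numeric derivatives. This is a toy sketch: the log-joint chosen here is itself Gaussian, so the Laplace fit recovers the exact mean and variance, and the grid search stands in for a proper optimizer:

```python
import numpy as np

def laplace_approx(log_h, w_grid, eps=1e-4):
    """Fit N(w_MAP, H^-1), where w_MAP maximizes log h and
    H = -(d^2/dw^2) log h at w_MAP (1-D case, numeric derivatives)."""
    w_map = w_grid[np.argmax([log_h(w) for w in w_grid])]
    # central second difference approximates the (scalar) Hessian
    H = -(log_h(w_map + eps) - 2 * log_h(w_map) + log_h(w_map - eps)) / eps**2
    return w_map, 1.0 / H  # mean and variance of the Gaussian approximation

# toy log-joint: Gaussian with mean 2 and variance 0.25, so the fit is exact
log_h = lambda w: -(w - 2.0) ** 2 / (2 * 0.25)
mean, var = laplace_approx(log_h, np.linspace(-5, 5, 10001))
print(round(float(mean), 2), round(float(var), 2))  # 2.0 0.25
```

For a truly non-Gaussian log-joint the same two quantities define the approximating Gaussian, but the fit is only locally accurate around the mode.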
3.2 Variational Inference
Bayesian neural networks are a form of deep learning algorithm trained using Bayesian statistics, which gives a stochastic character to the network components. To implement a BNN, we first place a prior distribution $p(\omega)$ on all possible parameters before seeing the data. After collecting the data, we define the likelihood $p(y|x, \omega)$, which models the data given a specific network architecture. Then the BNN is trained by computing the posterior distribution of the parameters given the observed data using Bayes' theorem, as illustrated below:

$$p(\omega|x, y) = \frac{p(y|x, \omega)\, p(\omega)}{p(\mathcal{D})} = \frac{p(y|x, \omega)\, p(\omega)}{\int_{\omega'} p(y|x, \omega')\, p(\omega')\, d\omega'} \qquad (8)$$
where $p(\mathcal{D})$ is the normalization constant of the posterior distribution, also referred to as the evidence. The prediction for unseen data is modelled by a probability distribution known as the predictive distribution, obtained by computing the expectation of the model outputs over the posterior distribution [16,42], as illustrated below:

$$p(y^*|x^*, \mathcal{D}) = \int p(y^*|x^*, \omega)\, p(\omega|\mathcal{D})\, d\omega = \mathbb{E}_{p(\omega|x,y)}[p(y^*|x^*, \omega)] \qquad (9)$$

As previously stated, uncertainty is a crucial feature of probabilistic approaches, which provide a solid framework for evaluating the model outcomes [29]. Uncertainty is a measure of confidence in the outputs of the neural network on a particular problem. It provides more robustness and reliability to the model than deterministic approaches, which only provide point estimates of the model without considering its credibility [25,32]. Uncertainty in neural networks appears in two forms: the aleatoric uncertainty, caused by the noise in the observed data set and derived from the likelihood; and the epistemic uncertainty, which results from the training model of the network and is extracted from the posterior distribution [9,10,20]. Mathematically, the uncertainty is defined as the variance of the predictive distribution over the posterior distribution [20,27], as shown below:

$$\sigma = \mathbb{V}_{p(\omega|\mathcal{D})}[p(y^*|x^*, \mathcal{D})] = \mathbb{E}_{p(\omega|\mathcal{D})}[y^* {y^*}^T] - \mathbb{E}_{p(\omega|\mathcal{D})}[y^*]\, \mathbb{E}_{p(\omega|\mathcal{D})}[y^*]^T \qquad (10)$$
For practical applications, the posterior distribution is intractable using Bayes' theorem owing to the difficulty of analytically computing the evidence integral over all possible parameters, $\int_{\omega} p(y|x, \omega)\, p(\omega)\, d\omega$, especially for neural networks, which have large numbers of parameters [18]. To get around this problem, we adopt approximate methods for the posterior distribution, one of the most significant of which is the variational inference method [5,43]. Instead of computing the evidence integral of Bayesian inference, the variational inference method uses an optimization procedure to estimate the posterior distribution $p(\omega|\mathcal{D})$ with another approximate distribution $q_\theta(\omega)$, known as the variational distribution, parameterized by $\theta$, the variational parameters. The goal of the variational method is to seek a set of parameters $\theta$ that render the variational distribution as close as possible to the posterior
distribution, using a metric that measures the similarity between two probability distributions, known as the Kullback-Leibler divergence [26], which can be written as follows:

$$KL(q_\theta(\omega)\, \|\, p(\omega|x, y)) = \int q_\theta(\omega) \log \frac{q_\theta(\omega)}{p(\omega|x, y)}\, d\omega \qquad (11)$$
KL(qθ(ω) || p(ω|D)) = ∫ qθ(ω) log [ qθ(ω) p(D) / (p(y|x, ω) p(ω)) ] dω
                    = log p(D) − ∫ qθ(ω) log p(y|x, ω) dω + ∫ qθ(ω) log [ qθ(ω) / p(ω) ] dω
                    = log p(D) − E_{qθ(ω)}[log p(y|x, ω)] + KL(qθ(ω) || p(ω))
                    = log p(D) − L(θ)    (12)

where L(θ) = E_{qθ(ω)}[log p(y|x, ω)] − KL(qθ(ω) || p(ω)) represents the ELBO function. As log p(D) is a constant with respect to the variational parameters θ, it is not affected by the optimization. Therefore, minimizing the KL divergence is equivalent to maximizing the ELBO function. Maximizing the ELBO requires maximizing the expected log-likelihood E_{qθ(ω)}[log p(y|x, ω)], which pushes the variational distribution to fit the data well, while minimizing the KL divergence between the variational distribution and the prior, KL(qθ(ω) || p(ω)), which keeps the variational distribution close to the prior. At test time, the variational method estimates the predictive distribution as follows:

p(y*|x*, x, y) ≈ q(y*|x*) = ∫ p(y*|x*, ω) q_θ̂(ω) dω = E_{q_θ̂(ω)}[p(y*|x*, ω)]    (13)

where θ̂ represents the optimal variational parameters.
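The identity behind Eq. (12) — log p(D) = L(θ) + KL(qθ(ω) || p(ω|D)) for any choice of q — can be verified numerically on a toy conjugate model where every term has a closed form. The scalar Gaussian model below is an illustrative assumption, not taken from the reviewed papers:

```python
import math

# Toy conjugate model: prior omega ~ N(0, 1), one observation y ~ N(omega, sigma^2),
# variational family q(omega) = N(m, s^2).
y, sigma = 1.2, 0.5

def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL( N(m1, s1^2) || N(m2, s2^2) )."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

def elbo(m, s):
    # E_q[log p(y | omega)] is available in closed form for a Gaussian likelihood.
    exp_loglik = (-0.5 * math.log(2 * math.pi * sigma ** 2)
                  - ((y - m) ** 2 + s ** 2) / (2 * sigma ** 2))
    return exp_loglik - kl_gauss(m, s, 0.0, 1.0)

# Exact log evidence and exact posterior for this conjugate pair.
log_evidence = (-0.5 * math.log(2 * math.pi * (sigma ** 2 + 1))
                - y ** 2 / (2 * (sigma ** 2 + 1)))
post_prec = 1.0 + 1.0 / sigma ** 2
post_mean, post_std = (y / sigma ** 2) / post_prec, post_prec ** -0.5

# Eq. (12): log p(D) - ELBO = KL(q || posterior), for ANY variational q.
m, s = 0.3, 0.8
gap = log_evidence - elbo(m, s)
kl_to_post = kl_gauss(m, s, post_mean, post_std)
```

Because every quantity is exact here, the gap between the log evidence and the ELBO matches KL(q || posterior) to floating-point precision, and the ELBO touches log p(D) exactly when q equals the true posterior.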
4 Modern Variational Methods for Bayesian Neural Networks

4.1 Bayes by Backprop
The Bayes by backprop method is one of the most famous variational inference approaches adapted to deep neural networks, proposed by Blundell et al. [6]. Due to the difficulty of computing the gradient of the cost function with stochastic parameters for probabilistic models, the Bayes by backprop approach was developed to address the backpropagation problem for training Bayesian neural networks employing the reparameterization trick of the parameters [22,23]. The
B. Mostafa et al.
reparameterization trick removes the randomness from the network parameters ω ∼ qθ(ω) by expressing them as a deterministic and differentiable function g of the variational parameters θ and a new random variable ε, independent of the other variables and following a parameter-free distribution ε ∼ p(ε), such that ω = g(θ, ε). As a result, the parameters are no longer stochastic variables, allowing the backpropagation algorithm to operate normally, as in standard neural networks (see Algorithm 2). Accordingly, Blundell et al. observed that if qθ(ω) dω = p(ε) dε, the derivative with respect to the variational parameters θ of the expectation of an arbitrary differentiable function l(ω, θ) can be computed as follows:

∂/∂θ E_{qθ(ω)}[l(ω, θ)] = ∂/∂θ ∫ l(ω, θ) qθ(ω) dω
                        = ∂/∂θ ∫ l(g(θ, ε), θ) p(ε) dε
                        = ∫ ∂/∂θ l(g(θ, ε), θ) p(ε) dε
                        = ∫ [ ∂l(ω = g(θ, ε), θ)/∂ω · ∂g(θ, ε)/∂θ + ∂l(ω, θ)/∂θ ] p(ε) dε
                        = E_{p(ε)}[ ∂l(ω, θ)/∂ω · ∂ω/∂θ + ∂l(ω, θ)/∂θ ]    (14)

Since Bayes by backprop is a variational method, we can apply Eq. (14) to the ELBO function with l(ω, θ) = log qθ(ω) − log p(ω) − log p(y|x, ω). However, the analytical computation of the derivative of the ELBO remains problematic, particularly for neural networks, owing to the difficulty of calculating the integral, which must be approximated by an unbiased Monte Carlo estimator, as illustrated below:

∂/∂θ L(D, θ) ≈ (1/M) Σ_{m=1}^{M} [ ∂l(g(θ, ε^(m)), θ)/∂ω · ∂g(θ, ε^(m))/∂θ + ∂l(g(θ, ε^(m)), θ)/∂θ ]    (15)

where ε^(m) ∼ p(ε), and l(g(θ, ε^(m)), θ) = log qθ(g(θ, ε^(m))) − log p(g(θ, ε^(m))) − log p(y|x, g(θ, ε^(m))).
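A minimal sketch of the Monte Carlo gradient estimator of Eq. (15), applied to the toy objective l(ω) = ω² with ω ∼ N(θ, 1) and the reparameterization ω = g(θ, ε) = θ + ε. The objective is a hypothetical example chosen only because the analytic gradient ∂/∂θ E[ω²] = 2θ is available for comparison:

```python
import random

random.seed(1)

def grad_reparam(theta, M=200_000):
    """Monte Carlo estimate of d/d(theta) E_{omega ~ N(theta, 1)}[omega^2],
    i.e. Eq. (15) with l(omega) = omega^2, dl/d(omega) = 2*omega,
    and d(omega)/d(theta) = 1 under omega = theta + eps, eps ~ N(0, 1)."""
    total = 0.0
    for _ in range(M):
        omega = theta + random.gauss(0.0, 1.0)   # reparameterized sample
        total += 2.0 * omega                     # per-sample gradient term
    return total / M

estimate = grad_reparam(1.5)
# Analytic value for comparison: E[omega^2] = theta^2 + 1, gradient = 2*theta.
```

With a large enough M the estimate concentrates around the analytic value 2θ = 3.0, which is exactly what makes the reparameterized gradient usable inside ordinary backpropagation.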
Algorithm 2. Bayes by Backprop Training [6]
1: Initialize η, θ
2: Repeat until convergence:
3:   for m = 1, 2, ..., M:
4:     sample ε^(m) ∼ p(ε)
5:     ω^(m) = g(θ, ε^(m))
6:     l(ω^(m), θ) = log qθ(ω^(m)) − log p(ω^(m)) − log p(y|x, ω^(m))
7:     ∂/∂θ l(ω^(m), θ) = ∂l(ω^(m), θ)/∂ω^(m) · ∂ω^(m)/∂θ + ∂l(ω^(m), θ)/∂θ
8:   ∂/∂θ L̂(ω, θ) = (1/M) Σ_{m=1}^{M} ∂/∂θ l(ω^(m), θ)
9:   θ ← θ − η · ∂/∂θ L̂(ω, θ)
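Algorithm 2 can be sketched end-to-end for a one-weight conjugate model — prior ω ∼ N(0, 1), likelihood y = ωx plus Gaussian noise — with a Gaussian variational family q(ω) = N(μ, e^{2ρ}). The data, learning rate, and sample counts are illustrative choices; the closed-form posterior exists only because this toy model is conjugate, which lets us check that (μ, e^ρ) converge to the posterior mean and standard deviation:

```python
import math
import random

random.seed(0)

# Hypothetical conjugate toy problem: prior omega ~ N(0, 1),
# observations y_i = omega_true * x_i + N(0, sigma^2) noise, sigma KNOWN.
sigma, omega_true = 0.5, 2.0
xs = [0.5 * i for i in range(1, 9)]
ys = [omega_true * x + random.gauss(0.0, sigma) for x in xs]

# Variational family q_theta(omega) = N(mu, exp(rho)^2), theta = (mu, rho).
mu, rho = 0.0, 0.0
lr, M = 0.005, 10

def grad_step(mu, rho):
    """One pass of Algorithm 2, lines 3-8: a Monte Carlo ELBO gradient."""
    s = math.exp(rho)
    g_mu = g_rho = 0.0
    for _ in range(M):
        eps = random.gauss(0.0, 1.0)
        omega = mu + s * eps                        # omega = g(theta, eps)
        # dl/d(omega) for l = log q(omega) - log p(omega) - log p(y|x, omega)
        dl_dw = (-(omega - mu) / s ** 2             # from log q
                 + omega                            # from -log p(omega)
                 - sum(x * (y - omega * x) for x, y in zip(xs, ys)) / sigma ** 2)
        # direct partials of l w.r.t. theta (omega held fixed): only log q depends on theta
        g_mu += dl_dw * 1.0 + (omega - mu) / s ** 2               # d(omega)/d(mu) = 1
        g_rho += dl_dw * (s * eps) - 1.0 + (omega - mu) ** 2 / s ** 2  # d(omega)/d(rho) = s*eps
    return g_mu / M, g_rho / M

for _ in range(4000):                               # Algorithm 2, lines 2 and 9
    g_mu, g_rho = grad_step(mu, rho)
    mu, rho = mu - lr * g_mu, rho - lr * g_rho

# Exact posterior of this conjugate model, used only to check the fit.
post_prec = 1.0 + sum(x * x for x in xs) / sigma ** 2
post_mean = sum(x * y for x, y in zip(xs, ys)) / sigma ** 2 / post_prec
```

After training, μ should sit near the closed-form posterior mean and e^ρ near the posterior standard deviation, up to the Monte Carlo noise of the gradient estimator.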
Whereas the Bayes by backprop technique is a variational approach that allows the backpropagation algorithm to operate in Bayesian neural networks, its implementation is more complicated, particularly for large networks. For example, if we take a fully factorized Gaussian distribution to approximate the posterior using Bayes by backprop, backpropagation must be performed twice, once for the means and once for the standard deviations, which is expensive, especially for large architectures [6,36].
4.2 Monte Carlo Dropout
Before going into the Monte Carlo dropout method proposed by Yarin Gal [13,14], it is crucial to clarify the dropout technique [38]. Dropout is a stochastic regularization method used to reduce overfitting in deep learning algorithms, where the model becomes unable to generalize to new data, especially in cases of a lack of data. Dropout prevents co-adaptation between the network parameters by randomly removing a proportion of network units with probability p during training, i.e., by injecting Bernoulli noise into the model inputs [38]. At test time, we employ a single network without applying dropout and multiply the network weights by 1/(1 − p). Yarin Gal proved that training any neural network with the dropout technique is equivalent to training a Bayesian neural network using a variational method, by transferring the noise generated by the dropout from the model inputs to the network weights, as illustrated below [13]:
ε̂_i ∼ Bernoulli(p_i)
f_i = σ((f_{i−1} ⊙ ε̂_i) W_i) = σ(f_{i−1} (diag(ε̂_i) W_i)) = σ(f_{i−1} ω̂_i)

where p_i is the dropout rate for layer i, and ω̂_i = diag(ε̂_i) W_i is a stochastic parameter. Note that after reparameterizing the network weights, the parameters now contain a random component simulated from the Bernoulli distribution, allowing us to define a probability distribution over the parameters, as shown below:

q_{W_i}(ω_i) = W_i · p(ε_i) = W_i · Bernoulli(ε_i | p_i)    (16)

As a result, the neural network can be trained using a variational method, with the approximating distribution taken as a fully factorized Bernoulli distribution multiplied by the network weights; hence, the ELBO function for an independent data set {x = (x_n)_{n=1}^{N}, y = (y_n)_{n=1}^{N}} can be written as:

L_VI(W) = −∫ q_W(ω) log p(y|x, ω) dω + KL(q_W(ω) || p(ω))
        = −∫ p(ε) log p(y|x, ω = g(W, ε)) dε + KL(q_W(ω) || p(ω))
        = −Σ_{n=1}^{N} ∫ p(ε) log p(y_n|x_n, ω = g(W, ε)) dε + KL(q_W(ω) || p(ω))    (17)
Implementing the last equation becomes complicated for huge datasets (large N). Instead of performing computations over the entire dataset, we may use data sub-sampling (mini-batch optimization) over a mini-batch of size M:

L_VIsub(W) = −(N/M) Σ_{n=1}^{M} ∫ p(ε) log p(y_n|x_n, ω = g(W, ε)) dε + KL(q_W(ω) || p(ω))    (18)

The first integral in Eq. (18) is intractable for practical applications. We therefore approximate it with an unbiased Monte Carlo estimator, obtaining:

L_VIsub(W) ≈ L_MC(W) = −(N/M) Σ_{m=1}^{M} log p(y_m|x_m, g(W, ε_m)) + KL(q_W(ω) || p(ω))    (19)

with ε_m ∼ p(ε), for 1 ≤ m ≤ M. As a result, Yarin Gal demonstrated the equivalence between the loss function (L_MC, Eq. (19)) used to train a Bayesian neural network with a variational method and the loss function used to train a neural network with dropout,
plus a regularization term (L2 regularization) resulting from computing KL(q_W(ω) || p(ω)) for a specific (Gaussian) prior distribution. We conclude that the MC dropout technique is a variational method well suited to deep neural networks, as it is fast, easy to implement, and less complex than other models. However, an inappropriate choice of the method's hyperparameters (e.g., the dropout rate p) can reduce its flexibility and lead to unsatisfactory results, especially when quantifying model uncertainty [7].
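Monte Carlo dropout prediction — keeping the Bernoulli masks active at test time and averaging T stochastic forward passes — can be sketched on a hypothetical one-layer linear "network". The weights and dropout rate below are arbitrary illustrative choices; the MC mean should approach (1 − p) times the full deterministic output, while the spread of the passes gives the predictive uncertainty:

```python
import random

random.seed(3)

# Hypothetical one-layer linear "network": output = sum_j w_j * x.
w = [0.5, -1.0, 2.0, 1.5]
x = 1.0
p = 0.5            # dropout rate: each unit is DROPPED with probability p
T = 20_000         # number of stochastic forward passes at test time

def stochastic_forward():
    # Bernoulli mask: unit j is kept with probability 1 - p.
    return sum(wj * x for wj in w if random.random() >= p)

preds = [stochastic_forward() for _ in range(T)]
mc_mean = sum(preds) / T
mc_var = sum((v - mc_mean) ** 2 for v in preds) / T

full_output = sum(wj * x for wj in w)   # deterministic pass, no dropout
# mc_mean approaches (1 - p) * full_output; mc_var is the predictive spread.
```

For these weights the exact values are (1 − p)·Σ w_j = 1.5 for the mean and Σ w_j² · p(1 − p) = 1.875 for the variance, so the Monte Carlo estimates can be checked directly.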
5 Conclusion
We conclude that Bayesian neural networks are probabilistic approaches that offer reasonable solutions to a range of problems that deterministic neural networks fail to solve, particularly critical problems that demand great care in decision-making: Bayesian inference allows quantifying the model uncertainty, which provides additional and improved information about the network predictions. However, applying Bayesian inference to neural networks remains a challenge due to the difficulty of determining the posterior distribution of the unknown parameters. This paper gives a brief overview of one of the most important approximate methods for Bayesian inference, based on an optimization procedure and known as variational inference. Finally, we have introduced the most important modern variational methods used to train Bayesian neural networks.
References

1. Bardenet, R., Doucet, A., Holmes, C.C.: On Markov chain Monte Carlo methods for tall data. J. Mach. Learn. Res. 18(47), 1–43 (2017)
2. Beichl, I., Sullivan, F.: The Metropolis algorithm. Comput. Sci. Eng. 2(1), 65–69 (2000)
3. Betancourt, M.: A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434 (2017)
4. Bishop, C.M., Nasrabadi, N.M.: Pattern Recognition and Machine Learning, vol. 4. Springer, New York (2006)
5. Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112(518), 859–877 (2017)
6. Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural network. In: International Conference on Machine Learning, pp. 1613–1622. PMLR (2015)
7. Chan, A., Alaa, A., Qian, Z., Van Der Schaar, M.: Unlabelled data improves Bayesian uncertainty calibration under covariate shift. In: International Conference on Machine Learning, pp. 1392–1402. PMLR (2020)
8. Chib, S., Greenberg, E.: Understanding the Metropolis-Hastings algorithm. Am. Stat. 49(4), 327–335 (1995)
9. Depeweg, S., Hernandez-Lobato, J.M., Doshi-Velez, F., Udluft, S.: Decomposition of uncertainty in Bayesian deep learning for efficient and risk-sensitive learning. In: International Conference on Machine Learning, pp. 1184–1193. PMLR (2018)
10. Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? Does it matter? Struct. Saf. 31(2), 105–112 (2009)
11. Duane, S., Kennedy, A.D., Pendleton, B.J., Roweth, D.: Hybrid Monte Carlo. Phys. Lett. B 195(2), 216–222 (1987)
12. Fitch, F.B., McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943). J. Symbolic Logic 9(2), 49–50 (1944)
13. Gal, Y., Ghahramani, Z.: Bayesian convolutional neural networks with Bernoulli approximate variational inference. arXiv preprint arXiv:1506.02158 (2015)
14. Gal, Y.: Uncertainty in deep learning. Ph.D. thesis, University of Cambridge (2016)
15. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
16. Goan, E., Fookes, C.: Bayesian neural networks: an introduction and survey. In: Mengersen, K.L., Pudlo, P., Robert, C.P. (eds.) Case Studies in Applied Bayesian Data Science. LNM, vol. 2259, pp. 45–87. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-42553-1_3
17. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications (1970)
18. Izmailov, P., Vikram, S., Hoffman, M.D., Wilson, A.G.G.: What are Bayesian neural network posteriors really like? In: International Conference on Machine Learning, pp. 4629–4640. PMLR (2021)
19. Jospin, L.V., Laga, H., Boussaid, F., Buntine, W., Bennamoun, M.: Hands-on Bayesian neural networks—a tutorial for deep learning users. IEEE Comput. Intell. Mag. 17(2), 29–48 (2022)
20. Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? arXiv preprint arXiv:1703.04977 (2017)
21. Ker, J., Wang, L., Rao, J., Lim, T.: Deep learning applications in medical image analysis. IEEE Access 6, 9375–9389 (2017)
22. Kingma, D.P., Welling, M.: An introduction to variational autoencoders. arXiv preprint arXiv:1906.02691 (2019)
23. Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
24. Kristiadi, A., Hein, M., Hennig, P.: Being Bayesian, even just a bit, fixes overconfidence in ReLU networks. In: International Conference on Machine Learning, pp. 5436–5446. PMLR (2020)
25. Krzywinski, M., Altman, N.: Importance of being uncertain. Nat. Methods 10(9), 809–811 (2013)
26. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
27. Kwon, Y., Won, J.H., Kim, B.J., Paik, M.C.: Uncertainty quantification using Bayesian neural networks in classification: application to biomedical image segmentation. Comput. Stat. Data Anal. 142, 106816 (2020)
28. Lampinen, J., Vehtari, A.: Bayesian approach for neural networks—review and case studies. Neural Netw. 14(3), 257–274 (2001)
29. Mitros, J., Mac Namee, B.: On the validity of Bayesian neural networks for uncertainty estimation. arXiv preprint arXiv:1912.01530 (2019)
30. Mohamed, A.R., Dahl, G.E., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20(1), 14–22 (2011)
31. Neal, R.M.: MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo 2(11), 2 (2011)
32. Ovadia, Y., et al.: Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
33. Rao, Q., Frtunikj, J.: Deep learning for self-driving cars: chances and challenges. In: Proceedings of the 1st International Workshop on Software Engineering for AI in Autonomous Systems, pp. 35–38 (2018)
34. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
35. Robert, C.P., Casella, G.: The Metropolis-Hastings algorithm. In: Monte Carlo Statistical Methods, pp. 231–283. Springer, New York (1999). https://doi.org/10.1007/978-1-4757-4145-2_7
36. Shridhar, K., Laumann, F., Liwicki, M.: A comprehensive guide to Bayesian convolutional neural network with variational inference. arXiv preprint arXiv:1901.02731 (2019)
37. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
38. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
39. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
40. Szegedy, C., et al.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
41. Titterington, D.: Bayesian methods for neural networks and related models. Stat. Sci., 128–139 (2004)
42. Wilson, A.G., Izmailov, P.: Bayesian deep learning and a probabilistic perspective of generalization. Adv. Neural. Inf. Process. Syst. 33, 4697–4708 (2020)
43. Zhang, C., Bütepage, J., Kjellström, H., Mandt, S.: Advances in variational inference. IEEE Trans. Pattern Anal. Mach. Intell. 41(8), 2008–2026 (2018)
Regression and Machine Learning Modeling Comparative Analysis of Morocco’s Fossil Fuel Energy Forecast Dalal Nasreddin1 , Yasmine Abdellaoui1 , Aymane Cheracher1 , Soumia Aboutaleb1 , Youssef Benmoussa1 , Inass Sabbahi1 , Reda El Makroum1 , Saad Amrani Marrakchi1 , Asmae Khaldoun1 , Aymane El Alami1 , Imad Manssouri2 , and Houssame Limami1(B) 1 Laboratory of Sustainable Energy Materials, Al Akhawayn University, Ifrane, Morocco
[email protected] 2 Laboratory of Mechanics, Mechatronics, and Command, Team of Electrical Energy,
Maintenance and Innovation, ENSAM-Meknes, Moulay Ismail University, Meknes, Morocco
Abstract. Despite the numerous advantages introduced by renewable energy technologies to the global energy market, fossil fuels still hold the major share. Due to their greater reliability, many countries, such as the Kingdom of Morocco, still depend heavily on fossil fuels. In fact, conventional energy sources account for more than 80% of the Moroccan energy mix. However, energy security is one of the main issues that Morocco is facing. One of the major challenges the country faces is its lack of complete control over the external factors influencing the consumption and production of fossil fuels; its energy security could therefore be described as susceptible to instability. For this purpose, it is important for Morocco to assess future perspectives related to energy security within the framework of the Nationally Determined Contributions (NDCs). This paper exploits Morocco’s historical data (1973 to 2019), identifying 20 parameters influencing the energy consumption and production of both oil and natural gas. Once collected, these parameters were divided into five main categories: energy consumption by sector, local socio-economic factors, local energy use, global energy trends, and the national trilemma index. Two forecasting models were developed to predict the energy consumption and production of oil and natural gas in Morocco up to 2040. The two models consist of a regression analysis performed in Excel and a machine learning model. The obtained findings are expected to inform policy and decision makers of the impact that the investigated factors have on the Moroccan energy sector, leading them to develop strategies aimed at strengthening the country’s energy stability. Keywords: Oil · Natural Gas · Fossil Fuel Energy Forecast · Machine Learning Modeling
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 244–256, 2023. https://doi.org/10.1007/978-3-031-43520-1_21

1 Introduction

Energy consumption represents a cornerstone of today’s economies. It is a crucial factor in both urbanization and the technological breakthroughs taking place today. Global demand for energy is affected by a multitude of agents and usually undergoes a great deal of fluctuation due to events taking place internationally. For instance, the outbreak of the Covid-19 pandemic in 2019 drastically impacted energy production and consumption [1]. Countries were forced to impose the strictest lockdown measures to reduce the infection rate. This led to a rise in energy demand from the residential sector, as opposed to the industrial and commercial sectors, which witnessed a drop in energy consumption compared to 2019 [2]. The renewable energy sector has also been affected, mainly by deficiencies in capital investment, supply chain disruption, and a reduced workforce. Nevertheless, it grew by 4.8% in 2020 compared to 2019 [2, 3]. On the other hand, the energy supply from coal and gas-fired power plants decreased substantially, due primarily to lower energy demand and limited fuel supplies caused by the reduction in maritime traffic. Additionally, oil consumption is estimated to have dropped by a record 9.3% in 2020, which resulted in a 5.5% decline in conventional energy shares compared to 2019 [3]. The energy situation differs from one country to another. Taking the example of a developing country, Morocco has little to no production of conventional energy sources, which leaves it no choice but to rely on imports to meet its increasing energy demand. In 2018, Morocco’s energy supply consisted of 59% oil, 4% gas, 25% coal, and 11% renewable sources [4]. Despite the low use of renewables, Morocco has launched many projects aiming to go green by installing 2093 MW of wind, 2672 MW of solar, and 1770 MW of hydro capacity [5]. This initiative was intended to achieve the goals set in the 2020 strategy, agreed on in 2009, for a more sustainable energy mix. Among the conventional sources currently in use, coal is still the most used source for producing electricity, with 38% of the electricity mix in 2019 [6].
However, Morocco produced coal until 2006, when it switched to importing it instead, which made domestic coal production an irrelevant factor in Morocco’s energy situation [7]. On the other hand, the Ministry of Energy and Mines announced, in 2021, a new natural gas roadmap under which gas will be used for industrial needs and later on for residences [6]. Natural gas represented 4% of the energy supply in 2018 [4], and its production increased from 2400 TJ in 1973 to 3100 TJ in 2019 [7], yet 93% of the energy consumption share is still imported [8]. In 2018, oil contributed 59% of the energy supply in Morocco, with consumption increasing from 84300 TJ in 1973 to 523677 TJ in 2019 [7]. So far, oil reserves amounting to 57 billion barrels have been found in Timhadit, Tarfaya, and Tanger, which has led to many agreements with foreign companies [9]. This emphasizes that Morocco has only been selling raw oil instead of refining it ever since the Samir refining company closed in 2015, hence the decrease in oil production [10]. Machine learning (ML) and artificial intelligence (AI) have become widely used in many fields. Multi-agent systems (MAS), a subset of AI, have been used to solve many maintenance and production issues in the Oil & Gas Industry (OGI). Several similar studies have been conducted on forecasting models used to predict energy demand, consumption, and production, both globally and within individual countries [11]. Machine learning is a powerful tool that can be used in many fields, including energy finance and economics. It is suited to processing data, forecasting the prices of different energy sources, predicting demand, analyzing energy trends, anticipating financial risks, and helping plan trading methodologies. According to an analysis conducted by Ghoddusi et al. [12], the most used techniques for this purpose are genetic algorithms, artificial neural networks,
and support vector machines. Xiong et al. [13] conducted research to forecast China’s energy consumption and production from 2013 to 2017. The goal was to gather data to create effective energy policies as well as energy consumption plans for the country. A novel grey model was optimized by testing its prediction accuracy against previous data on consumption and production. The model reflected respective increases of 36.89% and 33.25% in energy consumption and production. Chavez et al. [14] forecasted the energy production and consumption of the Asturias region in northern Spain. Univariate Box-Jenkins time-series analyses (ARIMA models) were used for prediction purposes based on historical data from 1980 to 1996. The years 1997–98 were predicted in order to design an energy plan for the region for that period. The results indicated that electric energy consumption would increase, black coal production would decrease, anthracite production would increase, and electricity production in coal power stations would increase. Smith et al. [15] conducted a global assessment and forecast of fossil fuel consumption for the years 2020–21 in light of the impact of COVID-19. A global vector autoregressive model (GVAR) was built using 1984–2019 data. Three scenarios were investigated: the first a forecast of consumption with no effect of COVID-19, the second the impact of one outbreak of the virus, and the third the impact of two outbreaks or waves of the virus. Scenario 1 sees a steady increase in global energy consumption, scenario 2 a peak in consumption following the wave, and scenario 3 several peaks in consumption due to multiple waves of COVID-19. Raza et al. [16] forecasted energy demand and production in Pakistan up to 2030 to predict the behavior of different energy resources, such as coal, natural gas, and solar energy, using MATLAB and LEAP (Long-range Energy Alternatives Planning).
Pakistan’s energy demand forecast was found to be 399 TWh under the baseline scenario and 312 TWh under the energy conservation scenario, while the power generation potential is 500.041 TWh. Following the same trend, Nafila et al. [17] developed three prediction methods: Exponential Smoothing, ARIMA, and Temporal Causal Modeling. The coefficient of determination was used to assess the fit accuracy. The obtained results showed that only the Temporal Causal Modeling model is valid in the Moroccan context. Dritsaki et al. [18] forecasted and estimated oil demand in Greece using the Box-Jenkins technique. The study covered the 1960–2020 timespan, concluding that ARIMA(1,1,1) is the best model to forecast oil demand. Findings also revealed that oil consumption is expected to decrease in the upcoming years due to the outbreak of the coronavirus pandemic as well as the country’s policies to substitute oil with renewable energies. According to Liu et al. (2019), forecasts of primary energy consumption in China fluctuate between 2954.04 Mtoe and 5618.67 Mtoe in 2021 [19]. In recent years, the gated recurrent unit has been successfully applied to handwriting recognition, human motion identification, robot control, etc., but it is rarely applied in the field of economic forecasting. Liu et al. selected three energy consumption forecasting models: multivariable linear regression (MLR), support vector regression (SVR), and the Gated Recurrent Unit (GRU). By comparing these three models, they verified the superiority of the GRU model in simulating energy consumption from 2008 to 2015 in China, and then designed various scenarios to forecast Chinese primary energy consumption from 2015 to 2021. The results may help governments to develop a reasonable energy plan.
The natural gas consumption in China is forecasted for the next five years via a discrete gated recurrent prediction model [20]. The grey model is also used to forecast China’s crude oil consumption and to synthesize suggestions and recommendations for China’s oil consumption planning [21]. When it comes to the oil and gas industry (OGI), the main challenges stem from the amount of data that must be dealt with, the unpredictable nature of global energy statuses over time, as well as the complex correlation of the known factors affecting the industry [11]. The main research gap this paper focuses on is the design of a comprehensive forecasting model considering the different urban, socio-economic, technical, and political aspects affecting the Moroccan energy status. This will provide an energy crisis management tool to first predict, then prevent, any energy mismanagement occurrences. For this purpose, this paper provides a comparative forecasting analysis of both natural gas and oil in Morocco as a function of multiple factors, via an Excel-based Multiple Linear Regression (MLR) and a Machine Learning (ML) analysis.
2 Methodology

The research methodology adopted for this paper consists of two main steps. The first step is a wide data gathering of multiple factors structured into five main categories. The collected data, as presented in Appendix 1, consist of factors directly impacting the consumption and production of oil and natural gas in Morocco. The developed categories provide a comprehensive investigation of parameters across urban, social, financial, and political fields. Figure 1 illustrates the historical data, from 1973 to 2019, exploited in this study. In the second step, the collected data were used to run two forecasting models of oil and natural gas consumption and production for the next 15 years. The main objective of the two developed models is to assess several factors alongside their trends over the years, between 1973 and 2019, and to determine which factors have the most impact on national Moroccan oil and natural gas consumption and production. The two models consist of a Machine Learning model and a Linear Regression one. Developing multiple models provides a better evaluation of the validity of the predicted data and demonstrates the degree of impact of each factor on the Moroccan energy consumption and production trend. The first investigated model is linear regression, performed to determine the relationship between two or more correlated variables for prediction purposes [22]. This analysis is performed in Excel. The second model deploys machine learning techniques, with Python as the main programming language, to train a model on large-scale observations by providing a detailed dataset and foreseeing the future trendline, aiming for factual forecasting [23]. The results of the models will be used to draw recommendations regarding the energy situation in Morocco for the foreseeable future by identifying the most crucial factors impacting local energy consumption and production.
This will provide information as to which areas need more focus to progress towards energy independence and sustainability.
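The Excel regression step can be mirrored in code as a multiple linear regression fitted by solving the normal equations. The two predictors and the response below are synthetic stand-ins (built to satisfy y = 1 + 2·x1 + 3·x2 exactly, so the recovered coefficients are known), not the paper's actual dataset:

```python
# Hypothetical predictors (e.g. stand-ins for two socio-economic factors)
# and a response constructed so that y = 1 + 2*x1 + 3*x2 exactly.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

# Design matrix with an intercept column.
X = [[1.0, a, b] for a, b in zip(x1, x2)]

def solve(A, v):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (M[r][n] - sum(M[r][c] * beta[c] for c in range(r + 1, n))) / M[r][r]
    return beta

# Normal equations: (X^T X) beta = X^T y
XtX = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(3)] for i in range(3)]
Xty = [sum(X[k][i] * y[k] for k in range(len(X))) for i in range(3)]
beta = solve(XtX, Xty)   # recovers the intercept and the two slopes
```

Because the synthetic response is exactly linear in the predictors, the fitted coefficients reproduce [1, 2, 3] up to floating-point rounding, which is a convenient sanity check on the solver.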
Fig. 1. Main factors impacting oil and natural gas production
3 Results and Discussion

In this section, the results of the Machine Learning modeling and the Multiple Linear Regression are presented and discussed. Once both models’ results are presented, a comparison is first made between the two models; then, the models developed in this paper are compared with similar published research. The comparisons include accuracy rates and trendlines.

3.1 Machine Learning Model

The machine learning algorithm used in this paper relies on linear regression. Decision tree ML and logistic regression ML were tested as well; however, linear regression proved to be the most accurate. The algorithm, built in Python, is initiated by importing the data related to the factors impacting energy consumption and production. The needed libraries are then imported into the program as well. Training and testing of the data is one of the most crucial parts of prediction. The process through which it happens is to mask a
time series of the data and let the model attempt to predict it. It then becomes a recursive process in which parameters are modified between iterations with the goal of reaching the highest possible accuracy. Once a satisfactory model is achieved, it can be run over the desired prediction period. A previous study [24] evaluated a long short-term memory network for load forecasting over periods ranging from 1 day to 30 days; the accuracy for the longest period of 30 days was 90.25%. Ten days of consumption were used in [25] to predict the load of the following day, with an accuracy of 94.65%. Following the same trend, the authors in [26] relied on convolutional neural networks to forecast energy consumption, with an accuracy of 95.28%.
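The masking-and-testing procedure described above can be sketched as a simple holdout evaluation: fit an ordinary least-squares line on the early years and score the predictions on the masked later years. The yearly series below is a synthetic, exactly linear stand-in for the real consumption data:

```python
# Hypothetical yearly series (exactly linear stand-in for consumption data).
years = list(range(1973, 2020))
series = [100.0 + 4.0 * (t - 1973) for t in years]

split = 37                      # train on 1973-2009, test on 2010-2019
xtr, ytr = years[:split], series[:split]
xte, yte = years[split:], series[split:]

# Ordinary least squares for a single predictor: slope = cov(x, y) / var(x).
mx = sum(xtr) / len(xtr)
my = sum(ytr) / len(ytr)
slope = (sum((a - mx) * (b - my) for a, b in zip(xtr, ytr))
         / sum((a - mx) ** 2 for a in xtr))
intercept = my - slope * mx

preds = [intercept + slope * t for t in xte]
# Mean absolute percentage error on the masked years; accuracy = 100 - MAPE.
mape = 100.0 * sum(abs(p - o) / o for p, o in zip(preds, yte)) / len(yte)
accuracy = 100.0 - mape
```

On the noiseless stand-in the fit recovers the true slope and the holdout accuracy is essentially 100%; on real data the recursive tuning described in the text would iterate on this train/test loop until the accuracy stops improving.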
Fig. 2. Natural gas forecast
Fig. 3. Oil forecast (production and consumption, TJ)
The machine learning model for natural gas in Fig. 2 follows a decreasing linear trend for both production and consumption starting from the year 2020, as seen in Fig. 1. The equations that the production P_G(T) and consumption C_G(T) follow are, respectively:

P_G(T) = −4.0823(T − 1972) + 2271.3
C_G(T) = −6.4841(T − 1972) + 2299.8

where T is the year, ranging from 1973 to 2029. Because the ML model took many socio-economic factors into account, it is natural that gas consumption and production decrease over the years, owing to the ongoing energy transition plan in Morocco: fossil fuels are planned to serve only as a backup for the already existing renewable energy power plants. It is worth mentioning that the model presented a forecasting accuracy of 80.4% for consumption and 79.115% for production, because many other factors that affect them were not considered in the simulation. The forecasting results for oil consumption and production in Fig. 3, however, do not resemble the pattern of the natural gas forecast. The equations that the production P_O(T) and consumption C_O(T) follow are, respectively:

P_O(T) = −13.844(T − 1972) + 861.59
C_O(T) = 8023.7(T − 1972) + 45670

Unlike natural gas, there is a high discrepancy between the production and consumption of oil, hence the necessity of a secondary axis in Fig. 3. Oil consumption drastically decreased in 2020 due to the COVID-19 pandemic, which limited the use of transportation means that run on oil. Moreover, consumption shows a linear increase after 2020 because the forecast for transportation among the socio-economic factors has also increased. Oil production decreases linearly, as opposed to consumption. It is worth mentioning that the model presented a forecasting accuracy of 99.5% for consumption and 26.5% for production, because many other factors that affect them were not considered in the simulation.
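The fitted natural-gas trend lines can be evaluated directly to read off point forecasts for any year in the stated range; the 2029 evaluations below follow from the coefficients reported above:

```python
# Natural-gas trend lines reported in the text, evaluated at a sample year.
def PG(T):
    """Production forecast (TJ) as a function of the year T."""
    return -4.0823 * (T - 1972) + 2271.3

def CG(T):
    """Consumption forecast (TJ) as a function of the year T."""
    return -6.4841 * (T - 1972) + 2299.8

pg_2029 = PG(2029)   # -4.0823 * 57 + 2271.3
cg_2029 = CG(2029)   # -6.4841 * 57 + 2299.8
```

Plugging in T = 2029 gives roughly 2038.6 TJ of production against 1930.2 TJ of consumption, consistent with the decreasing trend the text describes.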
3.2 Multiple Linear Regression
Regression analysis is a statistical method for analysing the relationships between dependent and independent variables [27]. The aim of this study is to conduct a multiple regression analysis of four dependent-variable scenarios: Oil Production, Oil Consumption, Natural Gas Production, and Natural Gas Consumption. Before conducting a regression analysis, a set of requirements needs to be met to check the validity of the study, as explained in Table 1.
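In its basic form, multiple linear regression fits y = β0 + β1x1 + … + βkxk by least squares. A small NumPy sketch on synthetic data (the variables and coefficients are illustrative, not the study's dataset):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = rng.normal(size=(n, 3))                   # three independent variables
beta_true = np.array([2.0, -1.0, 0.5])
y = 3.0 + X @ beta_true + rng.normal(scale=0.01, size=n)  # dependent variable

# Add an intercept column and solve the ordinary least-squares problem.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 2))  # intercept followed by the three slopes
```

With low noise, the recovered coefficients closely match the generating ones, which is the behaviour a valid regression study relies on.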
Regression and Machine Learning Modeling Comparative Analysis
251
Table 1. Requirements for a valid regression analysis.

Requirement   Condition
1             Collected data has an acceptable sample size
2             All variables should have a fluctuating behaviour (non-constant data)
3             Scale type of the dependent variable
4             No collinearity between the studied independent variables
Table 2. Calculated R² and VIF for the different studied factors.

Factors                        R²      VIF
Transportation                 0.999   1000.00
Industry                       0.980   50.00
Residential                    0.957   23.26
Commercial                     0.997   333.33
Population Growth              0.961   25.64
Degree of Urbanization         0.990   100.00
GDP                            0.999   1000.00
Installed Capacity of RE       0.999   1000.00
Reserves (Oil)                 0.999   1000.00
Oil Products and Imports       0.998   500.00
Population growth (Global)     0.991   111.11
GDP (Global)                   0.998   500.00
Oil Prices                     0.943   17.54
Energy Security                0.929   14.08
Energy Equity                  0.994   166.67
Environmental Sustainability   0.835   6.06
Reserves (Natural Gas)         0.810   5.26
Prices (Natural Gas)           0.898   9.80
It can be observed that the collected data has an acceptable size for conducting a regression study: a fluctuating dataset covering 1970–2019 is analyzed. This confirms that the analysis meets the first three requirements. To check the fourth requirement, collinearity between the independent variables, the Variance Inflation Factor (VIF) is computed. To confirm the absence of multicollinearity in the studied sample, the calculated VIF should be less than 10 [27]. Table 2 presents the calculated R² and VIF for the different investigated factors.
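For a factor regressed on the remaining independent variables, VIF = 1/(1 − R²), which reproduces the tabulated values. A quick sketch (the R² values are those listed above):

```python
def vif(r_squared):
    """Variance Inflation Factor from the R² of regressing one
    independent variable on all the others."""
    return 1.0 / (1.0 - r_squared)

# A few of the studied factors and their R² values.
factors = {"Residential": 0.957, "Oil Prices": 0.943, "Reserves (Natural Gas)": 0.810}
for name, r2 in factors.items():
    flag = "multicollinear" if vif(r2) >= 10 else "ok"
    print(f"{name}: VIF = {vif(r2):.2f} ({flag})")
```

Only factors with VIF below 10 (here, natural gas reserves) pass the multicollinearity check.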
Observed findings reflected that 16 out of 18 independent variables are significantly linked to the other variables. Khan et al. [28] examined the impact of energy use, population growth, and financial development on the economic growth of the top ten countries in the World Energy Trilemma Index 2020 from 1990 to 2016 using regression analysis. The results showed a positive correlation between economic growth and the energy trilemma and a negative correlation between financial development and the energy trilemma. Narayan et al. [29] examined the relationship between global energy consumption and GDP for the period 1980–2006. The literature reviewed in that work assumed a positive relationship between the two; however, using the least-squares estimator, panel-PP, panel-ADF, and unit root analysis, the results found the relationship to be positive in only 60% of countries. Borozan [30] explored the relationship between total energy consumption in Croatia and real GDP from 1992 to 2010, using bivariate vector autoregression (VAR) and Granger causality tests. The results found that the two variables are correlated and directly affect each other. Another study covered 78 countries for the period 1995–2021, using the Stochastic Impacts by Regression on Population, Affluence, and Technology model to assess the impact of urbanization on energy efficiency; the results showed a correlation between energy efficiency and urbanization in all cases. To conclude, the correlation between the factors is explained both statistically and through the literature review. The regression analysis fails to provide a comprehensive analysis of our dependent variables in terms of the different factors, which is why an analysis using an AI model is conducted to obtain reliable results.
To summarize, this paper enabled getting a perspective on the energy situation in Morocco. The collected findings reflected the increased consumption and the decreased production rate of fossil fuels in Morocco, encouraging a shift toward cleaner energy sources in different sectors such as construction [31–34], building [35–37], and agriculture [38, 39].
4 Conclusion
Despite the current use of renewable energies, countries still rely on conventional energies, which makes this study relevant. Morocco's energy consumption and production remain predominant indicators of its development in many sectors, including transportation, residential, commercial, and industrial, in addition to the local socio-economic and global factors that have an impact on Morocco's energy status. This study examined the relation between these factors and energy consumption and production in order to predict the energy status in the upcoming years using two forecasting models: Multiple Linear Regression (MLR) and Machine Learning (ML). In the linear regression, four dependent variables and their factors were analyzed after checking their sample size, fluctuating behavior, scale type, and collinearity. Most of these factors, however, turned out to be invalid because they were collinear. Collinearity was checked using the VIF, which should not exceed 10, a condition met only by environmental sustainability (6.06), natural gas reserves (5.26), and natural gas prices (9.80). Linear regression is therefore not an option for the forecasting; instead, machine learning is conducted to obtain better results.
This paper’s machine learning approach is based on linear regression. Although decision-tree and logistic-regression models were also evaluated, linear regression proved to be the most accurate. The Python-based method starts by importing the required libraries and then the data on the factors influencing energy usage and output. One of the most important aspects of prediction is data training and testing, accomplished by masking part of the time series and having the model attempt to forecast it. This paper is one of many studies developed to forecast the energy status of a country based on different factors; however, researchers are still trying to come up with new models that give more accurate results.
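The masking scheme described above amounts to holding out the most recent part of the series as a test set. A minimal sketch (the split point and illustrative series are assumptions, not the paper's data):

```python
# Hold-out split for a yearly time series: train on the past, mask the
# last `horizon` years and ask the model to forecast them.
def mask_split(values, horizon):
    if not 0 < horizon < len(values):
        raise ValueError("horizon must be within the series length")
    return values[:-horizon], values[-horizon:]

series = [2271.3 - 4.0823 * t for t in range(50)]  # illustrative linear trend
train, test = mask_split(series, horizon=10)
print(len(train), len(test))  # 40 10
```

The model is fitted on `train` and its forecasts are compared against the masked `test` values to obtain the accuracy figures reported above.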
Appendix
[Appendix table: annual values (1970–2019) of Morocco's energy production and consumption (TJ), together with the studied local socio-economic factors (transportation, industry, residential and commercial consumption, GDP, population growth, population density, degree of urbanization, installed capacity of renewable energies, reserves, oil and products imports) and global factors (population growth, GDP, oil prices, energy security, energy equity, environmental sustainability). The layout of the original multi-page table is not recoverable from the source.]
References

1. Jiang, P., van Fan, Y., Klemeš, J.J.: Appl. Energy 285, 116441 (2021)
2. Olabi, V., Wilberforce, T., Elsaid, K., Sayed, E.T., Abdelkareem, M.A.: Chem. Eng. Technol. 45, 558–571 (2022)
3. bp: Full Report – Statistical Review of World Energy (2021)
4. Energy Profile Morocco, United Arab Emirates (2021)
5. Boulakhbar, M., et al.: Towards a large-scale integration of renewable energies in Morocco. J. Energy Storage 32, 101806 (2020)
6. International Trade Administration (2021)
7. IEA Sankey Diagram (n.d.)
8. Worldometer (n.d.)
9. Planete Energies (2016)
10. Handaji, M.: Morocco World News (2020)
11. Hanga, K.M., Kovalchuk, Y.: Machine learning and multi-agent systems in oil and gas industry applications: a survey. Comput. Sci. Rev. 34, 100191 (2019)
12. Ghoddusi, H., Creamer, G.G., Rafizadeh, N.: Energy Econ. 81, 709–727 (2019)
13. Xiong, P.P., Dang, Y.G., Yao, T.X., Wang, Z.X.: Optimal modeling and forecasting of the energy consumption and production in China. Energy 77, 623–634 (2014)
14. Chavez, S.G., Bernat, J.X., Coalla, H.L.: Forecasting of energy production and consumption in Asturias (Northern Spain). Energy 24(3), 183–198 (1999)
15. Smith, L.V., Tarui, N., Yamagata, T.: Assessing the impact of COVID-19 on global fossil fuel consumption and CO2 emissions. Energy Econ. 97, 105170 (2021)
16. Raza, M.A., et al.: Energy demand and production forecasting in Pakistan. Energy Strategy Rev. 39, 100788 (2022)
17. Nafil, A., Bouzi, M., Anoune, K., Ettalabi, N.: Energy Rep. 6, 523–536 (2020)
18. Dritsaki, C., Niklis, D., Stamatiou, P.: Oil consumption forecasting using ARIMA models: an empirical study for Greece. Int. J. Energy Econ. Policy 11, 214–224 (2021)
19. Liu, B., Fu, C., Bielefield, A., Liu, Y.Q.: Forecasting of Chinese primary energy consumption in 2021 with GRU artificial neural network. Energies 10(10), 1453 (2017)
20. Zhang, J., Qin, Y., Duo, H.: The development trend of China's natural gas consumption: a forecasting viewpoint based on grey forecasting model. Energy Rep. 7, 4308–4324 (2021)
21. Wang, Y., Zhang, Y., Nie, R., Chi, P., He, X., Zhang, L.: A novel fractional grey forecasting model with variable weighted buffer operator and its application in forecasting China's crude oil consumption. Petroleum 8(2), 139–157 (2022)
22. Uyanık, G.K., Güler, N.: A study on multiple linear regression analysis. Procedia Soc. Behav. Sci. 106, 234–240 (2013)
23. Woolf, B.P.: Chapter 7 - Machine learning. In: Building Intelligent Interactive Tutors, pp. 221–297 (2009)
24. Muzaffar, S., Afshari, A.: Short-term load forecasts using LSTM networks. Energy Procedia 158, 2922–2927 (2019)
25. Zheng, J., Xu, C., Zhang, Z., Li, X.: Electric load forecasting in smart grids using long-short-term-memory based recurrent neural network. In: 2017 51st Annual Conference on Information Sciences and Systems (CISS), pp. 1–6. IEEE (2017)
26. Tian, C., Ma, J., Zhang, C., Zhan, P.: A deep neural network model for short-term load forecast based on long short-term memory network and convolutional neural network. Energies (Basel) 11, 3493 (2018)
27. Sarstedt, M., Mooi, E.: A Concise Guide to Market Research: The Process, Data, and Methods Using IBM SPSS Statistics, pp. 209–256. Springer, Berlin, Heidelberg (2019)
28. Khan, I., Hou, F., Irfan, M., Zakari, A., Le, H.P.: Does energy trilemma a driver of economic growth? The roles of energy use, population growth, and financial development. Renew. Sustain. Energy Rev. 146, 111157 (2021)
29. Narayan, P.K., Narayan, S., Popp, S.: A note on the long-run elasticities from the energy consumption–GDP relationship. Appl. Energy 87(3), 1054–1057 (2010)
30. Borozan, D.: Exploring the relationship between energy consumption and GDP: evidence from Croatia. Energy Policy 59, 373–381 (2013)
31. Limami, H., Manssouri, I., Cherkaoui, K., Khaldoun, A.: J. Build. Eng. 27, 100956 (2020)
32. Limami, H., Manssouri, I., Cherkaoui, K., Khaldoun, A.: Physicochemical, mechanical and thermal performance of lightweight bricks with recycled date pits waste additives. J. Build. Eng. 34, 101867 (2021)
33. Limami, H., Manssouri, I., Cherkaoui, K., Saadaoui, M., Khaldoun, A.: Thermal performance of unfired lightweight clay bricks with HDPE & PET waste plastics additives. J. Build. Eng. 30, 101251 (2020)
34. Limami, H., Manssouri, I., Cherkaoui, K., Khaldoun, A.: Recycled wastewater treatment plant sludge as a construction material additive to ecological lightweight earth bricks. Cleaner Eng. Technol. 2, 100050 (2021)
35. Limami, H., Manssouri, I., Cherkaoui, K., Amazian, L., El Baraka, A., Khaldoun, A.: Unfired clay bricks with additives and mechanical simulation of perforated bricks. In: 2019 7th International Renewable and Sustainable Energy Conference (IRSEC), pp. 1–6. IEEE (2019)
36. Limami, H., Manssouri, I., Cherkaoui, K., Khaldoun, A.: J. Build. Eng. 34, 101867 (2021)
37. Houssame, L., Imad, M., Khalid, C., Asmae, K.: J. Energy Eng. 147, 4021020 (2021)
38. Limami, H., Manssouri, I., Cherkaoui, K., Khaldoun, A.: Mechanical and physicochemical performances of reinforced unfired clay bricks with recycled typha-fibers waste as a construction material additive. Cleaner Eng. Technol. 2, 100037 (2021)
39. Limami, H., et al.: Thermophysical and mechanical assessment of unfired clay bricks with dry grass fibrous filler. Int. J. Thermophys. 43, 114 (2022)
Enhancing Brain Tumor Classification in Medical Imaging Through Image Fusion and Data Augmentation Techniques Tarik Hajji(B) , Youssef Douzi, and Tawfik Masrour Laboratory of Mathematical Modeling, Simulation and Smart Systems (L2M3S), ENSAM-Meknes, Moulay Ismail University of Meknes, Meknes, Morocco [email protected]
Abstract. Brain tumors (BT) pose a significant health risk due to their uncontrolled and abnormal cell proliferation. Recent advances in deep learning (DL) have revolutionized medical imaging diagnostics; however, applying machine learning (ML) for automatic BT classification faces challenges related to limited and low-quality data accessibility [1]. Moreover, BT classification involves identifying over 120 distinct tumor types. In this paper, we propose an approach that leverages image fusion (IF) and an auto-coding technique for data augmentation (DA) and DL to achieve automatic classification of BT medical images (MI). The primary goal is to develop a diagnostic support system to assist medical practitioners in analyzing new BT images. To overcome the scarcity of data, we employ two data augmentation approaches at different stages: classical DA on a small dataset initially, followed by IF during the learning process. Our approach outperforms existing methods in the literature, demonstrating high accuracy rates for ten tumor classes. Keywords: Medicine 4.0 · Artificial Intelligence · Brain Tumor · Visual Recognition · Machine Learning · Deep Learning · Image Fusion · Data Augmentation · CNN · Autoencoder
1 Introduction Medicine 4.0, with its focus on digitalization, 3D imaging, AI algorithms, and personalized implants, has opened new possibilities in various fields, including healthcare. Visual recognition (VR) technologies, capable of extracting textual content from images, have garnered significant attention. A Harvard study in the healthcare sector revealed that DL achieved a success rate of 92% in analyzing MRI scans, while doctors achieved 96%. However, when AI was integrated with DL, the success rate surpassed 99.5%. Brain tumors, with over 120 recognized types according to the World Health Organization, pose a substantial health concern. DL, a subfield of AI specializing in data recognition and classification, has been extensively utilized in computer vision applications. However, developing an optimal model requires a labeled image dataset and specific learning algorithms to fine-tune model parameters. To address the challenge © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 257–271, 2023. https://doi.org/10.1007/978-3-031-43520-1_22
258
T. Hajji et al.
of BT classification, this paper proposes a combination of techniques, including convolutional neural networks (CNN), data augmentation (DA), image fusion (IF), and autoencoders, to achieve high recognition rates and overcome the scarcity of data. The paper is structured as follows: the state-of-the-art section introduces key concepts such as DL, CNN, DA, and IF techniques. The related works section presents scientific findings related to BT classification. The methodology section outlines the proposed approach, divided into three main parts: standard classification, DA techniques, and IF techniques using autoencoders. The results and discussion section presents the obtained results for each approach and compares them with state-of-the-art methods. The paper concludes with a general summary and conclusion.
2 Background The field of medicine has a rich history of innovation, involving various professionals ranging from designers, engineers, doctors, clinicians to developers, startups, and data scientists [7]. In recent years, remarkable advancements have been made in AI-based diagnostic algorithms, such as the introduction of the first FDA-approved algorithm for diabetic retinopathy fundus diagnosis in 2018 [8]. AI has also demonstrated its potential in automatic and real-time skin cancer recognition and classification [9], instantaneous electrocardiogram reading [10], arrhythmia detection [11], and personalized chemotherapy [12]. These breakthroughs hold the promise of significantly enhancing patient outcomes and revolutionizing the healthcare industry. 2.1 Deep Learning DL, a subfield of ML, is a key component of the broader field of Artificial Intelligence [13]. The concept of ML has its roots in the mid-20th century, when Alan Turing, a British mathematician, envisioned a “Learning Machine” capable of self-improvement [14]. Since then, various ML techniques have been developed to create algorithms capable of learning and enhancing their performance. Among these techniques, artificial neural networks (ANN) have played a significant role [15]. ANNs, which serve as the foundation for DL and technologies like image recognition and robotic vision, are inspired by the structure and function of neurons in the human brain. They consist of interconnected artificial neurons that enable complex information processing. 2.2 Data Augmentation In Fig. 1, data augmentation (DA) is depicted as a technique commonly employed in computer vision to augment the training dataset by generating additional images from existing ones. Its primary purpose is to enhance the diversity and quantity of available data, particularly when data scarcity or limitations are encountered. 
By introducing variations and augmentations to the existing images, DA enables deep learning (DL) models to learn from a larger and more diverse set of examples. Consequently, DL models trained with DA tend to exhibit improved accuracy and demonstrate superior performance in terms of training loss compared to models trained without DA.
Enhancing Brain Tumor Classification in Medical Imaging
259
There exist numerous DA methods specifically designed for image data. These methods encompass a range of transformations, such as adding noise, cropping, flipping, rotation, scaling, translation, and adjusting brightness, contrast, and saturation. Additionally, more advanced techniques like adversarial training, neural style transfer, and generative adversarial networks (GAN)-based augmentation have been developed. These transformations introduce variations and diversify the dataset, enabling the DL model to become more robust in handling different variations and real-world data.
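Several of the simple transformations listed above can be expressed in a couple of NumPy calls. A toy sketch on a small array standing in for a grayscale image (the array values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
image = np.arange(12, dtype=float).reshape(3, 4)  # stand-in for a grayscale image

flipped = np.fliplr(image)                                # horizontal flip
rotated = np.rot90(image)                                 # 90-degree rotation
noisy = image + rng.normal(scale=0.1, size=image.shape)   # additive noise
brighter = np.clip(image * 1.2, 0, None)                  # brightness adjustment

print(image.shape, flipped.shape, rotated.shape)
```

Each transformation yields a new, label-preserving training example, which is exactly how DA enlarges the dataset.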
Fig. 1. Example of three simple data augmentation operations (rotation, shearing and scaling).
2.3 Image Fusion Image fusion (IF) techniques can be classified into spatial domain and frequency domain methods [23]. Spatial domain methods involve directly combining the pixel intensities of images, while frequency domain methods transform images into frequency space before combining the frequency coefficients [23]. Spatial domain techniques include operations such as averaging, minimum, maximum, and other arithmetic operations, while frequency domain techniques involve methods like discrete cosine transform (DCT), discrete wavelet transform (DWT), and Fourier transform (FT) [24].
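Spatial-domain fusion in its simplest forms is just a pixel-wise arithmetic operation on registered source images. A toy NumPy sketch of the averaging and maximum-selection rules (the synthetic images are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
# Two registered source "images" of the same scene (synthetic stand-ins).
img_a = rng.random((8, 8))
img_b = rng.random((8, 8))

fused_avg = (img_a + img_b) / 2.0     # averaging rule
fused_max = np.maximum(img_a, img_b)  # maximum-selection rule

print(fused_avg.shape, fused_max.shape)
```

Frequency-domain methods instead apply such rules to DCT, DWT, or Fourier coefficients before transforming back.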
In the context of medical imaging, IF can be utilized to improve the diagnostic accuracy of a specific modality or provide additional information that cannot be obtained from a single modality alone [25]. For instance, the fusion of magnetic resonance imaging (MRI) and computed tomography (CT) images can offer both anatomical and functional information about the patient’s condition, enhancing the understanding of the pathology [25]. Moreover, IF techniques can be employed to reduce noise, eliminate artifacts, and enhance the overall image quality in low-resolution images [26]. By leveraging IF in medical imaging, healthcare professionals can gain better insights into a patient’s condition, leading to more precise diagnoses and improved treatment outcomes.
3 Related Work The L2M3S laboratory is dedicated to the application of AI in various domains, including speech recognition, computer vision, machine learning, natural language modeling, and robotics. With a strong belief in the potential of AI, the laboratory aims to address complex challenges in healthcare, industry, finance, and security [15, 32–38]. Brain tumor classification has been extensively studied, with modern approaches heavily relying on deep learning (DL) techniques. In one study [23], convolutional neural networks (CNNs) with transfer learning (TL) achieved a remarkable mean classification accuracy of 98% for brain tumor classification, surpassing previous related work. Another study [24] proposed a DL-based decision support system for multimodal brain tumors, achieving an accuracy of over 95% on two datasets. Further advancements have been made in brain tumor classification using CNNs. In a study [25], a CNN model achieved high accuracy and sensitivity in classifying brain tumors into three categories (glioma, meningioma, and pituitary). The model obtained an accuracy of 98.93% and a sensitivity of 98.18% for cropped lesions, 99% accuracy and 98.52% sensitivity for uncropped lesions, and 97.62% accuracy and 97.40% sensitivity for segmented lesion images. In another study [26], CNN combined with discrete wavelet transform (DWT) and principal component analysis demonstrated effective classification of brain MRIs into four classes, showcasing favorable performance across various evaluation metrics. To contribute to the field, this study proposes an approach that utilizes data augmentation with fusion techniques for brain tumor classification using DL and autoencoders. By leveraging these techniques, the aim is to enhance the accuracy and robustness of the classification system.
4 Methodology With a dataset that was relatively small, consisting of 240 images representing 10 classes of brain tumors, we initiated our three-step approach, as depicted in Fig. 2, using the resources of our L2M3S lab.
Fig. 2. The proposed methodology consists of three steps: classification without data augmentation, classification with classical data augmentation techniques and classification with data augmentation using fusion techniques.
4.1 Dataset As shown in Fig. 3, our dataset comprises two types of images, namely training and testing, each containing 10 distinct classes of brain pathologies including aneurysm, multiple sclerosis, hydrocephalus, stroke, infections, cysts, swelling, hemorrhage, bleeding, and inflammation. The training set consists of 24 images for each class, while the test set contains 8 images for each class.
Fig. 3. Sample images extracted from the dataset used.
4.2 Data Pre-Processing for the Model
To handle the large dataset efficiently, the study employed Keras functions, particularly the ImageDataGenerator. This tool enabled automatic data processing and the generation of batched data streams from a directory, eliminating the need to load the entire dataset into memory at once. Various image manipulations were performed using simple parameters such as rotation, resizing, and scaling, as outlined in Table 1. These transformations aimed to enhance the model's ability to handle images that may not be present in the dataset, ensuring robustness. An example of such an image is depicted in Fig. 4.

Table 1. Configuration of the data augmentation generator.

Parameter            Value     Description
Rotation_range       20        Rotate the image by up to 20 degrees
Width_shift_range    0.10      Shift the image width by a max of 10%
Height_shift_range   0.10      Shift the image height by a max of 10%
Rescale              1/255     Rescale the image by normalizing it
Shear_range          0.1       Shear means cutting away part of the image (max 10%)
Zoom_range           0.1       Zoom in by 10% max
Horizontal_flip      True      Allow horizontal flipping
Fill_mode            nearest   Fill in missing pixels with the nearest filled value
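The configuration in Table 1 maps directly onto Keras's ImageDataGenerator keyword arguments. A minimal sketch (the generator call itself is shown commented out, since it requires TensorFlow/Keras at run time):

```python
# Data-augmentation configuration matching Table 1.
augmentation_params = {
    "rotation_range": 20,        # rotate up to 20 degrees
    "width_shift_range": 0.10,   # shift width by up to 10%
    "height_shift_range": 0.10,  # shift height by up to 10%
    "rescale": 1 / 255,          # normalize pixel values to [0, 1]
    "shear_range": 0.1,
    "zoom_range": 0.1,
    "horizontal_flip": True,
    "fill_mode": "nearest",
}

# With TensorFlow installed, the generator would be built as:
# from tensorflow.keras.preprocessing.image import ImageDataGenerator
# datagen = ImageDataGenerator(**augmentation_params)
print(sorted(augmentation_params))
```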
4.3 Generation of Many Manipulated Images from a Directory
The flow_from_directory method requires the images to be organized into subdirectories, with each subdirectory containing images of a single class, as shown in Fig. 4. This is a strict requirement, as the method will not work otherwise. We created one folder per image class and placed all the images belonging to the same class in that folder.
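The folder-per-class layout can be prepared with the standard library alone. A minimal sketch (the class names follow the dataset description above; the root path is a temporary directory for illustration):

```python
import pathlib
import tempfile

# One subdirectory per class, as flow_from_directory expects.
classes = ["aneurysm", "multiple_sclerosis", "hydrocephalus", "stroke"]

root = pathlib.Path(tempfile.mkdtemp()) / "train"
for name in classes:
    (root / name).mkdir(parents=True, exist_ok=True)

created = sorted(p.name for p in root.iterdir())
print(created)
```

Images of each class are then copied into the matching subdirectory before pointing the generator at `root`.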
4.4 Design of the Model Architecture For visual learning and image recognition, the most commonly used machine learning algorithm is the Convolutional Neural Network (CNN), as shown in Fig. 5 and Fig. 6.
Fig. 5. General architecture of a convolution neural network.
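The convolution layers at the heart of this architecture slide a small kernel over the image to produce feature maps. A minimal "valid" 2-D convolution sketch in NumPy (for illustration only, not the Keras layer itself):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

feature_map = conv2d_valid(np.ones((5, 5)), np.ones((3, 3)))
print(feature_map.shape)  # (3, 3)
```

A 5×5 input convolved with a 3×3 kernel yields a 3×3 feature map, which is why stacked convolution layers progressively shrink spatial dimensions before the classifier.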
4.5 Data Augmentation: A Comparative Study In order to train a model effectively, having a large and high-quality dataset is crucial. However, in certain situations, acquiring such a dataset may be challenging or unfeasible. To overcome this limitation, data augmentation techniques can be employed to artificially expand the dataset and introduce diversity. By applying transformations like rotation, flipping, zooming, and shifting to the original images, new variations can be created, thereby enlarging the learning space for the model. This augmentation process helps improve the model’s performance and enhances its robustness when predicting new data. Several parameters play a significant role in data augmentation. These parameters include the rotation range, width and height shift range, shear range, zoom range, and options for horizontal and vertical flipping. These parameters control the extent and type of transformations applied to the images, as illustrated in Table 2. By adjusting these parameters appropriately, the augmented dataset becomes more diverse, enabling the model to generalize better and improve its ability to handle unseen data.
Fig. 6. The architecture used consists of three convolution layers followed by a classifier with two hidden layers, Total params: 4,625,674.
Table 2. The proposed data augmentation parameters as an alternative to the default values.

Parameter            Default value   Proposed value
Rotation_range       20              15
Width_shift_range    0.10            0.05
Height_shift_range   0.10            0.05
Rescale              1/255           1/255
Shear_range          0.1             0.75
Zoom_range           0.1             Zoom in by 10% max
Horizontal_flip      True            True, Vertical_flip = True
Fill_mode            nearest         Brightness_range = [0.1, 1.5]
4.6 Data Augmentation with Image Fusion The quality of medical images is critical to the accuracy of the diagnosis. Different sensors can produce distinct characteristics or artifacts, making it challenging to describe the morphological structure of an organ with only one type of image. In order to overcome this challenge, a method called image fusion (IF) is utilized to integrate multiple images into a single image, containing a wealth of useful information as shown in Fig. 7.
Fig. 7. Data augmentation by image fusion using autoencoders.
The process involves several steps. First, an autoencoder is used to generate codes that encapsulate the significant information of each image. Next, the source image is decomposed into low-frequency sub-bands (LFS) and high-frequency sub-bands (HFS) using the non-subsampled shearlet transform (NSST). Then, the fusion is performed using fuzzy logic systems (FLS) and new sum-modified-Laplacian (NSML) fusion rules. Finally, the membership functions (MFs) of the FLSs are optimized using particle swarm optimization (PSO) to obtain the best fusion outcome. This method allows for efficient extraction and integration of important information from medical images, improving the accuracy of the diagnostic results.

4.7 Auto-Encoder Architecture

The auto-encoder (Fig. 8) is a specific neural network architecture that can be used for unsupervised learning. It is composed of two sub-networks: the encoder, which extracts meaningful features from the input data, and the decoder, which reconstructs the input data from the extracted features. During training, the auto-encoder is fed with input data and learns to encode and decode it using backpropagation. Once trained, the encoder can be used to extract features from new data, which can serve as input to other machine learning models. The auto-encoder is commonly used for dimensionality reduction, denoising, and image generation tasks.
Fig. 8. Operating principle of the autoencoders.
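The encode/decode principle above can be sketched with a tiny linear autoencoder trained by plain gradient descent; the sizes, toy data, and learning rate below are our own illustrative choices, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 samples of dimension 16 lying in a 4-D subspace
X = rng.normal(size=(16, 4)) @ rng.normal(size=(4, 200))

d, k = 16, 4                               # input and code dimensions
W1 = rng.normal(scale=0.1, size=(k, d))    # encoder weights
W2 = rng.normal(scale=0.1, size=(d, k))    # decoder weights
lr, n = 0.01, X.shape[1]

losses = []
for _ in range(300):
    H = W1 @ X                     # encode: extract features
    E = W2 @ H - X                 # decode and compute reconstruction error
    losses.append(np.mean(E**2))
    # gradients of the mean squared reconstruction loss (backpropagation)
    gW2 = 2.0 / (n * d) * E @ H.T
    gW1 = 2.0 / (n * d) * W2.T @ E @ X.T
    W1 -= lr * gW1
    W2 -= lr * gW2
```

After training, `W1 @ x` plays the role of the learned code for a new sample `x`; a real autoencoder for images would use nonlinear (e.g. convolutional) encoder and decoder layers.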
5 Results and Discussion

5.1 CNN Result Without Data Augmentation

We used the accuracy and loss functions to evaluate the proposed model and obtained an accuracy of 92.85%. Fig. 9 shows the training and validation accuracy and loss.
Fig. 9. Training and validation accuracy and loss of CNN without data augmentation.
5.2 CNN Result with Data Augmentation Automatic Generator

The proposed data augmentation parameters raised the accuracy to 94.99%, an increase of 2.24%. Fig. 10 shows the training and validation accuracy and loss.
Fig. 10. Training and validation accuracy and loss of the CNN with the proposed data augmentation parameters.
5.3 CNN Result Based on DA Using IF with DWT

Using the Daubechies wavelet transform as the image-fusion method for data augmentation raised the accuracy to 97.33%. Fig. 11 shows the training and validation accuracy and loss.
Fig. 11. Results of data augmentation using image fusion with the Daubechies wavelet transform.
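The paper does not spell out its fusion rules, so as a hedged, self-contained sketch the snippet below implements one level of the simplest Daubechies wavelet (db1, the Haar transform) in NumPy and fuses two images with a common DWT rule: average the approximation band, keep the maximum-magnitude detail coefficients. All function names are ours.

```python
import numpy as np

def haar2d(x):
    """One level of the 2-D Haar (db1) wavelet transform."""
    lo, hi = (x[::2] + x[1::2]) / 2.0, (x[::2] - x[1::2]) / 2.0            # rows
    LL, LH = (lo[:, ::2] + lo[:, 1::2]) / 2.0, (lo[:, ::2] - lo[:, 1::2]) / 2.0
    HL, HH = (hi[:, ::2] + hi[:, 1::2]) / 2.0, (hi[:, ::2] - hi[:, 1::2]) / 2.0
    return LL, LH, HL, HH

def ihaar2d(LL, LH, HL, HH):
    """Exact inverse of haar2d."""
    lo = np.empty((LL.shape[0], 2 * LL.shape[1])); hi = np.empty_like(lo)
    lo[:, ::2], lo[:, 1::2] = LL + LH, LL - LH
    hi[:, ::2], hi[:, 1::2] = HL + HH, HL - HH
    x = np.empty((2 * lo.shape[0], lo.shape[1]))
    x[::2], x[1::2] = lo + hi, lo - hi
    return x

def fuse(a, b):
    """Average the approximation band, take max-magnitude detail coefficients."""
    A, B = haar2d(a), haar2d(b)
    LL = (A[0] + B[0]) / 2.0
    det = [np.where(np.abs(p) >= np.abs(q), p, q) for p, q in zip(A[1:], B[1:])]
    return ihaar2d(LL, *det)

rng = np.random.default_rng(0)
img1, img2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
fused = fuse(img1, img2)
```

For higher-order Daubechies wavelets (db2, db4, ...) one would typically use a library such as PyWavelets (`pywt.dwt2` / `pywt.idwt2`) rather than hand-coding the filters.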
5.4 CNN Result Based on DA Using IF with Auto-Encoder (Proposed Approach)

Using the auto-encoder as the image-fusion process for data augmentation yielded a perfect accuracy of 100%. Fig. 12 shows the training and validation accuracy and loss.
Fig. 12. Results of the proposed approach: CNN with DA using IF with the auto-encoder.
Table 3 provides a summary of the most important results.

Table 3. Summary of the most important results.

Method               Accuracy   Validation Accuracy   Loss       Validation Loss   Score
CNN model            0.889      1                     0.0417     0.0342            0.928
Data augmentation    0.9429     1                     0.0243     0.0092            0.9499
Daubechies wavelet   0.97       0.99                  0.0168     0.0027            0.9733
Auto-encoder         1          1                     36×10^−5   19×10^−5          1
6 Conclusion

In the context of brain tumor detection, the presence of over 500 different classes poses a challenge in acquiring sufficient data for each type. To address this issue, a combination of data augmentation (DA) techniques was proposed. Initially, a local database consisting of 10 distinct classes was constructed, and a comparative study was conducted to identify the most suitable convolutional neural network (CNN) architecture for the task. Without any DA techniques, the CNN architecture achieved an accuracy of 92.85%. To further enhance the performance, another comparative study was conducted to determine the optimal parameterization of the automatic data generator, raising the accuracy to approximately 94.99%. Subsequently, image fusion (IF) using the discrete wavelet transform (DWT) was employed as a form of DA, yielding a rate of 97.33%. Finally, the proposed approach incorporated autoencoders as an image-fusion technique for data augmentation, resulting in highly satisfactory results and an excellent accuracy rate. Looking ahead, the perspective is to develop a recognition system for other types of brain tumors using alternative data augmentation techniques, such as generative adversarial networks (GANs), to further improve performance and expand the scope of detection.
Train a Deep Neural Network by Minimizing an Energy Function to Solve Partial Differential Equations: A Review

Idriss Barbara(1,2)(B), Tawfik Masrour(1,2), and Mohammed Hadda(1,2)

1 Moulay Ismail University of Meknès, Meknes, Morocco [email protected]
2 Laboratory of Mathematical Modeling, Simulation and Smart Systems L2M3S, Ecole Nationale Superieure d'Arts et Metiers, ENSAM, Meknes, Morocco {t.masrour,M.Hadda}@ensam.umi.ac.ma
Abstract. The numerical solution of partial differential equations (PDEs) is a crucial component of scientific computing. The idea of using a neural network to approximate PDE solutions is natural given the success of neural networks in many approximation problems. The main idea is to train a neural network to minimize the residual of the differential operator of the PDE, as well as the initial and boundary conditions. In this paper, we give a brief overview of the recent literature on methods based on the minimization of an energy function to solve PDEs.

Keywords: Partial differential equation · neural networks · deep learning · curse of dimensionality · meshfree methods · loss function
1 Introduction
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 272–286, 2023. https://doi.org/10.1007/978-3-031-43520-1_23

Deep learning has been utilized successfully during the last decade in different applications, including natural language processing and computer vision. However, it has not yet been extensively applied in the field of scientific computing, despite its extraordinary success in these and other related fields. Consequently, deep neural networks and statistical learning approaches are now being applied to traditional applied mathematics problems. In this short survey, we aim to present an overview of current advancements in the area of solving partial differential equations (PDEs) by minimizing an energy function using machine learning methods.

According to the universal approximation theorem, a multilayer feedforward neural network with a sufficient number of neurons in the hidden layer can approximate any continuous function with arbitrary accuracy [24,30]. As a result, neural networks are extremely advantageous for function fitting. In [27], Lagaris et al. present a method for solving initial and boundary value problems using artificial neural networks (ANN). They construct a trial solution with two components, the first of which satisfies the initial and boundary conditions,
while the second is a feedforward neural network. They then train the network to solve the differential equation. The methodology performs well according to the experimental results presented, but the training time increases for high-dimensional problems due to the larger training set. Malek et al. in [32] proposed a method to solve ordinary differential equations (ODEs) based on neural networks and optimization techniques.

In the last five years, significant progress has been made in solving PDEs using deep learning methods. These methods are mesh-free approaches, because they use a random sample of points as the training set and take advantage of automatic differentiation, which could break the curse of dimensionality [18,38]. Various researchers use the variational form of the PDE and minimize the corresponding energy functional [20,29,60]. Galerkin-type projections have also been considered, because not all PDEs can be derived from a known functional. Sirignano and Spiliopoulos in [52] proposed the Deep Galerkin Method (DGM), a deep learning approach for solving high-dimensional PDEs, including the Burgers equation and the Hamilton-Jacobi-Bellman (HJB) PDE. Furthermore, physics-informed neural networks (PINNs) were presented by Raissi et al. in [42,43] and subsequently published in [40] as a deep learning method under physical constraints for solving various forms of PDEs. The ability of PINNs to solve inverse problems with minimal modification of the forward-problem code is one of their appealing qualities [13,21,44,55].

As much research is being done in this field, it is impossible to cover it completely. Thus, in this paper, we aim to provide a useful overview of the state of the art of methods based on the minimization of an energy function for solving PDEs.
We also draw the reader's attention to other recent overviews [4,6], which contain many references, especially publications that concentrate on solving PDEs in high dimensions. The rest of this paper is organized as follows. In Sect. 2, after introducing the idea behind the PINN framework, we present its application to two main classes of problems: data-driven solution and data-driven discovery of PDEs. Section 3 is devoted to methods based on the variational formulation to construct the energy function to be minimized, like the DRM, while Sect. 4 provides a summary of the DGM approach, followed by a concluding Sect. 5.
2 Physics-Informed Neural Networks
Physics-informed neural networks (PINNs) are deep learning methods for solving PDE problems with small or zero-labeled data sets [46,54]. Raissi et al. [42,43] constructed a machine learning technique under physical constraints: the physics-informed neural network. PINNs give an approximate solution to the PDEs by training a deep neural network whose loss function consists of PDE residuals and data mismatch. This loss formulation enables physics-informed training that makes use of knowledge from both sparse observation data and the physics equations. With the concept of automatic differentiation, it is convenient to use the weights and biases of the network to calculate the partial derivatives with respect to the space-time coordinates, which are then included in the loss function. This method also benefits from not requiring a mesh and from being applicable to the solution of PDEs regardless of the structure or even the complexity of the equations [34]. Furthermore, compared to data-driven approaches, the training data can be randomly sampled from the domain of interest, and fewer training point samples are needed to avoid the curse of dimensionality. There is no need for interpolation, because the trained neural network is an analytical approximation of the latent solution. PINNs may be used not only to solve the forward problem of approximating PDE solutions, but also the inverse problem of deriving PDE parameters from training data [40,56,62]. Raissi used PINN to solve the 1D Burgers equation and inverse problems of 2D/3D PDEs. The name "PINN" was first introduced in the two-part study [42,43] that was subsequently published in [40]. The Burgers, Schrödinger, and Allen-Cahn equations are used by the authors in the first part [42] to introduce and demonstrate the effectiveness of the PINN methodology for solving nonlinear PDEs. In the second part [43], the authors focus on the data-driven discovery of PDEs, which consists of simultaneously solving a nonlinear PDE and identifying the associated unknown parameters that enter the nonlinear part of the differential equation. This issue has also been investigated within the framework of Gaussian processes in [39,41,48].

We explain how to approximate the solution u : [0, T] × Ω → R of an evolution problem using the PINN method:

∂t u(t, x) + N[u](t, x) = 0,   (t, x) ∈ [0, T] × Ω
u(0, x) = u0(x),               x ∈ Ω
u(t, x) = ub(t, x),            (t, x) ∈ [0, T] × ∂Ω        (1)
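The automatic differentiation mentioned above can be illustrated with forward-mode dual numbers, a minimal, self-contained mechanism for exact derivatives (frameworks like TensorFlow and PyTorch typically use reverse mode instead); the class and examples below are our own sketch.

```python
import math

class Dual:
    """Forward-mode automatic differentiation with dual numbers a + b*eps."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule applied automatically
        return Dual(self.val * o.val, self.val * o.eps + self.eps * o.val)
    __rmul__ = __mul__

def sin(d):
    # chain rule: d/dx sin(u) = cos(u) * u'
    return Dual(math.sin(d.val), math.cos(d.val) * d.eps)

# d/dx [x*x + sin(x)] at x = 2; the exact derivative is 2*2 + cos(2)
x = Dual(2.0, 1.0)        # seed derivative dx/dx = 1
y = x * x + sin(x)        # y.val = f(2), y.eps = f'(2)
```

In a PINN the same mechanism (applied to the network's output with respect to t and x) supplies the partial derivatives that enter the loss function, with no finite-difference grid.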
Depending on the sort of data available, the authors discuss both a time-continuous and a time-discrete method for these two problem settings. In this paper, we present a brief review of the data-driven solution and data-driven discovery of PDEs only in the continuous-time framework, and refer to [40] for the discrete-time variant.

2.1 Data-Driven Solution
According to [42], the method for solving the parabolic PDE (1) is based on the residual of a particular approximation of the exact solution u by a neural network

uθ : [0, T] × Ω → R.        (2)

The class of neural networks used by the authors is that of multilayer feed-forward neural networks, also referred to as multilayer perceptrons, defined as:

uθ(z) := W^L σ^L( W^{L−1} σ^{L−1}( ... σ^1( W^0 z + b^0 ) ... ) + b^{L−1} ) + b^L

where W^l and b^l are the parameters of the neural network (weights and biases), and z = [t, x]^T.
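The forward pass just defined can be sketched in NumPy as follows; the layer sizes and the tanh activation standing in for the σ^ℓ are our own illustrative choices.

```python
import numpy as np

def mlp_forward(z, weights, biases, act=np.tanh):
    """Evaluate u_theta(z) = W^L act(... act(W^0 z + b^0) ...) + b^L."""
    h = z
    for W, b in zip(weights[:-1], biases[:-1]):
        h = act(W @ h + b)                  # hidden layers with activation
    return weights[-1] @ h + biases[-1]     # linear output layer

rng = np.random.default_rng(0)
sizes = [2, 16, 16, 1]   # input z = (t, x), two hidden layers, scalar output
weights = [rng.normal(scale=0.5, size=(m, n))
           for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

u_val = mlp_forward(np.array([0.5, 0.3]), weights, biases)  # u_theta(t=0.5, x=0.3)
```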
The basic idea behind PINN for solving PDEs using a neural network is to formulate the PDE as a loss function for optimization. A process of optimization is necessary for all neural network training [8,16,17,47], such as stochastic gradient descent or the Adam optimizer [26]. Thanks to the automatic differentiation provided by the TensorFlow [1] and PyTorch [36] modules, we can obtain the output's partial derivatives of any order. PINN defines f(t, x) as the left-hand side of Eq. (1):

f := ∂t u + N[u],        (3)

and proceeds by approximating the exact solution with a deep neural network. The parameters θ = {W^1, W^2, ..., W^L, b^1, b^2, ..., b^L} of the neural network can be learned by minimizing the mean squared error J(θ), defined as follows:

J(θ) = MSE_f + MSE_u0 + MSE_ub,        (4)

where

MSE_f = (1/N_f) Σ_{i=1}^{N_f} |f(t_f^i, x_f^i; θ)|²,        (5)

MSE_u0 = (1/N_0) Σ_{i=1}^{N_0} |u(0, x_0^i; θ) − u0(x_0^i)|²,        (6)

MSE_ub = (1/N_b) Σ_{i=1}^{N_b} |u(t_b^i, x_b^i; θ) − ub(t_b^i, x_b^i)|².        (7)

Here, {t_f^i, x_f^i}_{i=1}^{N_f} are the collocation points generated randomly from the interior of Ω, {x_0^i}_{i=1}^{N_0} are the sampling points at t = 0, and {t_b^i, x_b^i}_{i=1}^{N_b} are the sampling points on the boundary. Accordingly, MSE_f penalizes the differential operator of the equation at the collocation points, MSE_u0 is the loss on the initial data, and MSE_ub enforces the satisfaction of the boundary conditions. When the value of J(θ) converges to zero, we arrive at an approximation of the PDE's solution. So, in order to find the optimal parameters θ* of the neural network that approximate the exact solution of the PDE, we solve the following optimization problem:

θ* = arg min_{θ∈V} J(θ),        (8)

where V denotes the parameter space.
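To make the three loss terms (5)-(7) concrete, the hedged sketch below evaluates them for the 1-D heat equation (taking N[u] = −∂xx u in (1) on Ω = (0, 1) with zero boundary data), using analytic derivatives in place of the automatic differentiation a trained network would use; the setup and names are ours, not from [42].

```python
import numpy as np

# Exact solution of u_t - u_xx = 0, u(0, x) = sin(pi x), u = 0 on the boundary
u    = lambda t, x: np.exp(-np.pi**2 * t) * np.sin(np.pi * x)
u_t  = lambda t, x: -np.pi**2 * u(t, x)    # analytic time derivative
u_xx = lambda t, x: -np.pi**2 * u(t, x)    # analytic second space derivative
u0   = lambda x: np.sin(np.pi * x)

rng = np.random.default_rng(0)
t_f, x_f = rng.uniform(0, 1, 100), rng.uniform(0, 1, 100)   # collocation points
x_0 = rng.uniform(0, 1, 50)                                 # points at t = 0
t_b = rng.uniform(0, 1, 50)
x_b = rng.integers(0, 2, 50).astype(float)                  # boundary: x in {0, 1}

def pinn_loss(v, v_t, v_xx):
    mse_f  = np.mean((v_t(t_f, x_f) - v_xx(t_f, x_f))**2)   # residual, Eq. (5)
    mse_u0 = np.mean((v(0.0, x_0) - u0(x_0))**2)            # initial data, Eq. (6)
    mse_ub = np.mean(v(t_b, x_b)**2)                        # zero boundary, Eq. (7)
    return mse_f + mse_u0 + mse_ub

loss_exact = pinn_loss(u, u_t, u_xx)

# A wrong candidate: correct initial/boundary data but a wrong decay rate
v    = lambda t, x: np.exp(-t) * np.sin(np.pi * x)
v_t  = lambda t, x: -v(t, x)
v_xx = lambda t, x: -np.pi**2 * v(t, x)
loss_bad = pinn_loss(v, v_t, v_xx)
```

The exact solution drives J(θ) to (numerically) zero, while the wrong candidate is penalized through the residual term, which is exactly the signal a PINN optimizer follows.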
2.2 Data-Driven Discovery
In [43] the authors focus on the problem of data-driven discovery of PDEs, where the PINN method can be easily modified to determine unknown parameters
in a general nonlinear PDE. To achieve this, consider parameterized nonlinear PDEs of the form

∂t u + N[u, λ] = 0,   x ∈ Ω, t ∈ [0, T],        (9)

where u denotes the unknown solution and N[·, λ] is a nonlinear operator parametrized by λ. This form of PDE models many mathematical physics problems, such as diffusion processes, conservation laws, and advection-diffusion-reaction systems. The authors in [43] proposed the one-dimensional Burgers equation as a motivating example, where N[u, λ] = λ1 u ux − λ2 uxx and λ = (λ1, λ2). The problem of data-driven discovery of PDEs raises the following question: what are the parameters λ that best describe the observed data, when only a small and possibly noisy set of observations of the state u(t, x) of a system is available? The parameter identification setting therefore assumes a set of training data D := {t_u^i, x_u^i, u^i}, where the u^i = u(t_u^i, x_u^i) are possibly noisy observations of the solution of problem (9), in order to identify the unknown parameter λ. Let us define f(t, x) by

f := ∂t u + N[u, λ],        (10)

and then use a deep neural network to approximate u(t, x). The parameters of the neural network and the parameters λ of the differential operator can be learned by minimizing the following mean squared error:

MSE = MSE_u + MSE_f,        (11)

where

MSE_u = (1/N) Σ_{i=1}^{N} |u(t_u^i, x_u^i) − u^i|²,        (12)

and

MSE_f = (1/N) Σ_{i=1}^{N} |f(t_u^i, x_u^i)|².        (13)
Here, MSE_u and MSE_f are the mean squared error losses corresponding, respectively, to the training data on u(t, x) and to the structure imposed by Eq. (9), evaluated at the same number N of collocation points in the training data. Just as the unknown weights W^l and biases b^l, the unknown parameters λ can be learned during training by automatic differentiation of the loss function with respect to λ. PINNs can thus be used to solve nonlinear PDEs. A significant benefit of this method is that it is data-efficient: it does not require a huge number of training samples, which can be challenging to obtain in physical experiments. In fact, no additional knowledge of solution values is needed beyond the initial time and the spatial boundary. We point out that the approach's main focus is on difficult physics aspects like shocks, convection dominance, and other phenomena
rather than on the solution of high-dimensional problems. Another benefit of this method is that the loss function value can be used as a stopping condition during training, because it can be viewed as a gauge of approximation accuracy. We should also keep in mind that all the derivatives needed to build the PINN residual (3) can be generated using the chain rule and evaluated using automatic differentiation. Many recent studies show that physics-informed discrete learning schemes such as convolutional neural networks (CNNs) have faster convergence and improved scalability for modeling PDE systems [5,14,15,45,59,63], due to their flexible architecture and their effective filtering across the computational domain. With regard to systems that do not depend on time (e.g., steady-state PDEs), Zhu and Zabaras in [63,64] used CNNs to construct surrogates of PDE systems and quantify uncertainty (UQ) in a rectangular reference domain. Moreover, Geneva et al. in [14] proposed PhyGeoNet, which transforms the coordinates between the physical and reference domains, adapting to the geometry to solve steady-state PDEs. However, in the case of time-dependent systems, most neural-network-based solutions continue to emphasize data-driven strategies on regular/rectangular grids [28,53] or irregular grids [37,50]. Jagtap et al. in [25] present XPINNs as a generalization of PINNs that involves several neural networks and enables parallelization in space and time via domain decomposition. For more recent methods in domain decomposition and machine learning, see the review [22].
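As a toy illustration of the parameter-identification idea of Sect. 2.2, the sketch below recovers the speed λ of the advection equation ∂t u + λ ∂x u = 0 by least squares on the residual (10), using analytic derivatives of noiseless synthetic data in place of a trained network and automatic differentiation; the setup is ours, not from [43].

```python
import numpy as np

lam_true = 1.7
# Synthetic observations of u(t, x) = sin(x - lam_true * t),
# which satisfies u_t + lam_true * u_x = 0 exactly.
rng = np.random.default_rng(0)
t = rng.uniform(0, 1, 200)
x = rng.uniform(0, 2 * np.pi, 200)
u_t = -lam_true * np.cos(x - lam_true * t)   # analytic time derivative
u_x = np.cos(x - lam_true * t)               # analytic space derivative

# Minimize MSE_f(lam) = mean((u_t + lam * u_x)^2): a 1-D least-squares
# problem with the closed-form solution below.
lam_hat = -np.sum(u_t * u_x) / np.sum(u_x**2)
```

In a real PINN the same minimization runs jointly over λ and the network weights, and noise in the observations turns the exact recovery into an approximate one.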
3 Variational Methods
The mean-square error between the output of the model created by a neural network and the given labeled data is frequently utilized as the loss function in machine learning projects. However, there are various other ways to build the loss function when employing a neural network to solve PDEs. In particular, the loss function can be determined from the Euler-Lagrange equation of a variational problem. The typical example of these methods is the Deep Ritz Method (DRM) proposed by Weinan E and Yu in [60]. In this section, we briefly discuss the approximation properties and definition of the deep neural network (DNN) for variational problems and the formulation of the DRM and its variants.

The DRM is a deep-learning-based method for approximating the PDE solution by solving the associated variational problem. More precisely, the basic idea is to obtain a DNN that minimizes the energy function derived from the variational problem associated with the PDE, while the training data is a set of points chosen randomly over the given domain. As an illustration, we consider the following Poisson equation:

−Δu(x) = f(x),   x ∈ Ω
u(x) = 0,        x ∈ ∂Ω        (14)
where f represents the given external force acting on the system, Ω is a bounded domain in R^d, and d is the dimension of the domain. The energy functional derived from the variational problem associated with (14) is:

J[v] = ∫_Ω ( ½ ∇v(x) · ∇v(x) − v(x) f(x) ) dx,   v ∈ H(Ω)        (15)

Then, the approximate solution u of (14) can be obtained by solving the following optimization problem:

u = arg min_{v ∈ H(Ω)} J[v].        (16)

Here H is the set of admissible functions, of infinite dimension. Using neural networks uθ(x) as trial functions, however, makes H a finite-dimensional space. The goal, therefore, is to find the optimal set of parameters θ in the neural network so as to minimize the following energy function:

J(θ) = ∫_Ω ( ½ |∇ûθ(x)|² − f(x) ûθ(x) ) dx.        (17)

In general, the above function (17) is non-convex, and, due to the possibility of local minima, solving a non-convex problem is nontrivial, which presents a challenge for DNN-based methods for solving PDEs. The SGD method [7] is typically used to tackle this problem, since the parameter space of θ has a relatively large dimension. To approximate the objective function (17), the authors in [60] use a Monte Carlo quadrature scheme to discretize the integral:

J(θ) ≈ (1/N) Σ_{i=1}^{N} ( ½ |∇ûθ(x^i)|² − f(x^i) ûθ(x^i) ),        (18)

with N sampling points {x^i}_{i=1}^{N} chosen randomly over Ω. The neural network used in this approach consists of several blocks; each block contains two linear transformations, two activation functions, and one residual connection, as shown in Fig. 1. The i-th block can be expressed as

y = f_i(x) = σ( θ_{i,2} · σ( θ_{i,1} · x + b_{i,1} ) + b_{i,2} ) + x,        (19)

where x is the input, y is the output, θ_{i,1}, θ_{i,2}, ..., b_{i,1}, b_{i,2}, ... are the parameters of the network (weights and biases), and σ is the activation function. The authors choose the following activation function to balance simplicity and accuracy:

σ(x) = max{x³, 0},        (20)

and the fully connected n-layer network can be expressed as:

f_θ(x) = f_n ∘ f_{n−1} ∘ ··· ∘ f_1(x).        (21)
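A minimal NumPy illustration of the Monte Carlo energy (18): on the 1-D Poisson problem −u'' = π² sin(πx) on Ω = (0, 1), whose exact solution is u(x) = sin(πx), we use trial functions v_a(x) = a sin(πx) in place of a network, so the sampled energy should be minimized near a = 1. The setup (and the analytic gradient of the trial function) is ours, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 1.0, 100_000)          # random sampling points in Omega

f = lambda x: np.pi**2 * np.sin(np.pi * x)   # source term of -u'' = f

def energy(a):
    """Monte Carlo estimate of J(a) = integral of 0.5 |v_a'|^2 - f v_a
    over (0, 1), for the trial function v_a(x) = a sin(pi x)."""
    grad = a * np.pi * np.cos(np.pi * xs)    # v_a'(x), known analytically here
    return np.mean(0.5 * grad**2 - f(xs) * a * np.sin(np.pi * xs))

grid = np.round(np.arange(0.5, 1.51, 0.05), 2)
best_a = grid[np.argmin([energy(a) for a in grid])]
```

In the actual DRM, a is replaced by the network parameters θ, the gradient ∇ûθ comes from automatic differentiation, and SGD performs the minimization over fresh random samples at each step.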
Fig. 1. Network Structure of Deep Ritz Method
The boundary conditions are weakly imposed by adding a penalty term to the energy functional (17):

J(θ) = ∫_Ω ( ½ |∇ûθ(x)|² − f(x) ûθ(x) ) dx + β ∫_∂Ω ûθ(x)² dx,        (22)

in the case of the homogeneous Dirichlet problem (14). The penalty parameter β is utilized to enforce the boundary conditions; we refer the reader to [11,58,60] for further details. Liao and Ming give an extension of the DRM approach in [29] named the Deep Nitsche Method (DNM). The approach is based on Nitsche's energy formulation [35], which omits the use of a Lagrange multiplier. Chen et al. propose in [12] to use quasi-Monte Carlo sampling instead of the Monte Carlo method to approximate the integral in the loss function. For periodic boundary conditions, a specific DNN can be constructed that automatically satisfies the boundary condition [19]. In order to effectively solve a class of second-order boundary-value problems on complex geometries, Sheng and Yang in [51] introduce PFNN, a penalty-free neural network technique. The original problem is reformulated in a weak form in order to avoid evaluating high-order derivatives, which lowers the smoothness requirement. The approximate solution is constructed using two
neural networks rather than one: the first satisfies the boundary conditions, and the second handles the remaining portion of the domain. In the same vein of methods for irregular domains, the weak adversarial networks proposed by Zang et al. in [61] approximate the trial and test functions using two neural networks and train them alternately, as an operator-norm minimization problem, after formulating the PDE as a saddle-point problem in the weak formulation.
4 Deep Galerkin Method
The Deep Galerkin Method (DGM) is a deep-learning-based algorithm for solving high-dimensional PDEs proposed in [52] by Sirignano and Spiliopoulos. It is worth mentioning that, although DGM is named after Galerkin, it is not a Galerkin method from the standpoint of numerical PDEs: DGM uses a DNN instead of a linear combination of basis functions. The main application of DGM is to solve nonlinear second-order parabolic equations. Like the PINN method reviewed in Sect. 2, the DGM uses a deep neural network to approximate the PDE solution. The network is trained by minimizing a loss function defined as the residual of the strong form in the least-squares sense. As with the previous mesh-free methods, the training data consists of a set of points sampled randomly from the different regions in which the PDE is defined. In this section, we present a brief review of the DGM approach; for more discussion of DGM and its applications, we refer the reader to [3,52]. Consider the general form of a PDE as follows:

∂t u(t, x) + Lu(t, x) = 0,   (t, x) ∈ [0, T] × Ω
u(0, x) = u0(x),             x ∈ Ω
u(t, x) = g(t, x),           (t, x) ∈ [0, T] × ∂Ω        (23)
where \mathcal{L} is a differential operator and u is the unknown function defined on the region [0,T] \times \Omega, with \Omega \subset \mathbb{R}^d and \partial\Omega the boundary of \Omega. The main goal is to approximate the unknown function u with a deep neural network f(t,x;\theta) of parameters \theta. The optimal parameters of the neural network f(t,x;\theta) which approximates the exact solution u can be found by minimizing the following L^2 error:

J(f) = \|\partial_t f + \mathcal{L}f\|^2_{2,[0,T]\times\Omega} + \|f - g\|^2_{2,[0,T]\times\partial\Omega} + \|f(0,\cdot) - u_0\|^2_{2,\Omega}.   (24)
The loss functional J(f) consists of three parts that measure how well the approximate solution f satisfies the differential operator of the PDE, the initial condition, and the boundary condition. Notably, J(f) can be evaluated directly from the PDE for any approximation f without prior knowledge of the actual solution u. The norm used in this method is defined as \|h(y)\|^2_{Y,\nu} = \int_Y |h(y)|^2 \nu(y)\,dy, where \nu is a density defined on the region Y. The objective is to identify a set of parameters \theta such that the function f(t,x;\theta) minimizes the positive error J(f), i.e., to find \theta for which J(f(\cdot;\theta))
Train a DNN by Minimizing an Energy Function to Solve PDEs: A Review
281
approaches 0 as much as possible. Stochastic gradient descent (SGD) is therefore used to minimize the loss functional J(f) on a set of time and space points sampled randomly from \Omega and \partial\Omega, which makes the method mesh-free. For a fuller description of the DGM algorithm, we refer to [3,52]. Like the methods discussed before, the DGM algorithm may only converge to a local minimum due to the non-convexity of f(t,\cdot;\theta); SGD, the core component of almost all methods for training deep learning models, has nevertheless proven quite effective in practice. The architecture of the deep neural network adopted by Sirignano and Spiliopoulos [52] is similar to long short-term memory networks (LSTM) [23]. It consists of L layers called DGM layers; each DGM layer takes as input a mini-batch x = (t,x) of randomly sampled time-space points together with the output of the previous DGM layer. The architecture proposed by the authors is given by:

S^1 = \sigma(W^1 x + b^1),
Z^\ell = \sigma(U^{z,\ell} x + W^{z,\ell} S^\ell + b^{z,\ell}), \quad \ell = 1,\dots,L,
G^\ell = \sigma(U^{g,\ell} x + W^{g,\ell} S^1 + b^{g,\ell}), \quad \ell = 1,\dots,L,
R^\ell = \sigma(U^{r,\ell} x + W^{r,\ell} S^\ell + b^{r,\ell}), \quad \ell = 1,\dots,L,
H^\ell = \sigma(U^{h,\ell} x + W^{h,\ell}(S^\ell \odot R^\ell) + b^{h,\ell}), \quad \ell = 1,\dots,L,
S^{\ell+1} = (1 - G^\ell) \odot H^\ell + Z^\ell \odot S^\ell, \quad \ell = 1,\dots,L,
f(t,x;\theta) = W S^{L+1} + b,   (25)

where the U's, W's and b's are the parameters of the network, L is the total number of DGM layers, \sigma is the activation function, and \odot denotes the Hadamard (element-wise) product. To illustrate the rather involved architecture (25), we use the following figures: Fig. 2 represents the architecture of a single DGM layer, while Fig. 3 visualizes the whole network consisting of several DGM layers. As mentioned by Sirignano and Spiliopoulos [52], the architecture of a neural network plays a very important role in the performance of the method, and a good choice of architecture is one that takes advantage of a priori knowledge about the application.
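The forward pass of Eq. (25) can be sketched in NumPy with randomly initialized weights. This is a minimal illustration, not the authors' implementation: the widths, depth, and tanh activation are our own choices, and the gate G is mixed with the first hidden state S^1 as in the original paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.tanh  # illustrative activation choice

d_in, d_h, L, batch = 3, 16, 2, 5   # input dim (t, x), hidden width, DGM layers

def init(shape):
    return rng.normal(scale=0.3, size=shape)

# Parameters: one input layer, L DGM layers (each with Z, G, R, H gates), output layer.
W1, b1 = init((d_in, d_h)), np.zeros(d_h)
gates = [{k: (init((d_in, d_h)), init((d_h, d_h)), np.zeros(d_h))
          for k in "zgrh"} for _ in range(L)]
Wout, bout = init((d_h, 1)), 0.0

def dgm_forward(x):
    """Forward pass of Eq. (25); x has shape (batch, d_in) = sampled (t, x) points."""
    S1 = S = sigma(x @ W1 + b1)
    for layer in gates:
        (Uz, Wz, bz), (Ug, Wg, bg) = layer["z"], layer["g"]
        (Ur, Wr, br), (Uh, Wh, bh) = layer["r"], layer["h"]
        Z = sigma(x @ Uz + S @ Wz + bz)
        G = sigma(x @ Ug + S1 @ Wg + bg)     # G mixes with the first state S^1
        R = sigma(x @ Ur + S @ Wr + br)
        H = sigma(x @ Uh + (S * R) @ Wh + bh)
        S = (1 - G) * H + Z * S              # Hadamard products, LSTM-style gating
    return S @ Wout + bout

out = dgm_forward(rng.normal(size=(batch, d_in)))
print(out.shape)  # (5, 1)
```

In training, the output of `dgm_forward` at sampled interior, boundary, and initial points would be plugged into the three terms of the loss (24) and minimized by SGD.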
The authors tested the DGM algorithm on a class of high-dimensional (up to 200 dimensions) free-boundary PDEs, and also present results for the Burgers' equation and a Hamilton-Jacobi-Bellman PDE in high dimensions. The DGM approach has been used in several works in the literature; to name a few, we refer the interested reader to [2,9,10,31,33,49,57]. The DGM algorithm can easily be adjusted to solve elliptic, hyperbolic, and partial integro-differential equations, and essentially keeps the same form for these other types of PDEs. However, the authors of the original paper prove the approximation power of this approach only for a class of quasilinear parabolic PDEs, so the numerical performance for these other types of PDEs remains to be investigated.
Fig. 2. The single DGM layer
Fig. 3. Network structure of the Deep Galerkin Method
5 Conclusion
In this short review, we presented the state of the art of methods that solve PDEs by training a deep neural network to minimize an energy function. PINNs (Sect. 2) have the potential to address pertinent issues, particularly with regard to inverse problems
and data assimilation; however, PINNs still face open problems, such as how to properly incorporate physical information into neural networks. The DRM method, as representative of the methods based on the variational formulation (Sect. 3), was developed to solve the Poisson equation and eigenvalue problems. The approximation performance of the DGM approach (Sect. 4) was demonstrated for a class of quasilinear parabolic PDEs, and the method can easily be modified to apply to other types of PDEs; its performance for those PDEs, however, remains to be studied. We believe that these developments will pave the way for an exciting future for this area of research: the methods discussed here open the door to finding solutions to a variety of practical problems that were until recently intractable. So far, however, we lack a rigorous mathematical analysis of the approximation capabilities of these methods. In particular, despite the result of Sirignano and Spiliopoulos [52] on the ability of neural networks to approximate a class of quasilinear parabolic PDEs, there is still a long way to go towards a complete mathematical theory.
References
1. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015)
2. Al-Aradi, A., Correia, A., Jardim, G., de Freitas Naiff, D., Saporito, Y.: Extensions of the deep Galerkin method. Appl. Math. Comput. 430, 127287 (2022)
3. Al-Aradi, A., Correia, A., Naiff, D., Jardim, G., Saporito, Y.: Solving nonlinear and high-dimensional partial differential equations via deep learning. arXiv preprint arXiv:1811.08782 (2018)
4. Beck, C., Hutzenthaler, M., Jentzen, A., Kuckuck, B.: An overview on deep learning-based approximation methods for partial differential equations. arXiv preprint arXiv:2012.12348 (2020)
5. Bhatnagar, S., Afshar, Y., Pan, S., Duraisamy, K., Kaushik, S.: Prediction of aerodynamic flow fields using convolutional neural networks. Comput. Mech. 64(2), 525–545 (2019)
6. Blechschmidt, J., Ernst, O.G.: Three ways to solve partial differential equations with neural networks - a review. GAMM-Mitteilungen 44(2), e202100006 (2021)
7. Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Lechevallier, Y., Saporta, G. (eds.) Proceedings of COMPSTAT'2010, pp. 177–186. Physica-Verlag HD (2010). https://doi.org/10.1007/978-3-7908-2604-3_16
8. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
9. Carmona, R., Laurière, M.: Convergence analysis of machine learning algorithms for the numerical solution of mean field control and games I: the ergodic case. SIAM J. Numer. Anal. 59(3), 1455–1485 (2021)
10. Carmona, R., Laurière, M.: Convergence analysis of machine learning algorithms for the numerical solution of mean field control and games II: the finite horizon case. arXiv preprint arXiv:1908.01613 (2019)
11. Chen, J., Du, R., Wu, K.: A comparison study of deep Galerkin method and deep Ritz method for elliptic problems with different boundary conditions. arXiv preprint arXiv:2005.04554 (2020)
12. Chen, J., Du, R., Li, P., Lyu, L.: Quasi-Monte Carlo sampling for machine-learning partial differential equations. arXiv preprint arXiv:1911.01612 (2019)
13. Chen, Y., Lu, L., Karniadakis, G.E., Dal Negro, L.: Physics-informed neural networks for inverse problems in nano-optics and metamaterials. Opt. Express 28(8), 11618–11633 (2020)
14. Gao, H., Sun, L., Wang, J.-X.: PhyGeoNet: physics-informed geometry-adaptive convolutional neural networks for solving parameterized steady-state PDEs on irregular domain. J. Comput. Phys. 428, 110079 (2021)
15. Geneva, N., Zabaras, N.: Modeling the dynamics of PDE systems with physics-constrained deep auto-regressive networks. J. Comput. Phys. 403, 109056 (2020)
16. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, pp. 249–256 (2010)
17. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
18. Grohs, P., Hornung, F., Jentzen, A., Von Wurstemberger, P.: A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations. arXiv preprint arXiv:1809.02362 (2018)
19. Han, J., Lu, J., Zhou, M.: Solving high-dimensional eigenvalue problems using deep neural networks: a diffusion Monte Carlo like approach. J. Comput. Phys. 423, 109792 (2020)
20. He, J., Li, L., Xu, J., Zheng, C.: ReLU deep neural networks and linear finite elements. arXiv preprint arXiv:1807.03973 (2018)
21. He, Q., Barajas-Solano, D., Tartakovsky, G., Tartakovsky, A.M.: Physics-informed neural networks for multiphysics data assimilation with application to subsurface transport. Adv. Water Resour. 141, 103610 (2020)
22. Heinlein, A., Klawonn, A., Lanser, M., Weber, J.: Combining machine learning and domain decomposition methods for the solution of partial differential equations - a review. GAMM-Mitteilungen 44(1), e202100001 (2021)
23. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
24. Hornik, K., Stinchcombe, M., White, H.: Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw. 3(5), 551–560 (1990)
25. Jagtap, A.D., Kawaguchi, K., Karniadakis, G.E.: Adaptive activation functions accelerate convergence in deep and physics-informed neural networks. J. Comput. Phys. 404, 109136 (2020)
26. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
27. Lagaris, I.E., Likas, A., Fotiadis, D.I.: Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural Netw. 9(5), 987–1000 (1998)
28. Li, Z., et al.: Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895 (2020)
29. Liao, Y., Ming, P.: Deep Nitsche method: deep Ritz method with essential boundary conditions. arXiv preprint arXiv:1912.01309 (2019)
30. Lu, Y., Lu, J.: A universal approximation theorem of deep neural networks for expressing probability distributions. In: Advances in Neural Information Processing Systems, vol. 33, pp. 3094–3105 (2020)
31. Lyu, L., Zhang, Z., Chen, M., Chen, J.: MIM: a deep mixed residual method for solving high-order partial differential equations. J. Comput. Phys. 452, 110930 (2022)
32. Malek, A., Beidokhti, R.S.: Numerical solution for high order differential equations using a hybrid neural network-optimization method. Appl. Math. Comput. 183(1), 260–271 (2006)
33. Matsumoto, M.: Application of deep Galerkin method to solve compressible Navier-Stokes equations. Trans. Japan Soc. Aeronaut. Space Sci. 64(6), 348–357 (2021)
34. Meng, X., Li, Z., Zhang, D., Karniadakis, G.E.: PPINN: parareal physics-informed neural network for time-dependent PDEs. Comput. Methods Appl. Mech. Eng. 370, 113250 (2020)
35. Nitsche, J.: Über ein Variationsprinzip zur Lösung von Dirichlet-Problemen bei Verwendung von Teilräumen, die keinen Randbedingungen unterworfen sind. Abh. Math. Semin. Univ. Hambg. 36(1), 9–15 (1971). https://doi.org/10.1007/BF02995904
36. Paszke, A., et al.: Automatic differentiation in PyTorch (2017)
37. Pfaff, T., Fortunato, M., Sanchez-Gonzalez, A., Battaglia, P.W.: Learning mesh-based simulation with graph networks. arXiv preprint arXiv:2010.03409 (2020)
38. Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B., Liao, Q.: Why and when can deep (but not shallow) networks avoid the curse of dimensionality: a review. Int. J. Autom. Comput. 14(5), 503–519 (2017)
39. Raissi, M., Karniadakis, G.E.: Hidden physics models: machine learning of nonlinear partial differential equations. J. Comput. Phys. 357, 125–141 (2018)
40. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)
41. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Machine learning of linear differential equations using Gaussian processes. J. Comput. Phys. 348, 683–693 (2017)
42. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics informed deep learning (part I): data-driven solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561 (2017)
43. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics informed deep learning (part II): data-driven discovery of nonlinear partial differential equations. arXiv preprint arXiv:1711.10566 (2017)
44. Raissi, M., Yazdani, A., Karniadakis, G.E.: Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367(6481), 1026–1030 (2020)
45. Ranade, R., Hill, C., Pathak, J.: DiscretizationNet: a machine-learning based solver for Navier-Stokes equations using finite volume discretization. Comput. Methods Appl. Mech. Eng. 378, 113722 (2021)
46. Rao, C., Sun, H., Liu, Y.: Physics-informed deep learning for computational elastodynamics without labeled data. arXiv preprint arXiv:2006.08472 (2020)
47. Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016)
48. Rudy, S.H., Brunton, S.L., Proctor, J.L., Kutz, J.N.: Data-driven discovery of partial differential equations. Sci. Adv. 3(4), e1602614 (2017)
49. Saporito, Y.F., Zhang, Z.: Path-dependent deep Galerkin method: a neural network approach to solve path-dependent partial differential equations. SIAM J. Financ. Math. 12(3), 912–940 (2021)
50. Seo, S., Meng, C., Liu, Y.: Physics-aware difference graph networks for sparsely-observed dynamics. In: International Conference on Learning Representations (2019)
51. Sheng, H., Yang, C.: PFNN: a penalty-free neural network method for solving a class of second-order boundary-value problems on complex geometries. J. Comput. Phys. 428, 110085 (2021)
52. Sirignano, J., Spiliopoulos, K.: DGM: a deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375, 1339–1364 (2018)
53. Sorteberg, W.E., Garasto, S., Pouplin, A.S., Cantwell, C.D., Bharath, A.A.: Approximating the solution to wave propagation using deep neural networks. arXiv preprint arXiv:1812.01609 (2018)
54. Sun, L., Gao, H., Pan, S., Wang, J.-X.: Surrogate modeling for fluid flows based on physics-constrained deep learning without simulation data. Comput. Methods Appl. Mech. Eng. 361, 112732 (2020)
55. Tartakovsky, A.M., Marrero, C.O., Perdikaris, P., Tartakovsky, G.D., Barajas-Solano, D.: Learning parameters and constitutive relationships with physics informed deep neural networks. arXiv preprint arXiv:1808.03398 (2018)
56. Tipireddy, R., Perdikaris, P., Stinis, P., Tartakovsky, A.: A comparative study of physics-informed neural network models for learning unknown dynamics and constitutive relations. arXiv preprint arXiv:1904.04058 (2019)
57. Vergunova, I., Vergunov, V., Rosemann, I.: Solving the coefficient inverse problem by the deep Galerkin method. In: 2021 11th International Conference on Advanced Computer Information Technologies (ACIT), pp. 65–70. IEEE (2021)
58. Wang, Z., Zhang, Z.: A mesh-free method for interface problems using the deep learning approach. J. Comput. Phys. 400, 108963 (2020)
59. Winovich, N., Ramani, K., Lin, G.: ConvPDE-UQ: convolutional neural networks with quantified uncertainty for heterogeneous elliptic partial differential equations on varied domains. J. Comput. Phys. 394, 263–279 (2019)
60. Yu, B., et al.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6(1), 1–12 (2018)
61. Zang, Y., Bao, G., Ye, X., Zhou, H.: Weak adversarial networks for high-dimensional partial differential equations. J. Comput. Phys. 411, 109409 (2020)
62. Zhang, D., Guo, L., Karniadakis, G.E.: Learning in modal space: solving time-dependent stochastic PDEs using physics-informed neural networks. SIAM J. Sci. Comput. 42(2), A639–A665 (2020)
63. Zhu, Y., Zabaras, N.: Bayesian deep convolutional encoder-decoder networks for surrogate modeling and uncertainty quantification. J. Comput. Phys. 366, 415–447 (2018)
64. Zhu, Y., Zabaras, N., Koutsourelakis, P.-S., Perdikaris, P.: Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. J. Comput. Phys. 394, 56–81 (2019)
Link Prediction Using Graph Neural Networks for Recommendation Systems
Safae Hmaidi(B), Imran Baali, Mohamed Lazaar, and Yasser El Madani El Alami
ENSIAS, Mohammed V University in Rabat, Rabat, Morocco
[email protected]
Abstract. In real-world applications such as recommendation systems, link prediction is a difficult problem that aims at anticipating unobserved links between distinct objects by learning from networked structured data. In this study, we present a graph convolutional neural network (GCNN) model to address this problem, simultaneously using the interaction relationships and the content information of the various items. In contrast to existing strategies that directly concatenate interaction and content information into a single view, the proposed GCN increases prediction accuracy by constraining the consistency of the graph embeddings across multiple views. We report experimental results on three datasets (Facebook, Google+, and Twitter) using various hyperparameters.

Keywords: Recommendation systems · Link prediction · Graph convolutional neural network
1 Introduction
Many e-commerce companies provide a diverse selection of items to their customers. Users frequently purchase things depending on their wants and preferences. Providing the most relevant items to consumers expedites the purchasing process and increases user satisfaction. Improved customer satisfaction keeps users loyal to the website, increasing sales and ultimately revenues for businesses. As a result, more shops have begun promoting items to customers, necessitating an efficient analysis of user preferences. Amazon and Netflix, for example, employ recommender systems to recommend items to their customers. Standard recommendation algorithms, such as content-based filtering and collaborative filtering, model user ratings of items as a matrix and predict customer ratings of unrated products based on user/item similarity [1]. Recommender systems suggest products and services such as movies, music, books, websites, news, jokes, and restaurants. The recommendation process uses data from the user's characteristics, product descriptions, and previous purchasing, rating, and viewing behavior [1]. The data can be gathered indirectly by keeping track of previous user activity, such as songs listened to on music websites, news/movies watched on news/movie websites, goods bought on e-commerce websites, or books read on book-listing websites [1].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 287–298, 2023. https://doi.org/10.1007/978-3-031-43520-1_24
288
S. Hmaidi et al.
A common use of link prediction in heterogeneous networks is the recommender system. In a network represented as a graph, where nodes represent users and edges reflect interactions between users, the link prediction problem predicts future links. In recommender systems, both users and items can be nodes; a user buying an item can be represented as an edge between the user node and the item node. By viewing recommendation as the task of selecting unknown links for each user node, it can be approached as a link prediction problem [1]. In recent years, there has been an increase in research into graph neural networks, which has resulted in a significant increase in performance on tasks involving graph-structured data; this is critical for recommender applications. Graph convolutional networks (GCNs) are among the best-known techniques [2, 3]. GCNs are built on the idea of creating a machine-learning model by iteratively merging feature information about the graph structure and the structure of a node's immediate graph neighborhood. The goal is to learn a mapping that embeds nodes as points in a low-dimensional vector space R^d. The primary contribution of representation-learning techniques is a method to represent, or encode, a graph structure such that geometric relationships in the embedding space correspond to the original network topology. The GCN has to work with the Laplacian eigenbasis, which is time-consuming on large graphs [3]. In this work, we apply various link prediction measures in the context of recommender systems.
We apply a graph convolutional network and choose three datasets for experimentation: Facebook, Twitter, and Google+. The first section of the paper discusses link prediction and its methods, the second presents the methodology, and the final section describes the experiments.
2 Link Prediction
People in social networks are linked with others in order to communicate. A social network is a graph representation in which users are nodes connected by some sort of edge. The network graph may contain both directed and undirected relationships: a link is directed if the relationship runs from a source to a destination, and undirected otherwise. Link prediction is the current trend in analyzing social networks and predicting future linkages, i.e., incoming friends who are known or unknown in a user's shared-friends list. Many social network services, such as Facebook, WeChat, and LinkedIn, are employed to predict relationships [4]. The social network evolves over time, which is referred to as a temporal network. Predicting a link in a large temporal network is still a difficult problem with many recorded parameter responses. Different link prediction algorithms are used to compute node similarity, forecast a missing link, recognize a collection of objects, and calculate node proximity. Many early efforts were devoted to establishing systems or tools to anticipate
future link states based on previous data using state-of-the-art methodologies that have certain limitations, such as capacity and processing issues [5].

2.1 Link Prediction Methods
• Neighbor-Based Metrics
A neighbor-based technique is utilized to determine the similarity between two nodes.
• Common Neighbors (CN): This approach is predicated on the notion that two nodes with a large number of shared neighbors will eventually be connected: the more neighbors both users have in common, the more likely they are to become friends [1, 6]. Because it is straightforward and logical, the common-neighbors strategy is frequently used as a benchmark to assess the performance of other approaches. The complexity of this method is O(NK^2). It is calculated as follows:

CN(x, y) = |\Gamma(x) \cap \Gamma(y)|,   (1)

where \Gamma(x) denotes the set of neighbors of node x.
• Jaccard Coefficient (JC): The Jaccard coefficient, also known as the Jaccard index or Jaccard similarity coefficient, is a statistic that assesses how similar two sample sets are to one another. It is commonly written as J(x, y), where x and y stand for two separate network nodes [1, 7]. In link prediction, each node's neighbors are considered as a set, and the prediction is made by calculating and ranking how similar the neighbor sets of each node pair are. This method is based on the common-neighbors method and has a complexity of O(NK^2). It is mathematically expressed as:

JC(x, y) = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|}.   (2)

• Adamic-Adar Coefficient (AA): Its original objective was to evaluate links between personal homepages. As indicated in (3), the more neighbors a shared neighbor z has, the lower its contribution to the score. As a result, a common neighbor with few neighbors contributes more to the Adamic/Adar (AA) score than one with numerous relationships [1, 8]. In a real-world social network, it may be read as follows: if a mutual acquaintance of two individuals has many friends, he is less likely to introduce the two people to each other than if he has few friends. It predicts friendship well on the personal-homepage and Wikipedia cooperation graphs but badly on author collaboration. It is yet another strategy based on common neighbors; the complexity is also O(NK^2):

AA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log|\Gamma(z)|},   (3)

where z is a shared neighbor of nodes x and y.
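The three neighbor-based scores (1)-(3) can be computed directly from an adjacency list; the small friendship graph below is a hypothetical example of ours.

```python
import math

# Toy undirected graph as an adjacency list (hypothetical friendship network).
G = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3, 5}, 5: {4}}

def common_neighbors(x, y):          # Eq. (1)
    return len(G[x] & G[y])

def jaccard(x, y):                   # Eq. (2)
    return len(G[x] & G[y]) / len(G[x] | G[y])

def adamic_adar(x, y):               # Eq. (3): low-degree common neighbors weigh more
    return sum(1.0 / math.log(len(G[z])) for z in G[x] & G[y])

print(common_neighbors(2, 4))        # nodes 1 and 3 are shared -> 2
print(round(jaccard(2, 4), 2))
print(round(adamic_adar(2, 4), 3))
```

Ranking all non-adjacent pairs by one of these scores and proposing the top-ranked pairs as future links is the basic neighbor-based link predictor.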
290
S. Hmaidi et al.
• Path-Based Metrics
The network's node sequence is linked to the network's link sequence. As a result, node and neighbor information is required in this technique to determine node similarity.
• Shortest Distance (SD): The most similar pair of nodes is the one at the shortest distance: SD(x, y) = length(shortest_path(x, y)).
• Local Path (LP): Using the adjacency matrix A, paths of lengths 2 and 3 are counted by A^2 and A^3. With a neighboring factor \alpha between −1 and 1:

LP = A^2 + \alpha A^3.   (4)

• Katz (KA): This approach scores long paths with low weight. It is defined over the collection of all paths between x and y, with a length decay factor \beta between 0 and 1 (0 < \beta < 1):

Katz(x, y) = \sum_{l=1}^{\infty} \beta^l \, |paths^{\langle l \rangle}_{x,y}| = \beta A + \beta^2 A^2 + \beta^3 A^3 + \dots   (5)
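When \beta is smaller than the reciprocal of the largest eigenvalue of A, the Katz series (5) converges and can be summed in closed form as (I − \beta A)^{-1} − I. The four-node path graph below is a hypothetical example of ours.

```python
import numpy as np

# Adjacency matrix of a small path graph 0-1-2-3 (hypothetical example).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# The series in Eq. (5) converges when beta < 1 / lambda_max(A).
lam_max = np.max(np.abs(np.linalg.eigvals(A)))
beta = 0.1
assert beta < 1.0 / lam_max

# Closed form of the full series: sum_{l>=1} beta^l A^l = (I - beta*A)^{-1} - I.
katz = np.linalg.inv(np.eye(4) - beta * A) - np.eye(4)

# Truncating the series at l = 3 already approximates it closely for small beta.
truncated = beta * A + beta**2 * (A @ A) + beta**3 * (A @ A @ A)
print(np.round(katz[0, 3], 4))   # score for the most distant pair
```

Because each extra hop multiplies a path's weight by \beta, adjacent pairs dominate the scores while distant pairs receive small but nonzero values.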
• Random-Walk-Based Metrics: The random walk is a common similarity-based method for link prediction that detects similarities between nodes by randomly traversing the graph in global and quasi-local ways. In the random walk with restart algorithm, the walker starts from an initial vertex and at each step moves to one of the current vertex's neighbors with probability c, or returns to the initial vertex with probability (1 − c) [9]. The value of this index for a pair (i, j) is the probability that a random walker starting at vertex i ends up at vertex j in the equilibrium state [10].
3 Methodology
In this part, we discuss relevant work on link prediction and recent developments in graph convolutional networks. Link prediction was initially formalized by Liben-Nowell and Kleinberg [11], who employed several graph proximity metrics to forecast co-authorship networks. Since then, this difficult problem has been extensively researched and successfully applied to a variety of disciplines. Many link prediction algorithms have been proposed based on the evolution mechanisms of complex networks. Topology-based similarity measures have been particularly successful due to their excellent performance and minimal complexity [12]. They are divided into three types: (1) neighbor-based metrics such as common neighbors (CN) [13], Adamic-Adar (AA) [14], and preferential attachment (PA); (2) path-based metrics such as Katz [15], local path (LP), and PopFlow [16]; and
(3) pattern-based metrics [17]. When considering the quantity of resources in the information transmission process between nodes, resource-allocation-based prediction algorithms perform quite well; they, however, disregard the information paths and their information capacity during transmission between the two ends. Inspired by the Cannikin Law, Li et al. [10] proposed the potential information capacity (PIC) index, which takes into account both the number of paths and the volume of information these paths can transport, achieving high link prediction performance.
Graph convolutional networks. Recently, there has been a rise of interest in generalizing convolutions to the graph domain, which has exhibited considerable benefits in a range of applications such as network embedding [18, 19]. A graph convolution layer extracts local substructure properties for individual nodes, whereas a graph aggregation layer aggregates node-level features into a graph-level feature vector. Advances in this field are typically classed as spatial methods or spectral techniques. In spatial techniques, convolutions are defined directly on the graph using spatially close neighbors. Monti et al. [20] suggested a non-Euclidean spatial-domain model (MoNet) that generalizes numerous earlier approaches; the Geodesic CNN (GCNN) [19] and Anisotropic CNN (ACNN) [21] on manifolds, or DCNN [22] on graphs, can be derived as special cases of MoNet. Bruna et al. [19] were the first to propose convolution for graph data in the spectral domain using the graph Laplacian matrix, which functions similarly to the Fourier basis in signal processing. Bruna's solution, however, may require extensive computation and can be impractical in reality. Defferrard et al. [23] presented ChebNet to tackle the efficiency problem, defining a filter as Chebyshev polynomials of the diagonal matrix of eigenvalues.
Instead of conducting the eigen-decomposition, Kipf and Welling [23] simplified the filtering even further by employing ChebNet's first-order approximation. The graph convolution defined by 1stChebNet is spatially localized, bridging the gap between spectral- and spatial-based approaches. Huang et al. [19] suggested an adaptive layer-wise sampling strategy to speed up 1stChebNet training. By sampling neighborhoods, Chen et al. [23] reduced the receptive-field size of the graph convolution to an arbitrarily small scale.

3.1 Graph Convolutional Network
Given a network's adjacency matrix A \in \{0,1\}^{n \times n} with n nodes and a node feature matrix X \in \mathbb{R}^{n \times c} with a c-dimensional feature vector for each node, we assume a multi-layer graph convolutional network with the following layer-wise propagation rule:

H^{(l+1)} = \sigma(\tilde{D}^{-1} \tilde{A} H^{(l)} W^{(l)}),   (6)

where \tilde{A} = A + I_N is the adjacency matrix of the network G with self-loops added, I_N is the identity matrix, \tilde{D} is a diagonal degree matrix with \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}, W^{(l)} \in \mathbb{R}^{c \times c'} is a matrix of trainable graph convolution parameters, and H^{(l)} \in \mathbb{R}^{n \times c} are the network embeddings in the l-th layer, with H^{(0)} = X. The mechanism behind Eq. (6) is that the starting node states X are first transformed linearly by multiplying by W^{(0)}, and then transmitted to adjacent nodes through the propagation matrix \tilde{D}^{-1}\tilde{A}. After graph convolution, the i-th row of H^{(l)} becomes:

H_i^{(l)} = \sigma\left( \frac{1}{|\Gamma(i)| + 1} \left( x_i W^{(0)} + \sum_{j \in \Gamma(i)} x_j W^{(0)} \right) \right).   (7)
It aggregates node information as well as the first-order structural pattern from node v_i's neighbors. GCN stacks several graph convolutional layers and concatenates the node states of each layer to form the final node states [24].
• Hyperparameters
• Activation function: An activation function is applied in an artificial neural network to help it learn complicated patterns in data. By analogy with the neuron-based model seen in human brains, the activation function ultimately decides which neuron fires next.
ReLU6: the relu6() function computes the rectified linear 6, i.e. min(max(x, 0), 6), element-wise:

ReLU6(x) = min(max(0, x), 6)   (8)
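Putting Eqs. (6) and (8) together, a single graph convolution can be sketched in a few lines of NumPy; the toy graph, features, and identity weight matrix below are purely illustrative, not the trained model of this paper:

```python
import numpy as np

def relu6(x):
    # ReLU6 activation: min(max(0, x), 6), Eq. (8)
    return np.minimum(np.maximum(x, 0.0), 6.0)

def gcn_layer(A, H, W):
    """One GCN propagation step, Eq. (6): H' = sigma(D~^-1 (A + I) H W)."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                       # add self-loops
    D_inv = np.diag(1.0 / A_tilde.sum(axis=1))    # inverse degree matrix
    return relu6(D_inv @ A_tilde @ H @ W)

# Toy graph: 3 nodes, edges 0-1 and 1-2, 2-dimensional node features
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
W = np.eye(2)  # identity weights for illustration
H1 = gcn_layer(A, X, W)
print(H1.shape)  # (3, 2)
```

Each output row is the degree-normalized average of a node's own features and its neighbors' features, which is exactly the aggregation described by Eq. (7).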
Hardswish: a form of activation function based on Swish, except that it substitutes a piecewise-linear equivalent for the computationally costly sigmoid:

Hardswish(x) = 0,             if x ≤ −3
               x,             if x ≥ +3
               x·(x + 3)/6,   otherwise   (9)

SELU: Scaled Exponential Linear Units are activation functions with self-normalizing properties:

SELU(x) = scale · ( max(0, x) + min(0, α · (exp(x) − 1)) )   (10)
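Equations (9) and (10) are easy to check numerically. In the sketch below, the α ≈ 1.6733 and scale ≈ 1.0507 constants are the standard self-normalizing SELU values, which the text leaves implicit:

```python
import math

def hardswish(x):
    # Piecewise-linear Swish variant, Eq. (9)
    if x <= -3:
        return 0.0
    if x >= 3:
        return float(x)
    return x * (x + 3) / 6

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    # Self-normalizing activation, Eq. (10), with the standard SELU constants
    return scale * (max(0.0, x) + min(0.0, alpha * (math.exp(x) - 1)))

print(hardswish(-4), hardswish(1), hardswish(4))  # 0.0 0.6666666666666666 4.0
print(selu(1.0))  # equals the scale constant, since max(0, 1) = 1
```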
• Optimization algorithm: Optimization algorithms help us minimize (or maximize) an objective function (also known as an error function) E(x), which is a mathematical function of the model's internal learnable parameters used in computing the target values (Y) from the set of predictors (X) in the model.
AdamW: AdamW is a stochastic optimization approach that amends Adam's standard weight decay implementation by decoupling weight decay from the gradient update:

g_t = ∇f(θ_t) + w_t θ_t
(11)
Adagrad: AdaGrad is an adaptation of the gradient descent optimization technique that allows the step size in each dimension utilized by the optimization process to be automatically modified depending on the gradients found for the variable (partial derivatives) encountered throughout the search. θt+1,i = θt,i −
η Gt,ii + ε
gt,i
(12)
Link Prediction Using Graph Neural Networks
AdaDelta: AdaDelta is a stochastic optimization methodology for SGD that enables per-dimension learning rates. It is an Adagrad modification that aims to reduce its aggressive, monotonically declining learning rate. Rather than aggregating all previous squared gradients, AdaDelta restricts the window of accumulated past gradients to a fixed size by keeping a decaying average of the squared gradients:

E[g²]_t = γ E[g²]_{t−1} + (1 − γ) g²_t   (13)
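The update rules in Eqs. (11)–(13) can be exercised on a toy quadratic objective f(θ) = θ²; the hyperparameter values below are common defaults, not the settings benchmarked in this chapter:

```python
import math

def grad(theta):
    # gradient of f(theta) = theta^2
    return 2.0 * theta

def adamw(theta, steps=200, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        mhat, vhat = m / (1 - b1 ** t), v / (1 - b2 ** t)
        # decoupled weight decay: applied to theta directly, not via the gradient
        theta -= lr * (mhat / (math.sqrt(vhat) + eps) + wd * theta)
    return theta

def adagrad(theta, steps=200, lr=0.5, eps=1e-8):
    G = 0.0
    for _ in range(steps):
        g = grad(theta)
        G += g * g                                  # accumulate squared gradients
        theta -= lr * g / (math.sqrt(G) + eps)      # Eq. (12)
    return theta

def adadelta(theta, steps=500, rho=0.9, eps=1e-6):
    Eg2 = Edx2 = 0.0
    for _ in range(steps):
        g = grad(theta)
        Eg2 = rho * Eg2 + (1 - rho) * g * g         # Eq. (13)
        dx = -math.sqrt(Edx2 + eps) / math.sqrt(Eg2 + eps) * g
        Edx2 = rho * Edx2 + (1 - rho) * dx * dx
        theta += dx
    return theta

for name, opt in [("AdamW", adamw), ("Adagrad", adagrad), ("AdaDelta", adadelta)]:
    print(name, round(opt(5.0), 4))  # all move theta from 5.0 toward the minimum at 0
```

Note how AdaDelta starts with very small steps: its step size is built up from the running averages rather than from a fixed learning rate.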
• Learning Rate The learning rate of an optimization algorithm is a tuning parameter that defines the step size at each iteration while heading toward the minimum of a loss function.
4 Experiments and Results

Before developing our recommendation system model, we discovered an existing solution for a recommendation system using a graph neural network [25], but with only 80% accuracy. Since we were unsatisfied with this precision, we developed our own GCN model and chose the best optimization function, number of epochs, and activation function based on benchmarking, improving accuracy from 80% to 94%.

4.1 Datasets

The proposed solution aims to adapt to different types of social networks and improve link prediction accuracy. We tested three different types of real-world social network datasets with the other two methods. Table 1 lists the datasets that have been tested and their details.

Table 1. The statistical information of each real-world network

Dataset    Edges       Nodes
Facebook   88234       4039
Twitter    1768149     81306
Google+    13673453    107614
In our study we are interested in three datasets: Facebook, Twitter, and Google+. We start with the Facebook dataset, which contains 4039 nodes, meaning 4039 users, and 88234 edges, meaning 88234 relationships between users, with 10 features such as gender, birthday, education, languages, and locales. For the Twitter dataset, we have 81306 nodes, 1768149 edges, and 2 features. Finally, for Google+ we have 107614 nodes, 13673453 edges, and 6 features. In our experiments, we chose these datasets from SNAP (Stanford Network Analysis Project).
Table 2. AUC-ROC using GCN

Datasets   GCN
Facebook   93%
Twitter    79%
Google+    89%
4.2 Experimental Results

After running our model, which has three layers (an input layer, one hidden layer, and an output layer), we obtain the results shown in Table 2.
• Performance comparison on the Facebook dataset depending on the number of epochs
Because we are not satisfied with 93% AUC, we decided to optimize the result. The first thing we do is increase the number of epochs the model trains for, with the following result:
Fig. 1. Model accuracy depending on the number of epochs
As shown in Fig. 1, the highest accuracy is 0.9368, obtained at 200 epochs. Therefore, the best number of epochs here is 200.
• Performance comparison on the Facebook dataset depending on the activation function
In this part, we optimize the activation function by trying three of the best-performing functions: relu6, hardswish, and selu. As illustrated in Fig. 2, the highest accuracy is 0.9421, which corresponds to the relu6 activation function, implying that relu6 is the most efficient.
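The AUC-ROC used throughout these comparisons is the probability that a held-out true edge receives a higher score than a sampled non-edge (the Wilcoxon/Mann-Whitney formulation); a self-contained sketch with illustrative scores, not our model's actual outputs:

```python
def auc_roc(scores_pos, scores_neg):
    # probability that a randomly chosen true edge is scored above a non-edge;
    # ties count as 0.5
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# illustrative scores for held-out true edges vs sampled non-edges
pos = [0.9, 0.8, 0.75, 0.3]
neg = [0.6, 0.4, 0.2, 0.1]
print(auc_roc(pos, neg))  # 0.875
```

A value of 0.93, as obtained on Facebook, means the model ranks a true edge above a negative sample 93% of the time.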
Fig. 2. Model accuracy depending on the activation function
• Performance comparison on the Facebook dataset depending on the optimization algorithm
In this part, we optimize the optimization algorithm by trying three algorithms: AdamW, Adagrad, and AdaDelta. After training our model with these three algorithms, we get the result below:
Fig. 3. Model accuracy depending on the optimization algorithm
As illustrated in Fig. 3, the AdamW optimizer is clearly the most efficient.
• Performance comparison on the Facebook dataset depending on the learning rate
Now we tweak our model by adjusting the learning rate. The following is the outcome of training our model with different learning rates.
Fig. 4. Model accuracy depending on the learning rate
5 Conclusion

Link prediction remains a difficult topic in link mining and analysis, and undirected graphs are heavily emphasized in link prediction studies. In this study, we investigated current GCN models for link prediction, developed and tested on various datasets. Not only did we replicate the results, but we also presented a fairer and more methodical comparison. Our findings reveal that various GCN designs perform similarly on additional link prediction benchmarks. There are various intriguing possibilities for the future of this paper. First, the datasets on which we benchmarked are still quite small; in the future, we might examine the models on significantly bigger graphs, particularly for real-world applications. A second intriguing avenue for this study would be to apply more recently proposed GCN models. We might also try to design and build our own GCN architecture and benchmark it on link prediction tasks.
References
1. Lakshmi, T.J., Bhavani, S.D.: Link prediction approach to recommender systems (2012). https://doi.org/10.48550/arxiv.2102.09185
2. Zhang, S., Tong, H., Xu, J., Maciejewski, R.: Graph convolutional networks: a comprehensive review. Comput. Soc. Netw. 6(1), 1–23 (2019). https://doi.org/10.1186/S40649-019-0069-Y/FIGURES/1
3. Shafqat, W., Byun, Y.C.: Enabling untact culture via online product recommendations: an optimized graph-CNN based approach. Appl. Sci. 10(16), 5445 (2020). https://doi.org/10.3390/APP10165445
4. Daud, N.N., Ab Hamid, S.H., Saadoon, M., Sahran, F., Anuar, N.B.: Applications of link prediction in social networks: a review. J. Netw. Comput. Appl. 166, 102716 (2020). https://doi.org/10.1016/J.JNCA.2020.102716
5. Mutlu, E.C., Oghaz, T., Rajabi, A., Garibay, I.: Review on learning and extracting graph features for link prediction. Mach. Learn. Knowl. Extr. 2(4), 672–704 (2020). https://doi.org/10.3390/MAKE2040036
6. Ahmad, I., Akhtar, M.U., Noor, S., Shahnaz, A.: Missing link prediction using common neighbor and centrality based parameterized algorithm. Sci. Rep. 10(1), 364 (2020). https://doi.org/10.1038/s41598-019-57304-y
7. Shibata, N., Kajikawa, Y., Sakata, I.: Link prediction in citation networks. J. Am. Soc. Inform. Sci. Technol. 63(1), 78–85 (2012). https://doi.org/10.1002/ASI.21664
8. Lü, L., Zhou, T.: Link prediction in weighted networks: the role of weak ties. Europhys. Lett. 89(1), 18001 (2010). https://doi.org/10.1209/0295-5075/89/18001
9. Tong, H., Faloutsos, C., Pan, J.Y.: Fast random walk with restart and its applications. In: Sixth International Conference on Data Mining (ICDM 2006), pp. 613–622 (2006). https://doi.org/10.1109/ICDM.2006.70
10. Berahmand, K., Nasiri, E., Forouzandeh, S., Li, Y.: A preference random walk algorithm for link prediction through mutual influence nodes in complex networks. J. King Saud Univ. Comput. Inf. Sci. 34(8), 5375–5387 (2022). https://doi.org/10.1016/J.JKSUCI.2021.05.006
11. Wang, P., Xu, B.W., Wu, Y.R., Zhou, X.Y.: Link prediction in social networks: the state-of-the-art. Sci. China Inf. Sci. 58(1), 1–38 (2014). https://doi.org/10.48550/arxiv.1411.5118
12. Liu, S., Ji, X., Liu, C., Bai, Y.: Similarity indices based on link weight assignment for link prediction of unweighted complex networks. 31(2) (2017). https://doi.org/10.1142/S0217979216502544
13. Newman, M.E.J.: Clustering and preferential attachment in growing networks. Phys. Rev. E Stat. Phys. Plasmas Fluids Relat. Interdiscip. Topics 64(2), 4 (2001). https://doi.org/10.1103/PhysRevE.64.025102
14. Adamic, L.A., Adar, E.: Friends and neighbors on the Web. Soc. Netw. 25(3), 211–230 (2003). https://doi.org/10.1016/S0378-8733(03)00009-1
15. Katz, L.: A new status index derived from sociometric analysis. Psychometrika 18(1), 39–43 (1953). https://doi.org/10.1007/BF02289026
16. Pujari, M., Kanawati, R.: Link prediction in multiplex networks. Netw. Heterogeneous Media 10(1), 17–35 (2015). https://doi.org/10.3934/NHM.2015.10.17
17. Wu, S.Y., Zhang, Q., Wu, M.: Cold-start link prediction in multi-relational networks. Phys. Lett. A 381(39), 3405–3408 (2017). https://doi.org/10.1016/J.PHYSLETA.2017.08.046
18. Zhang, Z., Cui, P., Zhu, W.: Deep learning on graphs: a survey. IEEE Trans. Knowl. Data Eng. 34(1), 249–270 (2018). https://doi.org/10.48550/arxiv.1812.04202
19. Niepert, M., Ahmad, M., Kutzkov, K.: Learning convolutional neural networks for graphs. In: 33rd International Conference on Machine Learning, ICML 2016, vol. 4, pp. 2958–2967 (2016). https://doi.org/10.48550/arxiv.1605.05273
20. Monti, F., Boscaini, D., Masci, J., Rodolà, E., Svoboda, J., Bronstein, M.M.: Geometric deep learning on graphs and manifolds using mixture model CNNs. In: Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017-January, pp. 5425–5434 (2016). https://doi.org/10.48550/arxiv.1611.08402
21. Boscaini, D., Masci, J., Rodolà, E., Bronstein, M.: Learning shape correspondence with anisotropic convolutional neural networks. Adv. Neural Inf. Process. Syst. 3197–3205 (2016). https://doi.org/10.48550/arxiv.1605.06437
22. Pan, C., Wang, Y., Shi, H., Shi, J., Cai, R.: Network traffic prediction incorporating prior knowledge for an intelligent network. Sensors 22(7), 2674 (2022). https://doi.org/10.3390/S22072674
23. Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. Adv. Neural Inf. Process. Syst. 3844–3852 (2016). https://doi.org/10.48550/arxiv.1606.09375
24. Gao, C., et al.: A survey of graph neural networks for recommender systems: challenges, methods, and directions (2021). https://doi.org/10.48550/arxiv.2109.12843
25. Zhao, T., Liu, G., Wang, D., Yu, W., Jiang, M.: Counterfactual graph learning for link prediction. arXiv preprint arXiv:2106.02172 (2021). https://arxiv.org/abs/2106.02172. Accessed 20 Sep 2022
Uncertainty Analysis of a Blade Element Momentum Model Using GSA and GLUE Methods

Yassine Ouakki(B), Amar Amour, and Abdelaziz Arbaoui

INSCM Team, LM2I, National School of Arts and Crafts (ENSAM), Moulay Ismail University of Meknès, BP 4024, Meknes, Morocco
[email protected]
Abstract. Blade element momentum (BEM) based models are prone to uncertainties, which should be treated adequately. This paper focuses on static airfoil data and uses a variance-based Global Sensitivity Analysis (GSA) combined with a Generalized Likelihood Uncertainty Estimation (GLUE) method to cope with them. Implementation of these methods requires several Monte Carlo simulations. The GSA allows the detection of the most sensitive airfoil data parameters, whereas GLUE finds the confidence interval of the power output. This approach provides a more rigorous quantitative model validation metric, in contrast to a deterministic one. The approach also enables the detection of correlations between airfoil data parameters and provides a representation of their posterior probability distribution. Furthermore, it provides useful hints to improve the model and allows a fair judgment when choosing the best structure (brake state correction as an example) based on precision and exactness as new criteria.
Keywords: BEM theory · Uncertainty analysis · GLUE · GSA

1 Introduction
This work was supported by the Moulay Ismail University within the framework of its research support program 2018.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 299–311, 2023. https://doi.org/10.1007/978-3-031-43520-1_25

The aerodynamic performances of a wind turbine rotor are commonly predicted using the Blade Element Momentum (BEM) theory, because it is extremely fast and reduces design cycle time and cost. Several refinements of the classical BEM have been proposed in order to increase prediction accuracy. These relate, principally, to the brake state correction and the stall delay phenomenon [1–4]. These models use a deterministic approach: once the wind turbine configuration and static airfoil data are specified, the solution is uniquely determined without imprecision. The widely used method to validate BEM models is based on graphical validation, plotting predictions next to experimental data. This viewgraph-based judgment cannot provide an evaluation of the confidence in wind turbine power predictions, calling into question, consequently, the model's ability to produce decision support during the preliminary design phase [5]. In particular, it is unclear whether modeling errors reported in the literature result from errors in the airfoil data or errors in the assumptions associated with the models used. Furthermore, using this approach to select the best model structure (brake state correction in this paper) based on one given set of static airfoil data may be wrong if another set of static airfoil data supports another model better than the best model identified [6]. To cope with this vagueness, a rigorous quantitative model validation metric is necessary. In this regard, Global Sensitivity Analysis (GSA) and Uncertainty Analysis (UA) can provide useful hints to improve models and information regarding the best model structure to be used. The GSA and GLUE methods are used in the present work because they are computationally cheap compared to other methods, such as the Bayesian approach. Indeed, the GLUE method corresponds to a certain approximate Bayesian procedure [30]. International agencies recommend using these tools as best practices for model application, validation, and audit [7]. Several quantitative approaches for GSA and UA have been pursued in recent literature, including probability theory, evidence theory, possibility theory, and interval analysis [8–11]. This paper uses a combination of a variance-based GSA and Generalized Likelihood Uncertainty Estimation (GLUE) [10,11]. These tools are widely used to investigate uncertainties in the hydrology domain. Indeed, Mannina et al. [12] use the GLUE method to assess the uncertainty of a membrane bioreactor model. Mortier et al. [13] use the same approach for uncertainty analysis of a drying model of pharmaceutical granules. The remainder of this paper is organized as follows.
Section 2 presents the implementation of uncertainty quantification in an in-house BEM model, under the open-source Scilab software. The results obtained are shown and analyzed in Sect. 3. Conclusions are given in Sect. 4.
2 Methodology

2.1 Blade Element Momentum Theory
The performance of the rotor is predicted using the BEM method. In 1926, Glauert showed by experiment that the method above breaks down when the axial induction factor becomes large. To take this problem into account, the user of our tool can select either the Shen correction [15] or the advanced brake state model developed by Buhl et al. [16]. These models can extend the validity of the thrust coefficient to axial induction factors higher than 0.5 (Fig. 1). Furthermore, to satisfy the need for algebraic equations expressing the variation of the lift and drag coefficients with the angle of attack, we use the AERODAS model developed by Spera [17]. The AERODAS model uses 14 parameters, which can be derived from experimental airfoil data obtained in a wind tunnel or from CFD simulation, to parameterize the lift and drag curves. These parameters are
the aspect ratio and the thickness-to-chord ratio, in addition to the 12 parameters illustrated in brackets in Fig. 2; all other parameters necessary to define the lift and drag curves are derived explicitly from the 14 parameters. The AERODAS model is shown in Fig. 2 for the UAE wind turbine using the reference values of the 14 parameters proposed by Spera [17]. The BEM model is implemented in Scilab and solved by an iterative method to predict the power curve of the UAE wind turbine. The model inputs include the blade geometry, the rotational speed, and the static airfoil data.
Fig. 1. Thrust coefficient as a function of the axial induction factor predicted using momentum theory and complemented by Shen and Buhl brake state models.
2.2 GSA and GLUE Methods
Fig. 2. Description of AERODAS parameters for the UAE wind turbine

Consider two nonzero integers p and m and the model Y = f(X), where X ∈ D_x ⊂ R^p is the input and Y ∈ D_y ⊂ R^m is the output of the model. We assume that each X_i (i = 1, ..., p) is a random variable, so that the output variable Y = f(X_1, ..., X_p) is also a random variable. Global sensitivity analysis aims to study variations in the output with regard to the input parameters. Several methods for GSA are reported in the literature, including linear regression, correlation analysis, importance measures, variance-based, and screening methods [10–13,18]. In this article, we use Sobol's variance-based GSA method. This method uses a probabilistic approach to define a sensitivity index which measures the fraction of output variance that can be assigned to the variation of an input variable, either alone or in conjunction with other input variables. In this way, input variables can be sorted by the order of importance they have on the output [19,20]. The main effect, or first-order sensitivity index, of Y with respect to the variable X_i is defined as the top marginal variance divided by the total variance, as follows:

S_i = V(E(Y | X_i)) / V(Y)   (1)
S_i always lies between 0 and 1. When X_i has a large weight in the variance, V(E(Y | X_i)) is large, so that S_i is large. Other indices (total effect, higher-order effects) exist; in general terms, the sensitivity index associated with a group of variables u ⊂ {1, 2, ..., p} is given by

S_u = V(E(Y | X_u)) / V(Y)   (2)
The total number of sensitivity indices is 2^p − 1, and their sum is equal to 1:

Σ_{i=1}^{p} S_i + Σ_{i<j} S_{ij} + ... + S_{1,2,...,p} = 1   (3)
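As a sanity check on Eqs. (1)–(3), the first-order indices of a toy model with known variances can be estimated by Monte Carlo. The pick-freeze estimator below is one standard way of approximating V(E(Y|X_i)); the model f and sample size are illustrative, not the BEM model of this paper:

```python
import random

def f(x1, x2):
    # toy model: V(Y) = 1/12 + 4/12 = 5/12, so analytically S1 = 0.2 and S2 = 0.8
    return x1 + 2.0 * x2

random.seed(0)
N = 100_000
A = [(random.random(), random.random()) for _ in range(N)]
B = [(random.random(), random.random()) for _ in range(N)]
fA = [f(*a) for a in A]
mean = sum(fA) / N
var = sum((y - mean) ** 2 for y in fA) / N

def first_order(i):
    # pick-freeze estimate of S_i = V(E(Y|X_i)) / V(Y), Eq. (1):
    # keep coordinate i from sample A, redraw the other coordinate from B
    acc = 0.0
    for a, b, ya in zip(A, B, fA):
        x = (a[0], b[1]) if i == 0 else (b[0], a[1])
        acc += ya * f(*x)
    return (acc / N - mean ** 2) / var

print(round(first_order(0), 2), round(first_order(1), 2))  # close to 0.2 and 0.8
```

The two estimates sum to roughly 1, consistent with Eq. (3) for a purely additive model (no interaction terms).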
i 1 we repeat the prediction at one step until the final horizon h of the prediction. 3.2
Algorithm

The following is a summary of the learning and prediction phases of the double vector quantization algorithm [6]:
1. Construction of the regressors x_t and deformations z_t.
2. Application of the clustering algorithm to the sets of regressors and deformations.
3. Creation of the clusters of regressors c_i, 0 ≤ i ≤ n1, and deformations c_j, 0 ≤ j ≤ n2, from the prototypes given in step 2.
4. Creation of the transition matrix T(., .) according to relation (13).
5. Selection of the prototype x̄_k closest to the last known regressor x_t at time t, among the prototypes x̄_i, 0 ≤ i ≤ n1.
H. El Fahfouhi et al.
6. Selection of the prototype z̄_l according to the law described in line k of the matrix T(., .).
7. Estimation of x̂_{t+1} using relation (14) above.
8. Extraction of x̂(t + 1), the first component of the estimated regressor x̂_{t+1}.
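The eight steps above can be sketched end to end. In this illustration a plain k-means stands in for the SOM/GMM clustering of steps 2–3, step 6 takes the most likely deformation instead of sampling, and step 7 assumes the relation x̂_{t+1} = x_t + z̄_l implied by the deformation definition z_t = x_{t+1} − x_t (relation (14) itself is not restated here):

```python
def nearest(v, protos):
    # index of the closest prototype in squared Euclidean distance
    return min(range(len(protos)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(v, protos[i])))

def kmeans(data, k, iters=10):
    # plain k-means, a stand-in for the SOM/GMM clustering of steps 2-3,
    # initialized with the first k distinct vectors for reproducibility
    protos = []
    for v in data:
        if list(v) not in protos:
            protos.append(list(v))
        if len(protos) == k:
            break
    for _ in range(iters):
        groups = [[] for _ in protos]
        for v in data:
            groups[nearest(v, protos)].append(v)
        protos = [[sum(col) / len(g) for col in zip(*g)] if g else p
                  for g, p in zip(groups, protos)]
    return protos

def build_regressors(series, p):
    # step 1: regressors x_t and deformations z_t = x_{t+1} - x_t
    xs = [series[t - p + 1:t + 1] for t in range(p - 1, len(series) - 1)]
    zs = [[a - b for a, b in zip(xs[i + 1], xs[i])] for i in range(len(xs) - 1)]
    return xs[:-1], zs

def dvq_fit(series, p=3, n1=4, n2=4):
    xs, zs = build_regressors(series, p)
    px, pz = kmeans(xs, n1), kmeans(zs, n2)
    # step 4: transition counts T[i][j] = number of regressors in cluster i
    # whose associated deformation falls in cluster j
    T = [[0] * n2 for _ in range(n1)]
    for x, z in zip(xs, zs):
        T[nearest(x, px)][nearest(z, pz)] += 1
    return px, pz, T

def dvq_predict(series, px, pz, T, p=3, horizon=5):
    x = list(series[-p:])
    preds = []
    for _ in range(horizon):
        k = nearest(x, px)                               # step 5
        j = max(range(len(T[k])), key=T[k].__getitem__)  # step 6 (argmax, not sampled)
        x = [a + b for a, b in zip(x, pz[j])]            # step 7 (assumed relation)
        preds.append(x[-1])                              # step 8
    return preds

# deterministic sawtooth 0,1,2,3,0,1,2,3,... -- DVQ should continue the pattern
series = [t % 4 for t in range(80)]
px, pz, T = dvq_fit(series)
print(dvq_predict(series, px, pz, T))  # [0.0, 1.0, 2.0, 3.0, 0.0]
```

For horizon h > 1 the one-step prediction is simply iterated, exactly as described at the start of this section.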
4 Experimentation, Results and Discussion
In this section we apply our methods to an electrical consumption time series. We first present the results of the prediction problem using DVQ, and then illustrate a comparative study. For the rest of the paper we write DVQS for Double Vector Quantization using SOM and DVQG for Double Vector Quantization using GMM.

4.1 Problem of Predicting Using DVQ Methods
We handle the prediction of future values of electric consumption. The dataset contains 164 weekly values of electrical consumption in France from 2019 to 2022; this real problem requires predicting the values of the next six months. We fix the size of the regressor to p = 7, and the numbers of clusters for the regressor and deformation vectors to n1 = 6 and n2 = 6, respectively. We split our data into N = 139 values for training the model and Ntest = 25 for testing it. Figures 2 and 3 show the difference between the predicted and real values of the series with the DVQS and DVQG methods. Table 1 reports the errors of DVQS and DVQG, together with the compilation time of each:

Table 1. Mean Absolute Error, Mean Squared Error, and compilation time for DVQS and DVQG

Method          Percentage of data   MAE           MSE              Compilation time (GPU)
DVQS training   84.75%               0.109·10³     0.0012008·10⁷    1 s
DVQG training   84.75%               12.189·10³    20.6100273·10⁷   1 s
DVQS testing    15.25%               0.109·10³     0.0012008·10⁷    0.90 s
DVQG testing    15.25%               9.439·10³     13.0898785·10⁷   0.50 s
Comparative Study Between Double Vector Quantization
Fig. 2. The real values of the series and the predicted values given by DVQS

Fig. 3. The real values of the series and the predicted values given by DVQG
4.2 Comparison and Discussion
In the previous subsection, we solved the problem using DVQS and DVQG and showed the results. In this section, we discuss the different results
and compare them. To that end, we calculated the errors of each method (Mean Squared Error and Mean Absolute Error) on the training and testing examples, and we also measured the compilation time to identify the fastest method. The results in Table 1 show that DVQS is more efficient than DVQG, because the MSE and MAE for DVQS are smaller than those for DVQG, while both methods take nearly the same compilation time. The plots in Figs. 2 and 3 show that DVQS predicts values close to reality, whereas DVQG predicts values that are far from it. Since double vector quantization does not change the underlying method, it just applies it twice, DVQS can be advantageous compared to DVQG; we can explain this by the good partitioning and location of the clusters when we use SOM. SOM has a topology-preserving property, which means that two similar vectors end up either in the same cluster or in neighboring clusters. As a result, it is possible to group together "elements" of neighboring clusters, in the sense of the grid, to form "macro-clusters" [11].
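The two error measures used in this comparison are simple averages over the prediction window; a minimal sketch with made-up consumption values, not the series of this study:

```python
def mae(y_true, y_pred):
    # Mean Absolute Error: average magnitude of the prediction errors
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    # Mean Squared Error: penalizes large deviations more strongly than MAE
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

y_true = [100.0, 120.0, 90.0, 110.0]   # illustrative weekly consumption values
y_pred = [102.0, 118.0, 95.0, 106.0]
print(mae(y_true, y_pred), mse(y_true, y_pred))  # 3.25 12.25
```

Because MSE squares the residuals, the large MSE gap between DVQS and DVQG in Table 1 indicates that DVQG occasionally makes very large errors, not just slightly worse ones.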
5 Conclusion
This work compares two long-term time series prediction methods, Double Vector Quantization based on SOM and on GMM. We conclude that DVQ using SOM is more efficient than DVQ using GMM. It also has an easy-to-implement algorithm and can give a graphical representation.
References
1. Simon, G., Lendasse, A., Cottrell, M., Fort, J.C., Verleysen, M.: Double quantization of the regressor space for long-term time series prediction: method and proof of stability. Neural Netw. 17(8–9), 1169–1181 (2004)
2. Simon, G., Lendasse, A., Cottrell, M., Fort, J.-C., Verleysen, M.: Double SOM for long-term time series prediction, pp. 35–40 (2004)
3. Simon, G., Lee, J.A., Verleysen, M., Cottrell, M.: Double quantization forecasting method for filling missing data in the CATS time series. In: 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541), vol. 2, pp. 1635–1640. IEEE (2004)
4. Reynolds, D.A.: Gaussian mixture models. Encycl. Biometrics 741, 659–663 (2009)
5. Simon, G.: Méthodes non linéaires pour séries temporelles: prédiction par Double Quantification Vectorielle et sélection du délai en hautes dimensions. Presses univ. de Louvain (2007)
6. Kohonen, T.: Self-Organizing Maps. Springer Series in Information Sciences, 2nd edn. Springer, Berlin (1995)
7. Volume of electricity consumption in France between the 1st of April 2019 and the 29th of January 2022. https://www.statista.com/statistics/1107089/electricityconsumption-france/
8. Carte de Kohonen. http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_clus/kohonen.html
9. Gaussian Mixture Models with Scikit-learn in Python. https://cmdlinetips.com/2021/03/gaussan-mixture-models-with-scikit-learn-in-python
10. Self-Organizing Maps: Theory and Implementation in Python with NumPy. https://stackabuse.com/self-organizing-maps-theory-and-implementation-in-python-with-numpy/
11. Gepperth, A., Pfülb, B.: A rigorous link between self-organizing maps and Gaussian mixture models. In: Farkaš, I., Masulli, P., Wermter, S. (eds.) ICANN 2020. LNCS, vol. 12397, pp. 863–872. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61616-8_69
12. En-Naimani, Z., Lazaar, M., Ettaouil, M.: Hybrid system of optimal self organizing maps and hidden Markov model for Arabic digits recognition. WSEAS Trans. Syst. 13(60), 606–616 (2014)
Towards Development of Synthetic Data in Surface Thermography to Enable Deep Learning Models for Early Breast Tumor Prediction Zakaryae Khomsi(B) , Achraf Elouerghi, and Larbi Bellarbi ST2I Laboratory, E2SN Team, ENSAM-ENSIAS, Mohammed V University in Rabat, Rabat, Morocco [email protected], [email protected], [email protected]
Abstract. With the continued advancements in thermal imaging technologies and Artificial Intelligence (AI), thermography looks like an increasingly promising, safe solution for early breast cancer detection. In particular, the integration of deep learning methods in this field has become an essential option to improve the sensitivity and specificity of several thermographic devices. However, the available breast thermogram data are not sufficiently large. In this study, we developed a supporting method that allows the generation of thermographic data for early breast cancer detection. Using the Finite Element Method (FEM), based on the numerical modeling software COMSOL, we performed simulations of different early tumor possibilities. The synthetic data samples are successfully exported in png and txt file formats. We then systematically analyze the impact of several tumor configurations on skin temperature. We found that it is possible to predict tumor parameters from the temperature distribution of the breast. Therefore, the proposed approach represents a promising solution to improve decision-making in the field of early breast cancer detection.

Keywords: Data generation · Thermal imaging · Breast modeling · Cancer prediction · Medical decision
1 Introduction

Worldwide, the most commonly diagnosed cancer in females is breast cancer. According to the American Cancer Society, more than 280,000 new cases of breast cancer were diagnosed in 2021, and the Society estimated that nearly 43,250 women would die from breast cancer in 2022 [1], making this cancer a major global health emergency. Screening tools play an important role in the early prevention of breast cancer. One of the most widely used techniques is mammography, which generally gives good results. However, this technique depends on the emission of X-rays, which presents a carcinogenic risk to the patient's health resulting from radiation exposure [2].

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 356–365, 2023. https://doi.org/10.1007/978-3-031-43520-1_30
Surface thermography is a non-invasive technique that measures the temperature distribution emanating from the breast skin surface. Cancerous tissue generates more heat than healthy tissue, resulting from the high metabolism of the tumor [3–5]. With the continued advancements in thermal imaging technologies and the use of artificial intelligence, thermography looks like an increasingly promising, safe solution for early breast cancer detection [6, 7]. Indeed, the integration of Artificial Intelligence (AI) in medical imaging technologies has become an essential discipline that improves decision-making. In particular, deep learning techniques have already shown great potential in various areas and applications [8–10]. For example, Mambou et al. [11] presented a study on breast cancer detection using thermal imaging and a deep learning model; they suggest information and guidelines that can help to improve diagnosis accuracy. Roslidar et al. [12] reported a review of recent progress in thermal imaging and deep learning approaches for breast cancer detection; they cover most related works on the implementation of deep neural networks for breast thermogram prediction, and they suggest future research directions for developing representative datasets to improve training performance. Further research studies have found encouraging results on the potential usefulness of thermography with deep learning in the early breast cancer detection field [13–15]. Deep learning algorithms are loosely inspired by biological neural networks: they compare new information with known information to deliver accurate output. Deep learning models have proven advantageous in image recognition, feature extraction, and the analysis of complex data. However, the performance of these algorithms is highly dependent on data availability: deep learning methods need a great amount of training data to minimize test errors and increase training performance [16, 17].
In particular, in the field of surface thermography for early breast tumor prediction, the available data are scarce [18]. In this study, we aim to develop a supporting method to generate synthetic data that mimic several breast tumor situations using the COMSOL numerical modeling software. We export data samples in different formats that can be used to enable deep learning models. We also present a detailed thermal analysis to demonstrate the usefulness of the synthetic data for predicting early breast tumors.
2 Related Works

Synthetic data is artificial data used to train deep learning models when real-world data is difficult or expensive to get. With the recent developments in deep learning, synthetic data has become an attractive option that makes it possible to build big datasets for powerful models [19, 20]. Among the main advantages of synthetic data:
• Cost and time efficiency: synthetic data can take a few days to produce, while real data collection and processing take months, or even years, for some applications.
• Exploring rare data: there are situations where data is unavailable or difficult to collect; a collection of exceptionally small tumors is an example of rare data.
• Easy labeling and control: technically, synthetic data simplifies labeling and therefore improves data control and adjustment.
358
Z. Khomsi et al.
In the medical field, synthetic data plays an important role in improving deep learning models. For example, Moya-Sáez et al. [21] proposed a deep learning approach based on synthetic data for magnetic resonance imaging (MRI); they constructed various data samples of brain models to train a Convolutional Neural Network (CNN) and provide realistic parametric maps from the predictive model enabled by the synthetic datasets. Al Khalil et al. [22] presented the potential usefulness of synthetic data for improving the robustness of deep learning models for magnetic resonance images; they analyzed the quality of the generated synthetic data and studied its ability to replace real magnetic resonance data in the training process. Hernandez et al. [23] reviewed synthetic data generation studies for health records over the last five years (2016–2021) and presented encouraging information and guidelines that support synthetic data in healthcare applications. In this work, we generate synthetic data using the Finite Element Method (FEM), based on COMSOL Multiphysics software, to simulate the temperature effect produced by several breast tumor situations. This enables enhanced deep learning models for early breast cancer detection. Figure 1 shows the methodology used.
Fig. 1. Flowchart of the methodology used.
3 COMSOL-Based Simulation of Breast Tumor
In this section, we build a 3D hemispherical breast model using the COMSOL Multiphysics numerical modeling software, as shown in Fig. 2. We approximate the main breast layers, including skin, fat, gland, and muscle, as shown in Fig. 3, and include a sphere that simulates the tumor. According to the Pennes equation of 1948 [24], the mathematical model for bioheat transfer in biological tissues simplifies, at steady state, to Eq. (1):

∇·(k∇T) + ωb · cb · ρb · (Ta − T) + qm = 0   (1)
where k is the thermal conductivity of the tissue, ρb is the density of the blood (1055 kg.m−3 ), cb is the specific heat of the blood (3660 J.kg−1 .K−1 ), ωb is the rate of blood perfusion, qm is the rate of metabolic heat generated, Ta is the arterial blood temperature, and T is the local temperature of the breast tissue. The arterial blood temperature is approximated by the core body temperature (37 °C). Table 1 shows the exact values of these properties [25].
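Equation (1) can also be explored outside COMSOL. The sketch below is not the paper's 3D model: it is a 1-D slab with assumed boundary conditions (core side held at 37 °C, convective loss to ambient air at the skin side) and a hypothetical `solve_pennes_1d` helper, using the gland and tumor properties of Table 1. It solves the steady-state Pennes equation by central finite differences and illustrates that embedding a tumor raises the computed skin-side temperature:

```python
import numpy as np

def solve_pennes_1d(L=0.05, n=201, k=0.48, Ta=37.0,
                    h_conv=13.5, T_amb=21.0,
                    qm_base=700.0, wb_base=0.0006, tumor=None):
    """Steady 1-D Pennes equation k*T'' + wb*cb*rhob*(Ta - T) + qm = 0
    on a slab x in [0, L]: core side fixed at Ta, Robin condition at x = L.
    Slab depth, ambient temperature and h_conv are illustrative assumptions."""
    cb, rhob = 3660.0, 1055.0                 # blood properties (Sect. 3)
    x = np.linspace(0.0, L, n)
    h = x[1] - x[0]
    qm = np.full(n, qm_base)                  # gland metabolic heat (Table 1)
    wcr = np.full(n, wb_base * cb * rhob)     # perfusion coefficient
    if tumor is not None:                     # (centre, radius) in metres
        c, r = tumor
        inside = np.abs(x - c) < r
        qm[inside] = 70000.0                  # tumor qm from Table 1
        wcr[inside] = 0.012 * cb * rhob       # tumor perfusion from Table 1
    A = np.zeros((n, n))
    b = np.zeros(n)
    A[0, 0] = 1.0
    b[0] = Ta                                 # Dirichlet: core held at 37 C
    for i in range(1, n - 1):                 # interior central differences
        A[i, i - 1] = k / h**2
        A[i, i] = -2.0 * k / h**2 - wcr[i]
        A[i, i + 1] = k / h**2
        b[i] = -wcr[i] * Ta - qm[i]
    A[-1, -2] = k / h                         # Robin: -k dT/dx = h_conv (T - T_amb)
    A[-1, -1] = -k / h - h_conv
    b[-1] = -h_conv * T_amb
    return np.linalg.solve(A, b)              # temperature profile T(x)

T_healthy = solve_pennes_1d()
T_tumor = solve_pennes_1d(tumor=(0.04, 0.005))  # 10 mm tumor, 1 cm below skin
```

As in the full 3D simulations, the tumorous slab ends up warmer at the surface (the last array entry) than the healthy one.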
Towards Development of Synthetic Data in Surface Thermography
Fig. 2. Geometry of the 3D model of the breast.
Fig. 3. Sectional view of breast layers: (a) Computational model. (b) Mesh generated.
Table 1. Thermophysical properties of the breast.

Physical characteristic    Skin     Fat      Gland    Muscle   Tumor
Thickness (mm)             1.6      5.0      43.4     15       ---
k (W/m.K)                  0.235    0.21     0.48     0.48     0.48
qm (W/m3)                  368      400      700      700      70,000
ωb (ml.s−1.ml−1)           0        0.0002   0.0006   0.0009   0.012
4 Data Generation
An initial simulation was performed to compare a breast with and without a tumor. The temperature distribution of the 3D model is illustrated in Fig. 4.
Fig. 4. Temperature distribution: (a) Normal breast. (b) Breast with included tumor.
The temperature difference between healthy and cancerous tissue is thermally visible on the skin surface of the breast, as shown in Fig. 4. From these plots, we exported the first data format as png images. Figure 5 shows thermal images generated for different tumor cases. A large quantity of such images can be used to train a deep learning model.
Fig. 5. Image data format (png) generated from COMSOL: (a) Normal breast. (b) 5mm tumor. (c) 10mm tumor.
The second format that can feed the neural network is data samples in txt form, as shown in Fig. 6. Each file contains the temperature values collected at the surface of the breast model. These exported data are then retrieved and used to demonstrate the impact of synthetic data.
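As a sketch of how such txt exports might be consumed downstream, each file can be parsed into a fixed-length surface-temperature vector that becomes one training sample. The header and column layout below are assumptions for illustration, not the actual COMSOL output format:

```python
import io
import numpy as np

# Hypothetical snippet of a COMSOL text export: arc-length coordinate and
# surface temperature in kelvin, preceded by '%'-prefixed header lines.
sample_export = """\
% Model: breast_3d.mph
% Arc length (m)  Temperature (K)
0.000  307.27
0.025  307.31
0.050  307.45
0.075  307.30
0.100  307.26
"""

def load_profile(text):
    """Parse one export into (arc_length, temperature_in_celsius) arrays."""
    data = np.loadtxt(io.StringIO(text), comments="%")
    arc, temp_k = data[:, 0], data[:, 1]
    return arc, temp_k - 273.15

arc, temp_c = load_profile(sample_export)
# Each profile becomes one fixed-length feature vector for a neural network.
features = temp_c.reshape(1, -1)
```

In practice one such vector per simulated tumor configuration would be stacked into the training matrix.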
Fig. 6. Txt data format exported from COMSOL.
5 Impact of Synthetic Data
To confirm that a tumor can be predicted from a superficial temperature profile, we perform a detailed thermal analysis considering several breast cancer situations. First, we explore the influence of tumor size on surface temperature by including tumors of different sizes, as shown in Fig. 7. The tumor size is kept under
Fig. 7. Tumor size configurations: (a) 5mm (Sizemin ). (b) 10mm (Sizemedium ). (c) 20mm (Sizemax ).
20mm to simulate early-stage tumors. Note that the tumor location is fixed in this case. Then, we explore the influence of tumor depth on surface temperature, considering three different depths as shown in Fig. 8. Note that in this case the tumor size is fixed at 10mm.
Fig. 8. Tumor depth configurations: Depthmin , Depthmedium , Depthmax .
Figure 9 shows the surface temperature collected along the breast arc length for each case.
Fig. 9. Surface temperature data: (a) Tumor size impact. (b) Tumor depth impact.
According to the results shown in Fig. 9 (a) and (b), the surface temperature increases as the tumor grows in size and as the tumor approaches the skin, and vice versa. We summarize these results in Table 2. From Table 2, we notice that the average temperature is smallest when the tumor diameter is equal to 5mm (Sizemin). As the tumor grows to 20mm (Sizemax), the mean temperature becomes higher, meaning that tumors with larger diameters produce more heat in the breast than small ones. Likewise, when the tumor lies deep below the surface (Depthmax), the average temperature decreases. These results demonstrate the relationship between tumor parameters (size and
Table 2. Synthetic data impact for different breast tumor configurations.

Tumor configuration          Mean temperature (°C)
No tumor                     34.122
Tumor size impact
  Sizemin                    34.132
  Sizemedium                 34.177
  Sizemax                    34.369
Tumor depth impact
  Depthmin                   34.216
  Depthmedium                34.141
  Depthmax                   34.128
depth) and the skin temperature of the breast. Therefore, the generated data can be exploited to formulate deep learning models that predict tumor parameters. In particular, image and temperature data can be used to train a Convolutional Neural Network (CNN) and a Feed-Forward Neural Network (FFNN), respectively. The integration of such methods with surface thermography represents a supporting approach for better decision-making in the field of early breast cancer detection.
6 Conclusion
In this paper, we proposed a supporting method to develop a large dataset of breast thermograms. Initially, we created a 3D hemispherical model to mimic the breast tissue using the COMSOL numerical modeling software. Tumor characteristics such as depth and size were varied, and breast temperatures were collected on the skin surface. We provided two data formats: thermal images in png and temperature data in txt files. Through the detailed thermal analysis, we demonstrated the impact of tumor size and depth on breast skin surface temperature, and found that temperature data can reveal further information about tumor parameters. Furthermore, the proposed method is a very promising option for building and controlling large thermographic datasets more easily and at low cost. In future work, we aim to develop a large dataset for training deep learning models to estimate breast tumor characteristics. Acknowledgments. This work is supported by Al-Khawarizmi Program to support research in the field of Artificial Intelligence and its applications, Edition I.
References
1. Breast Cancer Statistics. How Common Is Breast Cancer? https://www.cancer.org/cancer/breast-cancer/about/how-common-is-breast-cancer.html. Accessed 24 Jan 2023
2. Little, M.P., et al.: Cancer risks among studies of medical diagnostic radiation exposure in early life without quantitative estimates of dose. Sci. Total Environ. 832, 154723 (2022). https://doi.org/10.1016/j.scitotenv.2022.154723
3. Kroemer, G., Pouyssegur, J.: Tumor cell metabolism: cancer's achilles' heel. Cancer Cell 13, 472–482 (2008). https://doi.org/10.1016/j.ccr.2008.05.005
4. Coller, H.A.: Is cancer a metabolic disease? Am J Pathol 184, 4–17 (2014). https://doi.org/10.1016/j.ajpath.2013.07.035
5. Shim, H., et al.: c-Myc transactivation of LDH-A: implications for tumor metabolism and growth. Proc Natl Acad Sci U S A 94, 6658–6663 (1997). https://doi.org/10.1073/pnas.94.13.6658
6. Mashekova, A., Zhao, Y., Ng, E.Y.K., Zarikas, V., Fok, S.C., Mukhmetov, O.: Early detection of the breast cancer using infrared technology – a comprehensive review. Therm Sci Eng Prog 27, 101142 (2022). https://doi.org/10.1016/j.tsep.2021.101142
7. Gogoi, U.R., Majumdar, G., Bhowmik, M.K., Ghosh, A.K.: Evaluating the efficiency of infrared breast thermography for early breast cancer risk prediction in asymptomatic population. Infrared Phys Technol 99, 201–211 (2019). https://doi.org/10.1016/j.infrared.2019.01.004
8. Ker, J., Wang, L., Rao, J., Lim, T.: Deep learning applications in medical image analysis. IEEE Access 6, 9375–9379 (2017). https://doi.org/10.1109/ACCESS.2017.2788044
9. Husaini, M.A.S.A., Habaebi, M.H., Hameed, S.A., Islam, M.R., Gunawan, T.S.: A systematic review of breast cancer detection using thermography and neural networks. IEEE Access 8, 208922–208937 (2020). https://doi.org/10.1109/ACCESS.2020.3038817
10. El Fezazi, M., Jbari, A., Jilbab, A.: Conceptual architecture of AI-enabled IoT system for knee rehabilitation exercises telemonitoring. Lect Notes Networks Syst 144, 200–209 (2021). https://doi.org/10.1007/978-3-030-53970-2_19
11. Mambou, S.J., Maresova, P., Krejcar, O., Selamat, A., Kuca, K.: Breast cancer detection using infrared thermal imaging and a deep learning model. Sensors 18(9), 2799 (2018)
12. Roslidar, R., et al.: A review on recent progress in thermal imaging and deep learning approaches for breast cancer detection. IEEE Access 8, 116176–116194 (2020). https://doi.org/10.1109/ACCESS.2020.3004056
13. Kakileti, S.T., Dalmia, A., Manjunath, G.: Exploring deep learning networks for tumour segmentation in infrared images. Quant Infrared Thermogr J. 17, 153–168 (2020). https://doi.org/10.1080/17686733.2019.1619355
14. Torres-Galván, J.C., Guevara, E., Kolosovas-Machuca, E.S., Oceguera-Villanueva, A., Flores, J.L., González, F.J.: Deep convolutional neural networks for classifying breast cancer using infrared thermography. Quant Infrared Thermogr J. 19, 283–294 (2022). https://doi.org/10.1080/17686733.2021.1918514
15. Ucuzal, H., Baykara, M., Küçükakçali, Z.: Breast cancer diagnosis based on thermography images using pre-trained networks. J. Cogn. Syst. 6(2), 64–68 (2021). https://doi.org/10.52876/jcs.990948
16. Sajjad, M., Khan, S., Muhammad, K., Wu, W., Ullah, A., Baik, S.W.: Multi-grade brain tumor classification using deep CNN with extensive data augmentation. J. Comput. Sci. 30, 174–182 (2019). https://doi.org/10.1016/j.jocs.2018.12.003
17. Atasever, S., Azgınoglu, N., Terzı, D.S., Terzı, R.: A comprehensive survey of deep learning research on medical image analysis with focus on transfer learning. Clin. Imaging 94, 18–41 (2022). https://doi.org/10.1016/j.clinimag.2022.11.003
18. Lozano, A., Hayes, J.C., Compton, L.M., Azarnoosh, J., Hassanipour, F.: Determining the thermal characteristics of breast cancer based on high-resolution infrared imaging, 3D breast scans, and magnetic resonance imaging. Sci. Rep. 10, 1–14 (2020). https://doi.org/10.1038/s41598-020-66926-6
19. de Melo, C.M., Torralba, A., Guibas, L., DiCarlo, J., Chellappa, R., Hodgins, J.: Next-generation deep learning based on simulators and synthetic data. Trends Cogn. Sci. 26, 174–187 (2022). https://doi.org/10.1016/j.tics.2021.11.008
20. Manettas, C., Nikolakis, N., Alexopoulos, K.: Synthetic datasets for deep learning in computer-vision assisted tasks in manufacturing. Procedia CIRP 103, 237–242 (2021). https://doi.org/10.1016/j.procir.2021.10.038
21. Moya-Sáez, E., Peña-Nogales, Ó., de Luis-García, R., Alberola-López, C.: A deep learning approach for synthetic MRI based on two routine sequences and training with synthetic data. Comput Methods Programs Biomed 210, 106371 (2021). https://doi.org/10.1016/j.cmpb.2021.106371
22. Al Khalil, Y., Amirrajab, S., Lorenz, C., Weese, J., Pluim, J., Breeuwer, M.: On the usability of synthetic data for improving the robustness of deep learning-based segmentation of cardiac magnetic resonance images. Med. Image Anal. 84, 102688 (2021). https://doi.org/10.1016/j.media.2022.102688
23. Hernandez, M., Epelde, G., Alberdi, A., Cilla, R., Rankin, D.: Synthetic data generation for tabular health records: a systematic review. Neurocomputing 493, 28–45 (2022). https://doi.org/10.1016/j.neucom.2022.04.053
24. Pennes, H.H.: Analysis of tissue and arterial blood temperatures in the resting human forearm. J Appl Physiol 1, 93–122 (1948). https://doi.org/10.1152/jappl.1948.1.2.93
25. Chanmugam, A., Hatwar, R., Herman, C.: Thermal analysis of cancerous breast model. ASME Int Mech Eng Congr Expo Proc 2012, 134–143 (2012). https://doi.org/10.1115/IMECE2012-88244
A Novel Model for Optimizing Multilayer Perceptron Neural Network Architecture Based on Genetic Algorithm Method Fatima Zahrae El-Hassani1(B) , Youssef Ghanou2 , and Khalid Haddouch1 1 National School of Applied Sciences, University Sidi Mohammed Ben Abdellah, Fez,
Morocco {fatimazahrae.elhassani,khalid.haddouch}@usmba.ac.ma 2 High School of Technology, Moulay Ismail University, Meknes, Morocco [email protected]
Abstract. Multilayer perceptrons (MLPs) have been widely used in a variety of applications and fields. The hyper-parameters of such machine learning (ML) models must be tuned to fit them to different problems, and choosing the best hyper-parameter configuration has a direct impact on model performance. This work proposes a new optimization model for finding the optimal neural architecture, solved by the genetic algorithm method. We use a real-valued architecture-representing chromosome that can express both the number of layers and the number of nodes in each layer. The proposed approach models the challenge of neural architecture optimization as non-linear constraint programming with mixed variables. The generalization potential of the MLP was further assessed, and the risk of overfitting avoided, using a k-fold cross-validation technique. Results on the Iris dataset show an improvement in classification performance over earlier studies. The stability of the technique is also shown by the fact that the proposed method has the smallest standard deviation of the mean accuracy rate. Keywords: Multi-layer perceptron (MLP) · Genetic algorithm (GA) · Architecture optimization · Non-Linear Optimization · UCI Machine Learning
1 Introduction
In an effort to mimic the learning capacities of biological neural systems, artificial neural networks (ANNs), non-linear models based on the architecture of the brain's neural network, were created [1]. The most often used model in neural network applications is the multilayer perceptron (MLP), which employs the back-propagation training procedure and successive layers of neurons with no connections within a layer. The architecture of an MLP neural network can be viewed as an optimization issue and must be defined carefully: too many connections can cause the training data to be overfitted, while too few connections can leave the network without enough adjustable parameters to solve the problem [2].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 366–380, 2023. https://doi.org/10.1007/978-3-031-43520-1_31
The test-and-error approach to architecture optimization takes a long time and can analyze only a limited number of architectures, making it unlikely that an ideal architecture will be found when many viable architectures exist. To circumvent these restrictions, a number of methods have been proposed, including Particle Swarm Optimization (PSO) [3], Simulated Annealing (SA) [4], Back-Propagation (BP) [5], Ant Colony [6], and tabu search [7], as well as a number of GA algorithms for optimizing the MLP topology. To choose the initial weights and determine the appropriate number of neurons in a single hidden layer, the G-Prop approach is proposed by [8]. Grammatical evolution is used to build the MLP topology in the GE-BP approach put forth by [9], while BP is used for training. The NNC technique, as set out by [10], uses grammatical evolution to encode network architecture and synaptic weights. Another approach to determining architecture is provided by Miller et al. [11], who map the network connection topology onto a binary adjacency matrix known as the Miller-Matrix (MM) that describes the ANN architecture. The concept put forward by Bebis et al. [12] is to combine GAs with weight elimination [13] to search for the architecture by pruning large networks. The approach put out by Yao and Liu [14] combines the quest for the optimal weights with the search for the optimal topology of adaptable neural networks. Kapanova et al. [15] offer an automated technique for finding a neural network design for a given assignment; by varying the number of neurons, the number of hidden layers, the kinds of synaptic connections, and the transfer functions used, this method enables the exploration of a multidimensional space of potential architectures. Wilson Castro et al. [16] develop a methodology for optimizing multilayer-perceptron-type neural networks by analyzing the effects of three neural architecture parameters on the sum of squares error (SSE): the number of hidden layers (HL), the number of neurons per hidden layer (NHL), and the type of activation function (AF). Jenny V. Domashova et al. [17] use a genetic algorithm to choose the neural network's best architecture and to answer classification problems as accurately as possible. Sam Ansari et al. [18] use a genetic algorithm to determine the best multi-layer perceptron parameter values, offering different chromosomal codings. For the purpose of calculating slope stability safety factors (SF) in a purely cohesive slope, Hong Wang et al. [19] analyze and improve the design of an artificial neural network (ANN) in conjunction with a genetic algorithm (GA) optimization technique. In order to estimate the indoor environmental conditions of a building in real time, Miguel Martínez-Comesaña et al. [20] propose an interpolation methodology based on optimized multilayer perceptron neural networks; the multiobjective genetic algorithm NSGA-II is used to optimize the neural network and identify the architecture with the lowest error and complexity. A mathematical model has also been given for optimizing hidden layers through the introduction of one decision variable per layer [21]. In this paper, a novel mathematical programming methodology for creating the best multilayer perceptron network architecture is proposed. The problem is addressed as a non-linear programming problem with mixed constraints, solved by a genetic algorithm (GA). An individual is encoded with one chromosome that can express both the number of layers and the number of nodes in each layer. We also assign to each hidden layer a binary variable that takes the value 1 when the layer is activated and 0 when it is not. The learning paradigms for artificial neural networks are described in Sect. 2. We outline
the neural architecture optimization problem and provide a novel modeling in Sect. 3. How we use a genetic algorithm to tackle this problem is described in Sect. 4. Before concluding, Sect. 5 presents the simulation results.
2 Learning for Artificial Neural Network
An artificial neural network (ANN) is a computational paradigm that aims to replicate the neuron-heavy structure of the human nervous system and brain [1]. There are three types of ANN learning paradigms: unsupervised, supervised, and reinforcement learning. The unsupervised learning model identifies pattern class information heuristically, and reinforcement learning learns through trial-and-error interactions with its environment (reward/penalty assignment). The supervised learning model assumes the presence of a supervisor who categorizes the training examples into classes and uses the information on the class membership of each training instance. When a variable to be explained Y, or a shape to be recognized, has been observed jointly with X on the same objects, we face a problem of modeling or supervised learning: finding a function f able, at best according to some criterion to be defined, to reproduce Y having observed X:

Y = f(X) + ε   (1)

where ε is the noise or measurement error. We have a learning set made up of input-output observations d1n = {(X1, y1), ..., (Xn, yn)}, with Xi ∈ χ arbitrary (often equal to Rp) and yi ∈ Ƴ, for i = 1...n. The set of characteristic attributes (features), factors, or variables is

X = (X(1), ..., X(p))   (2)

Contrary to a traditional statistical approach, in which the observation of the data is integrated into the methodology (planning of the experiment), the data here generally exist prior to the analysis. The goal is to develop a model from this learning sample that enables us to predict the output Y associated with a new input (or predictor). The output Y can be qualitative (onset of cancer, recognition of digits, etc.) or quantitative (stock price, electricity consumption, pollution map, etc.), depending on the space in which it takes its values: a finite set, the real numbers, or even a functional space. If the output Y to be explained is qualitative, we speak of pattern recognition, classification, or discrimination; if Y is quantitative, we speak of a regression problem.
3 Proposed Method for Architecture Optimization The goal of neural architecture optimization is to determine the ideal number of hidden layers and neurons within each layer in artificial neural networks (ANN) in order to maximize their performance.
3.1 Problem Formulation
An optimization method is used to initialize and optimize the weight parameters of a machine learning (ML) model until the objective function approaches a minimum or the accuracy approaches a maximum [22]. Similarly, hyper-parameter optimization techniques aim to optimize the hyper-parameter configurations that define the architecture of a machine learning model; a neural network (NN) is an example of such a model. In previous works on architecture optimization, we considered the optimization of hidden layers by adding a decision variable for each layer [21]; in another, we assigned to the connections between node ni and layer i + 1 a binary variable that takes the value 1 if the connection exists in the network and 0 otherwise [23]. Motivated by this earlier work, we propose a new optimization model for the artificial neural architecture: an individual is encoded with one chromosome that can express both the number of layers and the number of nodes in each layer; we use a real-valued architecture-representing chromosome and assign to each hidden layer a binary variable that takes the value 1 if the layer is activated and 0 otherwise. With this approach, we characterize the problem of neural architecture optimization as non-linear constraint programming with mixed variables, solved with dedicated genetic operators.
3.2 Optimization of Neural Architecture through Modeling
To model the neural architecture optimization problem, we introduce the following notation:
X: set of inputs used as explicative variables of the model
N: number of hidden layers
n0: number of neurons in the input layer
ni: number of nodes in layer i, for i = 1...N
nN+1: number of neurons in the output layer
nopt: optimal number of hidden layers
hi: output of hidden layer i, for i = 1...N
Y: the variable to be predicted
d: the desired output
f: activation function of all neurons
F: transfer function of the ANN
cni: binary variable, for i = 1...N−1
A multilayer perceptron performs a transformation of the input variables:

Y = F(X; W; C) = (y1, y2, ..., ynN+1)   (3)
The following expression gives the output of the first hidden layer, h_1 = (h_1^1, ..., h_1^{n_1}):

h_1^j = f( Σ_{k=1}^{n_0} w_{k,j}^0 x_k ),  j = 1, ..., n_1   (4)

where (x_1, x_2, ..., x_{n_0}) are the inputs of the neural network. The output of hidden layer h_i, for i = 2...N−1, is gated by the layer-activation variables:

h_i^j = h_{i−1}^j ∏_{k=1}^{i} (1 − cn_k) + ∏_{k=1}^{i} cn_k · f( Σ_{k=1}^{n_{i−1}} w_{k,j}^{i−1} h_{i−1}^k ),  j = 1, ..., n_i   (5)

The final hidden layer h_N follows the same form:

h_N^j = h_{N−1}^j ∏_{k=1}^{N−1} (1 − cn_k) + ∏_{k=1}^{N−1} cn_k · f( Σ_{k=1}^{n_{N−1}} w_{k,j}^{N−1} h_{N−1}^k ),  j = 1, ..., n_N   (6)

y_j = f( Σ_{k=1}^{n_N} w_{k,j}^N h_N^k ),  j = 1, ..., n_{N+1}   (7)
With y_j the output of the neural network, the objective function of the problem represents the discrepancy between the computed and the desired output:

‖F(X; W; C) − d‖²   (8)

subject to the following restrictions. We ensure that there is always at least one hidden layer:

cn_1 = 1   (9)
No additional layer is added to the network if n_i is zero or negative:

cn_i = 1 if n_i > 0, 0 otherwise,  ∀ i = 2...N   (10)

The sum of hidden layers can equal one or more:

Σ_{i=2}^{N−1} cn_i ≥ 0   (11)

For each of the float values that represent a hidden layer, we first establish lower and upper limits. The first hidden layer is given the range [n_{1min}, n_{1max}]:

0 < n_{1min} ≤ n_1 ≤ n_{1max}   (12)

The remaining layers may start at negative values; the larger these initial negative values, the higher the probability that they terminate the layer count:

n_{imin} ≤ n_i ≤ n_{imax},  ∀ i = 2...N   (13)

The weight values are real numbers:

W = (w_{k,j}^i),  w_{k,j}^i ∈ R,  0 ≤ i ≤ N,  1 ≤ k ≤ n_i,  1 ≤ j ≤ n_{i+1}   (14)
The neural architecture optimization problem can then be modeled as follows:

(P):  Min ‖F(X; W; C) − d‖²
subject to:
  cn_1 = 1
  cn_i = 1 if n_i > 0, 0 otherwise,  ∀ i = 2...N
  Σ_{i=2}^{N−1} cn_i ≥ 0
  0 < n_{1min} ≤ n_1 ≤ n_{1max}
  0 < n_{imin} ≤ n_i ≤ n_{imax},  ∀ i = 2...N
  W = (w_{k,j}^i),  w_{k,j}^i ∈ R,  0 ≤ i ≤ N,  1 ≤ k ≤ n_i,  1 ≤ j ≤ n_{i+1}

where W = (w_{k,j}^i) is the weights matrix, C = (cn_1, ..., cn_{N−1}) ∈ {0, 1}^{N−1}, n_i is the number of neurons in the i-th hidden layer, and N is the total number of hidden layers.
The variables to be searched can be summarized in the following matrices:

C = [cn_1, ..., cn_i, ..., cn_{N−1}]   (15)

W = [W^0, ..., W^i, ..., W^N]   (16)

where W^i = (w_{k,j}^i), i = 0...N, 1 ≤ k ≤ n_i, 1 ≤ j ≤ n_{i+1}. The optimal number of hidden layers is:

c_opt = Σ_{i=1}^{N−1} cn_i   (17)
Our strategy starts with the maximum possible number of hidden layers. During the training phase, observations are fed to the learning system sequentially, in any order. The optimization model requires both the number of hidden layers and the weights, so the number of hidden layers must be chosen concurrently with the training period (weight adjustment).
4 GA-Based Method for Solving the Proposed Mathematical Programming Model
Based on evolutionary theory, the Genetic Algorithm (GA) is a meta-heuristic algorithm commonly used in mathematical optimization problems. Its iterative computation process includes the following steps: encoding, population initialization, selection, genetic operation (crossover, mutation), evaluation, and stop decision [14]. Figure 1 depicts the basic genetic algorithm steps. The three key features of GA are: (1) robustness to local minima, because of its global search property; (2) robustness to fitness function discontinuities, because GA does not require the fitness function to have a derivative; and (3) directed search, because it does not necessitate exploring the entire solution space [24].
4.1 Chromosome Representation
In this study, we used a single chromosome to represent the vector C, expressing both the number of layers and the number of nodes in each layer; we used a real-valued architecture-representing chromosome and assigned to each hidden layer a binary variable that takes the value 1 if the layer is activated and 0 otherwise. Example:
In this case, ni denotes the number of nodes in layer i.
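A minimal decoding of such a chromosome might look as follows. The rounding of real genes to node counts, the forced activation of the first layer, and the `decode` helper itself are assumptions of this sketch, not the paper's exact procedure:

```python
def decode(chromosome):
    """Turn a real-valued chromosome [n1, n2, ...] into (layer sizes, cn)."""
    layers, cn = [], []
    for i, gene in enumerate(chromosome):
        n_i = int(round(gene))
        if i == 0 or n_i > 0:       # cn_1 is always 1; later layers need n_i > 0
            cn.append(1)
            layers.append(max(n_i, 1))
        else:
            cn.append(0)            # a non-positive gene ends the layer count
            break
    return layers, cn

sizes, cn = decode([12.3, 7.8, -2.1, 5.0])   # two hidden layers: 12 and 8 nodes
```

Genes after the first non-positive one are simply ignored, which is how negative starting ranges let the chromosome shrink the network.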
Fig. 1. Basic genetic algorithm steps
4.2 Initial Population
The initial population is made up of the individuals that will be subjected to the genetic operators and evaluated by the fitness function. It is generated at random, with the weights assigned random values in [0, 1]. Following the creation of the initial population, each individual is assessed and given a fitness value by the fitness function.
4.3 Fitness Function
A fitness function generates a statistic that gauges an individual's level of adaptation to its environment. This statistic is intended to focus the search on traits that make an individual more fit, i.e., one that performs the task better. In our work, the fitness is the accuracy rate of the classification task: accuracy is the percentage of all instances that were correctly classified. The calculation formula of
fitness suggested in our work is as follows:

F(i) = Fitness(i) = Accuracy(i)   (18)

Accuracy = N_T / N_d   (19)
where N_T and N_d stand, respectively, for the number of examples that were correctly classified and the total number of examples in the dataset.
4.4 Selection
In this study, we employ the two-individual tournament selection approach, as shown in Fig. 2. We randomly select two individuals from the population, compare their fitness values, and choose the one with the higher value. This procedure is repeated for a second random pair of individuals, from which a winner is likewise chosen. These two selected winners are then passed to the crossover operator.
Fig. 2. Tournament selection
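The selection step of Fig. 2 can be sketched in a few lines; the population and fitness values below are illustrative:

```python
import random

def tournament(population, fitnesses, rng=random):
    """Two-individual tournament: sample two chromosomes, keep the fitter."""
    i, j = rng.sample(range(len(population)), 2)
    return population[i] if fitnesses[i] >= fitnesses[j] else population[j]

# Illustrative chromosomes [n1, n2, n3, n4] and their accuracy-based fitness.
pop = [[10, 2, -3, 1], [2, 5, 1, -1], [8, 8, -2, 4], [6, 1, 2, -5]]
fits = [0.90, 0.30, 0.80, 0.60]

parent_a = tournament(pop, fits)   # first winner
parent_b = tournament(pop, fits)   # second winner, handed to crossover
```

Note that the weakest individual can never win a tournament it takes part in, which is what biases selection towards fitter architectures.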
4.5 Crossover We use in this work simulated binary crossover operator that is a real-parameter genetic operator developed by [25] . Following is a description of the method for generating two children’s solutions, (1,t+1) (2,t+1) (1,t) (2,t) and xi , from the parent solutions, xi and xi : We determine βq as xi follows for a random number Ʋgenerated within the range [0, 1]:
(20)
The ordinate β_q is found such that the area under the probability curve from 0 to β_q equals the selected random number. A single-point crossover in binary-coded GAs
is found to have a search power similar to that of the probability distribution used to produce a child solution:

P(β) = 0.5(η + 1)β^η if β ≤ 1;  P(β) = 0.5(η + 1)/β^{η+2} otherwise   (21)

Following the calculation of β_q from this probability distribution, the child solutions are as follows:

x_i^(1,t+1) = 0.5[(1 + β_q) x_i^(1,t) + (1 − β_q) x_i^(2,t)]   (22)

x_i^(2,t+1) = 0.5[(1 − β_q) x_i^(1,t) + (1 + β_q) x_i^(2,t)]   (23)
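The SBX child-generation rule translates directly into code. The sketch below uses scalar genes and η = 10 for illustration:

```python
import random

def sbx(p1, p2, eta=10.0, rng=random):
    """Simulated binary crossover on one real-valued gene pair.
    beta_q is drawn from the polynomial distribution with index eta, then
    the two children are symmetric blends of the parents."""
    u = rng.random()
    if u <= 0.5:
        beta_q = (2.0 * u) ** (1.0 / (eta + 1.0))
    else:
        beta_q = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
    c1 = 0.5 * ((1.0 + beta_q) * p1 + (1.0 - beta_q) * p2)
    c2 = 0.5 * ((1.0 - beta_q) * p1 + (1.0 + beta_q) * p2)
    return c1, c2

c1, c2 = sbx(3.0, 9.0)   # two children spread symmetrically around the parents
```

A useful sanity check on the formulas is that the children always preserve the parents' mean: c1 + c2 = p1 + p2 for every draw of β_q.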
4.6 Mutation
We use the polynomial mutation operator with an index parameter η. In their theoretical study, Deb and Agrawal [26] state that a perturbation of O((b − a)/η) in a variable, where a and b are its lower and upper bounds, induces an effect. This operator perturbs a solution close to its parent using a polynomial probability distribution, and it ensures that no value outside the specified range [a, b] is produced by adjusting the distribution to the left and right of the variable value. For a given parent solution p in [a, b], the mutated solution p′ for a particular variable is constructed, for a random number u selected in [0, 1], as:

p′ = p + δ_L (p − a) if u ≤ 0.5;  p′ = p + δ_R (b − p) otherwise   (24)

The parameters (δ_L, δ_R) are calculated as follows:

δ_L = (2u)^{1/(1+η)} − 1   (25)

δ_R = 1 − (2(1 − u))^{1/(1+η)}   (26)
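The polynomial mutation of Eqs. (24)-(26) can be sketched as follows; the bounds and parent value are illustrative (they match the second-layer range [−4, 10] used later):

```python
import random

def polynomial_mutation(p, a, b, eta=10.0, rng=random):
    """Polynomial mutation of one real gene p in [a, b]: the perturbation is
    drawn from a polynomial distribution of index eta, so the child can
    never leave [a, b]."""
    u = rng.random()
    if u <= 0.5:
        delta_l = (2.0 * u) ** (1.0 / (1.0 + eta)) - 1.0      # in [-1, 0]
        return p + delta_l * (p - a)                          # move towards a
    delta_r = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (1.0 + eta))  # in [0, 1]
    return p + delta_r * (b - p)                              # move towards b

child = polynomial_mutation(5.0, a=-4.0, b=10.0)
```

Because δ_L ∈ [−1, 0] and δ_R ∈ [0, 1], the extreme perturbations land exactly on the bounds a and b, never beyond them.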
5 Implementation and Numerical Results
The Iris dataset from the UCI Machine Learning Repository was used to assess the classification performance of the proposed approach and the impact of architecture optimization on the accuracy ratio. The dataset contains four features (length and width of sepals and petals) of 50 samples from each of three Iris species (Iris setosa, Iris virginica, and Iris versicolor), and was used without any pre-processing or data augmentation. The results of the proposed method are reported in this section. The models were implemented on Google Colab, using the Keras deep learning framework and the most recent versions of NumPy, pandas, matplotlib, and DEAP.
376
F. Z. El-Hassani et al.
5.1 Setting Parameters

In our research, we employ an MLP classifier. To this end, back-propagation training was applied after genetic algorithms were used to develop the design yielding the best classification accuracy. The weights are assigned random values in [0, 1]. After the initial population is generated at random, each individual is assessed and given a fitness value by the fitness function. We restrict the network to four hidden layers and encode the chromosome as [n1, n2, n3, n4]. The MLP structure assigns the range [10, 15] to the first hidden layer, while the remaining layers start with progressively larger negative lower bounds, increasing the probability that they terminate the layer count: the ranges are [−4, 10], [−10, 10] and [−20, 10] for the second, third and fourth hidden layers respectively.

The MLP's generalization ability was further assessed using 3-fold cross-validation. Because the dataset has few instances, 3 folds were used: training this dataset's instances with a large proportion would increase the likelihood of overfitting. In this study, the number of elites was fixed at 3 so as not to overly reduce population diversity and the algorithm's capacity to explore the search space. This method was applied over 30 independent runs.

Numerous factors can influence an ANN's performance, including the weight initialization and the model architecture that is the focus of this work. When creating an artificial neural network with a genetic algorithm, factors such as population size, number of generations, crossover probability and mutation probability must also be taken into account. All parameters used in our tests are listed in Table 1.

Table 1. Parameters implementation of GA (three tested settings)

| Parameter | Setting 1 | Setting 2 | Setting 3 |
| Pm        | 0.100     | 0.500     | 0.500     |
| Pc        | 0.500     | 0.900     | 0.500     |
| Ps        | 30        | 60        | 90        |
| Mg        | 20        | 20        | 20        |
| η         | 10        | 10        | 10        |
| K         | 3         | 3         | 3         |

Pm: mutation probability, Pc: crossover probability, Ps: population size, Mg: maximum number of generations, η: index parameter of mutation, K: number of folds in cross-validation
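One plausible reading of this chromosome encoding, sketched below, is that the genes are decoded left to right and the first non-positive gene terminates the hidden-layer count, which is why the later genes' ranges extend into negative values. The function name and this exact decoding rule are an illustrative assumption, not the authors' code.

```python
def decode_architecture(chromosome):
    """Decode a GA chromosome [n1, n2, n3, n4] into MLP hidden-layer sizes.

    A non-positive gene is read as "no layer here", so the first
    non-positive value terminates the architecture.
    """
    layers = []
    for n in chromosome:
        if n <= 0:          # negative genes end the layer count
            break
        layers.append(int(n))
    return layers
```

For example, decoding [12, 7, −3, 5] yields a two-hidden-layer MLP with 12 and 7 neurons: the third gene terminates the architecture even though the fourth is positive.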
5.2 Experimental Results

Table 2 displays the G-metric over the 30 independent runs on this dataset under 3-fold cross-validation, with the population size set to 30, the simulated binary crossover probability set to 0.5 and the polynomial mutation probability set to 0.01. The number of the respective GA generation is given in the first column, followed by the maximum, mean, minimum and standard deviation of the G-metric in the 3-fold cross-validation.

Table 2. Results of GA-MLP on generation level. G-metric in 3-fold cross-validation

| Generation | Max  | Mean     | Min      | Std       |
| 0          | 0.98 | 0.665333 | 0.293333 | 0.200218  |
| 1          | 0.98 | 0.76     | 0.333333 | 0.177113  |
| 2          | 0.98 | 0.896333 | 0.68     | 0.108964  |
| 3          | 0.98 | 0.921333 | 0.673333 | 0.0999911 |
| 4          | 0.98 | 0.944    | 0.686667 | 0.082203  |
| 5          | 0.98 | 0.954    | 0.666667 | 0.0781423 |
| 6          | 0.98 | 0.964333 | 0.666667 | 0.0682894 |

It is important to note that the maximum G-metric in 3-fold cross-validation is the same across all generations; this is due to our elitist selection, as an optimal architecture was discovered by chance in the first generation. Fig. 3 depicts the evolution of the mean, maximum and minimum G-values. According to the chart, good results are attained early in the process for the 3-fold cross-validation mean accuracy rate: performance gains peak in the first four generations before plateauing in the absence of new developments. Table 2 also shows high standard deviation values in the first three generations, which means that individual observations lie far from the mean of the data. These values gradually decrease with the generations; conversely, the individual values move closer to the mean.
Fig. 3. G-values For GA-MLP
Table 3 displays the results of comparable studies together with the best classification accuracy on the Iris dataset attained within 6 generations over the 30 independent runs, with its optimal architecture.

Table 3. Comparison of data classification.

| Method    | Optimization task | Architecture/neurons | Accuracy %     |
| GE-BP     | ---               | ---                  | 96.6 ± 6.14%   |
| MLPGA + 4 | Hyper-parameters  | 13.10 ± 11.30 d      | 98.87% ± 0.33% |
| MLP-GA    | Architecture      | 3 H-L / 4 N-L        | 97.3%          |
| GABP      | Weights           | 1 H-L / 10 N-L       | 95.0%          |
| EBP       | Weights           | ---                  | 97.3%          |
| P.Method  | Architecture      | 2 H-L / [6, 11] N-L  | 98% ± 0.06% c  |

H-L: hidden layers, N-L: neurons per layer, c: 3-fold cross-validation mean accuracy rate, d: the MLP's average number of neurons
From the results in Table 3, we observe that the suggested technique exhibits a significantly reduced standard deviation compared to MLPGA+4 and shows a higher accuracy rate than the GE-BP, Error Back Propagation (EBP), GABP and MLP-GA methods.
6 Conclusion

In this research, we offer a new modeling approach to the problem of optimizing the neural architecture using an evolutionary genetic algorithm, which is particularly good at determining the optimal solution to a difficult non-linear problem. The approach described here uses a real-valued chromosome representing the architecture, which expresses both the number of layers and the number of nodes in each layer: the activation of the i-th hidden layer depends on n_i, which specifies the number of neurons in that layer. We use the Iris pattern recognition dataset as a test case to demonstrate the benefits of the suggested approach. The outcomes of the experiments demonstrate that the suggested approach enables training an MLP with improved classification results. Future improvements to this work include adding weight initialization and regularization hyper-parameters, which can be optimized concurrently with the MLP topology, and applying the resulting model to a wide range of practical problems.
References

1. Salchenberger, L.M., Lash, N.A.: Neural networks: a new tool for predicting thrift failures. Decis. Sci. 23(18)
2. Ludermir, T.B., Yamazaki, A., Zanchettin, C.: An optimization methodology for neural network weights and architectures. IEEE Trans. Neural Netw. 17, 1452–1459 (2006). https://doi.org/10.1109/TNN.2006.881047
3. Carvalho, M., Ludermir, T.B.: Particle swarm optimization of neural network architectures and weights. In: 7th International Conference on Hybrid Intelligent Systems (HIS 2007), IEEE, Kaiserslautern, Germany, pp. 336–339 (2007)
4. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220, 671–680 (1983)
5. Hecht-Nielsen, R., Drive, O., Diego, S.: Theory of the backpropagation neural network
6. Ghanou, Y., Bencheikh, G.: Architecture optimization and training for the multilayer perceptron using ant system (2016)
7. Islam, T., Shahriar, Z., Perves, M.A., Hasan, M.: University timetable generator using tabu search. J. Comput. Commun. 04, 28–37 (2016). https://doi.org/10.4236/jcc.2016.416003
8. Castillo, P.A., Merelo, J.J., Prieto, A., Rivas, V., Romero, G.: G-Prop: global optimization of multilayer perceptrons using GAs. Neurocomputing 35, 149–163 (2000). https://doi.org/10.1016/S0925-2312(00)00302-7
9. Soltanian, K., Tab, F.A., Zar, F.A., Tsoulos, I.: Artificial neural networks generation using grammatical evolution. IEEE, pp. 1–5 (2013)
10. Tsoulos, I., Gavrilis, D., Glavas, E.: Neural network construction and training using grammatical evolution (2008)
11. Miller, G.F., Todd, P.M., Hegde, S.U.: Designing neural networks using genetic algorithms, pp. 379–384 (1989)
12. Kasparis, T.: Coupling weight elimination with genetic algorithms to reduce network size and preserve generalization
13. Weigend, A.S., Rumelhart, D.E., Huberman, B.A.: Generalization by weight-elimination with application to forecasting
14. Yao, X., Liu, Y.: Towards designing artificial neural networks by evolution. Appl. Math. Comput. 91, 83–90 (1998). https://doi.org/10.1016/S0096-3003(97)10005-4
15. Kapanova, K., Dimov, I., Sellier, J.: A genetic approach to automatic neural network architecture optimization. Neural Comput. Appl. 29, 1481–1492 (2018)
16. Castro, W., Oblitas, J., Santa-Cruz, R., Avila-George, H.: Multilayer perceptron architecture optimization using parallel computing techniques. PLoS ONE 12, e0189369 (2017). https://doi.org/10.1371/journal.pone.0189369
17. Domashova, J.V., Emtseva, S.S., Fail, V.S., Gridin, A.S.: Selecting an optimal architecture of neural network using genetic algorithm. Procedia Comput. Sci. 190, 263–273 (2021)
18. Ansari, S., Alnajjar, K.A., Abdallah, S., Saad, M., El-Moursy, A.A.: Parameter tuning of MLP, RBF, and ANFIS models using genetic algorithm in modeling and classification applications. IEEE, pp. 660–666 (2021)
19. Wang, H., Moayedi, H., Kok Foong, L.: Genetic algorithm hybridized with multilayer perceptron to have an economical slope stability design. Eng. Comput. 37, 3067–3078 (2021)
20. Martínez-Comesaña, M., Ogando-Martínez, A., Troncoso-Pastoriza, F., López-Gómez, J., Febrero-Garrido, L., Granada-Álvarez, E.: Use of optimised MLP neural networks for spatiotemporal estimation of indoor environmental conditions of existing buildings. Build. Environ. 205, 108243 (2021)
21. Ettaouil, M., Ghanou, Y.: Neural architectures optimization and genetic algorithms 8, 13 (2009)
22. Sun, S., Cao, Z., Zhu, H., Zhao, J.: A survey of optimization methods from a machine learning perspective (2019)
23. Ramchoun, H., Amine, M., Idrissi, J., Ghanou, Y., Ettaouil, M.: Multilayer perceptron: architecture optimization and training. Int. J. Interact. Multimed. Artif. Intell. 4, 26 (2016). https://doi.org/10.9781/ijimai.2016.415
24. Itano, F., de Abreu de Sousa, M.A., Del-Moral-Hernandez, E.: Extending MLP ANN hyper-parameters optimization by using genetic algorithm. In: 2018 International Joint Conference on Neural Networks (IJCNN), IEEE, Rio de Janeiro, pp. 1–8 (2018)
25. Deb, K., Beyer, H.-G.: Self-adaptive genetic algorithms with simulated binary crossover. Evol. Comput. 9, 197–221 (2001). https://doi.org/10.1162/106365601750190406
26. Deb, K., Agrawal, S.: A niched-penalty approach for constraint handling in genetic algorithms. In: Artificial Neural Nets and Genetic Algorithms, Springer, Vienna, pp. 235–243 (1999). https://doi.org/10.1007/978-3-7091-6384-9_40
An Improved YOLOv5 Based on Attention Model for Infrared Human Detection Aicha Khalfaoui(B) , Badri Abdelmajid, and El Mourabit Ilham LEEA and TI Laboratory, Faculty of Sciences and Techniques Mohammedia, Hassan II University of Casablanca, Casablanca, Morocco [email protected]
Abstract. Human detection plays an important role in surveillance by ensuring security and maintaining public order. It is still considered a complex task in the deep learning field due to the highly varying illumination conditions under which humans must be detected. This paper proposes a new approach based on an enhanced YOLOv5 to detect humans in thermal images. It consists of integrating the Convolutional Block Attention Module (CBAM) into the backbone network to enhance the model's ability to extract features. To measure the effectiveness of our method, we evaluated its performance on two benchmark thermal image datasets: the Ohio State University thermal pedestrian dataset, and the autonomous system lab thermal infrared dataset. Both datasets present various challenges, with images collected under different humidity and weather conditions. From the obtained results, our approach performs human detection with 96% mean average precision and 91.8% recall, outperforming state-of-the-art CNN-based techniques like YOLOv5, RCNN, and Cascade RCNN.

Keywords: Human Detection · Deep Learning · Thermal Images · Attention Module · YOLOv5
1 Introduction Human detection is one of the main areas of computer vision research, particularly in the security and surveillance domain. In a real-time context, human detection is affected by a variety of factors, most notably the environment, background, and distance from the sensor. In contrast to traditional RGB cameras, thermal infrared cameras offer better resilience to particular challenges in real-time scenarios, including unfavorable weather conditions or nighttime [1]. The dispersion of temperature present in thermal images is used to detect persons in the image. However, thermal images differ from RGB images in several ways, including having a lower resolution, more noise, and a smaller dynamic range. Recently, thanks to the development of deep learning, several detection models with strong detecting abilities have emerged. The most popular algorithms in the deep learning field are those from the YOLO series algorithms. They use an end-to-end fully convolutional neural network for detection, which immediately generates the target classes and © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 381–388, 2023. https://doi.org/10.1007/978-3-031-43520-1_32
the bounding boxes. YOLO algorithms belong to the one-stage family of algorithms. They are preferred in practical applications as they have fewer parameters, a simpler network architecture, and a faster detection rate than two-stage models. One of the best one-stage detection algorithms is YOLOv5. As detection methods achieved good results on visible images, some researchers have enhanced these models and used them for infrared imagery. Because infrared images have a lower resolution compared to optical images, the model frequently becomes very complex if we want to obtain excellent detection results. For instance, Shu et al. [2] boost YOLOv5 by replacing ResNet blocks with DenseNet blocks, which increases the detection accuracy of infrared targets while also adding a significant amount of computation to the model. Duan et al. [3] recommend adding an auxiliary network to YOLO to enhance its ability to detect infrared targets. Although the aforementioned techniques can make the most of limited features, they increase memory consumption and complicate model deployment. On the other hand, some researchers have employed lightweight modules to enhance the model, but certain problems remain. Wang et al. [4] increase Faster RCNN's infrared target detection accuracy by including a category correction module; however, this does not address the issue of small infrared objects being ignored during the detection process. Li et al. [5] recommended adding squeeze-and-excitation (SE) modules to YOLOv3's backbone network to enhance the identification of infrared objects, but the SE module does not take the spatial information of the feature map into consideration. In this paper, an improved approach is presented to address the inability to detect people in infrared images. By adding attention modules to the YOLOv5 baseline, the model's detection precision can be enhanced.
Our improvement consists of adding the Convolutional Block Attention Module (CBAM) to the YOLOv5 backbone network, which can help the model focus on the essential features. The detection rate of our model is improved while adding only a small number of parameters. The improvement effects of various numbers of CBAM blocks on the model are compared, to find the optimal way of adding CBAM for infrared human recognition tasks. The paper is structured in five sections: after this introduction, related work is reviewed in Section 2, Section 3 presents the proposed method, Section 4 highlights the experimental results and discussion, and the conclusion and prospects close the paper.
2 Related Work

The YOLO series algorithms are currently the most frequently used models in the field of deep learning [6–9]. One-stage algorithms perform detection through an end-to-end fully convolutional neural network, which directly produces the target classes and the bounding boxes. The one-stage model has fewer parameters, a simpler network architecture, and a higher detection speed than the two-stage model, making it the model of choice in real-world applications. YOLOv5 is one of the top one-stage detection algorithms. Essentially, YOLOv5 is made up of three elements: first, the backbone is employed to generate visual features for later detection or classification tasks. The neck is in charge
of fusing features from various levels generated by the backbone to make the model resistant to target scale transformation. Finally, the head is used to produce the bounding box and the class category of the object detected. YOLOv5 employs CSPDarkNet53 as a backbone, adding the CSPNet [10] to DarkNet53’s structure to minimize repeated calculations generated by redundant gradient information. CSPNet transmits gradient information, via various network pathways. The model employs PANet [11] as the neck, which benefits from both bottom-up and top-down pathways to fuse characteristics from several layers. All feature levels are combined to enhance the model’s sensitivity to detect different scales. Lastly, the multi-scale detection head is used to generate the prediction results, and all three branches can detect objects of various sizes. The YOLOv5 architecture is presented in Fig. 1. [11].
Fig. 1. YOLOv5’s Architecture
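The backbone–neck–head decomposition described above can be summarized as a composition of three stages. The sketch below is illustrative only: the three callables stand in for CSPDarkNet53, PANet and the multi-scale detection head, and none of the names come from the YOLOv5 code base.

```python
class OneStageDetector:
    """Backbone-neck-head composition of a one-stage detector (sketch).

    backbone: image -> multi-level visual features
    neck:     features -> fused multi-scale features (top-down + bottom-up)
    head:     fused features -> bounding boxes and class scores
    """
    def __init__(self, backbone, neck, head):
        self.backbone, self.neck, self.head = backbone, neck, head

    def __call__(self, image):
        features = self.backbone(image)   # e.g. CSPDarkNet53 stages
        fused = self.neck(features)       # e.g. PANet feature fusion
        return self.head(fused)           # multi-scale predictions
```

This composition is why a one-stage detector is end-to-end: one forward pass through three stages yields the final boxes, with no separate region-proposal step.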
3 Proposed Method

The improved model takes YOLOv5 as a baseline and enhances the backbone network to boost the model's ability to detect infrared images. The enhanced model adds CBAM to the YOLOv5 backbone network, so that the limited effective features have a stronger impact on the detection results and the model focuses more attention on targets. The attention mechanism has recently been integrated into neural networks and used for a variety of computer vision applications. CBAM is a simple, lightweight attention mechanism that merges channel attention and spatial attention in a sequential pattern, as shown in Fig. 2 [12]. It is divided into two separate sub-modules that handle channel and spatial attention, respectively: the Channel Attention Module (CAM) and the Spatial Attention Module (SAM). We incorporate the CBAM module into the backbone
network’s Focus structure for the object detection task so that it can increase essential spatial and channel characteristics in the feature map, thus enhancing object localization accuracy and minimizing the object aggregation issue.
Fig. 2. CBAM-YOLOv5
By multiplying the weights of the different channels, the CAM sub-module improves the learning of important channel domains. For the feature map of a layer F ∈ R^(C×H×W), H and W stand for the feature map's length and width, and C stands for the number of channels. The channel weights M_c ∈ R^(C×1×1) are first determined by the CAM sub-module using the equation below:

$$M_c(F) = \sigma\!\left(W_1\!\left(W_2\!\left(F^c_{avg}\right)\right) + W_1\!\left(W_2\!\left(F^c_{max}\right)\right)\right) \tag{1}$$

In this equation, the weights are given by W_1 and W_2, the feature maps after average and maximum pooling are given by F^c_{avg} and F^c_{max} respectively, and σ stands for the sigmoid activation function. The original feature map is then multiplied by M_c ∈ R^(C×1×1) to produce the channel attention feature map, which is then transmitted to the SAM sub-module. The channel attention features are subjected to average and maximum pooling over the channel dimension, as demonstrated in Formula (2). The spatial attention weight map M_s ∈ R^(1×H×W) is then generated by convolution with a 7×7 kernel, as illustrated in Formula (3). The final attention feature map is created by multiplying the spatial attention weight map with the input channel attention feature map.

$$F_s = \frac{1}{C}\sum_{i \in C} F_c(i) + \max_{i \in C} F_c(i) \tag{2}$$

$$M_s = \sigma\!\left(f^{7\times 7}(F_s)\right) \tag{3}$$
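As a concrete reading of Eqs. (1)–(3), the NumPy sketch below applies channel attention then spatial attention to a feature map. In the real model W1, W2 and the 7×7 kernel are learned parameters; here they are passed in as arguments, and all names are illustrative. Note also that the reference CBAM concatenates the two pooled maps before the 7×7 convolution, whereas Eq. (2) as printed sums them; the sketch follows the printed formula.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W1, W2):
    """Eq. (1): M_c = sigma(W1(W2(F_avg)) + W1(W2(F_max))).

    F has shape (C, H, W); W2 maps C -> C/r and W1 maps C/r -> C,
    a shared two-layer MLP applied to both pooled descriptors.
    """
    f_avg = F.mean(axis=(1, 2))            # (C,) average-pooled descriptor
    f_max = F.max(axis=(1, 2))             # (C,) max-pooled descriptor
    m_c = sigmoid(W1 @ (W2 @ f_avg) + W1 @ (W2 @ f_max))
    return F * m_c[:, None, None]          # channel-refined feature map

def spatial_attention(F, kernel):
    """Eqs. (2)-(3): pool over channels, then a 7x7 conv and a sigmoid."""
    f_s = F.mean(axis=0) + F.max(axis=0)   # Eq. (2), shape (H, W)
    k = kernel.shape[0]                    # 7 in the paper
    padded = np.pad(f_s, k // 2)           # same-size convolution
    m_s = np.empty_like(f_s)
    H, W = f_s.shape
    for i in range(H):
        for j in range(W):
            m_s[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return F * sigmoid(m_s)[None, :, :]    # Eq. (3) applied to the map
```

Because σ maps into (0, 1), both stages only rescale the feature map; its shape is preserved, which is what allows CBAM to be dropped into the backbone without touching the rest of the network.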
4 Experimental Results and Discussion

4.1 Dataset and Evaluation Metrics

In this research, two thermal image datasets are combined and used for the evaluation of the proposed method's performance (Table 1). The autonomous system lab thermal infrared dataset (ASL-TID) [13] includes 3837 thermal infrared human images. The second is the Ohio State University (OSU) thermal pedestrian database [14], which includes a
total of 284 thermal images, spread across ten image sequences, gathered under various weather and humidity conditions. Both datasets contain aerial thermal images captured from a distant top-down perspective, much like UAV images. The final combined dataset is divided as follows: 70% for training, 20% for the validation set, and 10% for the test set. Example thermal images from both datasets are presented in Fig. 3. For the evaluation of our module, we used the precision, recall, and mAP metrics as given in the equations below:

$$P = \frac{TP}{TP + FP} \tag{4}$$

$$R = \frac{TP}{TP + FN} \tag{5}$$

$$mAP = \frac{1}{C}\sum_{K=i}^{N} P(K)\,R(K) \tag{6}$$
Table 1. Prepared dataset.

| Dataset number | Prepared dataset | Total samples |
| 1              | OSU              | 284           |
| 2              | ASL-TID          | 3837          |
| 3              | Combined dataset | 4121          |
Fig. 3. Sample images from the OSU pedestrian (top row) and ASL-TID (bottom row) Thermal image datasets.
Here true positives (TP) count the number of times a positive sample is correctly classified as positive. False positives (FP) count the number of times a negative sample is mistakenly categorized as positive. False negatives (FN) count the number of times the model misclassifies a positive sample as negative. C represents the number of object categories, K is the IoU threshold, P(K) stands for precision, R(K) for recall, and N is the number of IoU thresholds.
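Under these definitions, the metrics of Eqs. (4)–(6) can be sketched as follows. The mAP function is only a literal reading of Eq. (6) as printed; the mAP reported by detection frameworks such as YOLOv5 actually integrates the full precision-recall curve per class, so this is a simplification, and all names are illustrative.

```python
def precision_recall(tp, fp, fn):
    """Eqs. (4)-(5): precision and recall from detection counts."""
    return tp / (tp + fp), tp / (tp + fn)

def mean_ap(pr_pairs, num_classes=1):
    """Literal reading of Eq. (6): sum P(K)*R(K) over the IoU
    thresholds K, scaled by the number of classes C."""
    return sum(p * r for p, r in pr_pairs) / num_classes
```

For example, with 8 true positives, 2 false positives and 2 false negatives, both precision and recall come out to 0.8.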
4.2 Training Protocol

The YOLOv5 [15] implementation is available on GitHub and is written in Python. Training was carried out on a Linux operating system with CUDA 11.1, PyTorch 1.12.1, Python 3.7, and an NVIDIA Tesla T4 graphics card. With a learning rate of 0.01 and a batch size of 32, we used the SGD optimizer to train the detection models over 100 epochs.

4.3 Results and Discussion

Four CBAM blocks were added to the backbone during the detection phase; the backbone reweights the feature map after downsampling to help the model pay more attention to features that are useful for the detection results. An experimental comparison is done to demonstrate the efficacy of the attention module in the infrared target detection model. The detection results of our model with varying numbers of added CBAM blocks, compared against the YOLOv5 algorithm, are given in Table 2.
Table 2. Comparison of models with various CBAM numbers.

| Number of CBAM blocks | mAP0.5 (%) |
| 0                     | 97.03      |
| 1                     | 97.70      |
| 2                     | 97.90      |
| 4                     | 98.02      |
The detection results show that adding CBAM to the backbone network can enhance the model’s detection performance. The detection effect is optimal when there are four CBAM blocks, which supports the CBAM blocks’ efficiency.
Fig. 4. Box loss, and objectness loss over 100 epochs for the training and the validation set
The detection results demonstrate how the model’s detection performance can be enhanced by incorporating CBAM into the backbone network, and the best detection
effect is achieved with four CBAM blocks, which improves the baseline's mAP0.5 and shows the efficiency of these blocks. In later studies, 4 blocks are employed when including CBAM in the model. Sample detection results for the test set are shown in Fig. 5. The graph in Fig. 4 above shows our model's improvements; it represents several key metrics for both the training and validation sets. It displays two distinct types of loss: objectness loss and box loss. The box loss measures how accurately the algorithm can localize an object's center and how completely the predicted bounding box encloses an object. Objectness is a measure of the probability that an object occurs in a suggested zone of interest; a high objectness value indicates a high probability of an object being present in the image window. The loss curves of model training and validation converged to a stable state within 100 epochs, which indicates that training is robust and converges quickly, and that the model is suitable for practical human detection in infrared images. The improved model was also compared to other models using the same infrared human detection dataset; the detection results are presented in Table 3. In both recall rate and mAP0.5, our model shows better results.
Fig. 5. Sample detection results for the test set
Table 3. Comparison of different algorithms using the same dataset.

| Module        | Backbone             | Recall (%) | mAP0.5 (%) | Parameters (M) |
| Faster RCNN   | ResNet-101           | 77.2       | 79.8       | 60.2           |
| Cascade RCNN  | ResNet-101           | 83.6       | 82.2       | 87.9           |
| YOLOv5        | CSPDarkNet-53        | 90.27      | 97.03      | 47.4           |
| YOLOv5 + CBAM | CSPDarkNet-53 + CBAM | 94.05      | 98.02      | 47.6           |
Compared with Faster RCNN and Cascade RCNN, YOLOv5+CBAM not only has higher detection accuracy but also fewer model parameters. Compared with YOLOv5, the recall rate and mAP0.5 increased by 3.78% and 0.99% respectively, while the parameter count of the model increased by only 0.2 M.
5 Conclusion and Prospects

In this study, we propose an enhanced YOLOv5 model for detecting targets in infrared human images based on the CBAM attention module, to boost the model's capacity for feature extraction. This technique creates a model called CBAM-YOLOv5 that requires only a small number of additional model parameters to enhance the mAP0.5 and recall rate. The developed CBAM-YOLOv5 model has a significantly lower computational cost than existing comparable architectures, making it appropriate for deployment in drone-assisted person surveillance systems or thermal-imagery-based air surveillance. In future work, we will optimize our method and test it on other, larger thermal datasets.
References

1. Gade, R., Moeslund, T.B.: Thermal cameras and applications: a survey. Mach. Vis. Appl. 25(1), 245–262 (2014)
2. Lang, S., Zhijie, Z., Bo, L.: Research on dense-yolov5 algorithm for infrared target detection. Optics & Optoelectronic Technol. 19(01), 69–75 (2021)
3. Huijun, D., Zhigang, W., Yan, W.: Two-channel saliency object recognition algorithm based on improved YOLO network. Laser & Infrared 50(11), 1370–1378 (2020)
4. Wang, Y., Xiuxin, C., Hejin, Y.: Multi-target recognition of substation infrared image based on improved faster RCNN. Chinese J. Sensors and Actuators 34(04), 522–530 (2021)
5. Li, M., Zhang, T., Cui, W.: Research of infrared small pedestrian target detection based on YOLOv3. Infrared Technol. 42(02), 176–181 (2020)
6. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
7. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)
8. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement (2018). arXiv preprint arXiv:1804.02767
9. Bochkovskiy, A., Wang, C.-Y., Liao, H.Y.M.: YOLOv4: optimal speed and accuracy of object detection (2020). arXiv preprint arXiv:2004.10934
10. Wang, C.-Y., Liao, H.-Y.M., Wu, Y.-H., Chen, P.-Y., Hsieh, J.-W., Yeh, I.-H.: CSPNet: a new backbone that can enhance the learning capability of CNN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 390–391 (2020)
11. Woo, S., Park, J., Lee, J.Y., et al.: CBAM: convolutional block attention module. In: Computer Vision – ECCV 2018, Lecture Notes in Computer Science, vol. 11211, pp. 3–19. Springer, Cham (2018)
12. Wang, Q., et al.: A real-time individual identification method for swimming fish based on improved Yolov5. SSRN Journal (2022). https://doi.org/10.2139/ssrn.4044575
13. Portmann, J., Lynen, S., Chli, M., Siegwart, R.: People detection and tracking from aerial thermal views. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), IEEE, pp. 1794–1800 (2014)
14. Davis, J.W., Keck, M.A.: A two-stage template approach to person detection in thermal imagery. In: 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION'05), vol. 1, IEEE, pp. 364–369 (2005)
15. GitHub - ultralytics/yolov5: YOLOv5 in PyTorch > ONNX > CoreML > TFLite. https://github.com/ultralytics/yolov5. Accessed 1 Sep. 2022
Pneumonia Classification Using Hybrid Architectures Based on Ensemble Techniques and Deep Learning Chaymae Taib1(B) , ELkhatir Haimoudi1 , and Otman Abdoun2 1 Computer Science Department, Polydisciplinary Faculty, Abdelmalek Essaadi University,
Larache, Morocco [email protected] 2 Computer Science Department, Faculty of Science, Abdelmalek Essaadi University, Tetouan, Morocco
Abstract. Pneumonia is a respiratory disease that kills a million children each year, primarily in low-income countries where it is difficult to obtain timely treatment. Furthermore, the illness is often only detected after it has advanced to a severe state. Image analysis can therefore provide early detection of this sickness and save many children's lives. This study develops twenty hybrid architectures by combining five deep learning techniques (VGG16, VGG19, EfficientNetVB0, MobileNetV2, DenseNet201) with two ensemble techniques: a Bagging classifier with base learners (KNN, logistic regression, SVM), and Boosting, for which we used an AdaBoost classifier with a decision tree as base learner. We employed four classification performance criteria to assess the developed architectures (accuracy, precision, recall and F1-score), and Scott Knott's statistical test was used to group the proposed designs and select the best cluster of outperforming architectures. The results demonstrate the strength of combining deep learning and ensemble techniques for medical image analysis. The hybrid architecture using MobileNetV2 as feature extractor and BAGSVM as classifier gave the best results, achieving an accuracy of 99.04%. Based on the findings of this investigation, we recommend the hybrid architecture BAGSVMMV2 since it gives the best results in binary pneumonia classification.

Keywords: Adaboost · Bagging · Deep learning · Machine learning · Hybrid architectures · Classification · Logistic Regression · Pneumonia
1 Introduction Pneumonia is a potentially fatal kind of acute respiratory illness caused by bacteria or viruses that cause the lung’s alveoli to fill up with fluid or pus, resulting in a decrease in carbon dioxide (CO2) and oxygen (O2) exchange between the blood and the lungs, making it difficult for infected people to breathe. In addition, it affects people of all ages and is the leading infectious cause of mortality in children worldwide [1]. In 2017, more © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 389–399, 2023. https://doi.org/10.1007/978-3-031-43520-1_33
than 808,000 children under the age of five died from pneumonia. Globally, 2.5 million people died from pneumonia in 2019, of whom 600,000 were children under the age of five. In the first month of life, pneumonia kills three out of every ten infants. Infant mortality fell by more than half between 2000 and 2012; despite this, the statistics [2] remain extremely high. The diagnosis of this condition is seen as a major issue due to a lack of access to healthcare systems, particularly in low-income nations where the majority of the affected population lives. Pneumonia is frequently identified only after the disease has progressed to a severe state; medical image analysis [19] is therefore viewed as a promising technique for detecting various illnesses, facilitating early diagnosis, and assisting in decision-making [18]. Several researchers trained their models using radiography (X-rays) or computed tomography (CT) and achieved excellent outcomes. In general, deep learning outperforms machine learning algorithms in detecting respiratory illnesses by extracting the best features from medical images [3]. On the other hand, machine learning algorithms [4] produce faster and more accurate results and require less parameter tuning than deep learning techniques [5, 6], so researchers try to combine machine learning and deep learning (hybrid architectures) to make diagnosis faster and more accurate [7]. In our study, we propose to use pretrained CNN techniques [8] for feature extraction, drawing on the strength of deep learning in extracting the best features from images, namely MobileNetV2, VGG16, VGG19, EfficientNetVB0 and DenseNet201. For the classification we use Boosting [9] and Bagging [10] techniques: in Boosting we used AdaBoost with a decision tree base learner [11], and in Bagging we used base learners of support vector machine, KNN and logistic regression. We construct 20 architectures trained and tested on the COVID-19 Radiography Database.
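To make the hybrid-architecture idea concrete, the sketch below applies bagging with majority voting on top of feature vectors such as those a pretrained CNN would produce. The paper's base learners are SVM, KNN and logistic regression; the nearest-centroid learner here is a deliberately simple stand-in, and all names are illustrative rather than the authors' implementation.

```python
import numpy as np

class NearestCentroid:
    """Trivial stand-in base learner (the paper uses SVM/KNN/LR)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # squared distance of every sample to every class centroid
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def bagging_predict(X_train, y_train, X_test, n_estimators=10, seed=0):
    """Bagging: train each base learner on a bootstrap sample of the
    CNN-extracted feature vectors, then majority-vote the predictions."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap
        model = NearestCentroid().fit(X_train[idx], y_train[idx])
        votes.append(model.predict(X_test))
    votes = np.stack(votes)               # (n_estimators, n_test)
    # majority vote per test sample
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

The design point bagging illustrates: each base learner sees a slightly different bootstrap resample of the training features, so their individual errors decorrelate and the majority vote is more stable than any single learner.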
In this paper, we evaluate our work in terms of accuracy, precision, recall, and F1-score, and we use the Scott-Knott (SK) statistical test to compare the models. The current study addresses three research questions (RQs):
• (RQ1): What is the overall performance of the developed architectures?
• (RQ2): Is there a deep learning feature extraction approach that clearly exceeds the others regardless of the classifier?
• (RQ3): Is there a hybrid architecture that clearly outperforms the others regardless of the feature extractor and classifier used?
The primary contributions of this empirical study are:
• developing twenty architectures that combine five deep learning techniques for feature extraction (VGG16, VGG19, EfficientNetVB0, DenseNet201, MobileNetV2) with AdaBoost boosting (decision tree base learner) and bagging (KNN, LR, and SVM base learners);
• evaluating the models in terms of accuracy, precision, recall, and F1-score;
• comparing the performance of the twenty architectures using the SK test.
Pneumonia Classification Using Hybrid Architectures
2 Background
This section provides an overview of the methodologies and algorithms employed in this experiment. We begin by describing the five CNN approaches used to extract the best features (VGG16, VGG19, MobileNetV2, DenseNet201, EfficientNetVB0), followed by the machine learning algorithms used as base learners (LR, SVM, DT, KNN), and lastly the two ensemble methods employed (Bagging and Boosting); for boosting we used the AdaBoost classifier.
2.1 Deep Learning
VGG16. VGG16 [12] is a pretrained convolutional neural network that was among the top performers in the 2014 ILSVRC (ImageNet Large Scale Visual Recognition Challenge). Instead of a huge number of hyperparameters, its authors focused on 3 × 3 filter convolution layers with a stride of 1, consistently using the same padding, and 2 × 2 pooling layers with a stride of 2. This structure of convolution and pooling layers is followed by two fully connected (FC) layers with 4096 neurons each. The last layer is a 1000-neuron output layer with a softmax activation function. The number 16 in VGG16 signifies that the design has 16 weight layers.
VGG19. VGG19 is a convolutional neural network with roughly 140 million trainable parameters that ranked second in classification and first in localization [13]. VGG19's architecture is deeper than VGG16's, with 16 convolutional layers adopting the same small filters (3 × 3) and strides of 1 as VGG16, and the same five pooling layers of size 2 × 2 with strides of 2. To predict the 1000 ImageNet classes, there are two fully connected layers with 4096 channels each, followed by another FC layer with 1000 channels and a softmax activation function, as in VGG16.
MobileNet_V2.
MobileNetV2 [14] is a convolutional neural network, 53 layers deep, that filters features using lightweight depthwise convolution layers, which improves the state-of-the-art performance of mobile models on many workloads and benchmarks, and across a range of model sizes. There are two kinds of blocks in MobileNetV2: a residual block with a stride of 1, and a block with a stride of 2 used for downsizing. Both kinds of blocks have three layers. The first layer is a 1 × 1 convolution with ReLU6. The second layer is the depthwise convolution. The third layer is another 1 × 1 convolution, with no non-linearity.
EfficientNetVB0. EfficientNet [15] is a convolutional neural network design and scaling approach that uses a compound coefficient to uniformly scale all depth/width/resolution dimensions. With a constant set of scaling coefficients, the EfficientNet scaling approach uniformly adjusts network width, depth, and resolution. The baseline EfficientNet-B0 network is built on the inverted bottleneck residual blocks of MobileNetV2, together with squeeze-and-excitation blocks.
DenseNet201. DenseNet201 is a 201-layer-deep convolutional neural network [16]. Each layer in DenseNet receives additional input from all preceding layers and passes on its own feature maps to all subsequent layers; concatenation is employed, so each layer receives the "collective knowledge" of the layers before it.
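The spatial arithmetic these descriptions rely on is easy to verify: a 3 × 3 convolution with stride 1 and "same" padding preserves the spatial size, while a 2 × 2 pooling with stride 2 halves it. A minimal sketch (using the 224 × 224 input of the original VGG papers, not this study's 299 × 299 images):

```python
def conv_same(size, stride=1):
    # "same" padding: output = ceil(size / stride); stride 1 preserves the size
    return -(-size // stride)

def pool(size, kernel=2, stride=2):
    # non-padded pooling: output = floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

size = 224  # standard VGG input resolution
for block in range(5):       # VGG16/VGG19 both stack five conv/pool blocks
    size = conv_same(size)   # the conv layers keep the spatial size
    size = pool(size)        # each pooling layer halves it
    print(f"after block {block + 1}: {size}x{size}")
# 224 -> 112 -> 56 -> 28 -> 14 -> 7: the 7x7 spatial grid that feeds
# the first 4096-neuron fully connected layer
```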
2.2 Machine Learning
KNN. The k-nearest neighbours algorithm, or KNN, is a non-parametric, supervised learning classifier that uses proximity to classify or predict the grouping of a single data point. It may be used for either regression or classification problems. KNN is easy to implement and to adapt, and it requires only a few hyperparameters.
SVM. SVMs are a type of supervised learning algorithm used for classification, regression, and outlier detection. Support vector machines are effective in high-dimensional spaces, and remain effective even when the number of dimensions exceeds the number of samples.
Logistic Regression. Logistic regression is another effective supervised machine learning approach for binary classification problems. It is best thought of as linear regression adapted for classification: its output is bounded between 0 and 1. Furthermore, it does not require a linear relationship between input and output variables, because the odds ratio is transformed using a nonlinear log transformation.
Decision Tree. Decision trees are a kind of supervised machine learning in which data is repeatedly split according to a certain parameter. The tree can be explained by two entities: decision nodes, where the data is split, and leaves, which represent the decisions or outcomes.
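The proximity rule behind KNN fits in a few lines of NumPy; a toy sketch (the data, labels, and query points are invented for illustration):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to each point
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # majority label

# toy 2-D data: class 0 clustered near the origin, class 1 near (5, 5)
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.5, 0.5]), k=3))  # -> 0
print(knn_predict(X, y, np.array([5.5, 5.5]), k=3))  # -> 1
```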
2.3 Ensemble Techniques
Ensemble learning is a popular and extensively used machine learning approach in which several separate models, often known as base models, are combined to form a single, more effective prediction model.
Bagging. Bagging, also referred to as bootstrap aggregation, is an ensemble learning approach that helps machine learning algorithms increase their performance and accuracy. It is used to manage the bias-variance trade-off and to reduce the variance of a prediction model. Bagging prevents overfitting and is used in both regression and classification models, particularly with decision tree techniques.
Boosting. Boosting is a general algorithm, not a specific model: it requires a weak model, which is then improved iteratively. In this study we used AdaBoost, a boosting technique designed specifically for classification problems, in which the error rate of the weak estimator identifies its weaknesses. In each iteration, AdaBoost finds misclassified data points and raises their weights so that the next classifier pays additional attention to classifying them correctly.
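Both techniques are available in scikit-learn; a sketch on synthetic data (the dataset and hyperparameters here are illustrative, not the paper's; the base learner is passed as the first positional argument, which works across scikit-learn versions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: several SVMs fit on bootstrap samples, predictions aggregated by vote
bag = BaggingClassifier(SVC(), n_estimators=10, random_state=0)
# Boosting: shallow trees fit sequentially, reweighting misclassified points
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=50, random_state=0)

scores = {}
for name, model in [("bagging", bag), ("boosting", ada)]:
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)
    print(name, round(scores[name], 3))
```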
3 Data Preparation
In this section we present the data collection and the preprocessing applied to the "COVID-19 RADIOGRAPHY DATABASE": (1) data collection, and (2) data preprocessing using data augmentation.
3.1 Data Description
This work uses a public dataset named "COVID-19 RADIOGRAPHY DATABASE". All the images are in Portable Network Graphics (PNG) file format with a resolution of 299 × 299 pixels. The database contains 6012 Lung Opacity, 3616 COVID-19, 10192 Normal, and 1345 Viral Pneumonia images. In our assessment we used two classes: Normal and Viral Pneumonia.
3.2 Data Pre-processing
In order to obtain a good prediction model, the data must first be prepared using various pre-processing techniques. We therefore applied data augmentation to the two classes (Normal and Pneumonia) to overcome the class imbalance problem and to prevent overfitting. We applied the following geometric transforms:
• Rotation: randomly rotating the image by up to 30°.
• Zoom: zooming the image with a range value of 0.2.
• Shear: shearing the image by shifting one part of the image in one direction and the other part in the opposite direction, with the range value set to 0.2.
• Width and height shift: shifting the image horizontally and vertically with the range value set to 0.3.
• Horizontal flip: flipping the image in the horizontal direction.
• Fill mode: filling the empty areas with the nearest pixel value.
• Rescale: normalizing the pixel values from the 0–255 range to the 0–1 range.
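These settings read like Keras ImageDataGenerator arguments (rotation_range=30, zoom_range=0.2, shear_range=0.2, width_shift_range=0.3, height_shift_range=0.3, horizontal_flip=True, fill_mode='nearest', rescale=1./255), though the paper does not name the API, so that mapping is an assumption. A few of the transforms can be reproduced directly with NumPy/SciPy to make the effect concrete:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(299, 299)).astype(float)  # stand-in 299x299 X-ray

# Rescale: map pixel values from the 0-255 range to the 0-1 range
img = img / 255.0

# Rotation: random angle of up to 30 degrees; empty corners are filled
# with the nearest pixel value, matching the "fill mode" above
angle = rng.uniform(-30, 30)
rotated = ndimage.rotate(img, angle, reshape=False, order=1, mode="nearest")

# Width and height shift: up to 30% of the image size in each direction
shift = rng.uniform(-0.3, 0.3, size=2) * img.shape[0]
shifted = ndimage.shift(rotated, shift, order=1, mode="nearest")

# Horizontal flip: applied with probability 0.5
augmented = shifted[:, ::-1] if rng.random() < 0.5 else shifted

print(augmented.shape)  # the geometric transforms keep the 299x299 shape
```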
4 Empirical Design
In this section we describe the design used in this work, including the four performance metrics, the Scott-Knott test used to evaluate the study, the acronyms used to shorten the names of the combined architectures, and finally the experimental setup.
Performance Metrics. To evaluate our study, we used the following metrics: accuracy, F1-score, precision, and recall, as shown in the equations below.

Accuracy = (TP + TN)/(TP + TN + FP + FN)  (1)
Precision = TP/(TP + FP)  (2)
Recall = TP/(TP + FN)  (3)
F1 = 2 × (Precision × Recall)/(Precision + Recall)  (4)
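Equations (1)–(4) translate directly into code; a small sketch with hypothetical confusion-matrix counts:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Eqs. (1)-(4) from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (1)
    precision = tp / (tp + fp)                          # Eq. (2)
    recall = tp / (tp + fn)                             # Eq. (3)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (4)
    return accuracy, precision, recall, f1

# hypothetical counts for illustration only
acc, prec, rec, f1 = classification_metrics(tp=90, tn=85, fp=5, fn=10)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
# -> 0.921 0.947 0.9 0.923
```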
TP: the number of cases predicted positive that are actually positive. TN: the number of cases predicted negative that are actually negative.
FP: the number of cases predicted positive that are actually negative. FN: the number of cases predicted negative that are actually positive.
Scott-Knott. The Scott-Knott (SK) test [17] was proposed in 1974 as a viable method for performing multiple comparison operations without ambiguity, in addition to being simple and flexible. It is a hierarchical clustering technique used as a data exploration tool, created to assist researchers working with an ANOVA experiment that compares treatment means: it identifies separate homogeneous groups of those means whenever a significant F-test occurs.
Abbreviations. In the remainder of this study, we use the naming conventions listed below to shorten the names of the hybrid architectures.
BAGKNN: the architecture using the Bagging classifier with KNN as base learner.
BAGSVM: the architecture using the Bagging classifier with SVM as base learner.
BAGLR: the architecture using the Bagging classifier with Logistic Regression as base learner.
ADADT: the architecture using the AdaBoost classifier with a Decision tree as base learner.
Architecture. As shown in Fig. 1, our experiment uses five deep learning techniques (VGG16, VGG19, DenseNet201, EfficientNetVB0, and MobileNetV2) in the feature extraction stage, exploiting the strength of CNNs in extracting the best features to support good predictions. For the classification, we used two ensemble techniques: the bagging classifier with KNN, LR, and SVM as base learners, and the boosting classifier with the decision tree algorithm as base learner. In this proposal we developed 20 architectures; to build and train them we applied the following setup:
• All the input images of the "COVID-19 RADIOGRAPHY DATABASE" were set to 299 × 299 pixels.
• None of the CNN techniques applied in this work for feature extraction were trainable (their pretrained weights were frozen).
• For the KNN algorithm we set "n_neighbors = 5" and for the decision tree "max_depth = 10"; the remaining machine learning algorithms (logistic regression and SVM) used their default configurations.
• The bagging classifier used the default settings, and the boosting classifier used "learning_rate = 0.1".
The training and testing of the proposed architectures were implemented in Python using the Keras and TensorFlow deep learning frameworks and run on a GPU runtime with 8 CPU cores, 25 GB of RAM, and a Linux-based OS, provided by Google Colab. After training the proposed architectures, we evaluated their performance in terms of accuracy, F1-score, precision, and recall. In addition, we used the SK test to compare these architectures and identify the best one.
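The classifier stage of this setup can be sketched with scikit-learn; random vectors stand in here for the frozen-CNN features so the sketch runs without images or TensorFlow (the feature dimension and sample count are illustrative):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Random stand-ins for features extracted by a frozen, pretrained CNN
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 512))
y = rng.integers(0, 2, size=200)   # binary labels: Normal vs Viral Pneumonia

# The four classifier heads with the hyperparameters stated above; pairing
# each with the five feature extractors yields the paper's 20 architectures
heads = {
    "BAGKNN": BaggingClassifier(KNeighborsClassifier(n_neighbors=5)),
    "BAGSVM": BaggingClassifier(SVC()),                   # default SVM settings
    "BAGLR":  BaggingClassifier(LogisticRegression(max_iter=1000)),
    "ADADT":  AdaBoostClassifier(DecisionTreeClassifier(max_depth=10),
                                 learning_rate=0.1),
}
for name, clf in heads.items():
    clf.fit(X, y)
    print(name, "fitted with", len(clf.estimators_), "base estimators")
```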
Fig. 1. Experimental process.
5 Results and Discussion
The findings of this empirical study of the hybrid architectures (HA) on the "COVID-19 RADIOGRAPHY DATABASE" are presented and discussed in this section. We first evaluate the performance of the architectures using the four metrics accuracy, F1-score, recall, and precision (RQ1); we then assess the impact of the five deep learning feature extraction (FE) techniques on the performance of the four classifiers, in order to identify those that positively influence classification performance (RQ2); finally, the hybrid architectures are compared to determine which performed the best (RQ3).
(RQ1): What is the overall performance of the developed architectures?
With BAGKNN as the HA classifier:
• The greatest accuracy, 98.09%, was obtained with DenseNet201 as FE, and the lowest, 87.14%, with EfficientNetVB0.
With BAGSVM as the HA classifier:
• The greatest accuracy, 99.04%, was obtained with MobileNetV2 as FE, and the lowest, 94.28%, with EfficientNetVB0.
With BAGLR as the HA classifier:
• The greatest accuracy, 98.57%, was obtained with MobileNetV2 and DenseNet201 as FE, and the lowest, 90.23%, with EfficientNetVB0.
With ADADT as the HA classifier:
• The greatest accuracy, 95.47%, was obtained with DenseNet201 as FE, and the lowest, 91.19%, with VGG19.
(RQ2): Is there a deep learning feature extraction approach that clearly exceeds the others regardless of the classifier?
This section assesses the impact of the five deep learning feature extraction methods on the performance of the HA, in order to identify approaches that have a major effect on classification performance. To do this, we applied the SK test to the accuracy scores of each classifier's hybrid architectures; Fig. 2 shows the results, which identify the best CNN technique among the five independently of the classifier. According to the SK test, DenseNet201 was the top CNN approach regardless of the ensemble technique used, while EfficientNetVB0 was the worst-performing one.
(RQ3): Is there a hybrid architecture that clearly outperforms the others regardless of the feature extractor and classifier used?
Figure 3 presents the result of the SK test based on the accuracy of the 20 HA built from the five deep learning techniques (VGG16, VGG19, EfficientNetVB0, MobileNetV2, DenseNet201), the bagging classifier with base learners KNN, LR, and SVM, and the AdaBoost boosting classifier with a DT base learner. We obtained 10 classes (a to j). Class a contains the architectures BAGSVMMV2, BAGLRDN2, BAGLRMV2, BAGSVMDN2, and BAGSVMV16; class b contains BAGLRV16, BAGKNNDN2, BAGLRV19, and BAGSVMV19.
Class c contains BAGKNNMV2; class d, BAGKNNV16; class e, BAGKNNV19; class f, ADADTDN2 and ADADTV16; class g, ADADTMV2, BAGSVMEF0, and ADADTEF0; class h, ADADTV19; class i, BAGLREF0; and class j, BAGKNNEF0. To conclude, the best of the twenty architectures was BAGSVMMV2 and the worst was BAGKNNEF0.
6 Conclusion and Future Works
This research presented and reviewed the findings of an empirical comparison of 20 hybrid architectures for binary classification employing five deep learning techniques (VGG16, VGG19, EfficientNetVB0, MobileNetV2, and DenseNet201). All empirical assessments used four performance criteria (accuracy, precision, recall, and F1-score) and the Scott-Knott statistical test.
(RQ1): What is the overall performance of the developed architectures?
The combination of deep learning feature extraction and ensemble classification yields high accuracy. Our study shows that using deep learning (DenseNet201 or MobileNetV2) as feature extractor combined with the bagging classifier achieves the best accuracy for pneumonia classification.
Pneumonia Classification Using Hybrid Architectures
Fig. 2. SK test for the feature extractors.
Fig. 3. SK test for the twenty hybrid architectures.
(RQ2): Is there a deep learning feature extraction approach that clearly exceeds the others regardless of the classifier?
Hybrid architectures using DenseNet201 as feature extractor give the best results regardless of the classifier, since they always belong to the best SK cluster, whereas EfficientNetVB0 performed poorly for feature extraction. As a result, we recommend using DenseNet201 to design HA.
(RQ3): Is there a hybrid architecture that clearly outperforms the others regardless of the feature extractor and classifier used?
The hybrid architecture using MobileNetV2 as feature extractor and BAGSVM as classifier gave the best results, achieving an accuracy of 99.04%. The worst-performing designs are those that use EfficientNetVB0 as FE and are always assigned to the
Table 1. The overall performance of the twenty architectures.

| CNN technique | Ensemble technique | Accuracy | F1-Score | Precision | Recall |
|---|---|---|---|---|---|
| VGG16 | BAGKNN | 96.19% | 96% | 96% | 96% |
| VGG16 | BAGSVM | 98.57% | 99% | 99% | 99% |
| VGG16 | BAGLR | 98.33% | 98% | 98% | 98% |
| VGG16 | ADADT | 95% | 95% | 95% | 95% |
| VGG19 | BAGKNN | 95.47% | 95% | 96% | 96% |
| VGG19 | BAGSVM | 98.09% | 98% | 98% | 98% |
| VGG19 | BAGLR | 98.09% | 98% | 98% | 98% |
| VGG19 | ADADT | 91.19% | 91% | 91% | 91% |
| DenseNet201 | BAGKNN | 98.09% | 98% | 98% | 98% |
| DenseNet201 | BAGSVM | 98.57% | 99% | 99% | 99% |
| DenseNet201 | BAGLR | 98.57% | 99% | 99% | 99% |
| DenseNet201 | ADADT | 95.47% | 95% | 95% | 95% |
| EfficientNetVB0 | BAGKNN | 87.14% | 87% | 88% | 87% |
| EfficientNetVB0 | BAGSVM | 94.28% | 94% | 94% | 94% |
| EfficientNetVB0 | BAGLR | 90.23% | 90% | 90% | 90% |
| EfficientNetVB0 | ADADT | 93.80% | 94% | 94% | 94% |
| MobileNetV2 | BAGKNN | 97.14% | 97% | 97% | 97% |
| MobileNetV2 | BAGSVM | 99.04% | 99% | 99% | 99% |
| MobileNetV2 | BAGLR | 98.57% | 99% | 99% | 99% |
| MobileNetV2 | ADADT | 94.28% | 94% | 94% | 94% |
final SK cluster. Based on our findings, we recommend the BAGSVMMV2 hybrid architecture for binary pneumonia classification. In the future, we intend to apply similar designs to various types of datasets and, in comparison to this study, to employ different CNN techniques for feature extraction with different machine learning algorithms.
References
1. Pneumonia causes 2.5 million deaths around the world each year. https://www.clinicbarcelona.org/en/news/pneumonia-causes-2-5-million-deaths-around-the-world-each-year. Accessed 23 June 2022
2. Pneumonia. https://www.who.int/health-topics/pneumonia#tab=tab_2. Accessed 23 June 2022
3. El Asnaoui, K., Chawki, Y., Idri, A.: Automated methods for detection and classification of pneumonia based on X-ray images using deep learning, pp. 257–284 (2021). https://doi.org/10.1007/978-3-030-74575-2_14
4. Aljaddouh, B., Malathi, D.: Trends of using machine learning for detection and classification of respiratory diseases: investigation and analysis. Mater. Today Proc. 62, 4651–4658 (2022). https://doi.org/10.1016/J.MATPR.2022.03.120
5. Qu, Y., Meng, Y., Fan, H., Xu, R.X.: Low-cost thermal imaging with machine learning for non-invasive diagnosis and therapeutic monitoring of pneumonia. Infrared Phys. Technol. 123, 104201 (2022). https://doi.org/10.1016/J.INFRARED.2022.104201
6. Wang, D., Willis, D.R., Yih, Y.: The pneumonia severity index: assessment and comparison to popular machine learning classifiers. Int. J. Med. Inform. 163, 104778 (2022). https://doi.org/10.1016/J.IJMEDINF.2022.104778
7. Yaseliani, M., Hamadani, A.Z., Maghsoodi, A.I., Mosavi, A.: Pneumonia detection proposing a hybrid deep convolutional neural network based on two parallel visual geometry group architectures and machine learning classifiers. IEEE Access 10, 62110–62128 (2022). https://doi.org/10.1109/ACCESS.2022.3182498
8. Saul, C.J., Urey, D.Y., Taktakoglu, C.D.: Early diagnosis of pneumonia with deep learning (2019). https://doi.org/10.48550/arxiv.1904.00937
9. Tanha, J., Abdi, Y., Samadi, N., Razzaghi, N., Asadpour, M.: Boosting methods for multi-class imbalanced data classification: an experimental review. J. Big Data 7(1), 1–47 (2020). https://doi.org/10.1186/S40537-020-00349-Y
10. Moral-García, S., Mantas, C.J., Castellano, J.G., Benítez, M.D., Abellán, J.: Bagging of credal decision trees for imprecise classification. Expert Syst. Appl. 141, 112944 (2020). https://doi.org/10.1016/J.ESWA.2019.112944
11. Hatwell, J., Gaber, M.M., Atif Azad, R.M.: Ada-WHIPS: explaining AdaBoost classification with applications in the health sciences. BMC Med. Inform. Decis. Mak. 20(1), 1–25 (2020). https://doi.org/10.1186/S12911-020-01201-2
12. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings (2015). https://arxiv.org/abs/1409.1556v6. Accessed 11 Mar 2022
13. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015). https://doi.org/10.1007/S11263-015-0816-Y
14. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4510–4520 (2018). https://doi.org/10.48550/arxiv.1801.04381
15. Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: 36th International Conference on Machine Learning, ICML 2019, pp. 10691–10700 (2019). https://doi.org/10.48550/arxiv.1905.11946
16. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, pp. 2261–2269 (2017). https://doi.org/10.48550/arxiv.1608.06993
17. Worsley, K.J.: A non-parametric extension of a cluster analysis method by Scott and Knott. Biometrics 33(3), 532 (1977). https://doi.org/10.2307/2529369
18. Chaymae, T., Elkhatir, H., Otman, A.: Recent advances in machine learning and deep learning in vehicular ad-hoc networks: a comparative study. In: Bendaoud, M., Wolfgang, B., Chikh, K. (eds.) The Proceedings of the International Conference on Electrical Systems & Automation, ICESA 2021, pp. 1–14. Springer, Singapore (2022). https://doi.org/10.1007/978-981-19-0039-6_1
19. Taib, C., Abdoun, O., Haimoudi, E.: Performance evaluation of diagnostic and classification systems using deep learning on Apache Spark. In: Azrar, L., et al. (eds.) Advances in Integrated Design and Production II, CIP 2022. LNME, pp. 145–154. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-23615-0_15
ResNet-Based Emotion Recognition for Learners N. El Bahri1(B) , Z. Itahriouan2 , and A. Abtoy1 1 SIGL, ENSATE, Abdelmalek Essadi University, Tetouan, Morocco
[email protected] 2 Private University of Fez , Fes, Morocco
Abstract. Several industries aspire to take advantage of the current advancements in artificial intelligence (AI). Computer-assisted education and learning constitutes an extremely significant field of AI application, and AI may be able to completely change this industry. Our laboratory is interested in investigating how AI developments are affecting teaching and learning methodologies. To that end, we trained a ResNet CNN architecture (ResNet50) on "MMA FACIAL EXPRESSION", a dataset of thousands of frontal-camera images classified into the six fundamental emotions proposed by Ekman and Friesen (happiness, sadness, anger, disgust, fear, and surprise) plus a neutral class, with the goal of detecting learners' emotions so as to improve the educational learning process. Additionally, we assessed the outcomes of our ResNet model using a variety of evaluation measures, including accuracy, loss, the confusion matrix, and ROC-AUC. Keywords: Emotion · OCC · DMP · James-Lange theory · CNN
1 Introduction
The ability of the instructor to understand learners' emotional states and respond to them is crucial to the learning process. The instructor adapts his or her instructional approach by paying attention to the students' feelings, expressions, and body language. Additionally, numerous studies have shown that people's intelligence is just as effective at facilitating learning as their emotions, interest, and behavior [1]. Emotions, according to Bower and Cohen, influence remembering and decision making [1, 2]. Like traditional classroom instruction, e-learning emphasizes the interdependence of cognition and emotion, which is mediated by the social learning context, including the instructor, the learners, and the learning material [3]. Many research studies examine the correlation between emotional states and learning performance [4–6]. Numerous details concerning subjective personal experience, such as mental state, emotions, opinion, and interest, are revealed by facial expressions. Artificial intelligence technology has recently advanced, lending more legitimacy to the recognition of a variety of human emotions through facial expressions [7]. The changes in a person's facial expressions in reaction to their internal emotional states, intentions, or social communications are referred to as facial expressions. Since © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 400–410, 2023. https://doi.org/10.1007/978-3-031-43520-1_34
Darwin's work of 1872 [8–10], behavioral scientists have been actively researching facial expression analysis. In 1978, Suwa et al. [11] described an early effort to analyze facial expressions automatically by tracking the motion of 20 selected spots in an image stream. Since then, significant progress has been made in developing computer systems that may be useful in comprehending and utilizing this natural human communication method [12–15]. Higher-level knowledge is nevertheless needed to analyze emotions: for instance, in addition to expressing emotion, facial expressions can express physical exertion, intention, mental state, and so on [16, 17]. This study intends to produce a system that can identify human emotions from a camera, as the major objective of our research project is to analyze learners' emotions from camera images using AI-based computer vision technology. To be able to build a model that detects facial expressions and automatically recognizes human emotions in a variety of circumstances, particularly when studying learner personality in online learning, we employ convolutional neural networks. This study describes how to develop a model with the highest possible accuracy that can recognize emotions in real time from a frontal camera using the ResNet convolutional neural network architecture. In this paper, we first review early conceptions of emotion and their various literary representations. In the second section, we discuss techniques for analyzing facial expressions in order to identify learner emotions, particularly those that rely on computer vision and convolutional neural networks. The theories behind this research, and how humans recognize emotion from facial expressions, are covered in the next section. CNN is the AI technology employed in this work to build a classification ResNet model and conduct an experimental study. The findings of the experimental study are discussed in the last part.
2 Emotion and Its Models
Emotions are reactions of individuals to the world around them. According to Aristotle, emotion is "what causes his state to become so transformed that his judgment is affected, and that is accompanied by pleasure and pain" [18]. Emotion is defined by Kleinginna Jr. and Anne M. Kleinginna as a complex network of interactions between subjective and objective factors, mediated by neural and hormonal systems, that produces emotional states such as feelings of arousal, pleasure, or discomfort, and cognitive processes such as emotionally relevant perceptions; behavior that is often communicative, goal-directed, and adaptive results from these influences, evaluations, or labeling processes [19]. Based on cross-cultural research, Ekman and Friesen proposed the six basic emotions referred to above: happiness, sadness, anger, fear, surprise, and disgust [20]. Personality theories have traditionally placed a strong emphasis on emotions or feelings. At the lowest level there are pain and pleasure, which are more related to sensations than feelings; psychological pain and pleasure (distress and joy) might also be the root of all other emotions [21]. Numerous emotional models have been created. They help to describe people's feelings and responses in various circumstances, such as when confronting an event. We shall discuss several emotional models in this part: the OCC, the DMP, and the James-Lange theory.
2.1 OCC
The OCC model is one of the most well-known and is used in many studies. Ortony, Clore, and Collins created this computational model in 1988 [22]. It identifies 22 different emotional categories; based on how people react to situations, emotions are classified as either positive or negative. The OCC model determines the intensity of emotions based on several variables [23], which fall into two categories: global and local. Global variables have an impact on all emotions, while local variables only affect a few. Local variables include desirability, praiseworthiness, and appealingness, whereas global variables include sense of reality, proximity, and unexpectedness. Other local variables include liking, deservingness, likelihood, effort, realization, strength of cognitive unit, expectation deviation, familiarity, and desirability for others [24, 25].
2.2 DMP
The Dualistic Model of Passion [26, 27] uses the term "passion" to describe the high motivation for activities that people like, value, and devote a lot of time and attention to; it is partly derived from self-determination theory. Beyond self-determination theory [28, 29], the Dualistic Model of Passion adds a more detailed focus on a specific activity, the sense that it represents a long-standing passion for the person, and the notion that the motivation for this activity is internalized in the person's identity in a controlled or autonomous way, depending on situational and/or internal personal processes [27].
2.3 James-Lange Theory
The James-Lange theory identified emotions with the perception, through the autonomic, hormonal, and motor systems, of physiological bodily changes. An individual experiences dread and subjectively experiences emotional sensations once they become aware of danger-induced physiological changes.
James believed that physical changes were therefore necessary for the expression of emotional states: "we feel sorry because we cry, angry because we strike, afraid because we tremble, and not that we cry, strike, or tremble because we are sorry, angry, or fearful, as the case may be" [30]. Modern empirical variants of this idea are reappearing in recent neuroscientific theories of emotion, such as those by Damasio and others [29, 31–34].
3 Computer Vision and Deep Learning
In the machine learning field, the 1990s saw the development and rise of a variety of machine learning approaches that would supplant the neural network. Support vector machines (SVMs), in particular, offered excellent generalization performance and relative freedom from mysterious and difficult-to-choose training parameters [35]. This part presents an overview of the computer vision concept and of how deep learning has set the stage for unlimited progress in this domain.
ResNet-Based Emotion Recognition for Learners
3.1 Computer Vision Computer vision (CV) is the science concerned with the study and application of methods that enable computers to understand image content. This interpretation involves extracting certain features of importance for a given purpose: a visual inspection system requires input data (images), normally obtained through cameras, sensors, or video, and the subsequent processing of these data to transform them into the desired information [36]. CV uses various machine learning algorithms to analyze images for scenes, objects, faces, and other content in videos, photos, and pictures in general. The following figure shows a brief synopsis of the main developments in computer vision over the last 30 years (Fig. 1):
Fig. 1. A rough timeline of some of the most active topics of research in computer vision [37].
Among the many computer vision tasks to which deep learning is widely applied today, we mention: Image Classification, Object Detection, Face Detection, Object Tracking, Image Segmentation, Optical Character Recognition, Image Reconstruction, and Image Restoration [37].
3.2 Deep Learning Deep learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called neural networks. The currently most relevant types of deep learning models include Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Multilayer Perceptrons (MLPs) [38]. A major conceptual problem in computer vision and pattern recognition is how to recognize individual objects when those objects cannot be easily segmented out of their surroundings. In general, this poses the problem of feature binding: how to identify and bind together features that belong to a single object. A convolutional neural network (CNN) is an artificial neural network used to recognize and analyze images. CNNs have numerous connections, and their architecture generally consists of several types of layers, including convolution, pooling, and fully connected layers, and realizes a form of regularization [39]. The general CNN architecture is divided into two sections, a feature extractor and a classifier, as depicted in Fig. 2. In the feature
N. El Bahri et al.
extraction layers, each layer of the network receives the output from its immediate previous layer as its input and passes its output as the input to the next layer [40].
Fig. 2. The CNN general architecture [41]
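The layer-to-layer data flow described above amounts to a simple function composition, where each layer consumes the previous layer's output. A framework-free toy sketch (the "layers" here are illustrative stand-ins, not real convolution or pooling operations):

```python
def forward(layers, x):
    """Pass input x through a chain of layers: each layer's output
    becomes the next layer's input, as in a CNN feature extractor."""
    for layer in layers:
        x = layer(x)
    return x

# Toy "layers": a scaling step, an offset step, and a ReLU-like step
layers = [
    lambda v: [2 * e for e in v],      # stand-in for a convolution-like transform
    lambda v: [e - 3 for e in v],      # stand-in for a bias/offset
    lambda v: [max(0, e) for e in v],  # ReLU activation
]
print(forward(layers, [1, 2, 3]))  # → [0, 1, 3]
```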
3.3 ResNet Architecture Kaiming He et al. developed the Deep Residual Network in 2015. The convolutional layers mostly have 3 × 3 filters, and when the feature map size is halved the number of filters is doubled to preserve the time complexity per layer. Downsampling is performed directly by convolutional layers with a stride of 2. The network ends with a global average pooling layer, which converts the 2D activation maps of the last feature-extraction layer into an n-classes vector used to compute the probability of belonging to each class, followed by a 1000-way fully connected layer with SoftMax [42]. Residual networks differ in their number of layers, for instance: ResNet-18, ResNet-34, ResNet-50 (shown in Fig. 3), and ResNet-101.
Fig. 3. ResNet architecture [42].
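The defining idea of ResNet is the skip connection: each block learns a residual function F(x) and outputs F(x) + x, so an untrained (zero) residual simply passes the input through. A toy numeric sketch of this mechanism (framework-free, for illustration only):

```python
def residual_block(x, residual_fn):
    """Output = F(x) + x: the block's learned transform plus the identity shortcut."""
    fx = residual_fn(x)
    return [f + xi for f, xi in zip(fx, x)]

# If the residual function outputs zeros, the block is the identity mapping,
# which is what makes very deep stacks of such blocks easy to optimize.
assert residual_block([1.0, 2.0], lambda v: [0.0] * len(v)) == [1.0, 2.0]

# A non-trivial residual just adds the learned correction to the input
print(residual_block([1.0, 2.0], lambda v: [0.5 * e for e in v]))  # → [1.5, 3.0]
```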
4 Experiment 4.1 Methodology To detect learners' emotions from their facial expressions, we used transfer learning in this experiment to train one of the common CNN architectures. Transfer learning is a research field in machine learning that aims to transfer knowledge and skills from one or more source tasks and apply them to new tasks. We trained the ResNet model on a set of images from the “MMA FACIAL EXPRESSION” dataset [43], which includes thousands
of front-camera images classified as happiness, sadness, anger, neutrality, disgust, fear, and surprise, and we evaluated the resulting model with multiple evaluation metrics, for instance: accuracy, loss, ROC curve, and confusion matrix.
4.2 Image Acquisition and Preprocessing To build a model that can classify face images from a front camera, the dataset used is “MMA FACIAL EXPRESSION” [43], which includes thousands of front-camera facial expression images covering happiness, sadness, anger, neutrality, disgust, fear, and surprise. We selected a set of roughly a thousand color images and divided this dataset into seven classes according to emotion type (angry, sad, surprise, happy, fear, disgust, neutral). Each class includes almost 150 images, resized to 48 × 48 pixels and stored in JPG format.
4.3 Training After studying CNN specifications, we decided to use the widely used ResNet architecture. The training process was run in the cloud: we used a Jupyter notebook on Google Colab as the execution platform, which accelerates the learning process using a GPU or TPU. Since the ResNet model is included in Keras, we used the Keras API in Python, which is based on the TensorFlow platform. First, we trained the ResNet50 model, where the input to the conv1 layer is 48 × 48 × 3 RGB images and the batch size is 32. Before training, all images representing the various emotions were adjusted to the model's input parameters, and the number of epochs was set to 50. We evaluated these models using multiple evaluation metrics: (1) accuracy and loss, which were calculated during the training process, and (2) AUC-ROC and the confusion matrix, which were calculated after the generation of the models. We monitored the progress of training using the accuracy metric, with accuracy plotted as a function of epochs. Figure 4 represents the accuracy variation during the training process:
Fig. 4. The ResNet learning curve for the accuracy in training process
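The training setup described in Sect. 4.3 can be sketched with the Keras API as below. This is a minimal illustrative sketch, not the authors' exact script: the helper name `build_resnet50`, the use of ImageNet weights, and the optimizer choice are assumptions; only the input shape (48 × 48 × 3), the 7 classes, the batch size of 32, and the 50 epochs come from the text.

```python
# Hyperparameters reported in the text: 48x48x3 inputs, 7 emotion classes,
# batch size 32, 50 epochs.
CONFIG = {"input_shape": (48, 48, 3), "num_classes": 7,
          "batch_size": 32, "epochs": 50}

def build_resnet50(config=CONFIG):
    """Illustrative sketch: a ResNet50 backbone from Keras with a fresh
    classification head for the 7 emotion classes (assumes TensorFlow is
    installed; the import is deferred so the definitions load without it)."""
    from tensorflow import keras
    base = keras.applications.ResNet50(
        include_top=False,           # drop the 1000-way ImageNet head
        weights="imagenet",          # transfer learning: reuse learned features
        input_shape=config["input_shape"])
    x = keras.layers.GlobalAveragePooling2D()(base.output)
    out = keras.layers.Dense(config["num_classes"], activation="softmax")(x)
    model = keras.Model(base.input, out)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Hypothetical usage (train_images / train_labels are placeholders):
# model = build_resnet50()
# model.fit(train_images, train_labels,
#           batch_size=CONFIG["batch_size"], epochs=CONFIG["epochs"])
```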
Figure 4 shows that accuracy increases with iterations and stabilizes near 0.95 after almost 38 epochs. This means that the learning process went well and that the predictions made on the validation data are excellent. We also examined the loss, plotted as a function of learning iterations. The following figure shows the loss variation during the training process (Fig. 5):
Fig. 5. The ResNet learning curve for the loss in training process
As shown in Fig. 5, the loss decreases with iterations and approaches 0 after about 38 epochs. This confirms that the learning process went well and that the model is ready to make accurate predictions. The confusion matrix of the ResNet model is as follows (Fig. 6):
Fig. 6. The ResNet emotion classification confusion matrix
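A confusion matrix simply counts, for each true class, how the predictions are distributed over the predicted classes. A minimal framework-free sketch (the labels and toy predictions are illustrative, not the experiment's data):

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows = true class, columns = predicted class."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

labels = ["angry", "happy", "surprise"]
y_true = ["angry", "angry", "happy", "surprise", "surprise"]
y_pred = ["angry", "happy", "happy", "surprise", "angry"]
for row in confusion_matrix(y_true, y_pred, labels):
    print(row)
# Diagonal entries count correct predictions; off-diagonal entries count confusions.
```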
The confusion matrix illustrates many details about our emotion classes, but it does not offer a single summary metric on which to base decisions. Therefore, we used Receiver Operating Characteristic (ROC) graphs, which summarize and evaluate models per class and globally. The y axis represents sensitivity (the true positive rate), i.e., the proportion of samples of an emotion class that were correctly classified as that class. The x axis shows 1 − specificity (the false positive rate), i.e., the proportion of actual negatives that were incorrectly classified as positive. The AUC (area under the curve) allows easy comparison between one emotion class's curve and another. For example (see Fig. 7), the AUC of the blue ROC curve, representing the angry class, is greater than the AUC of the cyan ROC curve, representing the surprise class, indicating that the blue curve is better. As the ResNet ROC curves show (see Fig. 7), the highest AUC reached about 0.94 for more than one class, while the lowest is about 0.77. Therefore, our model is accurate, with very good test results.
Fig. 7. The ResNet emotion ROC curves
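For a single class evaluated one-vs-rest, the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A small pairwise sketch of this view (the Mann-Whitney formulation; the scores below are toy values, not the experiment's outputs):

```python
def auc_one_vs_rest(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy scores for one class: all positives outrank all negatives -> AUC = 1.0
scores      = [0.9, 0.8, 0.3, 0.6, 0.2]
is_positive = [True, True, False, True, False]
print(auc_one_vs_rest(scores, is_positive))  # → 1.0
```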
5 Results and Discussion As mentioned in the introduction, the main objective of this work is to construct a deep learning model, based on transfer learning that applies knowledge and abilities from one or more source tasks to new tasks, that can recognize facial expressions from a front camera. The results presented above detail the ResNet model developed through experimentation. The different evaluation metrics for our ResNet model are included in Table 1. Because accuracy and loss are calculated for each learning epoch, we only include the values for the first and last epochs. Even though the initial accuracy value was quite low (nearly 0.3) in the early epochs, it quickly improved after a
few epochs and eventually reached a very high value (almost 0.91). The loss values experienced the same effect; their initial value was somewhat high (about 3.07) but soon decreased to a very low value (around 0.37). When comparing actual and predicted classes, this model provided good accuracy; the highest value was 0.93 for the angry class and the lowest was 0.78 for the surprise class. Similarly for the area under the curve, the ROC score ranges from 0.77 to 0.94 with an average of roughly 0.86, indicating that our model was successful throughout the training process and produced accurate results. The following table displays the values for each ResNet evaluation metric:

Table 1. Accuracy, loss, confusion matrix and ROC score comparison of the ResNet model

Emotion     Confusion matrix    ROC score
Happy       0.88                0.86
Angry       0.93                0.94
Fear        0.86                0.84
Neutral     0.92                0.91
Disgust     0.86                0.85
Sad         0.90                0.89
Surprise    0.78                0.77

Metric      First value         Last value
Accuracy    0.3                 0.91
Loss        3.07                0.37
6 Conclusion In this paper, we have described the training of the CNN architecture ResNet50 to classify the basic emotions proposed by Ekman and Friesen (happiness, sadness, anger, disgust, fear, and surprise) plus neutrality, with the purpose of identifying learners' emotions so that we might enhance the educational learning process. The results show that our model is well trained, with high test scores of about 0.86 across multiple evaluation metrics (accuracy, loss, confusion matrix, and ROC score), as shown in Table 1 and the graphs based on it. We now have a properly trained model that can recognize emotions from a frontal camera, and we think this model has a wide range of applications. In many e-learning platforms, such as Coursera, Skillshare, Learning Management Systems (LMS), LinkedIn Learning, and Massive Open Online Courses (MOOC), a major objective is to identify learner emotions. Such a model can assist in accurately determining the emotional states and challenges of learners so that effective teaching strategies can be devised to improve online education.
References 1. Kort, B., Reilly, R.: Analytical models of emotions, learning and relationships: towards an affect-sensitive cognitive machine. In: Conference on Virtual Worlds and Simulation (VWSim 2002), pp. 1–15 (2002) 2. Chaffar, S., Cepeda, G., Frasson, C.: Predicting the learner’s emotional reaction towards the tutor’s intervention. In: Seventh IEEE International Conference on Advanced Learning Technologies (ICALT 2007), pp. 639–641. IEEE (2007) 3. Bahreini, K., Nadolski, R., Westera, W.: Towards multimodal emotion recognition in e-learning environments. Interact. Learn. Environ. 24(3), 590–605 (2016) 4. LeDoux, J.E.: Emotion, memory and the brain. Sci. Am. 270(6), 50–57 (1994) 5. Kort, B., Reilly, R., Picard, R.W.: An affective model of interplay between emotions and learning: reengineering educational pedagogy-building a learning companion. In: Proceedings IEEE International Conference on Advanced Learning Technologies, pp. 43–46. IEEE (2001) 6. Chen, C.M., Sun, Y.C.: Assessing the effects of different multimedia materials on emotions and learning performance for visual and verbal style learners. Comput. Educ. 59(4), 1273– 1285 (2012) 7. Song, Y., Luximon, Y.: Trust in AI agent: a systematic review of facial anthropomorphic trustworthiness for social robot design. Sensors 20(18), 5087 (2020) 8. Scherer, K.R.: Handbook of Methods in Nonverbal Behavior Research. Cambridge University Press, Cambridge (1985) 9. Ekman, P.: The argument and evidence about universals in facial expressions. In: Handbook of Social Psychophysiology, vol. 143, p. 164 (1989) 10. Darwin, C.: The Expression of the Emotions in Man and Animals. University of Chicago Press, Chicago (2015) 11. Suwa, M., Sugie, N., Fujimora, K.: A preliminary note on pattern recognition of human emotional expression. In: Proceedings of the Fourth International Joint Conference on Pattern Recognition, Kyoto (1978) 12. 
Black, M.J., Yacoob, Y.: Recognizing facial expressions in image sequences using local parameterized models of image motion. Int. J. Comput. Vis. 25(1), 23–48 (1997) 13. Black, M.J., Yacoob, Y.: Tracking and recognizing rigid and non-rigid facial motions using local parametric models of image motion. In: Proceedings of IEEE International Conference on Computer Vision, pp. 374–381. IEEE (1995) 14. Donato, G., Bartlett, M.S., Hager, J.C., Ekman, P., Sejnowski, T.J.: Classifying facial actions. IEEE Trans. Pattern Anal. Mach. Intell. 21(10), 974–989 (1999) 15. Rosenblum, M., Yacoob, Y., Davis, L.S.: Human expression recognition from motion using a radial basis function network architecture. IEEE Trans. Neural Netw. 7(5), 1121–1138 (1996) 16. Russell, J.A.: Is there universal recognition of emotion from facial expression? A review of the cross-cultural studies. Psychol. Bull. 115(1), 102–141 (1994) 17. Carroll, J.M., Russell, J.A.: Do facial expressions signal specific emotions? Judging emotion from the face in context. J. Pers. Soc. Psychol. 70(2), 205–218 (1996) 18. Hartmann, P.: The five-factor model: psychometric, biological and practical perspectives. Nordic Psychol. 58(2), 150–170 (2006) 19. Kleinginna, P.R., Kleinginna, A.M.: A categorized list of emotion definitions, with suggestions for a consensual definition. Motiv. Emot. 5(4), 345–379 (1981) 20. Ekman, P., Friesen, W.V.: Constants across cultures in the face and emotion. J. Pers. Soc. Psychol. 17(2), 124 (1971) 21. Solomon, R.L.: The opponent-process theory of acquired motivation: the costs of pleasure and the benefits of pain. Am. Psychol. 35(8), 691 (1980)
22. Ben-Ze’ev, A.: Describing the emotions: a review of the cognitive structure of emotions by Ortony, Clore & Collins. Philos. Psychol. 3(2–3), 305–317 (1990) 23. Ortony, A., Clore, G.L., Collins, A.: The Cognitive Structure of Emotions. Cambridge University Press, Cambridge (2022) 24. Kesteren, A.J.: A supervised machine-learning approach to artificial emotions. Department of Computer Science, University of Twente (2001) 25. Bartneck, C., Lyons, M.J., Saerbeck, M.: The relationship between emotion models and artificial intelligence. arXiv preprint arXiv:1706.09554 (2017) 26. Vallerand, R.J.: The Psychology of Passion: A Dualistic Model. Series in Positive Psychology (2015) 27. Vallerand, R.J., Blanchard, C., Mageau, G.A., Koestner, R., Ratelle, C., Léonard, M., et al.: Les passions de l’ame: on obsessive and harmonious passion. J. Pers. Soc. Psychol. 85(4), 756 (2003) 28. Deci, E.L., Ryan, R.M.: Intrinsic Motivation and Self-Determination in Human Behavior. Springer, New York (2013). https://doi.org/10.1007/978-1-4899-2271-7 29. Craig, A.D.: Interoception: the sense of the physiological condition of the body. Curr. Opin. Neurobiol. 13(4), 500–505 (2003) 30. James, W.: What is emotion? 1884 (1948) 31. Bechara, A.: The role of emotion in decision-making: evidence from neurological patients with orbitofrontal damage. Brain Cognit. 55(1), 30–40 (2004) 32. Craig, A.D.: Human feelings: why are some more aware than others? Trends Cognit. Sci. 8(6), 239–241 (2004) 33. Craig, A.D.: Forebrain emotional asymmetry: a neuroanatomical basis? Trends Cognit. Sci. 9(12), 566–571 (2005) 34. Niedenthal, P.M.: Embodying emotion. Science 316(5827), 1002–1005 (2007) 35. Cox, D.D., Dean, T.: Neural networks and neuroscience-inspired computer vision. Curr. Biol. 24(18), R921–R929 (2014) 36. Gomes, J.F.S., Leta, F.R.: Applications of computer vision techniques in the agriculture and food industry: a review. Eur. Food Res. Technol. 235(6), 989–1000 (2012) 37.
Szeliski, R.: Computer Vision [Internet]. Texts in Computer Science. Springer, London (2011). http://link.springer.com/10.1007/978-1-84882-935-0. Accessed 18 Sept 2020 38. Lee, E.J., Kim, Y.H., Kim, N., Kang, D.W.: Deep into the brain: artificial intelligence in stroke imaging. J. Stroke 19(3), 277–285 (2017) 39. Ferreira, A., Giraldi, G.: Convolutional neural network approaches to granite tiles classification. Expert Syst. Appl. 84, 1–11 (2017) 40. Alom, M.Z., Taha, T.M., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M.S., et al.: A state-of-the-art survey on deep learning theory and architectures. Electronics 8(3), 292 (2019) 41. Gurucharan, M.: Basic CNN architecture: explaining 5 layers of convolutional neural network (2020). https://www.upgrad.com/blog/basic-cnn-architecture 42. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 43. MMA FACIAL EXPRESSION [Internet]. https://kaggle.com/mahmoudima/mma-facial-expression. Accessed 19 Sept 2020
Mining the CORD-19: Review of Previous Work and Design of Topic Modeling Pipeline Salah Edine Ech-chorfi(B) and Elmoukhtar Zemmouri ENSAM Meknes, Moulay Ismail University, Meknes, Morocco [email protected], [email protected]
Abstract. The COVID-19 Open Research Dataset (CORD-19) is a dataset conceived in 2020 with the purpose of gathering scientific papers addressing COVID-19 and related coronavirus research. The dataset aims to allow the machine learning community to exploit the knowledge expressed by biomedical experts through scientific papers in natural language format. This paper addresses the problem of extracting knowledge from CORD-19 through different text mining tasks. These tasks vary from simple NLP operations, paper clustering, question answering, text summarization, topic modeling and classification, to knowledge graph extraction. For this purpose, we present a review of different works carried out on CORD-19 and we introduce the design of a new pipeline of different NLP tasks for paper clustering and topic detection performed on this dataset. Keywords: CORD-19 Dataset · Text Mining · NLP · Clustering · Topic Detection
1 Introduction COVID-19 is a highly infectious disease caused by the SARS-CoV-2 virus strain, which belongs to the coronavirus family; chronologically, it is the third coronavirus to have serious effects on the world, after SARS-CoV (2002) and MERS-CoV (2012). COVID-19 appeared at the end of 2019 and quickly spread across the world, being declared a global pandemic by the first quarter of 2020 and experiencing multiple mutations resulting in more than 10 variants. Since its appearance, biomedical experts have rushed to study the disease and share their knowledge with the research community through scientific papers and scholarly articles. With the growing number of these contributions, it has become a challenge to navigate through this large amount of information and extract the right information for a given topic or problem. According to Wagner et al. [1], over 106K articles related to COVID-19 had been published by the end of 2020. Processing such an amount of information requires considerable effort and dedicated time in order to draw out its value. On the other hand, machine learning and data mining have proven useful in various fields for extracting valuable information for the end user. For this purpose, the COVID-19 Open Research Dataset (CORD-19) [2] has been introduced as a collection of papers and preprints related to COVID-19 and other coronaviruses. The
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 411–426, 2023. https://doi.org/10.1007/978-3-031-43520-1_35
dataset allows machine learning and artificial intelligence practitioners to perform data mining on a base of processed scientific publications and to contribute to the efforts deployed to extract information from the large number of papers and preprints and make it more accessible to biomedical experts and policy makers. CORD-19 was initially released with approximately 28K papers, and now contains over 710K scholarly articles. In this paper, we present an overview of the CORD-19 dataset and its creation process, a survey of four data mining approaches that have resulted in practical tools already being used (e.g. search engines), and finally, we introduce a pipeline that performs topic detection alongside multiple NLP tasks on a subset of CORD-19.
2 CORD-19: The Covid-19 Open Research Dataset Following the COVID-19 outbreak, multiple organizations, consisting of The White House Office of Science and Technology Policy, the National Library of Medicine, the Chan Zuckerberg Initiative, Georgetown University's Center for Security and Emerging Technology, Microsoft Research, and Kaggle, led by The Allen Institute for AI, released the COVID-19 Open Research Dataset (CORD-19) as an early initiative to connect artificial intelligence experts to biomedical domain specialists in order to extract relevant information from existing work and boost the research to put an end to the global health pandemic. The goal of the dataset is to gather scientific papers and preprints about COVID-19 and other historical coronaviruses in a structured, processed format to facilitate the application of AI-based tools to the large biomedical literature published about COVID-19. The dataset integrates papers under open access licenses from multiple sources. It was initialized with approximately 73K metadata entries distributed over the following archives: PubMed Central (28.6K), medRxiv (1.1K), bioRxiv (0.8K), and an additional 1.1K from different publishers through the WHO list of papers. Each of these sources uses a different metadata structure containing information about the paper, such as authors, publication date, venue, and the documents associated with it. One of the most important metadata fields of each paper is its unique identifier within the archive, which can be a DOI, PubMed Central ID, PubMed ID, WHO Covidence ID, or MAG identifier depending on the source. For paper retrieval, CORD-19 searched for records that match the following keywords in their title, abstract, or full text: ‘COVID-19’, ‘Coronavirus’, ‘Corona virus’, ‘2019-nCoV’, ‘SARS-CoV’, ‘MERS-CoV’, ‘Severe Acute Respiratory Syndrome’, ‘Middle East Respiratory Syndrome’.
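The retrieval step above amounts to a case-insensitive keyword match over each record's title, abstract, and full text. A simplified sketch (the dictionary-based record structure is an assumption for illustration, not CORD-19's actual code):

```python
KEYWORDS = ["COVID-19", "Coronavirus", "Corona virus", "2019-nCoV",
            "SARS-CoV", "MERS-CoV", "Severe Acute Respiratory Syndrome",
            "Middle East Respiratory Syndrome"]

def matches(record, keywords=KEYWORDS):
    """True if any keyword occurs in the record's title, abstract, or full text."""
    text = " ".join(record.get(f, "") for f in ("title", "abstract", "full_text"))
    text = text.lower()
    return any(k.lower() in text for k in keywords)

papers = [
    {"title": "Spread dynamics of SARS-CoV-2", "abstract": ""},
    {"title": "Granite tile classification", "abstract": "CNN approaches"},
]
print([matches(p) for p in papers])  # → [True, False]
```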
By relying on several sources, CORD-19 is able to offer a fairly large amount of data, but it also runs the risk of gathering a considerable number of duplicated papers coming from different sources, as their unique identifiers may vary. After the initial step of paper collection, a processing phase is therefore mandatory to clean the dataset of duplication and inconsistent metadata. The authors of CORD-19 accomplish this task by: Clustering the Duplicated Papers. This operation seeks to group the duplicates in one cluster with one unique ID inside the dataset, the CORD_UID. A unique identifier tuple for each cluster, consisting of 3 identifiers (doi, pmc_id, pubmed_id), is introduced to
determine similar papers across different sources. A paper belongs to a cluster only if all the values of that paper's available identifiers match the values of the cluster's identifier tuple. This process may not capture all the duplicates in the set of papers, but CORD-19 chooses to keep a limited number of duplicates rather than discard a useful paper. Selecting Canonical Metadata for Each Cluster. In some cases, a cluster containing multiple papers may have different metadata for each paper. This operation determines the cluster's metadata by selecting the entry associated with a paper containing document files and with the most permissive license. In case of empty fields in the canonical metadata, values from the other metadata entries in the same cluster are assigned to the corresponding fields. The metadata of CORD-19 contains the following fields: cord_uid, sha, source_x, title, doi, pmcid, pubmed_id, license, abstract, publish_time, authors, journal, mag_id, who_covidence_id, arxiv_id, pdf_json_files, pmc_json_files, url, s2_id. Cluster Filtering. Because CORD-19 only gathers scientific papers and preprints, metadata entries corresponding to non-papers are dropped to keep a clean and consistent dataset. After processing the papers' metadata, CORD-19 addresses the documents associated with them. Some papers in the dataset have one or more documents, while others may not have any associated documents. Being retrieved from different sources, these documents may have different formats depending on their origin. The majority of papers are linked to PDF files, while PMC papers are associated with JATS XML files. Each of these formats is processed differently to generate the same JSON output. For PDF files, CORD-19 uses the pipeline introduced in the creation of the S2ORC dataset [3], consisting of these 3 steps: 1. Parse PDF files to TEI XML using the ML library GROBID 2. Parse TEI XML files to JSON files 3.
Clean up the links between inline citations and bibliography entries. JATS XML files are also parsed into JSON files using a custom parser.
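The clustering rule described above can be sketched as follows: a paper joins a cluster only if every identifier it carries agrees with the cluster's (doi, pmc_id, pubmed_id) tuple, with missing identifiers on either side ignored. This is a simplified sketch of the rule, not the CORD-19 implementation:

```python
ID_FIELDS = ("doi", "pmc_id", "pubmed_id")

def compatible(paper, cluster_ids):
    """A paper fits a cluster only if ALL of its available identifiers
    match the cluster's identifier tuple (missing fields are ignored)."""
    return all(paper.get(f) is None or cluster_ids[f] is None
               or paper[f] == cluster_ids[f]
               for f in ID_FIELDS)

def cluster_papers(papers):
    clusters = []  # each: {"ids": {...}, "members": [...]}
    for paper in papers:
        for c in clusters:
            if compatible(paper, c["ids"]):
                c["members"].append(paper)
                # fill in identifiers the cluster did not know yet
                for f in ID_FIELDS:
                    if c["ids"][f] is None:
                        c["ids"][f] = paper.get(f)
                break
        else:
            clusters.append({"ids": {f: paper.get(f) for f in ID_FIELDS},
                             "members": [paper]})
    return clusters

papers = [
    {"doi": "10.1/x", "pmc_id": None, "pubmed_id": "111"},
    {"doi": "10.1/x", "pmc_id": "PMC9", "pubmed_id": None},  # same paper, other source
    {"doi": "10.2/y", "pmc_id": None, "pubmed_id": None},    # different paper
]
print(len(cluster_papers(papers)))  # → 2
```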
3 Mining CORD-19: State-of-the-Art Following the release of CORD-19, computer science researchers rushed to develop tools and applications that process the papers stored in the dataset to provide useful information and exploit the knowledge built up by the rapidly growing amount of scientific publication. In this section, we provide a list of four important and highly cited works that tackled the CORD-19 dataset, arranged by their primary task. These works were chosen on the grounds that they carried out evaluations of their approaches, provided concrete results in their corresponding papers, and have resulted in practical tools intended for use by the general public. We present a summary of these works in Table 1 below, accompanied by details of their approaches. COVIDASK. COVIDASK (Lee et al.) [4] is a web service with the purpose of answering questions on COVID-19 to help users better explore the knowledge stored in
Table 1. Summary table of state-of-the-art works for mining the CORD-19 dataset. Note that the abbreviations will be detailed in the following subsections.

COVIDASK, Lee et al. [4], 2020
  Task(s): Question Answering (QA), Information Retrieval (IR)
  Used tools: DenSPI, BioBERT, BEST
  Datasets: CORD-19, PubMed
  Evaluation (QA):
    EMsent@1:  0.3585 (interrogative), 0.2241 (keyword)
    EMsent@50: 0.7736 (interrogative), 0.5862 (keyword)

AWS CORD-19 Search, Parminder et al. [11], 2021
  Task(s): Document Ranking (DR), Passage Ranking (PR), Question Answering (QA)
  Used tools: Amazon Kendra, Amazon Comprehend Medical, LDA
  Datasets: CORD-19, BioMed
  Evaluation (DR):
    Precision@20: 0.4775 (keyword), 0.5550 (interrogative)
    Recall@20: 0.0459 (keyword), 0.0582 (interrogative)
    Normalized discounted cumulative gain@20: 0.4380 (keyword), 0.5357 (interrogative)
  Evaluation (PR):
    Precision@1: 0.4074, Precision@2: 0.5370, Precision@3: 0.4938
  Evaluation (QA):
    Precision@1: 0.3333, Precision@2: 0.2593, Precision@3: 0.2099

COVIDScholar, Trewartha et al. [15], 2020
  Task(s): Information Retrieval (IR), Document Classification (DC), Keyword Extraction (KE)
  Used tools: BM25, SciBERT, RaKUn, FastText
  Datasets: CORD-19, medRxiv, Elsevier, and others
  Evaluation (DC):
    F1-score: >0.73, Precision: >0.74, Recall: >0.75, Accuracy: >0.73
  Evaluation (KE):
    Precision: 0.17, Recall: 0.33, F1-score: 0.2

Knowledge4COVID-19, Sakor et al. [18], 2022
  Task(s): Knowledge graph extraction, relationship extraction, prediction
  Used tools: FALCON, DDI-BLKG, PRGE, RML, SDM-RDFizer, Pubby, SPARQL
  Datasets: CORD-19, DrugBank, PubMed, PMC, DBpedia, Wikidata, UniProt
  Evaluation (DDIs prediction):
    ROC-AUC: ~0.84, Precision: ~0.73, Recall: ~0.805, F1-score: ~0.745
CORD-19 and PubMed articles. The proposed system supports both interrogative and keyword-based queries. COVIDASK performs two main tasks: question answering and information retrieval. Question answering is carried out by DenSPI [5], built on pre-indexed phrases from CORD-19 and trained on the SQuAD dataset [6]. DenSPI's performance is further enhanced with SPARC [7], which enriches the sparse representations of the phrases with lexical information. DenSPI returns results in n-gram format, with the answers highlighted in yellow and their sentences in bold. For Named Entity Recognition and linking, the models BERN (NER) [9], BioSyn (NEL) [9], and BEST [10] are used to extract biomedical named entities from CORD-19 and PubMed abstracts, to facilitate navigation of the named entities and display those relevant to the question. Lee et al. introduced the COVID-19 Questions dataset for the validation of the model; it contains 111 interrogative and keyword-based queries, based on known facts and experimental results about COVID-19, drawn from frequent questions prompted into COVIDASK, questions from the FAQ sections of the WHO and CDC websites, and Kaggle's CORD-19 challenge. Test results on the Apr. 10, 2020 version of CORD-19 showed that the QA model combining DenSPI and SPARC trained only on SQuAD performed better on interrogative questions.
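Metrics such as EM@k and Precision@k used to evaluate these systems check whether, or how often, a correct answer appears among the top-k returned results. A minimal sketch (the document IDs below are illustrative):

```python
def exact_match_at_k(ranked_answers, gold, k):
    """1 if the gold answer appears in the top-k results, else 0 (EM@k)."""
    return int(gold in ranked_answers[:k])

def precision_at_k(ranked_docs, relevant, k):
    """Fraction of the top-k results that are relevant (Precision@k)."""
    top = ranked_docs[:k]
    return sum(1 for d in top if d in relevant) / k

ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(exact_match_at_k(ranked, "d1", 1))   # → 0 (gold not at rank 1)
print(exact_match_at_k(ranked, "d1", 2))   # → 1
print(precision_at_k(ranked, relevant, 2)) # → 0.5
```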
A deep reading comprehension model ranks passages from the retained documents based on their relevance, from which the model extracts the answers. To further boost Amazon Kendra’s performance, another Amazon tool, Comprehend Medical (CM) [12], is employed. It runs on CORD-19 to perform medical domain entity recognition, relationship extraction, and normalization. CM serves the service provided by ACS on 2 levels, as it enriches Amazon Kendra with the extracted medical entities and helps construct a knowledge graph for CORD-19 under the name COVID-19 Knowledge Graph (CKG) [13]. The CKG used to hold over
335K entities and 3.3M relations at the time, and gives the end user access to numerous features such as article recommendation, citation-based navigation, and ranking of results by authors and other criteria. An additional step of topic modeling is performed on CORD-19 to group the corpus by its relevance to certain topics. For this purpose, Latent Dirichlet Allocation (LDA) [14] is used to extract topics for the entire corpus, after experimenting with a number of topic models (5, 10, and 20 topics) and finally eliminating and merging some of the topics to arrive at 10 final topics for CORD-19. COVIDScholar. Trewartha et al. conceived COVIDScholar [15], a corpus launched with over 81,000 processed scientific documents related to COVID-19, of which 1135 documents are retrieved from CORD-19. The papers are processed with multiple NLP techniques for different tasks before being presented to the end user through a search engine interface. For this purpose, the corpus construction follows a pipeline that starts by web scraping the sources of the scientific documents to retrieve newly added papers and preprints or update existing ones and repair any missing metadata; the documents are then cleaned of duplicates and parsed into a common format. This is followed by NLP operations that analyze the papers mainly through their abstracts. Relevance scoring is performed by BM25 and a classification model to rank the papers based on their abstracts and their relevance to COVID-19. Unsupervised document embedding is used to determine the cosine distance between phrases to better rank the papers according to their relevance to a topic. For this task, FastText [16] is used to generate the embeddings, which are also run through a tool based on Embedding Projector to visualize the relationships between biomedical terms in a 2D/3D space for the end user.
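The embedding-based ranking step boils down to cosine similarity between document vectors. A framework-free sketch (the vectors here are toy values, not FastText output):

```python
import math

def cosine_similarity(u, v):
    """cos(u, v) = u.v / (|u| |v|); 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

query = [1.0, 0.0, 1.0]
docs = {"paper_a": [2.0, 0.0, 2.0],   # same direction as the query
        "paper_b": [0.0, 1.0, 0.0]}   # orthogonal to the query

# Rank documents by similarity to the query, most similar first
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
print(ranked)  # → ['paper_a', 'paper_b']
```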
The papers are also classified by their abstracts into 5 categories (disciplines) using SciBERT [17], a language model pretrained on a multidisciplinary scientific corpus. KNOWLEDGE4COVID-19. Knowledge4COVID-19 [18] is a framework designed to centralize heterogeneous COVID-19-related data in a knowledge graph (KG) in order to extract the adverse effects of the treatments and of the combinations of drugs prescribed for the disease, and to provide more insight into drug-drug interactions and their mutual impact. Knowledge4COVID-19 is composed of multiple data ecosystems (DEs), a pipeline for creating a knowledge graph (the Knowledge4COVID-19 KG) that integrates the data into one structure, and multiple services provided to end users. Each of these components plays a crucial role in creating the final KG and providing ways to explore it. The framework can be divided into 3 parts: (1) Knowledge4COVID-19 DE. The Knowledge4COVID-19 DE contains 2 data ecosystems: the Scientific Open Data DE and the Scientific Publications DE. These data ecosystems represent different sources of data, and each DE has its own data operators, metadata, and services to process data from the corresponding source. • Scientific Open Data DE: This DE targets open data sources to extract medical concepts about drug interactions and the impact of drug combinations on their effects. The DE retrieves drug-drug interactions (1,276,052), drug indications (2,421), and toxicities (1,532) from DrugBank [19], a database containing various information about drugs, such as their identification, indications, associated conditions, toxicity, side effects, and the impact of their interactions with
Mining the CORD-19: Review of Previous Work and Design
417
other drugs, to name a few. Drug-drug interactions are expressed in natural language following a set of defined patterns, such as "The therapeutic efficacy of Alfacalcidol can be decreased when used in combination with Beclomethasone dipropionate"1. SIDER [20] is another data source that provides the DE with possible side effects of drugs (58,945). Additionally, biomedical terms (4,536,579) along with their definitions and other properties were collected from the Unified Medical Language System (UMLS) [21], a repository of biomedical vocabularies covering medical concepts and the relations among them. Some drugs from DrugBank are divided into 2 categories: CRD, the drugs that target at least one protein of the CYP family, and NCRD, those that target at least one protein not belonging to that family. The data acquired from DrugBank are processed with FALCON [22], a Natural Language Processing (NLP) tool that performs NER and NEL on short text and links the extracted entities to their terms in UMLS. From each input, FALCON returns the words corresponding to 4 concepts: the Percipient drug, which is the drug generating the impact; the Object drug, whose effect is impacted; the Impact, the change resulting from the interaction of the drugs; and the Effect, the area of the change. FALCON is customized to recognize 320 patterns in the short texts retrieved from DrugBank and to extract the drug-drug interactions (DDIs) present. For the example given above, FALCON extracts the words shown in Fig. 1.
Fig. 1. Example of a drug-drug interaction extracted by Falcon from natural language
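The extraction for this particular sentence template can be illustrated with a small regular expression. This is only a sketch of one of the ~320 DrugBank patterns: FALCON itself performs NER and NEL against UMLS, and the regex below is a hypothetical stand-in for that machinery.

```python
import re

# Illustrative pattern for one DrugBank DDI sentence template; not FALCON's actual logic.
DDI_PATTERN = re.compile(
    r"^The (?P<effect>.+?) of (?P<object>.+?) can be (?P<impact>\w+) "
    r"when used in combination with (?P<percipient>.+)$"
)

def extract_ddi(sentence):
    """Return the Effect, Object drug, Impact, and Percipient drug, or None."""
    m = DDI_PATTERN.match(sentence)
    return m.groupdict() if m else None
```

Applied to the example above, it yields Effect "therapeutic efficacy", Object drug "Alfacalcidol", Impact "decreased", and Percipient drug "Beclomethasone dipropionate", matching Fig. 1.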
FALCON outputs a representation of DDIs for a deductive system to discover new DDIs in a treatment. This deductive system is based on a set of Datalog predicates and Datalog rules. The predicates are harvested from the knowledge extracted by FALCON and are used to describe the relationships among drugs. Figure 2 shows an example of a predicate and its interpretation. The Datalog rules define the effect of combining predicates, which yields a deduced predicate. The rule in Fig. 3 denotes that if a drug A increases the toxicity of a drug B in a treatment T, and the drug B increases the toxicity of a drug C in the same treatment T, then the drug A increases the toxicity of the drug C in the treatment T. This set of rules allows the system to generate new predicates from the existing ones. The exploitation of the drug entities and the predicates results in the creation of a graph representing a treatment containing multiple drugs. The nodes represent the drugs, and the edges are the predicates denoting the relationships between them; the edges are directed from the Percipient drug to the Object drug.
1 https://go.drugbank.com/drugs/DB01436.
Fig. 2. Example of a predicate interpretation
Fig. 3. Example of a Datalog rule
From the knowledge established at this level, the deductive system can generate new predicates (edges) linking drug entities (nodes) and, in some cases, previously unrelated drugs. The application of this system, in addition to generating new insight into the effects of DDIs, has shown that the middle vertex of a wedge (a drug that is the percipient of one DDI and the object of another) is the drug with the most impact on the effectiveness or toxicity of the treatment. The Scientific Open Data DE returns DDIs forming a KG to be exploited for the creation of the Knowledge4COVID-19 KG. • Scientific Publications DE: The purpose of this DE is to harvest knowledge from scientific papers and publications. The DE relies on multiple data sources: CORD-19 (460,772), PubMed (106,150), and PMC (26,105) for papers, and MeSH (1,356,578), Gene Ontology (125,629), and Disease Ontology (5,129) for annotations. NER, NEL, and RE, using the NLP tools MetaMap [23] and SemRep [24], are performed on the titles, abstracts, and full text of the papers, and the biomedical entities are linked to their corresponding terms and information in UMLS. These UMLS annotations of the papers cover drugs, relations among drugs, side effects, disorders, and phenotypes. Metadata such as authors, publication venue, and date are also used to characterize the scientific papers. This step generates a set of triples forming the Scientific Publications Graph, which contains some erroneous data due to the limitations of the tools. As a solution, Path Ranking Guided Embeddings (PRGE) [25] is applied: it uses PaTyBRED [26], a path-ranking method, to generate confidence scores between 0 and 1 for the triples, and Translating Embeddings for Modeling Multi-relational Data (TransE) [27] for graph embedding, whose loss function takes the confidence scores of the triples into consideration when creating the graph.
By selecting the value 0.5 as a threshold for the confidence scores, 60% of the triples are kept while the rest are marked as erroneous. With this first knowledge graph extracted from scientific publications in place, containing relations among different entities, the next step is to predict new DDIs from the existing data. The goal of this operation is to classify drug pairs as interacting or non-interacting. Drug-Drug Interaction prediction on a Biomedical Literature Knowledge Graph (DDI-BLKG) [28] is the method used to transform the paths of semantic relations linking drug pairs into
feature vectors representing the relations present in the paths. In this case there are 35 unique relation types extracted from UMLS and the maximum path length is 3, so every pair of drugs has a feature vector of length 3 × 35. Each time a relation type is found at a given step of a path, its corresponding position in the feature row is marked as 1. This process generates a feature row for every drug pair, which is then fed to a Random Forest classifier to determine whether the drugs interact or not. The training set is composed of the interactions captured in the previous process that are also denoted as positive in DrugBank, with the condition that at least one of the drugs in the pair falls under the category of COVID-19 experimental treatments in DrugBank. The rest of the interactions are used for testing. The application of the classifier generated 8,925 new DDIs with confidence scores ranging from 0 to 1; with a threshold of 0.5, only the pairs with a confidence score higher than 0.5 are considered possibly interacting. The Scientific Publications DE thus generates another set of DDIs and the Scientific Publications KG to be integrated into the final KG. (2) Knowledge4COVID-19 KG. This section focuses on the pipeline used to create the final KG. The early steps of the pipeline, which extract knowledge from natural language text, are executed in the Scientific Open Data DE and the Scientific Publications DE. The outputs of these steps, represented as DDIs and publication metadata, are passed to FALCON for NER and NEL. To solve entity alignment issues, FALCON extracts 12,223,409 annotations from UMLS and establishes links to existing KGs such as DBpedia (3,739,445), Wikidata (3,476,435), the UniProt RDF KG, and DrugBank (3,427). To unify the heterogeneous data, the shared data sources are mapped to a unified schema.
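The DDI-BLKG featurization just described can be sketched directly: each drug pair gets a binary vector of length 3 × 35, with one block of 35 positions per path step. The relation names below are made up for illustration; the real method uses the 35 UMLS relation types.

```python
def path_features(paths, relation_types, max_len=3):
    """Encode the relation paths linking a drug pair as a binary vector of
    length max_len * len(relation_types), as in DDI-BLKG (3 steps x 35 types)."""
    index = {r: i for i, r in enumerate(relation_types)}
    vec = [0] * (max_len * len(relation_types))
    for path in paths:                        # each path is a sequence of relation types
        for step, rel in enumerate(path[:max_len]):
            vec[step * len(relation_types) + index[rel]] = 1
    return vec
```

The resulting rows would then be fed to a Random Forest classifier (e.g. scikit-learn's `RandomForestClassifier`) to label pairs as interacting or not.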
The Knowledge4COVID-19 unified schema is a collection of concepts that provide abstract representations of entities from different data sources. These concepts, represented in the unified schema by classes, stand for biomedical and scientific entities such as treatments, drugs, side effects, scientific publications, and DDIs. The properties of the classes define the relations between the different concepts. The mapping of the data to the unified schema is done through 57 RML [29] triple maps and 223 mapping assertions. The SDM-RDFizer [30] tool, whose purpose is to transform the shared data into an RDF graph, creates the final KG by executing the RML mapping rules. Finally, the KG is published using Pubby, a Linked Data server that adds an HTML interface and dereferenceable URLs on top of RDF data. (3) Knowledge4COVID-19 services. The Knowledge4COVID-19 framework provides a number of services for exploring the KG. The end user can visualize the interactions among drugs and conditions in the form of a graph where the nodes represent the drugs and the conditions' treatments and the edges denote the DDIs. This representation allows users to examine relevant adverse effects and toxicities resulting from combinations of drugs. The framework also offers exploration of the KG via APIs and SPARQL queries.
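The idea of a mapping assertion (predicate to source field) can be illustrated with a tiny pure-Python stand-in for the RML triple maps executed by SDM-RDFizer. The predicate and prefix names below are invented for illustration; real RML maps are written in Turtle and executed by the RDFizer.

```python
def map_to_triples(record, mapping_assertions):
    """Apply simplified (predicate, source-field) mapping assertions to a source
    record and emit RDF-style triples under a unified schema. Illustrative only."""
    subject = f"k4covid:{record['id']}"
    return [(subject, predicate, record[field])
            for predicate, field in mapping_assertions
            if field in record]           # skip assertions whose field is absent
```

For a DrugBank-like record `{"id": "DB01436", "name": "Alfacalcidol"}`, the assertion `("schema:name", "name")` yields the triple `("k4covid:DB01436", "schema:name", "Alfacalcidol")`.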
4 Topic Detection on CORD-19 In this section, we introduce a pipeline to perform Topic Detection on a subset of CORD-19. For this purpose, we first run a series of data cleaning operations and NLP preprocessing tasks on the natural language text, such as Tokenization, Lemmatization, Language Detection, and stop word removal, followed by Clustering, to prepare the data for the final step of Topic Detection. The pipeline in Fig. 4 presents the steps used to process the CORD-19 documents. It takes as input the CORD-19 metadata and the JSON files representing the scientific papers.
Fig. 4. The pipeline summarizing the operations performed on CORD-19
4.1 Cleaning the Data The first step consists of cleaning the data from noise. Documents originating from PDF files are hashed with the SHA-1 algorithm, which generates a unique identifier for each document. To clean the dataset of empty, duplicate, and inconsistent data, we use this property to eliminate papers that don't have an associated PDF file, that share the same associated PDF file, or that have multiple associated files. We cycle through the dataset metadata and drop the records with empty SHA fields. We also remove records that share the same SHA value or possess multiple SHA values. For the remaining papers, each associated with a unique file, we check for missing JSON files in the dataset. If a file appears to be missing, its corresponding paper is also discarded. We then perform a character and word count to analyze the papers' distribution: 93.55% of the papers have a body length between 1000 and 60000 characters. To avoid bias related to outliers, we discard all papers with a character count lower than 1000 or higher than 60000, as shown in Fig. 5.
Fig. 5. The corpus distribution before and after removing outliers
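The cleaning rules above can be sketched as a sequence of filters. The record fields and the convention that multiple SHAs are joined with ";" are assumptions for illustration (CORD-19 metadata joins them with "; " in the `sha` column).

```python
from collections import Counter

def clean_corpus(metadata, json_files, min_chars=1000, max_chars=60000):
    """Drop records with missing, shared, or multiple SHAs, missing JSON files,
    or outlier body lengths. json_files maps sha -> body text (assumed layout)."""
    records = [r for r in metadata if r.get("sha") and ";" not in r["sha"]]
    counts = Counter(r["sha"] for r in records)
    records = [r for r in records if counts[r["sha"]] == 1]   # unique PDF per paper
    records = [r for r in records if r["sha"] in json_files]  # JSON file must exist
    return [r for r in records
            if min_chars <= len(json_files[r["sha"]]) <= max_chars]
```

Each filter mirrors one rule in the text: empty SHA, multiple SHAs, duplicate SHAs, missing JSON, and finally the 1000 to 60000 character bounds.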
4.2 NLP Preprocessing Stop words are the most commonly used words that carry no specific contextual meaning; they convey little information and are irrelevant to any specific topic. To remove the stop words from the abstracts and body texts of the papers, we use the standard stop word list of the Natural Language ToolKit (NLTK) library, containing 40 elements. Considering the nature of our data, we extend this set with 45 words commonly used in scientific writing that are not related to any specific topic, such as "i.e.", "e.g.", "fig", and "al.", to cite a few. In this step we run basic NLP operations to transform the natural language text into machine-readable data. We use spaCy, an open-source NLP library, to perform Tokenization, Lemmatization, Language detection (to retrieve English papers only), and finally stop word removal. After this step, we obtain the processed abstracts and body texts of the CORD-19 papers. A series of word removal operations naturally modifies the length of most papers. To clean the outliers once again, we remove all papers containing fewer than 100 words or more than 4000 words. After analyzing the new distribution, 93.35% of the papers are preserved (Fig. 6).
Fig. 6. Papers distribution after NLP preprocessing
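A minimal stand-in for this preprocessing step is sketched below. It is deliberately simplified: the stop word set is a tiny placeholder for the extended NLTK list, and lemmatization and language detection (done with spaCy in the pipeline) are omitted.

```python
import re

# Tiny placeholder for the NLTK stop word list extended with scientific fillers.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for",
              "on", "with", "i.e.", "e.g.", "fig", "al."}

def preprocess(text, min_words=100, max_words=4000):
    """Tokenize, lowercase, drop stop words; discard outlier-length papers.
    Simplified stand-in for the spaCy pipeline (no lemmatization or language detection)."""
    tokens = [t for t in re.findall(r"[a-z][a-z.\-]*", text.lower())
              if t not in STOP_WORDS]
    return tokens if min_words <= len(tokens) <= max_words else None
```

Returning `None` for papers outside the 100 to 4000 word window mirrors the second outlier-removal pass.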
4.3 Vectorization In this step, each document (paper) is represented in a latent vector space. Our corpus contains multiple documents of different lengths. Several document embedding models were considered for this task, such as TF-IDF, Sent2Vec [31], n-gram embeddings, and Doc2Vec [32]. Doc2Vec has shown better results in information retrieval and sentiment analysis compared to several state-of-the-art methods, so we employ it for this purpose. The model is trained 10 times with a vector size of 300 and a minimum word count of 2.
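Of the candidate vectorizers, TF-IDF is the simplest to sketch in pure Python; the pipeline itself uses Doc2Vec (e.g. via gensim's `Doc2Vec` class), which learns dense vectors rather than the sparse ones computed here.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Represent each tokenized document as a TF-IDF vector over the corpus vocabulary."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))   # document frequency per term
    vocab = sorted(df)
    idf = {w: math.log(n / df[w]) for w in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[w] / len(d) * idf[w] for w in vocab])
    return vocab, vectors
```

A term occurring in every document (such as "covid" in this corpus) gets idf 0 and therefore carries no weight, which is the intended behavior.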
Fig. 7. Distortion score Elbow for K-means clustering
4.4 Papers Clustering In this step, the papers are divided into clusters to determine the dominating topics in each cluster. We use the K-means elbow method on the papers’ embeddings to determine
the optimal number of clusters for the corpus. The graphical representation (Fig. 7) of the distortion, i.e. the average of the squared distances from the respective cluster centers, shows that the optimal value of K is 11. We plot the distribution of the papers in each cluster (Fig. 8), with their K-means labels embedded by t-SNE into a 2-dimensional space.
Fig. 8. Visualization of the clusters generated by K-means
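The clustering and distortion computation behind the elbow method can be sketched with a minimal pure-Python k-means on 2-D points (the pipeline runs it on 300-dimensional Doc2Vec embeddings, typically via scikit-learn).

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on 2-D points; returns centers and the distortion
    (mean squared distance of each point to its nearest center)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        centers = [(sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
                   if cl else centers[i]            # keep old center if cluster empties
                   for i, cl in enumerate(clusters)]
    distortion = sum(min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers)
                     for p in points) / len(points)
    return centers, distortion
```

Plotting the distortion for increasing k and picking the value where the decrease levels off is exactly the elbow criterion used to select K = 11.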
4.5 Topic Detection The goal of this step is to derive a description of the clusters obtained in the previous step. To do so, we run an LDA model on each cluster to determine the dominating topics within it. The model generates 5 topics, each described by 10 words and their respective weights. Figure 9 shows the most dominant words of each topic in each cluster. After analyzing the deduced topics, we notice that most clusters have one dominant topic, and in some rare cases a cluster may have 2 topics with a considerable difference in their dominance. Hence, we can label (describe) each cluster with the list of top words of its corresponding topic. For example (taking only the top 5 words), cluster 1 can be labeled by: {model, data, cases, epidemic, infected} (see Fig. 9).
Fig. 9. Top 10 keywords for each topic in each cluster generated by LDA
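The cluster-labeling step can be illustrated with a frequency-based sketch. This is a simplified stand-in for the LDA topic words (LDA weights words by topic-word distributions, typically via gensim's `LdaModel`, not by raw counts).

```python
from collections import Counter

def cluster_keywords(cluster_docs, top_n=10):
    """Label a cluster by its most frequent terms; a naive stand-in for the
    per-cluster LDA topic words shown in Fig. 9."""
    counts = Counter(w for doc in cluster_docs for w in doc)
    return [w for w, _ in counts.most_common(top_n)]
```

For a cluster dominated by epidemic-modeling papers, the top words would resemble the Fig. 9 label {model, data, cases, epidemic, infected}.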
5 Conclusion In this paper, we addressed the problem of processing the data stored in CORD-19 by first providing an overview of the dataset and its creation process, then reviewing state-of-the-art work done on the dataset, and finally introducing a pipeline to perform a specific machine learning technique, in our case Topic Modeling. The pipeline consists of basic NLP operations, clustering, and topic modeling techniques to sort the papers by topic and extract the dominant topics in the dataset. The source code of the proposed pipeline is available in a GitHub repository (https://github.com/echchorfisalahedine/CORD19TopicModelingPipeline.git). For future work, we intend to focus on knowledge graph extraction techniques, to find new ways to construct KGs from natural language text, to enhance the performance of existing processes, and to tackle the challenge of evaluating the accuracy of the extracted information. We will also focus on applying our future research to the biomedical domain, considering its importance for humanity and the large amounts of data generated by the ongoing efforts of the biomedical community.
References 1. Wagner, C.S., Cai, X., Zhang, Y., Fry, C.V.: One-year in: COVID-19 research at the international level in CORD-19 data. PLoS ONE 17(5), e0261624 (2022). https://doi.org/10.1371/journal.pone.0261624 2. Lu Wang, L., et al.: CORD-19: the Covid-19 open research dataset. ArXiv [Preprint] 22 April 2020. arXiv:2004.10706v2. PMID: 32510522; PMCID: PMC7251955 3. Lo, K., Wang, L.L., Neumann, M., Kinney, R., Weld, D.S.: S2ORC: the semantic scholar open research corpus. arXiv preprint arXiv:1911.02782 (2019) 4. Lee, J., et al.: Answering questions on COVID-19 in real-time. arXiv preprint arXiv:2006.15830 (2020) 5. Seo, M., Lee, J., Kwiatkowski, T., Parikh, A., Farhadi, A., Hajishirzi, H.: Real-time open-domain question answering with dense-sparse phrase index. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4430–4441 (2019) 6. Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 2383–2392 (2016) 7. Lee, J., Seo, M., Hajishirzi, H., Kang, J.: Contextualized sparse representation for real-time open-domain question answering. In: ACL (2020a) 8. Kim, D., et al.: A neural named entity recognition and multi-type normalization tool for biomedical text mining. IEEE Access 7, 73729–73740 (2019) 9. Sung, M., Jeon, H., Lee, J., Kang, J.: Biomedical entity representations with synonym marginalization. In: ACL (2020) 10. Lee, S., et al.: BEST: next-generation biomedical entity search tool for knowledge discovery from biomedical literature. PLoS ONE 11(10) (2016) 11. Bhatia, P., et al.: AWS CORD-19 search: a neural search engine for COVID-19 literature. In: Shaban-Nejad, A., Michalowski, M., Bianco, S. (eds.) W3PHAI 2021. SCI, vol. 1013, pp. 131–145. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-93080-6_11 12. Bhatia, P., Celikkaya, B., Khalilia, M., Senthivel, S.: Comprehend medical: a named entity recognition and relationship extraction web service. In: 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 1844–1851 (2019)
13. Wise, C., et al.: COVID-19 knowledge graph: accelerating information retrieval and discovery for scientific literature. arXiv preprint arXiv:2007.12731 (2020) 14. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003) 15. Trewartha, A., et al.: COVIDScholar: an automated COVID-19 research aggregation and analysis platform. arXiv preprint arXiv:2012.03891 (2020) 16. Bojanowski, P., et al.: Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606 (2016) 17. Beltagy, I., Lo, K., Cohan, A.: SciBERT: a pretrained language model for scientific text. In: EMNLP. arXiv:1903.10676 (2019) 18. Sakor, A., et al.: Knowledge4COVID-19: a semantic-based approach for constructing a COVID-19 related knowledge graph from various sources and analyzing treatments' toxicities. J. Web Semant. 75, 100760 (2023) 19. Wishart, D.S., et al.: DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. (2018). https://doi.org/10.1093/nar/gkx1037 20. Kuhn, M., Letunic, I., Jensen, L.J., Bork, P.: The SIDER database of drugs and side effects. Nucleic Acids Res. (2015) 21. Bodenreider, O.: The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res. (2004). https://doi.org/10.1093/nar/gkh061 22. Sakor, A., et al.: Old is gold: linguistic driven approach for entity and relation linking of short text. In: Proceedings of the 2019 NAACL-HLT (Long Papers), pp. 2336–2346 (2019) 23. Aronson, A.R.: MetaMap: mapping text to the UMLS metathesaurus. Bethesda, MD: NLM, NIH, DHHS, vol. 1, p. 26 (2006) 24. Arnold, P., Rahm, E.: SemRep: a repository for semantic mapping. In: BTW, pp. 177–194 (2015) 25. Bougiatiotis, K., Fasoulis, R., Aisopos, F., Nentidis, A., Paliouras, G.: Guiding graph embeddings using path-ranking methods for error detection in noisy knowledge graphs. arXiv preprint arXiv:2002.08762 (2021) 26. Melo, A., Paulheim, H.: Detection of relation assertion errors in knowledge graphs. In: Proceedings of the Knowledge Capture Conference, pp. 1–8 (2017) 27. Bordes, A., Usunier, N., García-Durán, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: Burges, C.J.C., Bottou, L., Ghahramani, Z., Weinberger, K.Q. (eds.) Conference on Neural Information Processing Systems, pp. 2787–2795 (2013) 28. Bougiatiotis, K., Aisopos, F., Nentidis, A., Krithara, A., Paliouras, G.: Drug-drug interaction prediction on a biomedical literature knowledge graph. In: International Conference on Artificial Intelligence in Medicine (2020) 29. Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R.: RML: a generic language for integrated RDF mappings of heterogeneous data. In: Workshop on Linked Data on the Web, co-located with WWW (2014) 30. Iglesias, E., Jozashoori, S., Chaves-Fraga, D., Collarana, D., Vidal, M.-E.: SDM-RDFizer: an RML interpreter for the efficient creation of RDF knowledge graphs. In: ACM International Conference on Information & Knowledge Management (2020) 31. Moghadasi, M.N., Zhuang, Y.: Sent2Vec: a new sentence embedding representation with sentimental semantic. In: 2020 IEEE International Conference on Big Data (Big Data), pp. 4672–4680. IEEE, December 2020 32. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: International Conference on Machine Learning, pp. 1188–1196. PMLR, June 2014
Demand Forecast of Pharmaceutical Products During Covid-19 Using Holt-Winters Exponential Smoothing Anas Elkourchi(B) , Moulay Ali El Oualidi, and Mustapha Ahlaqqach LARILE, ENSEM, UH2C, Casablanca, Morocco [email protected], [email protected]
Abstract. In the pharmaceutical industry, demand forecasting is essential to optimize and manage complex business processes. Unplanned events such as the covid-19 pandemic are usually accompanied by supply chain disruption and high demand volatility. The aim of this paper is to provide some recommendations that can facilitate and support the demand forecasting of medicines for human use during the covid-19 pandemic. We focus on one of the most commonly used antibiotics in the treatment of covid-19, Azithromycin: we analyze its consumption from 2017 to 2020 and apply the Holt-Winters exponential smoothing model to predict the future demand for this medicine. As a result, we found that azithromycin consumption in March 2020 was 2.17 times higher than the average March consumption in 2017–2019. We noted that before the covid-19 pandemic, during 2019, the model fitted the real consumption well, with a MAPE of 12.64. The outliers starting from February 2020 are mainly due to the beginning of covid-19 and customer behavior driven by the news. This high forecast volatility during this period can be handled and will not have a big impact on the market thanks to the safety stock each market holds. On the other hand, we see that from July 2020 until the end of the year our model converges to the real consumption. We can therefore conclude that the Holt-Winters exponential smoothing model remains applicable in such a pandemic. Keywords: Pharmaceutical · Covid-19 · Demand · Forecast · Azithromycin · Holt-Winters
1 Introduction The pharmaceutical industry experiences high forecast errors due to external factors such as politics, regulations, technological changes, and the launch of new products with the same formula. Due to frequent changes in the forecast over time, the supply chain is disrupted and medicines face both situational shortages (stock-outs) and overstocks (excessive inventory) [1, 2]. Covid-19 became a pandemic within a few months of being detected in Wuhan, China in December 2019 [3, 4]. The social and economic situations have changed, © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 427–437, 2023. https://doi.org/10.1007/978-3-031-43520-1_36
428
A. Elkourchi et al.
supply chains around the world have faced big challenges and difficulties in adapting to the demands and needs of the locked-down world. According to a report published by the International Civil Aviation Organization (ICAO) in 2020, supply chain activities, including those of the pharmaceutical sector, have been heavily affected by COVID-19 [5]. Covid-19 led to a shortage of medicines, an important public health problem. Shortages matter especially in exceptional situations (such as pandemics) that translate into a significant increase in the demand for some medicines. In order to monitor the burden on each country's healthcare system, it is important to closely track the demand for the medicines used for treatment. Therefore, more and more models are being applied around the world to help predict COVID-19 mortality, cases, and medicine demand [6]. This study was developed in the context of such an unplanned event, the COVID-19 pandemic. It sets out some recommendations for forecasting the demand for medicines, which is important for adjusting the manufacturing (packaging plans) and distribution of medicinal products in order to avoid, or at least minimize, the impact of shortages. Since the beginning of the COVID-19 pandemic, there has been growing concern about the potential consequences of antimicrobial use [7]. Azithromycin, one of the most useful antibacterials, has shown favorable results in patients treated for Covid-19 [8, 9]. In the first section, we review the methodology for forecasting the demand for medicines during COVID-19. In the second part, we analyze azithromycin consumption from 2017 to 2020 and apply the Holt-Winters model to predict future demand. Before concluding, we discuss and analyze the results of our study.
2 Literature Review 2.1 Methodology for Demand Forecasting of Medicines During Covid-19 The Covid-19 pandemic created an unprecedented demand for modern medical resources. Thus, there has been much debate and research about how best to treat patients affected by COVID-19 [10]. Since there are key uncertainties regarding the actual future demand for a drug, it is imperative that public health authorities prepare for a potentially large increase in demand, which requires realistic projections of the demand for some medicines using the most accurate information available. Knowing that the ultimate purpose of demand forecasting is to inform the production site in advance so it can adapt its supply plan to the latest changes, the recommendation is to forecast demand for a period of at least 6 months, so the site has enough time to react to any change in demand. During the Covid-19 pandemic, it is recommended to calculate the demand forecast based on the number of ventilated and non-ventilated COVID-19 patients expected to be treated in the ICU, the average consumption of each medicine in relation to the COVID-19 pandemic, and the demand expected for non-COVID-19 patients. It should also take into account the impact of inventory levels on the next demand (in case of overstock or planned replenishment) [11].
Demand Forecast of Pharmaceutical Products During Covid-19
429
The demand forecast for medicines is very important for an effective drug supply chain, especially during epidemic situations such as Covid-19, in order to avoid supply shortages. An integrated approach is recommended, taking into account: • what is needed for patients' treatment; • what is needed to restore the planned minimum stock level/target; • what is actually available in stock at market level (hospitals, wholesalers). a. The Requirement for Non-COVID-19 Patients For non-COVID-19 patients, the forecast should be based on past historical demand/sales, preferably matching the same month of past years in order to reflect seasonal demand, and especially important changes (increases/decreases) over time. b. The Requirement for COVID-19 Patients For COVID-19 patients, the forecast should be based on the average drug consumption per patient per day and the estimated number of patient-days. However, if the forecast of the total number of COVID-19 patients in need of ICU care is not known, it is permissible to base the demand forecast on the total capacity of the ICU. c. The Current Stocks Goods already on hand at hospitals, wholesalers, and manufacturers. If there is a strategic stock reserve, this should also be considered available stock. In some Member States, information from a stakeholder may be unavailable, inaccurate, or too hard to obtain, in which case any missing data should be highlighted. If the data is not available for a particular stakeholder, it should be assumed, in the worst-case scenario, that the stock is not available. d. Safety Stocks Safety stock, also known as buffer stock, consists of additional stock held to address both supply and demand uncertainties in order to avoid stock-outs [12]. This is the minimum amount of drugs that needs to be present in the national supply chain to ensure continuous supply to patients at all times.
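The integrated approach (points a to d) can be sketched as a simple aggregation. The function and its inputs are illustrative, not part of any cited guideline: COVID-19 demand from patient-days and per-day consumption, the non-COVID-19 baseline, and an adjustment for current stock versus the safety stock target.

```python
def forecast_demand(icu_patient_days, dose_per_patient_day,
                    non_covid_demand, on_hand, safety_stock):
    """Aggregate medicine demand: COVID-19 need (b) plus non-COVID-19 need (a),
    adjusted for current stocks (c) and the safety stock target (d). Illustrative."""
    covid_demand = icu_patient_days * dose_per_patient_day
    gross = covid_demand + non_covid_demand
    surplus = max(0, on_hand - safety_stock)        # overstock absorbs part of the demand
    replenishment = max(0, safety_stock - on_hand)  # stock the market will rebuild
    return gross - surplus + replenishment
```

With 100 ICU patient-days at 2 units/day, a baseline of 50 units, 30 units on hand, and a safety stock of 20, the 10-unit surplus reduces the net requirement from 250 to 240 units.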
If we know the available inventory but not the planned minimum inventory level, inventory effects should not be considered when calculating aggregate demand, so that expected demand is not underestimated. The impact of inventory levels should be considered in light of the currently available inventory and the safety stock level. If the goods on hand exceed the safety stock of the market, there is a surplus supply that can be consumed without the need for new replenishment. If the safety stock exceeds the current inventory, the market will increase its inventory during the next forecast period. 2.2 Prediction Model a. Method of Smoothing In time series forecasting models, the data typically show a trend pattern, which can
be either an uptrend or a downtrend. The Holt-Winters exponential smoothing model is a forecasting method that applies exponential smoothing to the forecast results of previous demand periods; it also handles seasonal data patterns [13]. Smoothing methods are used to predict time series data that include trend patterns, seasonal patterns, or both at the same time. Smoothing takes the mean over several periods to determine the value for a particular period. Smoothing models fall into two families: averaging methods and exponential smoothing methods. Forecasts for data affected by seasonal or trend patterns are produced using the exponential smoothing method, in which the weights assigned to the historical data differ, decreasing exponentially with age. b. Method of Holt-Winters Multiplicative Exponential Smoothing The Holt-Winters prediction model was developed by Charles Holt and Peter Winters. Exponential smoothing is a method of smoothing time series data that assigns exponentially decreasing weights to historical observations. There are three types of exponential smoothing. The first is single exponential smoothing, a time series forecast for univariate data. It is used when the time series has no systematic structure and shows neither trend nor seasonality. It uses a single parameter α in the range 0 to 1 as the smoothing factor. The smaller the α value, the slower the learning and the more past observations influence the estimate; the larger the value, the faster the learning and the more weight recent observations receive [14]. The second type is double exponential smoothing, which adds another smoothing parameter γ for trend changes in addition to α.
There are two kinds of trends: additive trends, which give a linear trend, and multiplicative trends, which give an exponential trend. In the long run, extrapolating such a trend indefinitely is often not realistic; damping can help by gradually flattening the trend toward a straight line (no trend) for future predictions. The third and last type is triple exponential smoothing, which is used when the time series also exhibits seasonal variations and takes them into consideration. The triple exponential smoothing method relies on three parameters α, γ, and δ, each in the range 0 < α, γ, δ < 1 [15]. Holt-Winters triple exponential smoothing, developed by Charles Holt and Peter Winters, helps us find patterns of changing level "L", season "S" and trend "T" over time, using additive or multiplicative seasons.
• Exponential smoothing of the original data (at time t)
Lt = α(Yt/St−s) + (1 − α)(Lt−1 + Tt−1)
(1)
• Trend pattern smoothing (at time t)
Tt = γ(Lt − Lt−1) + (1 − γ)Tt−1
(2)
Demand Forecast of Pharmaceutical Products During Covid-19
• Seasonal pattern smoothing (at time t)
St = δ(Yt/Lt) + (1 − δ)St−s
(3)
• So, the forecast p periods ahead is
Ŷt+p = (Lt + pTt)St−s+p
(4)
where 0 < α, γ, δ < 1.

2.3 Model Evaluation

In this paper, we focus on one of the most widely used measures for evaluating a model, the Mean Absolute Percentage Error (MAPE) [16], a standard measure of the forecast accuracy of a statistical forecasting technique. Accuracy is expressed as a ratio defined by the following formula:

MAPE = (100%/n) Σ(t=1..n) |At − Ft| / At

where Ft represents the predicted value and At the actual value. Their difference is divided by the actual value At; the absolute value of this ratio is summed over each predicted time point and divided by the number n of fitted points.
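The recursions in Eqs. (1)–(4) and the MAPE measure can be sketched as follows (a minimal illustration: the function names and the simple first-season initialisation are our own choices, not prescribed here):

```python
import numpy as np

def holt_winters_multiplicative(y, s, alpha, gamma, delta, p):
    """Triple exponential smoothing with multiplicative seasonality,
    implementing Eqs. (1)-(4). y: observations, s: season length,
    p: number of periods to forecast ahead."""
    y = np.asarray(y, dtype=float)
    # Simple initialisation over the first season (one choice among several):
    # level = first-season mean, zero trend, seasonal indices = y / level.
    L, T = y[:s].mean(), 0.0
    S = list(y[:s] / y[:s].mean())
    for t in range(s, len(y)):
        L_prev = L
        L = alpha * y[t] / S[t - s] + (1 - alpha) * (L_prev + T)   # Eq. (1)
        T = gamma * (L - L_prev) + (1 - gamma) * T                 # Eq. (2)
        S.append(delta * y[t] / L + (1 - delta) * S[t - s])        # Eq. (3)
    n = len(y)
    # p forecasts ahead, reusing the last observed season's indices, Eq. (4)
    return [(L + k * T) * S[n - s + (k - 1) % s] for k in range(1, p + 1)]

def mape(actual, forecast):
    """Mean Absolute Percentage Error, as defined above."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))
```

In practice, α, γ and δ are tuned (for instance by grid search) to minimize the MAPE on a held-out test period.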
3 Prediction of Azithromycin Demand During Covid-19 Using the Holt-Winters Model

3.1 Data Collection

In this research, the data on azithromycin consumption in Croatia (total population of 4 million in 2020) were taken from the electronic database of IQVIA Adriatic d.o.o., which collects data on the quantity of each medicine distributed from the wholesale pharmacies to all hospital and retail pharmacies in Croatia [17]. The data on COVID-19 cases in Croatia were collected from the Worldometer website, from February 2020 until the end of the year [18].

3.2 Data Analysis

The analysis of the azithromycin consumption data was carried out with the help of Google Colab (Python) and Microsoft Excel. As we can see in Fig. 1, total azithromycin sold per year increased from 2017 to 2020. There was an increase of 3.83% from 2019 to 2020, reaching 24063 units of azithromycin, and a 6.02% increase from the average of 2017–2019 to 2020, corresponding to a total of 37038 units of azithromycin. The monthly pattern of azithromycin distribution was quite different in 2020 compared to the previous three years (Fig. 1). Azithromycin distribution in 2020 was highest in March and November. The number of COVID-19
Fig. 1. Azithromycin consumption
cases in Croatia started to increase exponentially from February 2020, and the epidemic reached its peak in November 2020 with 79126 cases (Fig. 1). Azithromycin consumption during March 2020 was 2.17 times higher than the average consumption in March 2017–2019 (134195 EA vs 61640 EA), and 1.89 times higher in November 2020 than the November average (91527 EA vs 48639 EA) (Fig. 2). In May 2020 the consumption of azithromycin was not only lower than in the other months of 2020, but also lower than the average consumption in May 2017–2019 (7031 EA vs 36788 EA); the same holds for July 2020 compared to the average in July 2017–2019 (16492 EA vs 29796 EA) (Fig. 2). In fact, the consumption of azithromycin in May–July 2020 was significantly lower compared to the average of 2017–2019.
Fig. 2. Azithromycin consumption in 2020 vs 2017–2019
While plotting the azithromycin consumption from 2017 to 2020 (Fig. 1), we can clearly see that the data are influenced by both trend and seasonality patterns. The time series forecasting method often used in the presence of trend and seasonality, especially in the pharmaceutical industry, is Holt-Winters exponential smoothing [1]. To predict the next demand of azithromycin, we divided the data into two parts (Fig. 3):
• Train data from January 2017 to February 2019 [0:26], plotted in blue (Fig. 3), on which our model is trained from historical data (reflecting trend, seasonality, and demand change over time).
• Test data from March 2019 to November 2020 [26:46], plotted in orange (Fig. 3), on which we test the accuracy of the trained model.
Fig. 3. Data split
As stated above, the trend and seasonal components can take two forms (additive or multiplicative). To determine the appropriate form, we use the seasonal_decompose function in Python. Based on Fig. 4 below, we can conclude that both trend and seasonality are multiplicative, since the plots are not linear.
Fig. 4. Trend & Seasonal
4 Results and Discussion

While plotting the Holt-Winters model prediction for azithromycin consumption in Python (Fig. 5), we have highlighted the forecast model in green, as our data are influenced by both trend and seasonal patterns. The Holt-Winters exponential smoothing method, as stated above, uses three parameters: the level parameter (α), the trend parameter (γ), and the seasonal parameter (δ). Several forecasting models are therefore obtained with different parameters. The model retained has α = 0.97, γ = 0.1, δ = 0.99. We can note that before the COVID-19 pandemic, during 2019 [26:36] (Fig. 5), the model fitted the real consumption well, with a Mean Absolute Percentage Error (MAPE) of 12.64. The monthly pattern of azithromycin consumption was quite similar in 2017, 2018 and 2019, with clearly higher consumption during the winter months: in every influenza season, the use of antibiotics increases during the winter months [15]. The outliers starting from February 2020 are mainly due to the beginning of COVID-19 in Croatia. During the 2020 season, the influenza epidemic was very mild across European countries, so it could not have had a large impact on antimicrobial medicines in Croatia in 2020 [19]. The absence of a significant influenza epidemic thus indicates that the increase in azithromycin consumption was mainly due to COVID-19. Another factor that can explain the deviation between our prediction model and the real demand is customer behavior driven by the news and the high risk seen in other countries. Azithromycin consumption was largest in February/March 2020, when French authors first published on its efficacy in COVID-19 treatment on a preprint server [20]. Social media, with its high impact on people's perception and decisions, also played a role.
Azithromycin consumption increased in March 2020 after US President Donald Trump promoted its importance for COVID-19 treatment in social media posts and public speeches [21]. Thus it seems that the fear of a large epidemic, and uncertainty due to the absence of a treatment with proven efficacy, can influence the consumption of antimicrobials. This high forecast volatility during this period can be handled, and will not have a large impact on the market, thanks to the safety stock each market holds, between 3 months (minimum) and 6 months (maximum) of coverage, based on a shelf life of 2 to 5 years.
On the other hand, we see from July 2020 until the end of the year [26:46] (Fig. 5) that our model is converging to the real consumption, reflecting both trend and seasonality, which allows us to rely on this model to predict the next demand in 2021. Unfortunately, the data for azithromycin consumption in 2021 are not available to make a comparison with our model.
Fig. 5. Holt-Winters model prediction
5 Conclusion

Forecasting demand for unplanned events such as natural disasters and epidemics is particularly difficult for supply chain managers. In the pharmaceutical industry especially, a pandemic usually brings a sharp increase in demand even while the supply chain is disrupted. In this paper, we have proposed some recommendations that can support the demand forecasting of medicines for human use during the COVID-19 pandemic. We have worked on one of the antibiotics most commonly used in the treatment of COVID-19, azithromycin, by analyzing its consumption from 2017 to 2020 and applying the Holt-Winters exponential smoothing model to predict the future demand of that medicine. We have divided the data into three parts. In the first part, before COVID-19, our prediction model fitted the real data well. The second part concerns the first quarter after COVID-19 started, in which we see volatility and a strong change in the forecast, mainly due to the first case detected at country level as well as customer behavior driven by the published news. In the last part, our model is converging to the real consumption. We can conclude that the Holt-Winters model can also be applicable in this
kind of pandemic, as the risk of strong forecast changes at the beginning of the pandemic is reduced by the safety stocks each market holds. To mitigate the effects of demand changes during the pandemic and protect supply chain operations, it is recommended to increase the safety stock level while respecting the remaining shelf life accepted by the market, to avoid obsolescence risk, and to opt for producing the medicines at market level so as to respond to any over-sales in advance by reducing the transportation lead time. These measures help us not only to cope with the current consequences of the ongoing pandemic, but also to provide firms with the resilience required for another significant shortage in the future. For future research, we can work on a comparison between countries when more data become available, or a comparison of these issues with previous pandemics. We can also apply another model to the same data and evaluate which one fits the consumption best.
References
1. Benhra, J., Mouatassim, S., Lamrani, S., Ahlaqqach, M.: Closed loop location routing supply chain network design in the end of life pharmaceutical products. Supply Chain Forum Int. J. 21(12), 79–92 (2020)
2. Siddiqui, R., Azmat, M., Ahmed, S., Kummer, S.: A hybrid demand forecasting model for greater forecasting accuracy: the case of the pharmaceutical industry. Supply Chain Forum Int. J. (2021). https://doi.org/10.1080/16258312.2021.1967081
3. Shi, B., et al.: Evolutionary warning system for COVID-19 severity: colony predation algorithm enhanced extreme learning machine. Comput. Biol. Med., 104698 (2021)
4. Quintero, Y., Ardila, D., Camargo, E., Rivas, F., Aguilar, J.: Machine learning models for the prediction of the SEIRD variables for the COVID-19 pandemic based on a deep dependence analysis of variables. Comput. Biol. Med. 134, 104500 (2021)
5. International Civil Aviation Organization: Economic Impacts of COVID-19 on Civil Aviation. https://www.icao.int/sustainability/Pages/Economic-Impacts-of-COVID-19.aspx. Accessed 29 June 2020
6. IHME COVID-19 health service utilization forecasting team, Murray, C.J.: Forecasting the impact of the first wave of the COVID-19 pandemic on hospital demand and deaths for the USA and European Economic Area countries. medRxiv (2020)
7. Rawson, T.M., Ming, D., Ahmad, R., Moore, L.S.P., Holmes, A.H.: Antimicrobial use, drug-resistant infections and COVID-19. Nat. Rev. Microbiol. 18, 409–410 (2020)
8. Gautret, P., Lagier, J.C., Parola, P., Hoang, V.T., Meddeb, L., Mailhe, M., et al.: Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial. Int. J. Antimicrob. Agents 56, 105949 (2020)
9. Gautret, P., Lagier, J.C., Parola, P., Hoang, V.T., Meddeb, L., Sevestre, J., et al.: Clinical and microbiological effect of a combination of hydroxychloroquine and azithromycin in 80 COVID-19 patients with at least a six-day follow up: a pilot observational study. Travel Med. Infect. Dis. 34, 101663 (2020)
10. Byrne, M., Scott, T.E., Sinclair, J., Chockalingam, N.: COVID-19 and critical care capacity: can we mitigate demand? 27, 107–108 (2022)
11. Reflection paper on forecasting demand for medicinal products in the EU/EEA. EMA/162549/2021
12. Barros, J., Cortez, P., Sameiro Carvalho, M.: A systematic literature review about dimensioning safety stock under uncertainties and risks in the procurement process. Oper. Res. Perspect. 8, 100192 (2021). ISSN 2214-7160
13. Djakaria, I., Saleh, S.E.: J. Phys. Conf. Ser. 1882, 012033 (2021)
14. Panda, M.: Application of ARIMA and Holt-Winters forecasting model to predict the spreading of COVID-19 for India and its states. medRxiv (2020)
15. Bezerra, A.K.L., Santos, É.M.C.: Prediction the daily number of confirmed cases of COVID-19 in Sudan with ARIMA and Holt Winter exponential smoothing. Int. J. Dev. Res. 10(08), 39408–39413 (2020)
16. Hasan, N., Nene, M.J.: MAPE: an interactive learning model for the children with ASD. In: Kumar, S., Hiranwal, S., Purohit, S.D., Prasad, M. (eds.) Proceedings of International Conference on Communication and Computational Technologies (2022)
17. Bogdanić, N., Močibob, L., Vidović, T., Soldo, A., Begovač, J.: Azithromycin consumption during the COVID-19 pandemic in Croatia, 2020. PLoS ONE 17(2), e0263437 (2022)
18. Worldometer 2020 coronavirus cases data. https://www.worldometers.info/coronavirus/
19. Joint ECDC/WHO/Europe weekly influenza update. Accessed 28 Mar 2021
20. Gautret, P., Lagier, J.-C., Parola, P., Hoang, V.T., Meddeb, L., Mailhe, M., et al.: Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial. medRxiv:2020.03.16.20037135 [Preprint] (2020). Accessed 30 July 2021
21. Niburski, K., Niburski, O.: Impact of Trump's promotion of unproven COVID-19 treatments and subsequent internet trends: observational study. J. Med. Internet Res. 22, e20044 (2020). PMID: 33151895
Using ANN-MLP for Supervised Classification of the Hercynian Granitoids from Their Geochemical Characteristics at the Aouli Pluton (High Moulouya, Morocco)

Taj Eddine Manssouri1, Imad Manssouri2(B), Abdellah El Hmaidi3, and Hassane Sahbi4

1 Regional Center for Education and Training Professions Fez-Meknes, Meknes, Morocco
2 Laboratory of Mechanics, Mechatronics, and Command, Team of Electrical Energy,
Maintenance and Innovation, ENSAM-Meknes, Moulay Ismail University, Meknes, Morocco [email protected] 3 Water Sciences and Environmental Engineering Team, Faculty of Sciences, Meknes, Moulay Ismail University, Meknes, Morocco 4 Moulay Ismail University, Meknes, Morocco
Abstract. The study of rare earths, trace elements and major elements makes it possible to distinguish the different entities of a plutonic complex. The aim of this work is to apply a supervised classification model based on a Multi-Layer Perceptron artificial neural network (ANN-MLP) to a database of 168 granitoid samples from the Aouli pluton (High Moulouya, Morocco). The 20 input variables of the model correspond to the major elements, trace elements and rare earths. They were obtained by analyzing samples of the Aouli granitoids by X-ray spectroscopy for the major elements, X-ray fluorescence for trace elements, a specific-ion electrode for fluorine, and neutron activation for the rare earths (Oukemeni 1993). Initially, 60% of the database, taken randomly, was used for training and for choosing the architecture of the MLP neural network. Then, unknown test samples were identified using the ANN-MLP model determined in the learning phase. The performance of this model was evaluated by calculating the coefficient of determination R2 and the NSE coefficient (Nash-Sutcliffe efficiency). The results, a success rate of 86% and a coefficient of determination R2 of 66.4%, led to the choice of the ANN-MLP [20-2-4] model. Keywords: Supervised classification · Artificial neural networks · Hercynian granitoids · Aouli pluton · High Moulouya-Morocco · NSE coefficient · Coefficient of determination
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 438–446, 2023. https://doi.org/10.1007/978-3-031-43520-1_37
1 Introduction

The Aouli pluton is composed of four units: (a) the granodiorite, (b) the gray granite, and (c) the pink granite, which together form a single massif, and (d) the muscovite granite, which is exposed at two points isolated from the rest of the massif. Three different petrogenetic entities have been demonstrated by chemical study. They are all calc-alkaline in nature with a subalkaline tendency. The granodiorite and the muscovite granite each form a separate entity, but the gray granite and the pink granite seem to form a single entity together. This multi-emission pluton is late- to post-orogenic and closes the Hercynian orogeny in the eastern zone. Classification methods are widely used in geoscience, particularly for the detection of rock types. In the literature, these methods fall mainly into two types: supervised methods, based on a priori knowledge of the classes, and unsupervised methods, which do not require this prior information. For example, Chang et al. (2002) used the Self-Organizing Maps (SOM) method, an unsupervised method, to identify lithofacies in petroleum reservoirs using logging data from five wells in an oil field located in Escambia County, Alabama (USA). This approach allowed them to predict the textural identity of rocks in oil reservoirs with an accuracy of 78.8%. Supervised neural network classification was used by Mahmoodi et al. (2016) to classify rocks from their physical properties. Konaté et al. (2015) studied well log data recovered over a depth interval of 100–5025 m in the Dabie-Sulu UHPM belt, East China. They conducted a comparative study between an unsupervised classification method based on SOM maps and a supervised method based on artificial neural networks (MLP).
The main objective of our work is the classification, by a supervised method, of the different rock types (facies) of the Aouli pluton (High Moulouya, Morocco) based on their geochemical characteristics. The classification method used in this paper is a supervised method based on the Multi-Layer Perceptron artificial neural network (ANN-MLP).
2 Methods Used

2.1 Artificial Neural Network (ANN)

An artificial neural network (ANN) is a mathematical model originally inspired by the functioning of biological neural networks. The main strength of ANNs is their ability to identify complex and non-linear relationships between the inputs and outputs of data sets without the need to understand the nature of the underlying phenomena (Adamowski and Sun 2010). Over the past decade, the use of artificial neural networks has grown in many disciplines (economics, ecology and environment, biology and medicine, industry, etc.). They are notably applied to solve classification and prediction problems (Manssouri et al. 2008; Manssouri et al. 2021a, b; Boudebbouz et al. 2015). The Multi-Layer Perceptron (MLP) is the simplest and most commonly used neural network. Its structure consists of an input layer, an output layer and one or more hidden layers (Fig. 1).
Fig. 1. General architecture of an ANN-MLP
The configuration of the best MLP model and its implementation amounts to choosing the transfer functions, determining the relevant inputs and the number of neurons in the hidden layer, choosing the learning algorithm, and then optimizing and testing the network.

2.2 The Stages for Conception of the ANN

The approach followed for the development of the network can be separated into four stages:

a. Development of the learning and testing base, in several steps:
• Data collection
• Data analysis
• Data normalization
• Database separation
b. Choice and development of artificial neural network models
c. Learning
d. Validation

2.3 Performance Evaluation

This step consists in evaluating the trained models by comparing the estimated values with the actual values. The result of the evaluation is expressed in two ways: by statistical indicators and by the examination of graphs. The indicators used in this study are the Nash-Sutcliffe efficiency (NSE), the coefficient of determination (R2) and the Mean Square Error (MSE), defined as follows:

• The Nash-Sutcliffe efficiency (NSE)

NSE = 1 − Σ(i=1..N) (Yi^obs − Yi^sim)² / Σ(i=1..N) (Yi^obs − Y^mean_obs)²
(1)
• The coefficient of determination (R2)

R² = Cov²(Y^sim, Y^obs) / (V(Y^sim) × V(Y^obs))

(2)

• The mean square error (MSE)

MSE = Σ(i=1..N) (Yi^obs − Yi^sim)² / N

(3)
Y^obs and Y^sim are the values of the target vector and the prediction vector of the network's output neuron, Y^mean_obs is the mean value of the target vector, and N is the number of test samples. The best prediction is obtained when R2 tends towards 1 and MSE towards 0. The NSE (Nash-Sutcliffe efficiency) is a normalized statistic that determines the relative magnitude of the residual variance compared to the variance of the measured data (Nash and Sutcliffe 1970). The NSE varies from −∞ to 1 and is interpreted as follows:
• NSE = 1 corresponds to a perfect modeling of the observed data.
• NSE = 0 indicates that the model predictions are as accurate as the mean of the observed data.
• −∞ < NSE < 0 indicates that the observed mean is a better predictor than the model.
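The three indicators of Eqs. (1)–(3) can be computed directly with NumPy (a sketch; the function name is ours, and the covariance and variances in R2 use the sample, ddof = 1, convention):

```python
import numpy as np

def evaluation_metrics(y_obs, y_sim):
    """NSE, R2 and MSE for a target vector y_obs and a prediction vector y_sim."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_sim = np.asarray(y_sim, dtype=float)
    residual = y_obs - y_sim
    # Eq. (1): 1 - residual variance relative to variance of the observations
    nse = 1.0 - np.sum(residual ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
    # Eq. (2): squared covariance over the product of the variances
    r2 = (np.cov(y_sim, y_obs)[0, 1] ** 2
          / (np.var(y_sim, ddof=1) * np.var(y_obs, ddof=1)))
    # Eq. (3): mean squared residual
    mse = np.mean(residual ** 2)
    return nse, r2, mse
```

For a perfect model the function returns NSE = 1, R2 = 1 and MSE = 0, matching the interpretation above.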
3 Presentation of the Study Area – Geological Location

The Aouli pluton study area in High Moulouya is located about twenty kilometers north of the town of Midelt. The altitude of the zone varies between 1200 and 1500 m in the depression and culminates at 3757 m at Jbel Ayachi in the High Atlas. The climate is continental: dry and hot in summer, cold and slightly damp in winter. Access to the terrain is made essentially via main road N° 21. The spontaneous vegetation consists mainly of alfa grass ("Alfat"), which leaves outcrop conditions very favorable to observation. High Moulouya is a depression that closes at the eastern junction of two mountain ranges, the High Atlas and the Middle Atlas. Two Paleozoic inliers (boutonnières) are known in the upper Moulouya, Aouli and Bou Mia (Fig. 2). They expose the massif of Aouli to the east and that of Bou Mia to the west, and together constitute the massif of the upper Moulouya.
4 Results and Discussion

4.1 Design of the Neural Classification Model

The first step in designing the model was to choose the input variables in order to build a database.
Fig. 2. Geological Map of Aouli’s Buttonhole
For this purpose, one hundred and sixty-eight whole-rock samples were analyzed to determine their composition in major and trace elements: 11 samples of granodiorite, 81 of gray granite, 71 of pink granite and 5 of muscovite granite. Twenty-eight of these samples were also analyzed for rare earths: 8 of granodiorite, 7 of gray granite, 8 of pink granite and 5 of muscovite granite. The methods used for these analyses were X-ray spectrography for the major elements, X-ray fluorescence for trace elements, a specific-ion electrode for fluorine, and neutron activation for the analysis of rare earth elements (Oukemeni 1993). The second step is the determination of the network architecture. In order to choose the "best" neural network architecture, several statistical indicators are generally used; in this case we used three, namely the Nash-Sutcliffe efficiency (NSE), the coefficient of determination (R2) and the mean square error (MSE) (Figs. 3, 4 and 5). The Multi-Layer Perceptron (MLP) artificial neural network consists of an input layer containing twenty neurons, a hidden layer containing two neurons, and an output layer containing four neurons. The database consists of twenty independent vectors x1, x2, …, x20, normalized between 0.1 and 0.9, which are the major elements, trace elements and rare earths (Oukemeni 1993) (Fig. 6). This database is divided into three sets: the learning set (60%) is used to create the model, the validation set (10%) is used to check the model, and the test set (30%) is used to test the effectiveness of the model. The network weights and biases were adjusted using the Levenberg-Marquardt algorithm.
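As an illustration of this pipeline (a sketch only: scikit-learn provides no Levenberg-Marquardt optimiser, so L-BFGS stands in for it, the geochemical measurements are replaced by synthetic values, and all variable names are ours):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def scale_01_09(X):
    """Min-max normalise each input variable into the interval [0.1, 0.9]."""
    X = np.asarray(X, dtype=float)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return 0.1 + 0.8 * (X - x_min) / (x_max - x_min)

# Synthetic stand-in for the 168 x 20 matrix of major elements, trace
# elements and rare earths, with 4 facies labels (granodiorite, gray
# granite, pink granite, muscovite granite).
rng = np.random.default_rng(42)
labels = rng.integers(0, 4, size=168)
X = scale_01_09(rng.normal(loc=labels[:, None], scale=0.3, size=(168, 20)))

# 60% learning / 10% validation / 30% test split, as in the paper.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, labels, train_size=0.6, random_state=0, stratify=labels)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.75, random_state=0, stratify=y_rest)

# [20-2-4] architecture: 20 inputs, one hidden layer of 2 neurons, 4 classes.
clf = MLPClassifier(hidden_layer_sizes=(2,), activation='tanh',
                    solver='lbfgs', max_iter=5000,
                    random_state=0).fit(X_train, y_train)
success_rate = clf.score(X_test, y_test)
```

On real data, the validation set would guide the choice of architecture through the NSE, R2 and MSE indicators, as illustrated in Figs. 3, 4 and 5.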
Fig. 3. Statistical indicator: Nash-Sutcliffe Efficiency NSE
Fig. 4. Statistical indicator: Coefficient of determination R2
Fig. 5. Statistical indicator: The mean square error (MSE)
Fig. 6. Inputs/Output of the classification model
4.2 Validation of the Classification Neural Model

Once the architecture, weights and biases of the neural network have been fixed, it is necessary to know whether this neural model generalizes well. Validating the neural architecture [20-2-4] (Fig. 7) consists in judging its classification capacity by applying the weights and biases calculated during learning to another test database composed of 50 samples.
Fig. 7. The neural architecture [20-2-4]
Fig. 8 reflects the results of this classification test. It is remarkable from this figure that the test data, consisting of 50 rock samples, are correctly classified with the
exception of 7 samples. The classification results obtained by the neural architecture [20-2-4] are summarized by the success rate, which equals 86%.
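The 86% figure follows directly from the 7 misclassified samples among the 50 test samples:

```python
# Success rate = correctly classified samples / total test samples.
total_samples, misclassified = 50, 7
success_rate = (total_samples - misclassified) / total_samples
print(f"success rate = {success_rate:.0%}")  # prints "success rate = 86%"
```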
Fig. 8. Results of classification by ANN [20-2-4]
5 Conclusions

This study demonstrated the classification capacity, on 50 samples of granitoid rocks of the Aouli pluton (High Moulouya, Morocco), of techniques based on artificial neural networks (ANN) of the Multi-Layer Perceptron (MLP) type (supervised classification). We evaluated the quality of the classification model ANN-MLP [20-2-4] with two indicators, namely the coefficient of determination R2 and the success rate; the success rate is estimated as the number of individuals correctly classified over the total number of individuals. In our case the success rate is 86% and the coefficient of determination R2 is 66.4%. We conclude that the ANN-MLP [20-2-4] classification model is very simple to implement and leads to quite satisfactory results.
References
Adamowski, J., Sun, K.: Development of a coupled wavelet transform and neural network method for flow forecasting of non-perennial rivers in semi-arid watersheds. J. Hydrol. 390(1–2), 85–91 (2010)
Boudebbouz, B., Manssouri, I., Ghanou, Y., Manssouri, T.: Defect classification in nonlinear systems using self-organizing maps. J. Nat. Sci. Sustain. Technol. 9(1), 45 (2015)
Chang, H.C., Kopaska-Merkel, D.C., Chen, H.C.: Identification of lithofacies using Kohonen self-organizing maps. Comput. Geosci. 28(2), 223–229 (2002)
Mahmoodi, O., Smith, R.S., Tinkham, D.K.: Supervised classification of down-hole physical properties measurements using neural network to predict the lithology. J. Appl. Geophys. 124, 17–26 (2016)
Manssouri, I., Chetouani, Y., El Kihel, B.: Using neural networks for fault detection in a distillation column. Int. J. Comput. Appl. Technol. 32(3), 181–186 (2008). https://doi.org/10.1504/IJCAT.2008.020953
Manssouri, I., Talhaoui, A., El Hmaidi, A., Boudad, B., Boudebbouz, B., Sahbi, H.: Artificial intelligence for supervised classification purposes: case of the surface water quality in the Moulouya River, Morocco. J. Water Land Develop. 240–247 (2021)
Manssouri, I., Boudebbouz, B., Boudad, B.: Using artificial neural networks of the type extreme learning machine for the modelling and prediction of the temperature in the head of the column. Case of a C6H11-CH3 distillation column. Materials Today Proc. 45, 7444–7449 (2021)
Nash, J.E., Sutcliffe, J.V.: River flow forecasting through conceptual models part I—a discussion of principles. J. Hydrol. 10(3), 282–290 (1970)
Oukemeni, D.: Géochimie, géochronologie (U-Pb) du pluton d'Aouli et comparaisons géochimiques avec d'autres granitoïdes hercyniens du Maroc par analyse discriminante. Université du Québec à Chicoutimi, 119 pp. (1993)
Konaté, A.A., et al.: Capability of self-organizing map neural network in geophysical log data classification: case study from the CCSD-MH. J. Appl. Geophys. 118, 37–46 (2015)
Overview of Artificial Intelligence in Agriculture An Impact of Artificial Intelligence Techniques on the Agricultural Productivity Sara Belattar1(B) , Otman Abdoun2 , and El Khatir Haimoudi1 1 Department of Computer Science, Polydisciplinary Faculty, Abdelmalek Essaâdi University,
Larache, Morocco [email protected] 2 Department of Computer Science, Faculty of Science, Abdelmalek Essaâdi University, Tetouan, Morocco
Abstract. Artificial intelligence (AI), sometimes called augmented intelligence, is the subfield of computer science that enables machines to exhibit increasingly intelligent behavior. Thanks to these intelligent features, AI has changed the world and significantly impacted fields such as medicine, agriculture, and the economy. These features enable machines to learn, think, and solve problems, paving the way for powerful intelligent systems capable of resolving very difficult issues. Agriculture is an important sector in all countries, since it is the primary food source for billions of people worldwide and boosts GDP. However, various factors limit crop growth and decrease productivity, such as climate change, biotic and abiotic factors, and rapid population growth. These factors might lead to food shortages in the near future. Thus, the use of AI can transform agriculture by giving farmers smart tools that increase crop productivity and reduce costs. For this reason, this study presents an overview of the use of AI in the agriculture sector by explaining its fundamental concepts. In addition, the purpose of this review paper is to familiarize readers with the contemporary and cutting-edge AI approaches currently being applied in agriculture, and to suggest appropriate AI techniques that have been successfully implemented on a broad scale. On the basis of the literature studied in this paper, the results indicate that ANN and DL techniques have been applied more extensively (65%) than classical ML techniques (35%) in the agriculture sector. Keywords: Soil management · crop management · water management · artificial intelligence · improving agricultural productivity
1 Introduction

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 447–461, 2023. https://doi.org/10.1007/978-3-031-43520-1_38

The mathematician Alan Turing presented the concept of developing computers to replicate intelligent human behavior and critical thought in 1950. During this period, Alan Turing created well-known tests to determine whether or not computers could mimic
448
S. Belattar et al.
human intelligence, such as learning, adapting, and interpreting [1, 2]. A few years later, the term "artificial intelligence" was coined by a group of researchers (John McCarthy, Marvin L. Minsky, Nathaniel Rochester, and Claude E. Shannon) in their 1955 "A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence" [3, 4]. Between 1956 and 1974, artificial intelligence emerged as an essential instrument for the development of the world in a variety of life aspects and scientific studies; this period is known as the "golden age" of AI [5]. The strength of artificial intelligence lies in its capacity to tackle very complex issues and repetitive tasks in various sectors, as opposed to conventional manual methods, which are time-consuming and produce erroneous or unsatisfactory results. As a result, artificial intelligence has become the most debated and deployed technology of recent years. Agriculture is an important sector in all nations, accounting for around 6.4% of global Gross Domestic Product (GDP) [6]. Agriculture is thus a fundamental source of life and plays a crucial role in long-term economic development. Recently, several natural and external factors have shown the potential to negatively impact agriculture, particularly its productivity: climate change, insects, pests, weeds, rapid population growth, COVID-19, the Ukraine crisis, and more. These factors might lead to food shortages in the near future; indeed, figures suggest that approximately 820 million people go hungry today [7]. To meet the requirements of a growing population, the Food and Agriculture Organization of the United Nations (FAO) forecasts that worldwide food demand will increase by 70% as the global population reaches 9.7 billion by the year 2050 [8].
Because of this, artificial intelligence represents a good solution for enhancing the performance of the agricultural sector by introducing highly intelligent new technologies. Thanks to artificial intelligence, the agriculture industry will be able to increase crop output while also improving water management and detecting diseases earlier. Using AI in agriculture could also reduce the number of workers needed, cut down the time spent on those jobs, and solve the most difficult problems that traditional farming methods and mathematical models cannot. This paper presents an overview of the application of AI in the agriculture sector by highlighting three areas of study: soil management, crop management, and water management. It also provides background on the basics of artificial intelligence, covering its definition, an overview, its techniques, and performance evaluation. Our purpose is thus to show readers how AI can be used and how it can change the agriculture field, and to identify the most extensively utilized AI techniques in agriculture.
2 Basics of Artificial Intelligence

2.1 Definition of AI

Before going into the depths of artificial intelligence (AI), we will first define it in a nutshell. According to Veale (2001), the extreme challenge of defining "artificial intelligence" is rooted in the difficulty of defining "intelligence" [9]. Accordingly, many researchers have tried to provide
Overview of Artificial Intelligence in Agriculture
449
definitions of intelligence. The following are a few of the definitions found in the relevant literature. According to the Encyclopaedia Britannica, intelligence can be defined as follows: "Ability to adapt effectively to the environment, either by making a change in oneself or by changing the environment or finding a new one" [10]. It is therefore almost impossible to define the concept of intelligence precisely, a view supported by several researchers [12, 13]. Hence, the definition of artificial intelligence is extremely broad, as many AI definitions circulate on the internet and in the literature. According to Schatsky et al.: "A useful definition of artificial intelligence is the theory and development of computer systems able to perform tasks that normally require human intelligence" [14].

2.2 Overview of AI

Artificial intelligence (AI) has been used in various scientific fields over the past decade, especially for complex problems that cannot usually be solved using traditional algorithmic techniques. The artificial intelligence domain thus allows us to create highly intelligent machines capable of learning, adapting, and making complex cognitive decisions that are difficult to achieve with classical methods. Nowadays, AI is widely used in domains such as health, the economy, agriculture, and transport to address issues such as diagnosis, prediction, recommendation, disease detection, and business analytics.

Techniques

Machine learning (ML) and deep learning (DL) are the two main branches of AI (see Fig. 1). Machine learning is a crucial subfield of AI. In contrast to traditional programming, the purpose of ML is to create highly intelligent systems that learn from a specific problem's training data and analyze it in order to predict new data (or future events) and solve different tasks without the need for human intervention. DL is another subfield of AI.
In particular, DL is a branch of ML based on artificial neural networks (ANN) [15]. DL makes it possible for computer systems to extract features automatically and successfully, without the assistance of a domain expert. Thus, ML can generate predictive models from basic data, while DL is better suited than traditional ML algorithms to computer vision, particularly image analysis. In addition, AI plays a crucial role in data science, since it can handle large amounts of data to generate predictions, diagnoses, and more, as shown in Fig. 1. In this section, we give a quick overview of the most common ML and DL algorithms used in agriculture. For traditional machine learning, we can mention the following:

• Support Vector Machine (SVM) [16–18];
• Decision Trees (DT) [19];
• K-Nearest Neighbors (KNN) [20];
• Logistic Regression [21];
• Linear Regression [22];
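As a minimal illustration of one of the classical techniques listed above, a k-nearest neighbors classifier can be sketched in a few lines of plain Python. The feature vectors and crop labels here are invented for exposition and are not taken from any of the cited studies:

```python
from collections import Counter
import math

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among the k nearest training points."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical samples: (soil pH, rainfall index) -> crop class
train = [(6.5, 0.8), (6.7, 0.9), (5.0, 0.2), (5.2, 0.3)]
labels = ["rice", "rice", "millet", "millet"]
print(knn_predict(train, labels, (6.4, 0.7)))  # -> rice
```

The same nearest-neighbor vote underlies the KNN comparisons reported in the agricultural studies discussed later in this paper.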
Fig. 1. The relationship between AI, ML, DL, and Data science.
• Random Forest [23];
• K-means [24];
• Principal Component Analysis (PCA) [25];
• etc.

For advanced machine learning, we can mention the following:
• Artificial neural networks (ANN) [26]. Among the techniques in the ANN family are Kohonen self-organizing maps (K-SOM), counter-propagation artificial neural networks (CP-ANN), multi-layer perceptrons (MLP), and others [27, 28].
• Deep learning (DL). Deep learning makes use of a variety of techniques, such as the convolutional neural network (CNN) [29], the recurrent neural network (RNN) [30], the deep belief network (DBN) [31], generative adversarial networks (GANs) [32], and many others.

Performance Evaluation

Different metrics should be used to evaluate the overall performance of ML and DL algorithms: the confusion matrix, classification accuracy, precision, recall, and F1 score [29]. The confusion matrix (CM) is a crucial metric that allows us to evaluate the classification performance of the proposed model across all classes and compare it to the actual classification. Typically, the results are shown as a matrix, with diagonal values representing correct classifications and non-diagonal values representing incorrect classifications (see Table 1).
Table 1. The confusion matrix metric

                 | Predicted Positive  | Predicted Negative
Actual Positive  | True Positive (TP)  | False Negative (FN)
Actual Negative  | False Positive (FP) | True Negative (TN)
Here, TP is the number of instances correctly identified as positive, TN is the number of instances correctly identified as negative, FN is the number of positive instances incorrectly classified as negative, and FP is the number of negative instances incorrectly classified as positive. Other metrics that examine the overall efficacy of the proposed models are classification accuracy (CA), precision (P), recall (R), and F1 score (F1), where CA is the rate of correct classifications made by the model, as shown in Eq. (1). P, R, and F1 represent the quality of the predictive model (see Eqs. (2)–(4)).

CA = (TP + TN) / (TP + TN + FP + FN)    (1)

P = TP / (TP + FP)    (2)

R = TP / (TP + FN)    (3)

F1 = 2 × (P × R) / (P + R)    (4)
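Equations (1)–(4) can be computed directly from the four confusion-matrix counts, as this short sketch illustrates (the counts themselves are invented):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    ca = (tp + tn) / (tp + tn + fp + fn)  # Eq. (1)
    p = tp / (tp + fp)                    # Eq. (2)
    r = tp / (tp + fn)                    # Eq. (3)
    f1 = 2 * (p * r) / (p + r)            # Eq. (4)
    return ca, p, r, f1

# Hypothetical counts for a binary crop-disease classifier
ca, p, r, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(f"CA={ca:.2f} P={p:.2f} R={r:.2f} F1={f1:.2f}")  # CA=0.85 P=0.89 R=0.80 F1=0.84
```

Note that the F1 score is the harmonic mean of precision and recall, so it penalizes models that trade one heavily for the other.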
Hence, this section has provided an overview of AI, including its basic concepts, techniques, and performance evaluation. In the next section of this paper, we discuss several works that show the effectiveness of AI in the field of agriculture.
3 Artificial Intelligence in Agriculture

Traditional farming methods based on field experiments require considerable labor and time and permit monitoring only on a small scale, while traditional algorithmic methods cannot analyze large amounts of data or support decision-making. Innovative, intelligent strategies are therefore imperative for improving and sustaining farming practices. With its capability to learn, self-adapt, and generalize, AI has become an impressive technique for agricultural diagnosis and other tasks: it allows accurate and adequate decisions to be made in a short time without human intervention, and it can handle large volumes of agricultural data with greater performance than traditional algorithmic methods. Furthermore, AI tools work alongside various technologies, including the internet of things (IoT), drone technology, data science, machine learning, and mobile applications, that can treat a wide range of problems (see Fig. 2). Thus, many problems in agriculture will be solved quickly with the help of AI
systems that operate autonomously, helping farmers cut their losses and obtain high yields. The main areas under study in the agriculture field are soil management, crop management, and water management. In this section, we therefore present selected works that propose various AI techniques for improving the agriculture sector with the greatest accuracy. These works include diagnosis systems, prediction systems, and recommendation systems, the major categories of systems in the agriculture sector.
Fig. 2. AI in agriculture
3.1 Soil Management

Soil is an essential factor in agricultural success because it provides the nutrients (chemical properties) for crop growth and development, such as potassium, phosphorus, calcium, nitrogen, and magnesium [33]. In addition, the soil stores water, making it available for crop development. Robust and accurate soil monitoring and diagnosis are therefore key challenges, and good soil quality has a positive impact on crops, production, and profitability. In this regard, various researchers have used advanced AI techniques to create intelligent systems for monitoring and diagnosing soil health in order to create healthier soils. In [24], the authors applied an unsupervised machine learning algorithm, K-means, to diagnose soil health based on geographically coherent distinctions in soil properties; their proposed strategy could support the plant and enhance crop productivity. Nguyen et al. [16] suggested a new strategy for monitoring the soil moisture of Western Australia using multi-sensor data and ML algorithms, namely random forest regression (RFR), SVM, extreme gradient boosting regression (XGBR), and CatBoost gradient boosting regression (CBR), with a genetic algorithm (GA) for feature selection. The experimental results of this study showed that combining XGBR with GA gave higher performance than RFR, SVM, and CBR. This proposed approach can thus improve soil monitoring and help farmers track droughts by giving them good tools for taking the necessary steps. In [34], an automated monitoring system employing
a smartphone and a deep learning model (CNN) was built to identify or predict soil texture (e.g., sandy clay, clay loam, clay silt) from images of the ground. The authors also compared the CNN's performance with that of ANN, SVM, RF, and KNN; their findings demonstrated that the CNN delivered superior results relative to the other ML methods and successfully predicted the soil texture type. Their approach is thus very important for better management and use of arable land, which will lead to higher crop yields in the long run. A final study [35] provided a new ANN-based model for estimating soil moisture, and its findings were compared to those of multiple linear regression (MLR) models. Table 2 highlights and summarizes the aforementioned articles on AI-based soil management.

Table 2. AI for soil management

Author | Functionality | Proposed AI techniques | Results
[24] | Enhance soil health diagnosis and management | K-means | Offers a good intelligent classification framework
[16] | Monitoring soil moisture | New AI approach: XGBR + GA | XGBR + GA outperforms SVM, RFR, and CBR with 89% accuracy
[34] | Predicting soil texture | Improved CNN | Their CNN model reached 99.89% accuracy, higher than the other methods
[35] | Estimating soil moisture | New ANN-based model | Their model outperformed MLR
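To make the clustering idea behind studies such as [24] concrete, here is a minimal plain-Python K-means sketch over hypothetical soil-property vectors. The samples and fixed initial centroids are invented for exposition (fixing the centroids keeps the run deterministic); this does not reproduce any cited system:

```python
def kmeans(points, centroids, iters=10):
    """Plain K-means: assign each point to its nearest centroid, then re-average."""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster (keep old if empty)
        centroids = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# Hypothetical soil samples: (pH, organic matter %)
samples = [(5.1, 1.0), (5.3, 1.2), (7.8, 3.0), (8.0, 3.2)]
centroids, clusters = kmeans(samples, centroids=[(5.0, 1.0), (8.0, 3.0)])
print(centroids)
```

In a soil-health setting, each resulting cluster would group fields with similar property profiles, which can then be managed with a common strategy.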
3.2 Crop Management

The quality of crop yield depends on the availability of nutrients. Typically, the plant thrives by absorbing nutrients from the soil; a nutrient is a chemical element required for the plant to complete its life cycle, and potassium, nitrogen, and phosphorus are the three primary crop nutrients. Diverse conventional expert systems are used to diagnose crop nutrition, but these systems are insufficient and have numerous flaws; the use of AI tools is therefore necessary.

Crop Nutrition

H. Song and Y. He [36] proposed a crop nutrition diagnostic system based on an ANN, also known as an "intelligent and portable diagnosis expert system (ES)", which assists inexperienced farmers in detecting crop nutrition disorders in real time. In experimental tests, their automated system identified crop nutrition disorders with an accuracy of more than 90%. In addition, their ANN-based system can make crops more productive by reducing yield losses and optimizing fertilizer use. Espejo-Garcia et al. [37] proposed an intelligent system
for the early diagnosis (or identification) of nutrient deficiencies using deep neural networks (DNN) and transfer learning. In their experiments, the authors utilized two distinct datasets: the first contained 5648 images of sugar beets with deficiencies in nitrogen, potassium, phosphorus, etc., and the second contained images of orange trees with deficiencies in magnesium, potassium, iron, and manganese. Their results showed that the proposed EfficientNetB4 deep neural network gave the highest accuracy on both datasets, with 98.65% and 98.52%, respectively. Consequently, their method can detect nutritional deficiencies early on, enabling increased yield and preventing agricultural losses. Another study [38] proposed a deep convolutional neural network (DCNN) to diagnose lettuce's nutrient status based on the classification of nutrient deficiencies. The empirical results showed that the DCNN reached an accuracy of 96.5% on a lettuce dataset of over 3000 images covering nitrogen, potassium, and phosphorus deficiencies as well as full nutrition. Their approach is thus promising for improving agricultural productivity.

Crop Yield Prediction

Crop yield prediction is a very challenging problem in precision agriculture. Various parameters are taken into consideration to predict a suitable crop, among them parameters related to soil and atmosphere (e.g., pH, nitrogen, temperature, organic carbon, zinc, magnesium, calcium, depth, rainfall, and fertilizer). Hence, AI is needed to provide better crop yield prediction. In [39], the authors used an artificial neural network to develop an intelligent system that predicts the appropriate crop based on parameters such as temperature, rainfall, phosphate, nitrogen, potassium, and depth, and that also proposes a fertilizer. The ANN was chosen because it is one of the most powerful techniques for prediction and modeling.
In experimental tests, their results were effective and gave excellent performance. In [17], the authors used regression techniques, namely support vector machine regression (SVM), random forest (RF), and multivariate polynomial regression (MPR), for crop yield prediction, focusing on parameters such as humidity, temperature, rainfall, and yield. Their strategy will assist farmers in determining the best moisture and temperature conditions for crop yield. Belattar et al. [27, 28] proposed modified Kohonen self-organizing maps (MK-SOM) and modified counter-propagation artificial neural networks (MCP-ANN) based on the Gram-Schmidt algorithm (GSHM) for better crop yield prediction using a variety of parameters, including temperature, moisture, humidity, depth, and rainfall. The new K-SOM and CP-ANN structures achieved superior performance compared to their predecessors. In [23], the authors developed a recommendation system using data mining techniques, namely Random Forest and K-means clustering. Their system recommends the most suitable fertilizer based on specific parameters such as nitrogen, phosphorus, and potassium; in addition, the data used in this study are stored in an ontology for sharing in the recommendation system's knowledge base.

Plant Disease Detection

Several issues can limit agricultural productivity, and plant diseases are among the most severe. According to the Food and Agriculture Organization (FAO), pests and plant
diseases account for 20% to 40% of production losses [40]. In addition, plant disease alone contributes approximately 13% of global crop productivity losses [40]. Plant diseases are thus a major challenge in the agricultural domain, and their diagnosis and control are essential to limit crop losses. In [18], the authors proposed an automatic K-means machine learning algorithm to build an intelligent system for diagnosing grape leaf diseases through automatic inspection, together with other techniques such as the support vector machine and principal component analysis (PCA). First, K-means is used for image segmentation and the detection of diseased areas. Second, features are extracted over three color channels. Finally, the support vector machine performs the classification, with PCA used to reduce the feature dimension. In their experimental tests, the authors compared their proposed approach with GoogleNet and CNN, and their results showed impressive performance. In [41], the authors proposed a practical and robust intelligent diagnosis system for plant diseases called the "Plant Disease Diagnosis and Severity Estimation Network (PD2SE-Net)", which uses ResNet 50 as its basic architecture. To validate the performance of their approach, they used a dataset containing various crop types, such as apple, grape, cherry, peach, and pepper, at different disease stages (e.g., healthy, general, and serious). Belattar et al. [29] proposed a new CNN architecture called "OP-CNN" for creating an intelligent system that can detect strawberry plant diseases early, strawberry being the major crop of the Larache province in northern Morocco. The authors compared their proposed OP-CNN with other state-of-the-art CNN architectures, such as DenseNet 121, VGG 19, and ResNet 50. Their experimental tests showed that the OP-CNN gave the highest values among the compared CNN architectures, with 100% classification accuracy, precision, recall, and F1 score. Table 3 highlights and summarizes the aforementioned articles on AI-based crop management.

Table 3. AI for crop management

Author | Functionality | Proposed AI techniques | Results
[36] | Crop nutrition diagnostic system | ANN | Their proposed system reached more than 90% accuracy
[37] | Crop nutrition diagnosis system | DNN | Their DNN had 98.65% and 98.52% accuracy
[38] | Crop nutrition diagnosis system | DCNN | Their DCNN provided 96.5% accuracy
[39] | Crop prediction system | ANN | Their ANN gave excellent performance
[17] | Crop prediction system | SVM, MPR, and RF | SVM demonstrated superior performance to MPR and RF
[27, 28] | Crop prediction system | New K-SOM and CP-ANN | The improved K-SOM and CP-ANN delivered 100% accuracy with less execution time
[23] | Recommendation system for crop and fertilizer suitability | RF and K-means | The performance of their proposed system was reasonably high
[18] | Plant disease detection | New AI technique based on SVM, PCA, and K-means | Their proposed technique outperforms CNN and GoogleNet
[41] | Plant disease detection | New AI technique: PD2SE-Net | PD2SE-Net classified plant diseases with 98% accuracy
[29] | Plant disease detection | New AI technique: OP-CNN | OP-CNN outperformed other CNN techniques with 100% accuracy
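Several of the yield-prediction systems above ultimately fit a regression over agronomic parameters. The simplest instance, ordinary least squares on a single rainfall feature, can be sketched as follows (the numbers are invented and purely illustrative, not data from the cited studies):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (one-feature linear regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical (rainfall mm, yield t/ha) pairs
rain = [300, 400, 500, 600]
yields = [2.0, 2.6, 3.2, 3.8]
a, b = fit_line(rain, yields)
print(round(a * 450 + b, 2))  # predicted yield at 450 mm -> 2.9
```

Real systems such as those in [17, 39] use many more features and nonlinear models, but the idea of mapping agronomic inputs to a yield estimate is the same.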
3.3 Water Management

Water is an important natural resource that must be preserved, and water consumption is a major issue in many fields. Agricultural water is an essential resource for food security, and its use accounts for roughly 70% of total global consumption [42]. As the population grows and food demand rises, this share will rise rapidly. Intelligent water management and efficient technologies are therefore required to achieve the sustainable use of water in irrigation. AI plays a significant role in optimizing water use in irrigation systems such as drip, surface, and manual irrigation. In this context, sensors, programmable electronic boards, and AI techniques can be combined to create automated irrigation systems that predict water demand. In [43], the authors created a smart irrigation system using a camera, soil color analysis, and an ANN: the soil color image is transmitted to the ANN, which then decides whether or not to irrigate the soil. In [44], the authors developed a smart irrigation system using several technologies: neural networks (NN), the internet of things (IoT), and the MQTT protocol. The IoT is used to communicate
and connect devices; the NN is employed in decision-making for efficient irrigation; and MQTT keeps farmers up to date on crop conditions. Their proposed system provided various benefits, such as intelligence, portability, and low cost. Belattar et al. [45] created an intelligent system based on an ANN (CP-ANN) combined with IoT, sensors, and an Arduino board to diagnose and control different factors in greenhouses across a variety of functions, including smart irrigation, temperature control, and air humidity control. In another study [46], the authors presented a smart irrigation system that uses IoT, K-means, and a support vector regression (SVR) model to predict the soil moisture of the following days. In a final study [47], the authors suggested a hybrid model combining XGBoost and K-means for the development of a smart irrigation system and compared their results to those of other hybrid models. Table 4 highlights and summarizes the aforementioned articles on AI-based water management.

The literature analysis above demonstrates the significance of AI in the agricultural industry in a variety of contexts, including crop management, soil management, and water management. Artificial intelligence therefore has a positive impact on agricultural output, with strong results. On the basis of this survey of the relevant literature, artificial neural networks and deep learning are the most often used AI techniques in agriculture, due to their superior properties over traditional machine learning (see Fig. 3); for this reason, we strongly recommend them. It is therefore imperative that the advantages of AI in agriculture be taken into account by all nations, particularly emerging nations such as Morocco, to reduce reliance on the traditional techniques that hold back agricultural productivity.

Table 4. AI for water management

Author | Functionality | Proposed AI techniques | Results
[43] | Smart irrigation system | ANN | Their system was very accurate
[44] | Smart irrigation system | NN with IoT and MQTT | Their proposed system was beneficial
[45] | Smart irrigation system | CP-ANN | High performance with 100% accuracy
[46] | Smart irrigation system | K-means/SVR | Their outcomes were quite promising
[47] | Smart irrigation system | K-means + XGBoost | Their proposed system outperforms other hybrid models with 97% accuracy
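The decision step shared by the irrigation systems above amounts to comparing a predicted soil-moisture level against a crop-specific threshold. A minimal sketch follows; the threshold, readings, and one-step linear extrapolation are invented for illustration and do not correspond to any cited system:

```python
def should_irrigate(moisture_history, threshold=0.32, horizon=1):
    """Forecast soil moisture by linear extrapolation of the last two
    readings, then irrigate if the forecast falls below the threshold."""
    trend = moisture_history[-1] - moisture_history[-2]
    forecast = moisture_history[-1] + horizon * trend
    return forecast < threshold

# Hypothetical volumetric soil-moisture readings showing a drying trend
readings = [0.42, 0.38, 0.34]
print(should_irrigate(readings))  # -> True
```

A deployed system such as those in [46, 47] would replace the naive extrapolation with a learned model (SVR, XGBoost) fed by IoT sensor streams, but the final irrigate-or-not comparison is the same.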
Fig. 3. Pie chart of the AI techniques most widely implemented, based on the literature reviewed for this paper.
4 The Future Scope

In the future, the agriculture sector will produce enormous volumes of crop-related data, and the standard methods for processing it and making decisions will be inefficient. With a growing human population, exploiting AI in agriculture is a good way to maintain sustainable agricultural development for future generations. AI solutions must therefore reach the farming community so they can be utilized for a variety of agricultural jobs. Utilizing open-source platforms, promotion by governments, and meetings with farmers and their communities can significantly help AI-based systems improve agricultural productivity at low cost. Researchers' goals are to produce highly intelligent systems that are easy for farmers to use, such as voice- and text-based systems with support for multiple languages. It is difficult to forecast AI's future profitability in agriculture, but we believe AI will transform this sector by raising agricultural yields and eliminating the cost factors tied to traditional mechanisms; AI in agriculture thus has the potential to be a boon to farming. Many people will worry that AI will render humans obsolete in a variety of occupations, particularly agricultural labor, by drastically cutting the number of laborers needed, especially for dull, repetitive tasks. On the other hand, there will be a significant increase in the number of jobs required to oversee and maintain AI systems.
5 Conclusion

This review paper offered a general introduction to artificial intelligence and discussed some of the ways in which it can be applied in the agricultural sector. The goal was to familiarize the reader with the fundamental concepts needed to see and comprehend the role AI plays in agriculture, as well as its advantages in terms of crop productivity, cost, and profitability. Therefore, AI
approaches have proven to be useful tools in the agricultural industry, and the studies presented indicate that these techniques can be used to construct intelligent systems capable of early diagnosis, prediction, and recommendation that increase productivity compared to the classical techniques in use. With the many benefits it offers, AI may thus completely revolutionize the agricultural industry.
References

1. Turing, A.M.: I.—Computing machinery and intelligence. Mind LIX, 433–460 (1950)
2. Amisha, P.M., Pathania, M., Rathaur, V.: Overview of artificial intelligence in medicine. J. Family Med. Primary Care 8(7), 2328 (2019). https://doi.org/10.4103/jfmpc.jfmpc_440_19
3. Mitchell, M.: Artificial Intelligence: A Guide for Thinking Humans. Penguin UK (2019)
4. McCarthy, J., Minsky, M.L., Rochester, N., Shannon, C.E.: A proposal for the Dartmouth Summer Research Project on Artificial Intelligence 27(4), 12 (1955)
5. Issarti, I., Rozema, J.J.: Basics of artificial intelligence for ophthalmologists. In: Grzybowski, A. (ed.) Artificial Intelligence in Ophthalmology. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-78601-4_2
6. Ayoub Shaikh, T., Rasool, T., Rasheed Lone, F.: Towards leveraging the role of machine learning and artificial intelligence in precision agriculture and smart farming. Comput. Electron. Agric. 198, 107119 (2022). https://doi.org/10.1016/j.compag.2022.107119
7. Hunger and Food Insecurity. Food and Agriculture Organization of the United Nations (2020). www.fao.org/hunger/en/
8. United Nations: Sustainable Development Goals (2017). https://sdgs.un.org/goals
9. Veale, T.: Key Ideas in Artificial Intelligence. Dublin City University (2001). http://www.compapp.dcu.ie/~tonyv/Textbook/history.html
10. Britannica: Intelligence. Encyclopaedia Britannica (2001). http://www.britannica.com/eb/article?eu=109299
11. Nakashima, H.: AI as complex information processing. Minds Mach. 9, 57–80 (1999)
12. Yam, P.: Intelligence Considered. ScientificAmerican.com (2001). http://www.sciam.com/1998/1198intelligence/1198yam.html
13. Rifkin, S.: Harvard Undergraduate Society for Neuroscience. The Harvard Computer Society (1995). http://hcs.harvard.edu/~husn/BRAIN/vol2/Primate.html
14. Schatsky, D., Muraskin, C., Gurumurthy, R.: Cognitive technologies: the real opportunities for business. Deloitte Review 16 (2018). https://www2.deloitte.com/insights/us/en/deloitte-review/issue-16/cognitivetechnologiesbusiness-applications.html. Accessed 16 June 2018
15. Belattar, S., Abdoun, O., Haimoudi, E.K.: Comparing machine learning and deep learning classifiers for enhancing agricultural productivity: case study, Larache province, northern Morocco. Int. J. Electr. Comput. Eng. 13(2), 1689–1697 (2023). https://doi.org/10.11591/ijece.v13i2.pp1689-1697
16. Nguyen, T.T., et al.: A low-cost approach for soil moisture prediction using multi-sensor data and machine learning algorithm. Sci. Total Environ. 833, 155066 (2022). https://doi.org/10.1016/j.scitotenv.2022.155066
17. Shah, A., Dubey, A., Hemnani, V., Gala, D., Kalbande, D.R.: Smart farming system: crop yield prediction using regression techniques. In: Vasudevan, H., Deshmukh, A.A., Ray, K.P. (eds.) Proceedings of International Conference on Wireless Communication. LNDECT, vol. 19, pp. 49–56. Springer, Singapore (2018). https://doi.org/10.1007/978-981-10-8339-6_6
18. Javidan, S.M., Banakar, A., Vakilian, K.A., Ampatzidis, Y.: Diagnosis of grape leaf diseases using automatic K-means clustering and machine learning. Smart Agric. Technol. 3, 100081 (2023). https://doi.org/10.1016/j.atech.2022.100081
460
S. Belattar et al.
Overview of Artificial Intelligence in Agriculture
RNN and LSTM Models for Arabic Speech Commands Recognition Using PyTorch and GPU

Omayma Mahmoudi and Mouncef Filali Bouami(B)

Laboratory of Applied Mathematics and Information Systems, Multidisciplinary Faculty of Nador, Mohammed Premier University, Oujda, Morocco
{mahmoudi.omayma,m.filalibouami}@ump.ac.ma
Abstract. Because of the acute shortage of Arabic-language datasets, speech command recognition in Arabic remains a real challenge for research. For English, by contrast, the Google Speech Commands dataset is the best-known dataset of spoken instructions; it has accelerated study and sparked many new deep-learning techniques that address keyword-spotting challenges. This paper introduces the classification of two classes ("نعم", "لا") with LSTM and RNN models. Moreover, general-purpose computing on the graphics processing unit is applied to train the models, reducing the training time. To ensure that the parameters used in our model are the best, several training experiments were performed with different parameters to check the system's efficiency. The results demonstrate that the proposed technique has satisfactory accuracy, the training time was extremely short thanks to the technology used (CUDA Toolkit), and, last but not least, the model recognized the commands.

Keywords: LSTM · RNN · Arabic speech command recognition · PyTorch · GPU · MFCC
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 462–470, 2023. https://doi.org/10.1007/978-3-031-43520-1_39

1 Introduction

Due to the quick advancement of artificial intelligence [1] across a variety of industries, including smartphones, vehicles, smart homes, and smart home appliances, speech command recognition (SCR) facilitates services and simplifies our lives. For the time being, however, this only applies to basic commands; as technology develops, scientists will be able to create more sophisticated conversational voice analysis algorithms. The idea is to have smooth communication between a human and a machine, with the latter responding intelligently. SCR software may help manage smart homes, send commands to smartphones and tablets, set reminders, and communicate properly with personal hands-free technologies. It is extremely beneficial to anyone who has a physical impairment and finds writing difficult, painful, or impossible. Additionally, more reliable and effective human-machine interaction interfaces are required to give users the best possible experience when using more sophisticated smart gadgets. Only when we have unified speech recognition models will this
be achievable, and such systems will eventually enable people of diverse backgrounds, educational levels, and lifestyles to connect with technology naturally. The human race already benefits from numerous voice recognition-based applications for a variety of jobs. Users can search for anything using voice commands instead of a keyboard, thanks to voice search in a variety of applications: search engine queries, travel directions, hotel and restaurant searches, and product searches on e-commerce websites are just a few examples. Our research seeks to develop a keyword-spotting system that can recognize preset words and help a device respond in various ways, as required. Specifically, our dataset includes two of the most popular classes, ("نعم", "لا"), "yes" and "no" in English. It gathers 600 waveform audio recordings from 30 different contributors; a single word is heard in each audio clip, which lasts one second. We accelerated the deep learning (DL) [2] models LSTM [3] and RNN [3] using graphics processing units (GPUs) [4]. GPUs are capable of powerful and fast computation owing to a huge number of hardware computing units and hundreds of thousands of running threads. Additionally, the prevalent use of PyTorch as a DL framework [5] facilitates quick GPU deployment for high-level AI inference and training methods. One of the most significant advantages of GPUs in DL applications is their high programmability and API support for AI. An Nvidia GeForce RTX 3050 GPU on a laptop computer [6] was used to train on our dataset. In this study, we used two DL models, LSTM and RNN, to distinguish between two classes of Arabic speech commands (SC); before that, MFCC [7] was used to extract features. The rest of this paper is organized as follows: the second section covers the related works, the third section presents the methodology, and the fourth section covers the experiments and their results.
We present our conclusion in the final section.
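The GPU deployment mentioned above amounts to a simple device check in PyTorch: code runs on CUDA when a GPU such as the RTX 3050 is present, and falls back to the CPU otherwise. A minimal sketch follows; the tensor shape (a batch of MFCC frames) is illustrative, not taken from the paper.

```python
import torch

# Select the CUDA device if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A hypothetical batch of 64 one-second clips, each 98 frames of 12 MFCCs.
x = torch.randn(64, 98, 12).to(device)
print(device.type, x.shape)
```

The same tensor code then runs unchanged on either device, which is what makes the GPU acceleration described here a near-free change to the training script.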
2 Related Works

Dong et al. proposed C-1-G-2-Blstm [8]. This model combines a CNN's capacity to learn local characteristics with an RNN's ability to capture the long-term dependencies of sequence data. Additionally, using a gated CNN [9] rather than a traditional CNN significantly increased the model's capacity. The studies show that on the Google SC dataset, C-1-G-2-Blstm can obtain a high accuracy of 90.6%, which is 6.4% higher than previous methods. Sumon et al. [10] created three distinct CNN architectures to detect Bangla short SC using the reported Bangla short speech commands dataset. In one technique, they extracted MFCC characteristics from the audio recordings, while in another, the raw audio files were fed to a different CNN architecture. Finally, a pre-trained model trained on a large English short SC dataset was fine-tuned by retraining on the Bangla dataset. According to their findings, the MFCC model performs better in detecting Bangla short spoken instructions, whereas the other models perform well on single-syllable commands but have difficulty recognizing multisyllabic commands. Raphael et al. [11] studied resource-efficient methods that short-circuit such systems in the time domain as soon as the model is certain of its outcome. They recommended using labels
at the frame level to further improve the balance between accuracy and efficiency. On two datasets, the optimal technique yields an average 45% reduction in utterance processing time without reducing absolute accuracy [12] by more than 0.6 points. Hung et al. [13] synthesized and processed Vietnamese speech; their research develops and tests an RNN on the data they collected. With a list of 15 keywords for managing smart home devices, their average recognition accuracy is 98.19%.
3 Methodology

3.1 Dataset

The Arabic SC Dataset (v1.0) [14] provided us with audio recordings of commands from two classes, ("نعم", "لا"). The data comes in pairs (x, y), where x stands for the input audio signal and y for the corresponding phrase. Each audio file lasts one second at a sample rate of 16 kHz. Each of the 30 participants recorded ten utterances per command, so each command has a total of 300 audio files, giving 600 files overall (30 * 10 * 2 = 600). We separated the data into 60% for training, 20% for validation, and 20% for testing.

3.2 Arabic Speech Commands Recognition

The usual architecture of speech command algorithms is depicted in Fig. 1. The probability of a single speech command ("نعم" or "لا") is calculated by first converting the input speech signal into a feature representation, such as MFCCs, and then feeding that representation to a recurrent neural network.
Fig. 1. Pipeline for speech commands recognition.
• Arabic speech input: the processed recordings enter the system and are classified one by one. • Feature extraction: to extract meaningful features from the audio data, we used the Librosa package, which provides multiple methods for extracting a variety of characteristics from the audio, including MFCCs. • Recurrent neural network classifier: LSTM and RNN are used to classify the SC, and their classifications are compared. • Command recognition: a threshold is then applied to the output, and the command with the highest probability is chosen.
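The 60/20/20 split from Sect. 3.1 can be sketched with PyTorch's random_split. The feature tensors below are hypothetical stand-ins for the 600 processed clips (98 MFCC frames of 12 coefficients each is an assumed shape, not stated by the paper).

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-ins for the 600 one-second clips and their yes/no labels.
features = torch.randn(600, 98, 12)
labels = torch.randint(0, 2, (600,))
dataset = TensorDataset(features, labels)

# 60% train / 20% validation / 20% test, with a fixed seed for reproducibility.
g = torch.Generator().manual_seed(0)
train, val, test = random_split(dataset, [360, 120, 120], generator=g)
print(len(train), len(val), len(test))
```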
3.3 Feature Extraction

Our development follows a series of sequential stages, as shown in Fig. 2.
Fig. 2. Feature extraction utilizing the MFCC.
MFCCs are a popular feature for representing voice signals in ASR systems. Although not the only possible feature, they are known to help achieve excellent results compared with other features, which encouraged us to incorporate them into our study. Figure 3 illustrates the main phases of the MFCC feature extraction method. In the framing step, we used a 25 ms window with a 10 ms step. The discrete cosine transform (DCT) [15] was then applied to these data to derive 13 MFCC coefficients. The first coefficient was then removed, as it does not carry any important information, leaving the remaining 12 features. We utilized the PyTorch implementation for this task.
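As an illustration of the phases above (framing, windowing, mel filterbank, log, DCT, dropping the first coefficient), here is a from-scratch MFCC sketch rather than the PyTorch implementation the paper used; the 26-filter mel bank, 512-point FFT, and Hamming window are assumed defaults not stated in the text.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, win_ms=25, hop_ms=10, n_fft=512, n_mels=26, n_mfcc=13):
    """25 ms frames with a 10 ms step, 13 DCT coefficients, first one dropped."""
    win, hop = sr * win_ms // 1000, sr * hop_ms // 1000      # 400 and 160 samples
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop:i * hop + win] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * np.hamming(win), n_fft)) ** 2 / n_fft
    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    hz2mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel2hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = mel2hz(np.linspace(0, hz2mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    coeffs = dct(logmel, type=2, axis=1, norm="ortho")[:, :n_mfcc]
    return coeffs[:, 1:]                 # drop the first coefficient -> 12 features

feats = mfcc(np.random.randn(16000))     # one second of audio at 16 kHz
print(feats.shape)                       # (n_frames, 12)
```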
Fig. 3. Presentation of waveforms and features using a (backward) instance from our dataset.
3.4 PyTorch Simulation with LSTM and RNN

LSTMs extend plain RNNs with gate mechanisms that handle both short-term and long-term memory. Even though there have been several modifications, the best deep learning models in Natural Language Processing (NLP) still rely mainly on these original versions; the same holds for voice recognition, synthesis, text generation, and market research. Table 1 presents how the LSTM and RNN models differ from one another.

Table 1. LSTM versus RNN model differences.

LSTM: a cell gathering three "gates" that serve as compute centers and regulate the information flow by carrying out specified tasks; it additionally produces two distinct outputs (the hidden state and the cell state).
RNN: the cell accepts two inputs, the output of the previous hidden state and the observation at time t, but it has no memory cell.
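A minimal PyTorch version of the two classifiers compared in Table 1 might look as follows. The hidden size of 128 is an assumption, since the paper does not publish its layer sizes; the dropout of 0.4 and the 12 MFCC features per frame come from the text.

```python
import torch
import torch.nn as nn

class SpeechCommandClassifier(nn.Module):
    """Recurrent classifier over MFCC frames; rnn_type selects nn.RNN or nn.LSTM."""
    def __init__(self, rnn_type="lstm", n_feats=12, hidden=128, n_classes=2, dropout=0.4):
        super().__init__()
        rnn_cls = nn.LSTM if rnn_type == "lstm" else nn.RNN
        self.rnn = rnn_cls(input_size=n_feats, hidden_size=hidden, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                # x: (batch, frames, n_feats)
        out, _ = self.rnn(x)             # LSTM also returns (h_n, c_n); RNN only h_n
        return self.fc(self.drop(out[:, -1, :]))   # classify from the last time step

x = torch.randn(4, 98, 12)               # a batch of 4 one-second MFCC sequences
for kind in ("rnn", "lstm"):
    logits = SpeechCommandClassifier(kind)(x)
    print(kind, logits.shape)
```

Swapping the cell type is a one-line change here, which mirrors how the two models are compared head-to-head in Sect. 4.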
4 Experiments and Results

4.1 The Training Stage

Training Environment: Due to the high computational requirements of neural networks, we decided to train our network on a GPU. Our training environment is depicted in Table 2.

Table 2. Training environment.

Language: Python 3.6
Framework: PyTorch 1.13.0
GPU: GeForce RTX 3050
GPU memory: 18358 MB

Initialization: To generate random weights for symmetry breaking, a truncated normal distribution with zero mean and a predefined standard deviation is used. Although the final weight values of the trained network are unknown, it makes sense to assume that they will be roughly evenly distributed between positive and negative values, given that the data has been sufficiently normalized. If every neuron in the network produced the same output, they would all compute the same gradients and apply the same parameter updates during backpropagation; as a result, we want the weights to be close to, but not exactly, zero.

Batch Size: For gradient descent, the batch size (BS) [16] is 64. To reduce correlation and improve the network's ability to learn, we randomly select 64 training samples at each step; a BS greater than one helps prevent overfitting. Because training a neural network can be quite computationally demanding, the BS should not be too large; this value is therefore a compromise between hardware constraints and performance.

Learning Rate: We chose 0.001 for the learning rate (LR) [17], a comparatively low value so that the model is adjusted gradually over the subsequent steps. We experimented with several combinations before settling on this one, which we found capable of achieving both high efficiency and good convergence.
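The truncated-normal initialization described above can be sketched with torch.nn.init.trunc_normal_. The layer shape, standard deviation, and cutoff bounds below are illustrative choices, not values from the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# A small stand-in layer; the paper does not publish its exact layer sizes.
layer = nn.Linear(12, 128)

# Truncated normal: zero mean, assumed std of 0.05, values cut off at +/- 2 std.
# This keeps the weights close to (but not exactly) zero for symmetry breaking.
nn.init.trunc_normal_(layer.weight, mean=0.0, std=0.05, a=-0.1, b=0.1)
nn.init.zeros_(layer.bias)

w = layer.weight.detach()
print(float(w.mean()), float(w.abs().max()))   # mean near 0, max within the cutoff
```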
4.2 Results and Discussion

The results of the experiments revealed that, among the models trained on the two classes ("نعم", "لا"), "yes" and "no" in English, RNN has a validation accuracy of 93.33%, a test accuracy of 84.00%, and a test loss of 0.49; LSTM has a validation accuracy of 100%, a test accuracy of 94.17%, and a test loss of 0.183. On the test set, the LSTM model was able to increase the prediction accuracy by 10.17%. Unlike the RNN model, however, it does not save time when training the model (Table 3).

Table 3. Accuracy and loss of the LSTM and RNN models.

Model | Train accuracy | Train loss | Test accuracy | Test loss | Time taken
LSTM  | 1.0000         | 0.0013     | 0.9417        | 0.3539    | 1 min
RNN   | 0.9333         | 0.1608     | 0.8400        | 0.4900    | 0.5 min
The LSTM and RNN models were trained with the Adam optimizer, a dropout rate of 0.4, the categorical cross-entropy loss, 50 epochs, a batch size of 64, and an initial LR of 0.001. The accuracy and loss curves for the two models are displayed in Figs. 4 and 5.
Fig. 4. Loss and accuracy history graphs utilizing LSTM models during training and testing.
Fig. 5. Loss and accuracy history graphs utilizing RNN models during training and testing.
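A minimal training loop under the stated settings (Adam, categorical cross-entropy, batch size 64, LR 0.001) might look as follows. The data tensors and the linear placeholder model are hypothetical, and only two epochs are run instead of 50 to keep the sketch short.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the real MFCC features and yes/no labels.
X = torch.randn(600, 98, 12)
y = torch.randint(0, 2, (600,))
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"      # GPU if present
model = nn.Sequential(nn.Flatten(), nn.Linear(98 * 12, 2)).to(device)  # placeholder model
opt = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()                              # categorical cross-entropy

for epoch in range(2):                                       # the paper trains for 50 epochs
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()
print("last batch loss:", float(loss))
```

Replacing the placeholder model with the LSTM or RNN classifier of Sect. 3.4 would reproduce the comparison setup.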
4.3 Confusion Matrix (CM)

Lastly, we display the test-set CM for the LSTM model in Fig. 6 and for the RNN model in Fig. 7. Each column of the CM reflects the set of samples for which the Arabic word for "NO" or "YES" was predicted. Compared with the RNN model, the LSTM model makes far fewer errors.
Fig. 6. Test CM of LSTM model
Fig. 7. Test CM of RNN model
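A confusion matrix such as those in Figs. 6 and 7 can be computed directly from the true and predicted labels; this is a generic sketch, not the code used in the paper.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Illustrative labels: 0 = "no", 1 = "yes".
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
print(confusion_matrix(y_true, y_pred))   # diagonal = correct predictions
```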
5 Conclusion

In this project, we used the recurrent neural networks LSTM and RNN as our models. The study's findings demonstrate that LSTM outperforms the RNN model, improving prediction accuracy by roughly 10%: the LSTM model obtained 94.17% accuracy, while the RNN model obtained 84.00%. Based on this experiment, we conclude that the LSTM model outperforms the RNN model in terms of SCR. Furthermore, using GPUs allowed us to significantly accelerate the training process for our models. In future work, we will use a variety of data augmentation techniques, including noise injection, pitch alteration, and velocity transformation, to generate more speech samples and enhance speech recognition performance.
References

1. Fetzer, J.H.: What is Artificial Intelligence?, pp. 3–27. Springer, Netherlands (1990)
2. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
3. Sherstinsky, A.: Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D 404, 132306 (2020)
4. Owens, J.D., Houston, M., Luebke, D., Green, S., Stone, J.E., Phillips, J.C.: GPU computing. Proc. IEEE 96(5), 879–899 (2008)
5. Stevens, E., Antiga, L., Viehmann, T.: Deep Learning with PyTorch. Manning Publications (2020)
6. Jakiela, P., Zhu, H., Zhang, X.: Exploration and Setup of Power Delivery System Attacks (2022)
7. Zheng, F., Zhang, G., Song, Z.: Comparison of different implementations of MFCC. J. Comput. Sci. Technol. 16, 582–589 (2001)
8. Wang, D., Lv, S., Wang, X., Lin, X.: Gated convolutional LSTM for speech commands recognition. Springer (2018)
9. Wu, J.: Introduction to convolutional neural networks. National Key Lab for Novel Software Technology, vol. 5, Issue 23, p. 495. Nanjing University, China (2017)
10. Sumon, S.A., Chowdhury, J., Debnath, S., Mohammed, N., Momen, S.: Bangla short speech commands recognition using convolutional neural networks. IEEE (2018)
11. Tang, R., et al.: Temporal early exiting for streaming speech commands recognition. IEEE (2022)
12. Mahmoudi, O., Bouami, M.F.: Arabic speech emotion recognition using deep neural network. In: Motahhir, S., Bossoufi, B. (eds.) Digital Technologies and Applications: Proceedings of ICDTA'23, Fez, Morocco, vol. 2, pp. 124–133. Springer Nature Switzerland, Cham (2023). https://doi.org/10.1007/978-3-031-29860-8_13
13. Hung, P.D., Giang, T.M., Nam, L.H., Duong, P.M.: Vietnamese speech command recognition using. Int. J. Adv. Comput. Sci. Appl. https://doi.org/10.14569/IJACSA.2019.0100728
14. Ghandoura, A., Hjabo, F., Al Dakkak, O.: Building and benchmarking an Arabic Speech Commands dataset for small-footprint keyword spotting. Eng. Appl. Artif. Intell. 102, 104267 (2021)
15. Ahmed, N., Natarajan, T., Rao, K.R.: Discrete cosine transform. IEEE Trans. Comput. 100(1), 90–93 (1974)
16. Shelke, R., Vanjale, S.: Deep named entity recognition in Hindi using neural networks. Revue d'Intelligence Artificielle 36(4) (2022)
17. Luo, L., Xiong, Y., Liu, Y., Sun, X.: Adaptive gradient methods with a dynamic bound of learning rate. arXiv preprint arXiv:1902.09843 (2019)
Author Index
A Abdellaoui, Yasmine 244 Abdelmajid, Badri 381 Abdoun, Otman 389, 447 Aboutaleb, Soumia 244 Abtoy, A. 400 Afroz, Umme Sanzida 21 Ahlaqqach, Mustapha 427 Ahmad, Fadzil 12 Ahmadi, Abdeslam 161 Alaoui, Safae Belamfedel 161 Amhraoui, Elmehdi 150 Amour, Amar 299 Amri, Samir 172 Arbaoui, Abdelaziz 299 Aschwege, Frerk Müller-von 1 Aya, Kechchour 334
B Baali, Imran 287 Bani, Rkia 172 Barbara, Idriss 272 Belattar, Sara 447 Bellarbi, Larbi 356 Belmarouf, Chaimae 211 Benabdella, Abla Chaouni 134 Benabdellah, Naoual Chaouni 108 Benghabrit, Asmaa 134 Benmoussa, Youssef 244 Berrada, Mohamed 161 Boll-Westermann, Susanne 1 Bouami, Mouncef Filali 462 Boudnaya, Jaouad 334 Bouhaddou, Imane 134
C Chafik, Hassan 161 Cheracher, Aymane 244 Chtioui, Ahmed 134
D Darif, Anouar 123 Daud, Kamarulazhar 12
Douifir, Kenza 108 Douzi, Youssef 257
E Ech-chorfi, Salah Edine 411 El Alami, Aymane 244 El Alami, Yasser El Madani 287 El Bahri, N. 400 El Fahfouhi, Hanae 347 El Hassani, Ibtissam 312 El Makroum, Reda 244 El Oualidi, Moulay Ali 427 El Fallah, Saad 34 Elghazi, Khalid 86 El-Hassani, Fatima Zahrae 366 Elhassani, Ibtissam 46 Elkourchi, Anas 427 Elouazzani, Hind 46 Elouerghi, Achraf 356 En-Naimani, Zakariae 201, 347
F Fallah, Saad El 59 Fihri, Abdelkader Fassi 98, 211, 312
G Ghanou, Youssef 366 Gomgnimbou, Ouèdan Jhonn 334 Guennoun, Zouhair 172
H Habbat, Nassera 323 Hadda, Mohammed 272 Haddouch, Khalid 201, 347, 366 Haimoudi, El Khatir 447 Haimoudi, ELkhatir 389 Hajji, Tarik 98, 211, 257, 312 Hajjoubi, Chaima El 98 Hamid, Md. Abdul 21 Hassan, Ramchoun 231 Hassani, Ibtissam El 98, 211 Hmaidi, Abdellah El 438
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 T. Masrour et al. (Eds.): A2IA 2023, LNNS 772, pp. 471–472, 2023. https://doi.org/10.1007/978-3-031-43520-1
Hmaidi, Safae 287 Huynh, Nina Aslhey 334
I Ibrahim, Anas 73 Ibrahim, Anwar Hassan 12 Idris, Mohaiyedin 12, 73 Ilham, El Mourabit 381 Ismail, Ahmad Puad 12, 73 Itahriouan, Z. 400
J Jakni, Taha 312 Jopony, Sophie Thelma Marcus 1 Joudar, Nour-Eddine 201
K Kassimi, Salma 98 Khaldoun, Asmae 244 Khalfaoui, Aicha 381 Khan, Md. Rafidul Hasan 21 Kharbach, Jaouad 34, 59 Khomsi, Zakaryae 356
L Lachgar, Maryem 123 Lahmamsi, Marouane 312 Larhlimi, Ibtissam 123 Lazaar, Mohamed 287 Lehmam, Oumayma 59 Limami, Houssame 244
M Mahmoudi, Omayma 462 Mahmud, Mat Nizam 73 Manssouri, Imad 244, 438 Manssouri, Taj Eddine 438 Marjana, Nusrat Jahan 21 Marrakchi, Saad Amrani 244 Masrour, Tawfik 46, 86, 150, 183, 257, 272 Maszuhn, Matthias 1 Mkhida, Abdelhak 334 Mohamad, Fadzil Ahmad 73 Mohammed, Hadda 231 Mostafa, Bakhouya 231 Mouncif, Hicham 123
N Nasreddin, Dalal 244 Nouri, Hicham 323
O Osman, Muhammad Khusairi 12, 73 Ouakki, Yassine 299 Ouazzani Jamil, Mohammed 34, 59 Ouchitachen, Hicham 123
P Pinski, Jan 12
Q Quasdane, Mohamed 183
R Rabiain, Azmir Hasnur 73 Rahman, Md. Sadekur 21 Ramchoun, Hassan 86, 183 Rbihou, Safae 201 Rezzouk, Abdellah 34, 59
S Saad, Zuraidi 73 Sabbahi, Inass 244 Sabri, Karim 323 Sabri, Nur Nadhirah Naqilah Ahmad 73 Sahbi, Hassane 438 Soh, Zainal Hisham Che 12 Sulaiman, Siti Noraini 73
T Taib, Chaymae 389 Tawfik, Masrour 231 Tumpa, Eteka Sultana 21
Y Yahaya, Saiful Zaimy 12 Yassir, Ait Omar 334
Z Zemmouri, Elmoukhtar 221, 411 Zenkouar, Lahbib 172