Lecture Notes in Networks and Systems
637
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Janusz Kacprzyk · Mostafa Ezziyyani · Valentina Emilia Balas Editors
International Conference on Advanced Intelligent Systems for Sustainable Development Volume 1 - Advanced Intelligent Systems on Artificial Intelligence, Software, and Data Science
Editors Janusz Kacprzyk Polish Academy of Sciences Systems Research Institute Warsaw, Poland
Mostafa Ezziyyani Abdelmalek Essaâdi University Tangier, Morocco
Valentina Emilia Balas Department of Automatics and Applied Software Aurel Vlaicu University of Arad Arad, Romania
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-3-031-26383-5 ISBN 978-3-031-26384-2 (eBook) https://doi.org/10.1007/978-3-031-26384-2 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword
Within the framework of the international initiative for sustainable development of innovations and scientific research, in order to keep pace with the digital transformation brought by the fourth industrial revolution and to encourage development projects known to the world, ENSAM-Rabat of Mohammed V University, in cooperation with ICESCO, organized the fourth edition of the International Conference on Advanced Smart Systems for Sustainable Development and their applications in various fields, through five specialized seminars held from May 22 to 28, 2022.
The fourth edition of the International Conference on Advanced Smart Systems for Sustainable Development was a great success, under the high patronage of His Majesty Mohammed VI, King of Morocco, and with the participation of scientists and experts from more than 36 countries around the world. The conference, in its fourth edition, also resulted in a set of agreements and partnerships signed between the various participating parties, thus contributing to achieving the goals set by the conference regarding the investment of smart systems for sustainable development in the sectors of education, health, environment, agriculture, industry, energy, economy and security.
In view of the importance of the conference as a high-level annual forum, in consideration of the scientific status that the conference enjoys nationally, continentally and internationally, and based on the experience gained and accumulated through the previous editions, we look forward to the success of the next edition at all organizational and scientific levels, like its predecessors, and to hosting a distinguished audience and weighty personalities from all participating countries, in order to move forward with cooperation in priority areas of common interest such as health, agriculture, energy and industry.
Preface
Science, technology and innovation have long been recognized as one of the main drivers behind productivity increases and a key long-term lever for economic growth and prosperity. In the context of the International Conference on Advanced Intelligent Systems for Sustainable Development (AI2SD), this role is even more central. AI2SD indeed features strongly in the Sustainable Development Goals in different fields, as well as being a cross-cutting forum for achieving several sectoral goals and targets: agriculture, energy, health, environment, industry, education, economy and security. The ambition of AI2SD to become a global forerunner of sustainable development includes, in particular, integrating new technologies, artificial intelligence and smart systems into overarching and sectoral strategies of research and development, and it emphasizes that the solutions discussed by experts are important drivers for research and development.
AI2SD is an interdisciplinary international conference that invites academics, independent scholars and researchers from around the world to meet, exchange the latest ideas and discuss technological issues concerning all fields of the social sciences and humanities for sustainable development. Owing to its focus on innovative ideas and developments, AI2SD provides the ideal opportunity to bring together professors, researchers and higher education students of different disciplines, to discuss new issues, to discover the most recent scientific developments and research, and to hold panel discussions on Advanced Technologies and Intelligent Systems for Sustainable Development applied to education, agriculture, energy, health, environment, industry, economy and security.
Organization
Chairs

General Chairs
Mostafa Ezziyyani, Abdelmalek Essaadi University, FST – Tangier, Morocco
Janusz Kacprzyk, Polish Academy of Sciences, Poland
Valentina Emilia Balas, Aurel Vlaicu University of Arad, Romania
Co-chairs
Khalid El Bikri, ENSAM Rabat, Morocco
Wajih Rhalem, ENSAM Rabat, Morocco
Loubna Cherrat, ENCG of Tangier, Morocco
Omar Halli, Advisor to the Director General of ICESCO
Honorary Presidents
Salim M. Almalik, Director General (DG) of the Islamic World Educational, Scientific and Cultural Organization (ICESCO)
Abdellatif Miraoui, Minister of Higher Education, Scientific Research and Professional Training of Morocco
Younes Sekkouri, Minister of Economic Inclusion, Small Business, Employment and Skills
Ghita Mezzour, Minister Delegate to the Head of Government in Charge of Digital Transition and Administration Reform
Honorary Guests
Thomas Druyen, Director and Founder of the Institute for Future Psychology and Future Management, Sigmund Freud University
Jochen Werner, Medical Director and CEO, Medicine University of Essen, Germany
Ibrahim Adam Ahmed El-Dukheri, Director General of the Arab Organization for Agricultural Development
Stéphane Monney Mouandjo, Director General of CAFRAD
Jamila El Alami, Director of the CNRST Rabat, Morocco
Mostapha Bousmina, President of the EuroMed University of Fez, Fez, Morocco
Chakib Nejjari, President of the Mohammed VI University of Health Sciences Casablanca, Morocco
Noureddine Mouaddib, President of International University of Rabat, Rabat, Morocco
Azzedine Elmidaoui, President of Ibn Tofail University, Kenitra, Morocco
Lahcen Belyamani, President of the Moroccan Society of Emergency Medicine SAMU Rabat, Morocco
Karim Amor, President of Moroccan Entrepreneurs and High Potentials of the World-CGEM
Hicham El Abbadi, Business Sales Manager, Afrique Francophone EPSON
Ilham Berrada, Director of ENSIAS Rabat, Morocco
Mostafa Stito, Director of the ENSA of Abdelmalek Essaadi University, Tetouan, Morocco
Mohamed Addou, Dean of FST Tangier, Morocco
Ahmed Maghni, Director of ENCG Tangier, Morocco
Keynote Speakers
Chakib Nejjari, President of the Mohammed VI University of Health Sciences Casablanca, Morocco
Anas Doukkali, Former Minister of Health, Morocco
Thomas Druyen, Director and Founder of the Institute for Future Psychology and Future Management, Sigmund Freud University
Jochen Werner, Medical Director and CEO, Medicine University of Essen, Germany
Abdelhamid Errachid El Salhi, Full Professor, Exceptional Class, University Claude Bernard, Lyon, France
Oussama Barakat, University of Franche-Comté, Besançon, France
Fatima Zahra Alaoui, Dean of the Faculty of Medicine of Laâyoune, Morocco
Issame Outaleb, CEO and Founder, PharmaTrace, Munich, Germany
Rachid Yazami, Scientist, Engineer and Inventor, Morocco
Tarkan Gürbüz, Middle East Technical University (METU), Ankara, Turkey
Plamen Kiradjiev, German Edge Cloud (GEC), Friedhelm Loh Group, Germany
Abdel Labbi, Head of Data & AI Platforms Research, IBM Distinguished Engineer, IBM Research – Europe
Mostafa Ezziyyani, FST – Tangier, Morocco
Ghizlane Bouskri, Senior Data Scientist at Volkswagen Group, Germany
Levent Trabzon, Mechanical Engineering, Istanbul Technical University, Turkey
Marius M. Balas, Aurel Vlaicu University of Arad
Afef Bohli, Assistant Professor at the Higher Institute of Computer Science and Cofounder of Digi Smart Solutions
Ahmed Allam (President), World Association for Sustainable Development, Senior Policy Fellow, Queen Mary University of London, UK
Valentina Emilia Balas, Aurel Vlaicu University of Arad, Romania
Faissal Sehbaoui, CEO of AgriEDGE, Attached to the Mohammed VI Polytechnic University
Jaime Lloret, Department of Communications, Polytechnic University of Valencia, Spain
Hanan Melkaoui Issa Mouhamed, Yarmouk University, Irbid, Jordan
Hossana Twinomurinzi, Head, Centre for Applied Data Science, University of Johannesburg, South Africa
Abdelhafid Debbarh, Chief of Staff/Advisor to the President, UIR
Hatim Rhalem, EPSON Sales Manager, Morocco
Faeiz Gargouri (Vice President), University of Sfax, Tunisia
Adil Boushib, Regional Manager Microsoft, Germany
Nasser Kettani, Entrepreneur, ExO Coach, Digital Transformation Expert, Exponential Thinker, Certified DPO, Accessibility Expert
Kaoutar El Menzhi, Head of Digital Learning Center UM5R, Morocco
Khairiah Mohd-Yusof (President), Johor Bahru, Johor, Malaysia
Nadja Bauer, Dortmund, Germany
Badr Ikken, General Director of IRESEN, Rabat, Morocco
Amin Bennouna, Cadi Ayyad University, Marrakech, Morocco
Mohamed Essaaidi, ENSIAS, Mohammed V University, Rabat, Morocco
Hamid Ouadia, ENSAM, Mohammed V University, Rabat, Morocco
Khalid Zinedine, Faculty of Sciences, Mohammed V University, Rabat, Morocco
Brahim Benaji, ENSAM, Mohammed V University, Rabat, Morocco
Youssef Taher, Center of Guidance and Planning of Education, Morocco
Tarik Chafik, FST, Abdelmalek Essaadi University, Tangier, Morocco
Abdoulkader Ibrahim Idriss, Dean of Faculty of Engineering – University of Djibouti, Djibouti
Loubna Cherrat, Abdelmalek Essaadi University, Morocco
Laila Ben Allal, FST Abdelmalek Essaadi University, Morocco
Najib Al Idrissi, Mohammed VI University of Health Sciences, General Secretary of the Moroccan Society of Digital Health, Morocco
Hassan Ghazal, President of the Moroccan Association of Telemedicine and E-Health, Morocco
Muhammad Sharif, Director and Founder of Advisor/Science and Technology at ICESCO
Mounir Lougmani, General Secretary of the Association of German Moroccan Friends-DMF
El Hassan Abdelwahid, Cadi Ayyad University, Marrakech
Mohamed Zeriab Es-Sadek, ENSAM, Mohammed V University in Rabat
Mustapha Mahdaoui, FST, Abdelmalek Essaadi University, Morocco
M'Hamed Ait Kbir, Abdelmalek Essaadi University, Morocco
Mohammed Ahachad, Abdelmalek Essaadi University, Morocco
Course Leaders
Adil Boushib, Regional Manager Microsoft, Germany
Ghizlane Bouskri, Senior Data Scientist at Volkswagen Group, Germany
Nadja Bauer, Dortmund, Germany
Hassan Moussif, Deutsche Telekom expert, Germany; General Director and Founder of M-tech
Abdelmounaim Fares, Co-Founder and Chief Executive Officer, Guard Technology, Germany
Imad Hamoumi, Senior Data Scientist Engineer, Germany
Ghizlane Sbai, Product Owner, Technical Solution Owner at Pro7Sat1
Scientific Committee Christian Axiak, Malta Bougdira Abdeslam, Morocco Samar Kassim, Egypt Vasso Koufi, Greece Alberto Lazzero, France Charafeddine Ait Zaouiat, Morocco Mohammed Merzouki, Morocco Pedro Mauri, Spain Sandra Sendra, Spain Lorena Parra, Spain Oscar Romero, Spain Kayhan Ghafoor, China Jaime Lloret Mauri, Spain Yue Gao, UK Faiez Gargouri, Tunis Mohamed Turki, Tunis Abdelkader Adla, Algeria Souad Taleb Zouggar, Algeria El-Hami Khalil, Morocco Bakhta Nachet, Algeria Danda B. Rawat, USA Tayeb Lemlouma, France Mohcine Bennani Mechita, Morocco Tayeb Sadiki, Morocco Mhamed El Merzguioui, Morocco Abdelwahed Al Hassan, Morocco Mohamed Azzouazi, Morocco Mohammed Boulmalf, Morocco Abdellah Azmani, Morocco Kamal Labbassi, Morocco Jamal El Kafi, Morocco Dahmouni Abdellatif, Morocco Meriyem Chergui, Morocco El Hassan Abdelwahed, Morocco Mohamed Chabbi, Morocco Mohamed_Riduan Abid, Morocco Jbilou Mohammed, Morocco Salima Bourougaa-Tria, Algeria Zakaria Bendaoud, Algeria Noureddine En-Nahnahi, Morocco Mohammed Bahaj, Morocco Feddoul Khoukhi, Morocco Ahlem Hamdache, Morocco
Mohammed Reda Britel, Morocco Houda El Ayadi, Morocco Youness Tabii, Morocco Mohamed El Brak, Morocco Abbou Ahmed, Morocco Elbacha Abdelhadi, Morocco Regragui Anissa, Morocco Samir Ahid, Morocco Anissa Regragui, Morocco Frederic Lievens, Belgium Emile Chimusa, South Africa Abdelbadeeh Salem, Egypt Mamadou Wele, Mali Cheikh Loukobar, Senegal Najeeb Al Shorbaji, Jordan Sergio Bella, Italy Siri Benayad, Morocco Mourad Tahajanan, Morocco Es-Sadek M. Zeriab, Morocco Wajih Rhalem, Morocco Nassim Kharmoum, Morocco Azrar Lahcen, Morocco Loubna Cherrat, Morocco Soumia El Hani, Morocco Essadki Ahmed, Morocco Hachem El Yousfi Alaoui, Morocco Jbari Atman, Morocco Ouadi Hamid, Morocco Tmiri Amal, Morocco Malika Zazi, Morocco Mohammed El Mahi, Morocco Jamal El Mhamdi, Morocco El Qadi Abderrahim, Morocco Bah Abdellah, Morocco Jalid Abdelilah, Morocco Feddi Mustapha, Morocco Lotfi Mostafa, Morocco Larbi Bellarbi, Morocco Mohamed Bennani, Morocco Ahlem Hamdache, Morocco Mohammed Haqiq, Morocco Abdeljabbar Cherkaoui, Morocco Rafik Bouaziz, Tunis Hanae El Kalkha, Morocco Hamid Harroud, Morocco
Joel Rodrigues, Portugal Ridda Laaouar, Algeria Mustapha El Jarroudi, Morocco Abdelouahid Lyhyaoui, Morocco Nasser Tamou, Morocco Bauer Nadja, Germany Peter Tonellato, USA Keith Crandall, USA Stacy Pirro, USA Tatiana Tatusova, USA Yooseph Shibu, USA Yunkap Kwankam, Switzerland Frank Lievens, Belgium Kazar Okba, Algeria Omar Akourri, Morocco Pascal Lorenz, France Puerto Molina, Spain Herminia Maria, Spain Driss Sarsri, Morocco Muhannad Quwaider, India Mohamed El Harzli, Morocco Wafae Baida, Morocco Mohammed Ezziyyani, Morocco Xindong Wu, China Sanae Khali Issa, Morocco Monir Azmani, Morocco El Metoui Mustapha, Morocco Mustapha Zbakh, Morocco Hajar Mousannif, Morocco Mohammad Essaaidi, Morocco Amal Maurady, Morocco Ben Allal Laila, Morocco Ouardouz Mustapha, Morocco Mustapha El Metoui Morocco Said Ouatik El Alaoui, Morocco Lamiche Chaabane, Algeria Hakim El Boustani, Morocco Azeddine Wahbi, Morocco Nfaoui El Habib, Morocco Aouni Abdessamad, Morocco Ammari Mohammed, Morocco El Afia Abdelatif, Morocco Noureddine En-Nahnahi, Morocco Zakaria Bendaoud, Algeria Boukour Mustapha, Morocco
El Maimouni Anas, Morocco Ziani Ahmed, Morocco Karim El Aarim, Morocco Imane Allali, Morocco Mounia Abik, Morocco Barrijal Said, Morocco Mohammed V., Rabat, Morocco Franccesco Sicurello, Italy Bouchra Chaouni, Morocco Charoute Hicham, Morocco Zakaria Bendaoud, Algeria Ahachad Mohammed, Morocco Abdessadek Aaroud, Morocco Mohammed Said Riffi, Morocco Abderrahim Abenihssane, Morocco Abdelmajid El Moutaouakkil, Morocco Silkan, Morocco Khalid El Asnaoui, France Salwa Belaqziz, Morocco Khalid Zine-Dine, Morocco Ahlame Begdouri, Morocco Mohamed Ouzzif, Morocco Essaid Elbachari, Morocco Mahmoud Nassar, Morocco Khalid Amechnoue, Morocco Hassan Samadi, Morocco Mohammed Yahyaoui, Morocco Hassan Badir, Morocco Ezzine Abdelhak, Morocco Mohammed Ghailan, Morocco Kaoutar Elhari, Morocco Mohammed El M’rabet, Morocco El Khatir Haimoudi, Morocco Mounia Ajdour, Morocco Lazaar Saiida, Morocco Mehdaoui Mustapha, Morocco Zoubir El Felsoufi, Morocco Khalil El Hami, Morocco Yousef Farhaoui, Morocco Mohammed Ahmed Moammed Ail, Sudan Abdelaaziz El Hibaoui, Morocco Othma Chakkor, Morocco Abdelali Astito, Morocco Mohamed Amine Boudia, Algeria Mebarka Yahlali, Algeria
Hasna Bouazza, Algeria Zakaria Bendaoud, Algeria Naila Fares, Spain Brahim Aksasse, Morocco Mustapha Maatouk, Morocco Abdel Ghani Laamyem, Morocco Abdessamad Bernoussi, Morocco
Acknowledgement
This book is the result of many efforts combined with subtle and strong contributions, more particularly from the General Chair of AI2SD'2022, Professor Mostafa EZZIYYANI of Abdelmalek Essaadi University, the distinguished Honorary Chair, Academician Janusz KACPRZYK of the Polish Academy of Sciences, and Co-Chair Professor Valentina EMILIA BALAS, Aurel Vlaicu University of Arad, Romania.
The scientific contribution published throughout this book could never have been so revolutionary without the perpetual help and limitless collaboration of several actors, foremost of which is the high patronage of His Majesty King Mohammed VI, who, in addition to his undeniable support of all the production and scientific inspiration processes, provided us with all the logistical and technical means for even the smallest needs that arose during the organization of the event and the publication of this book.
Deep acknowledgment is addressed to the ENSAM school, embodied by its director Pr. Khalid BIKRI, for his prestigious inputs, and for the valuable contributions provided by Pr. Wajih RHALEM and by all the faculty members and their engineering students, which prepared fertile ground for presentation and exchange, resulting in the rigorous articles published in this volume.
Great thanks to the Islamic World Educational, Scientific and Cultural Organization (ICESCO), represented by its Director General Dr. Salim M. Al MALIK, for its collaboration, its support, and the distinguished welcome given to the researchers and guests of the AI2SD'2022 conference. Appreciation is also addressed to Dr. Omar HALLI, advisor to the Director General of ICESCO, for his excellent role in coordinating the organization of the AI2SD'2022 edition at ICESCO.
The dedication inevitably concerns the organizing committee managed by General Chair Professor Mostafa EZZIYYANI, the VIP coordinator Professor Mohammed Rida ECH-CHARRAT, the scientific committee coordinator Professor Loubna CHERRAT, the Ph.D. student organization committee coordinator Mr. Abderrahim EL YOUSSEFI, and all professors and doctoral students, for their constant efforts in the organization, in maintaining the relationship with researchers and collaborators, and also in the publication process.
Contents
Using Blockchain in University Management Systems - State of Art . . . . . . . . . . 1
Marhane Khaoula, Taif Fatima, Namir abdelwahed, and Azzouazi Mohamed

Graph Neural Networks to Improve Knowledge Graph Embedding: A Survey . . . . . . . . . . 15
Ismail Chetoui, Essaid El Bachari, Mohammed El Adnani, and Abdelwahed El Hassan

Tifinagh Handwritten Character Recognition Using Machine Learning Algorithms . . . . . . . . . . 26
Rajaa Sliman and Ahmed Azouaoui

Maintenance Prediction Based on Long Short-Term Memory Algorithm . . . . . . . . . . 35
Mouna Tarik and Khalid Jebari

Towards an Approach for Studying the Evolution of Learners' Learning in E-Learning . . . . . . . . . . 46
Ilham Dhaiouir, Loubna Cherrat, Mostafa Ezziyyani, and Mohamed Khaldi

Chatbots Technology and its Challenges: An Overview . . . . . . . . . . 56
Hajar Zerouani, Abdelhay Haqiq, and Bouchaib Bounabat

Machine Learning, Deep Neural Network and Natural Language Processing Based Recommendation System . . . . . . . . . . 65
Manal Loukili and Fayçal Messaoudi

Artificial Intelligence for Fake News . . . . . . . . . . 77
Imane Ennejjai, Anass Ariss, Nassim Kharmoum, Wajih Rhalem, Soumia Ziti, and Mostafa Ezziyyani

Traffic Congestion and Road Anomalies Detection Using CCTVs Images Processing, Challenges and Opportunities . . . . . . . . . . 92
Ayoub Es-Swidi, Soufiane Ardchir, Yassine Elghoumari, Abderrahmane Daif, and Mohamed Azouazi
Text-Based Sentiment Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Adil Baqach and Amal Battou
Smart Tourism Destinations as Complex Adaptive Systems: A Theoretical Framework of Resilience and Sustainability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Naila Belhaj Soulami and Hassan Azdimousa Machine Learning Algorithms for Automotive Software Defect Prediction . . . . . 136 Ramz Tsouli Fathi, Maroi Tsouli Fathi, Mohammed Ammari, and Laïla Ben Allal Agile User Stories’ Driven Method: A Novel Users Stories Meta-model in the MDA Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 Nassim Kharmoum, Sara Retal, Karim El Bouchti, Wajih Rhalem, Mohamed Zeriab Es-Sadek, Soumia Ziti, and Mostafa Ezziyyani AI-Based Adaptive Learning - State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 Aymane Ezzaim, Aziz Dahbi, Noureddine Assad, and Abdelfatteh Haidine A New Predictive Analytics Model to Assess the Employability of Academic Careers, Based on Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 168 Abderrahim El Yessefi, Soumaya Elmamoune, Loubna Cherrat, Sarah Khrouch, Mohammed Rida Ech-Charrat, and Mostafa Ezziyyani New Approach for Anomaly Detection and Prevention . . . . . . . . . . . . . . . . . . . . . . 181 Chliah Hanane and Battou Amal FunLexia: An Intelligent Game for Children with Dyslexia to Learn Arabic . . . . 189 Fatimaezzahra Benmarrakchi Artificial Neural Networks Cryptanalysis of Merkle-Hellman Knapsack Cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196 Hicham Tahiri Alaoui, Ahmed Azouaoui, and Jamal El Kafi Using Machine Learning Algorithms to Increase the Supplier Selection Process Efficiency in Supply Chain 4.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 Houria Abouloifa and Mohamed Bahaj A New Approach to Intelligent-Oriented Analysis and Design of Urban Traffic Control: Case of a Traffic Light . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217 Abdelouafi Ikidid, Mohamed El Ghazouani, Yassine El Khanboubi, Charafeddine Ait Zaouiat, Aziz El Fazziki, and Mohamed Sadgal Spatio-Temporal Crime Forecasting: Approaches, Datasets, and Comparative Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231 EL Gougi Badreddine, Hassouni Larbi, Anoun Houda, and Ridouani Mohammed
Data Migration from Relational to NoSQL Database: Review and Comparative Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252 Chaimae Saadouni, Karim El Bouchti, Oussama Mohamed Reda, and Soumia Ziti Recommendation System: Technical Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265 Hanae Mgarbi, Mohamed Yassin Chkouri, and Abderrahim Tahiri The Appropriation of the Agile Approach in Public Sector: Modeling the Achievement of Good Governance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272 Mouna Hajjaj, Houda Lechheb, and Hicham Ouakil The Contribution of Deep Learning Models: Application of LSTM to Predict the Moroccan GDP Growth Using Drought Indexes . . . . . . . . . . . . . . . 284 Ismail Ouaadi and Aomar Ibourk Natural Language Processing and Motivation for Language Learning . . . . . . . . . 295 Moulay Abdellah Kassimi and Abdessalam Essayad Generating Artworks Using One Class SVM with RBF Kernel . . . . . . . . . . . . . . . 308 Mohamed El Boujnouni Multiobjective Evolutionary Algorithms for Engineering Design Problems . . . . . 318 Youssef Amamou and Khalid Jebari Designing Hybrid Storage Architectures with RDBMS and NoSQL Systems: A Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344 Lamya Oukhouya, Anass El haddadi, Brahim Er-raha, Hiba Asri, and Asma Sbai Analysis of the Pedagogical Effectiveness of Teacher Qualification Cycle in Morocco: A Machine Learning Model Approach . . . . . . . . . . . . . . . . . . . . . . . . 344 Aomar Ibourk, Khadija Hnini, and Ismail Ouaadi Smart Education – A Case Study on a Simulation for Climate Change Awareness and Engagement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354 Mohamed Amine Marhraoui Towards an E-commerce Personalized Recommendation System with KNN Classification Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364 Doae Mensouri, Abdellah Azmani, and Monir Azmani Convolutional Long Short-Term Memory Network Model for Dynamic Texture Classification: A Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383 Manal Benzyane, Imad Zeroual, Mourade Azrour, and Said Agoujil
Towards an Accident Severity Prediction System with Logistic Regression . . . . . 396 Houssam Mensouri, Abdellah Azmani, and Monir Azmani FUZZY C-MEANS Based Extended Isolation Forest for Anomaly Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411 Mniai Ayoub, Jebari Khalid, and Pawel Karczmarek Fashion Image Classification Using Convolutional Neural Network-VGG16 and eXtreme Gradient Boosting Classifier . . . . . . . . . . . . . . . . . 419 Toufik Datsi, Khalid Aznag, and Ahmed El Oirrak MentorBot: A Traceability-Based Recommendation Chatbot for Moodle . . . . . . 432 Kamal Souali, Othmane Rahmaoui, and Mohammed Ouzzif Regularization in CNN: A Mathematical Study for L1 , L2 and Dropout Regularizers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442 Chrifi Alaoui Mehdi, Joudar Nour-Eddine, and Ettaouil Mohamed Shipment Consolidation Using K-means and a Combined DBSCAN-KNN Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 Ouafae El Bouhadi, Abdellah Azmani, and Monir Azmani A New Approach to Protect Data In-Use at Document Oriented Databases Springer Guidelines for Authors of Proceedings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466 Abdelilah Belhaj, Karim El Bouchti, Soumia Ziti, and Chaimae A Dual Carriageway Smart Street Lighting Controller Based on Multi-variate Traffic Forecast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476 Fouad Agramelal, Mohamed Sadik, and Essaid Sabir Blockchain-Based Self Sovereign Identity Systems: High-Level Processing and a Challenges-Based Comparative Analysis . . . . . . . . . . . . . . . . . . . 489 Bahya Nassr Eddine, Aafaf Ouaddah, and Abdellatif Mezrioui Impact of Machine Learning on the Improvement of Accounting Information Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501 Meryem Ayad, Said El Mezouari, and Nassim Kharmoum NLP Methods’ Information Extraction for Textual Data: An Analytical Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515 Bouchaib Benkassioui, Nassim Kharmoum, Moulay Youssef Hadi, and Mostafa Ezziyyani
Handwriting Recognition in Historical Manuscripts Using a Deep Learning Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528 Hassan El Bahi Artificial Intelligence for a Sustainable Finance: A Bibliometric Analysis . . . . . . 536 Rania Elouidani and Ahmed Outouzzalt Geoparsing Recognition and Extraction from Amazigh Corpus Using the NooJ Complex Annotation Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552 Bouchra Ladouzi, Azeddine Rhazi, and Ali Boulaalam Agent-Based Merchandise Management and Real-Time Decision Support Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560 Mostafa Tagmouti and Aziz Mabrouk Selecting the Best Moroccan Tourist Destination Using the Fuzzy Analytic Hierarchy Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567 Smahane Jebraoui, Bezza Hafidi, and Mohamed Nemiche Improving Model Performance of the Prediction of Online Shopping Using Oversampling and Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578 Sara Ahsain and M’hamed Ait Kbir Combining Descriptors for Efficient Retrieval in Databases Images . . . . . . . . . . . 587 Essakhi Aziz, Hassan Silkan, and Abdelkader Boulezhar Towards an Educational Planning Information System in Big Data Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599 Mustapha Skittou, Mohamed Merrouchi, and Taoufiq Gadi CNN-Based Face Emotion Detection and Mouse Movement Analysis to Detect Student’s Engagement Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604 Allinjawi Arwa, Altuwairqi Khawlah, Kammoun Jarraya Salma, Abuzinadah Nihal, and Alkhuraiji Samar Data Cleaning in Machine Learning: Improving Real Life Decisions and Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 627 Sanae Borrohou, Rachida Fissoune, Hassan Badir, and Mohamed Tabaa Blockchain-Based Cloud Computing: Model-Driven Engineering Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639 Youness Bentayeb, Hassan Badir, and Noureddine En-Nahnahi
Student Attention Estimation Based on Body Gesture . . . . . . . . . . . . . . . . . . . . . . . 651 Tarik Hachad, Abdelalim Sadiq, Fadoua Ghanimi, Lamiae Hachad, and Ahmed Laguidi Cat Swarm Optimization Algorithm for DNA Fragment Assembly Problem . . . . 662 Asmae Yassine, Morad Bouzidi, and Mohammed Essaid Riffi DSGE and ABM, Towards a “True” Representation of the Real World? . . . . . . . 668 Khawla Dahani and Rajae Aboulaich Predictive Hiring System: Information Technology Consultants Soft Skills . . . . . 680 Asmaa Lamjid, Karim El Bouchti, Soumia Ziti, Reda Oussama Mohamed, Hicham Labrim, Anouar Riadsolh, and Mourad Belkacemi Automated Quality Inspection Using Computer Vision: A Review . . . . . . . . . . . . 686 Ghizlane Belkhedar and Abdelouahid Lyhyaoui A Comparative Study of Adaptative Learning Algorithms for Students’ Performance Prediction: Application in a Moroccan University Computer Science Course . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698 Mariam Arkiza, Soukaina Hakkal, Ilham Oumaira, and Ayoub Ait Lahcen Pedestrian Orientation Estimation Using Deep Learning . . . . . . . . . . . . . . . . . . . . 710 O. Boutaibi, M. Belhiah, C. Talbi, M. Rahmouni, and S. Ziti Artificial Intelligence Application in Drought Assessment, Monitoring and Forecasting Using Available Remote Sensed Data . . . . . . . . . . . . . . . . . . . . . . 718 Mohammed Chriatt, Rabie Fath Allah, Asmaa Fakih Lanjri, Mohammed Ammari, and Laïla Ben Allal CSR Communication Through Social Networks: The Case of Committed Brand-Banks in Morocco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729 Ait Mhamed Hind and Zaghli Mariam Content-Based Image Retrieval Using Octree Quantization Algorithm . . . . . . . . 741 Hassan Zekkouri, Brahim Aksasse, and Mohammed Ouanan Release Planning Process Model in Agile Global Software Development . . . . . . 756 Hajar Lamsellak, Amal Khalil, Mohammed Ghaouth Belkasmi, Oussama Lamsellak, and Mohammed Saber Developing a New Indicator Model to Trade Gold Market . . . . . . . . . . . . . . . . . . . 763 Oumaina Nadi, Karim Elbouchti, Oussama Mohamed Reda, Chaimae Ahout, and Soumia Ziti
A New Model Indicator to Trade Foreign Exchange Market . . . . . . . . . . . . . . . . . 770 Chaimae Ahouat, Karim El Bouchti, Oussama Mohamed Reda, Oumaima Nadi, and Soumia Ziti Improving Arabic to English Machine Translation . . . . . . . . . . . . . . . . . . . . . . . . . . 778 Nouhaila Bensalah, Habib Ayad, Abdellah Adib, and Abdelhamid Ibn El Farouk Artificial Neural Network with Learning Analytics for Student Performance Prediction in Online Learning Environment . . . . . . . . . . . . . . . . . . . . 788 Aimad Qazdar, Sara Qassimi, Meriem Hafidi, Oussama Hasidi, and El Hassan Abdelwahed Attentive Neural Seq2Seq for Arabic Question Generation . . . . . . . . . . . . . . . . . . 802 Said Lafkiar, Alami Hamza, Mohamed Zouitni, Nabil Burmani, Hassan Badir, and Noureddine En Nahnahi Microservice-Specific Language, a Step to the Low-Code Platforms . . . . . . . . . . 817 Mehdi Ait Said, Abdellah Ezzati, and Sara Arezki Teaching Soft Skills Online, What Are the Most Appropriate Pedagogical Paradigms? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 829 Najem Kamal, Ziti Soumia, and Zaoui Seghroucheni Yassine A Comparative Review of Tweets Automatic Sarcasm Detection in Arabic and English . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 841 Soukaina Mihi, Brahim Ait Ben Ali, and Nabil Laachfoubi Mobile Payment as a Lever for Financial Inclusion . . . . . . . . . . . . . . . . . . . . . . . . . 850 Hanane Azirar, Bouchra Benyacoub, and Samir Aguenaou New Approach to Interconnect Hybride Blockchains . . . . . . . . . . . . . . . . . . . . . . . 862 Hajji Mohammed Amine, Ziti Soumia, Nassim Kharmoum, Labrim Hicham, and Ezziyani Mostafa A Variable Neighborhood Search (VNS) Heuristic Algorithm Based Classifier for Credit Scoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 868 Mohamed Barhdadi, Badreddine Benyacoub, and Mohamed Ouzineb Application of Machine Learning Techniques to Enhance Decision Making Lifecycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 878 Salma Trabelsi, Sahbi Zahaf, and Mohamed Ben Aouicha
A Smart Interactive Decision Support System for Real-Time Adaptation in the Mobility Strategy for Optimization of the Employee’s Transportation . . . . 888 Soumaya El Mamoune, Abderrahim El Yessefi, Loubna Charrat, and Mostafa Ezziyyani A Hesitant Fuzzy Holdout Method for Models’ Selection in Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 903 Youssef Lamrani Alaoui, Mohamed Tkiouat, Adil El Fakir, and Yahya Hanine Interpretable Credit Scoring Model via Rule Ensemble . . . . . . . . . . . . . . . . . . . . . . 903 Siham Akil, Sara Sekkate, and Abdellah Adib A New Distributed Architecture Based on Reinforcement Learning for Parameter Estimation in Image Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 912 Issam Qaffou Smart Sourcing Framework for Public Procurement Announcements Using Machine Learning Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 921 Amina Oussaleh Taoufik and Abdellah Azmani An MCDM-Based Methodology for Influential Nodes Detection in a Social Network. Facebook as a Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 933 Issam Qaffou and Khaoula Ait Rai A Predictive Approach Based on Feature Selection to Improve Email Marketing Campaign Success Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 940 Kenza Bayoude, Soufiane Ardchir, and Mohamed Azzouazi Diagnosis and Adjustment for Sustainable Tourism . . . . . . . . . . . . . . . . . . . . . . . . . 949 Samir Haloui, Mustapha Ait Rami, and Jamal Chao Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 967
Using Blockchain in University Management Systems - State of Art Marhane Khaoula(B) , Taif Fatima, Namir abdelwahed, and Azzouazi Mohamed Faculty of Science Ben M’sik University, Casablanca, Morocco [email protected]
Abstract. Universities are public establishments endowed with legal personality and administrative and financial autonomy. They have undergone considerable change over the years and face many obstacles in diversifying their initial and continuing training offers, increasing student capacity and improving scientific research, while also being under pressure to improve their performance. In the face of change and technological evolution, university management becomes difficult and is exposed to many risks. The blockchain is an emerging technology that serves as an immutable ledger and allows transactions to take place in a decentralized manner. It has become a publicly available infrastructure for building decentralized applications and achieving interoperability. Blockchain-based applications ensure transparency and trust between all parties involved in an interaction, which also makes blockchain-based services of interest to the education sector. This state of the art attempts to provide references and links that help readers better understand and reflect on this new technology, and to contribute to starting the indispensable debate about how universities could implement it, and in what terms.
Keywords: University Management System · Risk · Blockchain · Educational · Tenders
1 Introduction
Information and communication technologies for development are an essential component of the globalization that marks our era. All human activities are affected. Moroccan universities have undergone a lot of change over the years. Each university is characterized by its own structure, governance, and number of stakeholders (students, research professors, administrative and technical staff). These changes have generated a very large mass of data that risks being lost, as well as many difficulties in improving the quality of services and benefits. Universities are also under pressure to improve their performance [1], and their ranking depends closely on the transmission, production and valuation of knowledge, whence the notions of competence center, excellence, competitiveness, attractiveness, innovation and Key Performance Indicators (KPI). Decision-makers have become aware that data are the real wealth and that exploiting them provides an essential competitive gain in the race for competitiveness and decision-making. On the other hand,
the transmission of knowledge, and even of competence, should not depend on passing it from one person to another but rather on a lasting network of expertise and know-how. In our research we are interested in two key factors that, if well managed and piloted, help universities better achieve their objectives: the management of tenders and budget management. The aim of this paper is therefore to determine how blockchain technology can help universities increase their capabilities through the quality of their projects, especially in terms of project implementation (tenders). This paper is organized as follows: Sect. 2 presents an overview of the general risks caused by the change, Sect. 5 reviews current studies on applications of blockchain technology in university institutions, Sect. 6 covers general tendering processes and their risks, and Sect. 7 concludes with a summary of this paper.
2 General Risks
The main objectives of the change over the years were:
• Support for the autonomy of universities within the framework of a renewed contractualization between them and the State;
• Improving efficiency in decision-making and management performance (Table 1).
Unfortunately, these changes have not been accompanied by a robust management system using good practices and good strategies to achieve better results. This puts universities in front of risks that they have to face.
2.1 Risks Relating to the Areas of Continuing Education and Research
• Absence of a training evaluation system;
• Weak internal audit of continuing education;
• Modest budget devoted to scientific research;
• etc.
2.2 Risks Relating to the Area of Governance
• Inconsistency of the university development program with that of university establishments;
• Strategies and action plans not yet evaluated;
• Lack of an overall assessment of the action plans: assessment with the aim of correcting the dysfunctions identified and providing the necessary solutions;
• The universities' strategic orientations are not declared into operational objectives and lack measurable indicators of results and objectives. This does not provide a long-term vision or even guarantee anticipation of the means necessary for the accomplishment of their missions. This makes any objective evaluation of projects difficult;
• The deadlines set for carrying out the programmed actions not respected;
• Absence of internal control: no internal control system, absence of risk mapping;
• Absence of a procedures manual and job descriptions to clarify tasks and responsibilities.
Table 1. Chronological change of the various public actions carried out in favor of higher education

Year: Before 2003
Action: Each establishment: • Has its own grant; • Free in budget management; • Off-budget account
Disadvantage: • Data not centralized without a global view; • Centralization of decision-making power

Year: 2003
Action: The educational reform: • Delegation of account signing; • Elimination of off-budget accounts and creation of paragraph 50; • University councils deliberate on all questions concerning the management of the university
Disadvantage: • The President is the only person empowered to order; • Unprepared situation which led to a delay in budget preparation and automatically delayed commitments

Year: From 2009 to 2012
Action: The contractualization between the State and the Universities within the framework of the Emergency Program (PU)
Disadvantage: Emergence of Management Control practices within the University

Year: From 2013 to 2016
Action: Strengthening the leadership of universities in order to improve the effectiveness and efficiency of management both academically and scientifically as well as in terms of administrative and financial management
Disadvantage: Adoption of tools that did not meet the real need
2.3 Material Resources Risk
• Very weak means of communication (partial information, absent communication tool, etc.).
2.4 Human Resources Risk
• No departure management procedure (abandonment of post, resignation, retirement);
• No staff evaluation technique;
• Weak content staff training;
• No information system managing the HR component.
2.5 IT Risk
• Lack of a reliable information system meeting the needs of universities;
• IT master plan to update.
2.6 Risk Relating to Financial Resources
• Absence of detailed situations for the execution of the employment programs relating to research and training;
• No financial management information system;
• Deficiencies in financial control.
Some of the above problems can be reduced by:
• Setting up the stakeholders in charge of steering, monitoring and evaluating the internal control system (risk committee, internal control and audit committee);
• Adopting an organization chart validated by the ministry;
• Establishing a risk map;
• Establishing an e-HRM project, with the goal of improving employees' skills through e-learning, building a collaborative intranet around the organization's professions, and employing groupware and workflow models for knowledge sharing.
3 The Blockchain
The blockchain is an emerging technology [2] that provides significant opportunities to disrupt traditional products and services because of its distributed and decentralized nature. Features such as the permanence of the blockchain record and the ability to run smart contracts make blockchain-based products or services significantly different from previous internet-based commercial developments and of particular interest to the education sector. The applications of blockchain in university system management could disrupt the sector. In 2017 the University of Melbourne began using blockchain to grant digital certificates, allowing students to exchange authenticated versions of their qualifications with employers and other third parties in a tamper-resistant network. Different ways of implementing blockchain technology are developing all the time in the higher education market, and many high-profile ventures have gained considerable media coverage in the last few years, fueling further interest in blockchain-based education applications.
4 Overview of Blockchain Technology
Blockchain is the name of a technology that allows decentralized and distributed records of digital transactions to be kept. The first implementation took place in 2009 in the context of Bitcoin as a digital currency, and although blockchain technology is no longer limited to Bitcoin, it is the example we will use as a paradigm of blockchain. In Bitcoin, transactions occur between anonymous users (their identity does not appear anywhere) using public key cryptography; that is to say, each user has a private key that only he or she knows, and a public key, which is shared with other users. All transactions are communicated to all nodes on the network. Nodes check transactions and group them in blocks. Each block is identified by a hash, a cryptographically unique value calculated on the contents of the block, and includes a reference to the hash of the previous block, so that blocks are linked. This chain of blocks is thus a record of transactions, a public accounting book (ledger), shared by all the nodes in the network [3]. In this way, all nodes can verify that the keys used are correct and that the bitcoins transferred come from a previous transaction and have not already been spent. However, a transaction is only considered confirmed when it is part of a block added to the chain. To add a block, it is necessary to mine it, in other words, to calculate its hash, which requires solving a unique mathematical problem of great difficulty that consumes very considerable computing resources, especially since the difficulty of solving the hash is adjusted periodically to adapt to the processing capabilities of the network: as the power of the connected computers increases, the difficulty of the problem grows. As a result, modifying the content of a block would modify its hash, so the link to the next block would fail and break the chain, which, combined with the difficulty of repairing it and with the fact that the rest of the nodes have a copy of the original chain, makes the information contained in the blocks unalterable.
There are many uses of blockchain in university management systems. Potential applications of blockchain in university management systems include:
– Building a transparent and immutable system for hassle-free cash flow for students using tokens.
– Creating a linked identity and e-wallet for platform users to use tokens for transactions and collaborations.
– A smart contract engine to implement security and specific rules as per education institutions' regulations.
– Verification/validation of the authenticity of academic documentation.
– Storage of permanent records.
– Learner identity verification and information security.
– Student ownership of lifelong learning credentials.
– Intellectual property protection for educational content.
– An education industry cooperative system between educational institutions and utilizing companies (Fig. 1).
Fig. 1. Application of the blockchain in the university [4]
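To make the hash-linking described in Sect. 4 concrete, the following minimal sketch (illustrative only: plain Python, no networking, no proof-of-work difficulty, and field names invented for the example) shows how each block stores the hash of its predecessor, so that altering any earlier block breaks every later link:

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash the block's contents (header plus transactions) with SHA-256."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, transactions: list) -> None:
    """Append a new block that references the hash of the previous block."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev_hash,
                  "transactions": transactions})


def chain_is_valid(chain: list) -> bool:
    """Recompute every link; any tampered block breaks the chain."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))


chain: list = []
append_block(chain, [{"from": "A", "to": "B", "amount": 2}])
append_block(chain, [{"from": "B", "to": "C", "amount": 1}])
print(chain_is_valid(chain))                  # True
chain[0]["transactions"][0]["amount"] = 99    # tamper with history
print(chain_is_valid(chain))                  # False
```

In a real network such as Bitcoin, adding a block additionally requires its hash to satisfy the periodically adjusted difficulty target described above, which is what makes rewriting history computationally impractical.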
5 Other Reviews on the Applications of Blockchain Technology in University Institutions
5.1 The Proof of Educational Transcript System (PETS) Model
[5] presented a model of the proof of educational transcript system (PETS): the model connects educational institutions into a single educational blockchain network of credits earned by learners/students. They will be able to choose courses and instructors from a pool of institutions, collecting the credits over a lifetime and 'cashing them in' when they are degree-ready and the issuing institution agrees. PETS is the educational blockchain tool proposed to ensure this. Figure 2 shows the various actors in the proposed proof of educational transcript system model.
5.2 Strengthen Online Education
Blockchain innovation can provide an ideal solution to the main issues of online instruction, namely validity and security. The blockchain can generate non-modifiable learning records for online instruction without requiring third-party involvement for monitoring, and can ensure the fair recognition of course credits. The innovation can be applied to students' records and authenticated certification. In this context, [6] proposed an online learning system that uses the blockchain to record and evaluate students' learning performance automatically to achieve openness, transparency, and credibility.
Fig. 2. PETS project
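The PETS idea of accumulating credits across institutions and recognizing them only when the issuing institution agrees can be read as a smart-contract rule. The sketch below is a hypothetical illustration (plain Python standing in for an on-chain contract language, with invented names and thresholds), not the PETS implementation itself:

```python
from dataclasses import dataclass, field


@dataclass
class CreditContract:
    """Toy contract: credits count toward a degree only if the issuer is
    an approved institution and the learner has accumulated enough of them."""
    required_credits: int
    approved_issuers: set = field(default_factory=set)
    ledger: list = field(default_factory=list)   # entries: (student, issuer, credits)

    def record_credit(self, student: str, issuer: str, credits: int) -> None:
        self.ledger.append((student, issuer, credits))

    def recognize_degree(self, student: str) -> bool:
        earned = sum(c for s, i, c in self.ledger
                     if s == student and i in self.approved_issuers)
        return earned >= self.required_credits


contract = CreditContract(required_credits=180,
                          approved_issuers={"University A", "University B"})
contract.record_credit("student-42", "University A", 120)
contract.record_credit("student-42", "University B", 60)
print(contract.recognize_degree("student-42"))   # True
```

In an actual deployment, logic of this kind would run as a contract on the shared ledger, so that every participating institution applies the same recognition rule to the same immutable credit records.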
5.3 Student Data Privacy and Consent
The student's digital agreement can be executed without relying on a third-party legal document. The proposed framework is built from blocks of repeated authorization, which allow the educational institution to grant access to data for any legal purpose after securing consent for the data access privilege using a smart contract. In this work, [7] used Hyperledger Fabric and Hyperledger Composer and proposed a framework of nested authorization, allowing a public school data administrator to grant authorization rights to a third-party after-school intervention program and its volunteers, who meet with parents and obtain consent for data access rights relying on smart contracts. These data consent signatures allow researchers to design targeted interventions meeting the needs of the community. Similar work in this field has been developed by Daniel Amo [8], who used blockchain technologies as a possible basis for a solution to prevent leaks or the misuse of data about students or their activity. The work of [9] aims to produce a secure smart contract-based system for the examination system of a large university with a high number of affiliated colleges. The system should be free from different kinds of vulnerabilities and have no single point of failure. In addition to cryptographic transaction processing of the smart contract between the receiver and the sender, the application should be well protected against hacking attacks and fraudulent activities. Ashis Kumar shows that
smart contracts are good for enhancing security as well as operational efficiency in diverse application areas.
5.4 Learning Outcomes and Meta-diploma
Blockchain-based technology for learning outcomes, which is based on the graduation requirement index of the education ministry together with professional certification, can use automated evaluation software as a tool. The course learning outcomes, the course name, course requirements and the weight of the course can be included in the block. Some researchers have highlighted the effectiveness and importance of using blockchain in the evaluation of students' learning outcomes. [10] discussed the possible contributions and impact of combining data analytics, artificial intelligence, and blockchain on university education. What he envisioned is an idealized system model, outsourcing part of the courses and assessments. The blockchain is fed by learning analytics that evaluate students' behavior with data analytics technology and artificial intelligence, and the entire system is powered by the smart contracts of the blockchain. Chen et al. [11] discussed the possible applications of blockchain in the field of education, especially in certificate management, students' performance evaluation and incentive mechanisms. They analyzed the problems of users and the benefits that blockchain-based systems can bring from the perspective of different roles. In terms of specific blockchain-based continuous assessment monitoring systems, [12] proposed a simple blockchain system to record university grades; they implemented the system on Ethereum and discussed its effectiveness and significance. [13] proposed a blockchain-based word learning model which shares learning resources among nodes in a blockchain network, rewards students with digital currency and records study outcomes.
5.5 Operational Skill Competition
By letting students simulate operations and games on a digital education operation system, education institutions can examine learning achievement and the quality of education. In the digital education field, blockchain innovation can be used to improve the competition mode, improving efficiency and avoiding the problem of opaque and distorted messages, and it can provide unchangeable digital certification of academic achievement. [14] studied an educational competition mode based on blockchain technology: they designed the blockchain's application mode and framework, analyzed evaluation criteria and algorithms, designed an operational skill evaluation model, developed an operational skill competition evaluation system based on an e-business sandbox and experimented with it.
[15] pointed out that the current competitions involve too much assessment content: they require a long preparation process that also interferes with the study of regular courses, which keeps participation enthusiasm low. [16] pointed out that the current competition mode lacks supporting incentives and over-emphasizes competition skills. [17] pointed out that dishonest behavior occurs in current online scoring systems, so unfair scoring becomes an important and challenging issue.
5.6 Management of Student Credits
Credit transfer is another persistent problem for institutions, frequently putting students at a disadvantage when they realize, for example, that they have to retake courses to meet the requirements of a new institution. Students also face difficulties transitioning to another institution of higher education while maintaining and confirming the courses completed at the previous one. The issue is much more severe when a student decides to move to an institution in another country, where language and different procedures are likely to pose additional obstacles. Record-management requirements also differ, which can make it difficult to share documents between institutions. Credit transfers generally rely on institutions agreeing to accept each other's credits subject to certain conditions, but students are often not aware of such agreements. With a blockchain approach, such agreements may be written as smart contracts, so that credits are transferred automatically once the contract conditions are fulfilled. [18] proposed a smart-contract approach for the Central Sector Scheme of Scholarship for college and university students availing the scholarship. The model includes four individual entities: Education Boards, Students, Colleges, and Banks (Fig. 3).
5.7 Educational Certificate
Most of the available educational certificate administration systems cannot ensure the security and reliability of student information. Using blockchain technology may overcome these trust issues thanks to its transparency and its connectivity with the stakeholders, ensuring the authentication of certificates. The National University of La Plata (UNLP) has started developing a framework for blockchain-based verification of academic achievement [11, 19]. The same approach was also adopted by the Argentinian college CESYT [20]. Both solutions use blockchain technology and cryptography (digital signatures, time stamps, etc.) to issue diplomas for students. [21] presented a model in which the procedure for issuing a digital certificate is as follows. First, the electronic file of the paper certificate is generated and stored with other related data in the database, and the hash value of the electronic file is computed. The hash value is then stored in a block of the chain system. The system creates a related QR code and an inquiry string code to affix to the paper certificate, which allow
Fig. 3. Interaction between education board, students, colleges, and banks
the demand unit (the party requesting verification) to verify the authenticity of the paper certificate through mobile-phone scanning or website inquiries. Thanks to the immutability of the blockchain, the system not only enhances the credibility of various paper-based certificates, but also reduces the risk of losing them by keeping an electronic record (Fig. 4).
5.8 Student Capability Evaluation System
Using blockchain technology, students' academic performance and achievements in school, training, competitions, practice and other activities outside school can be analyzed to evaluate their capability, which helps both the students and the companies that will offer them employment opportunities. The design scheme of a blockchain-based student professional ability evaluation system can rely on an ability-analysis method based on a clustering algorithm. The presented system may also make it possible to create an ecosystem of student ability evaluation. [22] introduces an application of blockchain technology to the evaluation of students' professional ability: through the K-means clustering algorithm, students' academic performance and achievements in school, training, competitions, practice and other activities outside school are analyzed in order to evaluate their professional ability objectively and effectively, which provides reasonable advice for student employment.
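The issuing and verification flow described in Sect. 5.7 (compute the hash of the certificate file, anchor it in a block, attach an inquiry code that can be embedded in a QR code, and later re-hash the presented file to check it) can be illustrated with a short Python sketch. The block fields, the in-memory chain and the inquiry-code format below are assumptions made for illustration only; they do not reproduce the system of [21].

```python
import hashlib
import json
import time

def file_hash(path: str) -> str:
    """Compute the SHA-256 digest of the electronic certificate file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# A toy in-memory "chain": each block stores the certificate hash and
# links to the previous block through that block's hash (illustrative only).
chain = [{"index": 0, "prev": "0" * 64, "cert_hash": None, "ts": 0}]

def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def issue_certificate(path: str) -> str:
    """Store the certificate hash in a new block and return an inquiry code
    that could be embedded in the QR code printed on the paper certificate."""
    block = {
        "index": len(chain),
        "prev": block_hash(chain[-1]),
        "cert_hash": file_hash(path),
        "ts": time.time(),
    }
    chain.append(block)
    return f"{block['index']}:{block['cert_hash']}"

def verify_certificate(path: str, inquiry_code: str) -> bool:
    """Re-hash the presented file and compare it with the stored hash."""
    index, stored = inquiry_code.split(":")
    return chain[int(index)]["cert_hash"] == stored == file_hash(path)
```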
Fig. 4. Working process of the system
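The clustering step of the student capability evaluation discussed in Sect. 5.8 can be sketched with scikit-learn's K-means, assuming each student is described by a small feature vector (grades, training, competitions, practice). The feature columns, the toy values and the number of clusters are assumptions made for illustration; they are not taken from [22].

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical student records: [average grade, training hours,
# competition score, practice/internship score] -- illustrative values only.
students = np.array([
    [15.2, 40, 8.0, 12.0],
    [11.0, 10, 2.0,  6.0],
    [17.5, 60, 9.5, 15.0],
    [12.3, 20, 4.0,  7.5],
    [16.1, 55, 7.0, 14.0],
])

# Standardize the features so no single activity dominates the distance metric.
X = StandardScaler().fit_transform(students)

# Group students into capability profiles with K-means (k=3 is an assumption).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("capability cluster per student:", kmeans.labels_)
print("cluster centers (standardized):", kmeans.cluster_centers_)
```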
6 Tenders
Tenders, or Requests For Proposal (RFP), are public orders that require a prior expression of the university's needs, compliance with publication and competitive bidding obligations, and the choice of the most economically advantageous bid. These rules and obligations are implemented in accordance with the public procurement regulations of universities. The services requested by a tender fall into three categories: works, supply and service. Each tender includes three steps in its process:
• Step 1: Preparation;
• Step 2: Tender launch;
• Step 3: Signing the contract (Fig. 5).
We can define {project type, cost, time} as the variables of a tender. The tendering process is complex and must be executed on time; a small sketch of this lifecycle is given after Fig. 5.
Fig. 5. General tendering processes and risk
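As a rough illustration of the three tender steps and of the {project type, cost, time} variables mentioned above, the following sketch models a tender as a small state machine in Python. The class, its fields and the award rule (lowest price) are assumptions made for this example, not a description of an actual procurement system.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class TenderState(Enum):
    PREPARATION = auto()
    LAUNCHED = auto()
    CONTRACT_SIGNED = auto()

@dataclass
class Tender:
    project_type: str      # works, supply or service
    cost: float            # estimated budget
    time_days: int         # expected execution time
    state: TenderState = TenderState.PREPARATION
    bids: list = field(default_factory=list)

    def launch(self):
        # Step 2: publish the tender so suppliers can submit bids.
        self.state = TenderState.LAUNCHED

    def submit_bid(self, supplier: str, price: float):
        if self.state is TenderState.LAUNCHED:
            self.bids.append((supplier, price))

    def sign_contract(self):
        # Step 3: pick the most economically advantageous bid and sign.
        winner = min(self.bids, key=lambda b: b[1])
        self.state = TenderState.CONTRACT_SIGNED
        return winner

t = Tender(project_type="supply", cost=100_000.0, time_days=90)
t.launch()
t.submit_bid("Supplier A", 95_000.0)
t.submit_bid("Supplier B", 88_500.0)
print(t.sign_contract())   # -> ('Supplier B', 88500.0)
```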
7 Conclusion
Blockchain is a disruptive technology that, after a few years of implementation as the basis of digital currencies, is showing itself to be an open resource with possibilities in different fields. The interest in this technology lies in its ability to move from centralized data logging to a distributed system that guarantees that information cannot be altered and that privacy is maintained. Based on an analysis of the nature and technical characteristics of blockchain, this paper described the various application scenarios of the new "blockchain + higher education" mode, studied the problems existing in these applications, and put forward corresponding improvement ideas, so as to provide a reference for follow-up research on higher education theory and practice based on blockchain technology. For Moroccan universities, several questions remain open:
• Can a preventive system decrease the likelihood that a tender fails?
• Can a strategy that depends on the risk itself minimize the risks of the tendering process?
• Can a technology offer high transparency, tamper resistance, non-repudiation, and traceability?
How can blockchain technology ensure the security of the bidding process?
References 1. Lebaron, F.: Comment mesurer les “performances” des universités? Quelques réflexions sur la mise en place d’indicateurs à l’Université de Picardie, p. 14 (2008) 2. Ganne, E.: Can blockchain revolutionize international trade? World Trade Organization, Geneva (2018) 3. Dwyer, G.P.: The economics of Bitcoin and similar private digital currencies. J. Financ. Stab. 17, 81–91 (2015). https://doi.org/10.1016/j.jfs.2014.11.006 4. 20 Ways Blockchain Will Transform (OK, May Improve) Education.html 5. Int. J. Latest Technol. Eng. Manag. Appl. Sci. 6. Iyer, S.S.: Sustainable education management blockchain: a systematic literature review (2021). https://doi.org/10.13140/RG.2.2.10050.71363 7. Gilda, S.: Blockchain for student data privacy and consent. In: 2018 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, January 2018 8. Amo, D., Fonseca, D., Alier, M., García-Peñalvo, F.J., Casañ, M.J.: Personal data broker instead of blockchain for students’ data privacy assurance. In: Rocha, Á., Adeli, H., Reis, L.P., Costanzo, S. (eds.) WorldCIST’19 2019. AISC, vol. 932, pp. 371–380. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-16187-3_36 9. Samanta, A.K., Sarkar, B.B., Chaki, N.: A Blockchain-based smart contract towards developing secured university examination system. J. Data Inf. Manag. 3(4), 237–249 (2021). https:// doi.org/10.1007/s42488-021-00056-0 10. Williams, P.: Does competency-based education with blockchain signal a new mission for universities? J. High. Educ. Policy Manag. 41(1), 104–117 (2019). https://doi.org/10.1080/ 1360080X.2018.1520491 11. blockchain_academic_verification_use_case.pdf 12. Rooksby, J., Dimitrov, K.: Trustless education? A blockchain system for university grades. Ubiquit. J. Pervasive Media 6(1), 83–88 (2019). https://doi.org/10.1386/ubiq_00010_1 13. Zhong, J., Xie, H., Zou, D., Chui, D.K.W.: A blockchain model for word-learning systems. In: 2018 5th International Conference on Behavioral, Economic, and Socio-Cultural Computing (BESC), Kaohsiung, Taiwan, pp. 130–131, November 2018. https://doi.org/10.1109/BESC. 2018.8697299 14. Wu, W.: Analysis of professional construction based on vocational skills competition. Mod. Vocat. Educ. 8, 80–81 (2016) 15. Garg, A., A, S., Kumar, P., Madhukar, M., Loyola-González, O., Kumar, M.: Blockchainbased online education content ranking. Educ. Inf. Technol. 27, 4793–4815 (2021). https:// doi.org/10.1007/s10639-021-10797-5 16. Liu, Z., Cao, J., Wang, Y., Wang, B.: Teaching innovation under the concept of learning competition. Présenté à 2018 International Conference on Management, Economics, Education and Social Sciences (MEESS 2018), Shanghai, China (2018). https://doi.org/10.2991/meess18.2018.31 17. Yang, Y.-F., Feng, Q.-Y., (Lindsay) Sun, Y., Dai, Y.-F.: Dishonest behaviors in online rating systems: cyber competition, attack models, and attack generator. J. Comput. Sci. Technol. 24(5), 855–867 (2009). https://doi.org/10.1007/s11390-009-9277-5 18. Bedi, P., Gole, P., Dhiman, S., Gupta, N.: Smart contract based central sector scheme of scholarship for college and university students. Procedia Comput. Sci. 171, 790–799 (2020). https://doi.org/10.1016/j.procs.2020.04.086 19. Mikroyannidis, A., Third, A., Domingue, J., Bachler, M., Quick, K.A.: Blockchain applications in lifelong learning and the role of the semantic blockchain. In: Sharma, R.C., Yildirim, H., Kurubacak, G. (eds.) Advances in Educational Technologies and Instructional Design, pp. 16–41. IGI Global (2020). 
https://doi.org/10.4018/978-1-5225-9478-9.ch002
20. Amati, F.: First official career diplomas on Bitcoin’s blockchain (2015). https://blog.signat ura.co/first-official-career-diplomas-on-bitcoin-s-blockchain-69311acb544d 21. Cheng, J.-C., Lee, N.-Y., Chi, C., Chen, Y.-H.: Blockchain and smart contract for digital certificate. In: Meen, Prior, Lam (eds.) Proceedings of the 2018 IEEE International Conference on Applied System Innovation, ICASI 2018 (2018) 22. Zhao, W., Liu, K., Ma, K.: School of Information Science and Engineering, University of Jinan, Nanxinzhuang Road No. 336, Jinan, 250022, PR China (2018)
Graph Neural Networks to Improve Knowledge Graph Embedding: A Survey Ismail Chetoui(B) , Essaid El Bachari, Mohammed El Adnani, and Abdelwahed El Hassan Cadi Ayyad University, Marrakech, Morocco [email protected], {elbachari,md-eladnani, Abdelwahed}@uca.ac.ma
Abstract. Graphs have become a new form of knowledge representation adopted by many IT leaders thanks to the natural way in which they model knowledge. The recent wave of artificial intelligence has reached several domains, including graphs, and researchers have started looking for effective ways to apply AI algorithms to them. Graph embedding methods convert entities and relations into numerical vectors in an embedding space so that these values can be fed to AI algorithms. In this survey, we provide a state-of-the-art overview of knowledge graph embedding (KGE), including encoding models, auxiliary information, and the limitations of traditional KGE models. We also present Graph Neural Networks (GNN) as a new type of encoder used to improve the quality of KGE and as a solution to the issues facing traditional KGE models, which were not originally built for encoding graph structures. We further discuss the message-passing framework as the mechanism used in GNNs to encode graphs. In the last section, we present question answering as an example of KGE application that uses embedding models as a mechanism for reasoning over graphs in order to answer questions. Keywords: Knowledge Graph Embedding · Graph Neural Network · KG Completion · KG Reasoning · Question-Answering
1 Introduction
Knowledge graphs (KG) represent a big step forward in knowledge representation and have attracted the attention of researchers from different domains. This is driven by the structure of graphs, which facilitates modeling interactions between real-world entities. A KG is a data structure consisting of nodes, edges, and auxiliary information that offers a precise semantics for triples; it is used by many IT leaders such as Google, Amazon and Facebook. KGs aim to represent knowledge in a way that is readable by both humans and machines, which requires applying AI algorithms to many graph-related tasks. In order to present the relationship between graphs and AI in this survey, we first give an overview of KG embedding (also known as knowledge representation learning) and why it is useful for solving many issues such as data governance, fraud detection, knowledge
management, search, chatbots, recommendation, as well as intelligent systems across different organisational units. In the second section we discuss Graph Neural Networks (GNN) as a new type of encoder used to improve embedding performance: we start with the objectives behind using GNNs and the challenges they face, then present some solutions proposed for these challenges and the structure of GNNs, explaining how they work, in particular the message-passing framework as a useful tool to propagate information over graphs. Lastly, we briefly present knowledge graph reasoning and how it can be a promising tool to answer simple and complex questions based on the embedding models discussed in the previous sections.
2 A Brief Review of Knowledge Graph Embedding
A graph is a data structure used to represent things and the relations between them. It is made of two sets: the set of nodes, also called vertices, and the set of edges, also called arcs. Each edge connects a pair of nodes and indicates some kind of relation between them. A relation can be undirected, representing a symmetric relation between nodes; for example, if a graph models classmate relations between students, the edges are undirected. Relations can also be directed when they represent asymmetric relations; for example, if a graph models phone-call relations between people, the edges are directed, in the same way that directed edges can indicate an administrative hierarchy. Depending on the directionality of its edges, a graph is therefore directed or undirected. Graphs can also be either homogeneous or heterogeneous. In a homogeneous graph, all the nodes represent instances of the same type and all the edges represent relations of the same type; for example, a graph where all nodes represent the type Person and the relations represent connections, such as a social network. In contrast, in a heterogeneous graph the nodes and relations can be of different types; for example, a graph modeling different knowledge types in a smart-city database will have citizen, police, road, and healthcare nodes connected via several edge types such as born_In, is_A, kind_Of. Thanks to this flexibility, graphs have become a universal structure to model data interactions: they emphasize the links between data points, enabling the extension and inference of new knowledge from existing knowledge graphs.
2.1 Knowledge Graphs
Knowledge graphs (KG) represent a collection of real-world entities and the relational facts between pairs of entities. They allow us to encode data or knowledge into a structure that is interpretable by humans and amenable to automated analysis and inference. KGs have several applications, such as question answering, information retrieval, and recommender systems (Fig. 1). Researchers started representing data in graph format because this structure helps organize data in a way that is meaningful for both humans and computers. Anything can act as a node, for example a person, a city or a cat, and an edge connects two nodes and captures the relationship between them. KGs are basically built on top of existing databases to link all the data together at a readable scale, combining structured, semi-structured and unstructured
Fig. 1. An example of a knowledge graph used in an education context.
data. The downside of using KGs is that they require a large amount of memory, since they contain a very large number of entities; for example, the Google KG contains more than 18 billion facts and 570M entities. Because of this size, it was necessary to look for an intelligent way to handle such an amount of data, in other words for methods to feed graphs to AI algorithms so as to execute downstream tasks. In this context, recent research on knowledge graphs focuses on knowledge representation learning (also known as knowledge graph embedding), the result of specific representation learning models applied to KGs. The goal of KGE models is to embed the KG components, entities and relations, into a continuous low-dimensional vector space and to model the plausibility of each fact in that space. KG embedding can be viewed as a dimensionality reduction process, useful for scalability purposes: the dimension of the learned vectors is strictly lower than the number of nodes in the original graph. Knowledge graph embeddings are different from Graph Neural Networks (GNNs): KG embedding models are in general shallow, linear models, and should be distinguished from GNNs, which are neural networks that take relational structures as inputs. In general, the embedding is obtained in three steps: 1) Representing entities and relations: entities are represented as vectors in a continuous vector space, and relations as operators in that space, which can be characterized by vectors, matrices, or tensors depending on the model used. 2) Defining a scoring function: for each fact, an energy function is defined to measure its plausibility, with the corresponding entity and relation representations as variables; plausible triples are assumed to have low or high energies depending on the model. 3) Learning the latent representations: the entity and relation representations are obtained by minimizing a ranking loss function (Fig. 2).
Fig. 2. Knowledge Graph Embedding Process
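To make the three steps concrete, the following sketch follows the translation-based idea of TransE [1]: entities and relations are vectors, the energy of a triple (h, r, t) is the squared distance between h + r and t, and the representations are learned with a margin ranking loss against corrupted triples. The toy triples, the embedding dimension and the training loop are simplifying assumptions made for illustration; real models add normalization, mini-batching and many more facts.

```python
import numpy as np

rng = np.random.default_rng(0)
entities = ["student", "course", "university", "diploma"]
relations = ["enrolled_in", "offered_by", "awards"]
triples = [("student", "enrolled_in", "course"),
           ("course", "offered_by", "university"),
           ("university", "awards", "diploma")]

dim, margin, lr = 16, 1.0, 0.05
E = {e: rng.normal(0, 0.1, dim) for e in entities}   # entity embeddings
R = {r: rng.normal(0, 0.1, dim) for r in relations}  # relation embeddings

def score(h, r, t):
    # TransE energy: a small distance between h + r and t means a plausible fact.
    return np.sum((E[h] + R[r] - E[t]) ** 2)

for _ in range(200):
    for (h, r, t) in triples:
        t_neg = rng.choice([e for e in entities if e != t])  # corrupted tail
        loss = margin + score(h, r, t) - score(h, r, t_neg)
        if loss > 0:
            # Nudge the positive triple closer and the corrupted one farther apart.
            grad_pos = 2 * (E[h] + R[r] - E[t])
            grad_neg = 2 * (E[h] + R[r] - E[t_neg])
            E[h] -= lr * (grad_pos - grad_neg)
            R[r] -= lr * (grad_pos - grad_neg)
            E[t] += lr * grad_pos
            E[t_neg] -= lr * grad_neg

# After training, an observed triple should score lower (better) than a corrupted one.
print(round(score("student", "enrolled_in", "course"), 3),
      round(score("student", "enrolled_in", "diploma"), 3))
```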
2.2 Encoders
The goal of an encoder is to translate the features of a data point into a low-dimensional representation; conversely, the decoder takes this representation as input and tries to reconstruct the original graph. Encoding approaches can be divided into three well-known families. The first, translation-based, encodes the interactions between entities and relations by translating head entities in the embedding space so that they lie close to tail entities; many methods follow this approach, such as TransE [1], TransH [2], TransR [3], TransD [4], TransA [5], KG2E [6] and TransG [7]. The second family is based on factorization and aims to obtain low-rank representations; a widely used matrix factorization model is RESCAL [8], which reduces the modeling of the KG structure to a tensor factorization operation, and other models follow the same approach, such as DistMult [9], ComplEx [10], RotatE [11], SimplE [12] and QuatE [13]. The third family is based on neural networks and generally feeds entities and relations into deep neural networks to compute a semantic matching score that outputs the probability of the fact; SME [14], SLM [15], NAM [16] and R-GCN [17] are models of this kind. There are dozens of other models that cannot be classified under any of the previous approaches.
2.3 Auxiliary Information
The previous embedding methods rely only on triple facts, while supplementary textual descriptions, category types, relational paths, and visual information about entities and relations have not been fully employed. The key idea of adding auxiliary information is to take full advantage of additional semantic information to obtain a more precise semantics of the facts. On the one hand, models based on textual descriptions aim to ensure that each entity has a description that gives it more meaning; for example, DKRL [18] learns representations directly from entity descriptions using a convolutional encoder, and SSP [19] computes the plausibility of a triple by projecting the loss function onto a hyperplane that represents the semantic relevance between entities. On the other hand, models based on category types leverage them to represent entities in the KG with types; SSE [20] uses auxiliary
information on entity categories and enforces a semantically smooth embedding space: if two entities belong to the same semantic category, they should lie close to each other in the embedding space. Other models add further types of auxiliary information, such as visual information; IKRL [21] proposes a model where knowledge representations are learned from both triple facts and images.
2.4 Limitations of "Shallow Encoders"
The techniques mentioned above give us a powerful transformation of graph components into a vector space, which has let them achieve great success, but these models have shortcomings. On the one hand, they are extremely expensive in terms of parameters, because the number of parameters grows with the number of nodes, as every node has its own unique embedding; on the other hand, they cannot generate embeddings for nodes that were not seen during training. Indeed, the encoders of these models that map nodes to embeddings are simply embedding lookups that train a unique embedding for each node in the graph. The issues facing these models can be summarized in three points: a) There is no parameter sharing between nodes in the encoder, because this type of encoder directly optimizes a unique embedding vector for each node. This absence of parameter sharing is inefficient in two ways: sharing parameters between nodes can improve learning performance and also act as a form of regularization, and without it the number of parameters of shallow embedding methods is necessarily huge, which makes them hard to handle. b) A graph contains rich feature information that could be informative in the encoding process, but these methods do not take advantage of node features in the encoder. c) These methods can only generate embeddings for nodes that were present during the training phase; generating embeddings for unseen nodes is not possible. As a result, shallow embedding methods cannot be used in inductive applications that require generalizing to nodes unseen during training, which is a problem for massive graphs that cannot be fully stored in memory. These shallow encoders can be replaced with more complex encoders that depend more generally on the structure and attributes of the graph; we present below the most popular paradigm for defining such encoders, graph neural networks, which can overcome these limitations.
3 Graph Neural Networks
KGE models encode the interactions between entities and relations using models that were not originally built for encoding graph structures. For that reason, a novel family of neural architectures has been proposed to address this limitation: Graph Neural Networks (GNNs) have now become the key architecture for learning latent representations of graph-structured data. GNNs enable modeling dependencies
between nodes in a graph better than previous models. A GNN aims to capture the local graph structure: central nodes with similar neighborhoods should have similar embeddings, regardless of the distance between the nodes in the original graph. The goal of these neural-network-based methods is to learn what we call a structural embedding: the representation of each node is updated using the local structure of the graph as well as any feature information available in the graph. The GNN process can be described in two layers. In the input layer we initialize the representation of the nodes of the graph, which becomes the input of the GNN layer: we assign a feature representation to each node and edge. A basic method to represent nodes is to use one-hot vectors, but this type of feature representation does not actually convey any information about the node itself. These initial features assigned to the nodes will then be updated by aggregating the features of neighboring nodes and by training a Multi-Layer Perceptron (Fig. 3). In the GNN layer we encode the nodes and relations of the graph structure and use this encoded data to update the initial representations of nodes and edges. This goal is achieved by applying what is called the message-passing framework: the algorithm lets each node update its features with the features of its neighbors, which are transferred to the target node as messages through the edges. As a result, the new representation of a node encodes the local structure of the graph. The most popular classes of GNNs are Graph Convolutional Networks (GCNs), Graph Attention Networks (GATs), and Graph Recurrent Networks (GRNs). Current deep learning algorithms are designed for simple sequences such as text or simple grids such as images, but graphs have arbitrary size and a complex, irregular topological structure, which means the absence of the spatial locality of grids and no fixed node ordering or reference point. For example, convolutional neural networks (CNNs) are designed only for regularly structured inputs such as images; because of the arbitrary size of graphs, it is very difficult to apply a CNN to them. We cannot define the notion of a sliding window in the context of a graph, because the same window may cover three nodes in one place and more or fewer in another, depending on the graph. There is also the issue of unfixed node ordering: if we first take the nodes {1, 2, 3, 4, 5} and then take {3, 4, 2, 5, 1}, the inputs of the matrix fed to the network change, and consequently the results change too. Graphs do not impose a node ordering, so we want to obtain the same result regardless of how the nodes are ordered. The key idea of GCNs is that every node takes and aggregates information from its neighbors and combines this information to create a new message; what makes this different from a classical neural network is that every node defines its own neural network architecture based on the local structure around it (Fig. 4).
Fig. 3. Aggregate information from the red node's neighbors
Fig. 4. Computational Graph of red node based on its neighbors
The objective is to define a computational graph for each node. The computational graph should preserve the structure of the graph and integrate the features of neighboring nodes at the same time. For example, the embedding vector of the target node {red} should be built from its neighbors {blue, yellow, green}, without taking into account the ordering of {blue, yellow, green}; we can do this by simply taking the average of the features of {blue, yellow, green}. In general, the aggregation function, corresponding to the NNs on the right of the figure above, needs to be order invariant, so we can use max, average, or sum functions.
3.1 Message Passing Framework
The key idea behind the GNN message-passing framework is that at each iteration every node takes information from its local neighborhood, and as the iterations progress each node embedding comes to contain more information from farther nodes of the graph. This transfer of information between nodes is called neural message passing: vector information is exchanged in order to update node features using neural networks. The features of each neighbor are transferred to the target node as messages through the edges, and as a result the new representation of the node encodes the local structure of the graph. The message-passing update is expressed as follows:

h_u^{(i+1)} = \mathrm{Update}^{(i)}\left(h_u^{(i)}, \mathrm{Aggregate}^{(i)}\left(\{h_v^{(i)}, \forall v \in N(u)\}\right)\right) = \mathrm{Update}^{(i)}\left(h_u^{(i)}, m_{N(u)}^{(i)}\right)    (1)

where m is the message aggregated from the neighbors N(u) of node u, h is the embedding of each node, and i is the iteration number. At each iteration i of the GNN process, an Aggregate function takes as input the set of embeddings of the nodes in u's neighborhood and generates a message m from the aggregated information of the target node's neighbors. Then, the Update function takes the message m and combines it with the previous embedding of node u to generate the updated embedding h. The initial embeddings at i = 0 are set to the input features of all the nodes. Generally, a node embedding encodes two kinds of information: the first
one is structural information about the graph, obtained by encoding the degree of each node; the second is feature information about all the nodes in the graph. However, after a certain number of iterations each node contains information about all the nodes in the graph, which can make all node embeddings converge to the same value; this is called the over-smoothing problem: after some number of iterations, the representations of all nodes converge to the same value and all nodes become very similar to one another. One approach proposed to resolve over-smoothing is based on vector skip connections, which try to directly preserve information from previous rounds of message passing during the update step. A simple skip connection uses concatenation to preserve more node-level information during message passing: we simply concatenate the output of the base update function with the node's previous-layer representation [22]. Neighborhood attention is a useful mechanism for increasing the representational power of a GNN model, especially when we have prior information indicating that some neighbors may be more important than others. The Graph Attention Network GAT [23] can learn to assign different importance weights to the neighbors of every node using an attention mechanism, instead of treating all neighbors equally and simply aggregating the neighbor messages by taking their average. For heterogeneous graphs, H-GAT [24] proposes a model based on hierarchical attention, where node-level attention learns the importance between a node and its meta-path-based neighbors, and semantic-level attention learns the importance of the different meta-paths.
3.2 GNN over KG
In the previous section we treated homogeneous graphs, where the edges are undirected, untyped, and unique between nodes. We now introduce how to extend the Graph Neural Network framework to encode the more complex structure of a knowledge graph, which includes directed, typed, and multiple edges between nodes. The main challenge facing the application of GCNs to knowledge graphs, in the so-called Relational Graph Convolutional Network (R-GCN) [17], is the rapid growth of the number of parameters with the number of relations in the graph, because there is one trainable matrix per relation type; this can easily lead to overfitting and slow learning on rare relations in a very large graph. R-GCN [17] addresses this problem by parameter sharing with basis matrices: all the relation matrices are defined as linear combinations of shared bases. It also learns an embedding for each relation, as well as a tensor shared across all relations, and creates different weight matrices for propagation over different types of relations. Since the number of parameters remains proportional to the number of relations, it introduces two kinds of regularization to control the number of parameters needed to model such an amount of relations: basis decomposition and block-diagonal decomposition. G2S [25] proposes a method that converts the original graph into a bipartite graph: the original edges are also converted into nodes, and each original edge is split into two new edges; it then converts the edge information of the graph into sentences using Gated Graph Neural Networks. The aggregation function of the GNN takes both the hidden representations of
nodes and the relations as input. MAGNN [26] applies linear transformations to project heterogeneous node attributes, whose dimensions differ across node types, into the same vector space; it then applies aggregation with the attention mechanism mentioned above for every meta-path, and finally each target node extracts information and combines it with the meta-path instances connecting it to its meta-path-based neighbors. More complex graphs contain nodes connected by multiple edges of different types. Such a graph can be viewed as multiple layers, one per edge type, where each layer represents one type of relation; for example, in social media networks one person can like, comment on, and share the same image at the same time. mGCN [27] introduces a principled approach to capture the interactions within and across dimensions and to model the rich information of multi-dimensional graphs coherently for representation learning.
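Before moving to question answering, a minimal NumPy sketch of the mean-aggregation form of Eq. (1) is given below: each node averages its neighbors' embeddings (the Aggregate step) and combines the result with its own embedding through a linear transformation and a non-linearity (the Update step). The toy graph, the dimensions and the random weights are assumptions for illustration; in practice the weights are trained end-to-end.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy undirected graph given as an adjacency list (node -> neighbors).
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
dim = 4
H = rng.normal(size=(4, dim))            # initial node embeddings h^(0)

# Parameters of one message-passing layer (randomly initialized for the sketch).
W_self = rng.normal(size=(dim, dim))
W_neigh = rng.normal(size=(dim, dim))

def gnn_layer(H):
    H_new = np.zeros_like(H)
    for u, nbrs in neighbors.items():
        message = H[nbrs].mean(axis=0)                          # Aggregate step
        H_new[u] = np.tanh(H[u] @ W_self + message @ W_neigh)   # Update step
    return H_new

# Two iterations: after k layers each node has seen its k-hop neighborhood.
H1 = gnn_layer(H)
H2 = gnn_layer(H1)
print(H2.shape)  # (4, 4): one updated embedding per node
```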
4 Question Answering Over KG
Reasoning over the facts of a knowledge graph has recently become an important research topic, due to its ability to infer new knowledge and conclusions from existing data. One of the most popular downstream tasks of reasoning over knowledge graphs is answering multi-hop queries, in other words making complex predictions over knowledge graphs. To do so, questions and triples are first represented in the same vector space, then candidate facts are generated and the answer is found using multi-hop reasoning. When a KG supports a question-answering system, sometimes only one of the triples in the knowledge graph is needed to answer the question; but when the question is complicated and the knowledge base is incomplete, the question-answering system must be able to infer unknown answers from existing triples. As a consequence, a Knowledge Graph Completion (KGC) process is needed to complete the KG. KGC includes many tasks such as link prediction, entity prediction, and relation prediction. Knowledge graph completion problems can be formulated as answering one-hop queries: deciding whether a tail entity is an answer of (h, r, ?), for example, does the student have a Ph.D.? One-hop queries can be generalized to path queries by adding more relations to the path, that is, by chaining multiple relations one after another to traverse the graph. But since many relations between entities are missing or incomplete, it is hard to identify all the answer entities. Thanks to its ability to handle relation composition, TransE [1] can handle path queries by translating in the latent space over multiple hops using the addition of relation embeddings, whereas models such as ComplEx [10], DistMult [9], and TransR [3] cannot deal with relation composition and therefore cannot easily handle path queries. Query2box [28] proposes a multi-hop knowledge graph reasoning framework that embeds queries as boxes (hyper-rectangles) in the embedding space; it handles existential quantification and conjunction by projection and intersection respectively. Another model reasons over the Heterogeneous Document-Entity (HDE) graph [29] by initializing node representations with co-attention and self-attention-based context encoders; it also employs a Graph Neural Network (GNN) based message-passing framework to aggregate information.
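Because TransE represents a relation as a translation, a two-hop path query (h, r1, r2, ?) can be approximated by adding the two relation vectors to the head embedding and returning the nearest entities, as sketched below. The entities, relations and random embeddings are placeholders chosen for illustration; in a real system they would come from a trained model.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 8
entities = ["student", "department", "university", "city"]
E = {e: rng.normal(size=dim) for e in entities}             # pretrained in practice
R = {r: rng.normal(size=dim) for r in ["member_of", "part_of"]}

def answer_path_query(head, relations, k=1):
    """Translate the head embedding along the relation path (TransE-style
    composition) and return the k nearest candidate entities."""
    q = E[head] + sum(R[r] for r in relations)
    ranked = sorted(entities, key=lambda e: np.linalg.norm(q - E[e]))
    return ranked[:k]

# "Of what is the student's department a part?" -- ideally the university.
print(answer_path_query("student", ["member_of", "part_of"], k=2))
```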
5 Conclusion
We presented knowledge graphs as a new form of data structure, especially in light of the recent emergence of knowledge representation learning, knowledge acquisition methods, and a wide variety of knowledge-aware applications. Many efforts have been made to apply recent AI algorithms and models to graphs in order to improve how graphs are processed, and these methods have achieved positive results on many benchmark datasets, which encourages researchers to pursue further work. This short review provides a brief introduction to graph representation learning and graph neural networks as an intersection point between graphs and artificial intelligence. Our review focuses on graph neural networks as new encoding models that address the limitations of traditional models by bringing a convolution-like architecture to nodes and edges. Finally, we discussed how information can be retrieved using question answering, knowledge graph completion and graph reasoning. In the future we plan to build a GNN framework in an education context.
References 1. Bordes, A., Usunier, N., et al.: Translating embeddings for modeling multi-relational data. In: NIPS, pp. 2787–2795 (2013) 2. Wang, Z., Zhang, J., et al.: Knowledge graph embedding by translating on hyperplanes. In: AAAI, pp. 1112–1119 (2014) 3. Lin, Y., Liu, Z., et al.: Learning entity and relation embeddings for knowledge graph completion. In: AAAI, pp. 2181–2187 (2015) 4. Ji, G., He, S., et al.: Knowledge graph embedding via dynamic mapping matrix. In: ACLIJCNLP. ACL-IJCNLP, pp. 687–696 (2015) 5. Xiao, H., Huang, M., et al.: TransA: an adaptive approach for knowledge graph embedding. In: AAAI, pp. 1–7 (2015) 6. He, S., Liu, K., et al.: Learning to represent knowledge graphs with Gaussian embedding. In: CIKM, pp. 623–632 (2015) 7. Xiao, H., Huang, M., et al.: TransG: a generative model for knowledge graph embedding. In: ACL, pp. 2316–2325 (2016) 8. Nickel, M., Tresp, V., Kriegel, H.P.: A three-way model for collective learning on multirelational data. In: ICML, pp. 809–816 (2011) 9. Yang, B., Yih, W.T., et al.: Embedding entities and relations for learning and inference in knowledge bases. ICLR, pp. 1–13 (2015) 10. Trouillon, T., Welbl, J., et al.: Complex embeddings for simple link prediction. In: ICML, pp. 2071–2080 (2016) 11. Sun, Z., Deng, Z.-H., et al.: RotatE: knowledge graph embedding by relational rotation in complex space. In: ICLR pp. 1–18 (2019) 12. Kazemi, S.M., Poole, D., et al.: SimplE embedding for link prediction in knowledge graphs. In: NeurIPS, pp. 4284–4295 (2018) 13. Zhang, S., Tay, Y., et al.: Quaternion knowledge graph embeddings. In: NeurIPS, pp. 2731– 2741 (2019) 14. Bordes, A., Glorot, X., Weston, J., Bengio, Y.: A semantic matching energy function for learning with multi-relational data. Mach. Learn. 94(2), 233–259 (2013). https://doi.org/10. 1007/s10994-013-5363-6
15. Socher, R., Chen, D., et al.: Reasoning with neural tensor networks for knowledge base completion. In: NeurIPS, pp. 926–934 (2013) 16. Liu, Q., Jiang, H., et al.: Probabilistic reasoning via deep learning: neural association models. arXiv:1603.07704 (2016) 17. Schlichtkrull, M., Kipf, T.N., Bloem, P., van den Berg, R., Titov, I., Welling, M.: Modeling relational data with graph convolutional networks. In: Gangemi, A., et al. (eds.) ESWC 2018. LNCS, vol. 10843, pp. 593–607. Springer, Cham (2018). https://doi.org/10.1007/978-3-31993417-4_38 18. Xie, R., Liu, Z., et al.: Representation learning of knowledge graphs with entity descriptions. In: AAAI, pp. 2659–2665 (2016) 19. Xiao, H., Huang, M., et al.: SSP: semantic space projection for knowledge graph embedding with text descriptions. In: AAAI, pp. 3104–3110 (2017) 20. Guo, S., Wang, Q., et al.: Semantically smooth knowledge graph embedding. In: ACLIJCNLP, pp. 84–94 (2015) 21. Xie, R., Liu, Z., et al.: Image-embodied knowledge representation learning. In: IJCAI, pp. 3140–3146 (2017) 22. Xu, K., Li, C., et al.: Representation learning on graphs with jumping knowledge networks. In: ICML (2018) 23. Velickovic, P., Cucurull, G., et al.: Graph attention networks. In: ICLR (2018) 24. Wang, X., Ji, H., et al.: Heterogeneous graph attention network (2019) 25. Beck, D., Haffari, G., et al.: Graph-to-sequence learning using gated graph neural networks. In: ACL, pp. 273–283 (2018) 26. Fu, X., Zhang, J., et al.: MAGNN: metapath aggregated graph neural network for heterogeneous graph embedding (2020) 27. Ma, Y., Wang, S., et al.: Multi-dimensional graph convolutional networks. In: SDM, pp. 657– 665 (2019) 28. Ren, H., Hu, W., et al.: Query2box: reasoning over knowledge graphs in vector space using box embeddings. In: ICLR (2020) 29. Tu, M., Wang, G., et al.: Multi-hop reading comprehension across multiple documents by reasoning over heterogeneous graphs. In: ACL (2019) 30. Bourkoukou, O., Bachari, E.E., Boustani, A.E.: Building effective collaborative groups in E-learning environment. In: Ezziyyani, M. (ed.) AI2SD 2019. AISC, vol. 1102, pp. 107–117. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-36653-7_11
Tifinagh Handwritten Character Recognition Using Machine Learning Algorithms Rajaa Sliman1(B) and Ahmed Azouaoui2 1 Computer Science Department, Faculty of Sciences, Chouaib Doukkali University, El Jadida,
Morocco [email protected] 2 Computer Science Department, Chouaib Doukkali University, El Jadida, Morocco [email protected]
Abstract. The Amazigh people are an indigenous ethnic group of North Africa. They are distributed from Morocco to the Siwa Oasis in Egypt, passing through Algeria, Tunisia, Libya, Niger, Burkina Faso, Mali and Mauritania. Historically, they spoke Amazigh languages, classified under the Amazigh branch of the Afro-Asiatic family. Tifinagh is the alphabet of this language and has been standardized in Morocco since 2001. In this work we propose a model for handwritten Tifinagh character recognition and then apply different machine learning algorithms to identify which one achieves the best accuracy. Keywords: SVM · KNN · FFNN · Random Forest · Decision Tree · Naive Bayes · CNN
1 Introduction
Optical character recognition (OCR) is an artificial intelligence technology for text recognition, used mainly to transform a two-dimensional image of text, whether printed or handwritten, into machine-readable text, regardless of the language and the format in which it is written. OCR is used in many fields such as vehicle control, banking, passport recognition, healthcare, and handwritten character recognition. Several studies have addressed Tifinagh handwritten character recognition; building an accurate and fast system depends on the choice of the classifier algorithm and of the feature extraction technique. In this section we present different works related to this topic. Niharmine et al. [1] generate new features using genetic algorithms and then use a feedforward neural network for classification; the results show that this system achieves an accuracy of 89.5%. Sadouk et al. [2] propose two deep learning algorithms, a convolutional neural network (CNN) and Deep Belief Networks (DBNs); after training and testing on the AMHCD handwritten character database, the CNN outperforms the DBN with an accuracy of 98.25% versus 95.47%. In [3] Niharmine et al. adopt a new methodology that uses zoning gradient features as the feature extraction method and an artificial neural network
as a classifier; the proposed system achieves a high accuracy of 99.5% with a short training time. Sabir et al. [4] propose an optical Tifinagh alphabet recognition algorithm based on three steps: the first step starts with image segmentation, in the second step a neural network is used as the classification algorithm, and the third step resolves the alphabets with a high confusion rate by using multiple classifiers. A new approach presented by Gounane et al. in [5] uses a linear combination of the gravity-center distance and pixel density to extract features, then combines the k-nearest neighbor algorithm with a bigram language model; this system achieves an accuracy of around 91.05%. El Kessab et al. [6] propose a system that combines a neural network (multi-layer perceptron, MLP) and a hidden Markov model for classification with mathematical morphology for feature extraction; the recognition rate obtained is 92.33%. A recognition system proposed by Benaddy et al. [7], based on a deep convolutional neural network (CNN), extracts features directly from raw pixels, making it more flexible than traditional approaches; this approach achieves a high accuracy of 99.10%. The structure of the rest of this paper is as follows. Section 2 introduces the Tifinagh alphabet. Section 3 discusses the main machine learning algorithms. The proposed recognition system is explained in Sect. 4. Experimental results are detailed in Sect. 5. Finally, we present our conclusion and the prospects for future work.
2 The Tifinagh Alphabet
Image processing and pattern recognition form a wide research area of artificial intelligence aiming to emulate the human brain, especially in natural language processing. Many projects have been developed for Latin and Arabic characters [10–15], while Tifinagh characters still have few studies in this area. The latest version of the Amazigh alphabet adopted by the Royal Institute of the Amazigh Culture (IRCAM) [9], officially normalized by the International Organization for Standardization (ISO) [8], is composed of 33 phonetic entities. Some features of the Tifinagh characters are listed below:
• Tifinagh has no particular punctuation marks; IRCAM has advocated the use of the conventional signs found in Latin scripts.
• IRCAM has retained the horizontal, left-to-right direction for Tifinagh writing.
• There are no upper- and lowercase characters (Fig. 1).
Fig. 1. Tifinagh Alphabet
3 Machine Learning Algorithms
Machine learning is an artificial intelligence technology that allows machines to learn and reproduce a behavior without being explicitly programmed: to learn and improve, computers are fed a large amount of data that is analyzed during training so that the resulting model can confront many situations and make decisions. Machine learning techniques are used in many areas, including medical diagnostics, online fraud detection, spam and malware filtering, autonomous driving, product recommendations, traffic prediction, and speech and image recognition. Machine learning is based on a set of algorithms, among which three types can be distinguished: supervised, unsupervised and reinforcement learning.
• Supervised learning: the machine is trained with labeled data, so that when it is given a new set of input data it produces the correct outcome.
• Unsupervised learning: the data is trained without labels; the algorithm groups the data according to similarity and automatically finds the best structure.
• Reinforcement learning: an agent adapts to an environment with the goal of learning, from successive experiences, to find the best solution.
In this article we rely on the following algorithms.
3.1 Support Vector Machine
Support vector machines (SVM) belong to the family of supervised machine learning algorithms and are used to solve regression and classification problems. They are widely adopted because of their simplicity of use and because they offer good results with high accuracy. An SVM separates the data into multiple classes with a boundary that is as simple as possible, aiming to maximize the distance between each group and the border that separates them.
3.2 K-Nearest Neighbors
The k-nearest neighbors (KNN) algorithm is one of the simplest supervised machine learning algorithms, used to solve classification and regression problems. After selecting
the number K of neighbors, it calculates the distance between the new point to classify and the already classified points, takes the K nearest neighbors according to the calculated distance, counts among these K neighbors how many points belong to each category, and finally assigns the new point to the most represented category.
3.3 Feed Forward Neural Network
The multilayer perceptron (FFNN) is a type of artificial neural network organized in multiple layers: the information crosses the network from the input layer towards the output layer through the hidden layers, so it is a feedforward (direct propagation) network. Each layer is composed of a varying number of neurons, and the last layer corresponds to the different target categories (Fig. 2).
Fig. 2. Architecture of the neural network
3.4 Random Forest
The random forest, also called a forest of decision trees, is a supervised learning algorithm used for classification. It combines a variety of decision trees, each trained on a random subset of the data according to the bagging principle; the trees work independently on the classification problem, each producing an estimated result, and these results are then assembled and analyzed. The idea is to draw on several opinions in order to make the most accurate decision.
3.5 Decision Tree
The decision tree is very popular in machine learning. It belongs to the supervised learning algorithms that make it possible to classify categories or predict a value, and it aims to construct a tree composed of nodes, which makes the model simple and easy to use. The principle is to start from the top of the tree (the parent node) and move down through the children by testing the characteristic in question and following the corresponding path; if this is not enough, another test is used, and so on. It is widely used in many problems to facilitate the decision-making process.
3.6 Naive Bayes
Naive Bayes is a supervised learning algorithm used for classification and based on Bayes' theorem. It is a probabilistic classifier, which means it predicts on the basis of the probability of occurrence of the features. The naive Bayes classifier is one of the simplest and most effective classification algorithms, helping to build fast machine learning models that can make quick and accurate predictions.
3.7 Convolutional Neural Network
A convolutional neural network (CNN) is a type of artificial neural network used in image recognition and processing. It is composed of two phases, convolution and classification. The convolution phase receives the image and applies two categories of treatment:
• Filters: they bring out the characteristics of the image, for example identifying vertical, horizontal and diagonal edges and color variations, and detecting complex shapes.
• Simplifications: they are used to keep the main information.
The classification phase is then based on an artificial neural network.
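As an illustration of the convolution and classification phases, the following is a minimal Keras sketch of a CNN for this kind of task, assuming 32x32 grayscale character images and the 32 Tifinagh classes. The layer sizes are assumptions made for the example and do not reproduce the exact architecture trained in this work.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Minimal CNN for 32x32 grayscale Tifinagh characters, 32 output classes.
# Layer sizes are illustrative assumptions, not the architecture used in this paper.
model = keras.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(32, 3, activation="relu"),   # convolution: local filters
    layers.MaxPooling2D(),                     # simplification: keep the main information
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),      # classification phase
    layers.Dense(32, activation="softmax"),    # one probability per character
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, validation_split=0.2, epochs=10) once the data is loaded
```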
4 Proposed Recognition System
This section presents the structure of our proposed approach, which has four main steps as illustrated in the following figure (Fig. 3). In our work we used different supervised learning algorithms to classify the characters; we aim to compare the results obtained in order to deduce which algorithm is the most accurate and which runs in the shortest time.
– Input image: the input layer of the machine learning algorithms contains the image data, which in our case represents the Tifinagh alphabet.
– Preprocessing: this step prepares the image for the feature extraction and classification phase (a minimal sketch of these steps follows this list).
• Convert the image to a binary image using the Otsu algorithm
Fig. 3. OCR system design
• Remove noise from the image using a Gaussian filter
• Erode the image
• Resize the image
– Classification and recognition: apply a machine learning algorithm to extract the features and classify the image.
– Output: the output layer gives the label value of the input image.
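The preprocessing steps listed above can be sketched with OpenCV as follows. The target size of 32x32 pixels, the kernel sizes and the choice to smooth before thresholding are assumptions made for this sketch, not the exact parameters used in our experiments.

```python
import cv2
import numpy as np

def preprocess(path, size=(32, 32)):
    """Otsu binarization, Gaussian denoising, erosion and resizing,
    following the preprocessing steps listed above (parameter values assumed)."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.GaussianBlur(img, (3, 3), 0)                          # remove noise
    _, img = cv2.threshold(img, 0, 255,
                           cv2.THRESH_BINARY + cv2.THRESH_OTSU)     # binary image (Otsu)
    img = cv2.erode(img, np.ones((2, 2), np.uint8), iterations=1)   # erosion
    img = cv2.resize(img, size)                                     # resize
    return img.astype("float32") / 255.0    # normalized input for the classifiers

# features = preprocess("samples/yaz.png").flatten()   # hypothetical file path
```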
5 Experimental Results
To validate our approach, we used a database containing 780 images of each of the 32 Tifinagh characters, divided into training and test images: 19,968 images (80%) were used for training and 4,992 images (20%) for testing. We carried out the experiments on a laptop with the following specifications: Windows 11, an AMD Ryzen 5 5500U processor with Radeon Graphics at 2.10 GHz, 16 GB of RAM, and a 64-bit operating system. The programming language used for training was Python, and the implementation relied on the Scikit-learn and Keras libraries to train our models. After the preprocessing, feature extraction and recognition steps, the table below presents the accuracy for the 32 tested characters obtained with the different proposed algorithms (Table 1):
Table 1. Accuracy of Tifinagh handwritten characters
Character | CNN | FFNN | KNN | Decision Tree | Random Forest | Naive Bayes | SVM
ⴱ | 98% | 96% | 92% | 99% | 89% | 98% | 96%
ⵛ | 97% | 99% | 90% | 99% | 98% | 94% | 95%
ⴷ | 99% | 99% | 95% | 97% | 94% | 99% | 99%
ⴹ | 99% | 95% | 88% | 99% | 95% | 97% | 98%
ⵄ | 98% | 93% | 85% | 98% | 85% | 97% | 90%
ⴼ | 97% | 100% | 96% | 100% | 98% | 97% | 99%
ⴳ | 92% | 90% | 81% | 90% | 71% | 100% | 99%
ⵖ | 98% | 94% | 82% | 97% | 78% | 94% | 94%
ⴳⵯ | 100% | 97% | 89% | 99% | 97% | 99% | 99%
ⵀ | 99% | 97% | 90% | 96% | 92% | 95% | 96%
ⵃ | 99% | 95% | 91% | 99% | 81% | 100% | 99%
ⵊ | 97% | 96% | 82% | 99% | 99% | 99% | 99%
ⴽ | 100% | 98% | 88% | 95% | 81% | 100% | 100%
ⴽⵯ | 97% | 96% | 83% | 96% | 85% | 99% | 97%
ⵍ | 99% | 96% | 84% | 99% | 93% | 97% | 96%
ⵎ | 100% | 93% | 91% | 97% | 87% | 99% | 99%
ⵏ | 100% | 96% | 95% | 96% | 97% | 100% | 100%
ⵇ | 99% | 95% | 87% | 99% | 83% | 97% | 98%
ⵔ | 97% | 84% | 84% | 90% | 86% | 98% | 97%
ⵕ | 99% | 96% | 88% | 98% | 81% | 97% | 97%
ⵙ | 90% | 94% | 74% | 95% | 90% | 93% | 73%
ⵚ | 96% | 95% | 85% | 96% | 65% | 97% | 96%
ⵜ | 99% | 99% | 84% | 98% | 76% | 99% | 99%
ⵟ | 99% | 90% | 86% | 98% | 82% | 94% | 85%
ⵡ | 99% | 96% | 94% | 99% | 100% | 100% | 99%
ⵅ | 100% | 94% | 73% | 92% | 72% | 97% | 96%
ⵉ | 98% | 96% | 88% | 99% | 97% | 100% | 99%
ⵣ | 92% | 84% | 68% | 88% | 62% | 95% | 78%
ⵥ | 90% | 86% | 68% | 90% | 61% | 85% | 69%
ⴻ | 98% | 96% | 88% | 99% | 88% | 98% | 96%
ⵉ | 99% | 92% | 74% | 94% | 77% | 99% | 98%
ⵓ | 98% | 95% | 86% | 95% | 93% | 99% | 96%
The proposed algorithms were tested and produced very good results; the table below presents the overall accuracy of our models (Table 2):
Table 2. Models proposed accuracy
Model | CNN | KNN | Random Forest | Decision Tree | Naive Bayes | SVM | FFNN
Accuracy | 97.57% | 94.73% | 96.23% | 85.15% | 84.29% | 96.91% | 94.59%
In terms of accuracy, the CNN classifier outperformed all the other classifiers with 97.57%, followed by the SVM algorithm with 96.91%. For a fuller comparison, we also measured the execution time of the proposed models; the table below presents it (Table 3):
Table 3. Execution time of different trained models
Model | CNN | KNN | Random Forest | Decision Tree | Naive Bayes | SVM | FFNN
Execution time (s) | 584 | 15.70 | 21.54 | 10.48 | 9.34 | 61.20 | 55.13
From the results obtained, it appears that naive Bayes runs in a much shorter time than the other models thanks to its simplicity, even though it does not achieve a precision as high as the CNN algorithm, which gives good accuracy but a longer execution time. According to the results presented in the tables above, we can identify two main issues:
• confusion between the characters ⵙ, ⵔ and ⵀ;
• confusion between ⵣ, ⵥ and ⵅ.
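The kind of accuracy and execution-time comparison reported in Tables 2 and 3 can be obtained with a loop such as the one below, using scikit-learn. The scikit-learn digits dataset is used here only as a stand-in, since the Tifinagh image database is not bundled with these libraries; timings naturally depend on the machine used.

```python
import time
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

models = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "Naive Bayes": GaussianNB(),
    "FFNN": MLPClassifier(max_iter=300),
}

# Stand-in data: replace X, y with the flattened preprocessed Tifinagh images and labels.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

for name, model in models.items():
    start = time.time()
    model.fit(X_train, y_train)                      # training time measured here
    elapsed = time.time() - start
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy={acc:.4f}, training time={elapsed:.2f}s")
```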
6 Conclusion
In this paper we presented our proposed OCR system applied with different machine learning algorithms for Tifinagh character recognition, and carried out a comparative study in order to identify the most powerful algorithm. The study showed that the CNN outperforms the other algorithms in terms of precision, while naive Bayes is the fastest in terms of execution time despite its lower accuracy. The Tifinagh character recognition system needs more sophisticated algorithms to achieve higher precision and to avoid, as much as possible, the confusion between the characters mentioned above; as a perspective, we plan to improve the CNN algorithm to solve the confusion problem and reach an excellent accuracy.
Maintenance Prediction Based on Long Short-Term Memory Algorithm Mouna Tarik(B) and Khalid Jebari LMA, FSTT, Abdelmalek Essaadi University, Tetouan, Morocco [email protected]
Abstract. Predictive maintenance is a prominent strategy that plays a key role in reducing maintenance costs in manufacturing systems. It allows minimizing downtime and failure risks. According to the literature, deep learning has shown superior performance in certain domains such as object recognition, image classification and predictive maintenance. Among deep learning methods, Long Short-Term Memory networks (LSTM) are especially appealing to the predictive maintenance domain because they are very good at learning from sequences. This paper suggests an LSTM network to predict engine failure so that maintenance can be planned in advance. The model shows a very good performance on the aircraft engine dataset used in this study. Keywords: Predictive Maintenance · Deep Learning · Long Short-Term Memory
1 Introduction
Predictive maintenance monitors the performance and condition of equipment during normal operation to reduce the likelihood of failures. Its advantages are tremendous from a cost-savings perspective: they include minimizing planned downtime and maximizing equipment lifetime, as well as optimizing employee productivity and increasing revenue. In modern digital manufacturing, nearly 79.6% of machine tool downtime is caused by mechanical failures [1]; it is therefore very important to detect faults early and accurately. According to the literature, machine learning techniques based on neural networks are achieving good results in diverse domains, especially in predictive maintenance. Nguyen and Medjaher [2] presented a novel dynamic predictive maintenance framework based on sensor measurements. It provides the probabilities that the system can fail over different time horizons to decide the moment for preparing and performing maintenance activities, based on a Long Short-Term Memory classifier. Yeh et al. [3] proposed a method based on machine learning to predict the long-cycle maintenance time of wind turbines in a power company. The experimental results showed that their proposed method reaches high accuracy, which helps to drive up the efficiency of wind turbine maintenance.
Luo et al. [1] presented in their paper a novel method for early fault detection under time-varying working conditions, organized in several steps. Their method showed considerable potential for early fault detection in manufacturing. This paper is organized as follows: Sect. 2 presents an overview of predictive maintenance. Section 3 introduces the theoretical background of deep learning and the Long Short-Term Memory algorithm. Section 4 presents the data used and the experimental results. Section 5 discusses the conclusion and future research issues.
2 Predictive Maintenance
2.1 Maintenance Strategies
Diverse maintenance strategies have emerged over time. Some industries are still following the "fail and repair" strategy, whereas others have adopted advanced strategies like predictive maintenance. The three main maintenance policies are corrective maintenance, preventive maintenance and predictive maintenance. Corrective maintenance, also known as a reactive or 'run to failure' strategy, refers to the actual repair or replacement of equipment that has malfunctioned, broken, or worn down. The corrective maintenance strategy is a set of tasks performed when a piece of equipment or a machine is identified as not performing its intended functionality. Tasks include identifying the problem, rectifying it and restoring the device to an operational state. Preventive maintenance is conducted regularly on an asset to reduce the likelihood of failure. It is undertaken while an asset or an item of equipment is working so that it does not break down unexpectedly. This maintenance style falls between reactionary corrective maintenance and the predictive maintenance strategy. Preventive maintenance is performed to minimize cost and increase an asset's life-cycle, while avoiding unscheduled outages or breakdowns. This strategy can be divided into two types:
– Time-Based Preventive Maintenance: This type of preventive maintenance is triggered periodically to deliver a regular inspection of a piece of equipment. For instance, this could be a weekly, six-monthly or annual inspection.
– Usage-Based Preventive Maintenance: The procedures are triggered after a set amount of production cycles, hours in use, or even distance traveled.
The implementation of preventive maintenance programs schedules repairs, lubrication, adjustments and machine rebuilds for all critical plant machinery [4].
2.2 Predictive Maintenance
Predictive maintenance uses data science and predictive analytics to estimate when a piece of equipment might fail so that corrective maintenance can be scheduled before
the point of failure. The goal is to schedule maintenance at the most convenient and most cost-efficient moment, allowing the equipment's lifetime to be optimized. Including predictive maintenance in a maintenance management program optimizes the availability of process machinery and greatly reduces maintenance costs [4]. The predictive maintenance process flow consists of the following steps:
Fig. 1. Predictive maintenance process flow
The steps to implement a predictive maintenance program generally include: identifying critical assets, establishing a database of historical data, analyzing failure modes, making failure predictions, and then deploying the predictive maintenance technology to a group of pilot equipment to validate the program. Predictive maintenance is the most developed maintenance policy; its insights are an extremely valuable asset in improving the overall maintenance and reliability of an operation. Benefits include:
• Minimizing the number of unexpected breakdowns,
• Maximizing asset uptime and improving reliability,
• Reducing operational costs by performing maintenance only when necessary,
• Maximizing production hours,
• Improving safety.
3 Deep Learning
3.1 Theoretical Background
Deep learning has proven to be an advanced technology for big data analysis, with a large number of successful cases in speech recognition [5], image processing, human action recognition [6], object detection and others [7]. Deep learning is a subset of machine learning in the form of a neural network with several layers of nodes between the input and the output. The hierarchical relationship between machine learning, artificial neural networks and deep neural networks is summarized in the Venn diagram of Fig. 2.
Fig. 2. Venn diagram of machine learning concepts and classes (Source: Goodfellow et al. 2016 p. 9)
In the literature, there are several types of neural network architectures, such as:
• multilayer perceptrons, which are the oldest and simplest ones,
• Convolutional Neural Networks (CNN), particularly adapted for image processing,
• recurrent neural networks (RNN), used for sequential data such as text or time series.
In this study, we will focus on RNNs and more especially on the Long Short-Term Memory network.
3.2 LSTM
Long Short-Term Memory (LSTM) cells were introduced by Hochreiter and Schmidhuber (1997) [8]. They were created to be capable of learning long-term dependencies and to prevent back-propagated errors from vanishing or exploding. An LSTM layer consists of a set of recurrently connected blocks, known as memory blocks. Each one contains one or more recurrently connected memory cells and three multiplicative units - the input, the output and the forget gates - that provide continuous analogues of write, read and reset operations for the cells.
The internal state of the cell is maintained with a recurrent connection of weight 1.0. The three gates collect activations from inside and outside the block, and control the cell via multiplicative units. The input and output gates scale the input and output of the cell while the forget gate scales the internal state, for example by resetting it to 0 (making it forget). The cell input and output squashing functions (g and h) are applied at the indicated places (see Fig. 3). The net can only interact with the cells via the gates [9].
Fig. 3. LSTM memory block with one cell (Source: A. Graves and J. Schmidhuber 2005 p. 2048)
The Forget Gate
In a cell of the LSTM network, the first step is to decide whether we should keep the information from the previous timestamp or forget it. Information from the previous hidden state and information from the current input are passed through the sigmoid function, and the values come out between 0 and 1. The closer to 0 means forget, and the closer to 1 means keep.
The Input Gate
The input gate is used to quantify the importance of the new information carried by the input. First, we pass the previous hidden state and the current input into a sigmoid function. This determines which values will be updated, by transforming the values to be between 0 and 1. Then, the hidden state and the current input are passed into the tanh function to squash the values between −1 and 1 and help regulate the network. The tanh output is then multiplied by the sigmoid output: the sigmoid output decides which information is important to keep from the tanh output.
The Output Gate
The output gate decides what the next hidden state should be. First, we pass the previous hidden state and the current input into a sigmoid function. Then we pass the newly modified cell state to the tanh function. We multiply the tanh output with the sigmoid
output to decide which information the hidden state should carry. The output is the hidden state. The new cell state and the new hidden state are then carried over to the next time step.
An LSTM cell comprises, at time step t, a memory cell c_t and an output o_t. As input, this cell at time t receives x_t, c_{t-1} and h_{t-1} (the hidden state at the previous time step), which feed the input gate i_t and the forget gate f_t. The updating equations are given as follows [10]:

i_t = σ(W_i x_t + V_i h_{t-1} + b_i),
f_t = σ(W_f x_t + V_f h_{t-1} + b_f),
o_t = σ(W_o x_t + V_o h_{t-1} + b_o),
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c x_t + V_c h_{t-1} + b_c),
h_t = o_t ⊙ tanh(c_t)      (1)

where σ is the sigmoid activation function, ⊙ represents the element-wise product, and k is a hyper-parameter that designates the dimensionality of the hidden vectors [10]. The LSTM equations also generate f_t and i_t; these are for internal consumption of the LSTM and are used for generating c_t and h_t. The above equations describe a single time step. The weight matrices (W_f, W_i, W_o, W_c, V_f, V_i, V_o, V_c) and biases (b_f, b_i, b_o, b_c) are not time-dependent, meaning that they do not change from one time step to another: the same weight matrices are used to compute the output at different time steps.
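To make these update equations concrete, a minimal NumPy sketch of a single LSTM cell step following Eq. (1) is given below; the dimensions, the random initialisation and the variable names are illustrative assumptions, not the configuration used later in the experiments.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, V, b):
    # W, V, b hold the parameters of the input (i), forget (f), output (o) and cell (c) parts
    i_t = sigmoid(W["i"] @ x_t + V["i"] @ h_prev + b["i"])   # input gate
    f_t = sigmoid(W["f"] @ x_t + V["f"] @ h_prev + b["f"])   # forget gate
    o_t = sigmoid(W["o"] @ x_t + V["o"] @ h_prev + b["o"])   # output gate
    c_t = f_t * c_prev + i_t * np.tanh(W["c"] @ x_t + V["c"] @ h_prev + b["c"])
    h_t = o_t * np.tanh(c_t)                                  # new hidden state
    return h_t, c_t

# Illustrative dimensions: k hidden units, d input features
k, d = 4, 3
rng = np.random.default_rng(0)
W = {g: rng.standard_normal((k, d)) for g in "ifoc"}
V = {g: rng.standard_normal((k, k)) for g in "ifoc"}
b = {g: np.zeros(k) for g in "ifoc"}
h, c = np.zeros(k), np.zeros(k)
h, c = lstm_step(rng.standard_normal(d), h, c, W, V, b)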
4 Methodology and Experimental Results
4.1 Data
Airline companies are interested in predicting equipment failures in advance so that they can reduce flight delays and improve maintenance efficiency. The data used contains simulated aircraft engine run-to-failure events, operational settings and 21 sensor measurements. The training data consists of multiple multivariate time series with "cycle" as the time unit, together with 21 sensor readings for each cycle. Each time series can be assumed to be generated by a different engine of the same type. In this simulated data, the engine is assumed to be operating normally at the start of each time series. It starts to degrade at some point during the series of operating cycles, and the degradation progresses and grows in magnitude. When a predefined threshold is reached, the engine is considered unsafe for further operation. The testing set includes operational data from 100 different engines. The engines in the test dataset are completely different from the engines in the training dataset. All the values are numeric and there are no missing values.
4.2 Methodology
The data uses simulated aircraft sensor values to predict when an aircraft engine will fail in the future so that maintenance can be planned in advance.
The code was developed in a Jupyter notebook running in a Python 3.8 environment and executed on a Core i5-73000 CPU processor. The LSTM classifier proposed in this paper is constructed using the Python deep learning library Keras. Long Short-Term Memory (LSTM) networks are especially appealing to the predictive maintenance domain because they are very good at learning from sequences. This paper establishes a Long Short-Term Memory recurrent neural network model for aircraft engine failure prediction. LSTM networks can automatically extract the right features from the data, eliminating the need for manual feature engineering. The expectation is that if there is a pattern in the sensor values within the window prior to failure, the pattern should be encoded by the LSTM. We build a deep network: the first layer is an LSTM layer with 100 units, followed by another LSTM layer with 50 units. Dropout is also applied after each LSTM layer to control overfitting. The final layer is a Dense output layer with a single unit and sigmoid activation, since this is a binary classification problem.
4.3 Experimental Results
In order to evaluate the effectiveness of the suggested approach, we resorted to common quality metrics, namely accuracy, F1-score, recall and precision [12]:

Accuracy = (tp + tn) / (tp + tn + fp + fn)      (2)
Recall = tp / (tp + fn)      (3)
Precision = tp / (tp + fp)      (4)
F1-score = 2 × (precision × recall) / (precision + recall)      (5)
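As an illustration, these metrics can be computed directly with scikit-learn once the binary predictions of the model are available; the label arrays below are purely illustrative assumptions, not results from this study.

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Illustrative true and predicted labels; in practice y_pred would come from
# thresholding the sigmoid output of the classifier at 0.5.
y_test = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 0, 1, 1])

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))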
The loss function of the LSTM was binary cross-entropy and the optimizer was 'adam'. We evaluated the performance of the model over 10 epochs with a validation split of 0.3. The results are presented in Table 1. Figure 4 shows a good fit between the training and validation curves: the loss decreases to a point of stability with a minimal gap between the two final loss values. The learning curve of the model has its highest validation loss at epoch 6 and its lowest validation loss at epoch 2. The learning curve for the training loss shows a good fit, and the learning curve for the validation loss shows noisy movements around the training loss (Fig. 5). After increasing the number of epochs to 50, the plots are as follows (Fig. 6): there is no large gap between the two curves. The plot of validation loss decreases to a given point and then starts increasing again. This inflection point in the validation loss may be the point at which training could be halted, as experience after that point shows the dynamics of overfitting.
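For illustration, a minimal Keras sketch of the network and training setup described above might look as follows; the sequence length, number of features, dropout rate and the data variables (seq_array, label_array) are assumptions made for the example, not the exact code used in this study.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

sequence_length, n_features = 50, 25   # illustrative window size and number of columns

model = Sequential([
    LSTM(100, return_sequences=True, input_shape=(sequence_length, n_features)),
    Dropout(0.2),
    LSTM(50),
    Dropout(0.2),
    Dense(1, activation="sigmoid"),    # binary output: failure within the window or not
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# seq_array: (n_samples, sequence_length, n_features), label_array: (n_samples, 1)
# model.fit(seq_array, label_array, epochs=10, validation_split=0.3)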
Fig. 4. Performance Learning Curves: Accuracy vs epochs
Fig. 5. Optimization Learning Curves: Loss vs epochs
Fig. 6. Optimization Learning Curves: Loss vs epochs after 50 epochs
In their article [13], M. Tarik and K. Jebari presented a comparative study of different machine learning algorithms used for classification tasks in maintenance prediction. For the aircraft engine data, Multilayer Perceptron (MLP), Support Vector Machine (SVM) and Decision Tree (DT) were compared, and the results are as follows (Table 2). The above-mentioned research attempted to study the comparative performance of different supervised machine learning algorithms in maintenance prediction.
Table 1. Results obtained for the LSTM model
Model | Accuracy | Precision | Recall | F-score
LSTM | 0.97 | 0.96 | 0.90 | 0.93
Table 2. Recapitulation of the results (Source: M. Tarik and K. Jebari 2020, p. 2684)
Model | Precision | Recall | F-score | Accuracy
MLP | 0.875 | 0.56 | 0.682 | 0.87
Decision tree | 0.933 | 0.56 | 0.7 | 0.88
SVM | 0.944 | 0.68 | 0.79 | 0.91
Liliya Demidova et al. [14], in their work, trained an LSTM on the aircraft engine dataset with the following structure:
– an LSTM layer with 100 neurons;
– a dropout layer with parameter equal to 0.2;
– a dense layer with one neuron and the sigmoid activation function for the classification problem;
– accuracy as the objective function and binary cross-entropy as the loss function.
The accuracy obtained is 0.97849, and the following table presents the comparison of the different works' results (Table 3):
Table 3. Comparison with other works results
Model | Precision | Recall | F-score | Accuracy
MLP | 0.875 | 0.56 | 0.682 | 0.87
Decision tree | 0.933 | 0.56 | 0.7 | 0.88
SVM | 0.944 | 0.68 | 0.79 | 0.91
LSTM 1 | 0.96 | 0.90 | 0.93 | 0.97
LSTM 2 | 0.92593 | 1 | 0.96154 | 0.97849
Table 3 shows the values of the learning outcomes for the different models. The LSTM models scored better in terms of accuracy and the other metrics.
5 Conclusion In the industrial environment, maintaining machinery and the different equipment in a functional state proves to be a complex task since numerous factors can affect their
general condition, such as operational parameters, temperature, run time and the presence of chemical agents. Indeed, the benefits of predictive maintenance, such as helping to determine the condition of equipment and predicting when maintenance should be performed, are considerable. Predictive maintenance is a prominent strategy that helps detect anomalies and failure patterns and provides early warnings. For aircraft engines, and in order to predict failures, a deep neural network was implemented. The Long Short-Term Memory (LSTM) model has been applied to predict when an aircraft engine will fail in the future so that maintenance can be planned in advance and device reliability can be improved. A comparative result analysis between the machine learning algorithms, namely Support Vector Machine (SVM), Decision Tree (DT) and Multi-layer Perceptron (MLP), and the deep learning approach based on the LSTM model is presented. Experiments demonstrate that, compared with other methods of anomaly detection for aircraft engines, the LSTM model shows good performance. The accuracy of the model reaches 97%.
References 1. Luo, B., Wang, H., Liu, H., Li, B., Peng, F.: Early fault detection of machine tools based on deep learning and dynamic identification. IEEE Trans. Ind. Electron. 66(1), 509–518 (2019). https://doi.org/10.1109/TIE.2018.2807414 2. Nguyen, K.T.P., Medjaher, K.: A new dynamic predictive maintenance framework using deep learning for failure prognostics. Reliab. Eng. Syst. Saf. 188, pp. 251–262 (2019). https://doi. org/10.1016/j.ress.2019.03.018 3. Yeh, C.-H., Lin, M.-H., Lin, C.-H., Yu, C.-E., Chen, M.-J.: Machine learning for long cycle maintenance prediction of wind turbine. Sensors (2019). https://doi.org/10.3390/s19071671 4. Keith Mobley, R.: An introduction to predictive maintenance. Elsevier science, USA (2002) 5. Noda, K., Yamaguchi, Y., Nakadai, K., Okuno, H.G., Ogata, T.: Audio-visual speech recognition using deep learning. Appl. Intell. 42(4), 722–737 (2014). https://doi.org/10.1007/s10 489-014-0629-7 6. Wu, D., Sharma, N., Blumenstein, M.: Recent advances in video-based human action recognition using deep learning: a review. In: 2017 International Joint Conference on Neural Networks, pp. 2865–2872 (2017). https://doi.org/10.1109/IJCNN.2017.7966210 7. Zhou, L., Zhang, C., Liu, F., Qiu, Z., He, Y.: Application of deep learning in food: a review. Compr. Rev. Food Sci. Food Saf. 18, 1793–1811 (2019). https://doi.org/10.1111/1541-4337. 12492 8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) 9. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM networks. In: Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, vol. 4, pp. 2047–2052 (2005). https://doi.org/10.1109/IJCNN.2005.1556215 10. Zhao, R., Wang, J., Yan, R., Mao, K.: Machine health monitoring with LSTM networks. In: 2016 10th International Conference on Sensing Technology (ICST), pp. 1–6 (2016). https:// doi.org/10.1109/ICSensT.2016.7796266 11. Manaswi, N.K.: RNN and LSTM. In: Deep Learning with Applications Using Python. Apress, Berkeley, CA (2018). https://doi.org/10.1007/978-1-4842-3516-4_9
12. Jalayer, M., Orsenigo, C., Vercellis, C.: Fault detection and diagnosis for rotating machinery: a model based on convolutional LSTM, fast Fourier and continuous wavelet transforms. Comput. Ind. 125, 103378 (2021) 13. Tarik, M., Jebari, K.: Maintenance prediction by machine learning: study review of some supervised learning algorithms. In: Proceedings of the 2nd African International Conference on Industrial Engineering and Operations Management Harare, Zimbabwe, 7–10 December 2020 14. Demidova, L., Ivkina, M., Marchev, D.: Application of the machine learning tools in the problem of classifying failures in the work of the complex technical systems. In: 2019 1st International Conference on Control Systems, Mathematical Modelling, Automation and Energy Efficiency (SUMMA) (2019). https://doi.org/10.1109/summa48161.2019.89
Towards an Approach for Studying the Evolution of Learners’ Learning in E-Learning Ilham Dhaiouir1(B) , Loubna Cherrat2 , Mostafa Ezziyyani1 , and Mohamed Khaldi3 1 Computer Science Department, Laboratory of Mathematics and Applications, Faculty of
Technical Sciences, Tangier, Morocco [email protected], [email protected] 2 National School of Business and Management, Abdelmalek Essaâdi University, Tangier, Morocco [email protected] 3 Computer Science Department, S2IPU, ENS Tetouan, UAE, Tetouan, Morocco [email protected]
Abstract. Evaluation is very important for monitoring the progress of students' learning, especially when it comes to distance learning, as it makes it easier for tutors of online courses to monitor the learners' learning via the results obtained in each test, to ensure that the course has been understood and that the proposed course meets the learning objectives of each learner. The problem is that MOOCs are free and open to a massive number of learners. In order to address this problem, we propose in our work a complete study on the grouping of students according to their grades, following a precise planning, to make it easier for tutors to adapt the contents to the level of each group of learners. For this reason, each student is asked to take a diagnostic test to check his or her level. The planning is based on three summative tests. According to the results obtained in the evaluation, the students are classified into three groups - advanced, average and weak - using the results obtained in the diagnostic examination and the K-means classification algorithm. The regression method is then used, by applying the Linear Regression algorithm, to predict which students will pass the exam after completing the online training. Keywords: Evaluation · Learning · MOOCs · Students · Tutors · Grouping · Diagnostic test · K-means · Regression
1 Introduction
Evaluation has an important role in determining the evolution of learner learning in online training, because it allows the teacher to ensure that the content he has offered to his students is adequate for their learning objectives [1]. Classroom teaching is easier to manage for a teacher because he is in front of a group about which he already has information, and he will be able to easily detect their
levels and their shortcomings thanks to the live interaction with them, unlike online education, where learners are hidden behind their computers and come from different geographical areas, with different levels, languages and cultures. According to [2], a distance course allows learners to adapt to this new type of learning, which requires familiarization with the new learning tool used; in most cases this generates a feeling of isolation among some learners who have difficulty adapting to the use of this new tool. To succeed in an online course, learners must respect the training schedule and be committed to following the courses and carrying out the requested activities, even if they have real autonomy in this type of learning. As for the teachers, they must develop the courses they offer and personalize the educational activities according to the characteristics of each learner profile, in order to increase the interactivity between learners and tutors and to eliminate any feeling of isolation. Also, the use of places of exchange such as the forum, the wiki and the chat favors supportive relations between the learners, because they exchange and share information and can remind each other of certain activities for those who missed something, since it is the place where the learner can raise any difficulty encountered. Asynchronous tools also allow learners to interact on specific questions and establish exchanges with their tutors, which facilitates the online learning process and allows teachers to easily manage learners enrolled in distance learning. Several researchers have proposed methods for predicting the evolution of learner learning in an e-learning course, either by following the traces left by learners in the MOOCs in which they have already participated to determine their evolution in relation to the content [3] and [4], or by adapting the content to the levels of the learners via the proposal of personalized courses and activities that can be adapted to any type of learner profile [5]. For [6], remediation can help learners to remedy their shortcomings, but this remediation is done randomly, without determining the shortcomings of all learners, which seems a difficult task to achieve given the massive number of learners and the heterogeneity of their profiles. Nevertheless, these approaches are not sufficient to instrument the monitoring needs of existing training, because this work has only been applied to some of the learners and lacks strategies, which in most cases has led to a cessation of training monitoring. Our work is organized as follows: Sect. 2 presents the results of the tests, which we analyze using the pandas library for data manipulation and analysis. Section 3 gives an overview of the regression method for predicting which students will pass the exam. Finally, Sect. 4 presents the conclusion and an overview of future work.
2 Case Study
This is an e-learning course entitled "Statistical Probability", with a total of 109 students. The traces on which our study is based are collected from the e-learning platform of the polydisciplinary faculty of Larache.
2.1 The Places Where the Traces Are Generated
The traces are collected from:
– Registration form.
– Diagnostic test.
– Summative test.
2.1.1 Registration Form
Before starting the course, participants are asked to fill in a registration form to get an idea of their profile: personal information, levels, background, learning interests, etc.
2.1.2 Data Pre-processing
To test the validity of the information that learners have entered in the registration form, they are required to take a diagnostic test. Based on the results obtained, we carry out the processing and analysis of the data by importing the necessary libraries. First, we import the Pandas, NumPy, Pyplot and Sklearn libraries. From the Scikit-learn library we need KMeans, so we call it from the sklearn.cluster submodule (Fig. 1).
Fig. 1. Importing libraries.
Read data and display first five lines (Fig. 2):
Fig. 2. Result of the diagnostic test of some students.
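A minimal sketch of the import and loading steps shown in Figs. 1 and 2 might look as follows; the CSV file name and the column layout are assumptions made for illustration, since the real data come from the e-learning platform.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Hypothetical file holding the diagnostic and summative test results
data = pd.read_csv("statistical_probability_tests.csv")
print(data.head())       # display the first five rows
print(data.describe())   # descriptive statistics (count, mean, min, max, ...)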
According to the analysis of the results obtained by the students in the diagnostic test, we note that in total 74 learners were able to validate the diagnostic test and 35 learners did not obtain the average. This means that about 68% of the learners have the necessary prerequisites that learners enrolled in this e-learning course should have. Depending on the results obtained, we will classify the learners into three groups with the same level of prerequisites.
– Group A: learners who have a very advanced level compared to the others; based on the scores of the pre-test, we limit the mark of this category to between 14 and 20.
– Group B: learners of average level, whose mark is between 10 and 13.
– Group C: learners with a very low level, whose mark is less than 10.
To display information about the data we will use two methods.
• First method: We used the describe() function to generate descriptive statistics, summarizing the number of values, the spread, the mean, the minimum and maximum, and the shape of the distribution of the dataset for the given Series object. According to the results obtained, we see that the 109 students registered in the MOOC were able to follow it from the beginning to the end, which shows that working with groups of the same level made it easier for the teacher to monitor the students' learning and to propose educational content appropriate to their learning objectives. As for the students, this shows that they were well supported during the training and that remediation played a very important role in improving their learning. The student no longer feels abandoned; on the contrary, he is well supervised and accompanied by his teacher, as if they were in face-to-face teaching.
Fig. 3. Display of data used information.
• Second method:
Fig. 4. Result of the pretest of some students.
According to the results obtained in the figures above, the distribution of the values is represented in the following columns (Fig. 5):
Fig. 5. The diagram of the results of the diagnostic test and the diagram of the results of the summative test.
In order to follow the evolution of the learning of the three groups of profiles that we have previously determined, the learners are led to follow the MOOC to the end by benefiting from different activities according to the learning pace of each group. The teacher offers each group personalized content that can be adapted to the characteristics of their profiles, or even the activities that they must carry out according to their abilities and levels. Whenever the teacher notices a gap in the learners’ knowledge, he immediately
offers them the necessary remediation. With regard to group C, from the beginning the teacher offers them, in parallel with the content of the training, remedial courses so that they acquire the prerequisites that groups A and B have. At the end of the course, learners are invited to take a summative test to check their overall learning during the MOOC.
From the figures above we can see the following. According to the diagram of the results of the diagnostic test, 74 learners were able to validate the test, against 35 learners who did not obtain the average. Based on the results obtained, the learners were classified into three groups:
Group A: the group of learners with an advanced level, with a mark from 14 up to 20.
Group B: the group of learners with an average level, with a mark between 10 and 14.
Group C: the group of learners with a very low level, with a mark lower than 10.
According to the diagram, group A contains 35 students, group B contains 39 students and group C contains 35 students. Then, according to the diagram of the results of the summative test, we find that:
Group A contains 58 students, which means an improvement of 13 learners.
Group B contains 27 students, because 9 students were upgraded and were able to join group A.
Group C contains 24 students, which means that 4 students improved their learning and were able to join group A.
So, according to this study, we find that the learning of the students has been improved thanks to the remediation they benefited from. The learners also showed great commitment during the MOOC thanks to the method of working with groups of students of the same level, which makes it easier for the teacher to monitor and supervise the learners. This study and the results obtained allow us to say that working with homogeneous groups will allow teachers to personalize the contents, the activities and the tests by adapting them to the characteristics of the learners' profiles, so
that MOOCs meet the needs of the various stakeholders, whether the teacher or the learners.
3 Clustering: K-means Method
3.1 Overview
Clustering is an unsupervised classification method whose objective is to separate unlabeled data into homogeneous groups with common characteristics [7]. The similarity between two data points can be deduced from the "distance" between their descriptors; thus, two very similar data points are two points whose descriptors are very close. This definition allows us to formulate the problem of data partitioning as the search for K means, around which the other data can be grouped.
3.2 Classification and Grouping of Learners
After the identification of the parameters in the previous step, the system moves on to the classification and grouping of learners. There are several ways to do this, but in our case we opted to use K-means for classification and regression for prediction. Following the study and analysis of the dataset, it is possible to divide the learners into three groups:
• Group A: the students who had excellent marks in the pre-test, which enabled them to easily understand the content of the MOOC and to pass the summative test with excellence. This group of learners showed their seriousness, their strong will and their interest in following the MOOC to the end; the teaching team noticed their commitment, especially in carrying out the activities requested by their teacher, in the time they spent connected on the platform following the courses and the videos, and in their interactions in the forum.
• Group B: those who have an average level and who had average marks in the diagnostic test. In this group we saw the improvement of some students who were able to join group A.
• Group C: those who still have problems of understanding and prerequisites. Thanks to the remediation, some students who were close to the average were able to progress and validate the summative test.
Figures 3 and 4 show the data, and Fig. 6 shows the K-means application code and the results obtained. With the use of the K-means classification algorithm, we were able to obtain three classes of learners: class A (advanced learners) in red, class B (average learners) in blue and class C (learners in difficulty) in green.
Fig. 6. The K-means application code.
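Continuing from the loading sketch above, a minimal version of such a K-means step might look as follows; the column names used as clustering features are assumptions made for illustration.

from sklearn.cluster import KMeans

# Assumed features: the diagnostic and summative test marks of the 109 students
X = data[["diagnostic_mark", "summative_mark"]].values

kmeans = KMeans(n_clusters=3, random_state=0, n_init=10)
labels = kmeans.fit_predict(X)   # cluster index (0, 1 or 2) for each student

data["group"] = labels
print(kmeans.cluster_centers_)   # centre of each group (A, B, C)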
4 Regression: Prediction of Students Who Will Pass the Exam
In this part we will build the labels and the division of the data (train and test). See the code below (Fig. 7):
Fig. 7. Result of the division of the data.
Then we will apply the Linear Regression algorithm on our data, and we obtain the following result (Fig. 8):
Fig. 8. Result of the Linear Regression algorithm.
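A minimal sketch of the label construction, the train/test split and the regression step described above might look as follows; the label definition (pass = summative mark of at least 10) and the column names are assumptions made for illustration.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Assumed label: 1 if the student passes the final exam, 0 otherwise
y = (data["summative_mark"] >= 10).astype(int)
X = data[["diagnostic_mark", "group"]]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

reg = LinearRegression()
reg.fit(X_train, y_train)
print("Score on the test set:", reg.score(X_test, y_test))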
The score obtained in the figure above shows the relevance of the classification: the grouping of the learners into homogeneous groups with the same profile characteristics and the same level was indeed achieved. Each class is well separated from the others, which shows that we obtained a good classification.
5 Conclusion and Perspectives During the training, the learners showed from the beginning their willingness to learn and their serious commitment and interest in following the training until the end. This was illustrated by the realization of the activities requested by their teachers and the time they spent connected on the platform following the courses and the videos as well as their interactions in the forum. The method of working in groups has made it easier for teachers to monitor learner learning, personalize content and supervise learners. As for the learners who have a low level, some of them were able to validate the training because they felt supervised and really followed by their teacher. They were able to progress and validate the summative test thanks to the remedial measures offered by their teacher during the training. This new method will lead to a good prediction of factors that can improve learner learning in a MOOC. The classification of learners according to their prerequisites at the start of an Elearning course is one of the major factors that positively influences the evolution of learners’ learning. Personalization and content recommendation are considered future work.
References 1. Glikman, V.: Quand les formations d’adultes “surfent” sur les nouvelles technologies. Recherche et formation 26, 99–112 (1997) 2. Glikman, V.: La “E-formation” entre globalisation des produits et pluralité des services. Actes du colloque 2001 Bogues. Globalisme et pluralisme, GRICIS, Montréal (2002) 3. Lund, K., Baker, M.J.: Teachers’ collaborative interpretations of students’ computer mediated collaborative problem-solving interactions. In: Lajoie, S.P., Vivet, M. (eds.) Proceedings of the International Conference on Artificial Intelligence and Education, Le Mans, Juillet 1999. Artificial Intelligence in Education, pp. 147–154. IOS Press, Amsterdam (1999)
4. Lund, K., Baker, M.J.: Interprétations par des enseignants des interactions d’élèves médiatisées par ordinateur. In: 3ème Colloque International Recherche et Formation des Enseignants: Didactique des Disciplines et Formation des Enseignants: Approche Anthropologique, Marseille, France (2000) 5. Bachimont, B.: Arts et sciences du numérique: Ingénierie des connaissances et critique de la raison computationnelle, Habilitation à Diriger des Recherches, Université de Technologie de Compiègne (2004) 6. Rodet, J.: La rétroaction, support d’apprentissage? Revue du conseil québécois de la formation à distance 4(2), 45–74 (2000) 7. Soni, N., Ganatra, A.: Comparative study of several clustering algorithms. Int. J. Adv. Comput. Res. 2((4)6) (2012). ISSN (print): 2249-7277, ISSN (online): 2277-7970
Chatbots Technology and its Challenges: An Overview Hajar Zerouani1(B) , Abdelhay Haqiq1,2 , and Bouchaib Bounabat1 1
ALQUALSADI Team, Rabat IT Center, ENSIAS, Mohammed V University in Rabat, Rabat, Morocco hajar [email protected], [email protected], [email protected] 2 ITQAN Team, LyRICA Laboratory, ESI, Rabat, Morocco
Abstract. A chatbot is a conversational agent that uses Artificial Intelligence (AI), and in particular Natural Language Processing (NLP), to interpret the text of a chat; instead of making direct contact with a live person, users can hold a conversation via text or voice. Chatbots are a fast-growing AI trend that involves applications communicating with users in a conversational style and imitating human conversation using human language. Many industries are attempting to include AI-based solutions such as chatbots to improve their customer service, in order to deliver better service to their customers with faster and less expensive support. This paper is a survey of published chatbots intended to discover knowledge gaps and indicate areas that require additional investigation and study. It starts from the history of chatbots and how they evolved, then describes chatbot architectures to understand how they work, identifies applications of chatbots in many domains, and finishes with the limitations that shorten their lifespan and how future work can improve chatbots for the best performance.
Keywords: Chatbots · Artificial Intelligence · Natural Language Processing · Deep Learning
1 Introduction
Interactive Agent, Artificial Conversational Entity, talk bot, chatterbot, human-computer dialogue system, digital assistant: these expressions all mean the same thing, a chatbot. The term is composed of the two words "chat" and "bot", referring to a bot for messaging that provides human-computer interaction (HCI) [1] and that can perform three forms of communication: speech [2], text and image. Over the past decade, AI has subtly transformed the world; machines can now learn and think without human intervention. Moreover, the number of chatbots has grown, especially in the last two years, as shown in Fig. 1, which presents the development of research in this field according to Google Scholar. AI can address the most serious issue that chatbots face today, namely their inability to understand and produce natural expression. AI chatbots
Fig. 1. Development of research in chatbots over the time by counting the number of papers according to Google Scholar.
overcome the limitations of rule-based chatbots by using NLP, Machine Translation (MT), Image Recognition (IR), Neural Networks (NNs) and many other branches of AI to reach the best user experience [3]. According to industry analysts, the worldwide chatbot market will be worth USD 2485.7 million by 2028, and chatbot adoption will save businesses $11 billion a year in the healthcare, banking, and retail sectors by 2023 [4]. Nowadays many industries are in the race to develop customer-service chatbots that offer rapid and smart development in many domains such as business, education, healthcare and governance. They allow users to receive answers to their questions in a timely manner without having to wait in phone lines, send several emails or search online for responses. Hence there are a variety of areas that a chatbot can serve, and chatbots are often regarded as a key to a company's long-term success and competitiveness. This paper covers a survey of chatbot design and architecture types and suggests steps to choose a suitable architecture. This paper is structured as follows:
2 History
Back in 1950, "can machines think?" was the question posed by Alan Turing [5] that revealed the rise of the chatbot concept. Sixteen years later, in the psychology domain, the first chatbot, named ELIZA [6], was designed to encourage the patient to talk, but its capabilities were too inflexible. Later, in 1970 [7], PARRY was created to imitate a paranoid patient [8]. Then, in 1988, the chatbot JABBERWACKY appeared; its objective was to switch from a text-based system to a completely voice-operated or verbal chatbot, and so far it has achieved only second place in the annual Loebner Prize [9]. The human-computer communication system's development process is shifting from "adapting people to computers" to "adapting computers to people"
[10]. In 1990, the prototype of the chatbot JULIA was created [11] by Michael Mauldin, the creator of the term 'chatterbot' [12]; it was famous at one time. Then, in 1992, an AI speech-synthesis chatbot called DR. SBAITSO was used in psychotherapy [13]. Next came ALICE, whose name is an acronym of Artificial Linguistic Internet Computer Entity; this chatbot is inspired by ELIZA and designed with pattern matching, or rule-based, techniques, which we discuss in the following section. ALICE is implemented in the Artificial Intelligence Markup Language (AIML), created in 1995 [14]. This language is used to specify the pattern-matching rules that connect words and phrases submitted by users to associated subject areas [15]. In 2001, a chatbot named SmarterChild [16], which ran on MSN Messenger, was used for entertainment [17]. Later, in 2007, IBM launched its question-answering (QA) system named Watson [18], which is still alive now and is designed for businesses, automating responses to customer inquiries [19]. That revolution pushed big tech companies like Google, Apple and Microsoft to launch their own chatbots, named virtual personal assistants [20]: in 2010 the SIRI chatbot was created by Apple [21], Google developed GOOGLE NOW in 2012 [22], ALEXA followed in 2015 from Amazon [23], and one year later Microsoft crafted CORTANA [24]. Figure 2 shows a brief chronological sequence of these chatbots.
Fig. 2. Chronological sequences of chatbots.
3 Architectures and Types of Chatbots
3.1 Architectures
A chatbot is also known as a conversational agent or an artificial dialogue system [25]. It is a computer system that acts as an interface between human users and software applications, communicating mostly through natural language (spoken or written). Chatbots are frequently portrayed, conceived, and developed as a flow of communication between multiple components, as shown in Fig. 3, which also gives the necessary details:
User Interface Component: the chatbot's service starts when it receives a request from a user via a text- or speech-based application [26], such as Facebook Messenger, Slack, WhatsApp, WeChat, Viber, or Skype.
User Message Analysis Component: the User Interface Controller sends the user's request to the User Message Analysis Component, which analyzes it to detect the user's intent and extracts entities using machine learning techniques or pattern matching [26].
Dialog Management Component: this component manages the conversation context and keeps it up to date. It saves the current intent as well as the identified entities until the conversation reaches that stage. If the chatbot is unable to gather the requisite context information, it will ask the user for additional context information to fill in the gaps [26].
Backend: when rule-based chatbots are used, a Knowledge Base (KB) is developed. It contains a list of handwritten answers to the user's inputs [26].
Response Generation Component: uses one or more of the three possible models to produce responses: Rule-based, Retrieval-based, or Generative-based [26].
Fig. 3. General Chatbot Architecture [26].
In [27], the authors proposed an architecture composed of layers and added a security layer, whereas another work [28] attempts a complete syntactic and semantic study of user inputs as part of the system design; [29] works on architecture too, but no architectural design is provided. Depending on the type of chatbot, the developer may choose the components to implement [26].
3.2 Types of Chatbots
Chatbots are classified depending on knowledge, covering a particular knowledge domain or more than one Knowledge Domain (KD): Generic chatbots (G) can answer any user query from any domain, while Domain-Specific (DS) chatbots can only respond to questions about a specific information domain, and chatbots that work across multiple domains are known as Cross- or Open-Domain (OD) chatbots. There is also a classification based on the response generation method: Rule-based (RLB), Retrieval-based (RB), Generative-based (GB) and Hybrid chatbots (see Fig. 4). Generative-based chatbots are useful for engaging an individual in informal open-domain conversations; based on the previous and preceding inputs, they use NLG to answer in a natural language that resembles a human's. Without creating new text responses, RLB chatbots select an answer from a collection of rules; this type is better for closed-domain communications. The RB model is more adaptable because it chooses the best solution based on a review and examination of available resources. If none of the rules fit, hybrid chatbots weigh the retrieved information against the generated response to determine which is better [26]. When the chatbot has completed an answer, it displays it to the user and waits for feedback.
Fig. 4. Different types of chatbots.
A chatbot's operation can be combined with human intervention in certain cases where more flexibility is needed: human computation is used in at least one aspect of a Human-Mediated (HM) chatbot, and staff working to incorporate their intelligence into fully Autonomous (A) chatbots are able to resolve their flaws. Chatbots may also be classified as Open-Source (OS) or Commercial (C), depending on the permissions granted by the development platform. Furthermore, another classification is based on the type of Communication Channel (CC) used by chatbots, which can be text, speech, image, or all three. Intrapersonal chatbots (RA) (Fig. 5) are close companions that live in the user's domain and are aware of his requirements; they are often connected to messaging apps such as Slack and WhatsApp. Interpersonal chatbots (ER) are those that provide services such as restaurant reservations, airline reservations, or FAQ searches without being a friendly companion. Finally, Inter-agent chatbots allow
bots to communicate with one another; Alexa and Cortana, for example, are two chatbots that have been linked together to converse [26].
Fig. 5. Difference between Intrapersonal/ Interpersonal/Inter-agent chatbots.
4 Discussion
Every paper customizes the features and the number of layers needed for a good chatbot architecture that responds to all the user's needs. The choice of a suitable chatbot approach, type, language and platform follows from the intelligence level of the chatbot and from its tasks, as shown in Fig. 6. The chatbot owner can answer the following questions to check that these choices are satisfied:
1 - Will the Chatbot Respond to New Questions or Not? This question helps to choose the suitable approach, as mentioned in Fig. 7: pattern matching for rule-based chatbots, and the other approaches for chatbots which need to learn from old conversations.
2 - What are the Tasks the Chatbot Can Do? The answer to this question classifies the chatbot type: KD, Service Provided (SP), Response Generation Method (RGM), Human Aid (HA) and Goals, as detailed in the examples of available chatbots in Table 1.
3 - Which Language Will the Chatbot Speak, and How Many? This question reveals the need for translation and allows checking the availability of corpora, datasets and models in this language.
4 - Will the Chatbot be Connected to Many Platforms Like Facebook, WhatsApp, Skype...? This question helps to determine which CC must be integrated (speech, text, image); the benefit of these platforms is that the user is already familiar with the messaging application, and it has a large population of users.
Fig. 6. Flowchart to build suitable architecture for chatbot.
Tables 1 and 2 present a list of chatbots in different domains using a set of metrics: KD, SP, RGM, HA, P, CC, Goals (whether the chatbot is Informative (IF), Task Based (TB) or Chat Based (CB)), the languages used and the links of the projects. There has been a major increase in the production and usage of chatbots in recent years, with significant benefits in a variety of domains. They work 24 hours a day, 7 days a week in customer service centers, handling many clients at the same time and lowering payroll costs significantly.
Table 1. Goals and Languages of existing Chatbots in different domains.
References | Chatbot's name | Domain | Languages | Goals
[30] | Suve: Covid-19 symptom checker | Health | English/Estonian | IF+TB
[28] | English practice (CSIEC) | Education | English | CB+TB
[31] | Chatbot platform for B2B services | Business | English | CB+TB
[32] | the LvivCityHelper bot | Governance | Ukrainian | IF+TB
Table 2. Type of existing chatbots in different domains.
References | KD | SP | RGM | HA | P | CC | Link
[30] | DS | RA | RLB+GB | A | OS | Text | https://eebot.ee/en/
[28] | G | RA | RLB | None | OS | Text/voice | http://www.csiec.com
[31] | OD | RA | None | A | C | Text/voice/image | https://itsalive.io
[32] | OD | RA | None | A | OS | Text | https://city-helper.com
Chatbots Technology and its Challenges: An Overview
63
ber of students by providing educational material and personal assistance. They also outperform human teachers in some situations, such as when they reduce language anxiety in foreign language students. They offer a variety of services to patients in the area of healthcare, but there is a risk when patients receive a less precise answer, that is why it is necessary to measure the effectiveness of chatbots especially in the health domain.
5 Conclusion
This survey aims to present a comprehensive view of chatbots in order to reduce the time researchers need to understand chatbots, their architectures, and their types, to help developers choose the most suitable approach, and to show examples of available chatbots in different domains together with the languages used and links to the projects. Future work will concentrate on the challenges and good practices of chatbots, as well as their performance indicators in terms of customer relations: how can we be sure that a bot is working at full capacity, and how can the performance of a chatbot be improved over time?
References 1. Følstad, A., Brandtzæg, P.B.: Chatbots and the new world of HCI. Interactions 24(4), 38-42 (2017) 2. Nass, C.I., Brave, S.: Wired for Speech: How Voice Activates and Advances the Human-computer Relationship, p. 9. MIT Press, Cambridge (2005) 3. Nirala, K.K., Singh, N.K., Purani, V.S.: A survey on providing customer and public administration based services using AI: chatbot. Multimedia Tools Appl., 1–32 (2022) 4. Nguyen, Q.N., Sidorova, A., Torres, R.: User interactions with chatbot interfaces vs. menu-based interfaces: an empirical study. Comput. Hum. Behav. 128, 107093 (2022) 5. Dennett, D.C.: Can machines think?. In: Teuscher, C. (eds.) Alan Turing: Life and Legacy of a Great Thinker, pp. 295–316. Springer, Heidelberg (2004). https://doi. org/10.1007/978-3-662-05642-4 12 6. Sharma, V., Goyal, M., Malik, D.: An intelligent behaviour shown by chatbot system. Int. J. New Technol. Res. 3(4), 263312 (2017) 7. Mezzi, R., Yahyaoui, A., Krir, M.W., Boulila, W., Koubaa, A.: Mental health intent recognition for Arabic-speaking patients using the mini international neuropsychiatric interview (MINI) and BERT model. Sensors 22(3), 846 (2022) 8. AbuShawar, B., Atwell, E.: ALICE chatbot: trials and outputs. Computaci´ on y Sistemas 19(4), 625–632 (2015) 9. Carpenter, R., Freeman, J.: Computing machinery and the individual: the personal turing test. Computing (2005). Accessed 22 Sept 2009 10. Akgun, M., Cagiltay, K., Zeyrek, D.: The effect of apologetic error messages and mood states on computer users’ self-appraisal of performance. J. Pragmat. 42(9), 2430–2448 (2010) 11. Curry, C.: Design, evolution & production of a storytelling chatbot (2011) 12. Deryugina, O.V.: Chatterbots. Sci. Tech. Inf. Process. 37(2), 143–147 (2010)
13. Zemˇc´ık, M.T.: A brief history of chatbots. DEStech Trans. Comput. Sci. Eng. 10 (2019) 14. Marietto, M.D.G.B., et al.: Artificial intelligence markup language: a brief tutorial. arXiv preprint arXiv:1307.3091 (2013) 15. Singh, J., Joesph, M.H., Jabbar, K.B.A.: Rule-based chabot for student enquiries. J. Phys. Conf. Ser. 1228(1), 012060 (2019) 16. Adamopoulou, E., Moussiades, L.: An overview of chatbot technology. In: Maglogiannis, I., Iliadis, L., Pimenidis, E. (eds.) AIAI 2020. IFIP AICT, vol. 584, pp. 373–383. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49186-4 31 17. Bhute, A.N., Meshram, B.B.: IntelligentWeb Agent for Search Engines. arXiv preprint arXiv:1310.4774 (2013) 18. Ferrucci, D.A.: Introduction to “this is Watson”. IBM J. Res. Dev. 56(3.4), 1 (2012) 19. High, R.: The Era of Cognitive Systems: An Inside Look at IBM Watson and How it Works. IBM Corporation, Redbooks, pp. 1–16 (2012) 20. Cahn, J.: CHATBOT: architecture, design, & development. University of Pennsylvania School of Engineering and Applied Science Department of Computer and Information Science (2017) 21. Ait-Mlouk, A., Jiang, L.: KBot: a Knowledge graph based chatBot for natural language understanding over linked data. IEEE Access 8, 149220–149230 (2020) 22. Ehrenbrink, P., Osman, S., M¨ oller, S.: Google now is for the extraverted, Cortana for the introverted: investigating the influence of personality on IPA preference. In: Proceedings of the 29th Australian Conference on Computer-Human Interaction, pp. 257–265, November 2017 23. Chung, H., Park, J., Lee, S.: Digital forensic approaches for Amazon Alexa ecosystem. Digit. Investig. 22, S15–S25 (2017) 24. Vadhera, A., Thute, A., Mala, S., Shankar, A.: Chatbot on COVID-19 for sustaining good health during the pandemic. In: Vadhera, S., Umre, B.S., Kalam, A. (eds.) Latest Trends in Renewable Energy Technologies. LNEE, vol. 760, pp. 271–284. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-1186-5 23 25. Shah, H., Warwick, K., Vallverd´ u, J., Wu, D.: Can machines talk? Comparison of Eliza with modern dialogue systems. Comput. Hum. Behav. 58, 278–295 (2016) 26. Adamopoulou, E., Moussiades, L.: Chatbots: history, technology, and applications. Mach. Learn. Appl. 2, 100006 (2020) 27. Wu, C., Szep, J., Hariri, S., Agarwal, N.K., Agarwal, S.K., Nevarez, C.: SeVA: an AI solution for age friendly care of hospitalized older adults. In: HEALTHINF, pp. 583–591 (2021) 28. Jia, J.: CSIEC: a computer assisted English learning chatbot based on textual knowledge and reasoning. Knowl. Based Syst. 22(4), 249–255 (2009) 29. Zahour, O., Eddaoui, A., Ouchra, H., Hourrane, O.: A system for educational and vocational guidance in Morocco: chatbot E-orientation. Procedia Comput. Sci. 175, 554–559 (2020) 30. H¨ ohn, S., Bongard-Blanchy, K.: Heuristic evaluation of COVID-19 chatbots. In: Følstad, A., et al. (eds.) CONVERSATIONS 2020. LNCS, vol. 12604, pp. 131– 144. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68288-0 9 31. Aarthi, N.G., Keerthana, G., Pavithra, A., Pavithra, K.: Chatbot for retail shop evaluation. Int. J. Comput. Sci. Mob. Comput. 9(3), 69–77 (2020) 32. Smith, B., Gorsuch, G.J.: Synchronous computer mediated communication captured by usability lab technologies: new interpretations. System 32(4), 553–575 (2004)
Machine Learning, Deep Neural Network and Natural Language Processing Based Recommendation System Manal Loukili1(B)
and Fayçal Messaoudi2
1
National School of Applied Sciences, Sidi Mohamed Ben Abdellah University, Fez, Morocco [email protected] 2 National School of Business and Management, Sidi Mohamed Ben Abdellah University, Fez, Morocco [email protected]
Abstract. Nowadays, data clustering has become a key research area in many fields. Text clustering is the process of grouping documents into categories based on their content, automatically assigning natural language texts to predetermined classes. Text clustering is one of the main requirements of text retrieval systems, which extract text in response to a user query, and of text understanding systems, which aim to transform text to answer questions, produce summaries, or extract data. In this paper, in order to solve this clustering problem, an unsupervised machine learning clustering technique is applied to the database of the newspaper "El País" to build a recommendation system for the journal articles. The study includes the following steps: first, the collection and preprocessing of the data by manipulating the keywords; then the analysis of the data with the K-means clustering method, followed by the choice of the number of clusters with the elbow method. Next, natural language processing is used as a similarity measure with the K-means clustering method to determine similar articles based on sections, keywords, and titles. A deep neural network is also implemented to predict the temporal probabilistic distribution of searched keywords. The clustering results show that there are twenty-three different segments of newspaper articles in this journal, and after training the neural network on the data, the temporal probability distributions of the searched keywords are visualized.

Keywords: Text Mining · Machine Learning · Clustering · k-Means · Deep Neural Network · Natural Language Processing

1 Introduction
In the modern world, natural language processing (NLP) has evolved to become strongly mathematically based. Numerous NLP challenges, including parsing [1],
plagiarism detection [2], word sense disambiguation [3], unsupervised paraphrasing [4], text summarization [5], and document clustering [6], have benefited significantly from the implementation of strong statistical techniques [7]. This paper focuses on an unsupervised learning approach to an NLP problem. As the volume of available online content has recently increased, rapid and efficient document clustering has become crucial. Text mining is the process of transforming unstructured text into structured data for analysis [8]. This practice is based on natural language processing technology, which enables machines to understand and process human language automatically. Machine learning is now able to classify texts automatically by sentiment, subject, or intention via several techniques, such as clustering.

Clustering is a statistical analysis technique that organizes raw data into homogeneous categories, where in each cluster the data share one or more common characteristics [9]. Several machine learning clustering algorithms are used to determine the closeness of the elements on the basis of specific criteria. To establish the balance, a clustering algorithm minimizes the inertia within the classes and maximizes the inertia between the subgroups in order to differentiate them well. The objective can be to prioritize or to partition the data. Thus, the goal of clustering algorithms is to make sense of the data and extract value from large amounts of structured and unstructured data: they separate data according to their properties or features and group them into different clusters based on their similarities.

In this paper we use machine learning techniques such as NLP and clustering to determine similar articles based on sections, keywords, and titles. We also use a neural network to forecast the dates associated with searched keywords. This article is organized as follows: the next section is a literature review, and Sect. 3 describes the methodology. Section 4 covers data preprocessing. Natural language processing and dimension reduction are presented in Sects. 5 and 6. Section 7 shows the implementation of the k-means algorithm, and the clustering results are presented in Sect. 8. Section 9 focuses on the prediction of article distribution dates by integrating NLP and an artificial neural network. Finally, Sect. 10 concludes the paper.
2 Related Work
According to [10], similar documents can be distinguished based on three elements, namely strings, corpus, and knowledge. In [11], a comparative study of partitioning clustering algorithms and hierarchical clustering schemes was conducted and showed that partitioning clustering algorithms have the best performance. In [12], the authors conducted a comparative study of similarity measures for web pages. They compared four similarity metrics in combination with different clustering methods, namely k-means, weighted graph partitioning, hyper-graph partitioning, self-organizing feature maps, and random clustering. The results showed that the cosine similarity metric outperformed the others, and that weighted graph partitioning surpassed the other clustering methods in terms of performance. In [13] the authors combined corpus-based
and knowledge-based semantic similarity to determine the degree of similarity between documents. As a result, they obtained a considerable improvement when the similarity and knowledge-based measures were combined. In [14] the authors used cosine similarity, K-means clustering and agglomerative clustering approaches to recover articles that are of interest to the user and whose relevance is based on the articles read by the user. In this work, our experimentation is conducted using term-based similarity measures. These terms are extracted from sections, keywords, and titles in order to determine the similarity between journal articles. Also, natural language processing was integrated with an artificial neural network to determine the probabilistic temporal distribution of these articles.
3 Methodology
The aim of this paper is to realize an intelligent clustering system for the articles of the newspaper "El País" [17] using the semantic meaning of the keywords, sections, and titles. Our approach starts from the raw data, on which we first perform pre-processing. We then interpret the text with NLP and vectorization techniques. Finally, we train a clustering model using K-Means and a model that predicts the distribution of articles over the years using a neural network. The figure below (Fig. 1) summarizes our approach.
4 Data Pre-processing
Our preprocessing starts from a DataFrame built from five Excel files, which we concatenate to obtain a single table. We then randomly select only 50,000 rows in order to reduce the execution time, and assign to each row whose date value is missing in the database its respective date extracted from the article link. We also change the format of the variable "Fecha 1" so that it can be interpreted correctly in the following steps, and remove the variable "Address", which only contains the link of each article. Finally, we clean the data by removing the HTML code present in the variables, normalizing the writing format, merging the names of similar sections, etc., as shown in Fig. 2 and Fig. 3.
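For illustration, a minimal pandas sketch of these steps is given below; the file names and most column names (the five Excel parts, "Title", "Keywords", "Section") are assumptions, since only "Fecha 1" and "Address" are named above.

```python
# Hypothetical sketch of the pre-processing described above (file/column names are assumptions).
import re
import pandas as pd

files = [f"elpais_part{i}.xlsx" for i in range(1, 6)]        # five Excel files (names assumed)
df = pd.concat([pd.read_excel(f) for f in files], ignore_index=True)

df = df.sample(n=50_000, random_state=42)                     # keep 50,000 random rows

# Normalise the date column "Fecha 1" so it can be parsed downstream
df["Fecha 1"] = pd.to_datetime(df["Fecha 1"], errors="coerce", dayfirst=True)

df = df.drop(columns=["Address"])                             # the link column is not needed

def strip_html(value):
    """Remove residual HTML tags from a text cell."""
    return re.sub(r"<[^>]+>", " ", value) if isinstance(value, str) else value

for col in ["Title", "Keywords", "Section"]:                  # column names are assumptions
    if col in df.columns:
        df[col] = df[col].map(strip_html)
```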
5 Natural Language Processing

5.1 Data Preparation
In this step we use the spaCy library to prepare our data. SpaCy is an open-source Python library for advanced natural language processing, developed for production use, that assists in building applications that deal with a high volume of text.
Fig. 1. Project pipeline map.
Fig. 2. "El País" newspaper data set.
Fig. 3. Data pre-processing.
SpaCy is used to pre-process text for deep learning and to construct information extraction or natural language understanding systems. With it we remove punctuation, conjunctions, and pronouns, and we normalize the data by transforming words from plural to singular, conjugated verbs to the infinitive, and feminine forms to masculine.
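A minimal spaCy sketch of this normalization step, assuming the Spanish model es_core_news_sm (the articles of "El País" are in Spanish):

```python
# Sketch of the spaCy normalisation step (model name is an assumption).
import spacy

nlp = spacy.load("es_core_news_sm")

def normalise(text: str) -> str:
    doc = nlp(text)
    # keep lemmas of content words; drop punctuation, stop words, pronouns and conjunctions
    tokens = [t.lemma_.lower() for t in doc
              if not (t.is_punct or t.is_stop or t.pos_ in {"PRON", "CCONJ", "SCONJ"})]
    return " ".join(tokens)

print(normalise("Los festivales de flamenco volvieron a Sevilla este año."))
```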
5.2 Data Vectorization
In order to convert our string data into a measure of the importance of each word relative to the corpus as a whole, we use the TF-IDF algorithm, which stands for Term Frequency-Inverse Document Frequency (Fig. 4). This is a very common algorithm for transforming text into a meaningful numerical representation that can be fed to a machine learning algorithm for prediction [16]. After vectorizing the text of all our instances, we notice that the dimension of the matrix is very large, so we proceed to dimension reduction.
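A minimal scikit-learn sketch of this vectorization step, assuming the normalized texts are stored in a column named clean_text:

```python
# Minimal TF-IDF vectorisation sketch (column name is an assumption).
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = df["clean_text"].tolist()
vectorizer = TfidfVectorizer(max_features=20_000)
X = vectorizer.fit_transform(corpus)   # sparse matrix: documents x terms
print(X.shape)
```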
6 Dimension Reduction
In this step, we use principal component analysis (PCA) to limit the storage space of the reduced matrix "X-reduced" and to reduce the computation time in the next step. PCA is a technique for reducing the dimensionality of such datasets, increasing interpretability while minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance. Finding these new variables, the principal components, amounts to solving an eigenvalue/eigenvector problem, and the new variables are defined by the available data set, not a priori, making PCA an adaptive data analysis technique. Variants of the technique have also been developed to adapt to different data types and structures.
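The sketch below uses TruncatedSVD, which plays the same role as the PCA step described here but works directly on the sparse TF-IDF matrix; the target dimension of 1,500 is an assumption chosen to match the dimensionality mentioned in Sect. 8.

```python
# Dimensionality-reduction sketch (sparse-friendly stand-in for PCA; target size assumed).
from sklearn.decomposition import TruncatedSVD

svd = TruncatedSVD(n_components=1500, random_state=42)
X_reduced = svd.fit_transform(X)       # dense matrix: documents x 1500
print(X_reduced.shape)
```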
Fig. 4. Data vectorization.
7 Clustering
We adopt the K-means algorithm since, after vectorization, we only have numerical data [8]. There are several methods for choosing the value of k; in our case we use the "elbow method", a heuristic used in cluster analysis to determine the number of clusters in a data set. This method consists of plotting the explained variation as a function of the number of clusters and choosing the elbow of the curve as the number of clusters of interest. The same approach can also be applied to choose the number of parameters in other data-driven models, such as the number of principal components needed to describe a data set. From the figure below (Fig. 5), we choose a value of k = 23.
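A minimal sketch of the elbow procedure and of the final clustering with k = 23 (the range of k values scanned is an assumption):

```python
# Elbow-method sketch: plot within-cluster inertia over a range of k values.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

ks = range(2, 40)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_reduced)
    inertias.append(km.inertia_)

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia")
plt.show()

final_model = KMeans(n_clusters=23, n_init=10, random_state=42).fit(X_reduced)
labels = final_model.labels_           # cluster label of each article
```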
Fig. 5. Elbow method for choosing the value of k.
8 Visualization of Clusters
As our clusters are now well defined, we need to visualize them. This is difficult since they live in a very high-dimensional space (R^1500). To remedy this, we take advantage of the t-SNE algorithm, a nonlinear feature-extraction algorithm that constructs a new representation of the data such that points that are close in the original space have a high probability of having nearby representations in the new space, while points that are distant in the original space have a low probability of being mapped close together. In practice, the similarity between each pair of points, in both spaces, is measured by probabilistic calculations based on distributional assumptions, and the new representations are constructed so as to minimize the difference between the probability distributions measured in the original space and those in the new space. We pass the target dimension of the projection as a parameter of the function that runs t-SNE; in our case we choose 2D and 3D. The figure below, Fig. 6, shows the result of the visualization in two dimensions, and Fig. 7 shows the distribution of points and clusters in a three-dimensional space.
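A minimal sketch of the 2D projection, assuming X_reduced and the cluster labels from the previous step:

```python
# t-SNE sketch: project the reduced vectors to 2D and colour points by cluster.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

emb_2d = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X_reduced)
plt.scatter(emb_2d[:, 0], emb_2d[:, 1], c=labels, s=3, cmap="tab20")
plt.title("Article clusters (t-SNE, 2D)")
plt.show()
```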
Fig. 6. Distribution of the different clusters in 2D.
Fig. 7. Distribution of the different clusters in 3D.
9 Integration of Natural Language Processing with an Artificial Neural Network
In this part we integrate the output of our TF-IDF vectorizer, reduced using PCA, with a deep neural network in order to predict the date of the text passed to the network. We follow the architecture shown in the figure below (Fig. 8).
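The sketch below illustrates the idea with a small Keras network; the exact layer sizes of Fig. 8 are not reproduced, and the year encoding (year_index, n_years) is assumed to be prepared beforehand.

```python
# Hypothetical sketch of the date-prediction network: reduced TF-IDF features in,
# a softmax distribution over publication years out.
from tensorflow import keras

n_years = 10                                   # assumption: the articles span about ten years
y = keras.utils.to_categorical(year_index, num_classes=n_years)   # year_index assumed given

model = keras.Sequential([
    keras.layers.Input(shape=(X_reduced.shape[1],)),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(n_years, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X_reduced, y, epochs=5, batch_size=128, validation_split=0.1)

# The softmax output for a new text is its temporal probability distribution over years.
probs = model.predict(X_reduced[:1])
```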
9.1 Results of the Temporal Probability Distributions of the Searched Keywords
After training our neural network on the data, we visualize the temporal probability distributions of some example keywords (Fig. 9, Fig. 10, Fig. 11). Figure 9 shows the temporal distribution of the keyword Brexit, which was the topic of the moment in 2016, from the first negotiations until the withdrawal of the UK. Similarly, Fig. 10 shows the distribution of the keyword Flamenco Festival, an event that takes place every year at almost the same date; for this reason we observe a certain periodicity in the number of articles per year. Finally, Fig. 11 shows the large quantity of articles related to the Coronavirus pandemic, a situation that started at the end of 2019 and continued into 2021.
Fig. 8. Architecture of date prediction system.
Fig. 9. Graphical representation of the temporal probabilistic distribution of the keyword: Brexit.
74
M. Loukili and F. Messaoudi
Fig. 10. Graphical representation of the temporal probabilistic distribution of the keyword: Flamenco Festival.
Fig. 11. Graphical representation of the temporal probabilistic distribution of the keyword: Covid-19 virus.
10 Conclusion and Outlook
In this paper, we implemented a system that allows clustering of articles as well as prediction of the temporal distribution of articles. We started with a preprocessing of the data by manipulating the keywords. Then we used automatic language processing as a similarity measure with the k-means method to determine the different clusters. And then we integrated NLP with an artificial neural network to obtain the probabilistic date distributions. This project is useful for text mining using NLP and clustering, predicting the temporal distribution
based on the titles and keywords contained in the articles. In a future paper, we will implement a recommendation system based on the results obtained here. Moreover, the suggested approach can be applied in other fields, including e-commerce.
References 1. Aqel, D., AlZu’bi, S.: Comparative study for recent technologies in Arabic language parsing. In: 2019 Sixth International Conference on Software Defined Systems SDS, pp. 209–212. IEEE (2016). https://doi.org/10.1109/SDS.2019.8768587 2. Vanik, K., Gupta, D.: Using k-means cluster-based techniques in external plagiarism detection. In: 2014 International Conference on Contemporary Computing and Informatics IC3I, pp. 1268–1273. IEEE (2016). https://doi.org/10.1109/IC3I. 2014.7019659 3. Barzilay, R., Leel, L.: An unsupervised approach using multiple-sequence alignment. In: Proceedings of HLTNAACL, Edmonton, pp. 16–23 (2003). https://doi. org/10.48550/arXiv.cs/0304006 4. Siddique, A.-B., Oymak, S., Hristidis, V.: Unsupervised paraphrasing via deep reinforcement learning. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020, USA, pp. 1800– 1809 (2020). https://doi.org/10.1145/3394486.3403231 5. Gark, K.-D., Khullar, V., Agarwal, A.-K.: Unsupervised machine learning approach for extractive Punjabi text summarization. In: 2021 8th International Conference on Signal Processing and Integrated Networks SPIN, Noida, India, pp. 750–754. IEEE (2021). https://doi.org/10.1109/SPIN52536.2021.9566038 6. Al-Azzawy, D.-S., Al-Rufaye, S.-M.: Arabic words clustering by using k-means algorithm. In: 2017 Annual Conference on New Trends in Information and Communications Technology Applications NTICT, pp. 263–267. (2017). https://doi. org/10.10007/1234567890 7. Rangu, C., Chatterjee, S., Valluru, S.-R.: Text mining approach for product quality enhancement: (improving product quality through machine learning). In: 2017 IEEE 7th International Advance Computing Conference IACC, pp. 456–460. IEEE (2017). https://doi.org/10.1109/IACC.2017.0100 8. Agnihotri, D., Verma, K., Tripathi, P.: Pattern and cluster mining on text data. In: 2014 Fourth International Conference on Communication Systems and Network Technologies, Bhopal, India, pp. 428–432 (2014). https://doi.org/10.10007/ 1234567890 9. Huang, D., Wang, C.-D., Peng, H., Lai, J., Kwoh, C.-K.: Enhanced ensemble clustering via fast propagation of cluster-wise similarities. IEEE Trans. Syst. Man Cybern. Syst. 51, 508–520. IEEE (2021). https://doi.org/10.1109/TSMC.2018. 2876202 10. Gomaa, W.-H., Fahmy, A.-A.: A survey of text similarity approaches. Int. J. Comput. Appl. 68(13), 13–18 (2013) 11. Huang, A.: Similarity measures for text document clustering. In: Proceedings of the sixth New Zealand Computer Science Research Student Conference NZCSRSC 2008, Christchurch, New Zealand, pp. 49–56 (2008) 12. Strehl, A., Ghosh, J., Mooney, R.: Impact of similarity measures on web-page clustering. In: Workshop on Artificial Intelligence for Web Search, USA, pp. 58– 64. AAAI (2000)
13. Aggarwal, N., Asooja, K., Buitelaar, P.: Pushing corpus based relatedness to similarity: shared task system description. In: Proceedings of the First Joint Conference on Lexical and Computational Semantics, Montr´eal, Canada, pp. 643–647 (2012) 14. Renuka, S., Raj Kiran, G.S.S., Rohit, P.: An unsupervised content-based article recommendation system using natural language processing. In: Jeena Jacob, I., Kolandapalayam Shanmugam, S., Piramuthu, S., Falkowski-Gilski, P. (eds.) Data Intelligence and Cognitive Informatics. Algorithms for Intelligent Systems, pp. 165– 180. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-8530-2 13 15. Spacy Python Library. https://spacy.io/. Accessed 1 May 2022 16. Towards Data Science. https://towardsdatascience.com/tf-idf-for-documentranking-from-scratch-in-python-on-real-world-dataset-796d339a4089. Accessed 5 May 2022 17. El Pais Journal. https://elpais.com/. Accessed 1 May 2022
Artificial Intelligence for Fake News Imane Ennejjai1(B) , Anass Ariss1 , Nassim Kharmoum1,2 , Wajih Rhalem3 , Soumia Ziti1 , and Mostafa Ezziyyani4 1
2 3 4
Department of Computer Science, Intelligent Processing Systems and Security Team, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco [email protected] National Center for Scientific and Technical Research (CNRST), Rabat, Morocco E2SN Research Team, ENSAM Rabat, Mohammed V University in Rabat, Rabat, Morocco Mathematics and Applications Laboratory, Faculty of Sciences and Techniques of Tangier, Abdelmalek Essaadi University, Tetouan, Morocco
Abstract. Fake news is a severe problem on social media networks, with confirmed detrimental consequences for individuals and organizations, so detecting false news is a significant challenge. The topic of fake news and its proper detection is therefore crucial. Common sense states that the receiver of information must verify the sources; however, verifying newly created information can be a difficult problem that requires more than a single viewpoint based on one news source. The objective of this paper is to evaluate the performance of six deep learning models for fake news detection, including CNN, LSTM, Bi-LSTM, HAN, Conv-HAN, and BERT. We find that BERT and similar pre-trained models perform best for fake news detection. The models are described below with their experimental setups and are examined against the ISOT dataset [22, 30].
Keywords: Fake news detection · Natural language processing · Deep learning

1 Introduction
Fake news is information that is incorrect or misleading and is presented as news. With today’s technological breakthroughs, there is an excess of information available on numerous digital platforms, but no proper methods to filter or validate this information. Fake news has been around since the 1835 publication of the “Great Moon Hoax”. Fake news for different commercial and political reasons has appeared in enormous numbers and spread throughout the internet world in recent years, owing to the rapid development of online social networks. Fake news undercuts legitimate media coverage and makes it more difficult for journalists to cover major news events. According to a BuzzFeed investigation,
the top 20 fake news articles regarding the 2016 U.S. presidential election generated more Facebook engagement than the top 20 election stories from 19 major media sites. Anonymously hosted fake news websites without identified authors have also been chastised, as they make it harder to pursue false news producers for libel. Fake news is a neologism for fake information. This form of news, which can be found in traditional news, social media, or fake news websites, has no basis in reality but is portrayed as true. Nowadays, most information that individuals obtain is largely unverified and commonly presumed to be true, which is where the problem starts, leading to anarchy in the country. This situation has particularly reached a tipping point after certain key political events occurred two years ago, leading up to the advent of the COVID-19 epidemic. Fake news has pushed people to the extremities of their ethnic, political, and religious identities, causing instability in nearly every corner of the country. And here’s the crazy thing: a lie spreads faster, deeper, and wider than the truth. Furthermore, because there is no powerful and non-partisan fact-checking group, the problem is escalating. Facebook has claimed to be expanding its fact-checking program as part of its continuous efforts to improve the validity and quality of material found on the site. On the government side, the Ministers Council authorized a measure proposed by the attorney-general to combat hate propaganda and disinformation circulating in November 2019 [31] under the Computer Crime Proclamation. Although the policy is thought to be vital for mitigating the severity of the difficulties, some fear it may weaken free expression and digital rights by drawing on the government’s experience with harsh laws, online censorship, and full Internet outages. Because individuals are not given news literacy at an early time, and the essence of great professional journalism is not adequately entertained, implementing the law may not be as successful as planned. The following parts of this work are organized as follows: The second section elucidates the motivation and the statement of the problem. The third section is an overview of prior relevant work. The approach is summarized in the fourth part. The fifth portion will focus on feature extraction and model implementation. The sixth section explains the outcome and discusses how to put it into practice. Finally, the last part summarizes our findings and details our future plans.
2 Motivation and Statement of the Problem
Fake news is identified by classifying it according to its validity. This is a binary classification problem in a basic setting, but a finer-grained, multi-class classification task in a more difficult context. Detecting false news has recently become one of the most active artificial intelligence research subjects. The internet and people's propensity to share information through social media make it simple to create false news and distribute it throughout the world, and fake news, when extensively distributed, may have a significant detrimental influence on many parts of life. As a result, a lot of research has recently been done to recognize false news. With the growth of social media, people are exchanging thoughts and information more quickly than ever before, and there are now a number of approaches to detecting false news [1]. Despite widespread public interest, detection
of fake news has stalled for some time due to a lack of fake news data. Fake news identification did not exhibit remarkable results until 2017, when the Liar dataset [1] was introduced; the size of this dataset is not sufficient for neural network analysis, and some models suffer from overfitting. The majority of existing approaches for detecting false news are based on deep learning, as deep learning models have produced cutting-edge achievements in a variety of artificial intelligence fields such as natural language processing, computer vision, and so on. In particular, because earlier comparison studies were done on a single type of dataset, it was impossible to draw conclusions about the performance of alternative models. Furthermore, this research concentrated on a small number of traits, resulting in an insufficient examination of the probable characteristics of fake news. Fake news pieces targeting certain news topics, such as health, education, tourism, sport, economy, security, science, IT, and political elections, may be found in the collection. Regarding the news domain, the bulk of the assessed datasets in the literature contained false news items related to politics and society, followed by technology, economy, science, and crime. Our goal in this paper is to give a comparative analysis of the performance of existing approaches on a dataset. We also combine diverse features from current works and examine the effectiveness of several popular text categorization approaches. In this paper, we compare the performance of classical machine learning and deep learning models on a single dataset including news on a variety of themes, and we show the results of sophisticated models such as Convolutional-HAN, Bidirectional LSTM, and BERT.
3 Related Work
Fake news is deliberately written misleading material intended to deceive the public, and it consists of two parts: authenticity and intent. Authenticity means that fake news contains false information that can be verified as such; this excludes conspiracy theories, which are difficult to prove true or false in most cases. The second part, intent, means that the false information was written with the intention of misleading the reader. Fake news has been around for a very long time, almost as long as news began to circulate widely after the invention of the printing press in 1439. However, there is no agreed definition of the term "fake news". Fake news has quickly become a societal issue, used to spread false information or rumors to change people's behavior, and its spread has been shown to have had a significant influence on the 2016 US presidential elections. Broader definitions of fake news focus on the authenticity or intent of the news content. Some newspapers describe satire news as fake news because the content is false, even though satire is often intended as entertainment and reveals its own deception to consumers. Other publications directly claim misleading news as fake news, which includes serious fabrications, hoaxes, and satires. In order to expand on fake news detection, it is important to understand what fake news is and how it is characterized. The following is based on "Fake News Detection on Social Media: A Data Mining Perspective" [25].
The study of Shu, Sliva, Wang, Tang, and Liu [25] found that user interactions are significant and meaningful. Furthermore, they proposed using linguistic characteristics such as the total number of words, characters per word, frequencies of large words, sentence-level frequencies (i.e., n-gram and bag-of-words approaches), and part-of-speech (POS) tagging. They took an in-depth look at fake news detection on social media from the perspective of data mining, evaluation metrics, and representative datasets. On his proposed Liar dataset [1], Wang compared the performance of SVM, LR, Bi-LSTM, and CNN models. Several studies have shown good results in identifying bogus news and monitoring user propagation using neural networks; in [1], Wang created a hybrid convolutional neural network model that outperforms other standard machine learning algorithms. Kai Shu et al. [25] aim to use the three auxiliary sources of information accessible in social media to correctly identify news: the news content, the publishers, and the users; news publishers, news items, and social media consumers form a three-way connection. Lin et al. [27] managed to detect the stance of a news item using the BERT model from a single direction of inference, which may have led to the loss of vital information. Ruchansky et al. [3] introduced a hybrid deep model for detecting fake news that used multiple sorts of variables, such as temporal engagement between n users and m news items over time, to produce a label for categorizing false news as well as a score for suspicious users. They extracted temporal aspects of news items using an RNN and social features using a fully connected network; the two networks' outputs are then combined and used for the final categorization. They tested their model on two datasets, one from Twitter and the other from Weibo, a Chinese version of Twitter, and CSI outperforms simpler models such as basic GRU networks by 6%. Bajaj et al. [8] addressed the problem from a pure NLP standpoint using convolutional neural networks (CNN). This Stanford University project aims to create a classifier that can determine whether material is true or fraudulent based only on its content; several architectures were investigated, including a novel CNN design that includes an attention mechanism. Tacchini et al. [4] focus on using social media features to improve the reliability of their detector. They used logistic regression and the harmonic algorithm [6] to classify information into hoax and non-hoax categories; the harmonic algorithm is a method for transferring information between users who liked some common posts, and it surpassed logistic regression. Ahmed et al. [6] suggested a new n-gram model for identifying fake material automatically, with an emphasis on reviews and news. The authors published the results of two strategies for extracting distinct attributes, namely TF and TF-IDF, together with six machine learning classification algorithms; linear classifiers, namely the linear Support Vector Machine (SVM), Stochastic Gradient Descent (SGD), and Logistic Regression (LR), performed better than nonlinear ones for fake reviews and news. Pérez-Rosas et al. [7] introduced two new datasets for the fake news detection task, covering seven different news areas. From a set of learning experiments to
detect fake news, the authors concluded that accuracies of up to 76% could be obtained. Guibon et al. [28] proposed several methods for fake news detection systems and sought to find a relationship between satire and fake news using redundant data, achieving 93% accuracy over several datasets. Other work used dense neural networks to identify the stance between a headline and the article text, dividing the stance into four categories: agree, disagree, discuss, and irrelevant; dense neural networks were combined with three forms of embedding, TF-IDF, bag of words, and Word2vec with cosine similarity between the headline and the text, yielding accuracies of 94.31%, 89.23%, and 75.67%, respectively. Khan et al. [10] specifically compare traditional algorithms such as logistic regression, support vector machines, decision trees, Naive Bayes, and K-Nearest Neighbors against a wide range of neural architectures based on CNNs or LSTMs; the Naive Bayes classifier works surprisingly well, while the performance of the neural networks depends on expanding the underlying dataset. To detect false news, most past research has used a combination of classical machine learning and neural networks. However, these works concentrated on recognizing specific categories of data (such as politics); as a consequence, they built models and features for the specific datasets that corresponded to their research topic, and such methods are likely to be affected by dataset bias and to perform badly on news from different topics. Some previous research has compared several strategies for detecting false news, but the main drawback of previous comparison studies is that they were done on a single type of dataset, making it impossible to draw conclusions about the performance of multiple models. Additionally, these articles focused on a limited number of features, which resulted in an incomplete exploration of the potential characteristics of fake news. We have seen that most of the related work focuses on improving the quality of the prediction by adding additional features; the point is that these features are not always available (for example, some articles may not contain images). There is also the issue of using information from social media, because it is simple to create a new account and fool the detection system. That is why we decided to concentrate entirely on the article's content to determine whether we can effectively spot false news. There are numerous techniques for extracting features and using them in models, as discussed in the preceding sections; this study focuses on the accessibility of text-based news content.
4 Overview of the Approach
In our proposed framework, illustrated in Fig. 1, we begin with data collection for training our models and preprocess the dataset by removing unnecessary characters and words. N-gram features are then extracted and a feature matrix representing the documents is formed. The last step in the classification process is to train the classifier; we studied different classifiers to predict the class of the documents, specifically six deep learning algorithms: CNN, LSTM, Bi-LSTM, HAN, Convolutional HAN, and BERT.
Fig. 1. Workflow for training algorithms and classification of news.
5 Experimental Evaluation

5.1 Datasets Statistical Information
The ISOT Fake News Dataset, created by the ISOT Research Lab at the University of Victoria in Canada [22], is the biggest dataset of full-length fake news stories available. There are 44,898 items in the ISOT collection, with 21,417 labeled Real and 23,481 labeled Fake (Table 1). Each model will first be trained with 80% of the ISOT data; the remaining 20% of the ISOT data will be used to assess the accuracy of the trained classifiers. FakeNewsNet and the original data will also be used for testing, as previously stated. The purpose of these extra tests is to ensure that we are detecting fake news rather than any other pattern in the ISOT dataset, such as a news organization's style. Each article in the ISOT dataset tagged as Real was obtained from Reuters, and all such stories began with the term "Reuters". Humans and robots alike might easily recognize this pattern, so the term "Reuters" was removed from the beginning of each item. We use word clouds of real and fake news to visualize this dataset and gain insight into it; Fig. 2 shows a word cloud of the dataset's real and fake news.
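A minimal sketch of the 80/20 split, assuming the articles and labels are stored in columns named text and label:

```python
# Stratified 80/20 train/test split of the ISOT articles (column names are assumptions).
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.20, stratify=df["label"], random_state=42)
```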
5.2 Studied Features
Tf-Idf: Term Frequency-Inverse Document Frequency. TF-IDF (Term Frequency-Inverse Document Frequency) is an analysis method that can be used, for example in an SEO strategy, to determine the keywords and terms that increase the relevance of published texts and therefore of the web project as a whole.
Table 1. The fragments of the ISOT dataset.

          | Real                      | Fake
News Type | World news, Politics news | Government news, Middle east news, US news, Left party news, Politics news, General news
Size      | 21,417                    | 23,481
Fig. 2. Word clouds of the fake and true news items in the dataset.
It is a formula in which the two values TF (Term Frequency) and IDF (Inverse Document Frequency) are multiplied together; the result is the relative weight of a term in a document compared to all other documents that also contain the keyword in question. Before the TF-IDF analysis can be performed, the two factors must first be determined. Term Frequency describes how often a certain term appears in a document compared to all other terms contained in the document; to increase the significance of the measured value, the formula can be based on a logarithm, which prevents middling terms from receiving too much weight. Term Frequency was mentioned for the first time in 1992 in the work of Donna Harman, who in her article "Ranking Algorithms" saw it as a way of giving the words of a given document a weighting value useful to information science. In website optimization, the TF value has been used for some time as an alternative to the less flexible keyword density, which simply reflects the relative frequency
of a keyword. The formula for determining the Term Frequency is as follows:

TF = (number of times the term appears in the document) / (total number of words in the document)   (1)

IDF = log( (number of documents in the corpus) / (number of documents in the corpus that contain the term) )   (2)

The TF-IDF of a term is calculated by multiplying the TF and IDF scores:

TF-IDF = TF × IDF   (3)
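For a quick worked illustration (the numbers are arbitrary): if a term appears 3 times in a 100-word document, TF = 3/100 = 0.03; if the corpus contains 1,000 documents and 10 of them contain the term, IDF = log(1000/10) = log(100) = 2 (using base-10 logarithms), so TF-IDF = 0.03 × 2 = 0.06.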
Word Embedding. Word embeddings are dense vector representations of words and texts. Traditional bag-of-words coding techniques employed vast, sparse vectors to represent each word, or marked each word within a vector representing the complete vocabulary. Two examples of methods for learning word embeddings from text are Word2Vec and GloVe. In addition to these carefully designed methods, a word embedding can also be learned as part of a deep learning model; this can be a slower approach, but it fits the embedding to the specific training data. Word2Vec is available in two modes, continuous bag of words (CBOW) and skip-gram, and was originally designed to predict a word in context: for example, given the two previous and the two following words, which word is most likely to occur between them. The hidden representation learned for these words turns out to work well as a word embedding and has some very interesting properties, such that words with similar meanings have similar vector representations; it is even possible to perform vector arithmetic that captures relations such as singular/plural or capitals and countries. GloVe is a famous unsupervised learning technique for word representation in vector space, developed at Stanford in 2014 [29]. It takes advantage of the word2vec skip-gram model and of collaborative filtering methods, also known as matrix factorization: GloVe builds a word-word co-occurrence matrix from the whole training corpus and maps each word to a semantically meaningful position in the space, keeping the distance between related words small [20].
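A minimal gensim sketch of the Word2Vec variant described above (CBOW with negative sampling, 100-dimensional vectors), assuming a pre-tokenized news corpus:

```python
# Word2Vec sketch with gensim (CBOW, negative sampling, 100-dimensional vectors).
from gensim.models import Word2Vec

sentences = [doc.split() for doc in corpus]   # corpus of pre-processed documents (assumed)
w2v = Word2Vec(sentences, vector_size=100, window=5, sg=0, negative=5,
               min_count=2, workers=4)

# Nearby words in the embedding space (assuming the query word is in the vocabulary).
print(w2v.wv.most_similar("news", topn=5))
```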
5.3 Feature Extraction in the Model
Data Preprocessing. Textual data requires special preprocessing before machine learning or deep learning algorithms can be applied to it. Various techniques are widely used to convert textual data into a form ready for modeling; the preprocessing steps we describe below are applied to the news content. We also provide information on the different word-vector representations used in our analysis. Word Cloud: before starting the preprocessing, we visualize our data through a word cloud of the most frequently used keywords.
Punctuation Removal: punctuation provides the grammatical context of a sentence, but punctuation marks such as commas may not add much value to understanding its meaning, so they are removed.

Stop Word Removal: we next remove stop words from the text data. Stop words are insignificant words in a language that create noise when used as features in text classification; these most common words do not provide much context and can be filtered from the text because they are frequent and carry little useful information. We used the Natural Language Toolkit (NLTK) library to remove stop words.

Stemming: stemming is a technique for removing prefixes and suffixes from a word, ending with the root. Using the root, we can reduce the inflectional and sometimes derivational forms of a word to a common base form.

Feature Extraction. The performance of deep learning models depends in large part on the design of the features.

Extraction of N-Gram Features: word-based n-grams were used to represent the context of a document and to generate features for classifying the document as fake or real. Many existing works have used unigram (n = 1) and bigram (n = 2) approaches for the detection of false news [11]. We used the TfidfVectorizer function from the scikit-learn feature extraction library to generate n-gram TF-IDF features, as sketched below.

Pre-trained Word Embedding: for the neural network models, word embeddings were initialized with pre-trained 100-dimensional GloVe embeddings [29], trained on a dataset of one billion tokens (words) with a vocabulary of 400 thousand words. In the LSTM architecture with gensim, we used Google's Word2Vec to represent words as 100-dimensional vectors.

Bag of Words (BoW): the bag-of-words technique treats each news item as a document and counts the frequency of each word in that document, which is then used to create a numerical representation of the data, also known as fixed-length feature vectors. Bag of words converts plain text into a word-count vector with the CountVectorizer function: CountVectorizer tokenizes the text, builds the vocabulary, and encodes the text into a vector whose entries are occurrence counts, stored as key/value pairs. This methodology has drawbacks in terms of information loss: the relative position of words is ignored and context information is lost.
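A minimal scikit-learn sketch of the n-gram TF-IDF and bag-of-words feature extraction described above (the two toy sentences are placeholders):

```python
# Feature-extraction sketch: unigram/bigram TF-IDF and a plain bag-of-words count matrix.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

texts = ["fake claim about a miracle cure", "official statement from the ministry"]

tfidf = TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)   # unigrams + bigrams
X_tfidf = tfidf.fit_transform(texts)

bow = CountVectorizer()
X_bow = bow.fit_transform(texts)
print(X_tfidf.shape, X_bow.shape)
```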
5.4 Studied Models and Implementation of Approaches
Here, we first describe the experimental setup of different models based on neural networks and deep learning used in our experiment.
CNN. The convolutional neural network model was initialized as a sequence of layers. We use a fully connected network structure with three layers, defined using the Dense class; the number of neurons in a layer is given as the first argument and the activation function through the activation argument. We use the rectified linear unit (ReLU) activation on the first two layers and the sigmoid function in the output layer, so that the network output lies between 0 and 1 and is easy to map to a class-1 probability or to a hard classification with a default threshold of 0.5. The model was compiled with the ADAM optimizer with a learning rate of 0.001 to minimize the binary cross-entropy loss and was trained over 3 epochs.

LSTM with Gensim. We use the gensim library in Python, which supports a number of classes for NLP applications. As discussed, we use a CBOW model with negative sampling and 100-dimensional word vectors. The ADAM optimizer with a learning rate of 0.001 was applied to minimize the binary cross-entropy loss, and the sigmoid was the activation function for the final output layer. This model was trained over 3 epochs with batch sizes of 64 and 512.

LSTM with GloVe. The LSTM model was initialized with pre-trained 100-dimensional GloVe embeddings; the output dimension and the number of time steps were set to 300. The ADAM optimizer with a 0.001 learning rate was applied to minimize the binary cross-entropy loss, and the sigmoid was the activation function for the final output layer. This model was trained over 3 epochs with batch sizes of 64 and 512.

Bi-LSTM. Since detecting an anomaly in a certain part of a news item requires examining both the previous and the following context, we use a bidirectional LSTM. Bi-LSTM was initialized with pre-trained 100-dimensional GloVe embeddings; an output dimension of 100 and 300 time steps were applied. The ADAM optimizer with a learning rate of 0.001 was used to minimize the binary cross-entropy loss. The training batch size was set to 128, and the loss in each epoch was monitored together with recall.

HAN. The hierarchical attention network consists of two attention mechanisms, for word-level and sentence-level encoding. Before training, we set the maximum number of sentences in a news article to 20 and the maximum number of words in a sentence to 100. In both encoding levels, a bidirectional GRU with an output dimension of 100 feeds our custom attention layer, and the word encoder is used as the input to the time-distributed layer of the sentence encoder. We optimized the model with ADAM at a learning rate of 0.001 and used the Keras library with the TensorFlow backend to implement the attention mechanism.

Convolutional HAN. In order to extract higher-level input characteristics, we have incorporated a one-dimensional convolutional layer before each two-way
GRU layer in HAN. This layer selects the characteristics of each trigram from the news article before passing them on to the attention layer.

BERT. Bidirectional Encoder Representations from Transformers is a pre-trained model for learning contextual word representations from unlabeled text. It has two primary characteristics: it is a deep transformer model that can successfully analyze long sentences using the attention mechanism, and it is bidirectional, meaning its output is conditioned on the complete input sentence. We used BERT to handle the dataset and built a deep learning model for fake news detection by fine-tuning the bert-base-uncased pre-trained model. Because the BERT-Large model requires a lot of time and memory, we selected BERT-Base for this investigation; the BERT-Base model is made up of 12 layers (transformer blocks) and 110 million parameters.
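A hedged sketch of fine-tuning BERT-Base with the Hugging Face transformers library is shown below; the training texts/labels and the hyper-parameters are assumptions, not the exact setup of the paper.

```python
# Sketch: fine-tuning bert-base-uncased for binary fake/real classification.
import torch
from torch.utils.data import Dataset
from transformers import (BertForSequenceClassification, BertTokenizerFast,
                          Trainer, TrainingArguments)

class NewsDataset(Dataset):
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=256)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

train_ds = NewsDataset(train_texts, train_labels, tokenizer)   # assumed to exist
args = TrainingArguments(output_dir="bert-fakenews", num_train_epochs=2,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```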
6 Result
In this section, we present an analysis of the performance of our neural network and deep learning models. We present the best performance for each dataset, computing the accuracy, precision, recall, and F1 score for the fake and real classes, taking their weighted average (weighted by the number of true instances for each class), and reporting the average score of these metrics.
Evaluation Metrics
We use accuracy, precision, recall and f1 as evaluation metrics (tp, fp, fn in the following equations are true positive, false positive and false negative respectively). Precision is a measure calculated as the ratio of correct predictions to the total number of examples. Precision is measuring the percentage of positive predictions that are correct and is defined as: tp (4) P recision = tp + f p Recall consists of measuring the percentage of correct predictions that the classifier captures and is defined as follows: tp Recall = (5) tp + f n F1 is to find the balance between recall and precision and is calculated as follows: Precision × Recall (6) Precision + Recall Accuracy is often the most used metric representing the percentage of correctly predicted observations, either true or false. To calculate the accuracy of a model performance, the following equation can be used: tp + tn (7) Accuracy = tp + tn + f p + f n F 1score =
88
I. Ennejjai et al. Table 2. Results of the predictive models on the four datasets Models
Features
Accuracy Precision Recall F1-score
LSTM
word2vec (Genism)
0.98
0.98
0.98
0.98
LSTM
Glove Embedding
0.98
0.98
0.98
0.98
Bilsm
0.78
0.50
0.78
0.89
HAN
0.59
0.59
0.59
0.59
Conv-HAN
0.56
0.56
0.56
0.56
CNN
TF-idfCountvectorizer 0.90
0.90
0.90
0.90
Bert
0.99
0.99
0.98
0.99
Fig. 3. Confusion matrix for Bert and LSTM model
6.2
Result and Discussion
Previous studies on fake news detection mainly focused on traditional machine learning models. Therefore, it is important to compare their performance with the deep learning models. In particular, the goal of the previous study is to compare the performance of different traditional machine learning models and deep learning models on fake news detection. Considering the great success of pre-trained advanced language models on various text classification tasks, In Table 2, we report the performances of different deep learning models. The baseline CNN model is considered the best model for Liar in [1]. We find it to be the third-best neural network-based model according to its performance on the dataset. BILSTM-based models are most vulnerable to overfitting for this dataset, which is reflected by their performance. Although HAN is also a victim of overfitting, as mentioned in [1], Our LSTM with gensim or with gloves exhibits the best performance among the neural models for the dataset, with 98% accuracy and a 0.98 F1-score. CNN models show an improvement on the dataset, whereas CNN models continue their impressive performance Fig. 3. Among the
Artificial Intelligence for Fake News
89
pre-trained advanced natural language deep learning models we studied, Bert shows the best performance with 0.99 accuracy. The pre-trained BERT-based models outperform the other models. We see that the BERT-based model is capable of achieving high accuracy (over 90%) Hence, these models can be utilized for fake news detection in different languages where a large collection of labeled data is not feasible. Different pre-trained BERT models are already available for different languages.
7 Conclusion and Future Work
This paper introduced the problem of fake news. Fake news detection was addressed as a text classification problem, and different automatic learning approaches were tried to detect it. We proposed a fake news detection model that relies on deep learning, and we tried many different approaches, including machine learning, deep learning, and transformers. We found that deep learning models outperformed the other approaches and showed that by combining an augmented linguistic feature set with machine or deep learning models, we could identify fake news with high accuracy. Future work will focus on an automated model for identifying fake news in popular Twitter threads. Such a model could help a large number of social media users strengthen their own credibility judgments. The neural network classifiers presented in this study could also be embedded in more complex systems that consider not only the text, but also the images, videos, sources, and even the comment sections of news websites.
References 1. Wang, W.Y.: liar, liar pants on fire: a new benchmark dataset for fake news detection. arXiv preprint arXiv:1705.00648 (2017) 2. Reis, J.C.S., Correia, A., Murai, F., Veloso, A., Benevenuto, F., Cambria, E.: Supervised learning for fake news detection. IEEE Intell. Syst. 34(2), 76–81 (2019) 3. Ruchansky, N., Seo, S., Liu, Y.: Csi: a hybrid deep model for fake news detection. In: Proceedings of the 2017 ACM on Conference (2017) 4. Tacchini, E., Ballarin, G., Della Vedova, M.L., Moret, S., de Alfaro, L.: Some like it hoax: automated fake news detection in social networks (2017) 5. Karger, D.R., Oh, S., Shah, D.: Iterative learning for reliable crowdsourcing systems. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24, pp. 1953–1961. Curran Associates Inc (2011) 6. Ahmed, H., Traore, I., Saad, S.: Detecting opinion spams and fake news using text classification. Secur. Priv. 1(1) (2017). https://onlinelibrary.wiley.com/doi/ full/10.1002/spy2.9 7. P´erez-Rosas, V., Kleinberg, B., Lefevre, A., Mihalcea, R.: Automatic detection of fake news. In: Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, 20–26 August, pp. 3391–3401 (2018)
8. Bajaj, S.: The pope has a new baby! Fake news detection using deep learning (2017) 9. Bali, A.P.S., Fernandes, M., Choubey, S., Goel, M.: Comparative performance of machine learning algorithms for fake news detection. In: Singh, M., Gupta, P., ¨ Tyagi, V., Flusser, J., Oren, T., Kashyap, R. (eds.) ICACDS 2019. CCIS, vol. 1046, pp. 420–430. Springer, Singapore (2019). https://doi.org/10.1007/978-98113-9942-8 40 10. Khan, J.Y., et al.: A benchmark study on machine learning methods for fake news detection. CoRR abs/1905.04749 (2019) 11. Yang, Y., Zheng, L., Zhang, J., Cui, Q., Li, Z., Yu, P.S.: TI-CNN: convolutional neural networks for fake news detection (2018) 12. Houvardas, J., Stamatatos, E.: N-gram feature selection for authorship identification. In: Euzenat, J., Domingue, J. (eds.) AIMSA 2006. LNCS, vol. 4183, pp 77–86. Springer, Heidelberg (2006). https://doi.org/10.1007/11861461 10 13. Potthast, M., Kiesel, J., Reinartz, K., Bevendorff, J., Stein, B.: A stylometric inquiry into hyperpartisan and fake news. arXiv preprint arXiv:1702.05638 (2017) 14. Afroz, S., Brennan, M., Greenstadt, R.: Detecting hoaxes, frauds, and deception in writing style online. In: ISSP 2012 (2012) 15. Shu, K., Wang, S., Tang, J., Zafarani, R., Liu, H.: User identity linkage across online social networks: a review. ACM SIGKDD Explor. Newsl. 18(2), 5–17 (2017) 16. Garg, A., Roth, D.: Understanding probabilistic classifiers. In: ECML 2001 (2001) 17. Hanselowski, A., et al.: A retrospective analysis of the fake news challenge stance detection task. CoRR abs/1806.05180 (2018) 18. Gilda, S.: Evaluating machine learning algorithms for fake news detection. In: 2017 IEEE 15th Student Conference on Research and Development (SCOReD), pp. 110– 115. IEEE (2017) 19. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997) 20. Kaliyar, R.K., Goswami, A., Narang, P., Sinha, S.: FNDNet - a deep convolutional neural network for fake news detection. Cognit. Syst. Res. 61, 32–44 (2020) 21. Understanding LSTM Networks. https://colah.github.io/posts/2015-08Understanding-LSTMs 22. Ahmed, H., Traore, I., Saad, S.: Detecting opinion spams and fake news using text classification. Secur. Priv. 1(1), e9 (2018) 23. Raza, S., Ding, C.: Fake news detection based on news content and social contexts: a transformer-based approach. Int. Jo. Data Sci. Anal., 1–28 (2022) 24. Barrutia-Barreto, I., Seminario-C´ ordova, R., Chero-Arana, B.: Fake news detection in internet using deep learning: a review. In: Lahby, M., Pathan, AS.K., Maleh, Y., Yafooz, W.M.S. (eds.) Combating Fake News with Computational Intelligence Techniques. SCI, vol. 1001, pp. 55–67. Springer, Cham (2022). https://doi.org/10. 1007/978-3-030-90087-8 3 25. Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: a data mining perspective. ACM SIGKDD Explor. Newsl. 19(1), 22–36 (2017) 26. Shu, K., Wang, S., Liu, H.: Exploiting tri-relationship for fake news detection. arXiv preprint arXiv:1712.07709, August 8 2017 27. Lin, S.-X., Wu, B.-Y., Chou, T.-H., Lin, Y.-J., Kao, H.-Y.: Bidirectional perspective with topic information for stance detection. In: 2020 International Conference on Pervasive Artificial Intelligence (ICPAI), Taipei, Taiwan, pp. 1–8 (2020) 28. Guibon, G., Ermakova, L., Seffih, H., Firsov, A., No´e-Bienvenu, G.L.: Multilingual fake news detection with satire. In: Proceedings of the International Conference on
Computational Linguistics and Intelligent Text Processing, La Rochelle, France, 7–13 April 2019 29. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, January 2014, pp. 1532–1543 (2014) 30. https://www.uvic.ca/ecs/ece/isot/datasets/fake-news/index.php-2022 . Accessed 25 June 2022 31. Ethiopia cabinet approves bill to combat fake news, hate speech — Africanews. https://www.africanews.com/2019/11/11/ethiopia-cabinetapprovesbill-to-combat-fake-news-hate-speech/. Accessed 13 Jan 2021
Traffic Congestion and Road Anomalies Detection Using CCTVs Images Processing, Challenges and Opportunities Ayoub Es-Swidi1(B) , Soufiane Ardchir2 , Yassine Elghoumari2 , Abderrahmane Daif1 , and Mohamed Azouazi1 1
Department of Mathematics and Informatics, FSBM, University Hassane II Casablanca, Casablanca, Morocco [email protected] 2 ENCG, University Hassane II Casablanca, Casablanca, Morocco
Abstract. Traffic congestion and road accidents are among the most common problems that urban residents suffer from, because they have a direct impact on their security, physical and psychological health, and even on the economy. They are due to several causes, including the huge number of vehicles on roads, the irresponsibility of drivers, and the state and architecture of the infrastructure. In order to solve these problems, or at least reduce them, several studies have addressed the issue and proposed solutions from different angles. In this paper, we study the possibility of using Computer Vision techniques and Big Data Analytics subfields, such as machine learning and deep learning algorithms, to build reliable systems that detect, identify, and track vehicles and pedestrians in real time, which will enable us to detect traffic congestion or any other anomaly. These systems would be deployed on the network of Closed-Circuit Televisions (CCTVs) fixed on roads, making them connected. Moreover, this study provides a comparative analysis of the methods used to detect objects, especially vehicles and pedestrians, in order to choose the most accurate model capable of tracking road users, analysing their behaviours, detecting fraud and saving useful numerical datasets.
Keywords: Traffic congestion · Machine learning · Deep learning · Object detection · Tracking

1 Introduction
Today, big cities are facing a real catastrophe because of the constantly increasing number of citizens and the number of vehicles on the road. Road congestion problems have a direct influence on people, either on the tangible physical side, like time and security, or on the psychological side. In this field, there are many studies and works of literature aimed at making life easier and with
less stress, by minimizing congestion and road incidents. By making traffic flow smoother and more streamlined, we can create better conditions for people, with fewer incidents and less stress than today. One of the plans pursued by governments is control and punishment, which is why our roads and streets are full of surveillance cameras (SC) or Closed-Circuit Television (CCTV) cameras that detect vehicle speed, road-marking violations, and so on. However, the role of these cameras is still limited and human intervention is still required; their work starts after the event, not before. For example, CCTVs cannot forecast an incident or congestion before it takes place; their role is limited to reviewing the records. The question raised here is: is it possible to predict an incident or road congestion before it happens, or at least to create a system that works automatically and alerts the stakeholders to the problem without any human intervention? The world today is developing faster than ever thanks to information technology (IT), and more precisely the domains of Big Data Analytics (BDA) and the Internet of Things (IoT). The Internet of Things is a system of interrelated computing devices and mechanical and digital machines provided with unique identifiers (UIDs) and the ability to transfer data over a network without requiring human-to-human or human-to-computer interaction. Once the data is collected from CCTVs using IoT, big data analytics will help us analyse road users' behaviour by using machine learning (ML) algorithms, like SVMs and K-means, and deep learning (DL), to build models capable of predicting traffic congestion or road incidents in order to stop or reduce them. The remainder of the paper is organized as follows. In Sect. 2, we describe some approaches used in previous literature. In Sects. 3 and 4, we provide details about ML and DL algorithms and architectures. In Sect. 5, we overview the architectures that will impact traffic detection, density estimation (the number of vehicles and pedestrians in a region) and congestion detection. Finally, in Sect. 6, we draw our conclusions.
2 Related Work
Several studies have touched on the subject of road management and congestion forecasting using different techniques; one of these techniques is Big Data Analytics. They are aided by technological development and the availability of a sheer amount of structured and unstructured data, such as images, videos and geospatial data, whether manually saved, captured from CCTV, or collected from social media and other sources; this data makes data scientists eager to exploit it to manage big cities and make them more intelligent, especially their road networks. Some studies have used three-dimensional traffic monitoring platforms: "vehicle GPS, pedestrian GPS and cameras" [30]. This technique can only be used in first-world countries, i.e. countries that are equipped with the latest technological devices (radars, surveillance cameras, sensors, etc.). In addition, those countries have laws that permit the use of the
personal data of citizens, such as localisation and tracking data. To collect and analyse all of this data using the hierarchical structure of the system in Fig. 1, the researchers in this study used variables from three principal dimensions: the first is related to the vehicle (time, car GPS, and vehicle number), the second to the pedestrian (time, phone GPS), and the third to the road conditions (number of lanes, lane width, speed limit, etc.).
Fig. 1. ITS architecture based on Big Data.
In another study [36], the researchers tried to understand the spatio-temporal patterns of congestion using the k-means algorithm and temporal-geospatial analysis. They concluded that there are four modes: weekend mode, holiday mode, weekday mode A, and weekday mode B, each with unique spatial and temporal characteristics. The result was a map of Beijing displaying congestion according to the four patterns. In [26] the researchers adopted the SRHTCP technique (SVM-based real-time highway traffic congestion prediction), a model that represents the relation between road volume, speed and density (the number of vehicles on the road). This technique was applied to three data sources: traffic data, weather data and social media. They used spouts and bolts in Apache Storm to implement a real-time traffic prediction model, and they evaluated the congestion level of each road section in real time, taking road speed into account, by using fuzzy theory. In [18] the researchers analysed images captured by 3K cameras to detect congestion using a combination of techniques: change detection between two images, image processing, and the incorporation of a priori information such as the vehicle model (car or truck) and size, and the road
network (lanes, width, and the spacing between vehicles). The results are represented in Fig. 2. At the top of the image we have the real image and at the bottom the mask of the real image; the white color represents the vehicles and the colored lanes represent the level of congestion: a red lane indicates congestion, while a lane whose color is close to green indicates no congestion.
Fig. 2. The hierarchical structure of the system.
The researchers in [38] developed a system to detect traffic congestion in a city. The system displays the state of each street, identified by its name, based on tweets classified by the model. They used social media text information to build a machine learning model and compared methods, finding that the SVM (Support Vector Machine) method has the highest accuracy for detecting traffic congestion, especially with the mySVM and libSVM implementations. In addition to this study, other studies have focused on social media such as Twitter and Facebook to predict traffic congestion. In [21] the researchers used car navigation systems to notify drivers about the current traffic-jam situation in three steps: the first is to collect and extract driving information from social media using text-based classification methods; the second is to incorporate a method that transforms geographically related terms into geographical coordinates; and the third is to develop a system that provides information about important events for drivers,
and to evaluate the results of event extraction through comparison with information available from current media sources commonly found in most cars. In order to solve the problem of traffic congestion, some studies have used computer vision and image processing. The paper [24] proposes a monitoring system that involves the following phases: (i) image acquisition, (ii) image cropping, (iii) noise removal, (iv) edge detection, (v) background subtraction, (vi) binarization, (vii) vehicle density count, and (viii) waiting time calculation.
3 Vehicles and Pedestrians Detection Methods
To this end, the use of ML or DL is the most appropriate solution compared with traditional programming. In traditional computing, algorithms rely on sets of descriptors to describe images globally or locally: global features describe the image as a whole and can be interpreted as a particular property of the image involving all pixels, while local features aim to detect key-points within the image and describe the regions around them [2]. In ML and DL, computers are trained on data inputs and use statistical analysis in order to predict results or classify data.
3.1 Machine Learning for Object Detection
Machine learning is a subfield of Artificial Intelligence (AI). The idea behind ML is to learn or detect hidden patterns from numerical datasets. In general, ML is divided into four categories [22]: supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning.
– Supervised ML: in this case the computer knows the target variable it wants to predict or classify. Such a dataset is called a labelled dataset, which means that the target variable is not empty in any observation. As the name indicates, a supervisor acts as a teacher: supervised learning is when we teach or train the machine using data that is well labelled. The algorithm receives a set of input observations along with the corresponding correct outputs, learns by comparing its actual output with the correct outputs to find errors, and then modifies the model accordingly [33]. Popular algorithms include Nearest Neighbor, Naive Bayes, Decision Trees, Linear Regression, Support Vector Machines (SVM), Neural Networks, etc.
– Unsupervised ML techniques are applied when the training data is not labelled. The algorithm does not know the right output, but it explores the data and can draw inferences to describe hidden structures in unlabelled data. This kind of algorithm works well on problems like clustering or association. Popular algorithms include k-means, KNN (k-nearest neighbor), hierarchical clustering, etc.
– Semi-supervised ML algorithms use both labelled and unlabelled data for training. Initially, similar data is clustered with an unsupervised learning algorithm, which helps to turn the unlabelled data into labelled data. In
many practical situations, the cost of labelling is quite high, since it requires skilled human experts. So, when labels are absent from the majority of the observations but present in a few, semi-supervised algorithms are the best candidates for model building.
– Reinforcement Learning is a category of ML that interacts with the environment through actions and locates errors, allowing the machine to automatically determine the ideal behaviour within a specific context in order to maximize its performance. This kind of ML performs learning very differently, taking up the method of "cause and effect".
As mentioned before, ML algorithms apply to numerical data: the input has to be an array of numbers in order to apply mathematical formulas and statistical operations, whereas the data captured from CCTVs consists of videos, i.e. sequences of frames. The goal is to build a model capable of detecting objects, especially vehicles and pedestrians. So, the first step is to convert these frames into arrays, and there are two possibilities. The first, which is not recommended, is to apply a flatten function to the entire image; for example, an image of size 600 × 600 pixels is converted into an array of 1 row and 360000 = 600 × 600 columns. This solution is limited and the results are not satisfactory. The second possibility is to apply descriptors to the image before the ML algorithm, such as the Color Histogram, the Color and Edge Directivity Descriptor (CEDD) [5], or the Histogram of Oriented Gradients (HOG) [17]. The descriptors aim to extract features that describe elementary characteristics such as colors, texture, shapes, etc., as shown in the sketch below.
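As an illustration only (not part of the original study), a minimal sketch of this second possibility, assuming scikit-image and scikit-learn are available, could look like this:

```python
# Hypothetical sketch: HOG descriptors + a linear SVM for frame classification.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def extract_hog(gray_frame):
    """Describe a grayscale frame by its Histogram of Oriented Gradients."""
    return hog(gray_frame, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

# frames: list of grayscale images (2-D arrays); labels: e.g. 1 = vehicle present (placeholders)
frames = [np.random.rand(128, 128) for _ in range(20)]
labels = np.random.randint(0, 2, size=20)

X = np.array([extract_hog(f) for f in frames])
clf = LinearSVC().fit(X, labels)
print(clf.predict(X[:3]))
```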
3.2 Deep Learning for Object Detection
DL is also a subfield of machine learning; it is essentially a perceptron with more than two hidden layers, known as a neural network (NN). It has demonstrated the ability to outperform other machine learning algorithms on tasks such as object recognition in the field of computer vision [15], sentiment analysis in the field of text analysis, and also on traditional problems like the classification of numerical data or prediction. In addition to the reliability and high accuracy of the models built with DL, it has the great advantage of working on native data such as images and text: it eliminates the pre-processing typically involved in ML and automates feature extraction, removing some of the dependency on human experts. In general, DL includes three types of neural networks:
– Artificial Neural Networks (ANN): a multilayer perceptron in which each layer contains several nodes (neurons) connected to each other, each connection being assigned a weight that represents its relative importance in the network. It is trained on numerical data by repeating three steps: forward propagation to move from the input layer to the output layer, application of the error function, and finally backward propagation in order to adjust the weights from previous epochs.
– Convolutional Neural Networks (CNN): a type of NN that works especially on images, for image classification, image segmentation, or multi-object detection (the objects and their locations in the image). CNN algorithms are constructed from multiple layers, including convolutional layers, non-linearity layers, pooling layers and fully connected layers [1].
– Recurrent Neural Networks (RNN): use sequential or time-series data. They have a special structure that aims to take advantage of the temporal information in the input data [35]. RNN algorithms are used to solve problems such as speech recognition [11] and Natural Language Processing (NLP). The most popular RNN architectures in use include long short-term memory (LSTM) and gated recurrent units (GRUs) [7].
To detect and identify vehicles, pedestrians or any other object relevant to this paper from the captured CCTV frames, the CNN is the most suitable solution for several reasons, including the high accuracy of CNN models and their ability to detect objects and their locations in the frame, which requires no knowledge of descriptors, no conversion of the native image into a numeric array, and no other image processing. Moreover, CNNs are useful for real-time multi-object detection by embedding a trained and validated model in a system. As noted above, a CNN is a suite of layers arranged in what is called an architecture.
3.3 Architectures and Approaches
Generally, a CNN algorithm consists of three steps: a convolutional network (ConvNet), a flatten function, and a fully-connected network (FCN). The FCN is an ANN with a specified number of layers and nodes; it takes as input the flattened data passed from the ConvNet, which is itself an ordered suite of convolutional and pooling layers with tuned parameters. Examples include VGG16 [23], AlexNet [15], Inception-v4 and Inception-ResNet [25]. Table 1 compares several architectures on the ImageNet challenge; the Top-1 and Top-5 errors indicate the proportion of the time the classifier does not give the highest score to the correct class, or does not include it among its top five guesses, respectively.

Table 1. The comparison of different CNN architectures on model size, classification error rate, and model depth. Based on [8]

Model      | Size (M) | Top-1/Top-5 error (%) | Layers | Model description
AlexNet    | 238      | 41.00/18.00           | 8      | 5 convs + 3 fc layers
VGG-16     | 540      | 28.07/9.33            | 16     | 13 convs + 3 fc layers
VGG-19     | 560      | 27.30/9.00            | 19     | 16 convs + 3 fc layers
GoogleNet  | 40       | 29.81/10.04           | 22     | 21 convs + 1 fc layer
ResNet-50  | 100      | 22.85/6.71            | 50     | 49 convs + 1 fc layer
ResNet-152 | 235      | 21.43/3.47            | 152    | 151 convs + 1 fc layer
To detect vehicles or pedestrians in CCTV frames, applying one of these architectures to the entire frame is not sensible because a frame contains several objects. Hence, a possible idea is to divide each frame into a number of regions, or to crop the interesting regions of each frame, and use them as input for a CNN architecture. However, this is still hard due to the huge number of regions, and it could blow up computationally; therefore approaches such as Region-based CNN, Mask R-CNN and YOLO exist to make detection faster and more accurate.
– Region-Based CNN (RCNN) [10] proposes a scalable detection algorithm that improves mean average precision (mAP) by more than 30% compared with the previous best work, by selecting just 2000 regions from the image using selective search [27]. RCNN takes around 47 s per test image, which is still too high a computation time for real-time detection problems.
– Fast RCNN [9] starts with a ConvNet and max-pooling layer applied to the whole image to produce a convolutional feature map. Then, for each object proposal, a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map, and an FCN is applied to each vector to make the prediction.
– Faster RCNN [20] is composed of two modules: the first is a fully convolutional network that proposes regions, and the second is the Fast R-CNN detector that uses the proposed regions instead of selective search.
– Mask R-CNN [12] performs object instance segmentation. It classifies each pixel into a fixed set of categories, for example vehicle, pedestrian, and so on. In addition to the category prediction and the bounding boxes, Mask R-CNN outputs a binary mask for each RoI.
Furthermore, the object detection problem, and computer vision in general, has seen a revolution in recent years with the appearance of two frameworks: YOLO (You Only Look Once) [4,19,28,37] and Detectron2 [34]. Hence, to detect and identify vehicles, pedestrians or other objects, developers just need to collect and prepare the data related to the problem they want to solve; they can then fit their models starting from pre-trained classifiers built with one of those frameworks.
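To give an idea of what this looks like in practice, the following is a minimal, hypothetical sketch of running a pre-trained YOLOv5 model on a single CCTV frame through PyTorch Hub; the image path and confidence threshold are placeholders.

```python
# Hypothetical sketch: pre-trained YOLOv5 inference on one frame (image path is a placeholder).
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.4                      # confidence threshold (assumed value)

results = model("cctv_frame.jpg")     # runs detection on the frame
detections = results.pandas().xyxy[0] # bounding boxes with class names and scores

# Keep only the classes of interest for traffic analysis
roi = detections[detections["name"].isin(["car", "truck", "bus", "person"])]
print(roi[["name", "confidence", "xmin", "ymin", "xmax", "ymax"]])
```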
4 Identification and Tracking
Once the objects, whether vehicles or pedestrians, are detected in each frame, we can imagine each object framed by a bounding box throughout the video. But how do we know whether an object in one frame is the same as one in a previous frame? This study is interested in analysing the behaviour and movement of these objects; therefore, identifying them across consecutive video frames is necessary, as it allows us to track multiple objects and report on them in real time. To this end, there is a variety of algorithms [6,14]. For example, SiamMask [29] is a simple approach that enables fully-convolutional Siamese trackers to produce class-agnostic binary segmentation masks of the target object; this approach only requires an initial frame with a simple bounding box, without detecting the objects in each frame. There is also Mean-shift [31], a clustering algorithm similar to K-means that does not require the number of clusters "K"; the Mean-shift algorithm does not perform well in some cases, especially under very complicated conditions. Currently, the most popular tracker is Deep SORT [32], an extension of the Simple Online and Realtime Tracker (SORT) [3]. The latter adopts techniques such as the Kalman filter [13] and the Hungarian algorithm [16] to predict the tracks of previously identified objects and match them with new detections, and it is improved by the integration of appearance information to produce Deep SORT. Thanks to this extension, objects remain trackable through longer periods of occlusion and the number of identity switches is reduced.
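By way of illustration only, the matching step that SORT-style trackers perform between existing tracks and new detections can be sketched as follows, assuming SciPy is available; the IoU cost and the example boxes are made up for the demonstration.

```python
# Hypothetical sketch of SORT-style track/detection association (Hungarian algorithm on IoU cost).
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

tracks = [(10, 10, 50, 50), (100, 100, 160, 180)]        # predicted boxes from the previous frame
detections = [(98, 105, 158, 178), (12, 8, 52, 49)]      # detector output in the current frame

cost = np.array([[1 - iou(t, d) for d in detections] for t in tracks])
row, col = linear_sum_assignment(cost)                    # optimal one-to-one assignment
for t_idx, d_idx in zip(row, col):
    print(f"track {t_idx} matched with detection {d_idx} (IoU={1 - cost[t_idx, d_idx]:.2f})")
```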
5 Proposed Workflow
Traffic congestion or road anomaly detection services already exist in some applications, like Waze. Most of these applications use social media or personal GPS data, provided that pedestrians or drivers enable access to the location information on their devices, which poses a confidentiality problem for the governments or companies interested in this field. Our proposed approach is to embed object detection and tracking services in the public network of CCTVs fixed on roads, by combining two techniques: YOLOv5 to detect objects and Deep SORT to identify and track them. These two techniques have demonstrated their high ability in this field; Fig. 3 illustrates the mechanism used in the approach. The process is built around a trained and validated model that detects and identifies pedestrians, vehicles and road signs. The model is then deployed on each CCTV in the network to detect congestion, detect accidents, detect fraud according to the situation (running a red light, exceeding the legal speed, and so on), and also to save numerical datasets. The latter is a time-series dataset storing information about the events, date-time and locations, the weather, the road quality (crumbling or not), the number of lanes, the density (number of vehicles on the road), traffic information, and so on. This dataset will allow us to analyse the road network, produce reports about a city, predict traffic congestion and, furthermore, forecast a road accident before it occurs. As mentioned above, the model must already be trained and validated to use YOLOv5 as the object detection algorithm. The framework requires a specific YOLO label format: each image in the dataset is associated with a ".txt" file with the same name, and these files contain the essential information about the objects that appear in the corresponding image, namely the object class (vehicle, pedestrian, or a specific road sign), the coordinates of the object, its height, and its width. After that, our custom prepared dataset is used to train different versions of YOLOv5, including YOLOv5s, YOLOv5m, YOLOv5l, etc., to choose the most accurate model. Then, vehicles and pedestrians are associated with the Deep SORT tracker, while the road signs determine which behaviour we need to detect. A sketch of this label format is given after Fig. 3.
Fig. 3. Proposed process workflow based on CCTVs image processing
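As a purely illustrative sketch (the class indices and box values are made up), a YOLO-format annotation file for one image could be written as follows:

```python
# Hypothetical sketch: writing a YOLO-format label file for one training image.
# Each line is: <class-id> <x-center> <y-center> <width> <height>, all normalized to [0, 1].
CLASSES = {"vehicle": 0, "pedestrian": 1, "traffic_light": 2}   # assumed class mapping

def to_yolo_line(cls_name, box, img_w, img_h):
    x1, y1, x2, y2 = box                                         # pixel coordinates
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{CLASSES[cls_name]} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

annotations = [("vehicle", (120, 200, 340, 380)), ("pedestrian", (50, 60, 90, 180))]
with open("frame_0001.txt", "w") as f:                           # same name as frame_0001.jpg
    for cls_name, box in annotations:
        f.write(to_yolo_line(cls_name, box, img_w=1280, img_h=720) + "\n")
```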
6 Results and Discussion
In this section, the most important stage of the proposed model, multi-object detection, is analysed and discussed. Some of the object detection techniques mentioned in the methodology section are adopted and evaluated, using different metrics according to the algorithm type.

6.1 Evaluation Metrics
The first proposed method is the combination of two steps: transformation of the image into a vector using a flatten function or descriptors, followed by an ML algorithm. This method is useful for image classification without locating the object in the frame; it only returns whether the frame contains an object or not. To evaluate such methods, three metrics are used: precision, recall and F1-score. The remaining methods are DL algorithms, in other words multi-object detection tools, and for these the mean average precision (mAP) metric is reported. Firstly, precision is the ratio of true positives to all positives predicted by the model; it indicates how many of the predicted objects turned out to be correct. Secondly, recall indicates how many of the actual positive objects the model is able to predict correctly. Finally, the F1-score combines precision and recall. To evaluate object detection models like RCNN or YOLO, mAP is used: the ground-truth bounding box is compared with the predicted bounding box according to the intersection over union (IoU) measurement, and generally an IoU of at least 0.5 is required to consider that a prediction has detected the object.
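For illustration only, the following sketch counts true and false positives at an IoU threshold of 0.5 and derives precision and recall from them; the example boxes are arbitrary and this is a simplification of the full mAP computation.

```python
# Hypothetical sketch: counting true/false positives at an IoU threshold of 0.5.
def box_iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union else 0.0

ground_truth = [(10, 10, 60, 60), (200, 80, 260, 160)]    # made-up annotated boxes
predictions = [(12, 11, 58, 59), (400, 400, 450, 450)]    # made-up detector output

tp = fp = 0
matched = set()
for pred in predictions:
    best = max(range(len(ground_truth)), key=lambda i: box_iou(pred, ground_truth[i]))
    if box_iou(pred, ground_truth[best]) >= 0.5 and best not in matched:
        tp += 1
        matched.add(best)
    else:
        fp += 1
fn = len(ground_truth) - len(matched)
print("precision:", tp / (tp + fp), "recall:", tp / (tp + fn))
```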
6.2 Discussion
Based on the results in Table 2, traditional ML only allows image classification, and since we are interested in studying and analysing the behaviour of objects, we cannot adopt this type of solution. The use of DL, in contrast, allows us to detect objects and their locations in the frames and gives quick results with good accuracy, especially the YOLO algorithms. In addition, looking at the validation and test mAP values, one of the most well-known problems of DL, overfitting, does not arise when using the YOLO algorithms.
Table 2. Relevant results for Pedestrian and Vehicle Detection (COCO dataset).

Approach                                | Algorithm                     | prec | recall | F-1
ML for image classification algorithms | flatten + logistic regression | 0.32 | 0.2    | 0.22
ML for image classification algorithms | HOG descriptor + SVM          | 0.69 | 0.43   | 0.56
ML for image classification algorithms | HOG descriptor + perceptron   | 0.63 | 0.51   | 0.54

Approach                                   | Algorithm   | mAP val | mAP test | speed (ms/image)
DL for object detection and localisation   | SSD         | 19.7    | 19.3     | 49.7
DL for object detection and localisation   | VGG16       | 21.5    | 21.9     | 52
DL for object detection and localisation   | Fast RCNN   | 60.01   | 52.3     | 32
DL for object detection and localisation   | Faster RCNN | 66.9    | 68.9     | 47
YOLO for object detection and localisation | YOLO v5s6   | 68.06   | 43.3     | 0.8
YOLO for object detection and localisation | YOLO v5m6   | 51.3    | 69.2     | 0.88
YOLO for object detection and localisation | YOLOv3      | 51.9    | 63.4     | 1.24
7 Conclusion
Instead of the traditional ways of managing cities through control and punishment with manual radars or human intervention (police officers), we propose the use of new technologies, such as IoT and big data analytics, by equipping the networks of CCTVs with programmes capable of detecting, identifying and tracking vehicles and pedestrians. Moreover, thanks to the CCTVs installed on roads, the huge amount of data (images containing vehicles and persons), and several methods from the fields of computer vision and DL, including the 5th version of YOLO and Deep SORT, object detection and tracking for vehicles and pedestrians has become a reality and a reliable solution that we can trust and adopt in the management of our cities. This solution allows us to detect traffic congestion or an accident in real time, to detect fraud automatically, and finally to store various numerical datasets. In future work, these datasets will enable us to deploy reports, detect hidden patterns and forecast anomalies, and will also serve as a reference in the hands of architects for building new cities that would make our lives safer, smoother, and more comfortable.
References 1. Albawi, S., Mohammed, T.A., Al-Zawi, S.: Understanding of a convolutional neural network. In: 2017 International Conference on Engineering and Technology (ICET), Antalya, pp. 1–6. IEEE, August 2017. https://doi.org/10.1109/ ICEngTechnol.2017.8308186. https://ieeexplore.ieee.org/document/8308186/
2. Awad, A.I., Hassaballah, M. (eds.): Image Feature Detectors and Descriptors: Foundations and Applications, Studies in Computational Intelligence, vol. 630. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-28854-3 3. Bewley, A., Ge, Z., Ott, L., Ramos, F., Upcroft, B.: Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 3464–3468, September 2016. https://doi.org/10.1109/ICIP.2016.7533003. http:// arxiv.org/abs/1602.00763 4. Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOV4: optimal speed and accuracy of object detection (2020) 5. Chatzichristofis, S.A., Boutalis, Y.S.: CEDD: color and edge directivity descriptor: a compact descriptor for image indexing and retrieval. In: Gasteratos, A., Vincze, M., Tsotsos, J.K. (eds.) ICVS 2008. LNCS, vol. 5008, pp. 312–322. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79547-6 30 6. Dicle, C., Camps, O.I., Sznaier, M.: The way they move: tracking multiple targets with similar appearance. In: 2013 IEEE International Conference on Computer Vision, Sydney, Australia, pp. 2304–2311. IEEE, December 2013. https://doi.org/ 10.1109/ICCV.2013.286. http://ieeexplore.ieee.org/document/6751397/ 7. DiPietro, R., Hager, G.D.: Deep learning: RNNs and LSTM. In: Handbook of Medical Image Computing and Computer Assisted Intervention, pp. 503–519. Elsevier (2020). https://doi.org/10.1016/B978-0-12-816176-0.00026-0. https://linkinghub. elsevier.com/retrieve/pii/B9780128161760000260 8. Fu, J., Rui, Y.: Advances in deep learning approaches for image tagging. APSIPA Trans. Signal Inf. Process. 6, e11 (2017). https://doi.org/10.1017/ATSIP.2017.12. https://www.cambridge.org/core/product/identifier/S2048770317000129/type/ journal article 9. Girshick, R.: Fast R-CNN. arXiv:1504.08083 [cs], September 2015. http://arxiv. org/abs/1504.08083 10. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv:1311.2524 [cs], October 2014. http://arxiv.org/abs/1311.2524 11. Graves, A., et al.: A novel connectionist system for unconstrained handwriting recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31(5), 855– 868 (2009). https://doi.org/10.1109/TPAMI.2008.137. http://ieeexplore.ieee.org/ document/4531750/ 12. He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV), Venice, pp. 2980–2988. IEEE, October 2017. https://doi.org/10.1109/ICCV.2017.322. http://ieeexplore.ieee.org/ document/8237584/ 13. Kalman, R.E.: A new approach to linear filtering and prediction problems. J. Basic Eng. 82(1), 35–45 (1960). https://doi.org/10.1115/1.3662552. https:// asmedigitalcollection.asme.org/fluidsengineering/article/82/1/35/397706/A-NewApproach-to-Linear-Filtering-and-Prediction 14. Kim, C., Li, F., Ciptadi, A., Rehg, J.M.: Multiple hypothesis tracking revisited. In: 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, pp. 4696–4704. IEEE, December 2015. https://doi.org/10.1109/ICCV.2015. 533. http://ieeexplore.ieee.org/document/7410890/ 15. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, vol. 25. Curran Associates, Inc. (2012). https://papers.nips.cc/paper/2012/ hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html
16. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logist. Q. 2(1–2), 83–97 (1955). https://doi.org/10.1002/nav.3800020109. https:// onlinelibrary.wiley.com/doi/abs/10.1002/nav.3800020109 17. Lee, K.L., Mokji, M.M.: Automatic target detection in GPR images using histogram of oriented gradients (HOG). In: 2014 2nd International Conference on Electronic Design (ICED), Penang, Malaysia, pp. 181–186. IEEE, August 2014. https://doi.org/10.1109/ICED.2014.7015795. http://ieeexplore.ieee. org/document/7015795/ 18. Palubinskas, G., Kurz, F., Reinartz, P.: Model based traffic congestion detection in optical remote sensing imagery. Eur. Transp. Res. Rev. 2(2), 85– 92 (2010). https://doi.org/10.1007/s12544-010-0028-z. https://etrr.springeropen. com/articles/10.1007/s12544-010-0028-z 19. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv (2018) 20. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. arXiv:1506.01497 [cs], January 2016. http://arxiv.org/abs/1506.01497 21. Sakaki, T., Matsuo, Y., Yanagihara, T., Chandrasiri, N.P., Nawa, K.: Real-time event extraction for driving information from social sensors. In: 2012 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), Bangkok, pp. 221–226. IEEE, May 2012. https://doi.org/10. 1109/CYBER.2012.6392557. http://ieeexplore.ieee.org/document/6392557/ 22. Saravanan, R., Sujatha, P.: A state of art techniques on machine learning algorithms: a perspective of supervised learning approaches in data classification. In: 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, pp. 945–949. IEEE, June 2018. https://doi.org/ 10.1109/ICCONS.2018.8663155. https://ieeexplore.ieee.org/document/8663155/ 23. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 [cs], April 2015. http://arxiv.org/abs/1409. 1556 24. Sujatha, M., Devi, R.: Traffic congestion monitoring using image processing and intimation of waiting time. Int. J. Pure Appl. Math., 8 (2017) 25. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, inception-ResNet and the impact of residual connections on learning. arXiv:1602.07261 [cs], August 2016 26. Tseng, F.H., et al.: Congestion prediction with big data for real-time highway traffic. IEEE Access 6, 57311–57323 (2018). https://doi.org/10.1109/ACCESS.2018. 2873569. https://ieeexplore.ieee.org/document/8481486/ 27. Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013). https:// doi.org/10.1007/s11263-013-0620-5 28. Wang, C.Y., Bochkovskiy, A., Liao, H.Y.M.: Scaled-YOLOv4: scaling cross stage partial network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13029–13038, June 2021 29. Wang, Q., Zhang, L., Bertinetto, L., Hu, W., Torr, P.H.S.: Fast online object tracking and segmentation: a unifying approach. arXiv:1812.05050 [cs], May 2019 30. Wang, Z.: Analysis and prediction of urban traffic congestion based on big data. Int. J. Data Sci. Technol. 4(3), 100 (2018). https://doi.org/10.11648/ j.ijdst.20180403.14. http://www.sciencepublishinggroup.com/journal/paperinfo? journalid=390&doi=10.11648/j.ijdst.20180403.14
31. Wen, Z.q., Cai, Z.x.: Mean shift algorithm and its application in tracking of objects. In: 2006 International Conference on Machine Learning and Cybernetics, Dalian, China, pp. 4024–4028. IEEE (2006). https://doi.org/10.1109/ICMLC.2006.258803. http://ieeexplore.ieee.org/document/4028776/ 32. Wojke, N., Bewley, A., Paulus, D.: Simple online and realtime tracking with a deep association metric. arXiv:1703.07402 [cs], March 2017 33. Woschank, M., Rauch, E., Zsifkovits, H.: A review of further directions for artificial intelligence, machine learning, and deep learning in smart logistics. Sustainability 12(9), 3760 (2020). https://doi.org/10.3390/su12093760. https://www.mdpi.com/ 2071-1050/12/9/3760 34. Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2 (2019). https:// github.com/facebookresearch/detectron2 35. Yang, C., Jiang, W., Guo, Z.: Time series data classification based on dual path CNN-RNN cascade network. IEEE Access 7, 155304–155312 (2019). https://doi. org/10.1109/ACCESS.2019.2949287 36. Zhao, P., Hu, H.: Geographical patterns of traffic congestion in growing megacities: big data analytics from Beijing. Cities 92, 164–174 (2019). https://doi.org/ 10.1016/j.cities.2019.03.022. http://www.sciencedirect.com/science/article/pii/ S0264275119301891 37. Zhu, X., Lyu, S., Wang, X., Zhao, Q.: TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios (2021) 38. Zulfikar, M.T., Suharjito: Detection traffic congestion based on Twitter data using machine learning. Procedia Comput. Sci. 157, 118–124 (2019). https:// doi.org/10.1016/j.procs.2019.08.148. https://linkinghub.elsevier.com/retrieve/pii/ S187705091931066X
Text-Based Sentiment Analysis Adil Baqach(B) and Amal Battou IRF-SIC, Department of Computer Science, Faculty of Science, Ibn Zohr University, Agadir, Morocco [email protected], [email protected]
Abstract. The explosion of data has sparked a surge of interest in sentiment analysis in recent years. It can be used in various industries, including marketing, psychology, human-computer interaction, and e-learning. Sentiment analysis may take numerous forms, including facial expressions, speech recognition, and text classification. Sentiment analysis from text is helpful in various industries, but it is particularly significant in e-learning for assessing students' emotional states and, as a result, putting in place the interactions required to drive students to engage in and complete their courses. Because sentiment analysis from text is a relatively new topic that still needs more work and study, this article focuses on it. We discuss many approaches in the literature for extracting sentiment from text, starting with feature selection or text representation and ending with the training of the prediction model using either supervised or unsupervised learning algorithms. There is still work that can be done to improve performance; we must first review the current methods and approaches in this field, discuss improvements to specific techniques, or even propose new ones.

Keywords: Sentiment analysis from text · Natural language processing · Deep learning

1 Introduction
Predicting users' emotional states from their written messages and comments is vital, but it is also tricky owing to ambiguous language [1]. Textual statements are frequently not simply direct, employing emotional terms like "happy" or "angry"; we also derive emotions by interpreting meanings and circumstances. The need for emotion recognition is growing as both structured and unstructured data grow in size as a result of social media [2]. However, it is still a research subject that requires much work before sentiment analysis is fully successful. In the human-machine relationship, emotion detection is crucial. Speech, facial expressions, and written language can all convey emotions, but more effort has been devoted to detecting emotions from faces and speech than to text-based emotion identification, which motivated us to do more work in this field, as sentiment analysis may significantly improve various applications. For example, predicting students' feelings might help educators overcome difficulties
like bewilderment and boredom, which impair students' engagement and performance [1,3]. These feelings may also be used as input for other applications, such as recommendation systems for students in e-learning environments, which can suggest alternative pedagogical pathways depending on the students' sentiments [4,5]. Also, in marketing, emotion detection might be used to forecast consumers' feelings about products and services, leading to changes in the product to better suit customers' demands and improve the customer relationship. Many papers in the literature address the topic of sentiment analysis from text; the role of this paper is therefore to provide a review of the different methods that exist in the literature. Such a review provides a global perspective on this field, helps us avoid redundancies, explains the various existing methods, and gives a global comparison between them to emphasize the methods with better performance, thereby analyzing the opportunities for improvement and innovation. According to the literature, there are two main stages in predicting emotions from text: the first relies on text representation using various Natural Language Processing (NLP) approaches, and the second concerns the algorithm that receives the represented text and builds a sentiment prediction model. We divide the rest of the article as follows: the first section provides some preliminaries on the methods referred to in the article; the second section presents the well-known, high-performing algorithms used in emotion detection from text in the literature; and finally, we conclude and outline our planned future work.
2 Background

2.1 Text Representation
This section will give a detailed overview of some of the best-known text representation methods in natural language processing. Word Embedding. Word embedding is a powerful word-level representation that can capture the semantic meaning of words. It represents each word by a fixed-size vector containing values for each semantic characteristic, initialized randomly and modified throughout training. For this task, we can refer to many well-known methods in the literature. For example, Word2Vec generates word vectors using one of two methods:
– Continuous bag of words (CBOW): this method predicts the target word based on its surrounding words.
– Skip-gram: this method does the opposite, predicting the surrounding context words based on the target word.
GloVe, or Global Vectors for word representation, is a Stanford-developed unsupervised learning system; it produces word embeddings from a corpus's global word-word co-occurrence matrix. In vector space, the resultant embeddings reveal interesting linear substructures of the words. We can quantify the similarity
of words after acquiring their vectors, depending on whether they are normalized or not. If they are normalized, a simple dot product of the vectors is used to assess similarity; if they are not normalized, cosine similarity is used, given by the formula:

$$\text{cosine similarity} = 1 - \text{cosine distance} = \frac{u \cdot v}{\|u\|\,\|v\|} \quad (1)$$

Authors of [6] used Word2Vec to introduce their suggested Word2Vec-based paradigm. After obtaining the word vectors using the Word2Vec algorithm, they assigned weights to the vectors using the TF-IDF weighting technique. The authors' novel strategy is to identify whether or not a word contains sentiment information; hence the weight calculation procedure for each word vector is:

$$w_i = tfidf \cdot e \quad (2)$$

$$e = \begin{cases} \alpha, & t_i \text{ is a sentiment word} \\ 1, & \text{otherwise} \end{cases} \quad (3)$$

Then, letting $a$ be the distributed vector obtained by Word2Vec, the vector obtained by the proposed method is:

$$v = w_i \cdot a \quad (4)$$
In their study, authors of [7] employed this strategy to examine Word2Vec, TF-IDF, and the combination of the two approaches, and the findings revealed that integrating TF-IDF and Word2Vec offered the best results. Term Frequency - Inverse Document Frequency (TF-IDF). It is a statistical approach for determining how important a word is to a text, and its use is to re-weight vectors acquired by the other methods depending on the frequency of each token. It is determined by multiplying two metrics: the Term Frequency (TF) of a word in a document and the Inverse Document Frequency (IDF) of a word in a collection of documents:

$$tfidf(t, d, D) = tf(t, d) \cdot idf(t, D) \quad (5)$$

where:

$$tf(t, d) = \log(1 + freq(t, d)) \quad (6)$$

$$idf(t, D) = \log\left(\frac{N}{count(d \in D : t \in d)}\right) \quad (7)$$
where N is the total number of documents (sentences), and count(d ∈ D : t ∈ d) is the number of documents in which the term t appears. Several studies have used this method; for example, [8] utilized it in their work to represent words before feeding them into a machine-learning algorithm to estimate sentiment and opinion from text. A small illustrative sketch of this weighting is given below.
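The following minimal sketch (an illustration only, not the implementation of [6] or [7]) computes the TF-IDF weights of Eqs. (5)-(7) with plain Python and uses them to weight pre-computed word vectors as in Eqs. (2)-(4); the tiny corpus, the α value and the sentiment-word list are assumptions.

```python
# Illustrative sketch of Eqs. (2)-(7): TF-IDF weighting of word vectors (all inputs are made up).
import math
import numpy as np

corpus = [["i", "love", "this", "course"], ["i", "hate", "waiting"]]
sentiment_words = {"love", "hate"}          # assumed sentiment lexicon
alpha = 2.0                                  # assumed boost for sentiment words (Eq. 3)
word_vectors = {w: np.random.rand(5) for doc in corpus for w in doc}  # stand-in for Word2Vec

N = len(corpus)

def tfidf(term, doc):
    tf = math.log(1 + doc.count(term))                        # Eq. (6)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(N / df)                                     # Eq. (7)
    return tf * idf                                            # Eq. (5)

def weighted_vector(term, doc):
    e = alpha if term in sentiment_words else 1.0              # Eq. (3)
    w = tfidf(term, doc) * e                                    # Eq. (2)
    return w * word_vectors[term]                               # Eq. (4)

print(weighted_vector("love", corpus[0]))
```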
Vector Space Model (VSM). The VSM is an algebraic representation of a text as a vector of identifiers. It can help us spot similarities between distinct writings, even if they do not use the same terms. Each document is represented as a multi-dimensional vector, with each dimension representing a particular term (see Fig. 1). If a term appears in a document, its value is non-zero; there are various methods for calculating these values (TF-IDF, co-occurrence, etc.). After getting these vectors, we can calculate their similarity using a variety of methods:
Fig. 1. Word by document representation in a vector space
– Euclidean distance: the length of a straight line between two vectors:

$$Euc(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2} \quad (8)$$

– Cosine similarity: the cosine of the angle between two vectors:

$$\cos(\theta) = \frac{d_j \cdot q}{\|d_j\|\,\|q\|} = \frac{\sum_{i=1}^{N} w_{i,j}\, w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2}\, \sqrt{\sum_{i=1}^{N} w_{i,q}^2}} \quad (9)$$
Pointwise Mutual Information (PMI). We can determine a word's relevance inside a corpus using TF-IDF, but what if we want to quantify the score of terms with respect to a specific category? Another kind of feature scoring, termed "association measures", is employed for this task. The most frequent association measure is PMI, which we can use to assess whether a text is "positive" or "negative" by calculating the PMI score between each word in the document and the "negative" or "positive" categories. We can also use it to normalize a vector space matrix by finding the weight of a word "w" in a category "c". We calculate PMI using the following formula:

$$PMI(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)} = \log \frac{p(x|y)}{p(x)} = \log \frac{p(y|x)}{p(y)} \quad (10)$$

where:
– P(x|y) is the number of documents (sentences) in category y that contain x.
– P(x) is the number of documents containing x.
After extracting all NAVA words (Noun, Adverb, Verb, Adjective), [9] calculates the PMI of each NAVA word with respect to different words that represent different emotions. They then used these PMI scores to build an emotion vector that is used in an unsupervised machine learning setting to detect sentiments from text.

Co-occurrence Matrix. In a fixed context window, a co-occurrence matrix counts the number of times words appear together. Let us take an example and suppose our corpus comprises the following two sentences:
– I like text classification
– I like deep learning
Let the window size be 1 for this example, which means that we consider one word to the left and one to the right of each term:
– I: like (2)
– Like: I (2), text (1), deep (1)
– Text: like (1), classification (1)
– Classification: text (1)
– Deep: like (1), learning (1)
– Learning: deep (1)
So the resulting co-occurrence matrix is:

$$Co\text{-}occurrence = \begin{pmatrix} 2 & 0 & 0 & 0 & 0 & 0 \\ 2 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix} \quad (11)$$
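A minimal sketch of how such a window-1 co-occurrence matrix could be built, for illustration only and using the toy corpus above, is the following:

```python
# Illustrative sketch: building a co-occurrence matrix with a context window of 1.
corpus = [["i", "like", "text", "classification"],
          ["i", "like", "deep", "learning"]]

vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}
matrix = [[0] * len(vocab) for _ in vocab]

window = 1
for sent in corpus:
    for pos, word in enumerate(sent):
        for offset in range(1, window + 1):
            for neighbor_pos in (pos - offset, pos + offset):
                if 0 <= neighbor_pos < len(sent):
                    matrix[index[word]][index[sent[neighbor_pos]]] += 1

for w in vocab:
    print(w, matrix[index[w]])
```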
2.2 Machine/Deep Learning Methods
Support Vector Machines. A Support Vector Machine (SVM) is a binary linear classifier that is non-probabilistic. It seeks out hyper-planes that divide the classes. SVM outperforms Naive Bayes in traditional text classification. When the number of dimensions exceeds the number of samples, SVM is effective. It is also memory-friendly. The kernel can alter the SVM’s efficacy. The most general kernel approaches are linear, polynomial, and radial basis functions. The linear kernel produces a straightforward classifier (see Fig. 2). It graphs as a straight line and works best with significant volumes of data. Non-linear kernels are more adaptable and typically perform better. Polynomial kernel (SVM Poly) and radial basis kernel (SVM RB) are the most frequent non-linear kernels. The polynomial kernel, commonly represented as a curved line, is helpful for natural language processing.
Fig. 2. SVM classifier with linear kernel
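As a purely illustrative sketch (not taken from any of the cited studies), an SVM text classifier with a linear kernel over TF-IDF features could be set up with scikit-learn as follows; the toy texts and labels are made up.

```python
# Hypothetical sketch: linear-kernel SVM for text sentiment classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["I really enjoyed this course", "This lecture was confusing and boring",
         "Great explanation, very clear", "I am frustrated with the exercises"]
labels = ["positive", "negative", "positive", "negative"]   # made-up annotations

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["The explanation was clear and enjoyable"]))
```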
Convolutional Neural Network (CNN). CNN is a deep learning approach originally used to analyze visual data, but it has now proven helpful for various other applications, including text categorization. A CNN has a shared-weight architecture of convolution kernels, or filters, that slide along the input features to produce translation-equivariant feature maps. In contrast, in fully connected networks each neuron in one layer is linked to every neuron in the next layer, hence the name; regularization, such as penalizing parameters during training, is required to solve the over-fitting problem in this sort of architecture. Regularization in a CNN takes a different approach, exploiting the hierarchical structure in the data and building patterns of greater complexity from the smaller and simpler patterns found in its filters. We represent the architecture of a CNN in Fig. 3 below.
Fig. 3. Simple CNN architecture
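For illustration only, a minimal text-classification CNN in PyTorch (the vocabulary size, embedding dimension and other hyperparameters are arbitrary assumptions) could look like this:

```python
# Hypothetical sketch: a small CNN for sentence classification (hyperparameters are assumed).
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=100, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, 64, kernel_size=3, padding=1)  # convolution over word positions
        self.fc = nn.Linear(64, num_classes)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)             # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                     # (batch, embed_dim, seq_len) for Conv1d
        x = torch.relu(self.conv(x))              # feature maps
        x = x.max(dim=2).values                   # global max pooling over the sequence
        return self.fc(x)                         # class logits

model = TextCNN()
dummy = torch.randint(0, 5000, (8, 40))           # batch of 8 sentences, 40 tokens each
print(model(dummy).shape)                          # torch.Size([8, 2])
```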
Long Short-Term Memory (LSTM). LSTM is a type of RNN (recurrent neural network). Unlike a conventional feedforward neural network, an LSTM has feedback connections, so it can handle not just single data points but also sequences of data such as speech or video. An LSTM is made up of a cell and input, output, and forget gates. The cell remembers values over arbitrary periods, and the three gates govern the flow of information into and out of the cell. The LSTM model also overcomes the RNN model's problem of vanishing gradients. As illustrated in Fig. 4, the LSTM contains three gates, input, forget, and output, with the following equations:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad (12)$$

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad (13)$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad (14)$$

$i_t$, $f_t$ and $o_t$ are the input, forget and output gate equations respectively, $\sigma$ represents the sigmoid function, $W_x$ is the weight matrix for gate $x$, $h_{t-1}$ is the output of the previous LSTM block, $x_t$ is the input at the current timestamp and $b_x$ is the bias for gate $x$. The candidate, cell, and final output equations are:

$$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \quad (15)$$

$$c_t = f_t * c_{t-1} + i_t * \tilde{c}_t \quad (16)$$

$$h_t = o_t * \tanh(c_t) \quad (17)$$
Fig. 4. One LSTM cell architecture
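To make Eqs. (12)-(17) concrete, one forward step of a single LSTM cell can be written directly in NumPy; this is a sketch for illustration, and the dimensions and random weights are placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following Eqs. (12)-(17); W and b hold per-gate parameters."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate, Eq. (12)
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, Eq. (13)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, Eq. (14)
    c_hat = np.tanh(W["c"] @ z + b["c"])     # candidate cell state, Eq. (15)
    c_t = f_t * c_prev + i_t * c_hat         # new cell state, Eq. (16)
    h_t = o_t * np.tanh(c_t)                 # new hidden state, Eq. (17)
    return h_t, c_t

# Placeholder dimensions and random parameters (assumptions for illustration only).
d_x, d_h = 4, 3
rng = np.random.default_rng(0)
W = {g: rng.normal(size=(d_h, d_h + d_x)) for g in "ifoc"}
b = {g: np.zeros(d_h) for g in "ifoc"}

h, c = lstm_step(rng.normal(size=d_x), np.zeros(d_h), np.zeros(d_h), W, b)
print(h.shape, c.shape)  # (3,) (3,)
```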
Self-attention. The attention mechanism is defined more generically by the authors of [10], who express it in terms of queries, keys, and values. Self-attention is achieved by applying attention to each location in the input sequence according to the formula:

$Attention(Q, K, V) = softmax\left(\frac{QK^T}{\sqrt{d_k}}\right)V$  (18)

Instead of utilizing a single attention function with keys, values, and queries of dimension $d_{model}$, Vaswani et al. recommended employing multi-head attention by applying self-attention $h$ times with distinct projections to dimensions $d_k$ and $d_v$, as presented in [10]. Attention is computed in parallel for each projected $(Q, K, V)$ triple, resulting in an output of dimension $d_v$. The final value is obtained by concatenating these outputs; the paper describes the process with the following formulas:

$MultiHead(Q, K, V) = Concat(head_1, \ldots, head_h)W^O$  (19)

$head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)$  (20)

where the projections are the matrices $W_i^Q \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^K \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^V \in \mathbb{R}^{d_{model} \times d_v}$, and $W^O \in \mathbb{R}^{hd_v \times d_{model}}$.
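The scaled dot-product attention of Eq. (18) is small enough to sketch in NumPy; the matrix shapes below are illustrative, and this is not the code of [10].

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in Eq. (18)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

# Illustrative shapes: 5 positions, d_model = 8, one head with d_k = d_v = 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))

# Self-attention: queries, keys and values are projections of the same input.
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)  # (5, 8)
```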
3 Sentiment-Analysis Approaches: A Comparative Study
This section cites some of the most prominent machine learning and deep learning methods in the literature for performing sentiment analysis on text.

3.1 Machine Learning Methods
[11] used the Naïve Bayes algorithm to analyze sentiment from children's fairy tales in a text-to-speech challenge. The purpose was to recognize emotions in text and adjust the speech signal's prosody, pitch, strength, and duration indicators. The data for this study were manually annotated, with each statement labeled with one of Ekman's primary emotions. The corpus was then analyzed for several characteristics (conjunctions of chosen features, WordNet emotion terms, positive/negative word count, etc.), totalling 30 features. The paper used two separate parameter tuning procedures, both based on 10-fold cross-validation, to choose the optimal combination of features: sep-tune-eval, which used just half of the dataset for tuning, and same-tune-eval, which used the whole dataset for tuning. When the 30 extracted features, the Naïve baseline, and the Bag of Words method were compared, the proposed 30 extracted features outperformed all other methods, and same-tune-eval performed marginally better than sep-tune-eval.

Using a four-emotion paradigm, namely neutral, happiness, anger, and sadness, [12] employed Support Vector Machines (SVMs) to discern emotions from text. After extracting the NAVA words (noun, adverb, verb, adjective), they were converted into a vector space using VSM. The authors used Rough Set Theory to create a minimum subset of the attributes to improve accuracy. After a 3-fold cross-validation, the authors found that Rough Set Theory paired with SVM gave better prediction accuracy than SVM alone, and the combined model performed best for all emotions.

Feedback is often gathered after a course on educational platforms, but it is more valuable if collected in real-time. [13] utilized students' real-time feedback to forecast their moods. This task required Student Response Systems (SRS), after which the data were manually tagged. They classified the data using four machine learning algorithms: Naïve Bayes (NB), Complement Naïve Bayes (CNB), Maximum Entropy (ME), and SVMs, and they used unigrams as features. According to the findings of 10-fold cross-validation, the top classifiers were SVM with 94% accuracy and CNB with 84%.

[1] developed a sentiment prediction engine based on Twitter responses from students regarding English lectures. After preprocessing the data (tokenization, lower-casing, removing stop words, punctuation, hashtags, numbers, and URLs), they used different combinations of n-grams as features: Unigrams (UNI), Bigrams (BI), and Trigrams (TRI); and they compared them using different classifiers: Naïve Bayes (NB), Multinomial NB (MNB), Complement NB (CNB), SVM, Maximum Entropy (ME), and Random Forest (RF). In addition, they used various class models, ranging from two to eight classes. After 10-fold cross-validation, the best classifier was Complement Naïve Bayes (CNB).
The best performing features combined all n-grams, and the models with two classes performed better than the rest.

3.2 Deep Learning Methods
[2] compared several machine learning and deep learning algorithms for predicting students' attitudes based on feedback on educational content. The authors trained the data using classifiers such as MLP (a deep learning backpropagation algorithm), SMO (sequential minimal optimization), Decision Tree, Simple Logistics, a multi-class classifier, K-star (an instance-based classifier), Bayes Net, and Random Forest, after which test data were applied to the resulting model and 10-fold cross-validation was performed. The final results were evaluated in terms of Accuracy, Root Mean Square Error (RMSE), Sensitivity, and the area under the Receiver Operating Characteristics (ROC) curve. According to the findings, the SMO and MLP deep learning algorithms outperformed all other classifiers.

In another research work, [14] employed deep learning algorithms to assess sentiments from IMDB reviews. For academics, the classification of reviews is crucial, and it may be based on the emotion and ratings of the film. It is also valuable to both consumers and film businesses since it may be utilized for marketing decisions and as a suggestion tool for movie selection. The authors used three deep learning algorithms, RNN, LSTM, and CNN, in this study to evaluate them and determine which one is superior for review categorization. The dataset utilized in this study was obtained from the public IMDB database and was preprocessed by eliminating punctuation before being turned into a one-dimensional vector. The findings of testing the three models were as follows: CNN outperformed the others and had greater accuracy (around 88.22%). Furthermore, RNN and LSTM were shown to outperform SVM.

In another study, [15] proposed a new method for short-text sentiment analysis based on a multi-channel CNN with a multi-head attention mechanism (MCNN-MA). The proposed model first extracts part-of-speech features into a multi-dimensional vector, combines each word's position value into a position vector, and then collects the dependency syntax features into a dependency vector. Secondly, the word vector is combined with the part-of-speech feature vector, the position feature vector, and the dependency syntax feature vector to form three new inputs, which are then fed into a multi-channel CNN. The outputs of the multi-channel CNN are concatenated and fed into a multi-head attention mechanism that performs a linear transformation followed by a scaled dot product. Finally, the text is classified using a fully connected layer. The experiments were conducted on Tan Songbo's Chinese hotel review and TaoBao review datasets. After a comparison between the proposed model and eight other models, namely MNB, CNN, CNN-SVM, LSTM, DCNN (dynamic CNN), WFCNN (combination of CNN and features of word sentiment sequences), CNN-multi-channel, and ATT-CNN (CNN with attention mechanism), results showed that the proposed model outperformed all other methods, with an accuracy of 86.32% on Tan Songbo and 85.29% on the Taobao dataset.
The authors of [16] proposed a new multi-level graph neural network (MLGNN) for text sentiment analysis. The graph neural network is a method that has emerged over recent years; in a GNN, a node collects information from other nodes and updates its representations. In the proposed model, a graph is used for each input text instead of one graph for the entire corpus. In order to consider both global and local text features, the authors implemented node connection windows of different sizes at different levels. The nodes in a small window are connected at the bottom level, the nodes within larger windows are connected at the middle level, and finally, all the nodes are connected at the top level. The authors also used a multi-head attention mechanism, containing multiple dot-product attention mechanisms, as a message-passing technique at the top level of the graph. Experiments were conducted on three datasets, namely SST-Binary (Stanford Sentiment Treebank of movie reviews), Sentube-A (YouTube comments regarding automobiles), and Sentube-T (YouTube comments regarding tablets). After comparing the proposed model with other models, namely BOW (logistic regression trained on a bag of words), AVE (logistic regression trained on averaged word embeddings), LSTM, BiLSTM (Bidirectional LSTM), CNN, and GNN, results showed that the proposed method outperformed all other methods on all datasets.

[17] proposed another hybrid method based on CNN and LSTM to perform text sentiment analysis; the convolution layer is implemented to extract text features, and the LSTM is used to detect long-term dependence between words. In this model, word embedding vectors are first calculated for each word in the input text using Keras and fed into a convolutional layer followed by a pooling layer and a fully connected layer. The output of the CNN is then fed into an LSTM, followed by a fully connected layer and finally an output layer. Experiments were conducted on two datasets, the Airlinequality Airline Sentiment Data and the Twitter Airline Sentiment Data. After comparing the proposed model with baselines, namely NB, Logistic Regression, Decision Tree, SVM, CNN, and LSTM, results showed that the proposed CNN-LSTM model outperformed all other baselines in terms of accuracy, precision, recall, and F-score on both datasets.

In [18], the authors used Bidirectional Encoder Representations from Transformers (BERT) for Twitter sentiment analysis. For this task, the authors implemented two steps: first, the Twitter jargon, including emojis and emoticons, is converted to plain text using a technique that is language-independent or easily adaptable to many languages; second, the tweets are classified using BERT. Models trained on plain text rather than tweets were used for two reasons: accessible plain-text corpora are larger than tweet-only ones, and pre-trained models on plain text are readily available in many languages. The preprocessing was implemented in the following steps: first, the noisy entities (email, date, money amount, numbers, percentages, phone numbers, and time), Uniform Resource Locators (URLs), and hashtags were normalized. Then, the token (@user) replaced user mentions in tweets. Emoticons were transformed into the words that best describe their mood, and emojis were translated into their meanings. Finally, all punctuation was removed.
The base BERT architecture was used in this research (12 hidden layers in the transformer encoder, 12 attention heads, a hidden size of 768 for the feedforward networks, and a maximum sequence length of 512). The BERT architecture uses two special tokens: [SEP] for segment separation and [CLS], the first input token for any classifier, for classification. Experiments were performed on the Italian dataset SENTIPOLC 2016. After comparison with other methods, namely ALBERTO (BERT trained on tweets from scratch), CNN, LSTM, and Multilingual BERT, results proved that the proposed model outperformed all other models.

The authors of [19] proposed a RoBERTa (Robustly Optimized BERT Pre-training Approach) based multi-task aspect-category sentiment analysis model (RACSA). Treating each aspect category as a subtask, they used RoBERTa, based on a deep bidirectional Transformer, to extract features from both text and aspect tokens, and the cross-attention method to instruct the model to focus on the features most relevant to the given aspect category. After extracting text features and aspect tokens from the text using RoBERTa, document attention and a one-dimensional CNN are applied separately to the text features, resulting in two outputs ("s" and "p", for example). Then cross-attention is applied to both text features and aspect tokens, resulting in "r". Then, "s", "p", and "r" are concatenated and fed to a classification layer to give a sentiment polarity. Experiments were performed on the AI Challenger 2018 dataset, and comparisons were conducted with 1D-CNN, AT-LSTM (attention mechanism combined with LSTM), GCAE (aspect-based sentiment analysis based on convolutions and a gating mechanism), RoBERTa, and BiLSTM-ACSA (the proposed model with RoBERTa replaced by BiLSTM). Results showed that RACSA performed better than all other models.
4 Experiments
This section implements a simulation of some machine learning and deep learning baselines to give more visibility to the comparison of these two groups for sentiment analysis from text. The comparison is between one machine learning method and two deep learning methods: SVM, LSTM, and CNN.

4.1 Dataset
We chose a dataset that has been used in much research and is available on the public repository Kaggle: the Sentiment140 dataset [20], which contains 1.6 million labeled tweets (positive and negative).

4.2 Preprocessing
For better results, the dataset must be preprocessed before running the given models, which includes deleting unnecessary noisy data. Special characters, hashtags, and links were deleted. Then, we tokenized sentences into arrays of words, excluding stop words and using stemming to keep only word radicals.
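A possible implementation of this preprocessing is sketched below; the exact rules and libraries the authors used are not specified, so the regular expressions, the NLTK stop-word list, and the Porter stemmer are assumptions.

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)
stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(tweet):
    """Remove links, hashtags and special characters, then tokenize, drop stop words and stem."""
    tweet = re.sub(r"https?://\S+|www\.\S+", " ", tweet)   # links
    tweet = re.sub(r"#\w+|@\w+", " ", tweet)               # hashtags and mentions
    tweet = re.sub(r"[^a-zA-Z\s]", " ", tweet.lower())     # special characters
    tokens = tweet.split()
    return [stemmer.stem(t) for t in tokens if t not in stop_words]

print(preprocess("I really like this #NLP tutorial! https://example.com"))
# e.g. ['realli', 'like', 'tutori']
```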
4.3 Baselines
As mentioned before, we compare the performance of three models: SVM, one of the best machine learning models in the literature [13], and CNN and LSTM, which also showed promising results in previous research [14, 17]. For the CNN architecture, after the embedding layer we implemented three convolutional layers, each followed by a pooling and a dropout layer; we then concatenated the three outputs and fed them into a fully connected layer and a sigmoid function to get predictions. For the LSTM architecture, after the embedding layer, we used one LSTM layer followed by a dropout layer, a fully connected layer, and a sigmoid function for classification.
4.4 Parameter Settings
The baseline models were implemented with the Keras API, which is based on TensorFlow and written in Python. For the Sentiment140 dataset, we set the tokenizer sequence length to 15 words: if a sentence has fewer than 15 words, the rest is filled with padding tokens; if a sentence has more than 15 tokens, it is shortened to meet the specified sentence length. We used the freely available Google pre-trained word2vec for the word embeddings, with the embedding size set to 300. For hyperparameter tuning, we used a randomized search over various parameters, and the top-performing values were then applied. After running the randomized search for all models, the best hyperparameters were the following:
– SVM: linear kernel with the regularization hyperparameter C = 1.
– LSTM: 16 LSTM units, a dropout rate of 0.3, and a learning rate of 0.001 for the Adam optimizer.
– CNN: two filters of size 64, followed by three filters of size 32, and finally four filters of size 16, a dropout rate of 0.5, and a learning rate of 0.001 for the Adam optimizer.
For all models, we trained for 20 epochs with a batch size of 256.
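Under these settings, the LSTM baseline could be assembled roughly as follows; this is a sketch rather than the authors' code, the vocabulary size is an assumption, and the pre-trained word2vec weights are assumed to be loaded separately (the multi-branch CNN of Sect. 4.3 is omitted for brevity).

```python
from tensorflow import keras
from tensorflow.keras import layers

seq_len, embed_dim, vocab_size = 15, 300, 100000  # vocab_size is an assumption

def build_lstm_baseline():
    """LSTM baseline: Embedding -> LSTM(16) -> Dropout(0.3) -> Dense(sigmoid)."""
    inputs = keras.Input(shape=(seq_len,))
    # The embedding would be initialized from Google's pre-trained word2vec
    # (e.g. via an embeddings_initializer built from the loaded matrix).
    x = layers.Embedding(vocab_size, embed_dim)(inputs)
    x = layers.LSTM(16)(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_lstm_baseline()
# Training setup from this section:
# model.fit(X_train, y_train, epochs=20, batch_size=256)
```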
5 Results and Discussion
As we can conclude from Table 1, the deep learning models performed better than SVM, which is backed up by much other research, as mentioned before. The table shows that CNN improved the accuracy by 4.12% and the F1 score by 4.13% compared to SVM. The use of a weighted F1 score here is due to the unbalanced dataset. In addition, we see that LSTM outperformed CNN by 0.27% in terms of accuracy and 0.28% in terms of F1 score. The cause of these results is that CNN can only catch low-level features of the text but not the semantic relations between words. LSTM, in contrast, can catch these dependencies even between two words that are far from each other in the text; this lets LSTM understand the meaning of the sentence and therefore gives better performance in predicting sentiment.
Table 1. Results of the evaluation of the baseline models on the Sentiment140 dataset (best results in bold)

Method | Class | Precision | Recall | F1-score | Weighted F1 | Accuracy
SVM    | Neg.  | 0.7419    | 0.6919 | 0.7160   | 0.7240      | 0.7242
       | Pos.  | 0.7087    | 0.7569 | 0.7320   |             |
CNN    | Neg.  | 0.7769    | 0.7478 | 0.7621   | 0.7653      | 0.7654
       | Pos.  | 0.7546    | 0.7831 | 0.7686   |             |
LSTM   | Neg.  | 0.7700    | 0.7677 | 0.7688   | 0.7681      | 0.7681
       | Pos.  | 0.7661    | 0.7684 | 0.7672   |             |
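The per-class metrics, weighted F1, and accuracy in Table 1 can be computed with scikit-learn; the sketch below uses placeholder labels and predictions in place of the real test split.

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

# Placeholders standing in for the test labels and a model's predictions.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

print(classification_report(y_true, y_pred, target_names=["Neg.", "Pos."]))
print("weighted F1:", f1_score(y_true, y_pred, average="weighted"))
print("accuracy:", accuracy_score(y_true, y_pred))
```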
Based on our results and the research results mentioned above in Sect. 3, it is clear that machine learning models showed excellent results for text-based sentiment analysis, especially SVM, the best classifier for this task [13]. However, deep learning models showed even more robust results, and in many studies they outperformed the standard machine learning algorithms [2, 14]. Furthermore, among the deep learning models, convolutional neural networks (CNN) and recurrent neural networks (RNN) showed excellent performance. CNN was initially designed to extract features from images, which turned out to be helpful for text as well; it can extract local low-level features that can be used for good text classification. As for RNN, it is intended to catch semantic relationships between words in a sentence; therefore, it can understand the sentence's meaning and extract the right sentiment. Much research has shown that combining CNN and RNN works well for text sentiment analysis [17]. Over the last years, many studies implemented BERT (Bidirectional Encoder Representations from Transformers)-based models for sentiment analysis from text, and they showed good performance, better than all other deep learning models [18]. Along this route, a more advanced BERT-based model named RoBERTa (Robustly Optimized BERT Pre-training Approach) has been widely adopted and has shown some outstanding results [19]. More studies should be done on this task, especially with BERT-based models, to try and obtain even better results than all other literature methods.
6 Conclusion
We have evaluated the current status of sentiment analysis from textual data in this work, the techniques for representing text to create meaningful and effective features, and the various learning models for predicting emotions. On the one hand, data quality poses a significant challenge for learning algorithms: we spend much time gathering and labelling data, and it is often insufficient to have well-represented data for this problem. On the other hand, the models are still insufficient; we can obtain a model that performs well on some datasets but not on others. However, deep learning models showed their robustness in this task, outperforming the standard machine learning algorithms. Among the
deep learning models, we observed that BERT-based ones often performed better; this urges us to further search into similar models to perform text-based sentiment analysis. Furthermore, discovering opportunities to propose improvements to existing approaches and offering new models to address this difficulty will be our primary emphasis in the future.
References

1. Altrabsheh, N., Cocea, M., Fallahkhair, S.: Predicting learning-related emotions from students' textual classroom feedback via Twitter. In: The 8th International Conference on Educational Data Mining (EDM), pp. 436–440 (2015)
2. Sultana, J., Sultana, N., Yadav, K., AlFayez, F.: Prediction of sentiment analysis on educational data based on deep learning approach. In: 2018 21st Saudi Computer Society National Computer Conference (NCC), pp. 1–5. IEEE, Riyadh (2018)
3. Rodriguez, P., Ortigosa, A., Carro, R.M.: Detecting and making use of emotions to enhance student motivation in e-learning environments. Int. J. Continu. Eng. Educ. Life Long Learn. 24, 168–183 (2014)
4. Madani, Y., Ezzikouri, H., Erritali, M., Hssina, B.: Finding optimal pedagogical content in an adaptive e-learning platform using a new recommendation approach and reinforcement learning. J. Ambient Intell. Human. Comput. 11, 3921–3936 (2020)
5. Birjali, M., Beni-Hssane, A., Erritali, M.: A novel adaptive e-learning model based on Big Data by using competence-based knowledge and social learner activities. Appl. Soft Comput. 69, 14–32 (2018)
6. Xu, G., Meng, Y., Qiu, X., Yu, Z., Wu, X.: Sentiment analysis of comment texts based on BiLSTM. IEEE Access 7, 51522–51532 (2019)
7. Kilimci, Z.H., Akyokus, S.: Deep learning- and word embedding-based heterogeneous classifier ensembles for text classification. Complexity 2018, 1–10 (2018)
8. Barrón Estrada, M.L., Zatarain Cabada, R., Oramas Bustillos, R., Graff, M.: Opinion mining and emotion recognition applied to learning environments. Expert Syst. Appl. 150, 113265 (2020)
9. Agrawal, A., An, A.: Unsupervised emotion detection from text using semantic and syntactic relations. In: 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 346–353. IEEE, Macau (2012)
10. Vaswani, A., et al.: Attention is all you need. arXiv:1706.03762 [cs] (2017)
11. Alm, C., Roth, D., Sproat, R.: Emotions from text: machine learning for text-based emotion prediction (2005)
12. Teng, Z., Ren, F., Kuroiwa, S.: Emotion recognition from text based on the rough set theory and the support vector machines. In: 2007 International Conference on Natural Language Processing and Knowledge Engineering, pp. 36–41. IEEE, Beijing (2007)
13. Altrabsheh, N., Cocea, M., Fallahkhair, S.: Learning sentiment from students' feedback for real-time interventions in classrooms. In: Bouchachia, A. (ed.) ICAIS 2014. LNCS (LNAI), vol. 8779, pp. 40–49. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11298-5_5
14. Cen, P., Zhang, K., Zheng, D.: Sentiment analysis using deep learning approach. J. Artif. Intell. 2, 17–27 (2020)
15. Feng, Y., Cheng, Y.: Short text sentiment analysis based on multi-channel CNN with multi-head attention mechanism. IEEE Access 9, 19854–19863 (2021)
16. Liao, W., Zeng, B., Liu, J., Wei, P., Cheng, X., Zhang, W.: Multi-level graph neural network for text sentiment analysis. Comput. Electr. Eng. 92, 107096 (2021)
17. Jain, P.K., Saravanan, V., Pamula, R.: A hybrid CNN-LSTM: a deep learning approach for consumer sentiment analysis using qualitative user-generated contents. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 20, 84:1–84:15 (2021)
18. Pota, M., Ventura, M., Catelli, R., Esposito, M.: An effective BERT-based pipeline for Twitter sentiment analysis: a case study in Italian. Sensors 21, 133 (2021)
19. Liao, W., Zeng, B., Yin, X., Wei, P.: An improved aspect-category sentiment analysis model for text sentiment analysis based on RoBERTa. Appl. Intell. 51, 3522–3533 (2021)
20. Sentiment140 dataset with 1.6 million tweets — Kaggle. https://www.kaggle.com/kazanova/sentiment140. Accessed 1 Mar 2022
Smart Tourism Destinations as Complex Adaptive Systems: A Theoretical Framework of Resilience and Sustainability

Naila Belhaj Soulami and Hassan Azdimousa

National School of Business and Management (ENCG), Ibn Tofail University, Kénitra, Morocco
{naila.belhajsoulami,hassan.azdimousa}@uit.ac.ma
Abstract. The concepts of "resilience" and "sustainability" have often been associated with each other in a systems view. The health crisis has prompted theorists and practitioners to increase their efforts to identify the stressors and impact factors that disrupt systems, as well as the factors that can enhance resilience and sustainability. In this respect, this paper presents tourism destinations as complex adaptive systems (CAS) and aims to establish the link between the resilience and the sustainability of these systems in favor of developing their intelligence. Thus, a theoretical model is proposed, representing the functioning of smart tourism destinations (STDs), seen as CAS, in terms of sustainability and responses to critical functions. This paper presents some theoretical and managerial implications. On the one hand, it establishes the link between resilience and sustainability of a system, based on the models proposed by Ahi, P., & Searcy, C. (2013) and developed by Marchese, D. et al. (2018), and presents intelligent tourism systems as CASs undergoing the same functioning. On the other hand, this model can prove useful for managers and administrators of destinations who seek to increase their resilience and sustainability by focusing on improving enhancing factors and controlling all types of disturbances.

Keywords: Resilience · Sustainability · Complex Adaptive Systems (CAS) · Smart Tourism Destinations (STDs) · Stress and Impact factors
1 Introduction

Tourism dependence is a reality for an ever-increasing number of cities, regions, and nations around the world [1]. However, tourism systems reveal many vulnerabilities to their internal and external environments [2]. The scope of these threats goes beyond the tourism sector and can affect a significant number of systems of all types. It is in this regard that there are ongoing efforts to ensure a future that offers both a high quality of life and resilience to the impacts of undesirable events. These efforts have led to an ever-increasing focus on sustainability and resilience [3]. In this sense, the development of smart tourism promises various benefits, including destination resilience and sustainability [4], and has become a mantra for many destinations around the world [5]. In fact, the Application Areas of Complex Adaptive Systems provide an esteemed overview of
the latest research on the strategies, applications, practice, and implications of complex adaptive systems to better understand the various critical systems that surround the social sciences in general, and the management domain of the tourism sector in particular. This article then presents smart tourism destinations as complex adaptive systems and proposes a theoretical model that compares this resilient and sustainable system to a non-resilient and unsustainable one, while specifying the factors for enhancing this system and the desired outcomes. Therefore, this paper addresses the following issue: How can we theoretically model the functioning of resilient and sustainable smart tourism destinations? To answer this question, three derived questions should be considered:

• What is the relationship between resilience and sustainability of a complex adaptive system?
• To what extent are STDs seen as CAS?
• What are the stress and impact factors in STDs, and the factors enhancing their resilience and sustainability?

In this article, the first part of the literature review will be dedicated to defining the resilience of a system, an ecosystem and, finally, a complex adaptive system (CAS). An explanation of the relationship between resilience and sustainability, presenting each time one as a component of the other, will follow. The second part of the literature review presents smart tourism destinations as CAS and emphasizes their resilience and sustainability for the creation of renewed competitive advantages. Next, we will shed light on the stressors and impact factors of STDs, as well as the factors promoting their resilience and sustainability. Finally, a representative theoretical model will be proposed in order to bring together the knowledge mentioned above.
2 Literature Review

2.1 The Resilience of Complex Adaptive Systems and Its Link with Sustainability

The Resilience of an Eco/System
Resilience, according to [6], is the ability of a system to absorb different disturbances. It is, in fact, another way of looking at the strength of a coupled human-environment system [7]. Indeed, the term "resilience" is one of the most polysemous terms, given its multidisciplinary use in the humanities and social sciences, medicine, ecology, and economics and management sciences. The popularity of the term in question comes from the fact that resilience, as the ability to bounce back, can be applied to any ecosystem [8] to describe its ability to return to normal functioning after a major disturbance. Therefore, the term can be extended to the solutions that the international community seeks to overcome the various crises/stress and impact factors that may arise. However, it should be noted that the notion of resilience is based on the idea that after a disturbance, the system is not marked by a simple return to equilibrium. It rather expresses a resistance behavior, defined as the notion of "reactive resilience", that often reacts, unexpectedly, in a positive, creative way, thanks to multiple readjustments called
"proactive resilience". This type of resilience stipulates that the occurrence of disturbances favors the renewal of the system. Indeed, in the case of complex systems such as social and territorial systems, the concept of proactive resilience is part of the paradigm of plural equilibria [9], in the sense that the system's behavior is dictated by several attractors. The notion of resilience thus implies that the system ensures its continuity not only by preserving an immutable equilibrium, or by returning to the same state as before the disturbance, but also by integrating transformations into its evolution. Accordingly, resilience is a concept that fits into the theoretical framework of systems far from equilibrium.

The Resilience of a Complex Adaptive System (CAS)
A complex adaptive system (CAS) is a special case of a complex system that is capable of adapting to its environment through learning experiences. The term "CAS" was introduced by the Santa Fe Interdisciplinary Institute, notably by John H. Holland and Murray Gell-Mann in 1962. Indeed, the great difference between a complex system and a linear system is its predictability: in a linear model, such as chemical equations, we are able to predict its result. However, in a complex model, it is practically impossible to predict future results if we do not model all the relationships of the system. It is, therefore, not possible to describe the response of a complex system to a change due to the interdependence of all its stakeholders. Moreover, [10] state that the more complex a system is, the greater its fragility but also the greater its reservoir of structural change. In this regard, Scuttari and Corradini observe that resilience has evolved to be understood as a property of complex adaptive systems, "managing inevitable disturbances toward a desirable and unstable development path" (2018: 35).

The Relationship and Difference between Resilience and Sustainability
The concept of resilience has often been linked to the concept of sustainability, despite the differences in definition, tools, methodology, and application areas of the two concepts. For this reason, no single management framework can be proposed to meet the needs of all stakeholders. However, [11, 12] reviewed existing strategies for implementing the two concepts in the literature and categorized these strategies using three generalized frameworks that capture their common goals: (1) resilience as a component of sustainability, (2) sustainability as a component of resilience, and (3) resilience and sustainability as distinct goals. We have retained the first two in order not only to understand the relationship between the two concepts, but also to be able to present them in a common framework.

Resilience as a Component of Sustainability
Several scientific researchers have proposed quantitative methods that incorporate resilience as a component of sustainability [13, 14]. Indeed, this first view states that without resilience a system can only have fragile sustainability [15]. In addition, sustainability is seen as a system goal, and resilience is used as a tool to achieve these goals [16]. Furthermore, according to [17], the design process of a system must consider its vulnerabilities to disturbances for it to be considered sustainable. In this respect, Fig. 1 illustrates how the resilience of a system can impact the sustainability of that system [12] and shows how a resilient system can become sustainable after recovering from a disturbance through the adaptive component of resilience.
Sustainability as a Component of Resilience
This view is used in several recent studies in the fields of supply chain management [11–19], public policy [20], and business management [21]. In this context, the goal of resilience is to maintain critical functionality (including safety, profit, etc.) during and after disturbances. Thus, with increasing economic, environmental, and social well-being, this critical functionality illustrates sustainability as a component of resilience (Fig. 2). Indeed, a more sustainable system is better able to absorb, recover from, and adapt to economic, environmental, and social disturbances. Put differently, this view asserts that increasing the sustainability of a system makes that system more resilient, but increasing the resilience of a system does not necessarily make that system more sustainable.
Fig. 1. Resilience as a component of sustainability adapted from [12]
An important difference between sustainability and resilience is the time scale of implementation. Sustainability efforts are often undertaken on longer time scales than resilience. The primary goal of sustainability is to create desirable conditions for future generations [22]. Therefore, the effects of sustainability policies may not directly influence current conditions, but they may have substantial effects on future conditions. Resilience, on the other hand, is understood in many situations as applying to more immediate time scales [23]. Policies that increase the resilience of a system will protect the system in the short term from potential disturbances. Common efforts exist across disciplines and framework styles to implement resilience and sustainability. These efforts include prioritizing sustainability and resilience goals [15, 24], capitalizing on synergies [3, 25], mitigating the negative impacts of conflict [26, 27] and communicating efforts to stakeholders [28].

2.2 The Resilience of Smart Tourism Destinations and Their Sustainability

The concept of resilience reinterprets the way of thinking about the urban system and its disturbances. Applied to a city, it retains its definition of the capacity to absorb a disturbance and to recover its functions following this disturbance. Consequently, the
Fig. 2. Sustainability as a component of resilience adapted from [12]
operationality of the concept would involve the need to adapt the functioning of the urban system, as well as its components, to potential disruptions, to rebuild the urban system following a major disruption, or to define crisis management methods by integrating the complexity of the city itself [29]. Thus, the concept of resilience applied to the city seems to find operational translations, particularly in terms of urban services, which also meet the objectives of sustainability.

Intelligent Tourism Systems Seen as CAS: Resilience Offering Renewed Competitive Advantages
Territorial systems in general, and tourism destinations in particular, can be described as CAS. Indeed, the tourism literature of the last decades repeatedly describes tourism destinations as complex adaptive systems (CAS) [30–32], which has resulted in an evolutionary approach. This view of destinations is accentuated by the growing interest in "smart tourism" practices and "smart tourism destinations". Consequently, over the past three decades, e-tourism, as a field of scientific research, has evolved into a substantial body of knowledge focused on the development of information theory and technology in relation to fundamental tourism issues [33]. Therefore, technology has become a major factor in building resilience in tourism [34, 35]. It is not surprising, then, that tourism academics and practitioners are investing considerable time and energy in research focused on resilience strategies that will ensure the longevity of tourism destinations in the face of crisis or adversity, as well as in relation to slow-moving change. Indeed, in a context of social uncertainty and crisis, tourism development may be different from that observed in times of prosperity [36]. For this reason, evolutionary resilience efforts go beyond maintenance towards improving the system through continuous adaptation and transformation [32]. Put differently, evolutionary resilience thinking
and action extends the scope of a system beyond resistance and recovery to trajectories of reorientation and renewal [37]. Accordingly, the pursuit of destination resilience at the system level as an overarching strategic direction not only contributes to destination survival [38, 39], but, by embedding resilience into the very fabric of the destination, it can help ensure system longevity through a pipeline of renewed competitive advantages [40]. Indeed, the scale and pace of change/disruption, the exposure and sensitivity of the system as a whole, or just its component parts, and the ability of the system to mitigate and/or compensate for certain vulnerabilities are critical to a destination's survival [41–43].

Stress and Impact Factors in Smart Tourism Destinations
Within and across tourism systems, destinations frequently face a combination of internal and external disturbances that take the form of slow-moving "stressors" or fast-moving "shocks" [43, 44]. External shocks experienced by tourism destinations include, for example, epidemics, terrorism, natural disasters, and large-scale transportation or construction infrastructure accidents. External stressors, on the other hand, include natural resource shortages, climate change, economic downturn, and environmental degradation [45]. As for technological advances, they can be considered external stressors that affect all destinations, but even more so smart destinations due to their increased reliance on technology [46]. These advancements can also be considered internal stressors when it comes to, for example, the dissemination of misinformation or even the slow rollout of a new destination marketing strategy, especially when many stakeholders are involved. Indeed, this type of stressor can lead to a weakening of relationships, a decline in stakeholder performance, and a compromise in the overall quality of the tourism experience at the destination.

Factors for Reinforcing the Resilience of Smart Tourism Destinations
[47] describe three areas of capacity for tourism systems and the disruptions they face: 1) absorptive adaptation; 2) adaptive capacity; and 3) transformation. [48] draws these findings together to propose a set of six key conditions toward building destination resilience: 1) variety and redundancy; 2) connectivity; 3) polycentric governance; 4) environmental sensitivity; 5) learning and reflexivity; and finally, 6) adaptive thinking and systems. Indeed, [48] notes that when operationalizing these constituent elements of resilience, destinations may encounter difficulties if their tourism system is "trapped" in one of four specific traps: the rigidity trap, the lock-in trap, the poverty trap, and the isolation trap. These traps can foster a particular development trajectory that is not conducive to building resilience. By extension, [47] analyzed the intersection between Hartman's [48] destination resilience conditions and smart tourism goals and objectives, and provided insights into how smart tourism infrastructure and governance can be used to support destination resilience. As a result, they proposed five pillars of smart destinations that support these six conditions and, thus, contribute to destination resilience, namely 1) sensing, 2) openness, 3) sharing, 4) governance, and 5) innovation [47].

Sustainability of STDs
In the field of tourism, efforts have focused on developing indicators for sustainable
tourism destinations [49–54]. Vargas Sánchez's model [55] is the most well-known conceptual model of destination competitiveness in the tourism literature and has served as a starting point for much other research on the competitiveness of "sustainable" destinations. Today, the concept of smart tourism itself is based on the generalization of this notion, as a destination cannot be considered smart if it is not sustainable. Indeed, the development of tourism sustainability has positive effects: given the environmental awareness of tourism demand, the perception of the destination (if it is presented as sustainable) generally tends to be better [56]. Sustainability of tourism destinations is one of the main current pillars for territories building tourism destinations and smart cities. This approach to the sustainable development of cities and towns, in addition to considering the environment as a fundamental aspect, also involves cultural and economic aspects as playing important roles in creating sustainable territories. Therefore, sustainable tourism is based on three main aspects when building up a smart environment: environment, socio-culture, and economy [57]. Indeed, destination sustainability indicators adopt the balance approach (economic, social and environmental sustainability) as the most visible position in public policy making and the dominant position in academic discourse [58]. Moreover, the tourism sector has an extraordinary capacity to link the economic, social, and environmental aspects of sustainability. This is possible because tourism, as an economic activity, relies on intact environments, rich cultures, and welcoming communities. Therefore, technological tools enable the creation of jobs and income from cultural experiences [59]. That is, smart tourism destinations must respect the main pillars of sustainability: environment, economy, and socio-culture, establishing a perfect combination of the three parts through ICT and data analysis [59].
3 Research Methodology

This research was based on a literature review of several scientific articles related to: 1) resilience and sustainability, 2) smart tourism destination (STD) management, and 3) resilience and sustainability of STDs. The method adopted is a synthetic analytical method. In order to establish the link between resilience and sustainability, we based ourselves on the system representations proposed by [11] and developed by [12], as mentioned before. Subsequently, a connection between these systems and those of tourism destinations was noted, and we retain therein the enhancing factors of smart tourism destinations proposed by [47], factors that can only be developed through the existence of a strong human/commercial dimension. Other enhancing factors relate to technological infrastructure and infostructure. All these elements seek, in the short term, to successfully integrate and optimize resources, in order to achieve desired outcomes in the long term; outcomes that represent the purpose of "smart tourism" practices according to several researchers.
4 Results

This diversified literature review has allowed us to conceptualize smart tourist destinations as complex adaptive systems and to situate this system in a graph according to time, critical functionality, and sustainability. We have seen that a non-resilient system, when it undergoes social, economic, or environmental disturbances, suffers a significant degradation in the course of the system depending on the strength of these critical functionalities. This system takes a long time to heal and ends up collapsing; therefore, it is an unsustainable system. However, a resilient system is characterized by its ability to bounce back, even to exceed its previous level, hence its sustainability. Thus, we can model these two systems in a single graph in order to compare them easily. However, a system can reach a good level of resilience if it is a system that adapts to its environment and is able to take advantage of its complexity. This is the case with all complex adaptive systems, such as the smart tourist destinations, as already demonstrated. In this respect, acting on the destinations, we synthesize the factors enhancing resilience that relate to the advanced technological infrastructure, such as the internet of things and sensors, artificial intelligence, machine learning, use of big data, and cloud computing. On the other hand, these factors also relate to the human and social dimensions and can be synthesized, according to [47], into openness, sharing, polycentric governance, innovation and sensing. Bringing intelligence into tourism destinations, therefore, means dynamically connecting stakeholders through a technological platform on which they can exchange information about their tourism activities in real-time [46, 57]. Moreover, we can say that the use of smart technology, and big data in particular, allows the effective integration of resources, and consequently forms the interactive dimension of the smart tourism ecosystem [60]. This allows for better optimization of resources. Finally, we were keen to model, towards the end of the "time" abscissa, the desired outcomes of smart tourism: first, this approach guarantees renewed competitive advantages, as the system manages to bounce back and opens up other possibilities during the resilience process. Second, the co-creation of value, since all social and economic actors are resource integrators; moreover, operant resources are the fundamental source of competitive advantage [61]. Furthermore, the combination of tangible and intangible smart components within an adaptive complex system structure offers the possibility of sustainable competitive advantage and improved quality of life for residents and tourists in smart tourism destinations [4]; hence, the third and final desired outcome, namely improved quality of life for residents and an improved tourist experience. This experience is characterized by being real-time and context-based. Therefore, Fig. 3 brings together all the aforementioned concepts along with their interrelations to reveal the functioning of smart tourism destinations as CAS for the development of resilience and sustainability in a single representation model.
Fig. 3. The functioning of smart tourism destinations as CAS for the development of Resilience and sustainability
5 Discussion

In discussing the resilience and sustainability of smart tourism systems, it is worth noting the link often made between the development of a city and that of a destination. It is known that by maintaining urban functions at an acceptable level of functioning, the resilient capabilities of urban systems contribute to the economic, social, and environmental aspirations of a sustainable city. More specifically, improving resilience may be the means to restore the balance between the three pillars of sustainable development when disruptions challenge the social, economic, or environmental functioning of the urban system, which is forced to adapt. Whether it is short-term or long-term, resilience combines issues at different spatial and temporal scales, through a systemic vision, and articulates the skills of all the city's actors. What applies to urban cities also applies to the destination, especially when it is a smart city or smart destination, where the qualification of smartness forces them to adopt the principles of resilience and sustainability. Thus, we model, on the one hand, a resilient and sustainable system as one that manages to absorb disturbances in a minimal time and manages to reorganize itself, so that it can be improved and become more efficient. We model by analogy, on the other hand, a non-resilient and non-sustainable system, which undergoes the same criticality of disturbances as the first system, but which takes more time to absorb, recover and adapt, and which never returns to its previous level, until it crashes, hence, its non-sustainable character. The idea is that by adopting a simple definition of resilience, a concrete approach
to improving urban sustainability, in general, and destination sustainability, in particular, is proposed. At the same time, building resilience that leads us to sustainability requires reinforcing factors to be implemented over time. These factors can be divided, as seen before, into: 1) Human and business dimension. 2) Technological infrastructure and infostructure. The first dimension represents operant resources of the destination and includes the five pillars of smart destinations proposed by [47] that support resilience. The second dimension represents operand resources and groups together the set of smart technologies that support this resilience and contribute to sustainability. In fact, actors and objects effectively work collectively as a complex adaptive socio-technical system, with benefits arising from interdependencies within networked systems [62, 63]. Moreover, the co-creation of value in the smart destination is closely related to the complex ecosystem of actors involved [4] and the increasingly blurred roles of each actor in this ecosystem [46]. Indeed, in all tourism destinations in general, which are known to be highly complex and multifaceted service ecosystems [57], it is essential to consider the engagement of all actors to maximize interactions and opportunities for positive value co-creation [60, 64]. Indeed, technical networks are a fundamental basis for maintaining urban functions and are, therefore, the subject of operational methodological developments for the assessment and management of local governments. In addition, sustainable tourism development meets the current needs of visitors and host regions while protecting and enhancing opportunities for the future. By focusing on integral resource management and optimization, economic, social, and aesthetic demands can be met while respecting cultural integrity, essential ecological processes, biological diversity, and life support systems. In sum, tourism sustainability reinforces the STD (Smart Tourism Destination) model, as actions in this area are limited and sometimes associated with poor sustainability and a lack of the holistic management necessary for sustainable development to become a differentiator for the destination. We, then, have modeled the desired outcomes towards the end of the graph. These include the possibility of having renewed competitive advantages, being able to co-create value with the participation of all stakeholders, improving the quality of life of the destination’s residents, as well as improving the experience in general. An experience marked by its contextual adaptation and real-time operability. It should be noted that the theoretical model proposed in this article is none other than a model of representation of the functioning of the intelligent tourism system as a complex adaptive system, undergoing disturbances and different crises, and characterized by its resilience and its sustainability. The dimensions presented in the model are those on which the vast majority of the scientific community agrees. However, the integration of resources is often presented as an interactive dimension according to the SD logic, and their optimization surfaces next. The desired results are also the synthesis of the various expectations of smart tourism.
6 Conclusion

The term resilience has been and continues to be widely debated across disciplines for several decades. It has received even more attention with the advent of the global health
crisis. Indeed, the depth and complexity of the impact of COVID-19 require both short-term response and long-term preparation to understand some of its far-reaching effects at the fundamental level. This health crisis is just one example of an infinite number of disruptions that can arise and impact multiple sectors. Today, the quest for resilience is no longer a luxury, and all systems are under pressure to increase their resilience and, thus, their sustainability. Each system, therefore, needs to detect the type of disturbances that can affect it, the factors that can enhance it, and the results it wishes to achieve. This article has shed light on the tourism sector by presenting tourist destinations as complex adaptive systems, which can increase their resilience and sustainability by becoming more intelligent. The theoretical model proposed in this article presents the resilience-enhancing factors of a destination according to several authors, factors that help the destination move from a non-resilient and unsustainable system to a resilient and sustainable complex adaptive system. Comparing these two systems allows us to understand why non-resilient systems fail to achieve desired system outcomes. This paper presents some theoretical and managerial implications. On the one hand, it establishes the link between resilience and sustainability of a system, based on the models proposed by [11] and developed by [12], and presents smart tourism systems as CASs undergoing the same functioning. On the other hand, this model may prove useful for managers and administrators of destinations that seek to grow their resilience and sustainability by focusing on improving the enhancement factors and controlling all types of disturbances. In fact, by drawing conclusions from a wide range of applications, this analysis provides critical insights for the joint implementation of sustainability and resilience that can more effectively and efficiently guide both action and management. However, this article was based on extensive theoretical research and only brings together pre-established knowledge. It therefore lacks empirical testing in order to be validated. We hope this will contribute to future efforts to minimize conflicts and maximize synergies between sustainability and resilience.
References 1. OECD: OECD Tourism Trends and Policies 2018. OECD (2018) 2. Ghaderi, Z., Mat Som, A.P., Henderson, J.C.: When disaster strikes: the Thai floods of 2011 and tourism industry response and resilience. Asia Pac. J. Tour. Res. 20(4), 399–415 (2014). https://doi.org/10.1080/10941665.2014.889726 3. Redman, C.L.: Should sustainability and resilience be combined or remain distinct pursuits? Ecol. Soc. 19(2), 37 (2014). https://doi.org/10.5751/ES-06390-190237 4. Boes, K., Buhalis, D., Inversini, A.: Smart tourism destinations: ecosystems for tourism destination competitiveness. Int. J. Tour. Cities 2(2), 108–124 (2016). https://doi.org/10.1108/ IJTC-12-2015-0032 5. Pan, B., Li, J., Cai, L., Zhang, L.: Guest editors’ note: being smart beyond tourism. J. China Tour. Res. 12(1), 1–4 (2016). https://doi.org/10.1080/19388160.2016.1184209 6. Walker, B., Holling, C.S., Carpenter, S.R., Kinzig, A.P.: Resilience, adaptability and transformability in social-ecological systems. Ecol. Soc. 9(2) (2004). https://doi.org/10.5751/ES00650-090205 7. Carpenter, S., Walker, B., Anderies, J., Abel, N.: From metaphor to measurement: resilience of what to what? Ecosystems 4(8), 765–781 (2001). https://doi.org/10.1007/s10021-001-0045-9
Machine Learning Algorithms for Automotive Software Defect Prediction

Ramz Tsouli Fathi(B), Maroi Tsouli Fathi, Mohammed Ammari, and Laïla Ben Allal

Materials, Environment and Sustainable Development (MEDD), FSTT, Abdelmalek Essaadi University, Tetouan, Morocco
[email protected]

Abstract. The use of machine learning algorithms has recently increased in multiple domains, including software engineering. Identifying a suitable methodology for recognizing software defects helps improve the quality of the software and save costs, especially in an industry like car manufacturing. It is important to find suitable and significant measures that are most relevant for finding defects in software. This paper contributes to the decision to adopt machine learning techniques for software defect prediction. In our approach, we considered supervised machine learning algorithms to build a model that aims to predict the occurrence of software defects.

Keywords: Machine learning · Predictive analysis · Software defect · Defect prediction · Automotive software · Defect prediction features
1 Introduction

Software testing to find and fix defects is one of the most expensive activities in software development. Therefore, organizations are still studying how they can predict the quality of their software in an earlier phase of the software life cycle. Within about 30 years, the amount of software in cars went from roughly zero to more than 10 million lines of code [1]. To meet the demands of quality and reliability that are essential in the automotive industry, a significant effort is made in verification and validation. Recently, a large part of the added value in cars lies in the electronics and therefore in the software, which gives access to major functionalities. As a result, automotive software is more complex, requires more effort, and is more expensive. Different methods for defect prediction have been evaluated and used in software engineering, and a wide range of prediction models have been proposed. Machine learning enables the prediction of the occurrence of an event based on data: it finds the mathematical manipulation that turns the input into the output, whether the relationship is linear or non-linear. This paper provides an approach to construct a model for predicting automotive software defects. To predict software defects, we applied supervised learning classifiers such as decision trees, random forest, logistic regression, and Naïve Bayes. We constructed our model based on the results obtained from each algorithm. We then validated our model using the K-fold cross-validation technique, resulting in a model that works well for all scenarios, and evaluated its performance using ROC curves.
2 Background and Related Work

2.1 Software in Cars Manufacturing

The importance of electronics is increasing in more than one industry. Car manufacturing follows this trend as well: car electronics now captures most of the added value. A modern car is typically driven by more than 50 embedded computers known as ECUs (Electronic Control Units) and numerous sensors embedded throughout the physical system of the vehicle [2, 3]. These ECUs need embedded software connected to the car network. To stand out from their competitors, car manufacturers rely on software that enables new safety, comfort, and economy functions, which involves developing more, and more complex, software. With the growth of this complexity, pressure on time and cost as well as demands for high quality are increasing. In addition, automotive software faces several challenges and is one of the most demanding domains within software engineering. Moreover, software errors caused by misunderstanding of the specifications occur in a late phase of the car's life cycle, generally when the car is almost ready for its market launch.

2.2 Software Defect Prediction Using Traditional Approaches

Considerable research has been reported in the field of software failure detection, mainly due to the increase of software complexity. Software defect detection plays an essential role in ensuring the quality of the software. It helps reduce the risk of software bugs that can produce incorrect, unexpected, or unintended results and behaviour. Catal [4] surveyed the software engineering literature on software fault prediction. Traditionally, statistics-based approaches were used to predict software defects: logistic regression, classification trees, and logical classification approaches were the main methods used, and the outcome was binary in most cases. Lanubile, Lonigro, and Visaggio (1995) [5] used various statistical methods and 11 metrics, but none of those procedures supplied ideal results. Yuan, Khoshgoftaar, Allen, and Ganesan (2000) [6] applied clustering for the prediction of the number of errors. Logistic regression was applied by Denaro (2000) [7], who concluded by identifying the correlation between static metrics and fault-proneness [8].

2.3 Software Defect Prediction Using Machine Learning Techniques

A wide range of prediction models have been proposed. Recently, machine learning techniques have mostly been employed for software defect prediction [9, 10]. This method follows a process. The first step in the construction of this model is the creation of the dataset. This data can be collected from car manufacturers' archives or from their software suppliers. Several software developers set up issue tracking systems that help them keep track of the errors reported; such systems may be a gold mine for datasets.
Every dataset contains a number of features for software defect prediction. Feature extraction techniques are then applied to extract the main ones. Their values represent the complexity of the software and of its development process. The third step is to label the dataset. Then, we can construct our model by training on the training set based on the selected features. The final step is an evaluation of the model with the test set. We can then proceed to our prediction (Fig. 1).
Fig. 1. Software defect prediction process
3 Machine Learning Algorithms Used

With the increasing amount of information and data that various software systems process, methods for computing and managing this data while identifying its important aspects and patterns have made machine learning ever more present. We present below the supervised learning methods used.

3.1 Decision Trees

Decision trees are a type of supervised machine learning where the data is continuously split according to a certain parameter [11]. Although we mostly applied this method to classification problems, it can be applied to both classification and regression problems. It is a tree-structured classifier: features of the dataset are represented as nodes, and from these nodes the decisions can be traced as branches. It is a very common machine learning method due to its simplicity and because it can be visualized. It has very little cost in terms of memory consumption because it is logarithmic, and it requires little data preparation.

3.2 Random Forest

To improve predictive accuracy on the dataset we can use the random forest method; it is also a supervised learning technique, and it is based on decision trees.
Although random forest takes less training time, the accuracy of its predictions is significantly high, due principally to the combination of a large number of decision trees.

3.3 Logistic Regression

One of the most popular supervised learning techniques is logistic regression, mainly used for solving classification problems. Logistic regression (LR) is based on the concept of maximum likelihood estimation; according to this estimation, the observed data should be the most probable [12]. To obtain an output, we pass the weighted sum through a function that yields values bounded between 0 and 1. This function is called the sigmoid curve or S-curve. LR is based on the equation:

log(y / (1 − y)) = b0 + b1·x1 + b2·x2 + b3·x3 + · · · + bn·xn
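To make this relationship concrete, here is a minimal sketch (not from the paper) that passes an assumed weighted sum through the sigmoid and checks that the log-odds recover the linear form; all weights and feature values are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map the weighted sum z to a probability bounded between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

b0 = -1.5                      # intercept (assumed value)
b = np.array([0.8, 0.3, 1.1])  # coefficients b1..bn (assumed values)
x = np.array([2.0, 0.5, 1.0])  # feature values x1..xn of one module (assumed)

z = b0 + b.dot(x)              # weighted sum b0 + b1*x1 + ... + bn*xn
y = sigmoid(z)                 # predicted defect probability

# The log-odds recover the linear form: log(y / (1 - y)) == z
assert np.isclose(np.log(y / (1 - y)), z)
print(f"defect probability = {y:.3f}")
```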
3.4 Naïve Bayes

Based on Bayes' theorem, Naïve Bayes is a supervised learning technique used for classification problems. It applies Bayes' theorem with the "naive" assumption of conditional independence between every pair of features. Naive Bayes learners and classifiers are simple and very effective; they can be extremely fast compared to more sophisticated methods. Naïve Bayes is used for text classification, for example spam filtration. Bayes' theorem, also known as Bayes' rule or Bayes' law, is used to determine the probability of a hypothesis with prior knowledge. It depends on the conditional probability. Bayes' theorem states the following relationship:

P(A|B) = P(B|A) P(A) / P(B)
where:

• P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
• P(B|A) is the likelihood: the probability of the evidence given that the hypothesis is true.
• P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
• P(B) is the marginal probability: the probability of the evidence.
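As a small numerical illustration of the rule (with invented probabilities, not values from our dataset), suppose A is "the module is defective" and B is "a static-analysis warning is raised":

```python
p_a = 0.10              # prior P(A): assume 10% of modules are defective
p_b_given_a = 0.80      # likelihood P(B|A): warning raised for 80% of defective modules
p_b_given_not_a = 0.15  # warning raised for 15% of non-defective modules (assumed)

# Marginal probability of the evidence: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(defective | warning) = {p_a_given_b:.3f}")  # approximately 0.372
```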
4 System Design and Methodology

4.1 Design Methodology Model

To address the software defect prediction problem, a structured process is implemented. We began by creating and structuring an analytical dataset. We identify failure factors
from our dataset, which we extract and define as our features. Baseline and benchmarking models are implemented on top of the analytical dataset. All algorithms are validated through a k-fold validation methodology to find the right accuracy. The design of our methodology is presented in Fig. 2.

1. Data collecting: One of the important phases of any machine learning problem is to collect the right data that addresses the problem well.
2. Feature extraction and definition: To create the analytical dataset, software defect factors must be extracted and defined, which will help to obtain the right predictions.
3. Data pre-processing: When we collect data, most of the time it is not ready for running machine learning algorithms. We then split our dataset into a training set (80%) and a testing set (20%).
4. Building the model: In this research, the model is the software defect prediction process, implemented using different machine learning techniques. We run on our data the supervised learning algorithms that we identified as effective for our research: decision trees, random forest, logistic regression, and Naïve Bayes. The dataset is divided into train and test sets. The performance of the models is validated by computing the accuracy score; with this measure, we tried to improve the model performance and defect detection.
5. Model validation: Validate the model performance on defined metrics, such as measuring the accuracy of random forest, decision tree, Naive Bayes, and most classification algorithms. We also used confusion matrices, a technique used to evaluate model performance: the columns of the matrix represent the actual result, called the class, and its rows the predicted result, which helps in analyzing the properties of the classification algorithm. TP (True Positive) and TN (True Negative) give the number of outcomes that are classified correctly, while FP (False Positive) and FN (False Negative) give the number of outcomes that are classified incorrectly.

A small illustrative sketch of steps 3–5 is given after Fig. 2.
Fig. 2. Software defect prediction model design
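As a hedged illustration of steps 3–5 of this process, the sketch below trains the four supervised classifiers with scikit-learn and reports accuracy and the confusion matrix; the synthetic dataset generated here only stands in for the analytical dataset described in this paper and is not the data actually used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic placeholder for the analytical dataset (X = features, y = defect label)
X, y = make_classification(n_samples=2000, n_features=40, random_state=0)

# Step 3: split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Step 4: build one model per supervised algorithm
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
}

# Step 5: validate each model on the held-out test set
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(name, "accuracy:", accuracy_score(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))
```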
5 Experiments and Discussion

5.1 Data Sets

In our research, in order to construct a dataset that is representative of automotive software defects, we selected two different public datasets, which we completed with parameters to get close to an actual functional one. We tried to work with datasets with a large number of projects, in order to calculate a wide set of features. Our first choice was the AEV dataset collected by Altinger et al. [13], an automotive embedded software dataset with projects developed by Audi Electronics Venture GmbH. It contains 3 projects, each with a total of 29 features. However, due to the small number of projects, we assumed that we did not have enough data to obtain reliable results. We then chose to merge this dataset with the PROMISE dataset. PROMISE is a repository of empirical software engineering data, available since 2005 and built by Jureczko and Madeyski [14]; most of its projects are academic and open-source software projects. 38 different projects (15 open source, 6 proprietary, and 17 academic) were collected in different versions, resulting in 92 versions in total, each version including 20 features [15]. To complete this dataset, we designed 2 new features, calculated on the basis of our study of automotive software specifications. The goal is to obtain a dataset that represents the complexity of automotive software with enough features to construct a good model.

5.2 Data Pre-processing and Feature Selection

Before running machine learning algorithms, we have to treat our data. Several treatments were applied: we first removed null values, and outlier treatment and garbage value treatment were done to create the analytical dataset. We obtained a dataset of 8954 rows and 42 columns. We then used Okutan and Yildiz's [16] metric called lack of coding quality (LOCQ) to measure the quality of source code. The aim of feature selection is to improve classification performance and increase the prediction capacity of our model. We selected the features that are most relevant to our target by combining the features of the two original datasets, adding new ones, and removing features that are redundant and uncorrelated.

5.3 Building the Model

To construct our model, we applied supervised machine learning algorithms: decision tree, Naïve Bayes, logistic regression, and random forest classifiers. The elements of this model are the elements of the approach used. We present the results of our algorithms in Table 1; we chose to compare our algorithms first by accuracy. Overall, the results obtained are very interesting: we can observe that random forest achieves the best result and Naïve Bayes the lowest, which was expected, given that Naïve Bayes is known as a bad estimator, whereas random forest is an ensemble technique that helps improve the prediction results.
Table 1. Results of software defect predictions using supervised ML algorithms

ML Algorithm          Accuracy
Decision Tree         87%
Random Forest         94%
Logistic Regression   92%
Naive Bayes           86%
5.4 Model Validation

To validate our model, we used the K-fold cross-validation technique, a method that allows us to estimate the skill of a machine learning algorithm. It is an easy-to-implement technique for comparing and selecting a model for our software defect prediction problem, and it has lower bias than other methods. We applied 5-fold cross-validation to our chosen algorithms; the result is a confirmation of the accuracy already presented in Table 1.

5.5 Performance Comparison

In order to use our model with confidence, its performance should be evaluated. Confusion matrices, which include information about actual and predicted model outputs, are shown in Table 2.

Table 2. Confusion matrix for the logistic regression algorithm

                 Predicted +    Predicted −
Actual +         TP = 86        FN = 5
Actual −         FP = 25        TN = 75
Several model performance evaluation metrics can be generated from the confusion matrix, such as sensitivity, specificity, and precision. We also used the Receiver Operating Characteristic (ROC) curve to measure the classification effectiveness of our predictive model run with training and testing data. The ROC curve plots the sensitivity (true positive rate) against the false positive rate (1 − specificity) of the model.
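The following sketch illustrates, with scikit-learn on a synthetic placeholder dataset (not the paper's data), how the 5-fold cross-validation and the ROC evaluation described above can be run:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=2000, n_features=40, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: accuracy estimated on five train/validation splits
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("5-fold accuracies:", scores, "mean:", scores.mean())

# ROC curve: true-positive rate versus false-positive rate on a held-out set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, proba)
print("AUC:", roc_auc_score(y_test, proba))
```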
6 Conclusions and Future Work

Machine learning techniques have shown very promising results for this sort of problem, all the more so since the amount of data available within organizations is
in constant growth. In this research, a prediction model for automotive software defects is presented. Defect prediction will help organizations improve the quality of their software and focus more effort on developing new functionalities than on fixing software defects [17]. We used multiple metrics to evaluate the model's performance on the training and test datasets. The results observed in the confusion matrix and the model's ROC curve make us confident in the performance of our model: it appears to be accurate and produces efficient results. In future work we plan to use different machine learning techniques (RNN, unsupervised, or semi-supervised) to address the same problem, so that we can compare them to our current model. We also want to test the effect size of important features on the machine learning algorithms, in order to identify the principal factor(s) that contribute to solving our problem in the automotive industry and to support their adoption.
References 1. Broy, M.: Challenges in automotive software engineering. In: Proceedings of the 28th International Conference on Software Engineering, pp. 33–42 (2006) 2. Pretschner, A., et al.: Software engineering for automotive systems: a roadmap. In: Proceeding of the ICSE 2007, Future of Software Engineering 2007, pp. 57–71. IEEE Computer Society (2007) 3. Aoyama, M., et al.: A design methodology for real-time distributed software architecture based on the behavioral properties and its application to advanced automotive software. In: 18th Asia-Pacific Software Engineering Conference (2011) 4. Catal, C.: Software fault prediction: a literature review and current trends. Expert Syst. Appl. 38(4), 4626–4636 (2011) 5. Lanubile, F., Lonigro, A., Visaggio, G.: Comparing models for identifying fault-prone software components. In: Seventh International Conference on Software Engineering and Knowledge Engineering, pp. 312–319 (1995) 6. Yuan, X., Khoshgoftaar, T.M., Allen, E.B., Ganesan, K.: An application of fuzzy clustering to software quality prediction. In: Third IEEE Symposium on Application-Specific Systems and Software Engineering Technology, p. 85. IEEE Computer Society, Washington, DC (2000) 7. Denaro, G.: Estimating software fault-proneness for tuning testing activities. In: Twentysecond international conference on software engineering, pp. 704–706. ACM, New York, NY (2000) 8. Catal, C.: Software fault prediction: a literature review and current trends/Expert Systems with Applications 38, 4626–4636 (2011) 9. Lessmann, S., et al.: Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans. Software Eng. 34(4), 485–496 (2008) 10. Gondra, I.: Applying machine learning to software fault-proneness prediction. J. Syst. Softw. 81(2), 186–195 (2008) 11. Decision Trees for Classification: A Machine Learning Algorithm. https://www.xoriant.com/. Accessed 15 May 2022 12. Naïve Bayes Classifier Algorithm. https://www.javatpoint.com/. Accessed 15 May 2022 13. Altinger, H., Siegl, S., Dajsuren, Y., et al.: A novel industry grade dataset for fault prediction based on model-driven developed automotive embedded software. In: Proceedings of the 12th Working Conference Mining Software Repositories, pp. 494–497 (2015)
14. Jureczko, M., Madeyski, L.: Towards identifying software project clusters with regard to defect prediction. In: Proceedings of the 6th International Conference Predictive Models in Software Engineering, pp. 1–10 (2010) 15. Li, Z., et al.: Progress on approaches to software defect prediction. IET Softw. 12(3), 161–175 (2018). The Institution of Engineering and Technology 2018 16. Okutan, A., Yıldız, O.T.: Software defect prediction using Bayesian networks. Empir. Softw. Eng. 19(1), 154–181 (2012). https://doi.org/10.1007/s10664-012-9218-8 17. Rana, R.: A framework for adoption of machine learning in industry for software defect prediction. In: ICSOFT-EA 2014 - 9th International Conference on Software Engineering and Applications (2014)
Agile User Stories’ Driven Method: A Novel Users Stories Meta-model in the MDA Approach Nassim Kharmoum1,2,6(B) , Sara Retal2,3 , Karim El Bouchti2 , Wajih Rhalem4,6 , Mohamed Zeriab Es-Sadek5,6 , Soumia Ziti2 , and Mostafa Ezziyyani7 1
5
National Center for Scientific and Technical Research (CNRST), Rabat, Morocco [email protected] 2 IPSS Team, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco 3 SmartiLab, Moroccan School of Engineering Sciences (EMSI), Rabat, Morocco 4 E2SN Research team, ENSAM Rabat, Mohammed V University in Rabat, Rabat, Morocco M2CS team, ENSAM Rabat, Mohammed V University in Rabat, Rabat, Morocco 6 Moroccan Society of Digital Health, Rabat, Morocco 7 Mathematics and Applications Laboratory, Faculty of Sciences and Techniques of Tangier, Abdelmalek Essaadi University, Tetouan, Morocco
Abstract. Mastering the development cycle of a project, especially in software development, has become an unavoidable necessity to produce a deliverable with value and thus to satisfy the requirements of users, who have become more volatile. In this regard, the Agile Manifesto, or Manifesto for Agile Software Development, has been proposed to bring increased flexibility and pragmatism to the delivered products by giving values and fundamental principles that revolutionize software development processes incrementally, but without handling the technical framework of their production. Our challenge in this paper is therefore to add a technical framework to agile software development processes by proposing a user stories' meta-model driven method in the Model Driven Architecture (MDA) approach. This approach places models and their meta-models at the center of the development process of any software system, in order to facilitate development based on model construction and model transformation tasks in the Agile context.
Keywords: User Story · Meta-model · Model Driven Architecture (MDA) · Agile

1 Introduction
The key to project success in the software development field is related to the mastery of its development cycle, from the requirement expression phase until
its delivery. Different approaches have emerged, especially after the persistent failure of software development in practice, which has challenged existing solutions [1]. In this regard, 17 software development experts came together in 2001 to share their respective methods, resulting in the Agile Manifesto, or Manifesto for Agile Software Development [2]. This result revolutionized software development processes by proposing core values and principles of agility. These values and principles increase the satisfaction of volatile customer requirements by delivering high value-added functionality quickly and consistently. Many methodologies have adopted the Agile approach to take advantage of its values and principles; among them are Scrum, Kanban, Scrumban, Extreme Programming, Scaled Agile Framework (SAFe), Nexus, and so on [3]. The Agile Manifesto has also gained popularity outside its original context, as research has attempted to transpose its values and principles to other fields [4]. On the other hand, the Agile Manifesto with all these methodologies exploits and implements values and fundamental principles to revolutionize software development processes incrementally, without dealing with the technical framework of their production. It leaves organizations free to choose the paradigm that best meets their needs, their application domain, and their expertise to implement their information systems. For our method, we focus on Model Driven Engineering (MDE) [5] as a software engineering paradigm for the Agile Manifesto, considering models and their meta-models as the main entities in the software systems development process. MDE is considered today as a technological solution for the future of the software industry, whose goal is to automate the development process of any information system. Indeed, Model Driven Architecture (MDA) [6], which is supported by the Object Management Group (OMG), is regarded as the most widespread approach to MDE. MDA is based on the foundations of MDE; however, it offers its own properties, represented by the recommendation of several standards and the respect of various requirements [7]. The MDA offers three levels of abstraction for MDE, namely the highest level of abstraction (CIM: Computation Independent Model) containing the requirements models, the middle level of abstraction (PIM: Platform Independent Model) containing the analysis and design models, and finally the lowest level of abstraction (PSM: Platform Specific Model) containing the code models. In this paper, we propose the basis of our method, which aims to automate the construction and the generation of the information system in the context of agile software development processes. To do so, we rely on the user stories [8] model and its meta-model at the CIM level as a source model to generate the whole of the information system models at the PIM and PSM levels. We focused on user stories as a source model because they represent a simplified format of business requirements from an end-user perspective in the agile context and clearly show how a software feature will deliver value. Moreover, the user stories concept is adopted by most agile methodologies to represent business requirements that are understandable by the different technical and functional actors of the project. In this paper, we present the proposed meta-model for the user stories
Agile User Stories’ Driven Method
147
models' creation and validation, which will be the source of all generated information system models in the agile context. The remainder of the paper is structured as follows: the second section explains our proposal, presenting the user stories' meta-model and its elements; a case study illustrating our method is shown in Section 3; Section 4 provides a discussion of the results obtained; in the last section, we conclude our contribution and introduce our future work.
2 Our Proposed Method
In this section, we highlight our method, which aims to construct the user stories model at the higher abstraction level of the MDA approach in order to generate the other information system models at different MDA abstraction levels in the context of an Agile project. We shed more light on the higher abstraction level (CIM level) because we consider it the most complex and important level [9]: on the one hand, there are no defined standards for this level; on the other hand, each change at this level affects all the other levels, PIM and PSM [10]. We stress that our proposal will benefit from the experience we gained during the generation of different models from the CIM abstraction level [11–15]. Our goal in this method is to apply the MDA approach to the Agile context [16], but in our own way: we place the user stories model, which presents a simplified format of business requirements in the context of an agile project at the CIM level, at the center of the whole information system development process. Thus, we aim to use the models of our method as a base for understanding the system and facilitating communication between technical and business stakeholders, and to generate other information system models at other abstraction levels, such as the PIM and PSM levels (see Fig. 1).
Fig. 1. Our proposed method
To illuminate our method according to the MDA approach, Fig. 2 illustrates its transformation process. We focus on the definition of the user stories meta-
model to generate other models; the purpose is to create and generate correct models and to perform automated model transformations [17].
Fig. 2. Transformation process of our proposed method
The meta-model notion in the MDA approach plays a crucial role. The OMG-MOF [18] defines the meta-model notion as "a model defines the language for expressing a model". So, meta-models allow us to define the structure of models, their elements, and the relations between them; this makes it possible to validate the structure of existing models. Also, meta-models drive the generation of models by defining the mapping rules between the source and the target meta-models. The proposed meta-model for user stories describes the structure of our source model, to which any user story model created must conform. The goal is to create a correct model explaining how a system's business requirements are created, organized, exchanged, consumed, and tested in an agile project. Therefore, our user stories meta-model (Fig. 3) is made of the following elements, which are inspired by different agile methodologies based on the Agile Manifesto [8]. The user stories elements are:

– Backlog: an emerging list of the tasks needed to develop our information system;
– Epic: presents a corpus of tasks that can be subdivided into specific tasks called "user stories";
– User Story: the main element of our meta-model; it represents a business requirement in the agile context;
– Priority: the degree of importance and urgency of a "user story" in a backlog; it helps the team during the scheduling of the "user stories";
– Estimation: determines the weight of the user story based on an "estimation type", either an 'HOURS' duration or a unit determined by the team, the 'STORY POINT';
– User Story Description: the simplified format of a business requirement, consisting of three elements: "User Role", "Function", and "Business Value";
– User Role: the first part of the "User Story Description"; it presents the actor of the "user story" (answers the 'who' question);
– Function: the second part of the "User Story Description"; it presents the functionality and the goal of the "user story" (answers the 'what' question);
Fig. 3. Our user stories meta-model
– Business Value: the third part of the "User Story Description"; it presents the benefit and the goal of the "user story" (answers the 'why' question);
– Business Rule: groups the business rules that are essential for the development of the "user story";
– Rule: presents one element of the "Business Rule";
– Acceptance Criteria: the set of test scenarios that validate a user story;
– Scenario: presents one element of the "Acceptance Criteria" and consists of three elements: "Precondition", "Test Action", and "Expected Outcome";
– Precondition: the first part of an "Acceptance Criteria" scenario; it presents the prerequisites to perform the test scenario;
– Test Action: the second part of an "Acceptance Criteria" scenario; it presents the action of the test scenario;
– Expected Outcome: the third part of an "Acceptance Criteria" scenario; it presents the expected result of the test scenario;
– User Story Model: includes all of the above elements and presents the user stories model.

For our proposed method, the only constraint for creating a correct user stories model is to respect its meta-model, as explained above and presented in Fig. 3.
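To make this structure concrete, the sketch below mirrors the main meta-model elements as plain Python dataclasses. This is only an illustrative mapping of Fig. 3 for readers unfamiliar with Ecore, not the EMF implementation used in our tooling; the class and field names simply follow the elements listed above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class EstimationType(Enum):
    HOURS = "HOURS"
    STORY_POINT = "STORY_POINT"

@dataclass
class UserStoryDescription:
    user_role: str       # answers the 'who' question
    function: str        # answers the 'what' question
    business_value: str  # answers the 'why' question

@dataclass
class Scenario:
    precondition: str      # GIVEN ...
    test_action: str       # WHEN ...
    expected_outcome: str  # THEN ...

@dataclass
class UserStory:
    name: str
    priority: int
    estimation: int
    estimation_type: EstimationType
    status: str
    description: UserStoryDescription
    business_rules: List[str] = field(default_factory=list)
    acceptance_criteria: List[Scenario] = field(default_factory=list)

@dataclass
class Epic:
    name: str
    user_stories: List[UserStory] = field(default_factory=list)

@dataclass
class Backlog:
    name: str
    epics: List[Epic] = field(default_factory=list)

@dataclass
class UserStoryModel:
    name: str
    backlogs: List[Backlog] = field(default_factory=list)
```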
3 Case Study
To illustrate our method, this section shows the practical aspect of our "Call for Proposals" case study through the Eclipse Modeling Framework (EMF) [19]. Thus, Fig. 4 illustrates our case study in two main folders: the first contains the user stories meta-model (with the extensions ".ecore" and ".aird"), and the second contains the user stories model (with the ".xmi" extension).
Fig. 4. Our practical case structure
– Extensions explanation:
• ecore: the meta-meta-model for the Eclipse Modeling Framework (EMF).
• aird: represents the graphical part of the EMF meta-model and contains Sirius representation data [20].
• xmi: XML Metadata Interchange [21], an OMG standard for describing an instance of the meta-model in XML format.

The "Call for Proposals" case study is illustrated in Fig. 5 and contains three 'User Stories': "Fill the personal information", "Fill the project budget", and "Fill the partners structures' information". In Fig. 5 we focus only on "Fill the partners structures' information", which has a 'Priority' of 1, an 'Estimation' of 8, meaning 8 hours, and the 'status' "TODO". For the 'User Story Description', we have: "As a candidate, I WANT to fill the partner structures' information, SO THAT I can benefit from the collaboration with our partners", because we choose the prefix "AS" for the 'User Role', "I WANT" for the 'Function', and "SO THAT" for the 'Business Value'. For this 'User Story' we propose three 'Business Rules': "The partners must be members
of the university", "The partners must contribute to the application result", and "The partners should have already collaborated before". To validate this 'User Story', we have two 'Scenarios' as 'Acceptance Criteria'. The first is: "GIVEN the candidate has an account, WHEN the candidate opens the page for filling the partners structures' information, THEN the personal information of the candidate must be pre-filled". The second is: "GIVEN the candidate is on the page for filling the partners structures' information, WHEN the candidate clicks on the save button, THEN the partners structures' information must be saved in the database". For the 'Acceptance Criteria' we use the prefixes "GIVEN" for the 'Precondition', "WHEN" for the 'Test Action', and "THEN" for the 'Expected Outcome'. In our method, we propose that 'User Stories' are grouped in an 'Epic'; for our case study, all the studied 'user stories' are in the 'Epic' "Call for Proposals application submission". In the same way, all the studied Epics are grouped in the 'Backlog' "Call for Proposal backlog". In turn, 'Backlogs' are grouped in one "User Story Model" named 'Call for proposal model'. We emphasize that the proposed "Call for Proposals" model respects the structure of our proposed meta-model, and this will be the case for all the models created via our user stories' meta-model.
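For illustration only, this user story can also be written down as a plain data structure outside EMF; the following sketch simply restates the values above and is not the XMI serialization produced by the tooling.

```python
# The "Fill the partners structures' information" user story from Fig. 5,
# restated as a plain Python dictionary for illustration.
user_story = {
    "name": "Fill the partners structures' information",
    "priority": 1,
    "estimation": 8,            # estimation type: HOURS
    "status": "TODO",
    "description": {
        "user_role": "AS a candidate",
        "function": "I WANT to fill the partner structures' information",
        "business_value": "SO THAT I can benefit from the collaboration "
                          "with our partners",
    },
    "business_rules": [
        "The partners must be members of the university",
        "The partners must contribute to the application result",
        "The partners should have already collaborated before",
    ],
    "acceptance_criteria": [
        {"precondition": "GIVEN the candidate has an account",
         "test_action": "WHEN the candidate opens the page for filling the "
                        "partners structures' information",
         "expected_outcome": "THEN the personal information of the candidate "
                             "must be pre-filled"},
        {"precondition": "GIVEN the candidate is on the page for filling the "
                         "partners structures' information",
         "test_action": "WHEN the candidate clicks on the save button",
         "expected_outcome": "THEN the partners structures' information must "
                             "be saved in the database"},
    ],
}
print(user_story["description"]["function"])
```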
4 Discussion
In this work, we focused on the application of the MDA approach in the agile context, to profit from the strengths of both the Agile and the MDA approaches. To do so, we propose a novel user stories meta-model at the higher level of the MDA approach. As results:

– We focus on a simplified format of business requirements in the context of an agile project, to generate the whole of the information system models;
– We start with the MDA higher level, the most complex and important MDA level, to generate other information system models at different abstraction levels;
– The created user stories model conforms to the proposed meta-model;
– The proposed meta-model can validate existing user stories models in our method before proceeding to the generation of other models at different abstraction levels;
– The proposed user stories meta-model defines the structure of user stories models, their elements, and the relations between them;
– The proposed user stories meta-model elements are extracted from different agile methodologies and can be applied to most of them;
– The created user stories model gives more details about each user story, its business rules, acceptance criteria, status, etc.;
– The source and the target models can be used for understanding the system and facilitating communication between technical and business stakeholders.

In addition to this, based on the proposed meta-model, we can extract several pieces of information such as the main tasks of the project (epics) and their details
Fig. 5. The proposed User Stories models (Call for Proposals)
(user stories), the total estimate of the project (estimation), and the progress of all the tasks based on their status (status). We can also extract the business rules of the project (via Business Rule), and we can retrieve the validation tests of the project (via Acceptance Criteria). Thus, all the user story elements can help us detect mapping rules with other models at different abstraction levels to generate further models. Also, we can rely on the different scenarios of the acceptance criteria to promote the use of the Behavior Driven Development (BDD) [22] and Test Driven Development (TDD) [23] software development practices.
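As a hedged sketch of this kind of extraction, the snippet below computes the total estimate and the progress ratio from user stories represented as plain dictionaries; the field names mirror the meta-model elements, and the statuses other than "TODO" are invented for the example.

```python
# Hypothetical user stories, with fields named after the meta-model elements.
user_stories = [
    {"name": "Fill the personal information", "estimation": 5, "status": "DONE"},
    {"name": "Fill the project budget", "estimation": 3, "status": "IN_PROGRESS"},
    {"name": "Fill the partners structures' information", "estimation": 8,
     "status": "TODO"},
]

# Total estimate of the project: sum of the 'estimation' of every user story.
total_estimation = sum(us["estimation"] for us in user_stories)

# Progress of the tasks, based on their 'status'.
done = sum(1 for us in user_stories if us["status"] == "DONE")
progress = done / len(user_stories)

print(f"Total estimation: {total_estimation}")
print(f"Progress: {progress:.0%}")
```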
5 Conclusion and Future Work
Our challenge in this paper was to lay the foundation of a method based on user stories. To meet this challenge, we propose a novel user stories meta-model in the MDA approach for the agile context. The proposed user stories meta-model, at the MDA higher level, allows us to define the structure of user stories models, their elements, and the relations between them, and it allows validating the structure of user stories models. We have proposed a user stories' meta-model
adapted to the context of Agile methodologies that will allow us to benefit from the strengths of Agile and the MDA approaches. The main goal is to master and simplify the development cycle of software development projects so that we can be competitive by meeting the requirements of users who are becoming more and more volatile. In our future work, we will focus on the generation of the information system models at different MDA abstraction levels based on our proposed source user stories meta-models, while respecting the MDA approach in the Agile context.
References 1. Egbokhare, F.: Causes of software/information technology project failures in nigerian software development organizations. African J. Comput. ICT 7(2), 107–110 (2014) 2. Manifesto, A.: Agile manifesto. Haettu 14, 2012 (2001) 3. Alqudah, M., Razali, R.: A review of scaling agile methods in large software development. Int. J. Adv. Sci. Eng. Inform. Technol. 6(6), 828–837 (2016) 4. Hajjaj, M., Lechheb, H.: Co-construction of a new management approach in a public research funding agency through the contextualization of agile thinking. Organ. Cultures 21(1), 21–34 (2021) 5. B´ezivin, J., Briot, J.P.: Sur les principes de base de l’ing´enierie des mod`eles. Obj. Logiciel Base donn´ees R´eseaux 10(4), 145–157 (2004) 6. OMG-MDA: MDA Guide version 2.0. OMG (2014) 7. Rhazali, Y., Hadi, Y., Mbarki, S.: Transformation des mod`eles depuis CIM vers PIM dans MDA: Transformation automatique depuis le cahier de charge vers l’analyse et la conception. Noor Publishing (2017) 8. Cohn, M.: User Stories Applied: For Agile Software Development. Addison-Wesley Professional (2004) 9. Blanc, X., Salvatori, O.: MDA en action: Ing´enierie logicielle guid´ee par les mod`eles. Editions Eyrolles (2011) 10. Bousetta, B., El Beggar, O., Gadi, T.: A methodology for cim modelling and its transformation to PIM. J. Inform. Eng. Appl. 3(2), 1–21 (2013) 11. Kharmoum, N., Rhalem, W., Retal, S., Ziti, S., et al.: Getting the uml’s behavior and interaction diagrams by extracting business rules through the data flow diagram. In: Kacprzyk, J., Balas, V.E., Ezziyyani, M. (eds.) Advanced Intelligent Systems for Sustainable Development (AI2SD’2020). AI2SD 2020. AISC, vol. 1417, pp. 40–547. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-90633-7 5 12. Kharmoum, N., Ziti, S., Rhazali, Y., Omary, F.: An automatic transformation method from the e3value model to IFML model: An MDA approach. J. Comput. Sci. 15(6), 800–813 (2019) 13. Kharmoum, N., Ziti, S., Rhazali, Y., Omary, F.: An automatic transformation method from the e3value model to uml2 sequence diagrams: an mda approach. Int. J. Comput. 18(3), 316–330 (2019) 14. Kharmoum, N., Ziti, S., Rhazali, Y., Fouzia, O.: A method of model transformation in MDA approach from e3value model to bpmn2 diagrams in cim level. IAENG Int. J. Comput. Sci. 46(4) (2019)
154
N. Kharmoum et al.
15. Kharmoum, N., Retal, S., Ziti, S., Omary, F.: A novel automatic transformation method from the business value model to the UML use case diagram. In: Ezziyyani, M. (eds.) Advanced Intelligent Systems for Sustainable Development (AI2SD’2019). AI2SD 2019. Advances in Intelligent Systems and Computing, vol. 1104, pp. 38–50. Springer, Cham (2019). https://doi.org/10.1007/978-3-03036671-1 4 16. Mellor, S.J.: Agile MDA. MDA Journal, www. bptrends. com June (2004) 17. Rodr´ıguez, A., de Guzm´ an, I.G.R., Fern´ andez-Medina, E., Piattini, M.: Semiformal transformation of secure business processes into analysis class and use case models: an MDA approach. Inform. Software Technol. 52(9), 945–971 (2010) 18. OMG-MOF: Meta Object Facility version 2.5. OMG (2015) 19. Budinsky, F., Ellersick, R., Steinberg, D., Grose, T.J., Merks, E.: Eclipse Modeling Framework: A Developer’s Guide. Addison-Wesley Professional (2004) 20. Viyovi´c, V., Maksimovi´c, M., Perisi´c, B.: Sirius: a rapid development of dsm graphical editor. In: IEEE 18th International Conference on Intelligent Engineering Systems INES 2014, pp. 233–238. IEEE (2014) 21. OMG-XMI: XML Metadata Interchange version 2.5.1. OMG (2015) 22. Solis, C., Wang, X.: A study of the characteristics of behaviour driven development. In: 37th EUROMICRO Conference on Software Engineering and Advanced Applications, vol. 2011, pp. 383–387. IEEE (2011) 23. Beck, K.: Test-Driven Development: By Example. Addison-Wesley Professional (2003)
AI-Based Adaptive Learning - State of the Art

Aymane Ezzaim(B), Aziz Dahbi, Noureddine Assad, and Abdelfatteh Haidine

Laboratory of Information Technologies, National School of Applied Sciences, Chouaib Doukkali University, El Jadida, Morocco
{ezzaim.a,dahbi.a,assad.n,haidine.a}@ucd.ac.ma
Abstract. In recent decades, the area of education has seen numerous transformations, owing to the rapid advancement of Information and Communication Technologies (ICTs), the democratization of the internet, the rise of web technologies, and, most recently, the rapid advancement of artificial intelligence (AI) techniques, especially those that fall under the banner of the AI subset called machine learning. In this respect, the goal of this paper is to present a state of the art of AI applications in education while highlighting the most requested artificial intelligence in education (AIEd) approach, called adaptive learning. This approach has created new opportunities for adapting the different elements of the learning process (content, pedagogy, learning path, presentation, etc.) to the needs of the learner (learning style, level, prior knowledge, preferences, performance, etc.), through the potential of AI represented by different systems and applications, in order to increase learning outcomes. In this paper, we also show real-case AI-based adaptive learning system implementations and explore their objectives, mechanisms, the factors employed during adaptation, the AI algorithms adopted, and their impact on learning. This study provides insight into the topic of adaptive learning through a frame of reference and descriptive analysis, which can be used as a springboard for further investigation.

Keywords: Artificial intelligence · Adaptive learning · AIED · AI-based adaptive learning systems
1 Introduction

According to [1], learning is a somewhat long-term process of changing one's behavior because of practice or experience. Adaptability is the ability to examine the details of each circumstance in order to behave appropriately [2]. According to [3], the research and development of intelligent hardware and software that can actually think, learn, gather knowledge, communicate, manipulate and see objects is known as artificial intelligence (AI). Based on these three definitions, we may claim that AI-based adaptive learning is a discipline devoted to the study and development of intelligent systems capable of examining the specifics of each learning situation in order to optimize learning processes and provide relevant practices and experiences in the appropriate way, at the right place, and at the proper time. Through this study, we aim to investigate the applications of artificial intelligence in the context of school and academic education, in particular adaptive learning as a concept,
which has piqued the interest of many researchers in recent years. The motivation for specifically choosing this research topic lies in a number of factors, including the great advantages that the use of AI has demonstrated in several areas, as well as the significant shift in education that has occurred in the aftermath of the COVID-19 pandemic, in which the use of digital resources and platforms has become an urgent goal. This is in order to overcome the limitations of traditional learning, which lacks interactivity, personalization of educational content, and satisfaction of learners' needs in the presentation of learning [4]. To achieve the objective of our research, three questions were chosen, namely:

• What is adaptive learning?
• What are the applications of AI in education?
• How and why is AI introduced into adaptive learning?

Our paper, containing an extensive literature review to answer these questions, is organized as follows: the definitions of two fundamental concepts, artificial intelligence and adaptive learning, are discussed in the first section. The second section discusses how AI can be used in the realm of education. The third section, which covers algorithms, issues, and related work, explains how and why AI is used in adaptive learning. The conclusion, as well as some suggestions for future research on the topic, is included in the last section.
2 Background

In this section, we establish the frame of reference for our research by explaining and detailing artificial intelligence (AI) and adaptive learning as key concepts.

2.1 Artificial Intelligence

Artificial intelligence (AI) is a field of study in which researchers strive to create a collection of computer technologies that are inspired by how humans utilize their neurological systems and bodies to feel, learn, reason, and act [5]. According to [6], AI is more than an engineering discipline; it is a broad collaboration of various disciplines and participants in the quest to provide the ability to augment and maybe replace human functions and activities across a wide range of applications and domains, including business and management, government, the public sector, research and technology, etc. AI has undergone various evolutions from the 1950s until today; the figure below shows some of these developments, as mentioned in [7].
Fig. 1. Timeline of AI ideas and achievements.
process that allows a machine to learn independently from a vast quantity of data and produce predictions [8]. During the same era the Bayesian Networks and the architecture of intelligent systems appeared. Regarding to modern AI, it refers to deep learning as a subset of machine learning based on Artificial Neural Network (ANN) as well as a wide range of applications such as: • • • •
Chess games. Robotic systems. Ubiquitous artificial intelligence (smart home). Smart tools in medicine, translation of business practices into languages, planning and negotiation, etc.
We add that AI has a number of advantages over humans in a variety of areas, including [9]: • High task speed compared to humans (e.g. fraud detection in finance, predicting future trends in marketing etc.). • Perform difficult and unpleasant tasks with ease (e.g. disease risk prevention in medicine etc.). • Perform multiple tasks at the same time (e.g. recommend beneficial content and optimize ads). • The rate of success is high (e.g. identifying students who are at risk in education field etc.). • Calculating circumstances that are long and difficult (e.g. sales forecasting in marketing). 2.2 Adaptive Learning Firstly, it is necessary to define human learning, which represents according to [9, 10] the process of obtaining knowledge or abilities in a certain subject via experience and
practice, resulting in a lasting change in a person's behavior. As for adaptive learning, it is a field of study that tries to adapt the learning pathway and personalize learning experiences for each student based on their needs, preferences, knowledge, performance, learning style, success, and a variety of other cognitive and affective factors [11, 12]. The goal of this adaptation is to boost learner motivation and satisfaction, as well as the effectiveness of the learning process as assessed by success rate, completion rate, and performance, among other metrics [13–15]. Adaptive Interaction, Adaptive Course Delivery, Adaptive Collaboration Support, and Adaptive Assessment are examples of adaptations that facilitate learner interactions, adapt courses to the learner's needs, allow for adaptive social interaction, and adapt the assessment process in terms of questions, presentation, and difficulty to the learner's knowledge [16, 17]. In order to implement adaptive learning, several theoretical models have been produced, which have allowed the development of a range of solutions and systems that we will present in the following. The main approaches used in adaptive learning are:
• Macro-Adaptive Approach. This approach is based on various factors (cognitive or learning styles, achievement levels, motivation, cognitive abilities, etc.) to adapt learning objectives, learning content, presentation, etc. [18].
• Micro-Adaptive Approach. Based on a diagnostic process, this approach aims to identify the specific needs of each learner and use these to adapt instructions and provide teaching requirements [18].
• Machine learning approach. The goal of this approach, as the name suggests, is to use machine-learning algorithms (supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, etc.) [19] to solve difficulties that arise during adaptive learning. Among these issues we can mention [20–23]:
• Suggesting personalized learning material.
• Customization of learning units.
• Predicting student engagement.
• Classification of learners according to learning style, for example.
Since the 1980s, different systems have been conceived to put these approaches into practice [24]. Adaptive learning systems effectively aim to provide an interactive learning experience that takes into account students’ individual characteristics to promote learning processes [25, 26]. Among the first adaptive learning systems, we cite intelligent tutoring systems (ITSs). These types of tools can comprehend and learn from students’ nuanced answers, provide clues to help them develop, provide tailored instruction, profile each student and measure their level of proficiency, and so on, by replicating the talents, behavior, and supervision of a human teacher [27]. In terms of architecture, ITSs enable dynamic adaptability through four model components [28, 29]: • Expert Model: This model seeks to identify what ITSs should offer students and to assess their activities and reactions by comparing them to expert recommendations.
• Pedagogical model, used to make decisions about the type, timing and manner of delivering information to a learner, as well as the appropriate learning strategy.
• Learner model, which aims to build individualized pathways based on a number of factors relating to the learner.
• Communication model. This model is a kind of graphical representation of the expert and knowledge models, in favor of the learner.
One of the limitations of this type of system is that it does not allow for self-directed learning, which has led to the design of a new form known as adaptive hypermedia systems (AHSs) [29]. AHSs make it possible to describe and organize content, represent the objectives, preferences and knowledge of each learner, and generate adaptive learning paths, relying respectively on three basic components [30]:
• The application domain model.
• The user model.
• The adaptation model.
After the emergence of web technologies, another type of adaptive learning system known as adaptive e-learning systems appeared. This form combines the capabilities of ITSs and AHSs with those of e-learning platforms to provide access to a large amount of material and the ability to accommodate large numbers of students without the need for multiple buildings [31], as well as to adapt the online learning process and provide modified access to objects based on the learner's needs [32]. The evolution of adaptive learning has resulted in the creation of systems and platforms based on AI approaches, particularly the machine learning approach, that enable high-quality adaptive learning in terms of instructional material and parameters taken into account during adaptation [33]. In the next sections, we will go through this type of system in further depth.
3 Research Methodology In order to address the questions posed to identify the objective of our research aimed at understanding and explaining the AI-based adaptive learning approach, a qualitative study of descriptive type relying on the literature review is put into practice. The first step was the collection of data through an electronic search in different scientific databases and university archives namely, Google Scholar, Springer, ScienceDirect, IEEE Xplore Digital Library (IEEE), the Open Publication Archive of the University of Tampere and Botho University and others. The search string was the combination of keywords like, (“Education" and “Artificial Intelligence") or (“Adaptive learning" and “Artificial Intelligence") or (“educational-AI" and “adaptive learning"). Our sample consists of 18 studies written in English and published between 2010 and 2022. The second step was to categorize the obtained data into two categories: AI applications in education generally and AI applications in adaptive learning specifically. The purpose of this classification was to determine the ultimate applications of AIED to improve the teaching and learning process as well as the contributions of AI to adaptive
learning in terms of goals, factors and algorithms determined through the analysis and comparison of different case studies.
4 Results

4.1 AI and Its Applications in the Education Area

Education has always followed technological development, previously with the emergence of information and communication technologies (ICT) and today with the growth of AI and the great potential offered by its sub-domains (machine learning, deep learning). AI in education, like in other fields, offers a wide range of applications that fall under the umbrella of the Artificial Intelligence in Education (AIEd) research area. The ultimate applications of AIEd to improve the teaching-learning process include:
• Predicting learner success through machine learning algorithms. This study [34] found that decision trees and naïve Bayes classification were successful in predicting learner performance, with accuracy rates of 78% and 98% respectively.
• Measuring dropout risk, attrition risk, and completion risk. Through this research [35], the researchers were able to construct a model based on data mining to predict retention before the second year with an accuracy of roughly 80%.
• Supporting teachers, institution staff and parents. This study [36] shows how a dashboard based on learning analytics and machine learning may help teachers and parents track the student's more subtle cognitive outcomes.
• Language learning through Natural Language Processing (NLP). In this work [37], for example, the researchers created an educational game for teaching English based on NLP techniques, which automatically answers questions about English.
• Online supervision. This study [38] offered a high-performance audit process for online English instruction using machine-learning techniques.
• Personalized learning materials. We also emphasize the personalization of educational content, which is part of the adaptive learning domain and will be examined in detail in the next section, as one of AI's contributions to education.
These diverse contributions are most likely to be made via AI-based learning systems and platforms, specifically [39]:
• Intelligent tutoring systems (ITS).
• Recommendation systems.
• Adaptive learning systems.
• Learner diagnosis, assistance, and evaluation systems.
• Automated Essay Scoring (AES) systems.
In this section, we have highlighted some of the applications of AI in education, and we have discussed how both AI techniques and their applications in education are progressing to solve other limitations. In the following, we will discuss in detail the
applications of AI in the area of adaptive learning, which is indeed part of the field of education.

4.2 AI in Adaptive Learning

In this section, we will address one of the ultimate goals of AIEd, which belongs to the paradigm that considers the learner as a leader [40] and aims to provide a high level of adaptability through complex AI-based adaptive systems that include several entities and factors. As the name implies, AI-based adaptive learning systems are environments that suggest appropriate learning material, manage learner interaction, and provide individualized learning, generating different learning paths for different concepts based on the learner's most influential factors such as behavior, performance, preferences, learning styles, personality, knowledge, and so on [20, 23, 41]. The main objective of these solutions is to improve learning using various AI algorithms. The aims addressed by these systems, the factors on which they rely, and the methods used are listed in the table below (Table 1).

Table 1. Examples of AI contributions in adaptive learning.
• Objective addressed: Performances. Factors used: learner's marks, online behavior, level of engagement. AI algorithms used: Naive Bayes, Decision Tree, K-Nearest Neighbors, Support Vector Machine. Study example: [42].
• Objective addressed: Motivation. Factors used: learning style preferences, personality strengths and weaknesses, prior motivation, needs, prior knowledge, goals, responses, learner's marks. AI algorithms used: Logistic Regression, Support Vector, Multinomial Naive Bayes, XGBoost, CNN-LSTM. Study example: [43].
• Objective addressed: Knowledge. Factors used: interest, needs, preferences, digital properties, physical properties, physical conditions, noise, illumination level. AI algorithms used: heuristic algorithms, similarity algorithms, decision-based algorithms. Study example: [44].
• Objective addressed: Engagement. Factors used: learning styles, browsing behavior. AI algorithms used: multi-layer feed-forward neural network, artificial neural networks. Study example: [45].
• Objective addressed: Outcomes. Factors used: learning styles, domain knowledge. AI algorithms used: data mining algorithms, regression techniques. Study example: [46].
• Objective addressed: Self-assessment and regulation. Factors used: preferences, prior learning performances. AI algorithms used: fuzzy constraints. Study example: [47].
AI-based adaptive learning systems, as indicated in the table above, can improve numerous aspects of the learning process, including learner performance, knowledge, engagement, motivation, and skills like self-assessment. To make such enhancements, various cognitive factors such as learner grades, prior knowledge, domain knowledge and prior learning performance, and affective factors such as preferences, learning styles, prior motivation, level of engagement, and so on, are used. The most often used algorithms are those linked to supervised machine learning, which classify data based on prior knowledge [48], such as:
• Naive Bayes.
• Decision Tree.
• K-Nearest Neighbors.
• Support Vector Machine.
• Logistic Regression.
• Multinomial Naive Bayes.
• XGBoost.
• Similarity algorithms.
• Decision-based algorithms.
• Regression techniques.
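For illustration only, the following minimal sketch (written for this review, not taken from any of the cited studies) shows how one of the supervised algorithms listed above, a decision tree, could be used to classify learners; the behavioural features and learning-style labels are invented placeholders.

```python
# Illustrative only: a tiny decision-tree classifier that assigns a learning
# style from behavioural features. Feature names and labels are placeholders,
# not taken from the studies cited above.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic learner records: [time_on_videos, time_on_text, forum_posts, quiz_retries]
X = rng.random((200, 4))
# Synthetic labels: 0 = visual, 1 = verbal, 2 = kinaesthetic (placeholder classes)
y = rng.integers(0, 3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(X_train, y_train)

print("accuracy on held-out learners:", accuracy_score(y_test, clf.predict(X_test)))
```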
To wrap up this section, we may offer the following works related to the use of AI in adaptive learning to demonstrate how and why this technology is employed. In [49], the work presented intends to personalize learning materials for children with special needs by identifying the user's level of writing, speaking, and listening skills using machine learning techniques such as Naive Bayes, multilayer perceptron, SMO, and J48. The following research [50] proposes an adaptive narrative game system that allows for the presentation of individualized tips, prompts, and/or lessons based on the student's current level of domain knowledge, as determined by a random forest machine-learning model that considers factors such as the time required to find solutions, errors in solutions, and emotional indicators. The authors in [51] suggest a User-Intelligent Adaptive Learning Model for Learning Management Systems for the purpose of providing appropriate learning materials for
learners based on the identified learning style. To do so, data mining techniques are used to categorize learners' learning styles, followed by AI techniques (fuzzy logic, incorporated intelligence, machine learning, and decision support systems) to facilitate the extraction of learners' needs based on variables that influence learners' performance. An adaptive system called Learning Intelligent System (LIS) is presented in this study [42]. This system relies on various factors, such as the learner's grades, enrolment numbers, etc., to predict the risk of failure and thereby increase the learner's performance as well as reduce dropout. Naive Bayes (NB), Decision Tree (DT), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM) were the machine learning methods used in this study. "SPOnto" is an ontology presented in this study [43] that aims to represent a student profile in an adaptive gamified learning system by classifying different types of learners using machine-learning techniques such as:
• Logistic Regression.
• Support Vector.
• Multinomial Naive Bayes.
• XGBoost.
• CNN-LSTM.
The goal of this classification is to discover the genuine needs of students based on their qualities (background knowledge, preferred learning style, aims, personality, and so on), which will help these students become more motivated. In this section, we have answered the third research question by addressing the applications of artificial intelligence in adaptive learning, in terms of factors employed during adaptation, algorithms adopted and use cases.
5 Discussion According to the results presented in the above section, it is remarkable that artificial intelligence with its different techniques and algorithms is widely used to address the various issues in the field of education. This implementation, which expressed results characterized by a high percentage of accuracy, validates the effectiveness of this technology in this delicate domain. Furthermore, AI has influenced not only the student, who is at the center of the teaching-learning process, but also the other actors, such as teachers, administrators, and parents, by facilitating their practices and enhancing their performance. Among the fields of AI application in education, there is the adaptive learning, one of the most difficult problems to solve without the use of powerful technologies, due to the great number of parameters that influence the adaptation process. In this respect, artificial intelligence was one of the technologies that answered this need in an efficient way. Based on the research that has addressed this topic, the most important factor considered in the adaptation process is learning style. A factor that encompasses the preferred cognitive, affective, motivational and behavioral strategies employed during learning [52]. In addition, this adaptation is generally concerned with the delivery of
appropriate learning materials with a lack of adaptation of pedagogical approaches and educational theories, which represents a gap in the scientific research related to AI-based adaptive learning. Finally, it should be mentioned that AIED is a broad and fruitful subject from which many new applications using artificial intelligence to better teaching and learning have emerged. In terms of adaptation, further in-depth research with a large sample of studies is needed to identify more limits that should be included in future studies.
6 Conclusion

The goal of this study was to undertake a literature review of AI-based adaptive learning without overlooking other AI in education applications that fall within the AIEd umbrella. Three questions guided our research, the first of which was to identify adaptive learning as a research field. The second is about the use of artificial intelligence's potential in learning. The third is on AI's applications in adaptive learning, namely the issues addressed, the algorithms employed, the procedures followed, and the systems built in this area. We attempted to answer the first question by defining adaptive learning from several perspectives and examining its approaches and systems in the first part. We addressed the second question in the second part, which focused on AI applications in education. In the third part, we looked at AI-based adaptive learning from several angles, as well as how and when AI is employed in adaptive learning. We can deduce from the findings of this study that adaptive learning is an area of research that is in high demand and represents a revolution in the educational field, especially due to one of the strengths of this approach, which is the use of artificial intelligence techniques. Our future study will build on this rich ground: firstly, by carrying out deeper investigations in order to overcome the limitation of this study, which is manifested in the small number of studies collected; secondly, by designing other systems that allow the personalization of other aspects of the learning process, such as assessment, as well as other features that have not been widely adopted; and finally, by experimenting with them in different contexts not widely addressed.
References 1. Lachman, S.J.: Learning is a process: toward an improved definition of learning. J. Psychol. 131(5), 477–480 (1997). https://doi.org/10.1080/00223989709603535 2. Spector, J.M.: The potential of smart technologies for learning and instruction. Int. J. Smart Technol. Learn. 1(1), 21–32 (2016) 3. Pannu, A.: Artificial intelligence and its application in different areas. Artif. Intell. 4(10), 79–84 (2015) 4. Alshammari, M., Anane, R., Hendley, R.J.: Adaptivity in ELearning Systems. In: 2014 Eighth International Conference on Complex, Intelligent and Software Intensive Systems, pp. 79–86 (2014)
5. Stone, P., Brooks, R., Brynjolfsson, E., Calo, R., Etzioni, O., Hager, G., Hirschberg, J., Kalyanakrishnan, S., Kamar, E., Kraus, S., Leyton-Brown, K., Parkes, D., Press, W., Saxenian, A., Shah, J., Tambe, M., Teller, A.: Artificial Intelligence and life in 2030: the one hundred year study on artificial intelligence (2016) 6. Dignum, V.: AI is multidisciplinary. AI Matters 5(4), 18–21 (2020) 7. Nilsson, N.J.: The Quest for Artificial Intelligence. Cambridge University Press (2009) 8. Samuel, A.L.: Some studies in machine learning using the game of checkers. IBM J. Res. Dev. 3(3), 210–229 (1959). https://doi.org/10.1147/rd.33.0210 9. Khanzode, K.C.A., Sarode, R.D.: Advantages and disadvantages of artificial intelligence and machine learning: a literature review. Int. J. Lib. Inform. Sci. 9(1), 3 (2020) 10. Mayer, R.: Applying the science of learning to medical education. Med. Educ. 44(6), 543–549 (2010). https://doi.org/10.1111/j.1365-2923.2010.03624.x 11. Maryam, Y., Jahankhani, I.H.: A personalized adaptive e-learning approach based on semantic web technology. Webology 10(2), 1–14 (2013). http://www.webology.org/abstract.php? id=271 12. Forsyth, B., Kimble, C., Birch, J., Deel, G., y Brauer, T.: Maximizing the adaptive learning technology experience. J. High. Educ. Theory Pract. 16(4), 80–88 (2016) 13. Bauer, M., Bräuer, C., Schuldt, J., Krömker, H.: Adaptive e-learning for supporting motivation in the context of engineering science. In: Nazir, S., Teperi, A.-M., Polak-Sopi´nska, A. (eds.) AHFE 2018. AISC, vol. 785, pp. 409–422. Springer, Cham (2019). https://doi.org/10.1007/ 978-3-319-93882-0_39 14. Huang, S.-L., Shiu, J.-H.: A user-centric adaptive learning system for e-learning 2.0. J. Educ. Technol. Soc. 15(3), 214–225 (2012). (International Forum of Educational Technology & Society). http://www.jstor.org/stable/jeductechsoci.15.3.214 15. Bijwe, R.P., Raut, A.B.: A survey of adaptive learning with predictive analytics to improve students learning. Bulletin Monumental - ISSN / e-ISSN 0007-473X. Volume 22: Issue 1 2021. http://bulletinmonumental.com/gallery/2-jan2021.pdf 16. Paramythis, A., Loidl-Reisinger, S.: Adaptive learning environments and elearning standards. Electron. J. e-Learn. 2(1), 181–194 (2004) 17. Chieu, V.M.: Constructivist learning : an operational approach for designing adaptive learning environments supporting cognitive flexibility/. UCL - Université Catholique de Louvain (2005). https://dial.uclouvain.be/pr/boreal/object/boreal:5145 18. Felix, M., et al.: The Past, the Present and the Future of Adaptive E-Learning: An Approach within the Scope of the Research Project AdeLE. (2004) 19. Batta, M.: Machine Learning Algorithms -A Review (2019). doi:https://doi.org/10.21275/ ART20203995 20. Shawky, D., Badawi, A.: A reinforcement learning-based adaptive learning system. In: Hassanien, A.E., Tolba, M.F., Elhoseny, M., Mostafa, M. (eds.) AMLTA 2018. AISC, vol. 723, pp. 221–231. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-74690-6_22 21. Kurilovas, E.: Advanced machine learning approaches to personalise learning: learning analytics and decision making. Behav. Inform. Technol. 1–12 (2018). https://doi.org/10.1080/ 0144929x.2018.1539517 22. Abdul Hamid, S.S., Admodisastro, N., Manshor, N., Kamaruddin, A., Ghani, A.A.A.: Dyslexia adaptive learning model: student engagement prediction using machine learning approach. In: Ghazali, R., Deris, M.M., Nawi, N.M., Abawajy, J.H. (eds.) SCDM 2018. AISC, vol. 700, pp. 372–384. Springer, Cham (2018). 
https://doi.org/10.1007/978-3-31972550-5_36 23. Li, Y., Shao, Z., Wang, X., Zhao, X., Guo, Y.: A concept map-based learning paths automatic generation algorithm for adaptive learning systems. IEEE Access 7, 245–255 (2019). https:// doi.org/10.1109/ACCESS.2018.2885339
24. Azough, S.: E-Learning Adaptatif: Gestion intélligente des ressources pédagogiques et adaptation de la formation au profil de l’apprenant (2014). http://thesesenafrique.imist.ma/handle/ 123456789/1666 25. El-Sabagh, H.A.: Adaptive e-learning environment based on learning styles and its impact on development students’ engagement. Int. J. Educ. Technol. High. Educ. 18(1), 1–24 (2021). https://doi.org/10.1186/s41239-021-00289-4 26. Shute, V.J., Zapata-Rivera, D.: Adaptive educational systems. Adapt. Technol. Train. Educ. 7, 27 (2012) 27. Bashar, G.A.-B., Abu Naser, S.S.: Design and development of an intelligent tutoring system for C# language. Euro. Acad. Res. 4(10) (2017) 28. Scott, C.W.B.: Adaptive systems in education: a review and conceptual unification. Int. J. Inform. Learn. Technol. 34(1), 2–19 (2017). https://doi.org/10.1108/IJILT09-2016-0040 29. Phobun, P., Vicheanpanya, J.: Adaptive intelligent tutoring systems for e-learning systems. Procedia. Soc. Behav. Sci. 2(2), 4064–4069 (2010). https://doi.org/10.1016/j.sbspro.2010. 03.641 30. Cannataro, M., Pugliese, A.: XAHM: an xml-based adaptive hypermedia model and its implementation. In: Reich, S., Tzagarakis, M.M., De Bra, P.M.E. (eds.) AH 2001. LNCS, vol. 2266, pp. 252–263. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45844-1_24 31. Arkorful, V., Abaidoo, N.: The role of e-learning, the advantages and disadvantages of its adoption in higher education. Int. J. Educ. Res. 2, 397–410 (2014) 32. Everton, G., et al.: Use of deep multi-target prediction to identify learning styles. Appl. Sci. 10(5), 1756 (2020). https://doi.org/10.3390/app10051756 33. Eugenijus, K., et al.: Recommending suitable learning paths according to learners’ preferences: experimental research results. Comput. Hum. Behav. 51, pp. 945–51 (2015). https:// doi.org/10.1016/j.chb.2014.10.027 34. Murat, P.: Using Machine Learning to Predict Student Performance (2017). https://trepo.tuni. fi/handle/10024/101646 35. Delen, D.: A comparative analysis of machine learning techniques for student retention management. Decis. Support Syst. 49(4), 498–506 (2010) 36. Kommers, P.: Machine learning for learning analytics for meta-cognitive support. In: McKay, E. (ed.) Manage Your Own Learning Analytics. SIST, vol. 261, pp. 205–217. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-86316-6_10 37. Andhik Ampuh, Y., et al.: English education game using non-player character based on natural language processing. Procedia Comput. Sci. 161, 502–508 (2019). https://doi.org/10.1016/j. procs.2019.11.158 38. Lu, W., Vivekananda, G.N., Shanthini, A.: Supervision system of english online teaching based on machine learning. Prog. Artif. Intell. (2022). https://doi.org/10.1007/s13748-02100274-y 39. Zawacki-Richter, O., Marín, V.I., Bond, M., Gouverneur, F.: Systematic review of research on artificial intelligence applications in higher education – where are the educators? Int. J. Educ. Technol. High. Educ. 16(1), 1–27 (2019). https://doi.org/10.1186/s41239-019-0171-0 40. Ouyang, F., Jiao, P.: Artificial intelligence in education: the three paradigms. Comput. Educ. Artific. Intell. 2, 100020 (2021). https://doi.org/10.1016/j.caeai.2021.100020 41. Real-Fernández, A., Molina-Carmona, R., Pertegal-Felices, M.L., Llorens-Largo, F.: Definition of a feature vector to characterise learners in adaptive learning systems. In: Visvizi, A., Lytras, M.D. (eds.) RIIFORUM 2019. SPC, pp. 75–89. Springer, Cham (2019). https://doi. org/10.1007/978-3-030-30809-4_8 42. 
Guerrero-Roldán, A.-E., Rodríguez-González, M.E., Bañeres, D., Elasri-Ejjaberi, A., Cortadas, P.: Experiences in the use of an adaptive intelligent system to enhance online learners’ performance: a case study in Economics and Business courses. Int. J. Educ. Technol. High. Educ. 18(1), 1–27 (2021). https://doi.org/10.1186/s41239-021-00271-0
43. Missaoui, S., Maalel, A.: Student’s profile modeling in an adaptive gamified learning environment. Educ. Inf. Technol. 26(5), 6367–6381 (2021). https://doi.org/10.1007/s10639-02110628-7 44. Gómez, S., Zervas, P., Sampson, D.G., Fabregat, R.: Context-aware adaptive and personalized mobile learning delivery supported by UoLmP. J. King Saud Univ. Comput. Inform. Sci. 26(1), 47–61 (2014). https://doi.org/10.1016/j.jksuci.2013.10.008 45. Jia-Jiunn, L., et al.: Designing an adaptive web-based learning system based on students’ cognitive styles identified online. Comput. Educ. 58(1), 209–22 (2012). https://doi.org/10. 1016/j.compedu.2011.08.018 46. Arsovic, B., Stefanovic, N.: E-learning based on the adaptive learning model: case study in Serbia. S¯adhan¯a 45(1), 1–13 (2020). https://doi.org/10.1007/s12046-020-01499-8 47. Chih-Yueh, C., et al.: A negotiation-based adaptive learning system for regulating helpseeking behaviors. Comput. Educ. 126, 115–28 (2018). https://doi.org/10.1016/j.compedu. 2018.07.010 48. Singh, A., Thakur, N., Sharma, A.: A review of supervised machine learning algorithms. In: 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 1310–1315 (2016) 49. Banik, L., Bhuiyan, M., Jahan, A.: Personalized learning materials for children with special needs using machine learning. In: 2015 Internet Technologies and Applications (ITA) (2015).https://doi.org/10.1109/itecha.2015.7317390 50. Hare, R., Tang, Y., Cui, W., Liang, J.: Optimize student learning via random forest-based adaptive narrative game. In: 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE), pp. 792–797 (2020). https://doi.org/10.1109/CASE48305.2020. 9217020 51. Sivakumar, S., Venkataraman, S., Gombiro, C. : A user-intelligent adaptive learning model for learning management system using data mining and artificial intelligence (2015) 52. Elizabeth, R.P., et al.: Researching the psychology of cognitive style and learning style: is there really a future? Learn. Individ. Differen. 19(4), 518–23 (2009). https://doi.org/10.1016/ j.lindif.2009.06.003
A New Predictive Analytics Model to Assess the Employability of Academic Careers, Based on Genetic Algorithms Abderrahim El Yessefi(B) , Soumaya Elmamoune, Loubna Cherrat, Sarah Khrouch, Mohammed Rida Ech-Charrat, and Mostafa Ezziyyani Laboratory of Mathematics and Applications, Faculty of Science and Technology, University Abdelmalek Essaadi, BP 416, Tangier, Morocco [email protected]
Abstract. With the rapid development of society, the employability of university careers, as an important part of the Moroccan economy, shows an upward trend in government priorities, especially as it is tied to a series of social contradictions that in severe cases cause social unrest. It concerns those who struggle with finding and keeping decent work, moving up in a company or changing jobs, and adapting to changes in technology and market conditions. It is therefore urgent to improve and optimize the employability of academic career models. In this perspective, this research proposes the modeling of a decision support system to predict the employability of university careers based on relevant artificial intelligence algorithms. Our study takes into consideration some important factors regarding careers in higher education and professional careers that enhance an individual's ability to take advantage of employment opportunities. The corresponding model was established and the data were obtained by a survey. The proposed model serves to help the student choose the academic career best adapted to his or her profile and which has a high rate of employability, knowing that the index of success here is direct insertion into the labor market after graduation. The genetic data processing showed that the Economics/Management course offers a higher rate of employability in comparison with other sectors; on the other hand, the chemistry and biology sectors show a poorer rate of employability.
Keywords: AI · Genetic Algorithm · Data analytics · Predictive · Employability · Academic Careers · Decision support systems
1 Introduction

Employability is a multidimensional concept, which can be defined according to several points of view. After a search of the literature, we can conclude that the employability of a person is his or her ability to be employed; in this sense, we can discuss skills and knowledge. From an academic point of view, the employability of a course or career is the ability to train people who are easy to employ, and from a business point of view, it will be the
company's ability to hire employees, i.e. the company's participation in the creation of job opportunities. Hence this term can be viewed from several positions. In our work, we will focus on the employability of university careers, in order to allow the decision maker to implement the paths that can generate more opportunities for insertion into the labor market in the future. In this paper, we will propose an approach to assess the employability of academic careers, based on genetic algorithm processing.

1.1 Related Works

According to the research that we carried out in the various databases of scientific publications (Springer Link, Science Direct, …), there have been several studies related to our problematic that discuss employability from different perspectives. Among these works, we can mention the following. Saouabi M. and Ezzati A. [1] present a data mining approach to predict employability in Morocco, in the form of graphs, with phases, explaining every phase and what it includes in detail. This paper discusses the same problematic as our paper, but it proposes a different approach based on graphs. A. Dubey and M. Mani [2], using a supervised machine learning model, predicted the employability of high school students with local businesses for part-time jobs. The results show that it is possible to predict the employability of high school students with local businesses with high predictive accuracy, and that the trained predictive models perform better with larger datasets. Yan Zixiang [3] presented an intelligent comprehensive assessment of the employability of art design students. He utilizes the environmental perception and autonomous behavior decision-making ability of the nonlinear regression algorithm (NRA), and through experimental analysis the paper shows that the proposed method can serve as a new way of thinking about the teaching reform of art design students in colleges and universities under ever-changing social development, cultivating students' professional skills in an employment-oriented way and improving the evaluation system of students' employability. Zou Xiaoling and Ye Long [4] published a paper which studies the competency-based employability indicators of college students, using a self-administered questionnaire to investigate a sample of 2010-session graduates in Beijing. They use exploratory factor analysis in the statistical software SPSS 18.0 to extract five common factors, combined with interviews, and ultimately build the structural dimensions of the employability of college students, aiming to provide practical guidance for graduates' employment. Most of those works discussed employability from the student's point of view by analyzing the students' competences and abilities, except the work of Saouabi M. and Ezzati A., which focused on predicting the employability rate in Morocco. This is why we propose another approach to predict the employability rate by implementing the famous genetic algorithm.
2 Proposed Work 2.1 Approach Design In the first part, we will collect data about the university career of the population, as well as the professional career taking into account the duration of unemployment, the adequacy of the training with the profession, the remuneration and other clues. In the second part, we will encode the collected data to implement the genetic algorithms. Then, we work on the application of genetic algorithm techniques, mainly mutation and crossover, to create and propose better models/careers in terms of employability. We can simplify our work by the following diagram (Fig. 1):
(Fig. 1 summarizes the pipeline: Data Collection through a survey, followed by Data Encoding to produce the data set, followed by the Genetic Algorithm stage: importation of the data set, then genetic processing of the data through fitness, selection, mutation and crossover, with as objective the optimal model/career, i.e. the most adequate one.)

Fig. 1. Diagram of genetic algorithm process
2.2 Identification of the Studied Sample

This study is applied to a sample of 100 students and graduates from Abdelmalek Essaadi University. The choice of this sample has two reasons: the first is that this study is done within the university itself; the second is that Abdelmalek Essaadi University contains several establishments covering different technical, scientific and literary disciplines. Hence one can obtain a sample that is homogeneous at the macro level (the students pursue their studies under almost the same conditions) and heterogeneous at the micro level because of the diversity of the specialties offered by the various establishments.
2.3 Data Collection by Survey We have built the survey in a very precise way, which focuses on 2 components, university education, and professional career. We targeted the graduates of the Abdelmalek Essaadi University. The survey asks for the sex and age of the participant, the year in which the baccalaureate was obtained, the university training specialty, the year in which the Bac+5 diploma was obtained, the type of this diploma as well as the mention obtained. Then, the participant is invited to give information concerning his professional status, the date of his first employment, the position held, the number of years of experience, the salary, etc. 2.4 Chosen Population After data collection, we divided the Dataset into 3 categories, according to the diploma obtained, Bachelor’s category, Master’s category, and Doctorate category, these are the 3 populations of our generation studied. So, we chose the Master population for the following reasons: – It features most of the Survey participants – Obtaining a master’s degree gives the graduate a better chance of finding a job – The specialties in the master’s cycle are more precise, hence the analysis of this population will allow us to draw more concrete results about the specific university careers
3 Implementing the Genetic Algorithm

The genetic algorithm is an evolutionary algorithm and can be understood very easily as one derived from Darwin's theory, following the rule of "survival of the fittest" [5]. To implement genetic algorithms, we first need to define the necessary elements of the genetic concept (generation, population, individual, chromosome and gene) and then the processing and reproduction methods (mutation, crossover, etc.). We will consider the following assignments:
Generation: all collected profiles.
Population: the populations will be the groups of profiles according to their higher education type (Bachelor, Master, Doctorate).
Individual: the profile; each profile is defined by several chromosomes.
Chromosome: we consider 3 main chromosomes, the person chromosome (sex, age), the education chromosome, and the professional chromosome.
Gene: genes are the elements of each chromosome (Fig. 2).
(Fig. 2 summarizes this hierarchy: the generation is all participants; the populations are Bachelor, Master and Doctorate; each individual is represented by 3 chromosomes, and each chromosome contains several genes. Chromosome Person: G1 Gender, G2 Age. Chromosome Education: G3 Type of diploma, G4 Type of establishment, G5 Number of years to obtain Bac+5, G6 Specialty of diploma, G7 Mention of diploma. Chromosome Employment: G8 Number of changes of specialty, G9 Employment status, G10 Number of years of unemployment, G11 Number of years of experience, G12 Salary, G13 Job specialty.)

Fig. 2. The schema of genetic algorithm implementation
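The following minimal sketch (our own illustration, not the authors' implementation) shows one possible in-memory representation of an individual and its three chromosomes; the field names are assumptions chosen to mirror genes G1 to G13.

```python
# Illustrative representation of an individual before binary encoding.
from dataclasses import dataclass

@dataclass
class PersonChromosome:
    sex: str                  # "M" or "F"                       -> G1
    age: int                  #                                   -> G2

@dataclass
class EducationChromosome:
    diploma_type: str         # "Master" or "Engineering cycle"   -> G3
    establishment: str        # "Faculty" or "School"             -> G4
    years_to_bac5: int        # 5..8                              -> G5
    diploma_specialty: str    # e.g. "IT"                         -> G6
    mention: str              # "Fair".."Very Good"               -> G7

@dataclass
class EmploymentChromosome:
    specialty_changes: int    # 0..3                              -> G8
    employed: bool            #                                   -> G9
    years_unemployed: int     # 0..3                              -> G10
    experience_category: int  # 1..5                              -> G11
    salary_category: int      # 1..5                              -> G12
    job_specialty: str        # e.g. "IT"                         -> G13

@dataclass
class Individual:
    person: PersonChromosome
    education: EducationChromosome
    employment: EmploymentChromosome
```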
3.1 Data Encoding Encoding is a process in which a chromosome is represented in a suitable format such that operations can be performed on the chromosome. Most often, this representation is in a string format, for easier readability and for operations to take place in an easily understandable format [5]. In our case we will use a binary encoding for each gene, then we will adapt some decoding function to make data more readable. The table below contains all the properties (genes) of each chromosome, as well as the proposed binary coding: Chromosome Person: contains 2 main genes, – Gene sex, which will take 2 values, 0 if female, 1 if male – Gene Age: The age value is categorized to ensure some kind of normalization, so we will code 4 age groups, since all participants are between 20 and 40
Encoding of Experience and Salary
The salaries declared by the participants are between 4000 and 25000 MAD, so we categorize the salaries according to the level of experience, starting with the value 001. Likewise, the number of years of experience is divided into 5 categories, with the same encoding as the salary categories. This will allow us to calculate the salary-experience distance (DSE): in the normal case the distance must be 0; if the salary-experience distance is negative, it means that the individual does not have a salary consistent with his experience; if the distance is positive, it means that the individual is progressing well in his career.
Encoding of Education Specialty and Job Specialty
To be able to compare the specialty of higher education with the specialty of the position occupied by the individual, we encode all the specialties existing in the responses of the participants and draw up a list of the specialties of the jobs, in order to measure the distance between education specialty and employment specialty. The specialties on each list are ordered in such a way that specialties with common content or paths are closer to each other. For example, the telecom engineering specialty is closer to the computer engineering specialty than to civil engineering. To ensure good consistency in terms of coding, we have replaced a few attributes. For example, we have replaced the two attributes "year of obtaining the Bac" and "year of obtaining the diploma" by "Duration of training = year of obtaining the diploma − year of obtaining the Bac", and similarly "Number of years unemployed = Date of first job − Date of graduation". So, we propose this encoding of our data (Fig. 3):
Fig. 3 gives the proposed encoding of the data, gene by gene:
• G1 Sex: 0: F | 1: M
• G2 Age: 00: 20-24 | 01: 25-29 | 10: 30-34 | 11: 35-40
• G3 Diploma type: 0: Master | 1: Engineering cycle
• G4 Type of establishment: 0: Faculty | 1: School
• G5 Duration of study to Bac+5: 0101: 5 | 0110: 6 | 0111: 7 | 1000: 8
• G6 Diploma specialty: 000: Law | 001: Economics/Management | 010: IT | 011: Telecommunication | 100: Electronics | 101: Industrial | 110: Civil Engineering | 111: Chemistry
• G7 Mention of diploma: 00: Fair | 01: Pretty Good | 10: Good | 11: Very Good
• G8 Number of specialty changes: 00: No change | 01: 1 time | 10: 2 times | 11: 3 times
• G9 Employment status: 0: Looking for a job | 1: Employee
• G10 Years of unemployment: 00: 0 years | 01: 1 year | 10: 2 years | 11: 3 years
• G11 Experience category: 001: Beginner | 010: Junior | 011: Junior Confirmed | 100: Senior | 101: Senior Confirmed
• G12 Salary category: 001: Beginner | 010: Junior | 011: Junior Confirmed | 100: Senior | 101: Senior Confirmed
• G13 Job specialty: same encoding as G6 (000: Law | 001: Economics/Management | 010: IT | 011: Telecommunication | 100: Electronics | 101: Industrial | 110: Civil Engineering | 111: Chemistry)

Fig. 3. The table of encoding data
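As a hedged illustration of this encoding (a sketch under our own assumptions, not the authors' implementation), the following snippet encodes a few of the genes of one participant into their binary codes following the table above, and computes the salary-experience distance; only a subset of the 13 genes is shown.

```python
# Illustrative encoding of a few genes from Fig. 3; the remaining genes follow
# the same lookup-table pattern.
SPECIALTY = {"Law": "000", "Economics/Management": "001", "IT": "010",
             "Telecommunication": "011", "Electronics": "100",
             "Industrial": "101", "Civil Engineering": "110", "Chemistry": "111"}
CATEGORY = {1: "001", 2: "010", 3: "011", 4: "100", 5: "101"}  # experience/salary levels

def encode_age(age: int) -> str:
    # G2: four age brackets between 20 and 40
    bounds = [(20, 24, "00"), (25, 29, "01"), (30, 34, "10"), (35, 40, "11")]
    return next(code for lo, hi, code in bounds if lo <= age <= hi)

def encode_individual(sex, age, diploma_specialty, experience_cat, salary_cat, job_specialty):
    genes = [
        "1" if sex == "M" else "0",      # G1
        encode_age(age),                 # G2
        SPECIALTY[diploma_specialty],    # G6
        CATEGORY[experience_cat],        # G11
        CATEGORY[salary_cat],            # G12
        SPECIALTY[job_specialty],        # G13
    ]
    return " ".join(genes)

def salary_experience_distance(experience_cat: int, salary_cat: int) -> int:
    # DSE: 0 = salary consistent with experience, negative = underpaid,
    # positive = progressing faster than expected.
    return salary_cat - experience_cat

print(encode_individual("M", 22, "IT", 3, 2, "IT"))   # e.g. "1 00 010 011 010 010"
print(salary_experience_distance(3, 2))               # -1
```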
So, each participant (individual) is represented as a sequence of binary values (Fig. 4 illustrates this with individual 1: the bits 1 00 0 at the start of the string correspond to a male, aged 22, holding a Master's degree, and the remaining chromosomes follow in the same way).
The selection step, based on the mean fitness Fm, gave us a population of 47 individuals, having an average fitness Fm_new = 0.2, on which we are going to apply the processes of mutation and crossover.

3.5 Mutation

Mutation is a genetic process based on the change (mutation) of one or more genes in order to produce a fitter individual. The random mutation of G10 over the entire selected population raises the mean fitness of the child population produced up to Fmutation = 0.22 (Fig. 5).
(Fig. 5 illustrates the mutation process on one individual: Parent 1, encoded as g1..g13 = 1 00 1 1 0110 010 01 01 1 1 011 010 011, produces Child 1 by mutating gene g10 from 1 to 0, all other genes being kept unchanged.)

Fig. 5. Mutation process
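A minimal sketch of this mutation step, assuming individuals are stored as a mapping from gene names to bit strings (an illustration, not the authors' code), could look as follows.

```python
# Illustrative mutation: flip gene G10 of each selected individual.
import random

def mutate_g10(individual: dict) -> dict:
    # individual: mapping gene name -> bit string, e.g. {"G1": "1", ..., "G10": "1", ...}
    child = dict(individual)
    child["G10"] = "0" if individual["G10"] == "1" else "1"
    return child

def mutate_population(selected: list[dict], rate: float = 1.0) -> list[dict]:
    # The paper mutates G10 over the entire selected population; a rate < 1.0
    # would mutate only a random fraction of it.
    return [mutate_g10(ind) if random.random() < rate else dict(ind) for ind in selected]
```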
3.6 Crossover

Method One-Point
Applying crossover with the one-point method gave a very good result: the average fitness rises to 0.24, and the best individual has a fitness Fmax = 1.32 (Fig. 6).
(Fig. 6 illustrates the one-point crossover: Parents 1 and 2 are cut at a single crossover point and exchange their gene segments, producing Children 1 and 2, each combining the head of one parent with the tail of the other.)

Fig. 6. Crossover process with method one-point
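A minimal sketch of one-point crossover on the 13-gene representation (illustrative only; drawing the cut point at random is our assumption) is given below.

```python
# Illustrative one-point crossover on lists of 13 gene bit strings,
# e.g. ["1", "00", "1", "1", "0110", ...].
import random

def one_point_crossover(parent1: list[str], parent2: list[str]) -> tuple[list[str], list[str]]:
    point = random.randint(1, len(parent1) - 1)   # cut somewhere inside the chromosome
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2
```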
Method Two-Points
The crossover of two parents with the two-points method allowed us to produce 6 children different from the parents. This method gave us the same average fitness (0.24), but the best individual has a fitness Fmax = 1.4. Below, we show the first 4 children (Fig. 7):
(Fig. 7 illustrates the two-points crossover: Parents 1 and 2 are cut at two crossover points and exchange the gene segment between them, producing Children 1 to 4, each combining genes from both parents.)

Fig. 7. Crossover process with method two-points
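Similarly, a hedged sketch of the two-points variant, in which the segment between two random cut points is exchanged between the parents, could be written as follows.

```python
# Illustrative two-points crossover on the same 13-gene representation.
import random

def two_points_crossover(parent1: list[str], parent2: list[str]) -> tuple[list[str], list[str]]:
    a, b = sorted(random.sample(range(1, len(parent1)), 2))  # two distinct cut points
    child1 = parent1[:a] + parent2[a:b] + parent1[b:]
    child2 = parent2[:a] + parent1[a:b] + parent2[b:]
    return child1, child2
```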
4 Results and Discussions

At the algorithm process level, the comparison of the results obtained by mutation and by crossover shows that the individuals obtained by crossover are better than those obtained by mutation, as indicated in the following table (Fig. 8). In general, the results obtained show very clearly that the Economics/Management career has the highest employability rate, knowing that the best individuals in terms of fitness are graduates in the Economics/Management field. This result can be justified by the large percentage of Economics profiles in the population. The results also show that the chemistry option has a very low employability rate. The negative value of the average DSE shows that most of the participants do not receive the appropriate salary compared to their number of years of experience.
Fitness of the children produced by mutation and by crossover:
• Child 1: mutation 1.12 | crossover one-point 1.32 | crossover two-points 1.40833
• Child 2: mutation 0.92 | crossover one-point 0.92 | crossover two-points 0.92
• Child 3: mutation 0.62 | crossover one-point 0.81111 | crossover two-points 0.81666
• Child 4: mutation 0.72 | crossover one-point 0.61666 | crossover two-points 0.72
• Child 5: mutation 0.5125 | crossover one-point 0.52 | crossover two-points 0.62
• Child 6: mutation 0.32 | crossover one-point 0.51428 | crossover two-points 0.61111
• Child 7: mutation 0.31666 | crossover one-point 0.5125 | crossover two-points 0.60555
• Child 8: mutation 0.31428 | crossover one-point 0.42 | crossover two-points 0.31666
• Child 9: mutation 0.41666 | crossover one-point 0.42 | crossover two-points 0.31428
• Child 10: mutation 0.21666 | crossover one-point 0.12 | crossover two-points 0.21666

Fig. 8. The table of results comparison
5 Future Work In our next work, we will continue to improve the approach proposed in this paper, by in-depth analysis of the data to determine the parameters of the fitness function more precisely, especially the factors of importance. In addition, we will make more effort to collect a significant amount of data, so that the results become more credible. During the implementation of the genetic algorithm and its applications, we found that independence management is essential for this model to achieve the desired objective, so that the applied operations (mutation and crossover) give more realistic predictions.
6 Conclusion The results obtained -despite the need for precision- show the importance of the genetic approach in the treatment of this type of problem, because the genetic algorithm is always
able to give better/fittest populations by reproducing new individuals using the process of crossover and even by mutation, eliminating the bad individuals by the process of selection. Responding to our main problem, the genetic data processing showed that the Economics/Management career offers a higher rate of employability in comparison with other sectors. On the other hand, the chemistry and biology sectors show a reduced rate of employability. Improving this model will be our next mission to develop an expert decision support system based on machine learning by exploiting the advantages of genetic algorithms. Acknowledgement. We would like to thank the committee of the International Conference on Advanced Intelligent Systems for Sustainable Development (AI2SD’2022) applied to Education, Agriculture, Energy, Health, and Environment for the opportunity that it offers to participate in this great scientific conference, which opened -for us- the world of scientific research and enabled us to take another step in its path.
Annex: The survey.
Question 1: How old are you?
Question 2: What is your gender?
Question 3: What is your current status? (Student, Unemployed, Employed)
Question 4: The date of obtaining the Baccalaureate
Question 5: Type of higher education institution: School, Faculty
Question 6: Degree obtained: Bachelor's, Master's, Doctorate
Question 7: Type of your Bac+5 degree?
Question 8: Bac+5 diploma specialty
Question 9: Date of obtaining your Bac+5 diploma
Question 10: Mention of Bac+5 diploma
Question 11: How many times have you changed your specialty?
Question 12: Date of your first job
Question 13: Specialty of your first job
Question 14: Specialty of your current job
Question 15: Your current salary
References 1. Saouabi, M., Ezzati, A.: Data mining approach for employability prediction in Morocco. In: Proceedings of ESAI, p 723. Fez, Morocco (2019). https://doi.org/10.1007/978-981-15-0947-6 2. Dubey, A., Mani, M.: Using Machine Learning to Predict High School Student Employability – A Case Study, 2019 IEEE (2019). https://doi.org/10.1109/DSAA.2019.00078 3. Yan, Z.: Intelligent comprehensive assessment of employability of art design students relying on Nonlinear Regression Algorithm (NRA). In: 2018 International Conference on Virtual Reality and Intelligent Systems (2018) https://doi.org/10.1109/ICVRIS.2018.00053 4. Zou, X., Ye, L.: Study on Structure Dimensions of Ability to Work for University Graduates based on the Employability. IEEE (2011) 5. Khatwani, S., Arya, A.: A novel framework for envisaging a learner’s performance using decision trees and genetic algorithm. In: International Conference on Computer Communication and Informatics (ICCCI-2013) (2013)
New Approach for Anomaly Detection and Prevention Chliah Hanane(B) and Battou Amal IRF-SIC Laboratory/Agadir, Agadir, Morocco [email protected], [email protected]
Abstract. Cyber-attacks are increasing in every aspect of daily life. There are a number of different technologies around to tackle cyber-attacks, such as Intrusion Detection Systems (IDS), Intrusion Prevention Systems (IPS), firewalls, switches, routers, etc., which are active round the clock. These systems generate alerts and prevent cyber-attacks. In this paper, we propose DP_ML, an intrusion detection and prevention solution using machine learning to find attack patterns. This solution is part of a project to implement a complete intrusion detection and prevention solution. To achieve this goal, we have utilized NetFlow to collect stream data. The data is analyzed by using Elasticsearch technology, namely an ELK (Elasticsearch, Logstash, and Kibana) stack. Machine learning is implemented at the heart of the engine.
Keywords: Machine Learning · Stream Processing · Intrusion Detection · Intrusion Prevention
1 Introduction

The development of technology is becoming more and more dynamic, and the amount of processed and stored data is increasing dramatically. In many situations, data such as network traffic or telephone connections needs to be checked for suspicious patterns on an ongoing basis to provide proper recognition and detection. Migrating your applications and services to a scalable system, such as a cloud environment, will boost your performance, but will also increase your product's attack surface, generated by the infrastructure's vulnerabilities or by other applications running on the same server. To prevent such scenarios, powerful and costly equipment is necessary, or software capable of detecting and preventing threats in real time.
However, while new systems based on machine-learning techniques for the analysis of cybersecurity data have flourished, one of the big challenges remains obtaining the right amount of data. The advances of the Big Data ecosystem in machine learning and data mining techniques introduce new possibilities for recognizing various types of patterns in cybersecurity [1], for building applications using machine learning algorithms [2], and many more. The main contribution of this work is to propose the DP_ML (Detection Prevention Machine Learning) architecture, using the ELK (Elasticsearch, Logstash, Kibana) stack, with Elasticsearch as the database and Kibana for visualization. In this document, we present part of the complete project for implementing a full intrusion prevention and detection solution. The article is structured as follows: Section 2 presents an overview of IDS using Elasticsearch and machine learning. Section 3 describes the methodology and the model proposed in this work. Section 4 discusses the preliminary results of this work. Finally, a summary and an outlook for future work are presented in the conclusion.
2 Background and Related Work

An IDS is a system for identifying intrusions on a network during the transmission of information. An IDS is used to identify suspicious activity by monitoring the network's traffic and acting on certain parts of the network in case of a problem. It monitors the network in order to safeguard assets and warn against major threats. Various examples of defensive solutions can be found in the scientific literature. The authors of [3] propose CAMLPAD, an original framework that addresses these issues: an autonomous machine learning platform for real-time anomaly detection in cybersecurity using Elasticsearch. The authors adapted well-known algorithms such as Isolation Forest, Histogram-Based Outlier Score (HBOS), Cluster-Based Local Outlier Factor (CBLOF), and K-Means Clustering to deal with the information. The anomalies are visualized with Kibana and are allocated an anomaly score. The CAMLPAD platform obtained an adjusted score of 95%. Another example of an IDS solution is the system presented in the article [4]. The purpose of their work is to build an IDS that integrates both network- and host-based anomaly detection for cloud computing. For this reason, they use Suricata and Snort as open-source tools together with a proposed DDoS detection rule, in order to produce a hybrid IDS that is more effective and raises better alerts.
In [5], the authors provided an answer for malicious attack mitigation by bundling a machine learning algorithm, SVM, into an intrusion detection system and evaluating the feasibility of incrementally processing US ADS-B messages. The authors used a few tools in this solution: OpenSky was utilized as the data source for Automatic Dependent Surveillance-Broadcast (ADS-B) messages, NiFi was used for data management, Elasticsearch was employed as the log parser, Kibana was used for visualization of data for feature selection, and a Support Vector Machine (SVM) was applied for classification. The proposed solution performs well, with precision and recall close to 80%. In [6], the authors proposed a model combining machine learning and deep learning techniques, with the objective of obtaining a model that is intelligent and able to classify large volumes of NetFlow traffic. The model used a Non-Symmetric Deep Auto-Encoder as feature extractor and the SVM algorithm as classifier. The experimental results give high levels of accuracy, precision and recall with reduced training time. In [7], the authors propose a SCADA intrusion detection system test framework that emulates SCADA traffic and detects malicious network activities. The framework provides realistic SCADA traffic by combining various components such as Kali Linux, Conpot, QTester 104 and OpenMUC in a virtual machine.
3 Approach Proposed

In this section, we discuss and explain each component and its role in the DP_ML architecture. As displayed in Fig. 1, the proposed solution is a tool that combines a number of modular components to build a sophisticated machine-learning environment. The proposed solution starts with data delivery. The network flow is collected with the NetFlow protocol, which is a protocol for collecting, aggregating and recording network traffic flow data. It was developed by Cisco and is integrated into Cisco's IOS software on the company's routers and switches. Many other hardware manufacturers support NetFlow or use alternative flow technologies, such as sFlow or IPFIX. NetFlow uses the UDP/SCTP protocols to transfer data from routers to special collectors. According to the protocol description, all packets with the same source/destination IP address, source/destination ports, protocol, interface and class of service are grouped into a flow, and then packets and bytes are tallied. These flows are then bundled together and transported to the NetFlow collector server [8]. The whole environment is integrated into the Elasticsearch database, and each piece of processed data is sent to a different index. ELK is a collection of three fully open-source products, all of which are developed, maintained and managed by a company called Elastic: Elasticsearch, which is used as a database for storing logs; Logstash, which is a server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends
it to the database; and Kibana, which acts as the window into the Elastic Stack, a visualization tool (a web interface) that makes it possible to visualize the data stored in Elasticsearch [9]. Using this arrangement gives the ability to view, efficiently search and browse records through the Kibana tools. Once the mechanisms for collecting and transmitting NetFlow frames into the ELK stack environment are implemented, the next important step is to prepare the data for feature extraction and for the machine-learning algorithms. During the development of the solution, one of the most important aspects was the prevention of malicious traffic. To make the solution effective, every malicious flow must be transferred to an NMS (Network Management System), which is an application or set of applications that allows network administrators to manage independent components of a network within a larger network-management framework and fulfils several key roles. All these components constitute the basic environment of the proposed solution. The architecture of the proposed solution is shown in Fig. 1. The algorithm of the approach is presented in Algorithm 1.
Fig. 1. Architecture of the proposed DP_ML solution
Algorithm 1: Anomaly detection and prevention process
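As an illustration of how flow records ingested into Elasticsearch can be pulled back out for feature preparation, the sketch below uses the official Python client to query a NetFlow index and keep a few numeric fields; the index pattern, field names and connection URL are assumptions made for the example and are not part of the DP_ML configuration itself.

```python
# Minimal sketch: pulling recent NetFlow documents out of Elasticsearch for feature preparation.
# Assumptions: a local Elasticsearch node, an index pattern "netflow-*" and ECS-like field names;
# adapt these to the actual deployment. (Older 7.x clients expect the query inside a body= dict.)
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="netflow-*",
    query={"range": {"@timestamp": {"gte": "now-15m"}}},
    size=1000,
)

features = []
for hit in resp["hits"]["hits"]:
    doc = hit["_source"]
    # Keep a few numeric fields usable by anomaly-detection algorithms.
    features.append({
        "bytes": doc.get("network", {}).get("bytes", 0),
        "packets": doc.get("network", {}).get("packets", 0),
        "dst_port": doc.get("destination", {}).get("port", 0),
    })

print(f"collected {len(features)} flow records")
```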
4 Implementation and Results

In this work, we present a part of a complete project to set up an intrusion detection and prevention solution. The main objective of the DP_ML solution is to detect and prevent threats in the data flow. With the data collected by NetFlow, the built-in machine-learning jobs of the ELK stack can be used to detect basic vulnerabilities or anomalies. The algorithms can identify abnormal behaviour in statistically sparse data as well as anomalies related to temporal deviations in different metrics, namely counts, frequencies, or values. Examples that describe this workflow are available in [10], along with the process for defining and executing them in the ELK stack. The ELK stack provides Beats modules for Kibana's canvas functionality, in which we define a set of queries and their corresponding representations (histogram, graph, chart, etc.). In our implementation, this functionality is used to monitor the data flow and visualize it in Kibana, as shown in Figs. 2 and 3. The proposed architecture is first tested on three operating systems in order to finalize the general implementation of a complete solution whose objective is to detect and prevent threats automatically in real time.
Fig. 2. Visualization 1 of anomaly detection in network traffic
Fig. 3. Visualization 2 of anomaly detection in network traffic
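To make the use of the built-in machine-learning jobs more concrete, the following sketch shows one possible anomaly-detection job definition (flagging an unusually high flow count per source address). The index mapping, field names, bucket span and job name are illustrative assumptions; an equivalent job can also be created directly from the Kibana Machine Learning UI, and the exact client call may differ between client versions.

```python
# Hedged sketch of an Elastic anomaly-detection job definition (values are assumptions).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

job_id = "netflow-high-count-by-source"
job_body = {
    "description": "Unusually high number of flows per source address",
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [
            {"function": "high_count", "partition_field_name": "source.ip"}
        ],
        "influencers": ["source.ip"],
    },
    "data_description": {"time_field": "@timestamp"},
}

# Register the job through the ML API; depending on the client version the body may need to be
# passed as individual keyword arguments. A datafeed pointing at the NetFlow indices is then
# attached and started so the job scores incoming flows continuously.
es.ml.put_job(job_id=job_id, body=job_body)
```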
5 Discussion

All components used in this paper are widely utilized in numerous intrusion detection projects, and none of them can be judged better than the others in absolute terms. The context and the nature of the data are very important when selecting flow data: the more traffic there is, the more visible the results become. As far as the results are concerned, anomalies are detected, but not across all of the traffic, since the working environment was only tested on three machines. With very limited data, the DP_ML solution shows results far from reality, which can be explained by the fact that the power of these methods lies in using a larger environment with more machines connected to the network.
6 Conclusion

DP_ML is the solution implementing the architecture proposed in this paper. It is part of the general solution of a real-time intrusion detection and prevention system that is under development. The solution utilizes the NetFlow protocol, which has various benefits, such as decreasing the processing load required to perform network intrusion detection. NetFlow also allows full coverage of the network packets. The solution is equipped to feed the flows collected by NetFlow into the ELK stack, which allows immediate accessibility and analysis of enormous volumes of data, as well as the implementation of machine-learning algorithms. One more benefit of the ELK stack is its seamless integration with Kibana, which is an excellent visualization tool. Further work will focus on improving this research by handling live streams in the area of network training. Moreover, further improvements will integrate and expand concepts from the domain of machine learning.
References

1. Edgar, T.W., Manz, D.O.: Science and Cyber Security. In: Research Methods for Cyber Security, pp. 33–62. Elsevier (2017). https://doi.org/10.1016/B978-0-12-805349-2.00002-9
2. Dixit, P., Silakari, S.: Deep learning algorithms for cybersecurity applications: a technological and status review. Comput. Sci. Rev. 39, 100317 (2021). https://doi.org/10.1016/j.cosrev.2020.100317
3. Hariharan, A., Gupta, A., Pal, T.: CAMLPAD: Cybersecurity Autonomous Machine Learning Platform for Anomaly Detection. arXiv:1907.10442 [cs], July 2019. Accessed 3 March 2022. [Online]. Available: http://arxiv.org/abs/1907.10442
4. Pareta, P., Rai, M., Gangwar, M.: An integrated approach for effective intrusion detection with elasticsearch. Int. J. Sci. Res. Comput. Sci. Eng. 6(3), 13–17 (2018). https://doi.org/10.26438/ijsrcse/v6i3.1317
5. Mink, D.M., et al.: Near-real-time IDS for the U.S. FAA's NextGen ADS-B. Big Data Cogn. Comput. 5(2), 27 (2021). https://doi.org/10.3390/bdcc5020027
6. Nandurdikar, B., Mahajan, R.: Intelligent and effective intrusion detection system using machine learning algorithm. Int. J. Eng. Adv. Technol. 9(6), 237–240 (2020). https://doi.org/10.35940/ijeat.F1231.089620
7. Waagsnes, H., Ulltveit-Moe, N.: Intrusion detection system test framework for SCADA systems. In: Proceedings of the 4th International Conference on Information Systems Security and Privacy, pp. 275–285. Funchal, Madeira, Portugal (2018). https://doi.org/10.5220/0006588202750285
8. Cisco IOS NetFlow, Cisco. https://www.cisco.com/c/en/us/products/ios-nx-os-software/ios-netflow/index.html (accessed 17 March 2022)
9. What is Elasticsearch? | Elasticsearch Guide [8.1] | Elastic. https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html (accessed 18 March 2022)
10. Kuc, R., Rogozinski, M.: Elasticsearch Server, Second Edition. Packt Publishing (2014). Accessed 20 March 2022. [Online]. Available: https://univ.scholarvox.com/book/88851046
FunLexia: An Intelligent Game for Children with Dyslexia to Learn Arabic Fatimaezzahra Benmarrakchi(B) School of Collective Intelligence, Université Mohammad VI Polytechnique, Rabat, Morocco [email protected]
Abstract. Learning games have been widely used in international educational programs; however, there is a scarcity of educational games that improve the learning abilities of children with disabilities, especially in the Moroccan context. This paper presents the design of an educational game called 'FunLexia' for children with dyslexia. The game has two main characteristics: its design is grounded in a linguistic analysis of the spelling errors in Arabic made by children with dyslexia, and it features an intelligent character named 'Lexia' that accompanies the child in their learning journey and provides real-time feedback to guide the child toward the correct behavior without diminishing the challenge of the game. Keywords: Dyslexia · Information and Communication Technology (ICT) · Arabic · Intelligent game · Instructional Design · Emotion Recognition · Real-time Feedback
1 Introduction

Dyslexia is a neurological disability that disrupts a person's language development and functioning. It is a disorder manifested by difficulty in learning to read, despite conventional instruction, adequate intelligence and sociocultural opportunity [1]. Recently, there has been increasing interest in the use of Information and Communication Technology (ICT) to support students with learning disabilities [2, 3]. As stated by [4], the use of ICT is beneficial for learners with dyslexia and also supports awareness among their families and teachers. However, most of these studies have focused on Latin languages, such as English. Researchers have not treated learning disabilities in non-Latin languages such as Arabic in much detail. Little is known about dyslexia in the Arab world, and it is not clear what factors cause different manifestations of dyslexia in Arabic. The main goal of this work is to develop a methodology for instructional designers to determine the level of accessibility of Arabic content for individuals with learning disabilities such as dyslexia.
2 Learning Games for Children with Dyslexia

Providing adaptive support in learning environments is necessary but challenging because of the diversity of learning needs. In particular, children with learning disabilities
demand special support and personalized intervention to develop the abilities in which they have weaknesses and to enable them to find alternative solutions. Digital games can be used as a supportive tool; as stated by [5], digital games are one of the best ways to engage children in learning. Educational games have a significant role in improving children's attitudes and knowledge. Studies that focus on this research area include the work of [6], who proposed a set of games in Arabic for children with dyslexia and dysgraphia, and the work of [7] on AraDaisy, a system for the automatic generation of Arabic DAISY books designed for people with print disabilities, that is, a difficulty or inability to read printed material due to a perceptual, physical or visual disability. The researchers proposed a framework for Arabic digital talking books that uses the DAISY format. The proposed system includes an image-to-text converter, a context injector, a text-to-audio generator, and a DAISY generator. Another example is LexiPal [8], an application for children with dyslexia that incorporates kinesthetic activity in a multisensory implementation by using a natural user interface.
3 Game Design

3.1 FunLexia: An Overview

FunLexia is an educational game designed especially for children with dyslexia to learn to read Arabic. The theoretical foundations of this game are based on previous studies and research theories, such as the analysis of spelling errors in Arabic made by children with dyslexia [9–12]. The first version of the game was developed in 2017 [1]; it was then evaluated using a heuristic evaluation and interviews with specialists in the field of special education and with children with dyslexia [13]. The heuristic evaluation is a usability engineering method for finding the usability problems in a user interface design, and it proposes 10 general principles for interaction design (e.g., visibility of system status, user control and freedom, error prevention). The current version of the game was developed based on the results of the heuristic evaluation and feedback from specialists. The game is constructed around the theme of Treasure Island; the player is asked to click on the map marker icon on each island to start an activity. When the child finishes the activity, the boat moves to the next island until the end of the game. In total, the game contains a pretest, a posttest and three major activities that include a set of different tasks with specific aims and predefined pedagogical objectives, such as recognizing letters, the structure of words and identifying letter/word shape. These activities are 'I listen and I build', 'I build and I complete' and 'I listen and I complete'. The activities of the game are described below in detail. The design of the educational content of this current version of the game is based on the analysis of spelling errors made by children with dyslexia in Arabic [9–11]. This version offers an interactive learning environment for children with dyslexia to develop the abilities in which they have weaknesses, and it motivates them and allows them to learn while having fun. Figure 1 below presents the storyboard of the game. One important characteristic of the design is that it tends to ensure that the learner masters the prerequisites before starting a new learning situation. It also offers remediation activities in order to help children surpass their learning difficulties. The
learning model is composed of three parts: an Entry System, a Learning System and an Exit System. The Entry System includes three steps: first, displaying the objectives of the game to inform the learner about what the game offers; second, a Pretest on the objectives that the system aims to achieve; and third, an Entry test on the prerequisites needed to enter the Learning System. Concerning the Pretest, there are three possible outcomes after taking the test: successful, which means that all objectives are fully mastered by the learner, so s/he is oriented towards another unit; partially successful, where only some objectives are mastered, so the learner is oriented to the units corresponding to the objectives not mastered; and unsuccessful, where the learner has to follow the model in its entirety. As for the entrance test, or control of the prerequisites, the term prerequisite is used to describe the skills that the learner is expected to master in order to take a course and derive maximum benefit from it. The entrance test therefore makes it possible to ensure that the learner has the cognitive means to follow a model. The Learning System is divided into learning units, each of which is structured as an autonomous entity aimed at a specific goal. Entry into each of these units is decided either according to the information provided by the entry and exit systems or by considering the units already mastered within the current module, as shown in Fig. 2.
Fig. 1. The storyboard of the game FunLexia
Fig. 2. The learning system of our proposed game FunLexia
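As a compact illustration of the Entry System routing described above, the sketch below encodes the three pretest outcomes and the prerequisite check as a simple decision function; the mastery thresholds and unit names are illustrative assumptions, not values taken from FunLexia itself.

```python
# Illustrative sketch of the Entry System routing logic (not FunLexia's actual code).
# Thresholds (0.8 for "fully mastered") and unit names are assumptions.
from typing import Dict, List

def route_after_pretest(objective_scores: Dict[str, float],
                        prerequisites_ok: bool) -> List[str]:
    """Return the list of learning units the child should follow."""
    if not prerequisites_ok:
        return ["remediation_prerequisites"]           # cannot enter the Learning System yet

    mastered = [o for o, s in objective_scores.items() if s >= 0.8]
    if len(mastered) == len(objective_scores):
        return ["next_module"]                         # pretest fully successful
    not_mastered = [o for o in objective_scores if o not in mastered]
    if not_mastered and len(not_mastered) < len(objective_scores):
        return [f"unit_{o}" for o in not_mastered]     # partially successful
    return [f"unit_{o}" for o in objective_scores]     # unsuccessful: follow the whole model

# Example: two of three objectives still need work, so only those units are proposed.
print(route_after_pretest({"letters": 0.9, "word_structure": 0.5, "vowels": 0.2}, True))
```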
The Exit System's main function is to test whether the learner has mastered the objectives of the game and to orient the learner at the end of the learning process. This test is called the
Posttest: it is a test at the end of the game to ensure that the learner has acquired the skills targeted by the units. The Posttest only focuses on the skills effectively addressed by the game. Depending on the case, the orientation will consist either of leaving the game (Posttest perfectly successful) or of directing the learner to remediation possibilities when the Posttest is partially or totally missed. Another important characteristic of this version of the game is the use of an intelligent character named 'Lexia' that accompanies the learners in their learning journey and provides real-time feedback. This point is described in the next section. The game's activities are described below.
• Pretest: The pretest was designed to test the degree of mastery of the skills that the game proposes to have the target player acquire (see Fig. 3).
• First activity 'I listen and I build': The main objective of the first activity is to improve the reading skills of children with dyslexia. This part consists of five different levels (see Fig. 4). The player is asked to listen to the word and to drag and drop letters into the right place to construct the word. The screen contains random letters, from which the player is invited to select the right letters to write the correct word. The screen also contains the picture of the word to increase attention span. This activity focuses on phonological similarities and letter-shape similarities, for instance.
• Second activity 'I build and I complete': The goal of this activity is to improve learners' writing skills. This part consists of five different levels. It offers an interactive educational puzzle game. The learner is invited to build the picture piece by piece. When the picture is built, s/he can notice the shape of the word and match it with the correct written word, which is chosen from three choices. Those choices contain one correctly written word and two different words with similar letters in a random order. This activity focuses especially on omitting letters, transposition, letter addition and letter-shape similarities (see Fig. 5).
• Third activity 'I listen and I complete': The goal of this activity is to develop learners' short-term memory and concentration. This part consists of five different levels. The learner is invited to complete the sentence with a letter from the list by trying to find the missing letter that completes each word. This activity focuses on the most important difficulties encountered by children with dyslexia in Arabic, which are long and short vowels and syntactical rules.
• Posttest: The posttest was designed to ensure that the learner has acquired the skills targeted by the game.
To explain the educational game components, a video was used to present further instructions to follow while playing FunLexia. For the implementation of the game, we used graphics freely available from the site freepik.com.

3.2 Lexia: The Intelligent Character

Recently, research has turned its attention to the emotional aspects of learning difficulties [12]. Understanding emotions from facial expressions is essential to comprehend the intent of students so that the appropriate feedback can be provided.
Fig. 3. Screen shot of the pretest
Fig. 4. Screen shot of the first activity
Real-time feedback is a very powerful tool during learning and assessment procedures; research studies have shown the positive impact of feedback on students' learning achievements. As stated by [14], the presence of feedback gives learners an opportunity not just to evaluate the progress they have made, but also to improve their self-regulation. A recent study [15] highlighted the importance of real-time feedback in virtual reality, confirming that feedback is essential for effective skill acquisition and must be both timely and contextually relevant.
Fig. 5. Screen shot of the second activity
The game includes an intelligent character named Lexia that accompanies learners in their learning journey. The game detects children's emotions through their facial expressions by analyzing seven basic facial emotion expressions (angry, disgust, fear, happy, sad, surprise and neutral) while they are playing the game. Therefore, the learners are
engaged in an interactive environment with the game, which motivates them and allows them to improve their abilities and surpass their weaknesses. Lexia (a monkey character) changes its facial expressions in real time, taking one of the seven basic emotions. The emotion system is launched when the game starts. When a negative emotion expressed by the player is detected (e.g., sad, fear), Lexia provides positive feedback to encourage the player, and when a positive emotion expressed by the player is detected (e.g., happy), Lexia provides positive feedback to congratulate the player. For instance, if the system recognizes a happy emotion, the game shows Lexia with a happy face and the positive feedback 'Keep up the good work', and if the system recognizes a sad emotion, the game shows Lexia with an unhappy face and the positive feedback 'Don't give up! I know you can do it'. After a change of emotional state, Lexia uses one of the provided feedback messages to either encourage or congratulate the learner. When the emotion recognition system detects a negative emotion for a long time, Lexia then provides hints to help the learner. Lexia accompanies the learners; unlike a teacher in a class full of students, who may not notice that a learner is having a problem, Lexia, thanks to the emotion recognition system, can detect that the learner is in fact having problems and then take measures to help and encourage her/him.
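The feedback policy described above can be summarised as a small mapping from the detected emotion, and from how long a negative state has persisted, to Lexia's reaction. The sketch below is purely illustrative: the persistence threshold, the hint message and the detect-emotion stub are assumptions rather than parts of FunLexia's actual implementation.

```python
# Illustrative sketch of Lexia's feedback policy (not the game's actual code).
# A real detector would classify a webcam frame into one of the seven basic emotions;
# here the emotion label is simply passed in.
NEGATIVE = {"sad", "fear", "angry", "disgust"}
POSITIVE = {"happy", "surprise"}

def lexia_reaction(emotion: str, seconds_negative: float) -> tuple[str, str]:
    """Return (Lexia's facial expression, feedback message)."""
    if emotion in NEGATIVE and seconds_negative > 30:       # persistence threshold is an assumption
        return ("unhappy", "Here is a hint to help you with this word.")
    if emotion in NEGATIVE:
        return ("unhappy", "Don't give up! I know you can do it")
    if emotion in POSITIVE:
        return ("happy", "Keep up the good work")
    return ("neutral", "")                                   # neutral state: no feedback needed

# Example usage with a hypothetical recognizer output.
expression, message = lexia_reaction("sad", seconds_negative=12.0)
print(expression, message)
```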
4 Conclusion

This work presents the design of an educational game called FunLexia for children with dyslexia to learn Arabic. The game offers an interactive learning environment which motivates the target children and allows them to improve their abilities and surpass their weaknesses. An important characteristic of this game is the presence of an intelligent character called Lexia that accompanies children in their learning journey and provides real-time feedback. The proposed design aims to empirically explore the implementation of facial expression recognition in an educational game to support children with dyslexia. The use of facial emotion detection can aid in understanding which emotions a learner is going through in real time as s/he is playing, so that effective feedback can then be provided.
References 1. Benmarrakchi, F.E., et al.: Exploring the use of the ICT in supporting dyslexic students’ preferred learning styles: a preliminary evaluation. Educ. Inf. Technol. 22(6), 2939–2957 (2017) 2. Cristani, M., Maso, S.D., Piccinin, S., Tomazzoli, C., Vedovato, M., Vender, M.: A Technology for assisting literacy development in adults with dyslexia and illiterate second language learners. In: Uskov, V.L., Howlett, R.J., Jain, L.C. (eds.) Smart Education and e-Learning 2021. KES-SEEL 2021. Smart Innovation, Systems and Technologies, vol. 240. Springer, Singapore, pp. 475-485 (2021). https://doi.org/10.1007/978-981-16-2834-4_40 3. Cano, S.R., Delgado-Benito, V., Gonçalves, V.: Educational technology based on virtual and augmented reality for students with learning disabilities: specific projects and applications. In: Emerging Advancements for Virtual and Augmented Reality in Healthcare. IGI Global, pp. 26–44 (2022)
4. Kalyvioti, K., Mikropoulos, T.A.: Virtual environments and dyslexia: a literature review. procedia computer science. In: 5th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion, DSAI 2013, vol. 27, pp. 138–47 (2014). https://doi.org/10.1016/j.procs.2014.02.017 5. Prensky, M.: Digital game-based learning. Comput. Entertain. 1(1), 21 (2003). https://doi. org/10.1145/950566.950596 6. El Kah, A., Lakhouaja, A.: Developing effective educative games for Arabic children primarily dyslexics. Educ. Inf. Technol. 23(6), 2911–2930 (2018). https://doi.org/10.1007/s10639-0189750-2 7. Doush, I.A., Alkhateeb, F., Albsoul, A.: AraDaisy: a system for automatic generation of Arabic DAISY books. Int. J. Comput. Appl. Technol. 55(4), 322 (2017). https://doi.org/10. 1504/IJCAT.2017.086007 8. Saputra, M.R., Alfarozi, S.A., Nugroho, K.A.: LexiPal: kinect-based application for dyslexia using multisensory approach and natural user interface. Int. J. Comput. Appl. Technol. 57(4), 334 (2018). https://doi.org/10.1504/IJCAT.2018.10014728 9. Benmarrakchi, F., El Kafi, J., Elhore, A.: Communication technology for users with specific learning disabilities. Procedia Comput. Sci. 110, 258–265 (2017) 10. Wattad, H., Abu Rabia, S.: The advantage of morphological awareness among normal and dyslexic native Arabic readers: a literature review. Read. Psychol. 41(3), 130–156 (2020) 11. Abu-Rabia, S., Sammour, R.: Spelling errors’ analysis of regular and dyslexic bilingual Arabic-English students. Open J. Mod. Linguist. 3(01), 58 (2013) 12. Ouherrou, N., Elhammoumi, O., Benmarrakchi, F., El Kafi, J.: Comparative study on emotions analysis from facial expressions in children with and without learning disabilities in virtual learning environment. Educ. Inf. Technol. 24(2), 1777–1792 (2019). https://doi.org/10.1007/ s10639-018-09852-5 13. Ouherrou, N., et al.: A heuristic evaluation of an educational game for children with dyslexia. In: 2018 IEEE 5th International Congress on Information Science and Technology (CiSt). IEEE (2018) 14. Corbalan, G., Kester, L., van Merriënboer, J.J.: Dynamic task selection: effects of feedback and learner control on efficiency and motivation. Learn. Instr. 19(6), 455–465 (2009) 15. Davaris, M., Wijewickrema, S., Zhou, Y., Piromchai, P., Bailey, J., Kennedy, G., O’Leary, S.: The importance of automated real-time performance feedback in virtual reality temporal bone surgery training. In: Isotani, S., Millán, E., Ogan, A., Hastings, P., McLaren, B., Luckin, R. (eds.) AIED 2019. LNCS (LNAI), vol. 11625, pp. 96–109. Springer, Cham (2019). https:// doi.org/10.1007/978-3-030-23204-7_9
Artificial Neural Networks Cryptanalysis of Merkle-Hellman Knapsack Cryptosystem Hicham Tahiri Alaoui(B) , Ahmed Azouaoui, and Jamal El Kafi Computer Science Department, Faculty of Sciences, Chouaib Doukkali University, El Jadida, Morocco [email protected], {azouaoui,elkafi.j}@ucd.ac.ma
Abstract. Machine learning in general, and artificial neural networks in particular, have been used in many fields, and especially in cryptanalysis, showing good results for recovery attacks, particularly on lightweight ciphers, and as good distinguishers. In this paper, we show how we applied artificial-neural-network-based cryptanalysis to perform a known-plaintext attack on the Merkle–Hellman knapsack public-key cryptosystem. After revisiting some of the related work already done in the literature, showing how the Merkle–Hellman knapsack public-key cryptosystem works, and giving a brief introduction to neural networks, we describe the model used and the data used, and present the obtained results. Keywords: Cryptanalysis · Artificial Neural Networks · Machine Learning · Merkle–Hellman knapsack public key cryptosystem
1 Introduction

Security of communications is an essential part of information security. For this reason, cryptography plays an essential role in securing the communications between different parties. To this end, the field of cryptanalysis has arisen as an essential part of validating and assessing the level of strength of a cryptosystem. The increasing number of these cryptosystems leads to the necessity of more flexible and general cryptanalysis methods. Cryptanalysis is the technique of extracting plaintexts or keys or both without possessing the encryption key. Many classical cryptanalysis techniques are applied in the field of cryptography; the best known are:
• Linear cryptanalysis, first introduced and applied to the Data Encryption Standard (DES) by M. Matsui [1]. This technique tries to find an affine approximation of the analyzed cipher.
• Differential cryptanalysis, introduced to the public for the first time by E. Biham and A. Shamir [2]; this technique studies the effect of an input difference on the resulting output.
• Integral cryptanalysis, originally introduced by L. Knudsen and D. Wagner; it is particularly applicable to block ciphers not vulnerable to differential attacks. It examines the propagation of sums of values, in contrast to differential cryptanalysis, which considers the propagation of differences between pairs of values, and can thus be seen as a dual to differential cryptanalysis [3].
Multiple cryptanalytic attack types exist based on the data available to the cryptanalyst. Some of these attacks are:
• Ciphertext-only attack: only the ciphertext is known.
• Known-plaintext attack: at least some of the pairs of plaintexts and their corresponding ciphertexts are known.
• Chosen-plaintext attack: the cryptanalyst has the ability to choose a number of plaintexts and obtain their corresponding ciphertexts.
In recent years, the application of machine learning, artificial neural networks and deep learning in the cryptanalysis of modern and old ciphers has gained a lot of attention, after being useful in big data techniques with applications in natural language processing, pattern recognition, speech recognition, computer vision and recommendation systems [4]. Artificial neural networks (ANNs) are flexible nonlinear models that try to simulate biological neural systems, which consist of multiple layers with a large number of neurons processing information in parallel. In our work, we will be performing a chosen-plaintext attack using ANN techniques.
2 Related Work

Over the last years, many studies have been conducted in the field of cryptanalysis using machine learning techniques. In 1991, R.L. Rivest mentioned the relationship between machine learning and cryptography [5]. Some of this work has attempted to study the nature of the cipher or to find the relationship between the plaintexts, ciphertexts and keys. In the first category (studies of the cipher's nature), in 2008, Bafghi et al. [6] studied the differential characteristics of the Serpent block cipher using neural networks and obtained a 7-round differential characteristic of this cipher. In 2012, Alani [7] managed to implement a known-plaintext attack based on neural networks upon DES and Triple-DES. In the same year, Alallayah et al. [8] applied the Levenberg–Marquardt algorithm for neural networks to construct a black-box system for emulation and cryptanalysis of the Simplified Data Encryption Standard (SDES). In 2014, Danziger and Henriques [9] applied neural-network cryptanalysis to S-DES; their network could map the relation between keys, inputs and outputs and obtain the correct values for the key bits k0, k1 and k4. They also applied differential cryptanalysis on the key space, which led to an explanation of the neural network's partial success. The neural network was unsuccessful in recovering the 3 key bits mentioned after new differential-attack-resistant S-Boxes were applied. In 2018, Focardi and Luccio [10] used the statistical weaknesses (such as frequencies of single letters, digrams and trigrams) of classical ciphers
such as shift (Caesar), substitution and Vigenère, and trained a neural network that takes these frequencies as well as the adopted key as input, and generalizes the attack to any ciphertext. In the same year, A. N. Gomez et al. [11] proposed an architecture (CipherGAN) inspired by the CycleGAN model [12, 13], which is a model used to learn the mapping between an output image and its corresponding input image using a training set of aligned image pairs, and demonstrated that CipherGAN is capable of cracking language data enciphered using Vigenère and shift ciphers. Also in 2018, Xinyi and Yaqun [14] applied a neural-network cryptographic attack on AES and were able to restore more than half of the bytes with 89% probability. In 2019, A.N. Khan et al. [15] proposed an artificial neural network cryptanalysis of the MHKC. Other work has attempted to classify ciphers or to distinguish encrypted data from random data, obtaining what we call "distinguishers". In 2009, Alshammari and Zincir-Heywood [16] showed the ability to classify SSH and Skype encrypted traffic using machine learning techniques, with higher performance using the C4.5 algorithm than AdaBoost, Support Vector Machine, Naïve Bayesian and RIPPER. In 2016, Tan et al. [17] showed that they were able to use deep learning to distinguish ciphertexts encrypted with five block ciphers: AES, Blowfish, DES, 3-DES, and RC5, with up to a 90% success rate for ciphertexts encrypted with the same key in the training and testing phases. In 2017, Liu et al. [18] proposed a new unsupervised learning cost function and applied it to cryptanalyse and break the Caesar cipher. In 2019, Gohr [19] presented a distinguisher for the block cipher SPECK based on deep residual neural networks that achieved a mean key rank five times lower than analogous classical distinguishers using the full difference distribution table. Yet, the working mechanism and the information deduced by this neural distinguisher remained unclear. In 2021, Benamira et al. [20] analyzed Gohr's findings and showed that Gohr's neural distinguisher relies on the differential distribution of the ciphertext pairs, and also on the differential distribution in the penultimate and antepenultimate rounds. In the same year, inspired by the work of Benamira et al. [20], ZeZhou et al. [21] proposed new neural distinguishers for SIMON based on input differences of high-probability differential characteristics and on a SAT/SMT solver. They also proposed new neural distinguishers that can better distinguish reduced-round NSA block ciphers from a pseudo-random permutation, and conducted key recovery attacks on different versions of SIMON.
3 Merkle-Hellman Knapsack Cryptosystem

Classical (or private-key) cryptosystems use the same keys for encryption and decryption, or keys that can easily be derived from each other. In 1976, Diffie and Hellman proposed a new kind of cryptosystem called a public-key cryptosystem [22], where different keys (a pair of keys) are used for encryption and decryption. The pair of keys consists of a public key, which can be known by others, and a private key, which is only known by its owner. These keys are generated using cryptographic algorithms based on one-way functions that are easy to compute given an input but hard to invert. In 1977, Ronald L. Rivest, Adi Shamir and Leonard Adleman developed RSA, the first widely used public-key cryptosystem [23]. In 1978, R.C. Merkle and M.E. Hellman [24] created
the first knapsack public-key cryptosystem. The Merkle-Hellman Knapsack Cryptosystem (MHKC for short) is based on the subset sum problem, a special case of the knapsack problem. The general knapsack problem is an NP-complete combinatorial problem that is known to be computationally difficult to solve in general. The MHKC key-generation algorithm can be described as follows:
• Choose a super-increasing sequence of positive integers W = (w(1), w(2), …, w(n)), which means that each number in the sequence is greater than the sum of all previous numbers: w(k) > w(1) + w(2) + … + w(k-1).
• Choose a modulus M such that M > w(1) + w(2) + … + w(n).
• Choose a multiplier R such that 1 < R < M-1 and gcd(R, M) = 1, where gcd(R, M) is the greatest common divisor of R and M.
The private key consists of W combined with the modulus M and the multiplier R: (W, M, R). The public key consists of the trapdoor knapsack sequence B = (b(1), …, b(n)), where b(i) = R · w(i) mod M, which may be known by anyone. The holder of the private key is the only person who can decrypt a message. Figure 1 shows an example of how the MHKC works:
Fig. 1. The Merkle-Hellman Knapsack Cryptosystem
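For concreteness, the sketch below implements the key generation, encryption and decryption steps just described for a single 8-bit block; the specific numbers are illustrative and were chosen only to satisfy the super-increasing and modulus conditions.

```python
# Toy implementation of the Merkle-Hellman knapsack cryptosystem for one 8-bit block.
# The concrete numbers are illustrative; any super-increasing W, modulus M > sum(W)
# and multiplier R coprime with M behave the same way.
from math import gcd

W = [2, 7, 11, 21, 42, 89, 180, 354]          # private super-increasing sequence
M = 881                                        # modulus, larger than sum(W) = 706
R = 588                                        # multiplier with gcd(R, M) == 1
assert M > sum(W) and gcd(R, M) == 1

B = [(R * w) % M for w in W]                   # public (trapdoor) knapsack sequence

def encrypt(bits):
    """Ciphertext = subset sum of the public sequence selected by the plaintext bits."""
    return sum(b for bit, b in zip(bits, B) if bit)

def decrypt(c):
    """Multiply by R^-1 mod M, then solve the easy super-increasing knapsack greedily."""
    r_inv = pow(R, -1, M)                      # modular inverse (Python 3.8+)
    target = (c * r_inv) % M
    bits = [0] * len(W)
    for i in reversed(range(len(W))):
        if W[i] <= target:
            bits[i] = 1
            target -= W[i]
    return bits

plain = [0, 1, 1, 0, 0, 1, 0, 1]               # one 8-bit plaintext block
cipher = encrypt(plain)
assert decrypt(cipher) == plain
print(cipher, decrypt(cipher))
```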
4 Neural Networks and Deep Learning Artificial intelligence (AI) is a thriving field with many practical applications and active research topics. We look to intelligent software to automate routine labor, understand speech or images, make diagnoses in medicine and support basic scientific research. Deep learning is an approach to machine learning that has drawn heavily on our knowledge of the human brain, statistics and applied math as it developed over the past several decades. In recent years, it has seen tremendous growth in its popularity and usefulness, due in large part to more powerful computers, larger datasets and techniques to train
deeper networks. The years ahead are full of challenges and opportunities to improve deep learning even further and bring it to new frontiers [25]. Deep feedforward networks, also known as feedforward neural networks, or multilayer perceptrons (MLPs), are the most essential deep learning models. A feedforward network tries to approximate a function f. For example, for a classifier, y = f(x) maps an input x to a category y. A feedforward network tries to learn parameters of a mapping y = f(x; θ) that gives the best approximation [26] (Fig. 2).
Fig. 2. Simple feedforward neural network architecture
5 Neural Network Cryptanalysis of Merkle-Hellman Knapsack Cryptosystem

5.1 Proposed Model Structures

In order to create a neural network model that can cryptanalyse the MHKC, we need an architecture that is sufficient to learn mixing functions such as XOR, as proposed by Abadi et al. in [27]. Thus, we choose a mix-and-transform architecture, which has a fully-connected (FC) layer as its first layer. The ciphertext is fed into this FC layer, which can perform a linear combination or permutation of the input bits.
Model 1: The first proposed structure uses a sequence of convolutional layers after the first FC layer. The last layer produces an output of the same size as the plaintext. These convolutional layers can learn to apply some function to groups of bits from the previous layer, without prior knowledge of that function. The opposite order (FC after convolutional layers) is more prevalent in image-processing applications.
Convolutions are used to learn some spatial locality, which we want to exploit in our case of neural cryptanalysis (i.e., which bits to combine).
Model 2: The second proposed structure is a multilayer perceptron (MLP) combined with ReLU activation layers as the nonlinear function. The goal is to try to find an approximation to the MHKC algorithm.
For the implementation, we used the open-source library PyTorch [28] for the neural network models, Matplotlib [29] for data visualization and NumPy [30] for multi-dimensional arrays and matrices, as well as other libraries.
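A hedged PyTorch sketch of the two architectures is given below. The exact layer widths, kernel sizes and depth used in the experiments are not reported, so the values here are assumptions; the sketch also assumes that ciphertexts and plaintexts are represented as bit vectors of the same length N.

```python
# Hedged sketch of the two model families (layer sizes are illustrative assumptions).
import torch
import torch.nn as nn

N = 160  # plaintext/ciphertext length in bits, as in the experiments

class ConvModel(nn.Module):
    """Model 1: a fully-connected mixing layer followed by 1-D convolutions."""
    def __init__(self, n_bits: int = N):
        super().__init__()
        self.mix = nn.Linear(n_bits, n_bits)           # learns which bits to combine
        self.conv = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(8, 8, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(8, 1, kernel_size=1),            # back to one channel
        )

    def forward(self, x):                               # x: (batch, n_bits)
        h = torch.relu(self.mix(x)).unsqueeze(1)        # (batch, 1, n_bits)
        return torch.sigmoid(self.conv(h)).squeeze(1)   # predicted plaintext bits

class MLPModel(nn.Module):
    """Model 2: a multilayer perceptron with ReLU nonlinearities."""
    def __init__(self, n_bits: int = N, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bits, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bits), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

# Both models map ciphertext bits to predicted plaintext bits and can be trained with a
# binary cross-entropy loss; the reported metric is the mean bit (deciphering) error.
model = MLPModel()
dummy = torch.randint(0, 2, (256, N)).float()
print(model(dummy).shape)  # torch.Size([256, 160])
```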
5.2 Data Generation

The generation of plaintexts and ciphertexts is done as follows:
– Generate random ASCII plaintexts of fixed width.
– Mode 1: Generate a new private key, modulus and multiplier for each batch of generated plaintexts, then generate the corresponding ciphertexts with the newly generated key and transform them to binary numeric format.
– Mode 2: Generate and fix a private key (a super-increasing vector), a modulus and a multiplier, then generate the corresponding ciphertexts with this fixed key and transform them to binary numeric format.

5.3 Experiment Environment

The experiments were done using the PyTorch framework for the neural networks and Python as the programming language for the implementation of the MHKC and the data generation. Some other useful libraries were used, such as NumPy for mathematical operations on arrays, to name a few. The computer used has an Intel(R) Core(TM) i7-6920HQ CPU @ 2.90 GHz and 16 GB of memory. The operating system used is Ubuntu 18.04.6 LTS x86_64.

5.4 Experiments and Results with Model 1 (CNN)

Figure 3 shows, for one successful run, the evolution of the deciphering error vs. the number of training steps for N = 160-bit plaintext values, using a batch size of 256, a learning rate of 0.1, and a new key for each batch of plaintexts, for the training and validation phases. Each point in the graph is the mean error across 256 examples. An ideal result would have the error drop to zero. The training phase begins with an error rate of 0.55. The average training error value obtained after 400 epochs is 0.182, and the average validation error is also 0.182. Figure 4 shows the training and validation error evolution over epochs for the pairs of plaintexts and ciphertexts generated with the same key for every batch of plaintexts. The average training error value obtained after 400 epochs is 0.182. The results obtained show that the neural system can learn a model that cryptanalyses the MHKC with an estimated error of 0.18, whether the keys are fixed or not.
Fig. 3. Training and validation error, with random key generated for each batch of plaintexts
Fig. 4. Training and validation error, with fixed key used in every batch of plaintexts
5.5 Experiments and Results with Model 2 (MLP)

The same experiments were carried out with the same parameters, in order to be able to compare the performance of the two models. Figure 5 shows the evolution of the deciphering error vs. the number of training steps for N = 160-bit plaintext values, using a batch size of 256, a learning rate of 0.1, and a new key for each batch of plaintexts. Each point in the graph is the mean error across 256 examples. An ideal result would have the error drop to zero. The training phase begins with an error rate of 0.5. The average error value obtained after 6 epochs is 0.175. Figure 6 shows the evolution of the deciphering error vs. the number of training steps for N = 160-bit plaintext values, using a batch size of 256, a learning rate of 0.1, and a fixed key for every batch of plaintexts. Each point in the graph is the mean error across 256 examples. An ideal result would have the error drop to zero. The training phase begins with an error rate of 0.5. The average error value obtained after 180 epochs is 0.149. These results show that the second model, based on a multilayer perceptron, converges more quickly and performs better than Model 1, which is based on convolutional neural networks.
Fig. 5. Training and validation errors, with new key used for every batch of plaintexts
Fig. 6. Training and validation errors, with fixed key used for all batches of plaintexts
6 Conclusion

In this paper, we applied two neural network models, namely a multilayer perceptron and a convolutional neural network, to cryptanalyse the Merkle-Hellman public-key cryptosystem. As described in this paper, and in line with other works cited here, this approach gives good results for both of the two models. The multilayer-perceptron-based model converges more quickly and to better results than the convolutional-neural-network-based cryptanalytic system. Our proposed cryptanalytic attack can be further enhanced by improving its performance, analyzing the interpretability of the model (a common problem in machine learning), and extending it to other cryptosystems in the future.
References 1. Matsui, M.: Linear cryptanalysis method for DES cipher. In: Helleseth, T. (ed.) EUROCRYPT 1993. LNCS, vol. 765, pp. 386–397. Springer, Heidelberg (1994). https://doi.org/10.1007/3540-48285-7_33 2. Biham, E., Shamir, A.: Differential cryptanalysis of DES-like cryptosystems. J. Cryptol. 4, 3–72 (1991)
3. Knudsen, L., Wagner, D.: Integral cryptanalysis. In: Daemen, J., Rijmen, V. (eds.) FSE 2002. LNCS, vol. 2365, pp. 112–127. Springer, Heidelberg (2002). https://doi.org/10.1007/3-54045661-9_9 4. Abiodun, O.I., Jantan, A., Omolara, A.E., Dada, K.V., Mohamed, N.A., Arshad, H.: Stateof-the-art in artificial neural network applications: a survey. Heliyon, 4(11), e00938 (2018). ISSN 2405–8440, https://doi.org/10.1016/j.heliyon.2018.e00938 5. Rivest, R.L.: Cryptography and machine learning. In: Imai, H., Rivest, R.L., Matsumoto, T. (eds.) ASIACRYPT 1991. LNCS, vol. 739, pp. 427–439. Springer, Heidelberg (1993). https:// doi.org/10.1007/3-540-57332-1_36 6. Bafghi, A.G., Safabakhsh, R., Sadeghiyan, B.: Finding the differential characteristics of block ciphers with neural networks. Inf. Sci. 178(15), 3118 – 3132 (2008). Nature Inspired ProblemSolving 7. Alani, M.M.: Neuro-cryptanalysis of DES and triple-DES. In: Proceedings of the International Conference on Neural Information Processing (ICONIP), pp. 637–646, Doha, Qatar (2012) 8. Alallayah, K.M., Alhamami, A.H., AbdElwahed, W., Amin, M.: Applying neural networks for simplified data encryption standard (SDES) cipher system cryptanalysis. Int. Arab J. Inf. Technol. 9(2), 163–169 (2012) 9. Danziger, M., Amaral Henriques, M.A.: Improved cryptanalysis combining differential and artificial neural network schemes. In: 2014 International Telecommunications Symposium (ITS), pp. 1–5 (2014). https://doi.org/10.1109/ITS.2014.6948008 10. Focardi, R., Luccio, F.L.: Neural Cryptanalysis of Classical Ciphers. ICTCS, pp. 104–115 (2018) 11. Gomez, A.N., Huang, S., Zhang, I., Li, B.M., Osama, M., Kaiser, L.: Unsupervised cipher cracking using discrete gans. arXiv preprint arXiv:1801.04883(2018) 12. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.:Unpaired image-to-image translation using cycleconsistent adversarial networks. In: IEEE International Conference on Computer Vision (ICCV) (2017) 13. CycleGAN. https://keras.io/examples/generative/cyclegan/ 14. Hu, X., Zhao, Y.: Research on plaintext restoration of AES based on neural network. Secur. Commun. Netw. 6868506, 9 (2018). https://doi.org/10.1155/2018/6868506 15. Khan, A.N., Yu Fan, M., Malik, A., Husain, M.A.: Cryptanalyzing Merkle-hellman public key cryptosystem with artificial neural networks. In: 2019 IEEE 5th International Conference for Convergence in Technology (I2CT), pp. 1–7 (2019). https://doi.org/10.1109/I2CT45611. 2019.9033917 16. Alshammari, R., Zincir-Heywood, A.N.: Machine learning based encrypted traffic classification: identifying SSH and Skype. In: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1-8 (2009).https://doi.org/10.1109/CISDA.2009. 5356534 17. Tan, C., Ji, Q.: An approach to identifying cryptographic algorithm from ciphertext. In: 2016 8th IEEE International Conference on Communication Software and Networks (ICCSN), pp. 19–23 (2016). https://doi.org/10.1109/ICCSN.2016.7586649 18. Liu, Y., Chen, J., Deng, L.:Unsupervised sequence classification using sequential output statistics. Adv. Neural Inf. Process. Syst. 30 (2017) 19. Gohr, A.: Improving attacks on round-reduced speck32/64 using deep learning. In: Boldyreva, A., Micciancio, D. (eds.) CRYPTO 2019. LNCS, vol. 11693, pp. 150–179. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26951-7_6 20. Benamira, A., Gerault, D., Peyrin, T., Tan, Q.Q.: A deeper look at machine learning-based cryptanalysis. IACR Cryptol. ePrint Arch., 287 (2021). https://eprint.iacr.org/2021/287 21. 
Hou, Z.Z., Ren, J.J., Chen, S.Z.: Improve neural distinguishers of simon and speck. Secur. Commun, Netw. 9288229, 11 (2021). https://doi.org/10.1155/2021/9288229
22. Diffie, W., Hellman, M.: New directions in cryptography. IEEE Trans. Inf. Theory 22(6), 644–654 (1976). https://doi.org/10.1109/TIT.1976.1055638
23. Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21(2), 120–126 (1978). https://doi.org/10.1145/359340.359342
24. Merkle, R., Hellman, M.: Hiding information and signatures in trapdoor knapsacks. IEEE Trans. Inf. Theory 24(5), 525–530 (1978). https://doi.org/10.1109/TIT.1978.1055927
25. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). www.deeplearningbook.org
26. What are different types of Artificial Neural Networks? https://intelligencereborn.com/ArtificialNeuralNetworksTypes.html. Accessed 20 Apr 2022
27. Abadi, M., Andersen, D.G.: Learning to protect communications with adversarial neural cryptography. arXiv preprint arXiv:1610.06918 (2016)
28. PyTorch. https://pytorch.org/
29. Matplotlib. https://matplotlib.org/
30. NumPy. https://numpy.org/
Using Machine Learning Algorithms to Increase the Supplier Selection Process Efficiency in Supply Chain 4.0 Houria Abouloifa(B) and Mohamed Bahaj Mathematics and Computer Science Department, Faculty of Sciences and Techniques, Lab L.M.I.E.T Hassan Ist University, Settat, Morocco {h.abouloifa,m.bahaj}@uhp.ac.ma
Abstract. Supply Chain is one of the processes that has attracted an important part of the attention given to Industry 4.0, because of its information flows, which are dispersed in different directions. To that end, many new technologies have emerged to improve the flows between the different links of the Supply Chain: machine learning is being introduced into the manufacturing environment, which has initiated a fourth industrial revolution. As such, this paper explores the application of machine learning algorithms to the supplier selection process based on past supplier orders and choices. The proposed model will enable firms to predict the good suppliers from the bad ones using supervised learning algorithms. To this purpose, we start by using three supervised machine learning algorithms: AdaBoost, SVM and KNN. AdaBoost shows the most reliable results, with an accuracy of 98% and an F1-score of 86%. Keywords: Machine Learning (ML) · Supplier selection · Artificial Intelligence (AI) · Supply Chain 4.0 · Industry 4.0
1 Introduction

With the rise of new technologies, companies are facing challenges in dealing with rapid decision-making for improved productivity. Business practices must be continually adapted and improved to ensure responsiveness, especially given the importance and the variation of the present information flow. Communication technologies have changed the speed of diffusion, the complexity and the ease of access, and data has become a key element of value creation [1]. The traditional philosophy of manufacturing systems will change, including Enterprise Resource Planning systems. Thus, the emerging technologies, also called modern technologies, are being introduced into the manufacturing environment, which initiates a fourth industrial revolution. Industry 4.0 envisions a digital transformation in the enterprise, entwining the cyber-physical world and the real world to deliver networked production with enhanced process transparency [2]. It holds the promise of increased flexibility in manufacturing, along with mass customization, better quality, and improved productivity [3]. The introduction
of Industry 4.0 into manufacturing has many impacts on the whole supply chain. The integration of artificial intelligence in this context has enabled the Supply Chain to boost efficiency, save costs and increase asset velocity through transparency and visibility, since collaboration between suppliers, manufacturers and customers is crucial to increase the transparency of all the steps from when the order is dispatched until the end of life of the product [4]. This paper explores the application of machine learning algorithms to the supplier selection process using a dataset of past supplier performances. To this end, three algorithms are deployed: AdaBoost, SVM and KNN. Results are compared based on the accuracy and F1-score of each model. For this study, AdaBoost achieved the most reliable results, with an accuracy of 98% and an F1-score of 86%. The next section is a summary and analysis of work related to this paper. After that, we describe the evolution of the traditional Supply Chain towards Supply Chain 4.0 in Sect. 3. Section 4 is the empirical study, where the machine learning algorithms are implemented in Python, and the last section is the conclusion.
2 Related Works

In the last decade, data analysis has become the main key to mastering knowledge in modern manufacturing systems. Many new job disciplines that did not exist ten years ago have emerged, showing how much data is being relied on. Positions like data analyst, data scientist and data architect are nowadays the stepping-stone roles for making critical business decisions. In that scope, Ni, D., Xiao, Z. and Lim, M.K. [5] conducted a systematic examination of over 120 of the latest research papers in the field of ML application in SCM. The study shows that the use of ML in SCM has not achieved the expected level of maturity that might enable the integration of such advanced technology into enterprise culture as a tool for SC improvement. The research level of most of these papers was described as theoretical and in a developmental stage. In agricultural and food supply chains, the benefits of ML techniques in leading to Agricultural Supply Chain sustainability are highlighted [6]. Their role in providing real-time analytic insights for proactive data-driven decision-making is identified by means of a framework using ML algorithms to improve agricultural productivity and sustainability. Similarly, in the food manufacturing industry, ML is used in combination with blockchains and a fuzzy-logic algorithm to ensure perishable food traceability [7]. Risk management is one of the most suitable application areas for artificial intelligence (AI) techniques [8]. The rapid and adaptive decision-making process, leaning on multidimensional and large data sources, makes ML algorithms a great tool for improving the detection and the processing of risks in the Supply Chain. Financially, ML is utilized to improve the accuracy of forecasting credit risk in small and medium companies using a hybrid ensemble ML approach called RS-MultiBoosting [9]. Many approaches can be incorporated to achieve the needed level of optimization, and the results show that the efficiency of ML in this field is higher than that of traditional credit risk forecasting.
its tools and techniques can leverage the supply chain’s total value [10]. Although the amount of research studies processing ML applications especially in the 5 last years, the global work is theoretical and it consists mostly of a state of art and empirical literature of past studies: definitions, application fields and ratios about the use of ML methods. Few have tested ML tools is the current industrial context particularly in Supply Chain. In this paper, we are not only presenting the improvement that ML can provide to Supply Chain, but we are applying those algorithms to a real case of study of a company in the aeronautical industry.
3 Supply Chain 4.0: Innovation of Supply Chain in an Industry 4.0 Context

Supply Chain is defined as a combination of four independent yet interlinked entities: Marketing, Procurement, Warehouse Management and Transportation [10]. Supply Chain entities are interconnected by a significant physical flow that includes raw materials, work-in-process inventories, finished products and returned items, as well as information flows and a financial flow. Managing the increasing complexity in Supply Chains is necessary for companies to compete better in the global market. Traditional supply chains (Fig. 1) were designed for competing in a traditional environment where systems rely on human intervention and interaction. Nowadays, supply chains worldwide are operating in an ever-changing environment and are vulnerable to a myriad of risks at all levels; customers have become more and more demanding in terms of delays, costs and service levels [8]. More demanding customer requirements and new technologies in the area of digitization drive the need for an evolution of traditional supply chains towards connected, smart, and highly efficient supply chain ecosystems that proactively manage and efficiently fulfil customer needs across multiple channels [12]. This new philosophy of Supply Chain management is called Supply Chain 4.0 (Fig. 2). Supply Chain 4.0 happens when the supply chain is incorporated into and drives Industry 4.0.
Fig. 1. Traditional supply chain.
The primary purpose of Supply Chain 4.0 is to extract useful information by analyzing the humongous amount of data generated by all the objects across the supply chain. The challenge lies in aggregating such a huge and diverse set of data generated from multiple sources and providing on-time information to assess the present situation, predict what-if scenarios and take smart decisions.
Fig. 2. Architecture of Supply Chain 4.0
Many differences can be identified between the regular Supply Chain and Supply Chain 4.0. In the table below (Table 1), a comparison between these two operating modes of the Supply Chain is made based on recent works describing the evolution of this process in a digital, automated environment. We chose a comparison based on the four main processes of the Supply Chain: planning, sourcing, making and delivering, since they are the most common processes whatever the operating field is.

Table 1. Comparison between traditional supply chain and supply chain 4.0
Process | Traditional Supply Chain | Supply Chain 4.0
Planning | Human interaction and intervention | Real-time data collection; real-time data treatment; predictive analytics; smart forecasting
Sourcing | Human interaction and intervention | Machine learning and automation techniques; implementation of new technology devices
Making | Human interaction and intervention | Automated versions of simple, monotonous tasks; robotics technologies; delivery routing and inventory
Delivering | Human interaction and intervention | Automating the logistics of distribution and delivery; real-time optimizations; automated inventory management
Unlike a traditional supply chain model, digital supply networks are dynamic, integrated, and characterized by a high-velocity, continuous flow of information and analytics. Supply Chain 4.0 is essentially the combined use of the latest innovations in internet, robotics, and data technology. The initial step toward this new supply-chain has mostly involved the creation of digitized and/or automated versions of simple, monotonous, and yet laborious tasks that occur throughout the supply-chain [13]. The transition to digital offers businesses new possibilities in terms of inventory management, reduction of production costs, time savings, among others. Thanks to an optimal alliance of Industry 4.0 solutions (connected objects, IoT, collection and analysis of field data, etc.) [14], organizations can now place the customer at the heart of the supply chain by adapting production to their needs. In other words, the adoption of a new digitalization strategy for supply chain 4.0 is today a necessary step to understand the emerging technological landscape, to reap the benefits, and thus strengthen its position in the face of competition.
4 Application of Machine Learning in Supply Chain 4.0
4.1 Machine Learning
Machine learning is defined as a set of computer algorithms that learn from experience without human intervention. It is a type of artificial intelligence that uses historical raw data to extract patterns [15]. Example data replaces the rigid calculation rules of a program. From the given example data, learning methods or algorithms extract statistical regularities and represent them in the form of models. The models can react to new, unknown data and classify them into categories or make predictions [16]. Using supervised learning, unsupervised learning or reinforcement learning, the ultimate goal of ML is to improve performance and make concrete predictions [17] by means of its algorithms. Supervised learning is a process in which a computer program uses known data labelled into inputs and outputs and relates them by finding a connection formed as a set of rules. Those rules are applied to new data as the program is trained. This is the most used type of ML [18] since it is faster and cheaper. The most common types of supervised learning are classification and regression. Classification aims to predict to which class discrete data belong, while regression is used for continuous data to show relations between a dependent variable and a fixed collection of different variables [19].
4.2 Machine Learning Application in Supplier Selection Process: Case Study
One of the major challenges of the actual Supply Chain is to find suitable suppliers that will respond correctly to the firm's expectations, to ensure a high level of efficiency and performance.
A. Proposed methodology
In this paper, we propose a model of supplier selection using machine learning. The implementation is executed in Python using scikit-learn. We start from a dataset
of past supplier orders that has been cleaned in a pre-processing step. After that, we elaborate a training model based on the supplier choices made over the dataset's past period. We use the training results to develop a test model, and we deploy three tasks for that purpose: AdaBoost, SVM and KNN. Results show that the AdaBoost model is the most suitable for our case study, with an accuracy of 98% and an F1-score of 86%.
B. Data collecting
The dataset in this case study is provided by a Moroccan company working in the aeronautical manufacturing field. Data are based on the information history of the four past years (from 2018 to 2021) and are extracted from the ERP system. Collected data include information about past orders to 107 suppliers and are constructed as follows:
• Order number: The number given to the order by the ERP
• Supplier ID: The ID by which the supplier is created in the ERP
• Order quantity: The quantity of the goods in the order
• Order cost: The cost of the order line
• Discount: Discount applied on the order
• Due date: The date on which the goods should be delivered
• Delivery date: The date on which the goods are actually delivered
• Delay in days: The number of days between the due date and the delivery date
• Non-compliance quantity: The quantity of goods judged as non-compliant
• Non-compliance level: The ratio between delivered goods and goods judged as non-compliant
• Delivered quantity: The quantity actually delivered
• Remaining quantity: The difference between the order quantity and the delivered quantity
• Administrator ID: The ID of the administrator of the order
• Delivery ID: The number of the delivery document
• Choice: OK if the supplier is still a part of the actual supplier panel, NOK if not.
The provided data are extracted as an Excel document and imported into a Jupyter notebook. Jupyter is an interactive web-based development environment for notebooks, code, and data, and it is often combined with Python to execute machine learning code.
C. Exploratory data analysis
Once the dataset is imported into the notebook, a data analysis is carried out to detect the useful values. Figure 3 gives an exploratory view that allows us to handle the missing values, so that we can choose which ones to clean from the dataset. Black in the figure indicates that a value is present, and white indicates that it is missing. Ratios of missing values are used for the same purpose: variable columns with more than 75% missing values will be deleted in the cleaning phase (Fig. 3).
Fig. 3. Exploratory vision of present/missing values
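A minimal sketch of this missing-value screening is given below. It assumes the order history has already been loaded into a pandas DataFrame named df; the variable name and plotting choices are illustrative assumptions, while the 75% threshold follows the cleaning rule stated above.

```python
import matplotlib.pyplot as plt

# Visual overview of present/missing values, in the spirit of Fig. 3:
# dark cells mean a value is present, light cells mean it is missing.
plt.imshow(df.notna().to_numpy(), aspect="auto", cmap="gray_r", interpolation="nearest")
plt.xticks(range(len(df.columns)), df.columns, rotation=90)
plt.ylabel("rows")
plt.tight_layout()
plt.show()

# Cleaning rule from the text: drop variable columns with more than 75% missing values.
missing_ratio = df.isna().mean()
df = df.drop(columns=missing_ratio[missing_ratio > 0.75].index)
```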
D. Data pre-processing
In the majority of ML projects, the data cleaning processes actually consume most of the total effort [18]. The pre-processing level aims to create new subsets based on the previous step. All data judged as unsuitable are cleaned from the dataset. Only relevant data are kept, to achieve the maximum level of efficiency and to create a reliable set of information. In this step, rows missing most values were removed and invalid structures were corrected. As explained at the beginning of this part, the machine learning model in our study is built using the scikit-learn library. To address our research question, we decided to use the model selection functions of the scikit-learn library. We built our train set on 20% of the studied features and we created a test set based on the train set results.
E. Creating the model and evaluation
To create a machine learning model for supplier selection, we need to define a set of models able to predict the class or value of the target variable by learning decision rules inferred from prior data. Accordingly, the model creation in our study is based on a decision tree classification method. Once the models are created, they should be evaluated. The evaluation procedure aims to define the precision, the recall, the F1-score and the accuracy of the proposed model via a confusion matrix. A confusion matrix is a direct representation of the test results in a prediction model. Columns and rows of the matrix represent predicted class instances and actual class instances [21]. Our study consists of using three task models. The choice of these tasks is based on their capacity to efficiently handle small and medium-sized datasets in classification problems.
• AdaBoost: AdaBoost is an iterative ensemble method that combines multiple classifiers to increase the accuracy of the classification. It builds a powerful classifier with high accuracy by combining several poorly performing classifiers.
• SVM: Consists of finding the linear decision boundary that separates the classes as far from each other as possible. It is easy to create models with infinite dimensions with this method.
• KNN: From a labeled database, we can estimate the class of a new datum by looking at the majority class of its k nearest neighboring data points.
F. Results visualization and analysis
Results show that the AdaBoost model has the best F1-score (86%) and the best accuracy (98%), followed by the SVM model with an F1-score of 40% and an accuracy of 94%. The weakest performance is achieved with the KNN model: an F1-score of 29% and an accuracy of 90% (Table 2). A minimal sketch of the corresponding training and evaluation code is given after the table.
Table 2. F1-score and accuracy for the three studied models.

| Model | F1-score (SUPPLIER OK) | F1-score (SUPPLIER NOK) | Accuracy |
|---|---|---|---|
| AdaBoost | 0.86 | 0.99 | 0.98 |
| SVM | 0.40 | 0.97 | 0.94 |
| KNN | 0.29 | 0.95 | 0.90 |
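The following is a minimal sketch of the training and evaluation pipeline described above (cleaned data, a 20% hold-out, the three classifiers, and the confusion-matrix-based metrics). It assumes a cleaned dataset with numerically encoded features and the binary Choice label; the file and column names are illustrative assumptions, not the company's actual ERP export.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report

# Hypothetical export of the cleaned order history (features already encoded as numbers).
df = pd.read_excel("supplier_orders_clean.xlsx")
X = df.drop(columns=["Choice"])
y = df["Choice"]                      # "OK" / "NOK"

# Hold out 20% of the data for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),  # boosted decision stumps by default
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=3),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(name)
    print(confusion_matrix(y_test, y_pred))       # actual classes in rows, predictions in columns
    print(classification_report(y_test, y_pred))  # per-class precision, recall, F1 and overall accuracy
```

From such a classification report, the per-class F1-scores and the overall accuracy reported in Table 2 can be read off directly.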
As we can see, the training score curve of AdaBoost stays around the maximum, while the validation score is very low at the beginning and increases towards the end (Fig. 4). The learning curve of the SVM classifier starts very high but decreases considerably as the algorithm goes on. The training score is not good from the beginning; it increases towards the end but is still not good enough (Fig. 5). The KNN classifier is the weakest algorithm for our study, with a very low F1-score and an accuracy lower than the other studied classifiers. The plots show that both the train and test scores increase as the algorithm
Fig. 4. AdaBoost classifier training and validation score plot
processes data, although the scores are not very good. This can mean that the model needs more time and data to reach higher scores. An over-fitting issue can also be observed in this case, as the validation model tries to fit the training data too closely (Fig. 6). A sketch of how such curves can be produced with scikit-learn follows Fig. 6.
Fig. 5. SVM classifier training and validation score plot
Fig. 6. KNN classifier training and validation score plot
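The training/validation curves discussed above (Figs. 4–6) can be reproduced, for instance, with scikit-learn's learning_curve utility. The sketch below reuses the models, X and y from the previous snippet and is only illustrative of the plotting approach, not the authors' exact code.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve

for name, model in models.items():
    # Compute mean training and cross-validated scores for growing training-set sizes.
    train_sizes, train_scores, val_scores = learning_curve(
        model, X, y, cv=5, scoring="accuracy",
        train_sizes=np.linspace(0.1, 1.0, 5))
    plt.figure()
    plt.plot(train_sizes, train_scores.mean(axis=1), label="training score")
    plt.plot(train_sizes, val_scores.mean(axis=1), label="validation score")
    plt.title(f"{name} classifier")
    plt.xlabel("training examples")
    plt.ylabel("accuracy")
    plt.legend()
plt.show()
```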
5 Conclusion
The traditional supply chain paradigm is increasingly being converted into Supply Chain 4.0 as a result of Industry 4.0. The integrated supply chain ecosystem has improved efficiency, transparency, communication, collaboration, flexibility, and responsiveness among supply chain partners, allowing for more accurate demand forecasting, shorter order lead times, lower costs, improved product and service quality, and higher customer satisfaction. In this paper, we established a machine learning model to increase the efficiency of the supplier selection process in the supply chain. Three algorithms were deployed to this
end: AdaBoost, SVM and KNN. A comparison of the resulting performance of each model shows that AdaBoost gives the best precision in our study. As a perspective, the established models can be optimized by a cross-validation procedure to achieve higher accuracy. Besides all the positive contributions briefly described above, Supply Chain 4.0 comes with some challenges and even negative impacts. Supply Chain 4.0 requires substantial investment in hardware and software. Local SMEs may encounter many barriers, costs and challenges to adopt and apply the emerging technologies. Moreover, the use of robotics, artificial intelligence, self-driving vehicles and big data may result in unemployment for many white-collar and blue-collar employees.
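As an illustration of that perspective only (it is not something evaluated in the paper), hyperparameters could be tuned with cross-validation along the following lines, reusing the models and the training data from the earlier sketch.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Example: tune k for the KNN model with 5-fold cross-validation (illustrative grid).
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7, 9]},
                      cv=5, scoring="f1_macro")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```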
References 1. El Hamdi, S., Abouabdellah, A., Oudani, M.: Disposition of Moroccan SME manufacturers to industry 4.0 with the implementation of ERP as a first step. In: 2018 Sixth International Conference on Enterprise Systems (ES), pp. 116–122 (2018). https://doi.org/10.1109/ES. 2018.00025 2. El Hamdi, S., Oudani, M., Abouabdellah, A.: Morocco’s readiness to industry 4.0. In: Bouhlel, M.S., Rovetta, S. (eds.) SETIT 2018. SIST, vol. 146, pp. 463–472. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-21005-2_44 3. Zhong, R., Xu, X., Klotz, E., Newman, S.: Intelligent manufacturing in the context of industry 4.0: a review. Engineering 3, 616–630 (2017). https://doi.org/10.1016/J.ENG.2017.05.015 4. Tjahjono, B., Esplugues, C., Enrique, A., Peláez-Lourido, G.: What does industry 4.0 mean to supply chain? Procedia Manufactu. 13, 1175–1182 (2017). https://doi.org/10.1016/j.promfg. 2017.09.191 5. Ni, D., Xiao, Z., Lim, M.K.: A systematic review of the research trends of machine learning in supply chain management. Int. J. Mach. Learn. Cybern. 11(7), 1463–1482 (2019). https:// doi.org/10.1007/s13042-019-01050-0 6. Shahbazi, Z., Byun, Y.: A procedure for tracing supply chains for perishable food based on blockchain. Mach. Learn. Fuzzy Logic. Electron. 10, 41 (2020). https://doi.org/10.3390/ele ctronics10010041 7. Bousqaoui, H., Achchab, S., Tikito, K.: Machine learning applications in supply chains: an emphasis on neural network applications, 1–7 (2017). https://doi.org/10.1109/CloudTech. 2017.8284722 8. Baryannis, G., Validi, S., Dani, S., Antoniou, G.: Supply chain risk management and artificial intelligence: state of the art and future research directions. Int. J. Prod. Res. 57(7), 2179–2202 (2019). https://doi.org/10.1080/00207543.2018.1530476 9. Zhu, Y., Zhou, L., Xie, C., Wang, G.-J., Nguyen, T.: Forecasting SMEs’ credit risk in supply chain finance with an enhanced hybrid ensemble machine learning approach. Int. J. Prod. Econ. 211, 22 23 (2019). https://doi.org/10.1016/j.ijpe.2019.01.032 10. Younis, H., Sundarakani, B., Alsharairi, M.: Applications of Artificial Intelligence and Machine Learning within Supply Chains: Systematic review and future research directions. J. Model. Manag. 17(3), 916–940 (2021). https://doi.org/10.1108/JM2-12-2020-0322 11. Awwad, M., Kulkarni, P., Bapna, R., Marathe, A.: Big data analytics in supply chain: a literature review. In: Proceedings of the International Conference on Industrial Engineering and Operations Management Washington DC, USA, September 27–29, pp. 418–425 (2018) 12. Prasad, S., Sounderpandian, J.: Factors influencing global supply chain efficiency: implications for information systems. Supply Chain Manag. 8(3), 241–250 (2003)
13. Why Traditional Supply-Chain Management Systems Are Dying Building Up from SupplyChain 4.0 SupplyBloc Technology Jul 18 (2018) 14. Witkowski, K.: Internet of things, big data, industry 4.0 – innovative solutions in logistics and supply chains management. Procedia Eng. 182, 763–769 (2017). https://doi.org/10.1016/ j.proeng.2017.03.197 15. Cabos, R., Hecker, P., Kneuper, N., Schiefele, J.: Wind forecast uncertainty prediction using machine learning techniques on big weather data, p. 3077 (2017).https://doi.org/10.2514/6. 2017-3077 16. Kirste, M., Schürholz, M.: Einleitung: Entwicklungswege zur KI. In: Wittpahl, V. (ed.) Künstliche Intelligenz, pp. 21–35. Springer, Heidelberg (2019). https://doi.org/10.1007/978-3662-58042-4_1 17. Gentsch, P.: AI eats the world. In: AI in Marketing, Sales and Service, pp. 3–9. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-89957-2_1 18. Géron, A.: Hands-on machine learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems. “ O’Reilly Media, Inc. (2017) 19. Abdulla, A., Baryannis, G., Badi, I.: Weighting the Key Features Affecting Supplier Selection using Machine Learning Techniques (2019). https://doi.org/10.20944/preprints201912.015 4.v1
A New Approach to Intelligent-Oriented Analysis and Design of Urban Traffic Control: Case of a Traffic Light Abdelouafi Ikidid1(B) , Mohamed El Ghazouani2 , Yassine El Khanboubi3 , Charafeddine Ait Zaouiat2 , Aziz El Fazziki1 , and Mohamed Sadgal1 1 Computer Systems Engineering Laboratory (LISI), Faculty of Science and Technology,
Cadi Ayyad University, Marrakesh, Morocco [email protected], {elfazziki,sadgal}@uca.ac.ma 2 Polydisciplinary Faculty of Sidi Bennour, Chouaîb Doukkali University, El Jadida, Morocco 3 Faculty of Science Ben M’Sik, Hassan II University, Casablanca, Morocco
Abstract. The aim of this work is to propose a modeling method for industrial urban traffic control based on the agent paradigm. Thus, from a detailed functional specification of urban traffic, we elaborate a method for the analysis and design of the traffic light system in five phases. First, a problem description phase that allows us to define the problem. Then, a requirement analysis phase that allows us to describe the needs and identify the scenario to be studied. Next, the analysis phase that allows us to design the structure of the system, at several levels of abstraction, and to focus on the interactions and behaviors of the components. The agentification phase then permits us to go from the component to the agent. And finally, the implementation and evaluation phase. Keywords: Artificial Intelligence · Intelligent System · Agent Technology · Urban Traffic Control
1 Introduction The complexity of traffic signal management is increasing, and new ideas on how to control signals based on multiple aspects are needed. Most approaches typically divide the intersection network into regions or subparts that cover one or more intersections. Effective control of region signals is one of the main problems in traffic signal control. Coordinating all signals in a given region is a more complicated task, because on the one hand, there are many variables and factors to consider, and on the other hand, there are various objectives to consider such as minimizing delay, favoring public transportation, or maximizing user safety. This paper proposes an approach for the development of an intelligent urban traffic management system based on fuzzy logic and a decentralized multi-agent architecture allowing a large coordination and collaboration between the control entities. The goal is to implement a tool that is adaptable to various traffic conditions, that provides better © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 217–230, 2023. https://doi.org/10.1007/978-3-031-26384-2_20
cooperation and coordination between control areas, and that provides wide-ranging support for the rational control and management of traffic signal networks. Furthermore, this approach takes advantage of the distributed organization of the multi-agent system to ensure globally optimized management and to avoid purely local optimization. In other words, each control zone agent can negotiate not only with agents in the same zone, but also with other agents in neighboring zones, upstream and downstream. Figure 1 shows a general overview of the different development processes of the proposed approach. The rest of the paper is organized as follows: Sect. 2 presents related works. Section 3 describes the proposed system and the methodology used. Finally, Sect. 4 summarizes the results of the approach and states some future work.
2 Related Works
The most appealing characteristics of a MAS used in traffic and transportation management are autonomy, collaboration, and reactivity [1]. Agents can use perceptive data and information received from other agents to achieve their goals. Each agent can cooperate with neighboring agents and adjust its reaction online to changes in its surroundings. Thus, multi-agent technology treats a complicated system in a distributed manner. It splits the complex control system into simple subsystems, therefore permitting parallel and fast decision-making [1]. Moreover, agents can run, learn new contexts and skills, and make autonomous decisions in the complete or partial absence of human supervision. With all of these features, the MAS is rapidly growing into one of the most powerful and popular technologies proposed to solve complicated problems in different fields, such as irrigation [2], computer science, electronic commerce, civil engineering, and transportation systems [3, 4]. In a transportation system, and with the diversity of actors involved, agent technology can be used in the various components of the system, such as drivers and vehicles [5, 6], traffic lights [7] and phases [8], and to handle diverse aspects, e.g., congestion [9], the green transportation system [10], and route guidance [11]. In urban traffic networks, signalized intersections are one of the most important and influential ingredients, and the traffic signal is the most utilized instrument for scheduling and managing traffic flow. In what follows we analyze and succinctly discuss several studies that use a multi-agent system and artificial intelligence techniques to perform intelligent traffic signal control. The ATS incorporate artificial intelligence technologies to overcome the congestion challenges and other transportation issues that are difficult to address using traditional computational techniques. The widely used artificial intelligence techniques for optimizing traffic signals are Artificial Neural Network Systems [12, 13], Deep Learning [14–16], Genetic Algorithms [17, 18], Fuzzy Logic (FL) [19], Multi-Agent Systems (MAS) [20], Case-Based Reasoning [21] and Ant Colony Algorithms [22]. These methods are used to handle diverse problems, e.g., traffic congestion [23], incident detection [24], and route guidance [25]. Since the traffic system is characterized by uncertainty, fuzzy and inexact data, and a wide-reaching distributed architecture, in this paper we propose a multi-agent system that uses agent fuzzy logic to design a cooperative real-time traffic signal optimization system, where the signal control plan is frequently updated to meet unpredictable traffic conditions.
Fig. 1. General overview of the proposed approach development process
3 The Proposed Approach 3.1 Problem Description Phase The Functional Zones of an Intersection. A road intersection is a crossing of several roads that contains three functional zones (Fig. 2) managed by a traffic light; red queues vehicles in a storage area, green provides access to the exit area through the conflict area, and yellow is a transition period from green to red to allow vehicles to exit the conflict area.
Fig. 2. Functional areas of an intersection of two one-way roads
Modeling the Road Network. The intersection network is considered a distributed system that is modeled by a strongly connected directed graph G = (C, A), where C is a set of nodes that represent intersections and A is a set of arcs that represent the paths that connect these intersections. Thus, for any (Ci, Cj) ∈ C, there exists a consecutive sequence of arcs connecting Ci and Cj. Each intersection, as a component of the distributed system, has its own requirements and coordinates with its adjacent intersections. Two intersections connected by an arc are considered adjacent. Adjacent intersections cooperate and share their data to achieve a common system goal, which is the optimization of traffic flow management. Each arc Aij is bounded by two intersections: i, the initial intersection and origin of the arc's flow, and j, the terminal junction and destination of the flow. The downstream of an arc's flow is the group of successor arcs succ(Aij) = {Ajk, (i, j, k) ∈ C}, where outgoing flows from the arc can be routed. The upstream flow of an arc is the set of
predecessor arcs pred(Aij) = {Aki, (i, j, k) ∈ C}, where the incoming flows of the arc arrive. An intersection is considered congested if it fails to clear all activated arc storage areas after a green phase time. In other words, it is considered congested if the downtime of an incoming arc exceeds the cycle time.
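A minimal sketch of this network model is given below, using a directed graph whose nodes are intersections and whose edges are the arcs Aij. The networkx library, the sample arcs and the function names are illustrative assumptions, not part of the authors' AnyLogic/Java implementation.

```python
import networkx as nx

# Intersections are nodes, arcs A_ij are directed edges (sample network).
G = nx.DiGraph()
G.add_edges_from([(1, 2), (2, 3), (2, 4), (3, 1), (4, 2)])

def successors_of_arc(G, i, j):
    """Downstream arcs succ(A_ij) = {A_jk}: where flow leaving A_ij can be routed."""
    return [(j, k) for k in G.successors(j)]

def predecessors_of_arc(G, i, j):
    """Upstream arcs pred(A_ij) = {A_ki}: where flow entering A_ij comes from."""
    return [(k, i) for k in G.predecessors(i)]

def is_congested(incoming_arc_downtimes, cycle_time):
    """An intersection is congested if the downtime of an incoming arc exceeds the cycle time."""
    return any(t > cycle_time for t in incoming_arc_downtimes)

print(successors_of_arc(G, 1, 2))    # [(2, 3), (2, 4)]
print(predecessors_of_arc(G, 2, 3))  # [(1, 2), (4, 2)]
```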
3.2 Requirement Analysis Phase The purpose of the requirements study stage is to understand the scenario under study and to present the different aspects of the problem, as well as the components, their functions and their interactions. This step is based on two tasks: the description of the scenario and the presentation of a first requirements diagram according to the Tropos method [73]. Description of the Scenario. The idea is to optimize in real time the distribution and order of signals at the intersection in collaboration and synchronization with its successors and predecessors. Each intersection is managed by a control subset. We assume that the throughput rate of a given intersection is not only influenced by the queued vehicles on its inbound lanes, but also by the inbound traffic upstream that causes saturation of the storage areas, and by the traffic condition downstream that restricts the egress from the intersection (Fig. 3). The traffic signal optimization scenario presented in Fig. 3 addresses the following needs: The aim of the system is to optimize the traffic light plan. It updates the phase layout by deciding to suspend or extend the active phase and determining which phase will take place. It also creates a database containing the history and key information of decisions. Definition of Initial Requirements. The initial requirements phase captures requirements collected and explored by the scenario description. We use the Tropos methodology to describe the initial requirements collection. This methodology represents the system as a set of social actors and describes the dependencies between them. Tropos is based on three notions: “actor” which is an active entity with goals, it can be a function, a role or an agent; the “goal” which is the objective that an actor must achieve and is decomposed into two subclasses: (i) “HardGoal” which is a strategic interest with obvious conditions for when and if the goal is reached (ii) “SoftGoal” which is not specified in an obvious way and presents the way in which the goal should be reached. Finally, “dependencies” are the social interactions between actors to achieve their goals. Figure 4 shows the initial requirements diagram for optimized dynamic traffic signal control.
3.3 Analysis Phase The analysis phase aims to describe the structure of the system, at several levels of abstraction, and focus on the interactions and behaviors of the components. This phase includes two tasks: defining the macro and micro architecture of the system.
Fig. 3. Generic scenario of traffic signal optimization
System Macroarchitecture. The selection of the system architecture is an essential step in the development of a traffic signal control system and has an impact on the following steps. Various specifications guide the definition of the organizational structure, including the fuzzy characteristics of the environment, the distributed architecture of the road network, the ability of the control process to support the computational complexity, and the need to respect organizational rules and ensure coordination of plans. From the point of view of the signal control system architecture, most approaches generally divide the road network into regions or subparts that cover one or more intersections. The partition into regions mainly follows a high-level operational objective. These sub-areas are controlled by a set of entities called control groups (CGs). The organizational structure of these groups can be modeled in different ways. The organizational structure determines the interactions, roles and grouping of the groups and can be designed in many forms such as centralized, decentralized. The urban network of intersections is spatially and functionally distributed. Then each intersection is considered as a region and controlled by autonomous, cooperative and intelligent CGs. Each region controls its own incoming flows and synchronizes and collaborates with the connected regions. It provides its neighbors with the requested information about the status of its flows, otherwise it requests the downstream status. In our approach, the region always contains a single intersection (Fig. 5).
Fig. 4. Preliminary requirements diagram
System Microarchitecture. The objective of this activity is to define the operational components that constitute each control group. As a result, we need to distribute the overall expected behavior at the CG level into sets of components (Fig. 6). Arc Monitor: This type of actor represents an intersection arc, each incoming arc is monitored by an arc monitor. Inactive Phase Manager: The inactive phase manager controls the phase sequences; it selects a phase from among all the phases, except the current phase and the phases already activated in the cycle, to be a candidate for the next green time. Active Phase Manager: This actor is responsible for managing the current phase. Its objective is to maintain the green time as much as possible to maximize the evacuation of vehicles. Planner: This actor is the axis of our control unit. The objective of this actor is to optimize the distribution of phases by updating, in real time, the signal control plan. Coordinator: The objective of this actor is to coordinate with neighboring control units. It represents a control unit communication interface and mediates all external communications.
Fig. 5. Macroarchitecture of the system
3.4 Agentification Phase
We propose the use of a simplified multi-agent architecture. This architecture is adopted particularly for the structuring of the system. To do this, we propose an approach organized in three steps:
• Organizational structuring of the MAS
• Structuring the system into groups of agents
• Identification of agents and roles
Organizational Structuring of the MAS. The selection of the organizational structure is an essential step in the development of the MAS that has an impact on the following steps. Various specifications guide the definition of the organizational structure, including the characteristics of the environment and the architecture of the real-world organization.
Structuring the System into Groups of Agents. In this step, we need to divide the expected overall functioning of the system into groups. The representation of a traffic signal control system (TSCS) by a MAS is based on a correspondence between the components of the TSCS and the MAS.
Fig. 6. Microarchitecture of the system
Identify Agents and Roles. This step identifies both the core competencies and the core interactions needed to achieve the system goals.
3.5 Implementation Phase
To instantiate the generic structure of the agent-oriented system in the simulator, we use, in this step, the AnyLogic programming and simulation platform and the jFuzzyLogic library.
Implementation and Simulation Platforms. Many solutions exist to model road traffic, in particular traffic light intersections and flow control systems. The authors of [26] make a very brief comparison of simulators dealing with road traffic (whether urban, freight or other), and the authors of [27] present the purely commercial tools that exist to simulate the case of intersections and also discuss simulations made with numerical computation software such as MatLab or Scilab (its free equivalent).
AnyLogic: the AnyLogic simulator is used to validate the performance of the proposed system. Based on the JAVA language, AnyLogic is a programming and simulation platform used to model hybrid systems; it can handle both agent modeling and traffic simulation. AnyLogic is the only simulation software that combines the three best modern simulation methods:
• System Dynamics can be applied to any complex system. For example, a company wants to simulate the marketing options for a new product.
• Discrete event modeling can also be applied to any type of business. It is mainly used in the manufacturing sector. For example, a company can simulate the movement of a product through the production line or any other process.
• This software is known for its user-friendly interface for multi-agent models. Agents are the main building blocks of the AnyLogic model. The design of an agent usually starts with the identification of its attributes, its behavior and its interface with the external world. In the case of a large number of agents with dynamic connections (such as social networks), agents can communicate by calling functions. An agent is an instance of an agent type. We have developed five agent types following the steps shown in Table 1.

Table 1. Agent creation process on AnyLogic

| Step | Description |
|---|---|
| Initialization | Create a population of agents, a single agent, or define only the agent type |
| Create a new agent type | Create a new agent type, or use an existing agent type |
| Agent animation | Specify the animation form for this agent type |
| Agent parameters | Specify and define the parameters required for this agent type |
| Population | An agent can represent an individual agent or a population of agents. A population represents a collection of agents of the same type, and we need to define the initial number of agents in the population |
| Configure the new environment | Configure the environment in which the agents of this population will reside; choose the dimensions and the type of space (Continuous, GIS, or Discrete) |
jFuzzyLogic: To handle the fuzzy inference mechanism, we have integrated the jFuzzyLogic library into AnyLogic. Developed in JAVA language, jFuzzyLogic is an open source fuzzy logic library and an implementation of Fuzzy Control Language (FCL), which provides a complete functional implementation of fuzzy inference [75]. Table 2 presents a partial list of software used to develop a fuzzy system and the languages used.
Cingolani and Alcalá-Fdez [28] make a brief comparative study on different noncommercial fuzzy logic software. The study focuses on free software because of its important role in the scientific research community [76].

Table 2. List of open fuzzy logic software packages

| Name | Last version | Language | Description |
|---|---|---|---|
| FisPro [29] | 2011 | C++/Java | Design and optimization of fuzzy inferences |
| Fuzzy Logic Tools [30] | 2011 | C++ | Framework for the development of fuzzy control systems |
| FuzzyBlackBox [31] | 2016 | − | Implementation of fuzzy logic |
| FuzzyPLC [32] | 2016 | Java | Fuzzy controller using jFuzzyLogic |
| GUAJE Fuzzy [33] | 2016 | Java | Development environment using FisPro |
| JFCM [34] | 2014 | Java | Fuzzy cognitive maps |
| jFuzzyLogic [28] | 2015 | Java | FCL and fuzzy logic API |
| jFuzzyQt [35] | 2015 | C++ | jFuzzyLogic clone |
| Libai [36] | 2015 | Java | AI library, implements an application of fuzzy logic |
| LibFuzzyEngine [37] | 2010 | C++ | Fuzzy engine for Java |
| Octave FLT [38] | 2021 | Octave | Fuzzy logic for the Toolkit |
| Nefclass [39] | 1999 | C++/Java | Neuro-fuzzy classification |
In our implementation, the fuzzy inference mechanism has been developed in the JAVA language, using jFuzzyLogic. To integrate jFuzzyLogic with AnyLogic, we add the jFuzzyLogic library to the external Java libraries in AnyLogic (AnyLogic offers the possibility to add any external Java archive file (*.jar, *.zip) to the simulation model) as shown in Fig. 7.
Fig. 7. Add the jFuzzyLogic library to AnyLogic’s external Java libraries
4 Conclusion
This paper aims to propose a generic infrastructure for a decentralized intersection network control system. This infrastructure is based on two levels of coordination: inter-junction coordination, which allows coordination between entities of neighboring control regions, and intra-junction coordination, which allows coordination between entities of the same region.
References 1. Evans, M.R., Elston, D.S.: Agent-based modeling and simulation for transportation, VASTO: Evolutionary Agent System for Transportation Outlook, pp. 1–88 (2013) 2. Ikidid, A., El Fazziki, A., Sadgal, M.: Smart collective irrigation: agent and Internet of Things based system. In: ACM International Conference Proceeding Series, pp. 100–106 (2021). https://doi.org/10.1145/3444757.3485113 3. Ikidid, A., Abdelaziz, E.F.: Multi-agent and fuzzy inference based framework for urban traffic simulation. In: Proceedings - 2019 4th International Conference on Systems of Collaboration, Big Data, Internet of Things and Security, SysCoBIoTS 2019 (2019). https://doi.org/10.1109/ SysCoBIoTS48768.2019.9028016 4. Ikidid, A., El Fazziki, A.: Multi-agent based traffic light management for privileged lane. In: 8th International Workshop on Simulation for Energy, Sustainable Development and Environment, SESDE 2020, pp. 1–6 (2020).https://doi.org/10.46354/i3m.2020.sesde.001
5. Małecki, K.: A computer simulation of traffic flow with on-street parking and drivers behaviour based on cellular automata and a multi-agent system. J. Comput. Sci. 28, 32–42 (2018). https:// doi.org/10.1016/j.jocs.2018.07.005 6. Hamidi, H., Kamankesh, A.: An approach to intelligent traffic management system using a multi-agent system. Int. J. Intell. Transp. Syst. Res. 16(2), 112–124 (2017). https://doi.org/ 10.1007/s13177-017-0142-6 7. Ikidid, A., El Fazziki, A., Sadgal, M.: A multi-agent framework for dynamic traffic management considering priority link. Int. J. Commun. Netw. Inf. Secur. 13(2), 324–330 (2021). https://doi.org/10.54039/ijcnis.v13i2.4977 8. Zhang, Z., Yang, J., Zha, H.: Integrating independent and centralized multi-agent reinforcement learning for traffic signal network optimization. In: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, vol. 2020-May, pp. 2083–2085 (2020) 9. Ikidid, A., Abdelaziz, E.F., Sadgal, M.: Multi-agent and fuzzy inference-based framework for traffic light optimization. Int. J. Interact. Multimedia Artif. Intell. In Press, 1 (2021). https:// doi.org/10.9781/ijimai.2021.12.002 10. Ikidid, A., El Fazziki, A., Sadgal, M.: A fuzzy logic supported multi-agent system for urban traffic and priority link control. JUCS – J. Univ. Comput. Sci. 27(10), 1026–1045 (2021). https://doi.org/10.3897/jucs.69750 11. Eydi, A., Panahi, S., Kamalabadi, I.N.: User-based vehicle route guidance in urban networks based on intelligent multi agents systems and the ANT-Q algorithm. Int. J. Transp. Eng. 4(3), 147–161 (2016) 12. Ma, X., Dai, Z., He, Z., Ma, J., Wang, Y., Wang, Y.: Learning traffic as images: a deep convolutional neural network for large-scale transportation network speed prediction. Sensors (Switzerland) 17(4), (2017). https://doi.org/10.3390/s17040818 13. Doˇgan, E., Akgüngör, A.P.: Forecasting highway casualties under the effect of railway development policy in Turkey using artificial neural networks. Neural Comput. Appl. 22(5), 869–877 (2013). https://doi.org/10.1007/s00521-011-0778-0 14. Chen, C., Liu, B., Wan, S., Qiao, P., Pei, Q.: An edge traffic flow detection scheme based on deep learning in an intelligent transportation system. IEEE Trans. Intell. Transp. Syst. 22(3), 1840–1852 (2021). https://doi.org/10.1109/TITS.2020.3025687 15. Veres, M., Moussa, M.: Deep learning for intelligent transportation systems: a survey of emerging trends. IEEE Trans. Intell. Transp. Syst. 21(8), 3152–3168 (2020). https://doi.org/ 10.1109/TITS.2019.2929020 16. Nguyen, H., Kieu, L.M., Wen, T., Cai, C.: Deep learning methods in transportation domain: a review. IET Intell. Transp. Syst. 12(9), 998–1004 (2018). https://doi.org/10.1049/iet-its.2018. 0064 17. Shen, T., Hua, K., Liu, J.: Optimized public parking location modelling for green intelligent transportation system using genetic algorithms. IEEE Access 7, 176870–176883 (2019). https://doi.org/10.1109/ACCESS.2019.2957803 18. Ghanim, M.S., Abu-Lebdeh, G.: Real-time dynamic transit signal priority optimization for coordinated traffic networks using genetic algorithms and artificial neural networks. J. Intell. Transp. Syst. Technol. Plann. Oper. 19(4), 327–338 (2015). https://doi.org/10.1080/ 15472450.2014.936292 19. Kumar, N., Rahman, S.S., Dhakad, N.: Fuzzy inference enabled deep reinforcement learningbased traffic light control for intelligent transportation system, 1–10 (2020) 20. 
Sathiyaraj, R., Bharathi, A.: An efficient intelligent traffic light control and deviation system for traffic congestion avoidance using multi-agent system. Transport 35(3), 327–335 (2020). https://doi.org/10.3846/transport.2019.11115
21. Quirion-Blais, O., Chen, L.: A case-based reasoning approach to solve the vehicle routing problem with time windows and drivers’ experience. Omega (United Kingdom), 102, 102340 (2020). https://doi.org/10.1016/j.omega.2020.102340 22. Guo, X., Liu, Y.: Intelligent traffic cloud computing system based on ant colony algorithm. J. Intell. Fuzzy Syst. 39(4), 4947–4958 (2020). https://doi.org/10.3233/JIFS-179980 23. Khoza, E., Tu, C., Owolawi, P.A.: Decreasing traffic congestion in vanets using an improved hybrid ant colony optimization algorithm. J. Commun. 15(9), 676–686 (2020). https://doi. org/10.12720/jcm.15.9.676-686 24. Nikolaev, A.B., Sapego, Y.S., Jakubovich, A.N., Berner, L.I., Stroganov, V.Y.: Fuzzy algorithm for the detection of incidents in the transport system. Int. J. Environ. Sci. Educ. 11(16), 9039–9059 (2016) 25. Wu, J., Chen, B., Zhang, K., Zhou, J., Miao, L.: Ant pheromone route guidance strategy in intelligent transportation systems. Phys. A Stat. Mech. Appl. 503, 591–603 (2018). https:// doi.org/10.1016/j.physa.2018.02.046 26. Giunchiglia, F., Mylopoulos, J., Perini, A.: The tropos software development methodology: processes, models and diagrams. In: Giunchiglia, F., Odell, J., Weiß, G. (eds.) AOSE 2002. LNCS, vol. 2585, pp. 162–173. Springer, Heidelberg (2003). https://doi.org/10.1007/3-54036540-0_13 27. Liu, Z.: A survey of intelligence methods in urban traffic signal control. IJCSNS Int. J. Comput. Sci. 7(7), 105–112 (2007) 28. Cingolani, P., Alcalá-Fdez, J.: jFuzzyLogic: a java library to design fuzzy logic controllers according to the standard for fuzzy control programming. Int. J. Comput. Intell. Syst. 6(SUPPL1), 61–75 (2013). https://doi.org/10.1080/18756891.2013.818190 29. Guillaume, S., Charnomordic, B.: Learning interpretable fuzzy inference systems with FisPro. Inf. Sci. 181(20), 4409–4427 (2011). https://doi.org/10.1016/j.ins.2011.03.025 30. Barragán Piña, A., Andújar Márquez, J.M.: Fuzzy Logic Tools : reference manual v1.0, p. 235 31. FuzzyBlackBox download | SourceForge.net. https://sourceforge.net/projects/fuzzyblac kbox/. Accessed: 29 Nov 2021 32. FuzzyPLC download | SourceForge.net. https://sourceforge.net/projects/fuzzyplc/. Accessed 29 Nov 2021 33. Alonso, J.M., Magdalena, L.: Generating understandable and accurate fuzzy rule-based systems in a java environment. In: Fanelli, A.M., Pedrycz, W., Petrosino, A. (eds.) WILF 2011. LNCS (LNAI), vol. 6857, pp. 212–219. Springer, Heidelberg (2011). https://doi.org/10.1007/ 978-3-642-23713-3_27 34. JFCM. https://jfcm.megadix.it/. Accessed 29 Nov 2021 35. jFuzzyQt - C++ Fuzzy Logic Library | SourceForge.net. https://sourceforge.net/projects/jfu zzyqt/. Accessed 29 Nov 2021 36. libai | SourceForge.net. https://sourceforge.net/projects/libai/. Accessed 29 Nov 2021 37. libFuzzyEngine++ | SourceForge.net. https://sourceforge.net/projects/libfuzzyengine/. Accessed 29 Nov 2021 38. Package Database - Package fuzzy-logic-toolkit-oct324 (Fuzzy logic toolkit for Octave). https://pdb.finkproject.org/pdb/package.php/fuzzy-logic-toolkit-oct324. Accessed 29 Nov 2021 39. Nauck, D., Kruse, R.: NEFCLASS - a neuro-fuzzy approach for the classification of data. In: Proceedings of the ACM Symposium on Applied Computing, pp. 461–465 (1995) https:// doi.org/10.1145/315891.316068
Spatio-Temporal Crime Forecasting: Approaches, Datasets, and Comparative Study EL Gougi Badreddine(B) , Hassouni Larbi, Anoun Houda, and Ridouani Mohammed Higher School of Technology Hassan 2 University, Casablanca, Morocco [email protected]
Abstract. Crime forecasting is among the most challenging tasks for governments nowadays, especially when the objective is to evaluate crime patterns from a spatio-temporal perspective. In this work, we present a survey of crime forecasting approaches in space and time, with different crime analysis types, the data sets used, and a critical review of the literature. We review 11 articles and provide a classification of these studies based on the type of approach into three categories: Machine Learning Basic Methods, Machine Learning Ensemble Methods, and Deep Learning. We list and analyse 7 public crime datasets and discuss 13 methodologies that tackle crime prediction. The study illustrates the dominance of Random Forest compared to deep learning architectures. Keywords: Crime forecasting · Deep learning · Machine learning · Spatio-temporal analysis · Time series
1 Introduction
An action is considered a crime if declared as such by the relevant and applicable law. Crime is an evil, harmful, or wrongful act, punishable by some authority. It has a direct effect on the lifestyle, economic growth, and reputation of a society. Safety and crime rates are some of the major factors that affect important life decisions, like organizing holidays, taking trips at the right time, avoiding risky areas when moving to a new place, deciding where to invest, etc. For a crime to occur, three conditions must be met: (a) a prompted offender, (b) an appropriate victim, (c) absence of protection, as suggested by routine activity theory [1]. Similarly, rational choice theory [2] proposes that a potential criminal weighs the profit of a successful crime against the probability of being caught, then makes a rational choice. Consequently, crime occurrences are not randomly distributed but follow spatio-temporal and social patterns. The ultimate goal of crime forecasting/predictive policing is to use analytics and statistics to predict future crimes or deduce past wrongdoers. Admittedly, forecasting crimes needs to infer: (a) What crime type will occur? (b) When? (c) Where? (d) By whom? (e) Who are the victims? The reviewed studies focus on the crime type, occurrence location, and time. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 231–251, 2023. https://doi.org/10.1007/978-3-031-26384-2_21
The rest of this paper shows the contributions of our work, organized as follows: • Section 2: We review 11 crime forecasting articles that use Machine Learning (ML), and Deep Learning (DL) approaches. • Section 3: We present the public well-known crime records datasets. • Section 4: We list the crime analysis approaches. • Section 5: We overview the different used ML/DL approaches. • Section 6: We give conclusions, criticize the reviewed literature and manifest our perspectives.
2 Related Works
In the literature, there are plenty of studies that discuss crime forecasting from an artificial intelligence perspective. As Table 1 illustrates, we organize the previous works based on the following criteria:
• Deep learning: whether the study's methodology uses deep learning techniques and algorithms.
• Machine learning: whether the study opts for machine learning approaches.
• Data mining: whether data mining was considered in the forecasting task.
• Data set availability: whether the used dataset is publicly available or not.
• EDA (Exploratory Data Analysis): whether the authors analyze their data sets to get insights and characteristics.
Shiju et al. [3] used the concept of data mining focusing on daily crime factors. To find patterns in the web-scraped data, they opted for the Apriori algorithm, and Naïve Bayes was used to classify news articles into the most accurate crime types. To predict future crimes, a decision tree was built for each location from a labelled training set. Udo Schlegel's [4] Master's thesis was about predicting if/where a crime will be committed based on the time axis, applying DL techniques. The datasets were from 3 different cities (Chicago, San Francisco, and Los Angeles). Two architectures were proposed: the first trains an RNN (Recurrent Neural Network) on the latent space of an initial AE (Auto Encoder); the second trains a GAN (Generative Adversarial Network). The two models were given the heatmaps of a (crime type, city, week of the year) as input, and another heatmap image is expected as output.
Table 1. Reviewed studies on crime forecasting.

| Study | Year | Citations(a) | EDA | Open Data Set | ML | DL | Eval(b) | Eval Metrics(c) |
|---|---|---|---|---|---|---|---|---|
| Crime analysis and prediction using data mining [3] | 2014 | 109 | No | No | Yes | No | 90% | Acc |
| Towards Crime Forecasting Using Deep Learning [4] | 2018 | 1 | Yes | Yes | No | Yes | 96.8%, 94.2% | Acc |
| Improving crime count forecasts using Twitter and taxi data [5] | 2018 | 30 | Yes | Partial | Yes | No | – | – |
| Crime Prediction & Monitoring Framework Based on Spatial Analysis [6] | 2018 | 21 | Yes | Yes | Yes | No | – | – |
| Predicting crime using time and location data [7] | 2019 | 11 | Yes | Yes | Yes | No | 99.92 | Acc |
| Crime Prediction Using K-Nearest Neighboring Algorithm [8] | 2020 | 5 | Yes | Scrap | Yes | No | 99.51% | Acc |
| A systematic review on spatial crime forecasting [9] | 2020 | 22 | − | − | Yes | Yes | – | PAI |
| Predicting time and location of future crimes with recommendation methods [10] | 2020 | 5 | No | Yes | Yes | No | 82.7%, 72.8% | Acc |
| Safety App: Crime Prediction Using GIS [11] | 2020 | 4 | Yes | Partial | Yes | No | – | – |
| An Empirical Study of the Perception of Criminality through Analysis of Online Newspapers [12] | 2020 | 79 | Yes | No | Yes | No | 89.2%, 90.2%, 86.4%, 94.9% | Acc |
| South Africa Crime Visualization, Trends Analysis, and Prediction Using Machine Learning Linear Regression Technique [13] | 2021 | 0 | Yes | Yes | Yes | No | 84.7% | R2 |

(a) The number of citations was collected from the ResearchGate platform.
(b) Evaluation of the model; (–): undefined or unclear.
(c) Evaluation metrics: Acc: accuracy, R2: R-squared, PAI: Predictive Accuracy Index.
Lara V et al. [5] extend regular crime forecasting by including taxi flows, public venues, and social media data in New York City. Random Forest, Gradient Boosting Machines, and feed-forward ANNs were used with a rolling-window prediction approach to forecast two types of crimes (violence-related and property-related) weekly. Results show that using the new features significantly improves the accuracy. Hitesh K et al. [6] proposed an R-based GUI framework to visualize crime distribution over a location and predict the future crime type in that area. The KNN and Naïve Bayes classifiers were used on a training set covering 2 years, from the UK Police department. Jesia Q et al. [7] worked on crime prediction given (time, location), based on a dataset of 16 years of records from the Chicago Police department. They used ML and DM by applying Random Forest and decision trees. To improve the performance of the approach, they used ensemble methods (AdaBoost, Extra Trees, and Bagging). The results show that the Bagging technique has the highest accuracy. Akash et al. [8] opt for KNN to classify the type of newly occurred crimes in a location, from the history (no period was mentioned) of crimes in Indore city. The Extra Trees method was used to calculate feature importance. The proposed work, with K = 3, claims a 99% accuracy calculated via MAE and RMSE.
Ourania K et al. [9] present a critical review of 32 papers selected out of 786, gathered via the PRISMA reporting guidance from 4 databases (WoS, SD, IEEE, and ACM). They state that the most dominant approaches were Random Forest and the Multilayer Perceptron; on the other hand, DL approaches were the least used ones. The two most dominant model validation approaches were train-test split and cross-validation. Furthermore, accuracy and the accuracy index were mostly used to measure performance. Zhang Y et al. [10] predicted theft and assault crimes in San Francisco at a spatio-temporal level based on collaborative filtering recommendation systems using matrix factorization. With 50% man-hours, the model reaches 90% accuracy in predicting theft crimes and 79% for assault. Atharva D et al. [11] used the Random Forest algorithm in two models: classification, to get the crime type of a (time, location), and regression, to predict the safety score of the location. The dataset of the study was created from official crime reports in Mumbai city and social network data. Manuel Saldana [12], from web-scraped data and using a Naïve Bayes classifier, successfully generated a database of the location and nature of criminal events according to the set of news samples recovered through the proposed model, identifying the location of certain criminal events by analyzing the semantics of the sentences in the news corpus. Ibdun C et al. [13] want to help security agencies in South Africa to have good insights into crime trends. A linear regression model was trained on 27 crime types from the country's crime records data set, to predict the number of crimes in any province, given the population and its density.
3 Public Datasets
Public datasets of crime records are usually provided by governments or research contributors. The data sets of the reviewed studies were issued by governments, especially police departments. Table 2 lists, for each work, the public datasets used and their sampling period. Public datasets, and data sharing generally, encourage researchers to build upon existing work and collaborate rather than reinvent the wheel, which can boost new findings within the field and make results comparison much more efficient and clearer.

Table 2. Public datasets in the reviewed studies.

| Study | Datasets | From | To |
|---|---|---|---|
| Udo Schlegel [4] | Chicago(d) | 2001 | 2018 |
| | San Francisco(e) | 2003 | 05/2018 |
| | Los Angeles(f) | 2010 | 2018 |
| Lara V et al. [5] | New York(g) | 2006 | 2015 |
| Hitesh K et al. [6] | UK police crime data(h) | 2015 | 2017 |
| Jesia Q et al. [7] | Chicago(d) | 01/2001 | 01/2017 |
| Zhang Y et al. [10] | San Francisco(e) | 2003 | 2020 |
| Atharva D et al. [11] | Mumbai(i) | – | – |
| Ibdun C et al. [13] | South Africa(j) | 2004 | 2015 |

(d) https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2
(e) https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-Historical-2003/tmnf-yvry
(f) https://data.lacity.org/Public-Safety/Crime-Data-from-2010-to-2019/63jg-8b9z
(g) https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Map-Historic-/57mv-nv28
(h) https://data.police.uk/data/
(i) https://data.gov.in/
(j) https://www.kaggle.com/slwessels/crime-statistics-for-south-africa
3.1 Chicago This dataset reflects reported incidents of crime that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department’s CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified (Fig. 2 and Fig. 3).
Fig. 1. Chicago: Yearly incidents with percent of arrests
From the above figures, the most dominant crimes are theft and battery, and we can clearly see a remarkable decrease in the density of incidents after 2010, in line with the arrest percentage.
Fig. 2. Chicago: Most frequent crimes
Fig. 3. Chicago: Crime density before and after 2010
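A hedged sketch of how such a yearly view can be derived with pandas is shown below. It assumes a CSV export of the portal dataset that keeps the Year, Primary Type and Arrest columns; the file name is illustrative and the snippet is not the authors' actual analysis code.

```python
import pandas as pd

crimes = pd.read_csv("chicago_crimes.csv")   # export of the portal dataset (illustrative file name)

# Normalize the Arrest flag to booleans regardless of how the export encodes it.
crimes["Arrest"] = crimes["Arrest"].astype(str).str.lower().eq("true")

# Yearly incident counts with the percentage of incidents that led to an arrest (cf. Fig. 1).
yearly = crimes.groupby("Year").agg(
    incidents=("Arrest", "size"),
    arrest_pct=("Arrest", "mean"))
yearly["arrest_pct"] *= 100
print(yearly)

# Most frequent crime types (cf. Fig. 2).
print(crimes["Primary Type"].value_counts().head(10))
```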
3.2 San Francisco This dataset is one of the most used datasets on DataSF. The dataset compiles data from the department’s Crime Data Warehouse (CDW) to provide information on incident reports filed by the SFPD in CDW, or filed by the public with the SFPD (Fig. 5).
Fig. 4. San Francisco: Most frequent crimes
Theft and offenses are the leading crime types in San Francisco; however, we can’t see any interesting correlations. 3.3 Los Angeles This dataset reflects incidents of crime in the City of Los Angeles from 2010−2019. This data is transcribed from original crime reports that are typed on paper and therefore there may be some inaccuracies within the data. Some location fields with missing data are noted as (0°, 0°). Address fields are only provided to the nearest hundred block in order to maintain privacy (Fig. 6 and Fig. 8).
Fig. 5. San Francisco: Features correlation heatmap
Fig. 6. Los Angeles: Number of incidents based on premise description
We observe clearly that battery and burglary are the crime types that occur most. The locations with a huge number of incidents are streets and family dwellings. The top three important features in this dataset are the weapon used, the premise code, and the victim's sex.
3.4 New York
This dataset includes all valid felony, misdemeanor, and violation crimes reported to the New York City Police Department (NYPD) from 2006 to the end of last year (2015). Offenses occurring at intersections are represented at the X coordinate and Y coordinate of the intersection. Crimes occurring anywhere other than an intersection are geo-located to the middle of the block. For additional details, please see the attached data dictionary in the 'About' section (Fig. 9 and Fig. 10).
Fig. 7. Los Angeles: Features importance
Fig. 8. Los Angeles: Most frequent crimes
Fig. 9. New York: Most frequent crimes
Fig. 10. New York: Attempted vs completed crimes
We can clearly observe that the number of completed crimes remains relatively stable over the 10 years, and that the leading crime types are larceny, harassment and assault.
3.5 UK police crime data See (Fig. 11 and Fig. 12)
Fig. 11. UK: Crime types trends
Fig. 12. UK: Crime density by type
From this dataset, which is provided by the police forces of the United Kingdom, the major crime types are anti-social behaviour, violence and sexual offences. Crime rates are denser in the north-middle region. Anti-social behaviour is spread almost evenly over the whole country; burglary and vehicle crimes, on the other hand, are nearly absent in the west. 3.6 Mumbai See (Fig. 13 and Fig. 14)
Fig. 13. Mumbai: Crime types by cities
Fig. 14. Mumbai: Crime counts by time intervals
Theft is the dominant crime type while the crime peak time is the evening.
3.7 South Africa See (Fig. 15 and Fig. 16)
Fig. 15. South Africa: Crime types cloud
Fig. 16. South Africa: Correlation heatmap
From the above correlation heatmap, there is clearly a strong positive correlation between the total amount of crimes committed and the population and density of a province. There is also a positive correlation between the total number of police stations in a province and the total amount of crimes. This does not mean that there is higher crime because there are more police stations; rather, there are perhaps more stations to combat the higher amount of crime. It can also be observed that there is a negative correlation between the area size of a province and the number of crimes committed.
4 Crime Analysis Types Crime analysis is a systematic process whose aim is to provide timely information about patterns and trends in crime. Law enforcers use the resulting information in many areas, such as solving a crime, apprehending offenders, and planning police resources. Figure 17 below shows the different types of analysis: types with low levels of aggregation focus on individual cases and use qualitative data and analysis techniques, while those with high levels of aggregation focus on a limited scope of larger amounts of data and information.
4.1 Tactical Crime Analysis Tactical crime analysis deals with immediate criminal offenses (immediate: hours, days, or weeks from the time of the crime). It promotes a quick response to recent offenses such as burglaries and robberies. Tactical crime analysis provides information to assist operational personnel in the identification of crime trends and the arrest of criminal offenders. 4.2 Strategic Crime Analysis Strategic crime analysis is primarily concerned with operational strategies and seeks solutions to ongoing problems (weeks, months, quarters, or years). The purpose of strategic crime analysis is to deliver police services more effectively and efficiently by matching service delivery to the demand for service. 4.3 Administrative Crime Analysis This type focuses on the provision of management information. Data is summarized and the trends are more generalized than in tactical or strategic analysis. Economic and geographic information is also provided. Administrative analysis is not focused on solving any current or recurrent problem; rather, it is important for long-term planning. It also provides the information that is communicated to the public regarding crime. These reports must therefore be in a format that is easy to understand and comprehensive. 4.4 Criminal Investigation Analysis This type of analysis utilizes the least aggregated and most qualitative data. The data consist of information about informal networks of criminals and their non-criminal acquaintances and relatives, as well as where individuals live, work, and "play." The focus here is on the specifics of criminals, the nature of their crimes, their relationships, and their lives in general.
Fig. 17. Crime analysis types (Administrative, Strategic, Tactical, Investigation)
5 Crime Forecasting Approaches As Fig. 18 shows, the reviewed studies used different approaches with heterogeneous architectures to tackle the crime forecasting problem, based on the spatio-temporal metrics and features of each dataset.
Fig. 18. Crime forecasting approaches in the reviewed studies.
5.1 Machine Learning Basic Approaches
Naïve Bayes. Bayes' theorem finds the probability of an event occurring given the probability of another event that has already occurred. Bayes' theorem is stated mathematically as the following equation, where A and B are events and P(B) ≠ 0:

P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}    (1)

The Naive Bayes classifier is a probabilistic classifier that, when given an input, returns a probability distribution over the set of all classes rather than a single output. It can be used with location and date features to classify the crime type that will occur [6].

P(y|x_1, \ldots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i|y), \qquad \hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i|y)    (2)
These classifiers are called "naïve" algorithms because they assume that the predictor variables are independent of each other [12]; hence, the presence (or absence) of a feature of a class is assumed to be unrelated to the presence (or absence) of any other feature. Naive Bayes classifiers can handle an arbitrary number of independent variables, whether continuous or discrete. Decision Trees. The main goal is to create a model that predicts the value of an output variable by learning decision rules inferred from the data features. A decision tree can be visualized, so it is simple to interpret and to understand. It is suitable for large datasets and thus helps in making better decisions about variables [3, 7].
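As a minimal sketch of how such a classifier could be applied to this task (the feature encoding, column names and values below are assumptions made for illustration, not taken from the reviewed studies), a categorical Naive Bayes model from scikit-learn can be trained on encoded location and time features to predict the crime type, following Eq. (2):

```python
# Hypothetical example: predicting a crime type from encoded location/time features.
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Assumed toy records: (district, weekday, hour slot) -> crime type
data = pd.DataFrame({
    "district": ["D1", "D1", "D2", "D3", "D2", "D3"],
    "weekday":  ["Mon", "Sat", "Sat", "Sun", "Mon", "Fri"],
    "hour_bin": ["night", "evening", "evening", "day", "night", "evening"],
    "crime":    ["THEFT", "BATTERY", "BATTERY", "THEFT", "THEFT", "BURGLARY"],
})

encoder = OrdinalEncoder()
X = encoder.fit_transform(data[["district", "weekday", "hour_bin"]])
y = data["crime"]

model = CategoricalNB()   # class prior times per-feature likelihoods, as in Eq. (2)
model.fit(X, y)

new_obs = encoder.transform([["D2", "Sat", "evening"]])
print(model.predict(new_obs))        # most probable crime type
print(model.predict_proba(new_obs))  # full probability distribution over classes
```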
Apriori Algorithm. The algorithm is called Apriori because it uses prior knowledge of frequent itemset properties. An itemset consists of two or more items, and it is called frequent if it satisfies a minimum threshold value for support and confidence. Apriori can be used to determine association rules which highlight general trends in the database [3], so when a new set of features occurs, we can easily identify whether a crime will take place, based on the crime pattern of that location. The algorithm is based on three measures: (a) support, (b) confidence, (c) lift.

Support(A → B) = freq(A, B) / Total    (3)
Confidence(A → B) = freq(A, B) / freq(A)    (4)
Lift(A → B) = Confidence(A → B) / Support(B)    (5)

where freq(·) counts the transactions containing an itemset and Total is the number of transactions.
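A minimal sketch of these three measures computed directly from a list of transactions (the incident attributes used here are invented for illustration and are not taken from the reviewed datasets):

```python
# Hypothetical transactions: each set holds the attributes observed for one incident.
transactions = [
    {"street", "night", "theft"},
    {"street", "night", "battery"},
    {"street", "evening", "theft"},
    {"mall", "day", "theft"},
    {"street", "night", "theft"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset (Eq. 3)."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of (A and B) divided by support of A (Eq. 4)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent (Eq. 5)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

rule_a, rule_b = {"street", "night"}, {"theft"}
print(support(rule_a | rule_b, transactions))    # 0.4
print(confidence(rule_a, rule_b, transactions))  # about 0.67
print(lift(rule_a, rule_b, transactions))        # about 0.83
```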
Linear Regression. It is a supervised ML linear model, i.e., a model that assumes a linear relationship between the input variables (x) and the single output variable (y). More specifically, y can be calculated from a linear combination of the input variables (x):

\hat{y} = f(x) = b_0 + b_1 x_1 + \ldots + b_n x_n    (10)

Linear regression computes the estimators of the regression coefficients, or predicted weights, denoted by b_0, b_1, \ldots, b_r. It was considered in [13] for building a crime predictive model using two continuous variables: one variable, denoted by X, is referred to as the predictor (population, density, and so forth); the other variable, denoted by y, is regarded as the target, i.e., the crime variable.

K-Nearest Neighbors. A supervised ML algorithm for classification and regression. It assumes that similar things exist nearby; simply put, similar things are near each other. In our context, the areas near a previous crime location are more vulnerable to crime occurrence, so location and time are factors to be considered [6]. Classification is computed from a simple majority vote of the nearest neighbors of each point. Regression is inferred using a weighted average of the k nearest neighbors, weighted by the inverse of their distance. KNN is non-parametric, which implies that it does not make any assumption on the underlying data distribution; in other words, the model structure is determined by the data. It is a distance-based classifier, meaning that it implicitly assumes that the smaller the distance between two points, the more similar they are. Three different distance metrics are listed below; the choice is problem-dependent, and in crime forecasting the Euclidean distance was the most used.

Euclidean distance: d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}    (6)
Manhattan distance: d(x, y) = \sum_{i=1}^{n} |x_i - y_i|    (7)
Minkowski distance: d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^c \right)^{1/c}    (8)
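A minimal sketch of a KNN classifier on spatio-temporal features (the coordinates, hours and labels below are invented for illustration); scikit-learn's `metric` and `p` parameters correspond to the distance choices in Eqs. (6)-(8):

```python
# Hypothetical example: classifying the likely crime type from (latitude, longitude, hour).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([
    [41.88, -87.63, 23],   # made-up past incidents: lat, lon, hour of day
    [41.89, -87.62, 22],
    [41.75, -87.60, 14],
    [41.76, -87.61, 13],
])
y = np.array(["THEFT", "THEFT", "BATTERY", "BATTERY"])

# 'minkowski' with p=2 is the Euclidean distance of Eq. (6); p=1 gives Manhattan (Eq. 7).
# In practice the features would usually be scaled first, since their ranges differ.
knn = KNeighborsClassifier(n_neighbors=3, metric="minkowski", p=2)
knn.fit(X, y)

print(knn.predict([[41.87, -87.63, 22]]))  # nearest past incidents vote on the class
```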
5.2 Machine Learning Ensemble Methods Random Forest. (RF) Random Forest, like its name implies, consists of a large number of individual decision trees that operate as an ensemble. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the mean or average prediction of the individual trees is returned (Fig. 19).
Fig. 19. Visualization of RF (Source: tibco.com, What is a random forest)
In the study by Atharva D et al. [11], RF was used both for classification (to forecast a crime type within an area) and for regression (to predict the safety score of that area). RF prevents overfitting by building trees of different sizes from subsets of the data and combining the results; this diversity also leads to high accuracy.

Gradient Boosting Machines. A technique used in classification and regression tasks. It gives a prediction model as an ensemble of weak prediction models, which are typically decision trees. Lara V et al. [5] use regression trees as base models; these are fitted to the residuals via the negative gradient of the loss function of the current ensemble. Boosting is a method of converting weak learners into strong learners, and it usually performs better than Random Forests [14, 15]. The gradient boosting method assumes a real-valued y and seeks an approximation \hat{F}(x) in the form of a weighted sum of functions h_i(x) from some class H, called weak (or base) learners:

\hat{F}(x) = \sum_{i=1}^{M} \gamma_i h_i(x) + \text{const}    (9)
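A minimal sketch of the two ensemble uses described above, with invented features and targets (grid-cell coordinates, a time fraction, crime types and weekly counts are all assumptions for illustration, not the data of [5] or [11]):

```python
# Hypothetical example: Random Forest for crime-type classification and
# Gradient Boosting for a numeric target such as a weekly incident count per cell.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 3))                      # made-up features: x-cell, y-cell, week fraction
crime_type = rng.choice(["THEFT", "BATTERY", "BURGLARY"], size=200)
weekly_count = rng.poisson(lam=4, size=200)   # made-up incident counts

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, crime_type)                        # each tree votes; the majority class is returned

reg = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
reg.fit(X, weekly_count)                      # trees are fitted sequentially on residuals, as in Eq. (9)

print(clf.predict(X[:2]))
print(reg.predict(X[:2]))
```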
Recommender Systems. These aim to predict the preference a user would give to an item, by filtering the most important information based on the data provided by the user and other factors that take care of the user's preferences and interests [13]. By modeling crime in this manner, those techniques can be applied effectively to crime prediction. Zhang Y et al. [10] propose to model time as the item and location as the user. The most used approaches are (a) user-based collaborative filtering and (b) item-based collaborative filtering, combined with similarity-finding algorithms like KNN.
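A minimal sketch of this idea (locations play the role of users and time slots the role of items; the small count matrix and the choice of cosine similarity are assumptions for illustration, not the exact method of [10]): user-based collaborative filtering scores a time slot for a location from that slot's intensity at the most similar locations.

```python
# Hypothetical location x time-slot matrix of past crime counts.
import numpy as np

counts = np.array([
    # slots: t0  t1  t2  t3
    [5, 0, 2, 1],   # location A
    [4, 1, 2, 0],   # location B
    [0, 3, 0, 4],   # location C
], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def predict(loc, slot, counts, k=2):
    """User-based CF: weight other locations' counts for this slot by their similarity."""
    others = [o for o in range(len(counts)) if o != loc]
    sims = np.array([cosine(counts[loc], counts[o]) for o in others])
    top = np.argsort(sims)[-k:]                       # the k most similar locations
    num = sum(sims[i] * counts[others[i], slot] for i in top)
    den = sum(abs(sims[i]) for i in top) + 1e-12
    return num / den

print(round(predict(loc=0, slot=3, counts=counts), 2))  # expected intensity of slot t3 at location A
```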
5.3 Deep Learning Approaches Feed-forward Neural Networks. Also called (MLP) Multi-Layer Perceptron, the goal of a feedforward network is to approximate some function f ∗. e.g., for a classifier, y = f ∗ (x) maps an input x to a category y. A feedforward network defines a mapping y = f (x; θ ) and learns the value of the parameters θ that result in the best function approximation (Fig. 20).
Fig. 20. Feed-forward ANN illustration (Source: wikipedia.org)
These models are called feedforward because information flows through the function being evaluated from x, through the intermediate computations used to define f , and finally to the output y. There are no feedback connections in which outputs of the model are fed back into itself. Recurrent Neural Networks. A recurrent neural network (RNN) is a special type of artificial neural network adapted to work for time series data or data that involve sequences. Ordinary feedforward neural networks are only meant for data points, which are independent of each other. However, if we have data in a sequence such that one data point depends upon the previous data point, we need to modify the neural network to incorporate the dependencies between these data points. RNNs have the concept of ‘memory’ that helps them store the states or information of previous inputs to generate the next output of the sequence.
A simple RNN has a feedback loop, as shown in Fig. 21. The feedback loop shown on the left side of the figure can be unrolled for K time steps to produce the second network shown on the right of Fig. 21.
Fig. 21. Simple RNN architecture. (Source: ibm.com/cloud/learn/recurrent-neural-networks)
At every time step, we can unfold the network for K time steps to get the output at time step K + 1. The unfolded network is very similar to the feedforward neural network. The hidden layer (green circle) in the unfolded network shows an operation taking place. So, for example, with an activation function f:

h_{t+1} = f(X_t, h_t, W_x, W_h, b_h) = f(W_x X_t + W_h h_t + b_h)    (11)

The output y at time t is computed as:

y_t = f(h_t, W_y) = f(W_y \cdot h_t + b_y)    (12)

Where:
• X_t ∈ R is the input at time step t.
• y_t ∈ R is the output of the network at time step t. We can produce multiple outputs.
• h_t ∈ R^m is a vector that stores the values of the hidden units/states at time t, also called the current context; m is the number of hidden units.
• W_x ∈ R^m are the weights associated with the inputs in the recurrent layer.
• W_h ∈ R^{m×m} are the weights associated with the hidden units in the recurrent layer.
• W_y ∈ R^m are the weights associated with the hidden-to-output units.
• b_h ∈ R^m is the bias associated with the recurrent layer.
• b_y is the bias associated with the feedforward layer.
The RNN has multiple types: (a) one to one, (b) one to many, (c) many to one, (d) many to many. The common choices of activation function are cited below:

Sigmoid function: \sigma(x) = \frac{1}{1 + e^{-x}}    (13)
Tanh function: \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}    (14)
ReLU function: \max(0, x)    (15)
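As a concrete reading of Eqs. (11) and (12), here is a minimal NumPy sketch of one unrolled step of such a network (all shapes, weights and the input sequence are illustrative assumptions, not taken from [4]):

```python
# Hypothetical single RNN step implementing Eqs. (11)-(12) with a tanh activation.
import numpy as np

m, d = 4, 3                      # m hidden units, d input features
rng = np.random.default_rng(0)

W_x = rng.normal(size=(m, d))    # input-to-hidden weights
W_h = rng.normal(size=(m, m))    # hidden-to-hidden weights
W_y = rng.normal(size=(1, m))    # hidden-to-output weights
b_h = np.zeros(m)
b_y = np.zeros(1)

def rnn_step(x_t, h_t):
    h_next = np.tanh(W_x @ x_t + W_h @ h_t + b_h)   # Eq. (11)
    y_t = W_y @ h_next + b_y                        # Eq. (12), linear output
    return h_next, y_t

h = np.zeros(m)
for x_t in rng.normal(size=(5, d)):   # a made-up length-5 input sequence
    h, y = rnn_step(x_t, h)
print(y)                              # forecast produced after the last step
```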
Schlegel U. [4] trains an RNN on the sequence of the previous weeks’ crimes to forecast the upcoming one, with Adam optimizer. Long Short-Term Memory. Abbreviated as LSTM, is a popular RNN architecture, which was introduced by Sepp Hochreiter and Juergen Schmidhuber [16] as a solution to the vanishing gradient problem. LSTM uses three gates called input, output, and forget gate (Fig. 22).
Fig. 22. LSTM unit. (Source: wikipedia.org/wiki/Recurrent_neural_network)
Gated Recurrent Units. GRUs also work to address the short-term memory problem of RNN models. Instead of using a "cell state" to regulate information, they use hidden states, and instead of three gates they have two, a reset gate and an update gate. Similar to the gates within LSTMs, the reset and update gates control how much and which information to retain (Fig. 23). AutoEncoders. Autoencoders (AE) are an unsupervised learning technique in which we leverage neural networks for the task of representation learning. The most traditional application of an AE model is dimensionality reduction. An AE has two main parts: an encoder that maps the input into the code, and a decoder that maps the code to a reconstruction of the input (Fig. 24). As Fig. 24 illustrates, the latent space is a bottleneck between the input and the output of an AE and is a form of data compression. Schlegel U. [4], using heatmap images of crime types within a specific (location, week) pair as input, trains an AE; then an RNN is trained with the latent space of the previous AE as input, to forecast the upcoming week's crime heatmaps. He used the RNN with two variants of AEs, a standard AE and a variational autoencoder (VAE); the results showed that the VAE model converges more slowly than the AE, but performs better.
Fig. 23. GRU unit. (Source: wikipedia.org/wiki/Recurrent_neural_network)
Fig. 24. Autoencoder architecture. (Source: mathworks.com/discovery/autoencoder.html)
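To make the encoder/decoder structure concrete, here is a minimal PyTorch sketch of an autoencoder whose latent space could feed a downstream sequence model, in the spirit of [4] (the layer sizes, heatmap resolution and training settings are arbitrary assumptions, not those of the cited work):

```python
# Hypothetical autoencoder: the encoder compresses a flattened crime heatmap, the decoder reconstructs it.
import torch
from torch import nn

class AutoEncoder(nn.Module):
    def __init__(self, n_inputs=64 * 64, n_latent=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_inputs, 256), nn.ReLU(),
            nn.Linear(256, n_latent),            # latent space (the bottleneck)
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 256), nn.ReLU(),
            nn.Linear(256, n_inputs), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

heatmaps = torch.rand(8, 64 * 64)          # made-up batch of flattened weekly heatmaps
reconstruction, latent = model(heatmaps)
loss = loss_fn(reconstruction, heatmaps)   # reconstruction error drives the training
loss.backward()
optimizer.step()
print(latent.shape)                        # torch.Size([8, 32]): could feed an RNN
```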
Generative Adversarial Networks. GANs for short are a clever way of training a generative model by framing the problem as a supervised learning problem with two sub-models: • the generator model that we train to generate new examples. • the discriminator model that tries to classify examples as either real (from the domain) or fake (generated). Both models are trained together in an adversarial way (a zero-sum game) until the generator model is generating plausible examples, so the discriminator model is fooled (Fig. 25). To forecast crimes on a specific date, Schlegel U. [4] used an altered variant, named Auxiliary Classifier GAN, or AC-GAN for short, is an extension of the conditional1 GAN that changes the discriminator to predict the class label of a given image rather than receive it as input. It has the effect of stabilizing the training process and allowing the generation of large-high-quality images whilst learning a representation in the latent space that is independent of the class label. The training was done, using the heatmaps of the weeks and the corresponding date. 1 A type of GAN that involves the conditional generation of images by a generator model. The
added conditions can be extra information y like labels of a class or other data and can be directly attached to the generator and discriminator as further input layers.
Fig. 25. Differences between CGAN and AC-GAN
For the forecasting phase, the generator gets the date to forecast and a random noise vector as input. The discriminator is used to calculate a confidence for the prediction of the generator. On the one hand, the likelihood of the generated heatmap being a real heatmap or not is one way to decide the plausibility of the forecast. On the other hand, the probabilistic class output of the discriminator predicts, in this case, how likely the generated heatmap corresponds to the date wanted by the user. The tanh is not a suitable activation function with this method, whereas ReLU produces the best results with a reasonable training time.
6 Conclusions and Perspectives In this paper, we discuss "Crime Forecasting" with a focus on the spatio-temporal axis, i.e., approaches that predict crime in both time and space. We conducted a literature review of 11 works to understand the concepts, the methodologies, and the datasets used to predict crime. We answered several questions that deal with crime forecasting and its role in crime analysis and prevention, especially the major role of spatio-temporal crime prediction in seeking the safety and sustainable development of a society. We list the different types of crime analysis, the datasets available to conduct a study in the field, the state-of-the-art methodologies, their architectures, and their performance metrics. The predominant methods are the classic ones, Random Forest (4 studies) and Naïve Bayes (3 studies), in contrast to deep learning approaches (2 studies), with hotspot forecasting as the main task. We note that the most studied crime type is theft/burglary. To summarize, the field of study is still very fertile, with huge interest from various disciplines (sociology, computer science, criminology, etc.). However, we reported some ambiguities, like the absence of the sequence length (sampling size), spatial unit,
feature-engineering process, and methodologies that are not well described. Those factors make an experiment difficult to replicate and results difficult to compare. A future opportunity could be to include other factors, like the Gross Domestic Product (GDP), weather, or the census of an area. Furthermore, using CNN architectures with crime heatmap images as input might lead to good results.
References 1. Cohen, L.E., Felson, M.: Social change and crime rate trends: a routine activity approach. Am. Sociol. Rev. 44(4), 588 (1979). https://doi.org/10.2307/2094589 2. Ronald, C.V.C., Derek B.: The Reasoning Criminal. Routledge (2014). https://doi.org/10. 4324/9781315134482 3. Sathyadevan, S., Devan, M.S., Surya Gangadharan, S.: Crime analysis and prediction using data mining. In: 1st International Conference on Networks and Soft Computing, ICNSC 2014 - Proceeding, pp. 406–412 (2014). https://doi.org/10.1109/CNSC.2014.6906719 4. Schlegel, U.: Towards Crime Forecasting Using Deep Learning (2018). https://www.resear chgate.net/publication/330777194 5. Vomfell, L., Härdle, W.K., Lessmann, S.: Improving crime count forecasts using Twitter and taxi data. Decis. Support Syst. 113, 73–85 (2018). https://doi.org/10.1016/j.dss.2018.07.003 6. Toppireddy, H.K.R., Saini, B., Mahajan, G.: Crime prediction & monitoring framework based on spatial analysis. Procedia Comput. Sci. 132, 696–705 (2018). https://doi.org/10.1016/j. procs.2018.05.075 7. Yuki, J.Q., Mahfil Quader Sakib, M., Zamal, Z., Habibullah, K.M., Das, A.K.: Predicting crime using time and location data. In: ACM International Conference Proceeding Series, pp. 124–128 (2019). https://doi.org/10.1145/3348445.3348483 8. Kumar, A., Verma, A., Shinde, G., Sukhdeve, Y., Lal, N.: Crime Prediction Using K-Nearest Neighboring Algorithm (2020). https://doi.org/10.1109/ic-ETITE47903.2020.155 9. Kounadi, O., Ristea, A., Araujo, A., Leitner, M.: A systematic review on spatial crime forecasting. Crime Sci. 9(1), 1–22 (2020). https://doi.org/10.1186/s40163-020-00116-7 10. Zhang, Y., Siriaraya, P., Kawai, Y., Jatowt, A.: Predicting time and location of future crimes with recommendation methods. Knowl.-Based Syst. 210, 106503 (2020). https://doi.org/10. 1016/j.knosys.2020.106503 11. Deshmukh, A., Banka, S., Dcruz, S.B., Shaikh, S., Tripathy, A.K.: Safety app: crime prediction using GIS. In: 2020 3rd International Conference on Communication Systems, Computing and IT Applications, CSCITA 2020 - Proceedings, pp. 120–124 (2020). https://doi.org/10. 1109/CSCITA47329.2020.9137772 12. Saldaña, M.: An empirical study of the perception of criminality through analysis of newspapers online. J. Inf. Syst. Eng. Manag. 5(4), em0126 (2020). https://doi.org/10.29333/jisem/ 8492 13. Obagbuwa, I.C., Abidoye, A.P.: South Africa Crime visualization, trends analysis, and prediction using machine learning linear regression technique. Appl. Comput. Intell. Soft Comput. 1–14 (2021). https://doi.org/10.1155/2021/5537902 14. Piryonesi, S.M., El-Diraby, T.E.: Data analytics in asset management: cost-effective prediction of the pavement condition index. J. Infrastruct. Syst. 26(1), 04019036 (2020). https://doi.org/ 10.1061/(ASCE)IS.1943-555X.0000512 15. Madeh Piryonesi, S., El-Diraby, T.E.: Using machine learning to examine impact of type of performance indicator on flexible pavement deterioration modeling. J. Infrastruct. Syst. 27(2), 04021005 (2021). https://doi.org/10.1061/(ASCE)IS.1943-555X.0000602 16. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
Data Migration from Relational to NoSQL Database: Review and Comparative Study
Chaimae Saadouni1(B), Karim El Bouchti1, Oussama Mohamed Reda2, and Soumia Ziti1
1 Department of Computer Science, Intelligent Processing Systems and Security Team, Mohammed V University, Rabat, Morocco
[email protected]
2 Algorithms, Networks, Intelligent Systems and Software Engineering Research Team, Faculty of Sciences, Mohammed V University, Rabat, Morocco
Abstract. New requirements for the storage, analysis, and visualization of Big Data, which includes structured, semi-structured, and unstructured data, have caused developers in the past decade to begin preferring Big Data databases, such as NoSQL, which come to overcome the limits of relational databases by offering unlimited scalability, high performance, flexible data modeling, data distribution, and continuous availability. Due to the benefits ensured by this new technology, considerable attention has quite recently been paid to data migration from relational databases to NoSQL databases. In this context, the purpose of this paper is to provide a comparative study of some recent approaches proposed by various researchers to migrate data from relational to NoSQL databases, as well as to discuss and analyze the results of those proposed techniques. This research allowed us to gain a basic understanding of the contributions and approaches that have already been made, as well as the benefits and limits of each proposal.
Keywords: Data Migration · Document-oriented Database · Relational Database Management Systems · NoSQL Database · MongoDB
1 Introduction
The growing computerization of all industries (E-commerce, E-administration, E-government, multimedia, ...) has resulted in an exponential growth in data quantities, which are today measured in zettabytes to petabytes. Similarly, the amount of data generated by businesses has certainly increased, with data coming from a variety of sources (transactions, behaviors, social networks, geolocation, ...) [8]. Big Data, which only appeared in the Information Technology (IT) world a few years ago, has already become the number one business innovation of this
decade [8]. The management of these huge data quantities has become a critical challenge that Relational Database Management Systems (RDBMS) [9] and the usual processing tools are no longer able to handle, due to their ACID properties. They cannot process the amounts of data that flow in, keep up with the speed at which the data streams circulate, or ingest the different data formats that exist, notably structured, unstructured, and semi-structured data. Many efforts are being devoted to coming up with utility support to overcome these identified drawbacks. It is precisely in this context that researchers have launched the new database (DB) model known as Not Only SQL (NoSQL) databases [10]. The latter allows the management of large volumes of heterogeneous data on a distributed set of storage servers and offers new storage solutions in large-scale environments. It also provides various ways of storage, management, and implementation, and it addresses the scalability problems [11]. So, in order to meet the needs for efficient storage and access of large amounts of data, many organizations, such as Google, Facebook, Twitter and Amazon, have joined this new technology. In this regard, many users of the classic so-called "SQL" RDBMS [12] want to switch to these new "NoSQL" solutions to anticipate the explosion of their data in the near future; nevertheless, they certainly do not want to start with empty databases, but seek how to recover the data hosted in their relational databases. Despite the benefits mentioned above and the urgent need of companies to migrate to NoSQL databases, their applications are still based on RDBMS [13], since there are many challenges associated with the migration process. Contrary to relational databases, NoSQL databases require a different approach to designing an efficient schema: in a document DB like MongoDB, for example, information may be stored as an embedded document or as a referenced document, and there are certain rules of thumb for schema design in this type of database. Therefore, during the migration process companies face many challenges, such as the volume of data to be migrated, how to map existing data of the relational database in its tabular form into the NoSQL structure, and how to deal with relational database properties such as join operations and relationships between tables, knowing that NoSQL does not support those concepts. As a result, schema conversion is fundamentally essential, as is the importing of data from the relational DB to the NoSQL DB. Furthermore, it is very important to ensure high reading efficiency after converting the schema. In order to tackle this problem, several publications have appeared in recent years. For instance, A. Mahmood in [2] presented an algorithm that maps the data of relational tables into collections in the output NoSQL DB. The algorithm takes into account the relationships between the relational DB tables and gives the user the option of mapping these relationships into the output DB using one of three modes: Embedded, Linking, and Auto. Another solution, described in [3], employs a retro-engineering study to accomplish the migration from the relational DB to MongoDB, by developing a data model which describes how
data is abstractly represented in the original dataset, taking into account every table or class property, such as the class name, relationships, cardinalities, primary and foreign keys, and so on. Lastly, the most interesting approach has been proposed in [6]: it provides a unified intermediate model (mid-model) between relational DBs and NoSQL DBs. The basic element of this mid-model is the object, which refers to an entity in the relational database, having a number of properties and relationships with other objects. It includes a strategy library containing the strategies that specify how the data model for a specific NoSQL DB is generated. In addition, the model also provides two concepts, namely Data Features and Query Features, which are defined to maintain the integrity of the data. In this context, the present paper seeks to analyze the recent approaches [1, 7] which have been proposed for data migration from relational databases to NoSQL databases, and to elaborate a comparative study that highlights the main achievements and contributions, determines the advantages and limits of each one, and covers the various axes of improvement and the points that remain untreated to this day. This paper is organized into five sections. The first one presents the literature review on fundamental approaches and concepts of RDBMS and NoSQL databases. The second section outlines the synthesis of the various contributions presented by researchers, and the fourth one presents a comparative analysis between them. Section five describes results and discussions. Finally, this paper ends with a conclusion.
2 Literature Review
In this section, we will give a brief overview of the basic concepts of NoSQL databases and other aspects that are relevant to our work.
2.1 NoSQL Databases
The problems of the main data are part of a complex context, at the crossroads of 2 major concerns: the adoption of new mass storage solutions, high-speed information collection, preferably in real time. In order to overcome these concerns, NoSQL databases offer novel storage solutions in large-scale environments, in substitution of the different traditional database management systems, which a huge part of them is relational. NoSQL (originally referring to “non-SQL” or “non-relational,” also known as “Not Only SQL” [2]) has emerged remarkably in recent years as a result of the growth of the Internet to overcome the downsides of RDBMS, owing to their less constrained structure, scalable schema design, and faster access compared to relational databases [14]. As a result, many organizations, particularly popular Internet enterprises like Google, Facebook, Twitter, LinkedIn, Amazon, and Netflix, are already employing it for their crucial business applications since it can handle large amounts of data. Indeed, by their flexible architectures they frequently free themselves from schematics and
allow the data storage in a flexible manner. A basic classification of NoSQL databases [15], by data model, is explained in the following:
• Key-value stores: this is the simplest form of NoSQL, where a value of any data type can be stored along with a unique key pointing to that value. The data in this type of database is stored in a hash table; the key is a unique identifier, while the value represents the data, organized in the manner of a dictionary. The Voldemort, Riak, Redis, Scalaris and Tokyo Cabinet projects are some examples of key-value NoSQL databases.
• Column stores: the data is stored in rows, where each row has a unique identifier called the key and one or more columns. The columns are themselves key-value pairs, and the column names need not be predefined, so the structure is not fixed. Unlike in a relational database, where columns are static and present for each row, the columns of column-oriented databases are dynamic and only present when needed.
• Graph stores: graph databases are mainly composed of nodes, relationships between nodes, and their properties. Graph models are more easily expandable over many servers than SQL databases. A graph database's key advantage is its ability to handle relationships better than the relational model.
• Document-oriented stores: the document representation is very well suited to the web world. This is an extension of the key-value concept that represents the value in the form of a JSON- or XML-type document. These databases employ the document model, in which data is kept as a collection of documents; each document generally stores information in a BSON structure (Binary JSON, a binary-encoded serialization of JSON-like documents).
2.2 MongoDB
MongoDB is a document-oriented NoSQL database management system that does not require a prespecified physical schema, allowing data to be enhanced on the fly without reconfiguration or base modification. Furthermore, MongoDB, like MySQL, Oracle, and many others, is a database management system; nevertheless, MongoDB is non-relational DBMS. As a result, the data is stored in collections. These collections will contain JSON documents. In other words, MongoDB employs a storage document with almost no restrictions, allowing it to store a wide range of concepts within a single structure. The main reasons that lead us to choose MongoDB as target database are the fact that it has many advantages.
• Comprehensive documentation is available on the MongoDB website.
• Compatibility with different operating systems, including Windows, Mac OS X and Linux.
• Distinguished by a dynamic schema, great scalability, and superior query performance.
• The MongoDB community is quite active on forums and blogs; there are numerous components contributed by the community, such as patches, APIs, and client libraries.
2.3 Advantages of NoSQL over Limitations of RDBMS
The most well-known characteristic of NoSQL is its ability to be efficiently expanded horizontally, since NoSQL handles DB partitioning over numerous affordable servers that function together as a single data center, in contrast to RDBMS, which requires more expensive equipment to be scaled up on the same single server (vertical scaling) [16].
Storing unstructured, semi-structured or structured data. Relational databases store the data in structured tables that have a predefined schema, while NoSQL databases allow you to store the data in a way that is easier to understand or closer to how the data is used by applications. Furthermore, the schemas of many NoSQL databases are flexible and under the control of developers, making it easier to adapt the database to new types of data.
Economical alternative. NoSQL databases often employ low-cost, low-end servers to equip "clusters", whereas relational DBMS typically use ultra-powerful servers that are quite expensive. As a consequence, NoSQL systems minimize a company's costs in terms of gigabytes or transactions per second.
Flexible data model. Changing the data model of a relational database in production is an immense pain. Even minor changes should be handled with caution, since they may necessitate taking the server down or reducing service levels. NoSQL systems are more adaptable in terms of data patterns, as shown in the key/value and documentary categories. Even slightly stricter models, such as those in the column-oriented group, can add a column without great difficulty.
2.4 Mapping Principles of a Relational Database in MongoDB
Each MongoDB database is mainly composed of collections, which are similar to relational databases made up of SQL tables. Each collection stores data in the form of documents, which equates to tables storing data in rows. While a line stores data in a set of columns, a document has a JSON type structure (known as BSON in Mongodb). Finally, document fields are similar to columns in a SQL line. Here is a diagram that illustrates the correspondence between RDBM/SQL and MONGODB/NOSQL (Fig. 1).
Fig. 1. The mapping of the RDBMS to MongoDB [17]
To better understand this mapping principle, let’s take an example of SQL Students table and its corresponding structure in MongoDB. As seen in Fig. 2, each line of the SQL table becomes a document and each column becomes a field in MongoDB.
Fig. 2. Correspondence of a relational table in a MongoDB collection [17]
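As a minimal sketch of this row-to-document correspondence (the Students table, its columns, the database file and the connection settings are illustrative assumptions), each SQL row can be rewritten as one document in a MongoDB collection using PyMongo:

```python
# Hypothetical migration of rows from a SQL "students" table into a MongoDB collection.
import sqlite3
from pymongo import MongoClient

sql = sqlite3.connect("school.db")                # assumed relational source
rows = sql.execute("SELECT id, name, age FROM students")

mongo = MongoClient("mongodb://localhost:27017")  # assumed MongoDB target
students = mongo["school"]["students"]            # database "school", collection "students"

for student_id, name, age in rows:
    # One table row becomes one document; each column becomes a field.
    students.insert_one({"_id": student_id, "name": name, "age": age})

print(students.find_one({"name": "Alice"}))
```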
Now, in terms of relations in the relational system, it corresponds to the principles of linking and embedding, which means that during the migrating process of the relational DB to the NoSQL DB, we apply one of the two concepts to transform the various types of relationships of the relational model, while adhering to well-defined conditions and criteria [18].
Therefore, deciding when to embed a document or otherwise generate a reference to independent documents in various collections is an application-specific issue. However, there are certain common principles that facilitate making this decision during NoSQL schema design. Embedding. Data having a 1:1 or 1: many relationship (in which the “many” items always appear with, or are seen in the context of their parent documents) are ideal candidates for embedding within a single document (Fig. 3).
Fig. 3. Embedding concept
However, not all 1:1 or 1:many relationships should be embedded in a single document. Referencing between documents in different collections should be used:
• When a document is often accessed, yet it has an embedded document that is only accessed on rare occasions. A customer record that includes copies of the yearly general report is an example; embedding the report would just increase the memory required for frequent operations.
• When one section of a document is regularly updated and growing in size, while the rest of the document remains relatively static.
• And finally, when the document size exceeds MongoDB's current document limit (16 MB).
Linking enables data normalization and can give more flexibility than embedding. References are generally implemented by saving the id field of one document as a reference in the associated document (Fig. 4).
Fig. 4. Linking concept
Referencing should be employed:
• When embedding would not provide sufficient read-performance advantages to outweigh the implications of data duplication.
• When the object is referenced from many different sources.
• To represent complex many-to-many relationships.
• To model massive, hierarchical data sets.
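To illustrate the two patterns side by side, here is a minimal PyMongo sketch (the collection names, fields and values are invented for the example) that stores a customer's orders once embedded and once referenced:

```python
# Hypothetical example contrasting embedding and linking for a 1:N customer/orders relation.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["shop"]

# Embedding: the "many" side lives inside the parent document (one read fetches everything).
db.customers_embedded.insert_one({
    "_id": 1,
    "name": "Alice",
    "orders": [
        {"order_id": 10, "total": 25.0},
        {"order_id": 11, "total": 40.0},
    ],
})

# Linking (referencing): orders are separate documents holding the parent's id.
db.customers.insert_one({"_id": 1, "name": "Alice"})
db.orders.insert_many([
    {"_id": 10, "customer_id": 1, "total": 25.0},
    {"_id": 11, "customer_id": 1, "total": 40.0},
])

# Reading back: one query for the embedded form, two (or a $lookup) for the referenced form.
print(db.customers_embedded.find_one({"_id": 1}))
print(list(db.orders.find({"customer_id": 1})))
```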
3 Synthesis of Data Migration Approaches
In this section, we present our synthesis of some techniques developed to handle the challenge of mapping data stored in relational databases to NoSQL databases. We have chosen seven contributions, as presented in Table 1 below. Table 1. Data migration solutions. For each study, numbered 1 to 7, the authors are listed first, followed by the proposed approach and then the obtained result. 1.
The project is carried out through an online migration process, written in Python language and deployed on Amazon Web Services EC2. Then, a web interface is developed using the Django framework which will be used for uploading and downloading the converted DB. This utility is divided into 3 main modules as follows. The first module, user registration, provides a simple graphical user interface (GUI) for the user registration. The second module, database load/download, will provide the user with a simple interface for uploading his SQL DB and downloading his converted NoSQL DB. Finally, the last module, DB Conversion, contains the fundamental logic of the database transformation.
The result obtained Some performance evaluation tests have been made between this solution other proposed methods, The observed and analyzed result summarised in the following advantages: extremely high efficiency, less human effort, and lower error scope.
(continued)
260
C. Saadouni et al. Table 1. (continued)
S.N◦
Authors name The approach proposed
The result obtained
2
Alza A. Mahmood [2]
The paper presented a novel, easy-to-implement, fully automated technique for migrating data from any type of relational DB to any type of NoSQL DB with no effort. The basic idea behind this algorithm is to browse the lines of each table in the relational DB of entry and map the data of each line into a document, with the documents mapped from the same table stored together as a collection in the output NoSQL database. The algorithm takes into account the relationships between the relational database tables and gives the user the option of mapping these relationships with the output database using one of three modes: Embedded, Linking, and Auto.
At last, as an implementation of this algorithm, an application software based on VB. Net has been created. To demonstrate the algorithm’s results, a simple SQL Server database filled with sample data and used as the application’s input was used.
3
M. Hanine, A. Bendarag O. Boutkhoum [3]
The main steps of the methodology are divided into two phases, the first one named Loading the Logical Structure of the Source DB, aims firstly to connect to the source DB to obtain all information about the type and version, as well as to specify all the information on the target DB NoSQL. In other words, we obtain a representation of the relational model of the source DB, which includes the names of the tables, their attributes, and their relationships. The second step, called Mapping Between Relational Model and MongoDB Model, is devoted to defining the mapping between MySQL’s relational model and MongoDB’s document-focused model
To test the validity of the proposed methodology, a software prototype was created with Java and runs on a PC platform.
4
A. EL ALAMI, M. BAHAJ [4]
The approach proposed in this article is based on the study No implementation was done of retro-engineering, which is the study and analysis of a to validate the migration system to determine its internal functioning and is used in algorithm many areas of engineering. So, the solution proposed in this article employs this retro-engineering study to accomplish migration from the relational DB to the NoSQL MongoDB, by developing a data model from which we proceed to the migration of the scheme and Data Mapping, while ensuring the integrity constraints between collections. This Data Model describes how data is abstractly represented in the original dataset. Throughout this migration, the conversion of the tables and their relations is focused on the cardinality between each two entities or more to know the cardinality of type (x, 1) - (x, 1), mentioning that x can have the value of 1 or 0, secondly the type cardinality (0.1) - (0.1) and finally the relations with the cardinality of (x, 1) - (x, n)
5
S. Hamouda, Z. Zainol [5]
The proposed method in this research seeks to convert the ER model into a document-oriented data model. The ER model is migrated to a document-oriented model in two stages: (1) the design of the DODS, and (2) the migration of the ER model to the DODS. DODS design: This stage attempts to create a diagram for the document-oriented model. And it is divided into the following sub-steps: (1) DODS components, (2) DODS, and (3) ER model and DODS specifications functionality. The migration of the ER model to the DODS: the second phase is to migrate the relational DB to the document-oriented DB by standardizing and denominating the data using two models, namely, the integrated document and the reference model depending on the type of relationship, 1: 1, 1: N, M: M, and Unary Relations.
A company database schema was used as a case study for mapping the ER schema to the document-oriented schema by applying the DODS model, in order to validate the proposed algorithm.
(continued)
Data Migration from Relational to NoSQL Database
261
Table 1. (continued) S.N◦
Authors name The approach proposed
The result obtained
6
D. Liang, Y. Lin, The basic element of this mid-model is the object, which G. Ding [6] refers to an entity in the relational databases, having a number of properties and relationships with other objects. Including also the strategy library which contains the strategy which is going to be used for various NoSQL databases such as MongoDB or HBase, and also specify how the data model for a specific NoSQL database is generated. We also find the two concepts, Data Feature and Query features, which are defined to maintain the integrity of the data
In order to assess this mechanism, the article presents a realistic scenario to showcase how the MID-Model transfers a relational data model of a proposed relational scheme for an online purchase system to the NoSQL database.
7
B. Namdeo, U. Suman [7]
After deploying a relational DB in a program to examine the suggested paradigm. After analyzing the result, it is discovered that one of the output schemas outperforms in select query across all three types of DB structure. However, update queries take significantly more time since we have to change information in more than one location, i.e., more than one document in MongoDB
4
The authors of this paper suggested a new paradigm called Schema Design Advisor Model (SDAM), which intends to recommend an optimum DB scheme based on the items in the existing RDBMS. It is divided into three stages: one for designing DB schemas for NoSQL DB (particularly Document DB), another for generating all conceivable combinations of DB schemas, and a third for cost calculation. He begun by selecting four entries, such as, relational tables, relationships, select queries and update queries. In the end, this model generates an output list of all possible DB schemes together with their implementation costs
Comparative Analysis of Data Migration Approaches
See the Table 2. Table 2. Comparison of data migration approaches Concepts
Approaches M. Potey & A.A. M. Hanine A.El Alami all [1] Mahmood [2] & all [3] & M.Bahaj [4]
S. Hamouda & Z. Zainol [5]
D.Liang & all [6]
B. Namdeo & U.Suman [7]
Handling of RDB properties (relations, functions, constraints, keys, queries. . . )
No
No
Yes
Yes
Yes
Yes
Yes
Primary-foreign key relationship (1:1,1: N, N: N)
No
No
No
Yes
Yes
No
No
Migration to any type of NoSQL database
No
No
No
No
No
Yes
No
Embedding / referencing concepts
No
Yes
(just embedding)
No
Yes
No
No
The implementation of Yes the proposed algorithm
Yes
Yes
No
No
No
Yes
Database Migration Validation
Yes
Yes
No
No
No
Yes
5
Yes
Results and Discussions
The first approach [1] offers a very simple interface and the ability to convert any kind of relational DB to a NoSQL DB; moreover, according to certain performance assessment tests performed in comparison with other approaches, this one
guarantees the following benefits: Very high efficiency, minimal human effort, and low error scope. Yet, the framework ensures only the migration of data from RDM foreword MongoDB, thus, the software can be enhanced by applying transformation modes such as embedding or linking to each table and each type of relationship instead of simple automated migration operation. Whereas Alza A. Mahmood’s [2] solution gives a simple, fully automated method for migrating data from any type of relational database to any type of NoSQL database with no strain and without the necessity to define the composition of tables and relationships between them. Though, it may be evolved by applying different transformation modes (Auto, Linking, Embedded) to each table and kind of relationship rather than using the same mode for the whole base. The third [3] proposed solution offers a straightforward migration technique with only three steps to convert data from a relational database to a NoSQL database. Simultaneously, the software developed guarantees to the user the ability to manipulate the new NoSQL database, and he may access all of MongoDB’s features via the interface’s tabs, such as Documents, Collections, and Requests. Nonetheless, there are still intriguing and pertinent topics to be addressed, such us, add some parameters to be able to map each table and relationships based on the principles of MongoDB namely Linking and Embedding. As well as, covering all types of databases available as RDBMS or NoSQL to migrate from both, not only MySQL and MongoDB. The authors A. EL ALAMI and M. BAHAJ come up with an interesting idea [4], which is based essentially on the concept of the data model that ensures the ability to determine the internal functioning of the relational system, taking into account tables names, attributes, primary and foreign keys, cardinalities, and so on. From these features, they proceed to assure the transformation of each table and its various types of relationships. However, one major weakness of this research is that no implementation or real-world case study was conducted to test and concretize the model’s conception. Regarding the [5] paper, its algorithm takes into consideration migrating not only the data but also the properties of the relational BD such as tables, relationships, functions, constraints, keys, and so on. Yet, the project can be enhanced by implementing automatic prototyping to map a real relational database to a document database in order to verify and assess this proposed method. The key advantage provided by the mid-model approach [6], unlike other methods or algorithms that only take into account a specific NoSQL data model, this mid-model can be a universal solution that has the ability to migrate to any NoSQL dB using the strategy lib in this system. At the same time, this methodology not only moves data from relational to NoSQL, but it also preserves the data integrity that was in the relational database. the major two aspects of this model, as mentioned above, are: The Data feature contains some of the features that are present in the source data in the relational database, meanwhile the Query feature identifies the queries that are commonly or frequently retrieved on the source data, and the data is arranged in the destination database (NoSQL
databases) based on these queries to make data access easier. However, this study can be more accomplished by developing an interface that would allow this system to be easily manipulated. At last, the primary benefit of the suggested paradigm called Schema Design Advisor Model (SDAM) [7], is that it pays special attention to the conception of NoSQL schema, and this conception is made based on the relational schema, which takes 4 entries as follows Tables, Relations, Select and Update Queries. Thus, the only thing to take into account in the future is why not, develop this conception to handle also the other types of NoSQL databases.
6 Conclusion
NoSQL databases are now gaining more attention from application developers due to the rising demand of current applications to manage huge amounts of data in an effective way. Faced with this massive volume of data, traditional relational DBs, which are highly transactional, seem totally overwhelmed and not suitable for real-time applications, social media sites, etc. As a consequence, adopting a NoSQL strategy is no longer a luxury or a competitive advantage, but a necessity. In this context, in recent years, a variety of models, frameworks, and layers have been proposed to convert current data stored in relational databases to this new NoSQL technology, while ensuring high availability and scalability. This document presented a comparative study of some of the most recent studies on data migration from relational to NoSQL databases. In the previous part, the technology utilized for migration, as well as the numerous benefits and limits of each research work, were discussed. The paper's main goal is to help academics acquire an overview of the research that has been achieved in the recent decade, from 2015 to 2020. To sum up, starting from these recent techniques we will proceed to build our own contribution in order to address this problem and cover all of the downsides discovered.
References 1. Potey, M., Digrase, M., Deshmukh, G., Nerkar, M.: Database migration from structured database to non-structured database. Int. J. Comput. Appl. 975, 8887 (2015) 2. Mahmood, A.A.: Automated algorithm for data migration from relational to NoSQL databases. Alnahrain J. Eng. Sci. 21(1), 60–65 (2018) 3. Hanine, M., Bendarag, A., Boutkhoum, O.: Data migration methodology from relational to NoSQL databases. Int. J. Comput. Electr. Autom. Control Inf. Eng. 9(12), 2566–2570 (2015) 4. El Alami, A., Bahaj, M.: Migration of a relational databases to NoSQL: the way forward. In: 2016 5th International Conference on Multimedia Computing and Systems (ICMCS), pp. 18–23. IEEE (2016) 5. Hamouda, S. Zainol, Z.: Document-oriented data schema for relational database migration to NoSQL. In: 2017 International Conference on Big Data Innovations and Applications (innovate-data), pp. 43–50. IEEE, August 2017
6. Liang, D., Lin, Y., Ding, G.: Mid-model design used in model transition and data migration between relational databases and NOSQL databases. In: 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity), pp. 866–869. IEEE, December 2015 7. Namdeo, B., Suman, U.: Schema design advisor model for RDBMS to NoSQL database migration. Int. J. Inf. Technol. 13(1), 277–286 (2021) 8. Katal, A., Wazid, M., Goudar, R.H.: Big data: issues, challenges, tools and good practices. In: 2013 Sixth International Conference on Contemporary Computing (IC3), pp. 404–409. IEEE, August 2013 9. EL Bouchti, K., Ziti, S., Omary, F., Kharmoum, N.: A new solution to protect encryption keys when encrypting database at the application level. Int. J. Adv. Comput. Sci. Appl. (2020) 10. Ganesh D., Penthuru R.: Cloud Infrastructures for Big Data Analytics, IGI Global (2014) 11. Ogunyadeka, A., Younas, M., Zhu, H., Aldea, A.: A multi-key transactions model for NoSQL cloud database systems. In: 2016 IEEE Second International Conference on Big Data Computing Service and Applications (BigDataService), pp. 24–27. IEEE (2016) 12. El Bouchti, K., Ziti, S., Omary, F., Kharmoum, N.: A new database encryption model based on encryption classes. J. Comput. Sci. 15, 844–854 (2019) 13. Goyal, A., Swaminathan, A., Pande, R., Attar, V.: Cross platform (RDBMS to NoSQL) database validation tool using bloom filter. In: 2016 International Conference on Recent Trends in Information Technology (ICRTIT), pp. 1–5. IEEE, April 2016 14. El Bouchti, K., Ziti, S., Omary, F., Kharmoum, N.: New solution implementation to protect encryption keys inside Database Management System. Adv. Sci. Technol. Eng. Syst. J. 5, 87–94 (2020) 15. https://fr.wikipedia.org/wiki/NoSQL 16. Hesham, E., Mostafa, A.: Advanced Computer Architecture and Parallel Processing. Wiley, Hoboken (2005) 17. https://beginnersbook.com/2017/09/mapping-relational-databases-to-mongodb/ 18. https://media16.connectedsocialmedia.com/MongoDB/14856/RDBMS MongoDB Migration Guide.pdf
Recommendation System: Technical Study
Hanae Mgarbi(B), Mohamed Yassin Chkouri, and Abderrahim Tahiri
Abdelmalek Essaadi University, Tetouan, Morocco
[email protected], {mychkouri,t.abderrahim}@uae.ac.ma
Abstract. We present in this paper a study of the features and approaches of the recommender system. The use of recommendation techniques is very important for a system that will provide good recommendations to its users. This explains the importance given to the different recommendation approaches, such as based on collaborative filtering, content-based, hybrid approach, etc. Keywords: Recommender system · Content-based approach · Collaborative approach · Hybrid approach · Feedback · Evaluation
1 Introduction The recommendation system [1] is a particular form of information filtering and an application intended to offer users elements likely to interest them according to their profile. Recommendation systems are used in particular on online sales sites. They are found in many current applications that expose the user to a large collection of elements. Such systems typically provide the user with a list of recommended items they might like or predict how much they might like each item. These systems help users choose appropriate items and make it easier to find their favorite items in the collection. The recommendation system [2] is based on the comparison between the user and the items which are the elements of information. Conventional information retrieval or recommendation methods using vector modeling directly deduce the relevance of the similarity measurement between the vector representing the user and that representing the item. There are several approaches to recommender systems, which suit each problem to obtain results that are relevant to the user. This paper studies the principle of recommender systems and their different approaches such as content-based filtering, collaborative filtering, hybrid approach, etc. We analyze the types of user feedback to the system to improve the next recommendations. Finally, we discuss the evaluation of recommender systems.
2 Recommender System Approaches
To obtain relevant recommendations for users, the choice of an appropriate recommendation approach [3] is essential; it is therefore important to use the most suitable approach for each case. Figure 1 shows the different recommendation approaches.
Fig. 1. Recommendation techniques: content-based filtering; collaborative filtering, divided into memory-based filtering (user-based and item-based) and model-based filtering; and hybrid filtering.
2.1 Content-Based Filtering
The content-based approach uses information related to the profiles of users and to the items to make recommendations [4]. Based on the characteristics of the items, the system proposes items that are similar to those liked in the past by the user concerned.
Content-Based System Architecture. Figure 2 shows the high-level architecture of the content-based approach, in which the recommendation process [6] is carried out in three stages, each managed by a specific component: content analysis, profile learning, and filtering (or recommendation).
Content Analyzer. Since the description of the items is in a textual, unstructured format, natural language processing [7] is required to extract and structure the main characteristics that describe the content of the items.
Profile Learning. Several machine learning techniques can be used to produce the profile, such as decision trees [8], naive Bayes classification [9], and neural networks [10], based on the user's preferences on items.
Filtering. Content-based recommendation techniques [11] focus on measuring the degree of correspondence between an item i and a user u, where item(i) represents the item's description and user(u) models the user's profile and preferences on items. Among these methods, we cite similarity measures and nearest neighbors.
Fig. 2. High-level architecture of a content-based recommender system [5]
This filtering approach is most relevant in the domain of information retrieval and document recommendation, using the vector space model with the TF-IDF weighting [12], or probabilistic and learning methods such as the naive Bayes classifier [8], decision trees, or neural networks to model the relationships between documents within a corpus.
Vector Space Model. The Vector Space Model (VSM) represents users and items as vectors in the same vector space. Each dimension of the space corresponds to a characteristic of an item, and each component of an item vector reflects the importance of the corresponding characteristic for that item. The similarity measure is based on the cosine of the angle between two vectors modeled in the same space. In this view, the user can be considered as an ideal item: the more an item is similar to this user vector, the more relevant it is, i.e., the better it corresponds to the needs of the user.
Similarity. The similarity between a user u and an item i can be measured by the cosine of the angle formed by the user vector and the item vector. This measure, called cosine similarity, is widely used in information search and filtering.
Particular Forms of Content-Based Recommendation
Knowledge-Based Recommendation. Knowledge-based recommendation focuses on explicitly getting to know the user through forms submitted by the system. It tries to collect as much information as possible about the user in order to recommend items relevant to his profile.
Utility-Based Recommendation. The utility-based recommendation system calculates the utility of each item for the user in order to recommend the most relevant ones. The utility function of each user is built from information the user is asked to provide in forms.
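To make the content-based scoring described above concrete, the following minimal sketch builds TF-IDF vectors for item descriptions, forms a user profile from previously liked items, and ranks items by cosine similarity. The item texts, the liked set, and the averaging of liked-item vectors into a profile are illustrative assumptions, not the design of a particular system.

```python
# Minimal content-based scoring sketch: TF-IDF item vectors + cosine similarity.
# Item texts and the "liked" set are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = {
    "i1": "action movie with car chases and explosions",
    "i2": "romantic comedy set in Paris",
    "i3": "wildlife documentary about nature and animals",
}
liked = ["i1"]                       # items the user appreciated in the past

ids = list(items.keys())
vectorizer = TfidfVectorizer()
item_vecs = vectorizer.fit_transform(items.values())   # one TF-IDF vector per item

# A simple user profile: the average TF-IDF vector of the liked items.
profile = np.asarray(item_vecs[[ids.index(i) for i in liked]].mean(axis=0))

scores = cosine_similarity(profile, item_vecs)[0]       # cosine between profile and each item
ranking = sorted(zip(ids, scores), key=lambda p: -p[1])
print(ranking)                                          # most similar items first
```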
2.2 System Based on Collaborative Filtering
The collaborative filtering approach [13] predicts a user's evaluation of an item from the history of preferences of similar users, ignoring the attributes that characterize users and items. The principle of this approach is: if two users have evaluated an item in the same way, they are likely to give similar evaluations to another item.
Recommendation Based on Collaborative Filtering. Two approaches to collaborative filtering have been developed: memory-based collaborative filtering and model-based collaborative filtering.
Memory-Based Filtering. Memory-based collaborative filtering [14] relies on the similarity between items or between users. The objective of this filtering [15] is to predict the rating of a user on an item i: user-based filtering focuses on the ratings of other, similar users for item i, whereas item-based filtering relies on the user's ratings for other, similar items.
Model-Based Collaborative Filtering. Model-based collaborative filtering [14] uses the data to build a probabilistic model that predicts a user's rating of an item. Bayesian networks, for example, can be used for this type of filtering [15].
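As an illustration of the memory-based (user-based) variant, the sketch below predicts a missing rating from the ratings of the most similar users. The small rating matrix and the choice of cosine similarity over co-rated items are illustrative assumptions.

```python
# User-based collaborative filtering sketch: predict a missing rating from similar users.
# The rating matrix (rows = users, columns = items, 0 = not rated) is an illustrative assumption.
import numpy as np

R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

def cosine(u, v):
    mask = (u > 0) & (v > 0)                 # compare only co-rated items
    if not mask.any():
        return 0.0
    return float(u[mask] @ v[mask] / (np.linalg.norm(u[mask]) * np.linalg.norm(v[mask])))

def predict(user, item, k=2):
    sims = [(other, cosine(R[user], R[other]))
            for other in range(R.shape[0])
            if other != user and R[other, item] > 0]
    neighbours = sorted(sims, key=lambda s: -s[1])[:k]   # k nearest neighbours who rated the item
    num = sum(sim * R[other, item] for other, sim in neighbours)
    den = sum(abs(sim) for _, sim in neighbours)
    return num / den if den else R[R > 0].mean()         # fall back to the global mean

print(predict(user=0, item=2))   # predicted rating of user 0 for item 2
```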
2.3 System Based on the Hybrid Approach
A system based on hybrid filtering [16] combines several recommendation approaches into a single prediction, within a more complex recommendation structure, in order to minimize their individual disadvantages. The recommender system can also run each model separately and combine their recommendations to form the final list of filtered items.
Hybridization Techniques. Hybrid recommender systems can be classified into three categories: monolithic, parallelized, and pipelined hybrids.
Monolithic Hybrids. Monolithic hybrids combine several approaches and several different data sources within a single recommender.
• Feature Combination: This technique combines and pre-processes multiple data sources. For example, when combining content-based filtering and collaborative filtering, personal information is treated as additional data related to each item, and the content-based approach is applied to the combined data set.
• Functionality Augmentation: This method augments the data model of one recommender system with information used by another recommender system. It applies more complex transformation steps than feature combination.
Parallelized Hybrids. Parallelized hybrids use multiple recommendation approaches in parallel and apply hybridization to aggregate their results.
• Weighted: Two or more recommender systems operate in parallel, and the rating given by each is weighted into a final relevance rating (a minimal sketch of this combination is given at the end of this section).
• Switching: This method uses criteria to switch between recommender systems in order to obtain the highest relevance; the system can switch to another recommender system when it has greater confidence in its output.
• Mixed: This method proposes to the user a single list of items in which the results of two or more recommender systems are mixed.
Pipelined Hybrids. Pipelined hybrids run the recommenders in stages, each stage building sequentially on the results of the previous one.
• Cascade: A first recommender system produces a large list of recommended items, then another system takes that list as input and refines it.
• Meta-level: One recommender builds a model that is then used by the primary recommender to make recommendations.
2.4 Other Approaches
Demographic Approach. In this approach [17], the recommendation system asks the user to enter parameters such as gender, age, job, and city in order to define demographic classes into which users are categorized, and then recommends to each user the items associated with the class assigned to him.
Social Approach. In this approach [18], a new parameter is added: the user's social circle. A social recommendation system recommends to the user items that are favored within his social circle.
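Returning to the parallelized weighted hybrid of Sect. 2.3, the minimal sketch below simply blends the scores of two recommenders with fixed weights; the two scoring functions and the weights are illustrative assumptions.

```python
# Weighted (parallelized) hybrid sketch: two recommenders run in parallel and
# their scores are blended with fixed weights. Scores and weights are illustrative.
def content_based_scores(user):
    return {"i1": 0.9, "i2": 0.2, "i3": 0.4}     # e.g. cosine similarities

def collaborative_scores(user):
    return {"i1": 0.6, "i2": 0.8, "i3": 0.1}     # e.g. normalized predicted ratings

def weighted_hybrid(user, w_content=0.4, w_collab=0.6):
    cb, cf = content_based_scores(user), collaborative_scores(user)
    items = set(cb) | set(cf)
    blended = {i: w_content * cb.get(i, 0.0) + w_collab * cf.get(i, 0.0) for i in items}
    return sorted(blended.items(), key=lambda p: -p[1])

print(weighted_hybrid(user="u1"))    # final ranked list presented to the user
```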
3 Feedback to the Recommendation System
The recommender system requests feedback from the user in order to improve future recommendations. This feedback can be obtained in one of two ways:
Explicit Feedback. Explicit feedback [11] is given directly by the user to the system, so user involvement is required to obtain this information. It takes the form of a rating on a numerical scale, such as a one-to-five-star scale, or a binary value such as "like" or "dislike".
Implicit Feedback. Implicit feedback [19] can be collected without the user explicitly giving his opinion to the system; it is derived from user behavior, such as clicking on an item.
4 Evaluation of Recommender Systems
The quality of a recommendation algorithm can be evaluated with different types of measurements, and the metric applied depends on the type of recommendation approach. Accuracy is the fraction of correct recommendations out of the total possible recommendations. Two families of accuracy measures are distinguished for recommendation systems [20]: statistical accuracy metrics and decision support metrics.
Statistical Accuracy Metrics. These assess the accuracy of a filtering technique by comparing predicted ratings with the user's actual ratings. Two measures are commonly used:
Mean Absolute Error (MAE). The most popular and commonly used measure; it quantifies the average deviation of the recommendation from the user's actual rating.
Root Mean Square Error (RMSE). Also used as a measure of statistical accuracy; it penalizes large errors more strongly.
Decision Support Accuracy Metrics. These metrics assess how well the system helps users select very high-quality items from the item set. The most used metrics [20] are reversal rate, weighted errors, receiver operating characteristic (ROC) and precision-recall curves (PRC), and precision, recall, and F-measure.
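The sketch below computes the metrics named above: MAE and RMSE on predicted ratings, and precision, recall, and F-measure on a top-N list. The predicted and actual values and the set of relevant items are illustrative assumptions.

```python
# Evaluation sketch: MAE and RMSE on predicted ratings, precision/recall on a top-N list.
# The predicted/actual values and the relevant set are illustrative assumptions.
import math

actual    = [4.0, 3.0, 5.0, 2.0, 4.0]
predicted = [3.5, 3.0, 4.0, 2.5, 4.5]

n = len(actual)
mae  = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

recommended = {"i1", "i2", "i5"}          # top-N list produced by the system
relevant    = {"i1", "i3", "i5", "i7"}    # items the user actually found relevant

precision = len(recommended & relevant) / len(recommended)
recall    = len(recommended & relevant) / len(relevant)
f_measure = 2 * precision * recall / (precision + recall)

print(mae, rmse, precision, recall, f_measure)
```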
5 Conclusion
We have presented in this paper a study of recommender system techniques. The choice of recommendation technique, whether collaborative filtering, content-based filtering, or a hybrid approach, is very important for a system that is expected to provide good recommendations to its users. For the most widely used techniques (content-based filtering, collaborative filtering, and the hybrid approach), we have presented their architecture and their main algorithms. Finally, we discussed the evaluation of recommender systems, based on statistical accuracy metrics and decision support accuracy metrics.
References 1. Shani, G., Gunawardana, A.: Evaluating recommendation systems. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. (eds.) Recommender Systems Handbook, pp. 257–297. Springer, Boston, MA (2011). https://doi.org/10.1007/978-0-387-85820-3_8
2. Mishra, R., Rathi, S.: Efficient and scalable job recommender system using collaborative filtering. In: Kumar, A., Paprzycki, M., Gunjan, V.K. (eds.) ICDSMLA 2019. LNEE, vol. 601, pp. 842–856. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-1420-3_91
3. Isinkaye, F.O., Folajimi, Y.O., Ojokoh, B.A.: Recommendation systems: principles, methods and evaluation. Egypt. Inform. J. 16(3), 261–273 (2015). ISSN 1110-8665. https://doi.org/10.1016/j.eij.2015.06.005
4. Guillou, F.: On Recommendation Systems in a Sequential Context. Machine Learning [cs.LG]. Université Lille 3 (2016). English
5. Lops, P., de Gemmis, M., Semeraro, G.: Content-based recommender systems: state of the art and trends. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P. (eds.) Recommender Systems Handbook, pp. 73–105. Springer, Boston (2011). https://doi.org/10.1007/978-0-387-85820-3_3
6. Jannach, D., et al.: Hybrid recommendation approaches. In: Recommender Systems: An Introduction, pp. 124–142. Cambridge University Press (2010). https://doi.org/10.1017/CBO9780511763113.007
7. Luo, J., Chong, J.: Review of natural language processing in radiology. Neuroimaging Clin. N. Am. 30, 447–458 (2020). https://doi.org/10.1016/j.nic.2020.08.001
8. Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Mach. Learn. 29(2–3), 131–163 (1997)
9. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, Hoboken (2012)
10. Bishop, C.M.: Pattern Recognition and Machine Learning, vol. 4, no. 4. Springer, New York (2006)
11. Schafer, J.B., Frankowski, D., Herlocker, J., Sen, S.: Collaborative filtering recommender systems. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) The Adaptive Web. LNCS, vol. 4321, pp. 291–324. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-72079-9_9
12. Ben Ticha, S.: Recommandation personnalisée hybride. Autre [cs.OH]. Université de Lorraine (2015). Français. NNT: 2015LORR0168
13. Li, S.: Context-aware recommender system for system of information systems. Technology for Human Learning. Université de Technologie de Compiègne (2021). English. NNT: 2021COMP2602
14. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)
15. Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, WWW 2001, pp. 285–295. ACM, New York (2001)
16. Su, X., Khoshgoftaar, T.M.: A survey of collaborative filtering techniques. Adv. Artif. Intell. 2009, 421–425 (2009)
17. Breese, J.S., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pp. 43–52. Morgan Kaufmann Publishers Inc. (1998)
18. Burke, R.: Hybrid recommender systems: survey and experiments. User Model. User-Adap. Inter. 12(4), 331–370 (2002)
19. Laender, A.H.F., Ribeiro-Neto, B.A., da Silva, A.S., Teixeira, J.S.: A brief survey of web data extraction tools. SIGMOD Rec. 31(2), 84–93 (2002)
20. Pazzani, M.J.: A framework for collaborative, content-based and demographic filtering. Artif. Intell. Rev. 13(5–6), 393–408 (1999)
The Appropriation of the Agile Approach in Public Sector: Modeling the Achievement of Good Governance
Mouna Hajjaj1(B), Houda Lechheb2, and Hicham Ouakil2
1 Faculty of Economics and Management, University Hassan 1st, Settat, Morocco
[email protected]
2 Faculty of Economics and Management, Ibn Tofail University, Kenitra, Morocco
{houda.lechheb,hicham.ouakil}@uit.ac.ma
https://feg.uh1.ac.ma/, https://feg.uit.ac.ma/
Abstract. Agile is on the rise as a new way of governing. Detaching agile from its roots in software development and exploring new contexts of application is an emerging theme in the field of management. This article contributes to the literature by proposing a model that addresses the theoretical gaps of the agile approach in the public sector and links the agile framework to the broader classical values of public management. In this paper, we model the components of the theoretical framework of the agile approach. We draw on De Vaujany's theory of appropriation to explain the link between the public administration crisis and the attempt to model the agile approach in the public context. The model takes the principles of the agile approach as independent variables. The moderating variables are the context as presented in the models of Boehm and Turner [13] and of Kruchten [14]. The dependent variables are the satisfaction of users' or stakeholders' requests and the achievement of good governance. With this paper, we hope to build a bridge for further collaboration between practitioners and academics in the search for new ways to improve public value.
Keywords: Modeling · Agile governance · Public sector · Organizational context · Good governance
1 Introduction
The theory of public administration provides insight into the evolution of public space. In a bureaucratic government organization, procedures and rules come first [1]. The management concept of New Public Management [2] has replaced rules and procedures with goals, performance measures, and individual incentives. The changes brought by New Public Governance [3] give hope for a more human approach that encourages creativity and new solutions. Politicians or government officials aren't necessarily aware that they're operating within a particular theory of public management. Nevertheless, the underlying philosophy of
the governance model used in a public administration has a significant impact on how public services are perceived by their users and by civil servants. Today, we observe a discreet transformation of the administration, which invests in the application of agile software development approaches to other types of administrative problems. This approach is in contrast to bureaucracy, where decisions are made from the top down while user complaints come from the bottom up. The agile mindset transforms decision making by making it pluralistic and involving internal and external users in the process from the beginning [4]. The agile approach informs how to address changing user needs and helps improve the efficiency of service delivery while ensuring that public values such as equity and integrity are respected [5]. This approach values the voices of managers and users. In fact, autonomous agile teams that are involved in decision-making processes are more open to successfully implementing reforms [6]. Since the agile approach emphasizes rapid learning, managerial innovation isn't linear and therefore involves a relationship between different elements such as cultures and needs [7]. Management practices are developed in the organization that generates the innovation [8], in our case the software development industry. Then they go through a diffusion phase to evolve in a new organizational population, in our case public administration, where "the tool is appropriated by one or more actors, who train, distort and interpret it. Then, another collective or the same actors reappropriate the reconstructed tool by successively establishing relationships of varying strength between prescribers and users" [9]. The article by Hajjaj et al. [5] is a good illustration of this idea: it deals with the appropriation of the agile movement, designed by software developers, which was adapted by public servants with the authors' support, even though the context may appear peculiar when analyzed superficially; considering only macro-level items, such as the nature of the organization under study and its activity, would place the context outside the usual scope of agile. The tensions of public administration lead to these stages of appropriation by public administrators. De Vaujany [9] proposes reflexive models as an alternative to rationalist models. The author reasons that managers mobilize their reflective capacity to maintain a balance in their socio-cognitive patterns. This reflective work with research partners was presented in a previous empirical study, in which Hajjaj et al. [5] analyzed whether the public organization under study was ready to move to agile methods before allocating resources for adoption, and then presented the contextualization of agile methods outside of software development as applied research whose goal is to achieve better governance. This reflective work, in collaboration with community practitioners, enabled practices to be reviewed through the adoption of agile principles and methodologies, and it provides a solid foundation for this article. Because agility contrasts with the bureaucratic culture associated with public sector statutes, Hajjaj et al. [5] avoided reluctance to change by developing the new practices with community practitioners. When stakeholders are involved in improving their practices, they adhere to them much better [10–12] and transitions become smooth. The transitions that public policy management practices and frameworks undergo are
at the origin of managerial tensions. Indeed, several recent reforms in the public sphere have redefined the management of goals and means. This transition has structural implications for the use of management innovation by public administrators. Moreover, a paradox remains in public management concerning the balance between the formal and the informal: managers are required to be formal and organized while remaining sufficiently flexible. Our article is structured as follows. First, in the method section, we explain the theoretical basis that allows us to achieve the objectives of this article; then, the results obtained from the application of the method are presented through the proposed conceptual model and its explanation. We then discuss the nuances and opportunities of agile action in the public sector, recalling the process and its response to the crisis of good governance.
2 Method
In this article, we essentially refer to De Vaujany's approach to appropriation [9], which he defines as "the process by which individuals make an originally unknown or even hostile object useful for everyday use" [9]. De Vaujany [9] distinguishes several purposes of appropriation. However, the ones that come closest to our goal are "putting into practice" and "good practices". The former involves asking, "How can objects and management tools developed by academics or produced in collaboration with one community of practice be made suitable for use by another community of practice? How can the academic community facilitate this process?" [9]. De Vaujany [9] presents four postulates that determine the appropriation of management practices. According to this author, the appropriation process is "contingent", where "each appropriation is a contingent form that articulates the four categories of elements [management object; management rule; management tools; management device] of the appropriation perspective" [9]. The second postulate states that "appropriation is a long process that begins long before the object is used and continues long after the first routines of use appear" [9]. The third is that the appropriation process is complex because it "requires activation to be understood in all its richness" [9]. Finally, the appropriation process implies "a certain instrumental and interpretative flexibility" [9]. De Vaujany [9] also distinguishes three perspectives of appropriation: the "value-provision" perspective, based on institutional tensions; the "assimilation value" perspective, based on structural tensions; and the "use value" perspective, based on managerial tensions. Our perspective fits this last form, the "use value", which proposes new practices. We relate the "use value" to the changes in management practices observed in Hajjaj et al.'s [5] empirical study, which is a solid basis for this article.
3 Findings
For the independent variables, we propose a theoretical model inspired by De Vaujany's appropriation framework [9] and by the principles of the agile approach [4]. The moderating variables are the context, as presented in the models of Boehm and Turner [13] and Kruchten [14]. Boehm and Turner [13] use the notion of home ground to determine whether a process approach is appropriate for a particular project or organization; the home ground is a set of conditions under which agile approaches are most likely to succeed. Boehm and Turner [13] argue that the more the conditions of a particular organization or project differ from those of the home ground, the riskier an approach is in its pure form, and the more valuable it is to mix in some complementary practices of the opposite approach. In addition, Kruchten [14] describes, in his model of the agile sweet spot, the characteristics of the factors that favor the application of the agile approach. Kruchten [14] derived recommendations on which practices are applicable, which are not, and which would require adaptation or special consideration, based on combinations of factor characteristics and methods. The dependent variables are user or stakeholder satisfaction and the achievement of good governance in the public sector (Fig. 1).
Fig. 1. Conceptual model of this research on the application of agile.
3.1 Context
Contextual elements are moderating variables in the impact of applying the agile approach to good governance. Some elements may even influence the success of implementing the agile approach in an organization or project. Structural Complexity. Structural complexity indicates the complexity of the organizational environment in which the project is located. Structural complexity goes hand in hand with the systemic approach as there is a strong interaction
and integration within the system. Structural complexity refers to several elements from the models in the literature [13,14]. In particular, it refers to the size of the teams, the project, or even the organization, which can have a significant impact on project success, although the literature is contradictory on this point. Complexity increases with size, as all elements are symbiotic and must be considered with great care. As the number of entities represented in a project increases, whether entities asserting a need according to their defined requirements or the team members themselves, it becomes more complicated to establish project goals in advance that remain stable and well defined. This is a key factor in the complexity of the project and affects its chances of success [15]. Another element of complexity is criticality. In fact, the agile approach is better suited for projects with low criticality, whereas the traditional plan-based approach is better suited when the risk of failure can lead to large losses. A final element of structural complexity is the geographic distribution of teams. Managing a team whose members are all co-located in the same building is not the same as working with geographically distant teams, managing the vagaries of multicultural teams, ensuring smooth communication, and so on. Project teams rely on regular feedback to determine if they are on track for project success, and this feedback comes from direct contact. Structural complexity increases when the direct communication that occurs spontaneously in team environments is impeded.
Dynamism. Another issue related to context is dynamism. The agile approach requires a supportive culture and is best suited for projects where uncertainty is high and where there is a lot of change. Changing requirements are one aspect of dynamism; they stem from the ambiguity of the initial definition of the requirements, but also from their real volatility. Agile culture is the opposite of the culture in many bureaucratic, hierarchical organizations. Research on team performance has long focused on the intrinsic characteristics of the team that can help or hinder effectiveness, team success being seen as a logical consequence of team effectiveness [16]. More recently, however, researchers have looked for other determinants of effectiveness [17] outside the boundaries of the team by analyzing the complex interaction between the team and its environment [18], which is characterized by adaptation. According to Burke et al. [18], adaptation is based on feedback from the environment through the identification of salient cues that should be responded to by reformulating the plan. For these authors, team adaptation takes the form of innovation through the modification of structures or of behavioral or cognitive actions with a specific goal. In addition, the agile approach relies on the autonomy of teams, so team members must be qualified to perform their task as well as possible, which is to make quick decisions, a task that often lies beyond middle management. Agile teams must constantly monitor their environment to perceive changes and identify obstacles to performance so that they can adjust their practices and goals accordingly [19–23]. The agile approach proposes several processes that have a significant and direct impact on team adaptability. The iterative learning process enables strong feedback. It is characterized by recursively taking cues
from the environment. These cues are analyzed and evaluated to draw lessons and opportunities by adjusting the team's behavior. This process is cyclical: at each stage, teams perceive or respond to cues from the environment, resulting in learning, feedback, and change. In other words, planning occurs at the iteration level so that changes in the external environment can be detected and incorporated into the plan for the next iteration. Endsley [24] describes three increasing levels of complexity in his model of situational awareness. The first level of awareness is limited to the ability to perceive cues from the environment. The second level concerns the ability to understand the relevance of perceived cues from the environment given their relationship to the work in progress. Finally, the third involves extrapolating the likely outcomes of the team's ongoing work based on the new cues.
Staff Competence. In addition, employees must be encouraged to take the initiative. For this reason, culture is a critical element, because in some contexts where the environment is risk averse, taking responsibility for actions and their outcomes is avoided. We see several challenges in contextualizing the agile approach in public organizations. Agile managers in the public sector are not expected to follow predetermined workflows in their actions. However, their actions would risk clashing with the legalistic administrative culture of their staff or even contradicting administrative law. The application of the latter must be assessed on a case-by-case basis, especially in financial services that manage public contracts, for example, which are subject to fairly strict procedures and where failure to comply with a single aspect can have negative consequences. The agile approach is combined with a new form of leadership, namely servant leadership. The latter is a form of consensual decision making based on a recursive process of trial and error. Some authors, such as Greenleaf [25], believe that this form of leadership is less represented in public administration. As we have already discussed, public administration has changed and its context is not homogeneous, so we should never generalize, but consider the elements of the context specific to each organization.
3.2 Good Governance
In our research model, the application of the agile approach, which is a mindset that introduces a culture break in hierarchical-bureaucratic organizations, has indirectly influenced good governance. Good governance is characterized by several key elements. Efficiency and Effectiveness. First, the efficiency and effectiveness of the service provided, which is reflected in the quality of the service offered to users while optimizing resources. In traditional processes, results are often reduced to the production of reports that are rarely consulted or used. In contrast, agile work aims to satisfy needs and solve problems rather than simply produce detailed documentation. In addition, agile values the efforts of managers and therefore
inevitably ensures their engagement, especially when the culture rewards individual contributions.
Responsiveness and Feedback. Second, responsiveness and feedback, which refer respectively to listening to the evolving needs of users and partners and to acting on the signals gathered. Agility assumes that situations evolve and are not fixed, and that one should adapt as new constraints or opportunities emerge.
Transparency and Accountability. This is about sharing information with users and partners in order to share resources and capacity and to engage users, but also to better account for the actions taken, because it is important to measure the impact of actions and to challenge oneself in a continuous improvement perspective. The agile approach guides practitioners to revise the first versions of the work done, as it emphasizes continuous learning and self-reflection processes to improve the processes or services delivered; this is a pluralistic exercise that implies sharing with transparency and collecting feedback. The improvements may relate to the speed of delivery, the quality of the product or service, or even its very existence, as agile methods are opposed to a zero-defect culture. In agile approaches, managers are allowed to make mistakes, because mistakes are part of the learning process; agile approaches presuppose failure, and organizations that have made mistakes in early iterations are better able to improve. In environments where mistakes are not tolerated, however, it can be very difficult to apply experimental and trial-and-error approaches. The agile approach provides for the process of manager-led retrospectives. This agile process requires regular reviews of policies and procedures to identify opportunities for improvement and learn from past mistakes.
3.3 Use of the Agile Approach
The use of the agile approach is the independent variable of this model. In this article, we seek to illustrate the impact of the use of the agile approach on good governance in a specific organizational context, namely the public service. We focus on insights from Hajjaj et al.'s [5] participatory action research, which consists of the co-construction of a management model for scientific research programs. The use of the agile approach variable incorporates the twelve agile principles inspired by the Agile Manifesto. Since the Manifesto was designed for software development, it was necessary to adapt each of the twelve principles to the specifics of our case study, as described below:
Customer Satisfaction. The top priority should be to satisfy the user of the public service by fulfilling his requests and mobilizing all available means around this priority. This means respecting the quality of the deliverable, but also the processing time.
Embracing Change. Welcome the evolution of needs, even at an advanced stage of processing. Agile processes exploit change to ensure that a need, which is not supposed to be static, is met. Considering the research project submitted for evaluation in Hajjaj et al.'s [5] empirical study, the project owner should be able to improve the project on the basis of new information as long as the submission deadline has not yet been reached.
Deliver Frequently. Processing times should be reduced where possible to increase efficiency and also to improve the perception of the service by the user. Requests are processed through an iterative process in which the delivery of information or documents is continuous, in order to maintain a beneficial link, for several reasons related to the following principles of the agile approach. Processing files iteratively reassures the user that his request is being taken into consideration and keeps him in contact and informed of the next steps.
Working Together. The people involved in a case must work together actively. In Hajjaj et al.'s empirical study [5], the engineering teams working on the project submission platform must work closely with the research program management teams or the financial service teams in order to merge their information and create synergy.
Trust and Support. A climate of mutual trust, support, and motivation must be built within the organization. When applying the agile approach, it is very important to ensure that a culture of accountability and tolerance for error is established and that decision making is encouraged.
Face-to-Face Conversation. The most efficient method of conveying information between the stakeholders involved in a project is face-to-face conversation. It should therefore be preferred to other means in order to be more effective, saving time and developing relationships between stakeholders.
Effective Service. As it is often problematic to measure performance and good governance, the agile approach recommends keeping the efficiency of the service rendered as the key measure of progress.
Sustained Pace. Agile processes promote a sustainable pace: stakeholders must be able to maintain a constant pace indefinitely. This point is very important because a faster pace should not be mistakenly considered a positive sign; it must be aligned with the team's capabilities to be sustainable.
Constant Attention. All senses must be kept alert in order to pick up timely information from the environment. It is also important to know how to exploit this information and transform it into opportunities for improvement, in order to be more efficient and better satisfy the users' requests.
Keep It Simple. Care must be taken to minimize non-value-added work. Non-value-added work is often a source of frustration for agents, leading to tension and demotivation. In addition, non-value-added operations can cause delays for users in some situations.
Self-organizing Teams. The best results come from self-organized teams. Teams must be able to take part in decision making; this is justified by the operational knowledge that comes from their close and sensitive position with respect to the user.
Reflect and Adjust. The team should regularly reflect, in retrospective meetings, on how to be more effective. In this way, team members learn from their mistakes, identify shortfalls, and reflect on innovative solutions in order to adapt and adjust their behavior accordingly and improve continuously.
3.4 Discussion
Agile has the particularity of being adaptive. Its application beyond software development, particularly in the public sphere, is still underexplored by the scientific community. This article therefore aims to improve the general understanding of implementing agile in the public service context. The agile approach is antithetical to bureaucracy, where decisions are top-down while user complaints are bottom-up. It reframes decision making by making it pluralistic, integrating internal and external users in the process from its early stages. The agile approach contributes to improving the efficiency of service delivery while ensuring that public values such as equity and integrity are respected. The alignment of public administration values and the agile approach can be seen in the role of the "Agile Manifesto" in the latter's evolution [4]. The agile approach informs how to approach the changing needs of users [4], which is in harmony with public values. The agile approach goes beyond a focus on service delivery: it values the voices of managers and users. Indeed, autonomous agile teams involved in decision-making processes are more open to the successful adoption of reforms [6]. The agile approach allows for adaptation to changing environments, values, and public needs [5]. It requires high levels of collaboration between public managers and users; the process, from start to finish, is a corollary of the joint actions of operators and users. As the agile approach emphasizes rapid learning, innovation is not linear and therefore involves a relationship between several elements, such as cultures and needs. Public managers should be aware that the impacts of using new methodologies depend on their fit with the organizational context. As explained, agile methodologies have the greatest impact when there is a high fit between the environmental factors and the method's practices. The results of this research indicate that organizations should expect the impacts of using agile methodologies on project success to be generally positive, and to be most positive in the presence of low structural complexity and high dynamism.
4 Conclusion
In this article, we describe the constructs of the research model of the conceptualization of agile and good governance for public organizations. We have referred to the appropriation of the agile approach because public managers mobilize their reflective capacity in their quest to maintain a balance in their socio-cognitive patterns, motivated by the tensions of public governance. The aim is to observe how management practices developed by academics, or co-produced with a community of practice, can be made fit for use by another community of practice, and how the academic community can facilitate this process [9]. One of the limitations of this research is the scarcity of empirical data on the impact of contextualizing the agile approach to the public service sphere. We relied mainly on Hajjaj et al.'s [5] previous work, which consists of an empirical study on the application of agile in a public organization through the co-construction of new practices with practitioners of the organization under study. Another limitation is that the notion of good governance remains polysemous and there is no widely recognized and adopted model to evaluate it. Indeed, in Hajjaj et al.'s case study [5], implementing agile revealed a significant contribution in terms of speed, working methods, acquisition of strategic information and, consequently, the satisfaction of public managers with their work, as well as the quality of the service as reflected in stakeholders' feedback. Furthermore, after experiencing both the suitability and the feasibility of the agile approach and its methodologies in the public service context, we have built knowledge that could be exploited in other research perspectives. Indeed, it is appropriate to start from a solid foundation to begin a longitudinal analysis. The acquisition of experience and theoretical knowledge on the subject opens the door to this perspective while securing the way, because without prior theoretical knowledge, preconceived ideas and lived experiences, it is possible to fall into the trap of a lack of rigor. Familiarity with the relevant literature and exploratory experience enhances sensitivity to subtle nuances and provides a source of concepts for more advanced comparisons and analysis. A longitudinal analysis would provide a better understanding of whether the impacts of the agile approach and the application of its methodologies, as well as the moderating effects of uncertainty and complexity in the public sphere environment, are consistent over time. In addition, the nonlinear nature of the data obtained in this study may imply the presence of a recursive process. A longitudinal or experimental research design could potentially detect a cyclical or reinforcing effect of the adaptation of the agile approach and the application of its methodologies within public services over time. This question will be one of the future scopes of this research.
References
1. Weber, M.: Le Savant et le politique, trad. fr. de Freund, J. Éditions Plon, Paris (1963). https://doi.org/10.1522/cla.wem.sav
2. Hood, C.: A public management for all seasons? Publ. Adm. 69(1), 3–19 (1991). https://doi.org/10.1111/j.1467-9299.1991.tb00779.x
3. Castells, M.: The Rise of the Network Society (The Information Age: Economy, Society and Culture, Volume 1) (1996)
4. Beck, K., et al.: The Agile Manifesto. Agile Alliance (2001). http://agilemanifesto.org/
5. Hajjaj, M., Lechheb, H.: Co-construction of a new management approach in a public research funding agency through the contextualization of agile thinking. Organizational Cultures: An International Journal 21(1), 21–34 (2021). https://doi.org/10.18848/2327-8013/cgp/v21i01/21-34
6. Moran, A.: Managing Agile: Strategy, Implementation, Organisation and People (2015)
7. Hajjaj, M., Lechheb, H.: How management remains expanding its theoretical roots to evolve serenely. Int. J. Adv. Oper. Manag. 13(1), 21 (2021). https://doi.org/10.1504/ijaom.2021.113664
8. Birkinshaw, J., Hamel, G., Mol, M.J.: Management innovation. Acad. Manag. Rev. 33(4), 825–845 (2008)
9. De Vaujany, F.X.: Pour une théorie de l'appropriation des outils de gestion: vers un dépassement de l'opposition conception-usage. Manag. Avenir (3), 109–126 (2005)
10. Jabri, M.: Team feedback based on dialogue. J. Manag. Dev. 23(2), 141–151 (2004). https://doi.org/10.1108/02621710410517238
11. Morgan, D., Zeffane, R.: Employee involvement, organizational change and trust in management. Int. J. Hum. Resourc. Manag. 14(1), 55–75 (2003). https://doi.org/10.1080/09585190210158510
12. Brown, M., Cregan, C.: Organizational change cynicism: the role of employee involvement. Hum. Resour. Manag. 47(4), 667–686 (2008)
13. Boehm, B., Turner, R.: Using risk to balance agile and plan-driven methods. Computer 36(6), 57–66 (2003)
14. Kruchten, P.: Contextualizing agile software development. J. Softw. Evol. Process 25(4), 351–361 (2013)
15. Hatchuel, A., Weil, B.: L'expert et le système: gestion des savoirs et métamorphose des acteurs dans l'entreprise industrielle. Economica (1992)
16. Turner, J.R., Cochrane, R.: Goals-and-methods matrix: coping with projects with ill defined goals and/or methods of achieving them. Int. J. Proj. Manag. 11, 93–102 (1993)
17. Ancona, D.G., Caldwell, D.F.: Bridging the boundary: external activity and performance in organizational teams. Adm. Sci. Q. 37(4), 634–665 (1992). https://doi.org/10.2307/2393475
18. Ilgen, D.R., Hollenbeck, J.R., Johnson, M., Jundt, D.: Teams in organizations: from input-process-output models to IMOI models. Annu. Rev. Psychol. 56, 517–543 (2005)
19. Burke, S., Stagl, K., Klein, C., Goodwin, G., Salas, E., Halpin, S.: What type of leader behaviors are functional in teams? A meta analysis. Leadersh. Q. 17, 288–307 (2006). https://doi.org/10.1016/j.leaqua.2006.02.007
20. Conboy, K., Wang, X., Fitzgerald, B.: Creativity in agile systems development: a literature review. CreativeSME (2009)
21. Lyytinen, K., Rose, G.M.: Information system development agility as organizational learning. Eur. J. Inf. Syst. 15(2), 183–199 (2006)
22. Highsmith, J., Cockburn, A.: Agile software development: the business of innovation. Computer 34(9), 120–127 (2001). https://doi.org/10.1109/2.947100
23. Okhuysen, G., Waller, M.: Focusing on midpoint transitions: an analysis of boundary conditions. Acad. Manag. J. 45, 1056–1065 (2002). https://doi.org/10.5465/3069330
24. Endsley, M.R.: Toward a theory of situation awareness in dynamic systems. Hum. Factors 37(1), 32–64 (1995)
25. Greenleaf, R.K.: The Servant as Leader. Robert K. Greenleaf Publishing Center, Atlanta (1970)
The Contribution of Deep Learning Models: Application of LSTM to Predict the Moroccan GDP Growth Using Drought Indexes
Ismail Ouaadi1(B) and Aomar Ibourk2
1 Management Sciences, Faculty of Legal, Economic and Social Sciences Agdal, Mohammed V University of Rabat, Rabat, Morocco
[email protected]
2 Microeconometrics, LARESSGD, University of Marrakech, Marrakech, Morocco
Abstract. The aim of our work is to introduce recurrent neural network techniques in order to propose ways of dealing with drought issues and their effect on economic growth, and thereby to help in making better decisions when forecasting Gross Domestic Product (GDP). The best-known recurrent neural network technique is the Long Short-Term Memory (LSTM) network, which combines short-term and long-term memory components to learn effectively from sequential data, and which supports the construction of algorithms and statistical models that identify patterns and derive conclusions from them. In this study, we use LSTM models to forecast GDP growth from drought indexes in the Moroccan case and assess their performance.
Keywords: GDP forecasting · Drought indexes · LSTM algorithm
1 Introduction
The Moroccan economy has long depended on rainfall, as shown by many government strategies: agricultural policies have not reached their goals, neither guaranteeing the food security of the population nor contributing to the growth of the economy. This failure may be related to ill-suited policies or to the way the underlying decisions were made. Moreover, the COP summit, a yearly conference held in the framework of the United Nations Framework Convention on Climate Change (UNFCCC) to assess progress in dealing with climate change, attempts to resolve climate change issues by bringing together the efforts of world leaders to develop a unified strategy; here we should highlight the important relationship between climate change and the economy, especially Gross Domestic Product (GDP), which has been examined by many studies. Furthermore, the era of digitalization allows a huge amount of data to be gathered and stored, which in turn presents
certain challenges for decision makers, who need the skills to deal with these data through the following steps: processing, analysis, and decision making. Furthermore, the emergence of powerful algorithms from the field of artificial intelligence, which have demonstrated great success in forecasting and prediction in several fields, has encouraged further and more reliable research. Considering these elements, the aim of our work is to introduce artificial intelligence techniques in order to propose ways of dealing with climate change variables when predicting GDP growth. Artificial intelligence is the ability of a computer to perform tasks that mimic human intelligence; it encompasses a range of techniques that pursue this goal, in particular machine learning techniques. Some of the most advanced and complex ones are deep learning algorithms, which enable the development and use of algorithms and statistical models to analyze patterns in data and draw inferences from them. Here, we introduce a specific algorithm belonging to the recurrent neural network class, called LSTM (Long Short-Term Memory), to show the performance of these models in predicting GDP growth from climate change variables in the Moroccan case. Following this introduction, our work is organized as follows. The next section gives some theoretical background and discusses the use of machine learning techniques in GDP growth forecasting. The third section is devoted to the methodology, where we describe the data used and the development of our model. In the fourth section, we present and discuss our findings, which are followed by a summary of the paper, its limits, and further work in the last section.
2 Literature Review
Given the significance of both GDP growth and climate change, there is an increasing amount of literature on these two topics. While many works focus on the interaction between the two domains, some examine each field alone. A recent study [8] used an extended ordinary least squares estimation to study the dynamic effects of economic growth, the use of renewable energy sources, and the expansion of agricultural land on CO2 emissions in Peru. Such studies employ econometric models to investigate the issues related to this topic. The well-known study that investigated the relationship between per capita income and a set of environmental indicators is [4]. Moreover, Kahn et al. [6] studied the impact of climate change on economic activity; the authors applied a stochastic growth model that uses country-specific variables such as temperature and precipitation, with data recorded between 1960 and 2014 from a panel dataset of 174 countries. Recent advances in artificial intelligence, especially in machine learning, have stimulated research in many fields by incorporating the related algorithms to predict GDP growth, with or without climate change predictors. The use of machine learning algorithms to predict GDP growth has been the subject of new studies
such as [11], where the author used random forest (RF) and gradient boosting (GB) models to forecast the real GDP growth of Japan. He used data from 1981 to 2018, gathered from the Bank of Japan and the International Monetary Fund (IMF), to make annual real GDP growth predictions, and found that GB and RF perform more accurately than the benchmark forecast model over the 2001–2018 period. In the same vein, the working paper [5] applies machine learning techniques (recurrent neural networks (RNN), Elastic Net, and the Super Learner) to economic prediction. These techniques are deployed to forecast the GDP growth of seven countries (Germany, Mexico, the Philippines, Spain, the United Kingdom, the United States, and Vietnam) based on quarterly and annual data gathered from the IMF's World Economic Outlook (WEO) database. The results show that the machine learning algorithms outperform the forecasts of the IMF's WEO. A more sophisticated machine learning technique, the feedforward multilayer perceptron, a particular type of artificial neural network, was implemented by [7] to develop a forecasting model of GDP fluctuations. Using data from the UK Office for National Statistics recorded between 1971 and 2013, the author found that machine learning algorithms can be helpful in public and financial management. Another work that used such sophisticated models is [9]. The authors introduced two deep learning algorithms, namely the recurrent neural network and Long Short-Term Memory (LSTM), to forecast GDP fluctuations in Indonesia, using GDP and inflation percentages as predictors and data from the World Bank, Macrotrends, and financial trade data collected from 1990 to 2020. They found that the LSTM and RNN algorithms can predict the variation of GDP growth with an accuracy of 90 percent. According to [6], many works do not take into consideration the time dimension of the data. Given this literature review, our work's task is twofold. First, it tries to predict GDP growth based on the best-known and most recommended drought indexes related to climate change, which are the standardized precipitation index and the standardized precipitation-evapotranspiration index. Second, it builds a robust model that uses the LSTM algorithm with the ultimate purpose of predicting GDP growth in the Moroccan case.
3 Methodology
To build a robust model that can make predictions properly and accurately, a set of steps must be taken carefully. Among the most important of these steps are data preprocessing, which consists of cleaning and handling the data, model training, validation, and accuracy evaluation. Following these steps, this section is organized as follows. First, we give a brief description of the dataset and the features used. Second, we present our data preprocessing approach. Lastly, we explain the predictive models chosen.
3.1 Data Description and Preprocessing
Given that our purpose is to predict quarterly GDP growth based on drought indices, we have used monthly GDP data from the Moroccan Financial Studies and Forecasting Department database (available at https://manar.finances.gov.ma). Precipitation and temperature data are gathered from the World Bank Group Climate Change Knowledge Portal (CCKP database, available at https://climateknowledgeportal.worldbank.org/download-data); the data provided in this database are measured monthly, with temperature recorded as mean, minimum and maximum. These data are required for computing the quarterly drought indices described hereafter. The data used in this work cover the 1998–2019 period. Given that the aim of this study is to forecast GDP growth via drought indices, their respective formulas are provided below. The GDP growth variable (ΔGDP) was calculated as the variation between the same quarter of two consecutive years (Eq. 1):

ΔGDP(n) = ( GDP(n) − GDP(n − 1) ) / GDP(n − 1)    (1)

where GDP(n) is the GDP of a given quarter of year n and GDP(n − 1) is the GDP of the same quarter of year n − 1. The standardized precipitation index (SPI) characterizes meteorological drought on a range of timescales. It is computed as follows (Eq. 2):

SPI = (Pti − Ptm) / σ    (2)
where Pti is the precipitation of the quarter considered, Ptm is the average precipitation of the series on the time scale considered, and σ is the standard deviation of the series on that time scale. Finally, the standardized precipitation-evapotranspiration index (SPEI) is calculated based on precipitation (Pt) and potential evapotranspiration (PET) [2]. To compute these indices, we used the SPEI R library [1, 10], which provides two calculation methods (the Thornthwaite method and the Hargreaves method). Here we have chosen to compute the SPEI with both methods, so as to compare their forecasting accuracy. Table 1 summarizes these calculations. Our dataset is composed of 84 observations, taking into account that the first four quarters are dropped because they are used to calculate the following year’s (1999) GDP growth rates. Moreover, this dataset encompasses four variables: the first one, GDP GROWTH, constitutes the forecasted variable, and the next three are designated as features, namely the SPI Index, the SPEI Thornthwaite Index and the SPEI Hargreaves Index.
Table 1. Descriptive statistics of the data.

            GDP GROWTH   SPI Index   SPEI Thornthwaite Index   SPEI Hargreaves Index
Frequency   84           84          84                        84
mean        5.408        0.019       0.003                     0.002
std         3.123        0.824       0.866                     0.845
min         -1.840       -1.809      -1.856                    -1.825
25%         3.522        -0.473      -0.560                    -0.511
50%         4.915        -0.040      0.005                     -0.110
75%         6.917        0.496       0.487                     0.554
max         14.010       1.972       1.931                     1.941
These features are computed with the SPEI R library according to the above equations and exported to a CSV file for later processing.
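To make this preprocessing step concrete, the following sketch shows how the quarterly GDP growth of Eq. (1) and the simple standardization behind Eq. (2) could be computed with pandas; the file name and column names are illustrative assumptions, and the SPEI values themselves would still come from the SPEI R package as described above.

```python
import pandas as pd

# Hypothetical input: one row per quarter, 1998-2019, with quarterly GDP
# and cumulated precipitation (column names are assumptions).
df = pd.read_csv("morocco_quarterly.csv", parse_dates=["quarter"])

# Eq. (1): growth of a quarter relative to the same quarter one year earlier
# (multiplied by 100 so the values are in percent, as in Table 1).
df["gdp_growth"] = (df["gdp"] - df["gdp"].shift(4)) / df["gdp"].shift(4) * 100

# Eq. (2): standardized precipitation on the chosen time scale.
df["spi"] = (df["precip"] - df["precip"].mean()) / df["precip"].std()

# The first four quarters (1998) have no year-on-year reference and are dropped.
df = df.dropna(subset=["gdp_growth"]).reset_index(drop=True)
print(df[["quarter", "gdp_growth", "spi"]].head())
```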
Fig. 1. Data analysis plot.
Figure 1 plots the distribution of all variables, from which we can assess their stationarity (seasonality and trend). Here we assume that the data are stationary, which allows us to run our model. Figure 2 presents the correlation between the forecasted variable and the features; it shows the absence of any linear relationship between the inputs and the output of our model.

3.2 Model
As raised in the previous subsection, the current problem is a nonlinear one, which means that a traditional artificial neural network (ANN) cannot give us relevant results; consequently, we have adopted a class of neural networks called recurrent neural networks (RNN) to handle this nonlinearity. Although ANNs are widely used in time-series prediction, they present a shortcoming due to their
Fig. 2. Features correlation with GDP Growth analysis plot.
architectures that do not take sequence data into consideration. RNNs, in contrast, provide an alternative way to deal with this kind of data and allow consistent and good forecasting of observations [9]. RNNs enable previous outputs to be used as inputs while maintaining hidden states; this lets the current hidden layers be updated based on previous knowledge. The time series used in this study is made up of a sizable amount of historical data and requires strong computational resources to handle. For these reasons, the RNN family provides a type of algorithm known as LSTM (Long Short-Term Memory) that holds only relevant information and is capable of resolving the long-term dependency problem. Forget, Input and Output are the three primary gates that make up this algorithm; these gates enable the LSTM model to retain prior knowledge or discard superfluous input. The LSTM architecture is shown in Fig. 3, and the formula of each gate is given in Eqs. (3), (4) and (5).

The forget gate unit f_i^(t) (for time step t and cell i) sets this weight to a value between 0 and 1 via a sigmoid unit [3]:

f_i^(t) = σ( b_i^f + Σ_j U_{i,j}^f · x_j^(t) + Σ_j W_{i,j}^f · h_j^(t−1) )    (3)
The input gate unit g_i^(t) has a calculation similar to that of the forget gate (a sigmoid unit that gives a gating value between 0 and 1), with its own parameters:

g_i^(t) = σ( b_i^g + Σ_j U_{i,j}^g · x_j^(t) + Σ_j W_{i,j}^g · h_j^(t−1) )    (4)
Fig. 3. LSTM architecture (from Wikipedia.org).
The last gate of the LSTM cell is the output gate q_i^(t), which, like the other gates, uses a sigmoid unit for gating:

q_i^(t) = σ( b_i^o + Σ_j U_{i,j}^o · x_j^(t) + Σ_j W_{i,j}^o · h_j^(t−1) )    (5)
where x^(t) is the current input vector, h^(t) is the current hidden layer vector containing the outputs of all the LSTM cells, and b^f, U^f and W^f are respectively the biases, input weights and recurrent weights of the forget gates.
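For readers who prefer code to notation, the numpy sketch below evaluates the gate equation for a single time step; the dimensions and the random parameters are illustrative assumptions, not values from the fitted model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_inputs, n_cells = 3, 8            # assumed sizes: 3 features, 8 LSTM cells
rng = np.random.default_rng(0)

# Illustrative forget-gate parameters (Eq. 3): bias b_f, input weights U_f,
# recurrent weights W_f.
b_f = rng.normal(size=n_cells)
U_f = rng.normal(size=(n_cells, n_inputs))
W_f = rng.normal(size=(n_cells, n_cells))

x_t = rng.normal(size=n_inputs)     # current input vector x^(t)
h_prev = np.zeros(n_cells)          # previous hidden state h^(t-1)

# Forget gate f^(t); the input (Eq. 4) and output (Eq. 5) gates use the same
# expression with their own parameter sets (b_g, U_g, W_g) and (b_o, U_o, W_o).
f_t = sigmoid(b_f + U_f @ x_t + W_f @ h_prev)
print(f_t)   # values in (0, 1): how much of each cell state is kept
```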
4 Results
This section will detail the experiment’s approach, present the results of the algorithms’ training and testing, and then discuss the findings.

4.1 Experiment
Fitting Models: To fit our models, we used Keras from the Python packages. We first loaded the dataset and split it into training, test and validation data. The LSTM algorithm was implemented with the ReLU activation function and 400 epochs. The first figure (Fig. 4) shows the model fitted with the SPI Index and the SPEI Thornthwaite Index as features to predict quarterly GDP growth, while the second figure (Fig. 5) shows the model fitted with the SPI Index and the SPEI Hargreaves Index as features.
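A minimal Keras sketch of this setup is given below; the layer size, the 80/20 split and the window shape are our own assumptions, since the text only states that an LSTM with ReLU activation was trained for 400 epochs.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# X: (n_samples, timesteps, n_features) holding the SPI/SPEI features,
# y: quarterly GDP growth. Shapes and data below are placeholders.
n_samples, timesteps, n_features = 80, 1, 2
X = np.random.rand(n_samples, timesteps, n_features)
y = np.random.rand(n_samples)

split = int(0.8 * n_samples)                 # assumed train/test split
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = keras.Sequential([
    layers.LSTM(32, activation="relu", input_shape=(timesteps, n_features)),
    layers.Dense(1),                         # single output: GDP growth
])
model.compile(optimizer="adam", loss="mse")

history = model.fit(X_train, y_train, epochs=400,
                    validation_data=(X_test, y_test), verbose=0)
print(history.history["val_loss"][-1])
```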
Fig. 4. First Model fitting.
Fig. 5. Second Model fitting.
As we can see in Figs. 4 and 5, the two models begin to converge as the number of epochs rises; consequently, the gap between the train and test losses becomes smaller and smaller. Moreover, these models could perform even better if we added some hidden layers, modified the activation function or adjusted other parameters.

Forecasting and Simulation: The outcomes of the predictions provided by these two models are shown in Figs. 6 and 7. The success of these models can be seen graphically, as the predicted values closely follow the trend of the actual values. In other words, the LSTM model is quite good at forecasting the results. To conduct a thorough accuracy assessment of our models, we calculated the root mean square error (RMSE) and mean absolute error (MAE), given by the following formulas:

RMSE = √( (1/n) Σ_{i=1..n} (ŷ_i − y_i)² )

MAE = (1/n) Σ_{i=1..n} |ŷ_i − y_i|

where ŷ_1, ŷ_2, ..., ŷ_n are the predicted values, y_1, y_2, ..., y_n are the observed values, and n is the number of observations.
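These two metrics can be computed directly from the predicted and observed series, for instance as in the short numpy sketch below (the arrays are placeholders, not values from the experiment).

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_pred - y_true))

# Placeholder series; in practice these are the observed and the
# LSTM-predicted quarterly GDP growth values of the test set.
y_true = np.array([4.9, 3.5, 6.9, 5.4])
y_pred = np.array([4.2, 4.1, 6.0, 5.9])
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```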
The computed RMSE and MAE of each model are reported in Table 2:

Table 2. Accuracy metrics.

         First model (SPEI Thornthwaite Index)   Second model (SPEI Hargreaves Index)
RMSE     2.241                                   2.199
MAE      1.904                                   1.812
Fig. 6. First Model prediction results.
4.2 Results and Discussion
As we can learn from Figs. 6 and 7, using LSTM algorithms the Moroccan GDP growth can be predicted from drought indices, and hence from other climate change indices. We can also infer that some indices perform better than others as features: with reference to the accuracy metrics in Table 2, we can say with a certain degree of confidence that the LSTM model with the SPEI index computed according to the Hargreaves method is more accurate than the one using the other calculation method of the same index.
Fig. 7. Second Model prediction results.
Our findings are in line with those of the literature review: machine learning algorithms offer more effective tools for making predictions than traditional ones. The majority of studies have compared their results to
benchmark forecasting models developed by the IMF and central banks. Additional research will be conducted to better understand the relationship between Moroccan GDP growth and environmental or climate change indices. Additionally, we developed an LSTM model with few hidden layers and few parameters, but the same kind of model with additional parameters can yield excellent results. The accuracy and robustness of the models were assessed with two widely used metrics, RMSE and MAE; additional performance measures could be employed to further highlight the accuracy and robustness of machine learning models.
5 Conclusion
The study’s findings support the use of machine learning methods for macroeconomic data forecasting. Given the manner of index calculation, the deep learning technique used in this work to build long short-term memory models for the 1999–2019 timeframe produces forecasts with varying levels of accuracy; here, RMSE and MAE are used to gauge accuracy. Machine learning models concentrate on prediction, whereas conventional econometric models concentrate on explaining correlations. Deep learning models in particular are regarded as “black boxes”, which means they are not a viable option for determining how the independent variables affect the dependent variable or for establishing a causal relationship. But, as the findings of this study and of numerous prior ones show, deep learning models frequently display high forecasting accuracy. In subsequent work, we will attempt to model GDP growth forecasts using various environmental indices as new features and by incorporating new deep learning techniques, in an effort to obtain strong models that yield more accurate and reliable outcomes.
References
1. Beguería, S., Vicente-Serrano, S.M., Reig, F., Latorre, B.: Standardized precipitation evapotranspiration index (SPEI) revisited: parameter fitting, evapotranspiration models, tools, datasets and drought monitoring. Int. J. Climatol. 34(10), 3001–3023 (2014)
2. Bekri, M.H., et al.: Weather drought index prediction using the support vector regression in the Ansegmir watershed, Upper Moulouya, Morocco. J. Water Land Dev. (50), 187–194 (2021). https://doi.org/10.24425/jwld.2021.138174, http://journals.pan.pl/Content/121313/PDF-MASTER/2021-03-JLWD-20-Hmaidi.pdf
3. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
4. Grossman, G.M., Krueger, A.B.: Economic growth and the environment. Q. J. Econ. 110(2), 353–377 (1995)
5. Jung, J.K., Patnam, M., Ter-Martirosyan, A.: An algorithmic crystal ball: forecasts based on machine learning. International Monetary Fund (2018)
6. Kahn, M.E., Mohaddes, K., Ng, R.N., Pesaran, M.H., Raissi, M., Yang, J.C.: Long-term macroeconomic effects of climate change: a cross-country analysis. Technical report, National Bureau of Economic Research (2019)
7. Kouziokas, G.N.: Machine learning technique in time series prediction of gross domestic product. In: Proceedings of the 21st Pan-Hellenic Conference on Informatics, pp. 1–2 (2017)
8. Raihan, A., Tuspekova, A.: The nexus between economic growth, renewable energy use, agricultural land expansion, and carbon emissions: new insights from Peru. Energy Nexus, 100067 (2022)
9. Sa’adah, S., Wibowo, M.S.: Prediction of gross domestic product (GDP) in Indonesia using deep learning algorithm. In: 2020 3rd International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), pp. 32–36. IEEE (2020)
10. Vicente-Serrano, S.M., Beguería, S., López-Moreno, J.I.: A multiscalar drought index sensitive to global warming: the standardized precipitation evapotranspiration index. J. Clim. 23(7), 1696–1718 (2010)
11. Yoon, J.: Forecasting of real GDP growth using machine learning models: gradient boosting and random forest approach. Comput. Econ. 57(1), 247–265 (2021)
Natural Language Processing and Motivation for Language Learning Moulay Abdellah Kassimi(B) and Abdessalam Essayad ENSA, Agadir, Morocco [email protected], [email protected]
Abstract. Many learners have a great interest in technology such as computers and mobile since it is a part of their daily life. The aim of Artificial Intelligence (AI) in education is to develop intelligent tutoring systems (ITS) to support learner learning and reduce student-teacher contact. The most challenging task was to translate the best educational approaches into a Natural Language Processing (NLP) application. Hence, this article explores the relevance and uses of NLP in the context of online language learning, focusing on the use of technology to accelerate the language acquisition. This includes the innovative Artificial Intelligence applications for the analysis of learner emotions by ITS to increase user engagement. The data in this study is analyzed, based on a qualitative approach, and collected from the web and logs of LMS. Keywords: ICT · Education · Motivation · Natural Language Processing · Applied Linguistics · Language Learning · Context · Semantics
1 Introduction

To avoid a total failure of the national and international education system confronted by the emergence of Covid-19, most institutions adopted digital pedagogy. Online learning thus becomes the most logical solution to ensure pedagogical continuity based on digital technology [1]. Moreover, teachers are faced with the challenge of adopting new ways of teaching and of focusing on pedagogical strategies for online teaching in order to implement flexible learning. The lack of appropriate online courses and the importance of adopting new ways of teaching highlight the need to motivate learners and, therefore, to keep up their engagement in online courses. An online course is, for the learner, a way to increase access to training that is flexible in time and space; it is also an opportunity to build autonomy. However, the success of an online course depends on the commitment and autonomy of the learner. Fortunately, an online course is itself a setting that can foster motivation. This article is an attempt to solve the motivation problem, a central problem for language learning, in an online course, by encouraging learners to continue learning through ICT. With ICT, the teacher has the possibility of using new modes and devices to transmit knowledge. NLP is used to solve a wide range of problems in the context of language learning, such as reading, writing, and language tutoring applications. NLP is also used in learning to discover information and social connections and to predict and
advise on learning [1]. This information is then used to create a student learning profile to predict his performance in the online course. Thus, the aim is to be able to provide personalized learning. This can be done through various Learning Analytics, and we use NLP to get this information. An understanding of the role of NLP is important for an online course to develop technological systems in language learning. Our proposition can provide an effective solution to keep up learner engagement in online courses increasing learning motivation through the ICT. The goal of ICT is pedagogical, it improves learner learning through innovative approaches. These approaches attempt to create authentic learning contexts rather than rely entirely on traditional ways of teaching and learning. The use of NLP technology to enhance language learning makes learners more confident to learn the language. The focus of this article will be the application of information extraction technology to enable data-driven personalization in the context of e-learning. We are interested in artificial intelligence in order to understand the human natural language.
2 Literature Review

In recent years, learners have become interested in technology and in using computers to learn a language. One way to increase the engagement of learners is to introduce ICT to conduct learning [2]. Therefore, an online course should be adapted for educational purposes to make courses more appealing to the learners. The online course must also be adapted to the cultural demands of the learners [3]. ICTs should thus be introduced in language learning to increase the learners’ motivation. An online course is for the learner a kind of extrinsic motivation, while a learner is supposed to have an intrinsic motivation [4]; ICT itself is an incentive that fosters intrinsic motivation [5]. In the article of Azmi [6], the use of ICT encourages autonomy, motivates learning, and improves performance in language learning. Unfortunately, as indicated by Jordan et al. [7], enrolment is often extremely high in online courses, with a median incompletion rate of 87.4%. Moreover, Connectivist principles [1] attract enrolment but do not ensure completion; thus, active learners’ activity significantly decreases over time [8]. The author of the same article suggests adapting motivational formative assessment tools used in massively multiplayer online role-playing games to Connectivist Massive Open Online Courses (cMOOCs). Natural Language Processing is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language. It allows us to perform useful tasks such as language translation, question answering (QA), sentiment analysis, chatbots, and speech recognition. The objective of this paper is to apply NLP technology to enable data-driven personalization in the context of online courses [9]. Personalization modifies an underlying system to adapt the teaching to the learner according to his behavior and profile [10]. Learner interaction in an online environment leaves a lot of digital traces1 behind that can be collected and analyzed with the aim of optimizing teaching and learning using NLP.

1 This refers to the collection and analysis of learners’ interactions with a computer-based tutoring system, such as the learner’s engagement with the system, exercises done, time taken to complete, and time spent reading and re-reading.
The major challenge is to choose the right tool to meet the learners’ educational goals and needs.
3 Research Questions

Motivation is a central problem for online courses, especially for language learning. Nowadays, on online platforms there is no real interaction between the tutor and the learners, and therefore their motivation to learn the language is low. The lack of appropriate online courses and the importance of adopting new ways of teaching feed the need to develop AI-based applications that allow learners to stay motivated in language learning. The idea of the present study came from the observation that it is difficult to encourage learners to continue learning; our proposition can provide an effective solution to this issue. We pinpoint some NLP techniques and technologies, giving special attention to the following questions: how can we maintain motivation when the learner learns a language in an online environment, and what kind of content do students engage with? Extracting information from learners’ feedback, such as exercises done, represents a promising solution. Indeed, with NLP we can create conditions favoring digital pedagogy in order to increase motivation and interactivity in language learning.
4 Motivation and Artificial Intelligence 4.1 Motivation and Active Pedagogies Online learning presents a challenge that learners must have the motivation to learn in isolation. It encourages interaction between learners and offers new materials, new activities, and an environment for collaborative work. Motivation is a crucial factor for success in online learning contexts. Learners with special abilities could not meet longterm goals without sufficient motivation [11]. This motivation is even more important in a language learning context in which the situation is more complex due to the nature of the language itself. The methodology adopted by the online course or tutor can also influence learners’ motivation. Hence, it is crucial that tutors adapt their methodology according to learners’ interests. ICT, through NLP, can increase flexibility in time and space and facilitate learning. Online learning appears to focus mainly on ICT knowledge while overlooking the relationships between ICT, pedagogy, and content, which are missing in most online courses. Therefore, tutors need to integrate pedagogical strategies during online learning. NLP, a field of Artificial Intelligence, gives machines the ability to read, understand and derive meaning from human languages. It creates intelligent systems capable of understanding, analyzing, and extracting meaning from text and speech. Therefore, by acting on pedagogy we can guide learners according to estimated capacities or adapt content. The online course offers grounds for active pedagogies and represents an authentic framework for implementing the action-oriented approach. Given its characteristics and functionalities, online course arouses the enthusiasm of tutors who have used it in their teaching activities. Furthermore, the online course presents new work situations to be
exploited by appealing to the theory of connectivism according to which learning can take place through online peer networks [12]. From this perspective, learning is not an accumulation but a process of construction using interactive exchanges between learners by forming learning communities. This is reminiscent of the theory of the proximal zone of development which links the learner with a more learned one [13], and the principle of scaffolding in which social dynamics come to the center stage. 4.2 Artificial Intelligence and Education The aim of Artificial Intelligence (AI) in education is to develop tools, known as intelligent tutoring systems (ITS), to support student learning and to reduce contact between learners and teachers. ITS adapts the next-step instructions monitoring the learner’s interactions, responses, and feedback. It adopts an instructionist pedagogy where learning pathways rather than learning outcomes are automatically personalized for learners [14]. ITS makes various pedagogical choices. Indeed, ITS aims to enable learning in a meaningful and effective way using a variety of computer technologies. They aim to provide immediate and personalized instructions or feedback to learners without requiring the intervention of a human teacher. However, most are designed for mathematics and physics rather than language learning. In recent years, other applications of AI are developed such as automatic writing, automatic summarizing, and chatbots to interact with learners. In this tool where there is continuous interaction, the NLP model is a more dynamic and active learning system [15] which adapts itself to individual learners. In order to learn the language, understanding the role of AI and NLP is important for applied linguists. Thus, an awareness of learning processes is essential to develop ITS systems. Moreover, the same system uses concepts from linguistic fields such as semantics and morphology. 4.3 Natural Language Processing in Education Natural Language Processing (NLP) is a technology-based artificial intelligence that helps computers understand human language and extract relevant information. In short, NLP gives computers the ability to understand human language [16]. Machine learning also, a subfield of AI, is concerned with developing methods to teach computers to learn to perform different tasks automatically by “training” computers to recognize patterns of information from a corpus related to a given task [15]. Based on machine learning, NLP technologies decipher the human language and understand the meaning. To motivate learners in an online course, it is important that we create engaging online learning situations through careful consideration of pedagogical design. The research progress in recent years argues that NLP can have a positive impact on learning. It has the potential to solve larger problems that learners face in education especially in the context of language learning. NLP provides assistance for writing, vocabulary, assessment procedures, as well as acting in pedagogy. It also builds systems that can perform tasks such as machine translation, grammar checking, and other “Big Data” applications. In the context of language learning in an online course, NLP extracts personalized information from the online environment so students can continue learning. Indeed, the learner leaves a lot of digital traces behind that can be collected and analyzed
to optimize learning. With personalization and creation of user profiles, applications can store individuals’ preferences, then, to identify sentiment, emotions, satisfaction versus dissatisfaction, and so on. Depending on the case, the orientation of the learner optimizes learning and increases online learner engagement such as the services offered by various online e-commerce applications. This orientation leads to differences in concentration and attention; thus, the mental work will disengage the learner and lose attention. In the ITS system, learners must be able to easily navigate to find course materials, vocabulary, assessment, etc. Course materials should be designed in a consistent way to reduce learner confusion and to motivate them to continue and to enjoy learning. The online course requires high levels of motivation; that’s why ITS should check comprehension by asking quick questions to check learner understanding. Moreover, learners, especially those who begin language learning, will also benefit from vocabulary, transcripts, and translation. NLP has been used successfully in educational settings to identify problems. NLP can therefore identify individual differences in learners and has focused on the prediction of vocabulary knowledge, working memory, and sentiment analysis. NLP applications can also identify semantic trends during training, which can help to adapt pedagogy and new ways of teaching by orienting learners to another segment of lessons. This idea may be an effective means to motivate and to help ensure success for learners. Online systems will, then, understand written and spoken human discourse to generate natural language output. 4.4 Applied Linguistics and Language Learning NLP relies on syntactic and semantic analysis techniques to understand grammatical rules and to derive meaning [17]. It contributes to several linguistic tasks including text summarization, sentiment analysis, and part-of-speech tagging. Applied Linguistics aims to provide solutions to real-life problems which include language learning. It covers a wide variety of fields including language acquisition, language teaching, discourse analysis, and translation studies. Applied Linguistics is also used throw linguistic analysis by diagnosing learning difficulties and solving such problems including looking for a compatible strategy [18]. In the context of online course of language learning, ITS must determine which interactive techniques best solve the difficulties related to teaching language. Examining response and errors made by learners; it is possible to determine fields that need reinforcement in teaching. Errors contain valuable information on the strategies that people use to acquire a language [19]. Thus, error analysis, a type of linguistic analysis, focuses on the errors that learners make. Using NLP in the education setting and English grammar, researchers are trying to create solutions to these issues related to applied linguistics. Various problems are faced by tutors and learners alike to understand. The use of effective linguistic tools such as grammar, syntax, and textual patterns are very effective for learning and assessment of the text.
5 Method and Data Analysis

Data is a crucially important resource for any kind of technology in the AI era. NLP requires a very large corpus to function effectively, and it needs to be trained on high-quality data.
Training data refers to the initial data used to develop a machine learning model, from which the model defines its rules. A corpus can be made up of newspapers, books, and tweets and contains texts that can be used to train AI systems. Moreover, labelling text is important for creating structured training to understand human language for tasks such as sentiment analysis and text generation. This study, based on a qualitative approach, collects information from the web and logs of LMS. NLP is used in learning analytics which is the use of intelligent data, learner-produced data, and analysis models to discover information and social connections and to predict and advise on learning [1]. Learner interaction in an online course environment leaves a lot of digital traces behind. This huge amount of aggregate data is known as Big Data. Those digital traces can be collected and analyzed for the sake of optimizing teaching and learning and enabling the personalization of data-driven learning. The goal of Learning Analytics is to understand how humans learn to adapt the teaching to the learner according to his behavior and profile. It also interests the emotions of the learners. With Learning Analytics and NLP, we can deploy engagement strategies such as finding synonyms, finding sentences in the best contexts, and the missing word as well as acting on pedagogy. Indeed, with Learning Analytics and NLP, we guide learners according to estimated capacities or adapt the content in order to engage learners. 5.1 The Classical System Architecture To honestly motivate, it is necessary that tutors create engaging online learning situations through careful consideration of pedagogical design. Indeed, an online curriculum which is poorly structured or irrelevant can reflect the levels of engagement with the learners. Therefore, it is crucial to develop content and structure that is specific to online learning. However, it is important to consider pedagogy before technology and align assessment with learning outcomes to adapt methodology according to learners’ interests in an online learning environment. The classical online course attempts to simulate the exchanges between the tutor and the learner. It makes sure that the user understands using rule-based systems before asking questions (Fig. 1). Depending on the case, the orientation of the learner will consist of continuing to learn the unit or directing the learner towards remedial possibilities.
Fig. 1. The Classical System’s Architecture
Traditional systems do not serve learners sufficiently: the curriculum is designed to suit as many learners as possible, but many of them have difficulty following along.
5.2 The Proposed System’s Architecture Engaging learners in online learning is vital for success. The challenge is to construct an individual course and to provide adaptive content for learners and how to measure their emotion and cognitive development throughout learning. Furthermore, participation levels can fluctuate during learning as learners face other deadlines, work, or personal commitments. In fact, declining levels of engagement usually indicate a poorly structured or irrelevant online curriculum. However, measuring the level of motivation is important in this situation. By connecting learner intent with pedagogical design, NLP can optimize ITS to accommodate learner needs based on the information gathered during the training, therefore affecting the learning process. The proposed system (Fig. 2) analyses learner feedback, clicks, and choices to identify sentiments and emotions. So, feedback provides information about learners’ progress that can help them to advance in their learning [20].
Fig. 2. The Proposed System’s Architecture
The pedagogical model supports the pedagogical elements that prepare learners for their learning task. It takes the curriculum and the ML-based learner model as input to select learning strategies. The learners may ask questions or choose among several options so as to decide what to do next, the system inferring the learner’s pedagogical strategy. The ML-based learner model stores learner profiles collected during the registration session; it also includes the learner’s knowledge level and traces that are updated during learning. Psychological states and learning paths are considered as parameters used by the NLP-based model to guide learning through sentiment analysis, which predicts capacities and adapts content, thereby allowing for the classification and guidance of learners throughout their learning process. The curriculum contains the course material to be taught among other courses and supports the elements of knowledge production. The system also examines strategies for improving motivation in online learning environments.
6 Strategies for Motivating Learners On the Internet, a wide range of information is available including websites, vocabulary, eBooks, article spinners, semantic search engines, etc. However, not all learners are familiar with learning in an online environment. Therefore, it is useful to provide support and learner orientation activity during learning. This allows learners to be motivated and to familiarize themselves with the online environment. Then, learners should be supported during training. We use NLP to understand the content and we use the ITS to extract information about learners, feedback, places, events, and better analyze sentiment. NLP, throw datadriven, can make various pedagogical choices to engage the learner in learning using sentiment analysis to guide them towards an understanding of the topic being studied. An online course contains at least two components in the curriculum model: Aids in the search for information and the glossary of fundamental concepts. Our system provides the two elements with a smart way to engage learners to continue learning. 6.1 Sentiment Analysis The ITS needs to take into account the literal meaning that semantics provides and the level of pragmatic analysis or understanding of what the text is trying to achieve. The sentiment analysis predicts and regulates the sensitive track of user feedback [21] and allows to maximize the effectiveness of curriculum and increase learner engagement. Our work focuses on the emotion identification of learners in ITS to act in pedagogy and personalize content. Sentiment analysis using deep learning models has received appropriate performance. Using it in ITS gives learners an opportunity to contextualize learning through analysis of feedback and choices. To predict the positivity and the negativity of the feedback, we have built a machine learning model using the lexicon-based approach as in [22]. The sentiment was extracted from feedback using the TfidfVectorizer module [23]. TF-IDF stands for “Term Frequency—Inverse Document Frequency”, is a technique widely used in Information Retrieval and Text Mining to compute a score for each word to signify its importance in the document and corpus. Indeed, and after pre-processing, we calculate the feedback’s sentimental features such as polarity and subjectivity by using TextBlob to tag the feedback as positive, neutral, and negative. Polarity shows us how positive or negative the sentence given is. It is a value change between −1 to 1. Subjectivity shows us whether the sentence is about a fact or opinion (objective or subjective). It is a value change between 0 to 1. The classification of the sentiments, therefore, is based on their polarities. 6.2 Learning Vocabulary in Context The first component in our motivation system is integrating vocabulary into the curriculum where learners are assessed online. It can be a good way to engage them in the ITS content. Thus, learners become motivated by learning vocabulary in context to increase the reading speed. Indeed, if we stop and use a dictionary every time, we come up with a new word and we end up using extra time. Therefore, in order to remember vocabulary, words and phrases need to be learned in context. In addition, when we have meaningful
context in relation to language learning efforts, we are more likely to understand what we are learning without using a dictionary. Techniques such as Word Embedding are used by NLP to predict the context, then, to find the most related words. The word embedding models allow the computation of the semantic similarity between words given target word. It tends to capture the semantic and syntactic meaning of a word. NLP considers that similar words occur more frequently together than dissimilar words. In other words, NLP considers the frequency of words which are found close together, so that words of similar meaning are often found together. Our system helps to find the right words and develop language skills directly on the application. BERT (Bidirectional Encoder Representations from Transformers) [24] is the model used in this article as a method of pretraining language representations to create models that NLP can fine-tune on a specific task with data. We are using also PyTorch to finetune transformers’ models. This can be done with simple steps. The representation using BERT will help us accurately retrieve results matching the learner’s intent and contextual meaning. It produces word representations that are dynamically informed by the words around them [25]. 6.3 Aids in the Search for Information The second component that motivates learners is the easy way in searching for information. This component is useful to enhance writing that is one of the most important skills. So, we can discover sentence examples in our semantic search engine, the capability to search with meaning. By computing similarity, we found sentences in the best contexts that cause the learner to pay attention, spark interest, and encourage the learner to continue learning. To improve the quality of our model we use a domain-specific context. In the Information Technology context, we use Computer Science articles and related subjects that share the basic language structures with the English language. So, to train our model we introduce many topics including programming, engineering, networks, databases, software, and hardware.
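To make the sentiment-tagging step of Sect. 6.1 concrete, the following sketch combines TF-IDF features with TextBlob polarity and subjectivity scores; the feedback sentences and the polarity thresholds are illustrative assumptions, not data from the system.

```python
from textblob import TextBlob
from sklearn.feature_extraction.text import TfidfVectorizer

feedback = [
    "I really enjoyed this vocabulary exercise",      # placeholder learner feedback
    "The reading task was confusing and too long",
    "It was fine",
]

# TF-IDF weights each word by its importance in the feedback corpus.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(feedback)
print(X.shape)   # (number of feedback items, vocabulary size)

# TextBlob returns polarity in [-1, 1] and subjectivity in [0, 1].
for text in feedback:
    sent = TextBlob(text).sentiment
    label = ("positive" if sent.polarity > 0.1
             else "negative" if sent.polarity < -0.1 else "neutral")
    print(f"{label:8s} polarity={sent.polarity:+.2f} "
          f"subjectivity={sent.subjectivity:.2f}  {text}")
```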
7 Implementations 7.1 Data Collection Data in the AI era is a pretty important source of any kind of technology. NLP requires a very large corpus to function effectively and it needs to be trained on high-quality data. A corpus can be made up of newspapers, books, and tweets and contains text that can be used to train AI systems. Moreover, labelling text is important for creating structured training to understand human language for tasks such as sentiment analysis and text generation. This study, based on a qualitative approach, collects information from the web and logs of ITS or LMS. NLP is used in learning analytics that is the use of intelligent data, learner-produced data, and analysis models to discover information and social connections and to predict and advise on learning. Learner interaction in an online course environment leaves a lot of digital traces behind. This huge amount of aggregate data is known as Big Data. Those digital traces can be collected and analyzed then to
optimize teaching and learning and enable the personalization of data-driven learning. The goal of Learning Analytics is to understand how humans learn to adapt the teaching to the learner according to his behavior and profile. It also interests the emotions of the learners. With Learning Analytics and NLP, we can deploy engagement strategies such to get synonyms, find sentences in the best contexts, discover the missing word as well as acting on pedagogy. Indeed, with Learning Analytics and NLP, we guide learners according to estimated capacities or adapt the content in order to engage learners. 7.2 BERT and Word Embeddings In this paper, we will use word embeddings technology produced by Google’s BERT to produce our own word embeddings, then, to extract features from text data. These embeddings are trained on large datasets, saved, and then used for solving other tasks. They are useful for keyword semantic search and information retrieval. Indeed, these representations will help us accurately retrieve results matching the learner’s intent and contextual meaning. BERT offers an advantage over models like Word2Vec and Fasttext, thus it produces word representations that are dynamically informed by the words around them. Therefore, tokens would be different for each sentence and obvious differences is captured like polysemy. We are installing the PyTorch interface for BERT by Hugging Face. BERT is a pretrained model that is trained with one task to help it form parameters that can be used in other tasks. In addition, BERT provides its own tokenizer, where it first checks if the whole word is in the vocabulary. After splitting the text into tokens, we then have to convert the sentence to a list of vocabulary. 7.3 Results To illustrate our approach, we’ll define an example of sentence with multiple meanings of the word bank: “After stealing money from the bank vault, the bank robber was seen fishing on the Mississippi river bank”. First five vector values for each instance of “bank” are: bank vault ([3.3596, −2.9805, −1.5421, 0.7065, 2.0031]) bank robber ([2.7359, −2.5577, −1.3094, 0.6797, 1.6633]) river bank ([1.5266, −0.8895, −0.5152, −0.9298, 2.8334]) In this sentence, we can confirm that the value of vectors is in fact contextually dependent. Cosine Similarity is a measurement that quantifies the similarity between two or more vectors. So, the similarity between two documents can be obtained by converting the words within the sentence into a vectorized form of representation. In vector values above for “bank”, the values differ and we can calculate the cosine similarity to make a more precise comparison. In “bank robber” vs “river bank” (different meanings), the Vector similarity is: 0.69, and in “bank robber” vs “bank vault” (same
meaning), the Vector similarity is: 0.94. Therefore, BERT can distinguish similar words under different contexts and learn their embeddings more accurately and the Cosine similarity is a good way to compare similarity between pairs of word vectors. In addition, we can find the most similar sentence using both sentence-transformers and a lower-level implementation with PyTorch and transformers. So, the most similar sentence contains the same meaning without using the same words. 7.4 Discussion Word embedding encodes the meaning of the word and does very well at capturing the semantics of analogy using word’s context to disambiguate polysemes. The principle is that “words that have similar semantic and contextual meaning also have similar vector”. In this paper we use BERT to generate contextualized word embeddings. However, it is very compute-intensive at inference time, thus there is a need to compute vectors every time for capturing meaning in production, then it can become costly. Whereas vectors in other models like Word2Vec and Fasttext are pre-calculated for each word and saved in a database and we will use the same vector for any sentences. In other hand, our system can use sentiment analysis or opinion mining to analyze learner feedback and comments gathered from online courses or social media. We can use the results to identify any areas of learner dissatisfaction, then and by acting on pedagogy we can guide learners according to estimated capacities or adapt content. However, it is difficult to recognize things like sarcasm and irony, then to skew the results. So, sentiment analysis does a really great job, but it is not perfect.
8 Conclusions Natural Language Processing offers rich opportunities to develop systems in support of language learning and provides a perfect solution to the motivation problems in the online context. This article described our structure developed for language learning in the online environment in order to reinforce learner motivation and to make the process of language learning more enjoyable. The architecture was presented and each element has been discussed. The main contribution consists of the NLP model for the IT Field. The focus was the application of information extraction technology to enable data-driven personalization in the context of e-learning, especially, the real-time meaning for most words using domain-specific vocabulary, and the recognition of the emotion of a learner as well as of profile, thereby adapting methodology according to learners’ interests. With a pre-trained BERT model, we create a high-quality model with minimal effort that NLP can fine-tune on a specific task. By associating learner intent with motivation system, NLP can optimize online courses accommodate learner needs. In the future, we consider using other strategies to motivate learners to develop models based on NLP. We will concentrate on translation using some techniques and the technologies of automatic translation by machine.
References 1. Siemens, G.: What are learning analytics (2010). http://www.elearnspace.org/blog/2010/08/ 25/what-are-learning-analytics/ 2. Heemskerk, I., Volman, M., Admiraal, W., ten Dam, G.: Inclusiveness of ICT in secondary education: students’ appreciation of ICT tools. Int. J. Incl. Educ. 16(2), 155–170 (2012) 3. Kreutz, J., Rhodin, N.: The influence of ICT on learners’ motivation towards learning English. Degree Project in English and Learning (Malmö Högskola Fakulteten för Lärande och Samhälle) (2016). https://muep.mau.se/bitstream/handle/2043/20747/Degree%20Project% 20Josefin%20&%20Natalie.pdf?sequence=2. Accessed 05 Oct 2018 4. Hartnett, M.: The importance of motivation in online learning. In: Hartnett, M. (ed.) Motivation in Online Education, pp. 5–32. Springer, Singapore (2016).https://doi.org/10.1007/978981-10-0700-2_2 5. Lepper, M.R., Malone, T.W.: Intrinsic motivation and instructional effectiveness in computerbased education. In: Snow, R.E., Farr, M.J. (eds.) Aptitude, Learning and Instruction. Conative and Affective Process Analyses, vol. 3, pp. 255–286. Lawrence Erlbaum Associates, Hillsdale (1987) 6. Azmi, N.: The benefits of using ICT in the EFL classroom: from perceived utility to potential challenges. J. Educ. Soc. Res. 7(1), 111–118 (2017) 7. Jordan, K.: Massive open online course completion rates revisited: assessment, length and attrition. Int. Rev. Res. Open Distrib. Learn. 16(3), 341–358 (2015) 8. Danka, I.: Motivation by gamification: adapting motivational tools of Massively Multiplayer Online Role-Playing Games (MMORPGs) for peer-to-peer assessment in connectivist Massive Open Online Courses (cMOOCs). Int. Rev. Educ. 66, 75–92 (2020) 9. Votch, V., Linden, A.: Do You Know What Personalization’ Means? Gartner Group Research Note (2000) 10. Smith, D.: There are myriad ways to get personal. Internet Week Online (2000) 11. Dörnyei, Z.: Motivational Strategies in the Language Classroom. Cambridge University Press, Cambridge (2006) 12. Siemens, G.: Connectivism: a learning theory for the digital age. Int. J. Instr. Technol. Distance Learn. 2(1), 3–10 (2005) 13. Vygotsky, L.: Thought and Language. MIT Press, Cambridge (1986) 14. Kukulska-Hulme, A., et al.: Innovating pedagogy 2020: Open University Innovation Report 8 (2020) 15. Vajjala, S.: Machine Learning in Applied Linguistics. The Encyclopedia of Applied Linguistics, pp. 1–8 (2012) 16. Fukushima, K., Miyake, S.: Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recogn. 15(6), 455–469 (1982) 17. Keezhatta, M.S.: Understanding EFL linguistic models through relationship between natural language processing and artificial intelligence applications. Arab World Engl. J. 10(4), 251– 262 (2019) 18. Khan, I.A.: Role of applied linguistics in the teaching of English in Saudi Arabia. Int. J. Engl. Linguist. 1(1), 105 (2011) 19. Richards, J.C.: A non-contrastive approach to error analysis. In: Richards, J.C. (ed.) Error Analysis: Perspectives on Second Language Acquisition, pp. 172–188. Longman, London (1974) 20. Oliveira Neto, J.D., Cornachione Jr., E., Nascimento, E.: Paving the path for those who come after: designing online instruction to make the best use of previous student’s experiences in light of communal knowledge. In: Proceedings of the Anais 2009, Washington D. C. Academy of Human Resource Development (2009)
21. Luo, F., Li, C., Cao, Z.: Affective-feature-based sentiment analysis using SVM classifier. In 2016 IEEE 20th International Conference on Computer Supported Cooperative Work in Design (CSCWD), pp. 276–281. IEEE (2016) 22. Arambepola, N.: Analysing the tweets about distance learning during COVID-19 pandemic using sentiment analysis. In: International Conference on Advances in Computing and Technology (ICACT–2020) Proceedings (2020) 23. Dichiu, D., Rancea, I.: Using machine learning algorithms for author profiling in social media. In: CLEF (Working Notes), pp. 858–863 (2016) 24. Devlin, J., et al.: Bert: Pre-training of Deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018) 25. Chris, M., Nick, R.: BERT Word Embeddings Tutorial (2019). http://www.mccormickml. com
Generating Artworks Using One Class SVM with RBF Kernel Mohamed El Boujnouni(B) Laboratory of Information Technologies, Chouaib Doukkali University, National School of Applied Sciences, BP 5096, El Jadida, Morocco [email protected]
Abstract. This work presents a new use of a supervised machine learning model to generate automatically artworks after a learning step. The forms of art that will be discussed in this paper are painting and drawing. The proposed model is based on the decision function of One Class Kernel Support Vector Machine (OCKSVM) with a Radial Basis Function (RBF). Experimental results using a large training dataset composed of portrait paintings and drawings belonging to six significant art movements in art history, including impressionism, post-impressionism, high renaissance, northern renaissance, rococo, and ukiyo showed that the proposed model is able to generate automatically and successfully new digital paintings and drawings. This model can be extended easily to generate other kinds of arts like literature, music, etc. Keywords: OCSVM · Decision function · Artworks · RBF Kernel
1 Introduction

Can a machine compose a new symphony after enjoying listening to classical music? Can a machine paint a portrait after observing some existing ones? Can a machine write a new poem after reading poetry collections? Answering these questions leads us necessarily to a related subject, which is artificial intelligence (AI) and, more specifically, machine learning (ML). As is known, AI is the development of computer systems that are able to perform tasks that would require human skills (e.g. visual perception, speech recognition, decision-making, etc.), while ML, a subfield of AI, gives machines the ability to learn and improve without the help of humans or new programming. ML can learn categories or labels y of input data x through two distinct strategies: 1) Generative: it learns a model of the joint probability p(y, x) of the inputs x and the label y, then uses Bayes’ rule to identify which category most probably generated the object x. 2) Discriminative: it estimates the posterior p(y|x) over category labels given inputs directly, or learns a direct map from inputs x to the class labels y. Theoretical and empirical analyses have shown that generative and discriminative strategies differ in terms of generalization behavior, as well as the speed and accuracy of learning [1–3].
Among a wide range of ML models, this work focuses on OCSVM [4] and aims to benefit from its high generalization capability, not to classify new unknown objects but to generate them automatically. The objects discussed here are artworks. OCSVM is an anomaly detection method invented by Schölkopf et al. [4] to handle training using only positive information (“one-class” classification). The purpose of OCSVM is to find the maximal-margin hyperplane which best separates the training data from the origin; its boundary is then used to decide whether or not a new sample is an outlier. OCSVM has very attractive properties: i) it has a rigorous mathematical foundation based on statistical learning theory, ii) it is a convex optimization problem, and hence has a unique global optimum, iii) it uses a soft boundary hyperplane for decision-making, which improves data description, iv) it can use kernel functions to map data points from input space to feature space and hence improve the description accuracy. This work is based on the idea introduced in [5], in which the authors used the decision function of OCSVM, constructed from a large dataset of real human faces, to automatically generate fake human faces. By analogy, instead of working with human faces, this paper proposes working with artworks (paintings and drawings). This paper is organized as follows. Section 2 presents a brief and recent review of the most relevant works regarding this study. Section 3 presents One Class SVM with its mathematical details and the basic idea used to generate new artworks. Section 4 shows the experimental dataset, settings and results of our method. Finally, the last section provides some concluding observations.
2 Related Works

Generating art with AI is a recent subject of study that has been gaining attention and popularity over the last decade. The first such original work of art, named Portrait of Edmond de Belamy (Fig. 1), was created in 2018. This masterpiece, achieved through Generative Adversarial Networks (GANs) [6], was sold for an important amount
Fig. 1. Portrait of Edmond de Belamy, from La Famille de Belamy (2018). Courtesy of Christie’s Images Ltd.
of money at auction. To produce artificial paintings, Obvious, the group who made this work, had trained a GAN on a selection of more than 15,000 classical portraits created between the 1300s and the 1900s [7]. GANs are based on competition between two convolutional neural networks: a generator G and a discriminator D. The former captures the distribution of some target data (e.g. distributions of pixels in images); it accepts random noise as input and gradually learns how to convert the noise into output images, while the latter distinguishes between real and generated images. The goal is to train these two models competitively, such that the generator creates data in a way that the discriminator can no longer decide whether it is real or synthesized. Many improvements regarding the resolution and quality of images produced by GANs have been proposed [8–11]. Among these works, [11] is the most effective because of its capability to control the image synthesis process of the GAN and to produce images with high quality. This control is performed through a new generator that starts from a learned constant input and adjusts the style of the image at each convolution layer based on the latent code. The results of this improvement are illustrated in [12] and a random example is shown in Fig. 2.
Fig. 2. Example of artworks generated by [12].
Another variant of GAN called Creative Adversarial Network (CAN) was proposed by [13]. This new version uses “stylistic ambiguity” to ensure that the arts generated will be novel but at the same time will not depart too much from acceptable aesthetic standards. Figure 3 shows an example of the artworks generated by CAN [13].
Fig. 3. Example of the artworks generated by CAN [13].
3 A Brief Overview of the Used Method

The method used in this paper was first introduced in [5] to automatically generate fake human faces. It is based on OCSVM with an RBF kernel. There are two different versions of one-class SVM: the first was proposed by Schölkopf et al. [4] and the second was suggested by Tax and Duin [14]. The work in [5] focuses on the first version, which learns a decision boundary (a separating hyperplane) achieving maximum separation between the examples of a training dataset and the origin, as illustrated in Fig. 4. Only new points that lie above the learned hyperplane are considered to belong to the same class as the training dataset. The sign of the decision function (the equation of the hyperplane) is used to determine their membership.
Fig. 4. Example of separating a class of interest using One-class SVM
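As a quick illustration of this decision rule, the following sketch uses scikit-learn's OneClassSVM with an RBF kernel; the data, image size, and hyperparameter values are placeholders and do not correspond to the exact experimental setup of this paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Placeholder training set: each row is a flattened image of the class of interest.
X_train = np.random.rand(500, 74 * 74)

ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma=1e-6)
ocsvm.fit(X_train)

# decision_function(x) corresponds to sum_i alpha_i K(x_i, x) - rho (Eq. (3) below, without
# the sign): a positive value means the sample lies on the training-data side of the hyperplane.
x_new = np.random.rand(1, 74 * 74)
print(ocsvm.decision_function(x_new))   # signed score
print(ocsvm.predict(x_new))             # +1 = inlier, -1 = outlier
```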
Mathematically speaking, suppose we have a set of training examples S = {x_1, x_2, ..., x_l} and a kernel map φ(x) which transforms the data of S from R^N to another space F. OCSVM aims at separating the training examples of S from the origin by a hyperplane defined by the equation w · φ(x) − ρ = 0, while maximizing the distance from this hyperplane to the origin. The OCSVM optimization problem was formulated as follows [4]:

$$\min_{w,\,\varepsilon,\,\rho}\;\; \frac{1}{2}\|w\|^{2} + \frac{1}{\nu\, l}\sum_{i=1}^{l}\varepsilon_i - \rho \qquad (1)$$

$$\text{subject to}\quad w\cdot\varphi(x_i) \ge \rho - \varepsilon_i \;\;\text{and}\;\; \varepsilon_i \ge 0,\qquad \forall\, i = 1,\ldots,l$$

where the ε_i, i = 1, ..., l, are nonnegative slack variables whose effect is to allow certain constraints to be violated. ν ∈ (0, 1] is a parameter that characterizes the solution: it fixes an upper bound on the fraction of outliers and a lower bound on the fraction of training examples used as support vectors. w and ρ are the normal vector and the offset of the decision boundary, respectively. Let α_i be the Lagrange multipliers corresponding to the inequality constraints in Eq. (1) and K(x_i, x_j) = φ(x_i) · φ(x_j) a kernel function. The problem given by Eq. (1) then becomes:

$$\min_{\alpha}\;\; \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\,\alpha_j\,K(x_i, x_j) \qquad (2)$$

$$\text{subject to}\quad 0 \le \alpha_i \le \frac{1}{\nu\, l},\qquad \sum_{i=1}^{l}\alpha_i = 1$$

The objective function of Eq. (2) can be minimized using quadratic programming tools. The non-null Lagrange multipliers are then used to decide whether or not a new sample x_new belongs to S. The decision function of OCSVM is given by the following equation:

$$f(x_{\mathrm{new}}) = \operatorname{sign}\big(w\cdot\varphi(x_{\mathrm{new}}) - \rho\big) = \operatorname{sign}\Big(\sum_{i=1}^{l}\alpha_i\,K(x_i, x_{\mathrm{new}}) - \rho\Big) \qquad (3)$$
Usually, the sign of this function is used to decide whether a new example x_new belongs to S (positive sign) or is an outlier (negative sign). In [5], however, this function was used differently: instead of checking its sign for a given new sample x_new, the sample was treated as unknown and the function was maximized with respect to it. This makes it possible to find an x_new that is located above the hyperplane.
The algorithm proposed by [5] to find xnew is:
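The algorithm listing itself appears as a figure in the original paper. As a rough stand-in, the sketch below only illustrates the idea described above: start from a random picture and iteratively increase the OCSVM decision function until it becomes positive, here using simple gradient ascent on the RBF decision function. The exact update rule of [5] (described later as Newton-based), the learning rate, and the epoch limit are assumptions of this sketch. With scikit-learn, the quantities support_vectors, alphas, rho, and gamma roughly correspond to a fitted model's support vectors, dual coefficients, offset, and kernel parameter.

```python
import numpy as np

def rbf(a, b, gamma):
    # RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def generate(support_vectors, alphas, rho, gamma, shape, lr=0.1, max_epochs=200):
    """Gradient ascent on f(x) = sum_i alpha_i K(x_i, x) - rho, starting from random noise."""
    support_vectors = np.asarray(support_vectors)
    alphas = np.asarray(alphas)
    x = np.random.rand(int(np.prod(shape)))            # picture with random values
    f = -np.inf
    epoch = 0
    for epoch in range(max_epochs):
        k = np.array([rbf(sv, x, gamma) for sv in support_vectors])
        f = float(np.dot(alphas, k) - rho)
        if f > 0:                                       # above the hyperplane: acceptable solution
            break
        # df/dx = sum_i alpha_i * K(x_i, x) * 2 * gamma * (x_i - x)
        grad = (2.0 * gamma) * ((alphas * k) @ (support_vectors - x))
        x = np.clip(x + lr * grad, 0.0, 1.0)            # keep pixel values in [0, 1]
    return x.reshape(shape), f, epoch
```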
4 Experimental Setting and Results

The dataset used in this study was collected from [15] and contains two subsets, "orgimg" and "faces_6class". Both cover six significant art movements in art history: 1) impressionism, 2) post-impressionism, 3) high renaissance, 4) northern renaissance, 5) rococo, and 6) ukiyo. "orgimg" is a collection of portrait paintings, while "faces_6class" contains the face areas cropped from "orgimg". Figure 5 shows six samples, one from each art movement, extracted from the faces_6class subset.
Fig. 5. Examples of portrait paintings extracted from [15]
The tuning of the hyperparameters of OCSVM was done using a combination of grid search and 3-fold cross-validation. The search ranges of ν and γ for the optimal values in terms of classification accuracy are {0.1, 0.2, 0.3} and {2⁻³⁰, 2⁻²⁸, ..., 2⁰}, respectively. Tables 1, 2, 3, 4, 5 and 6 show the portrait paintings generated for each art movement as described above. The first observation is that all the generated portraits are human faces, which means that OCSVM successfully learned meaningful patterns during training. Visually, the portraits belong to the art movement of their training subsets. The number of epochs needed to find a new portrait is very limited (generally less than 200), which is advantageous in terms of computational cost. All of the decision function values f(x_new) are positive, which means that the new artworks lie above the OCSVM hyperplane (acceptable solutions). The hyperparameter ν has the same value (0.1) for all the subsets, contrary to γ, which has a different optimal value in each experiment; this suggests that γ is more influenced by the training dataset than ν.
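A minimal sketch of this tuning step is given below. Since only one class is available, the score used here (the fraction of held-out training images predicted as inliers) is a simple proxy for classification accuracy and may differ from the exact scoring used in the paper; the grid values mirror the ranges stated above.

```python
import numpy as np
from itertools import product
from sklearn.model_selection import KFold
from sklearn.svm import OneClassSVM

def tune_ocsvm(X):
    """3-fold cross-validated grid search over (nu, gamma) for a one-class SVM."""
    nus = [0.1, 0.2, 0.3]
    gammas = [2.0 ** e for e in range(-30, 1, 2)]      # 2^-30, 2^-28, ..., 2^0
    best_params, best_score = None, -np.inf
    for nu, gamma in product(nus, gammas):
        fold_scores = []
        for train_idx, test_idx in KFold(n_splits=3, shuffle=True).split(X):
            model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X[train_idx])
            fold_scores.append(np.mean(model.predict(X[test_idx]) == 1))
        if np.mean(fold_scores) > best_score:
            best_params, best_score = (nu, gamma), float(np.mean(fold_scores))
    return best_params, best_score
```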
Table 1. Painting generated using impressionism art movement subset
Table 2. Painting generated using post-impressionism art movement subset
Table 3. Painting generated using high renaissance art movement subset
Table 4. Painting generated using Northern Renaissance art movement subset
Table 5. Painting generated using rococo art movement subset. Training subset: 100 portraits of 74 × 74 pixels; initialization: picture with random values; result: generated portrait. Parameters: γ = 9.5367431640625e−07 (= 2⁻²⁰), ν = 0.1, f(x_new) = 0.010089, epochs = 190.

Table 6. Painting generated using Ukiyo art movement subset. Training subset: 100 portraits of 74 × 74 pixels; initialization: picture with random values; result: generated portrait. Parameters: γ = 2.38418579101562e−07 (= 2⁻²²), ν = 0.1, f(x_new) = 0.005241, epochs = 191.
5 Conclusion

This paper presented a new use of the one-class Support Vector Machine with a Radial Basis Function kernel to automatically generate new artworks (paintings and drawings). The generation process goes through two steps. In the first step, OCSVM is trained on a large set of portrait paintings and drawings collected from six art movements; the hyperparameters of this classifier, ν and γ, are tuned using a grid search strategy and 3-fold cross-validation. In the second step, the decision function of the best model (i.e., the one with the optimal hyperparameters) is used to generate a new artwork using Newton's method. The experimental results have shown that our approach can produce wonderful new portrait paintings and drawings.
References 1. Efron, B.: The efficiency of logistic regression compared to normal discriminant analysis. J. Am. Stat. Assoc. 70, 892–898 (1975) 2. Ng, A.Y., Jordan, M.: On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In: Advances in Neural Information Processing Systems 17. MIT Press, Cambridge (2001) 3. Xue, J., Titterington, D.M.: Comment on “On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes”. Neural Process. Lett. 28, 169–187 (2008) 4. Scholkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1472 (2001) 5. El Boujnouni, M., Jedra, M.: Generating fake human faces using One Class SVM. In: The 2nd International Conference on Innovative Research in Applied Science, Engineering and Technology - IRASET, Meknes, Morocco (2022) 6. Goodfellow, I.J., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014) 7. Eshraghian, J.K.: Human ownership of artificial creativity. Nat. Mach. Intell. 2(3), 157–160 (2020) 8. Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. CoRR, abs/1809.11096 (2018) 9. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017) 10. Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957 (2018) 11. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401–4410 (2019) 12. https://thisartworkdoesnotexist.com/ 13. Mazzone, M., Elgammal, A.: Art, creativity, and the potential of artificial intelligence. Arts 8(1), 26 (2019) 14. Tax, D.M., Duin, R.P.: Support vector domain description. Pattern Recognit. Lett. 20(11–13), 1191–1199 (1999) 15. Yang, J.: Portrait Painting Dataset For Different Movements, Mendeley Data, V1 (2021). https://doi.org/10.17632/289kxpnp57.1
Multiobjective Evolutionary Algorithms for Engineering Design Problems

Youssef Amamou(B) and Khalid Jebari

LMA, FSTT, Abdelmalek Essaadi University, Tetouan, Morocco [email protected]

Abstract. As computation tools have evolved, along with the emergence of industrial breakthroughs, there has been a proliferation of real-world engineering design problems formulated as multiobjective models. Solving this type of problem using multiobjective evolutionary algorithms (MOEAs) has attracted much attention in the last few years. In this paper, we focus on the most up-to-date and efficient evolutionary multiobjective optimization (EMO) algorithms. The majority of these algorithms have been tested on theoretical test problems in order to validate the obtained results in terms of convergence and diversity. In this work, the test suite is built out of real-world engineering design problems, in order to verify the extent to which MOEAs are capable of producing good results. We proceed as follows: we present the MOEAs used and adjust the parameters of the algorithms in order to obtain the best results, choose problems that differ in terms of objective functions and constraints, model the problems with Matlab and solve them using the PlatEMO platform, and finally comment on and compare the obtained results.

Keywords: Metaheuristics · Evolutionary algorithms · Engineering design problems · Engineering optimization · Multiobjective Optimization
1 Introduction

Optimization is a process of searching the feasible solution space until no better solution can be found. In the case of multiobjective optimization, where several contradictory objective functions must be optimized simultaneously, the notions of trade-off and Pareto front come into play. Multiobjective optimization evolutionary algorithms provide a set of compromise solutions rather than a single solution, due to their population-based approach. The choice by the decision maker of one solution over the others requires additional knowledge of the problem, such as the relative importance of the different objectives. The presence of multiple contradictory objectives is inevitable in real-world optimization problems, which also include constraints; this has always made finding the best solutions difficult because of the cost and time consumption of the design approach. With the advancement of computing tools, a new opportunity to use multiobjective optimization algorithms has emerged and has given very promising results.

c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 318–331, 2023. https://doi.org/10.1007/978-3-031-26384-2_28
Examination of the search behavior of algorithms is an important issue in evolutionary optimization. In this paper, we perform a comparison of the most well-known current algorithms, which differ in terms of search procedure, on some of the most challenging engineering design problems. All the algorithm parameter tuning and problem coding was done using PlatEMO [7], a well-known Matlab platform in the field of multiobjective optimization. The obtained results turn out to be very promising, and the diversity of the types of problems treated makes it possible to confirm the applicability of the considered algorithms to the resolution of various engineering design problems.
2 Algorithms and Problems Definition

In this section, we introduce the different terms and concepts encountered in this document as well as the acronyms used. We also present the different algorithms used, with the pseudocode and operating mode of each, and we give the formulation of the engineering design problems used in the experimental part.

Multiobjective Optimization Problem (Mathematical Model): Mathematically, a MOO problem can be formulated as follows [2]:

$$\begin{aligned} \min\quad & f_m(x), \quad m = 1, \ldots, M \\ \text{s.t.}\quad & x = (x_1, \ldots, x_n) \\ & g_j(x) \ge 0, \quad j = 1, \ldots, J \\ & h_k(x) = 0, \quad k = 1, \ldots, K \end{aligned} \qquad (1)$$

Pareto Dominance: The concept of "optimality" used in single-objective optimization does not apply directly in the multiobjective context, so the notion of Pareto optimality [1] has to be introduced. The Pareto dominance relation has been a fundamental notion in this area since its introduction. We say that a solution x1 dominates a solution x2 if:
– solution x1 is no worse than x2 in all objectives;
– solution x1 is strictly better than x2 in at least one objective.
A solution is Pareto optimal when it is impossible to improve one of its objectives without degrading another.
Pareto Front: the set of points formed by the Pareto non-dominated solutions in the objective space.
Pareto Set: the set of points formed by the vectors of the non-dominated solutions in the decision space.
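As a small illustration of these definitions, the following sketch checks Pareto dominance (for minimization) and extracts the non-dominated points from a set of objective vectors; the example values are arbitrary.

```python
import numpy as np

def dominates(f1, f2):
    """True if objective vector f1 Pareto-dominates f2 (minimization):
    no worse in every objective and strictly better in at least one."""
    f1, f2 = np.asarray(f1), np.asarray(f2)
    return bool(np.all(f1 <= f2) and np.any(f1 < f2))

def pareto_front(F):
    """Indices of the non-dominated rows of F (each row is an objective vector)."""
    return [i for i, fi in enumerate(F)
            if not any(dominates(fj, fi) for j, fj in enumerate(F) if j != i)]

# Example: the second point is dominated by the first, so the front is [0, 2].
print(pareto_front(np.array([[1.0, 2.0], [2.0, 3.0], [0.5, 4.0]])))
```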
2.1 Multiobjective Evolutionary Algorithms
Multiobjective evolutionary algorithms (MOEAs) are able to approximate the Pareto optimal set in a single run. They have attracted a lot of research effort during the last 25 years and are still a hot research topic. In this paper we use algorithms with different evolution mechanisms (genetic algorithms, reference-point-based algorithms, decomposition-based algorithms) and different selection and offspring reproduction operators, in order to compare recent algorithms with different behaviors and from different classes.

Push and Pull Search (PPS) [11] is an algorithm for solving constrained multi-objective optimization problems. Its name is inspired by the procedure it uses, dividing the search process into two different stages, push and pull.

Push stage: a multi-objective evolutionary algorithm (MOEA) is used to explore the search space without considering any constraints, which makes it possible to reach infeasible regions and to approach the unconstrained Pareto front.

Pull stage: a modified form of a constrained multi-objective evolutionary algorithm (CMOEA), based on the information discovered and gathered in the push stage, is applied to pull the infeasible individuals obtained in the push stage toward the feasible and non-dominated regions.

The pseudocode of this algorithm is described below:

Algorithm 1: Push and pull search (PPS)
Input: N, the number of subproblems; Tmax, the maximum generation; N weight vectors λ1, λ2, ..., λN; T, the size of the neighborhoods; nr, the maximal number of solutions replaced by a child; Tc, the control generation for ε(k).
Output: NS, a set of feasible non-dominated solutions.
1. Decompose the constrained multi-objective optimization problem (CMOP) into N sub-problems associated with the weight vectors.
2. Generate a population P = {x1, ..., xN} randomly.
3. For each i = 1, ..., N, set B(i) = {i1, ..., iT}, where λi1, ..., λiT are the T closest weight vectors to λi.
4. Set the ideal point zj* = min over i = 1, ..., N of fj(xi).
5. Set k = 1; set rk = 1.0, PushStage = true, maxViolation = −1.
6. SetIdealNadirPoints(P, k); for each i = 1, ..., N, UpdateMaxViolation(xi, maxViolation).
7. while k ≤ Tmax do:
   calculate ε(k);
   for i = 1 to N do: generate a new solution yi; UpdateIdealPoint(yi, z*); UpdateMaxViolation(yi, maxViolation);
   k = k + 1; SetIdealNadirPoints(P, k); NS = NDSelect(NS ∪ P, N).
Adaptive Non-dominated Sorting Genetic Algorithm (ANSGA-III): NSGA-III, which is based on the NSGA-II framework, was suggested and applied to a number of unconstrained problems (with box constraints alone). Adaptive NSGA-III [10] extends NSGA-III to solve generic constrained many-objective optimization problems; this algorithm is suitable for solving large-scale problems. ANSGA-III modifies certain operators of NSGA-III to emphasize feasible solutions more than infeasible solutions in a population, with two main changes:
– keep the overall algorithm parameter-less;
– if all population members are feasible, or an unconstrained problem is supplied, the constrained NSGA-III reduces to the original unconstrained NSGA-III algorithm.
The word adaptive designates that ANSGA-III identifies non-useful reference points and adaptively deletes them and includes new reference points in addition to the supplied reference points. To show NSGA-III's ability to be hybridized with a decision-making technique, NSGA-III is applied with a few preferred reference points. Being otherwise similar to NSGA-III, the main difference is in the tournament selection procedure, which is described below:

Algorithm 2: Tournament Selection(p1, p2) procedure (ANSGA-III)
Input: p1, p2. Output: p'.
if feasible(p1) = TRUE and feasible(p2) = FALSE then p' = p1
else if feasible(p1) = FALSE and feasible(p2) = TRUE then p' = p2
else if feasible(p1) = FALSE and feasible(p2) = FALSE then
    if CV(p1) > CV(p2) then p' = p2
    else if CV(p1) < CV(p2) then p' = p1

This assumption holds if the data have been preprocessed to remove any correlations between the input features, which can be achieved using PCA. The quadratic approximation of the L1-regularized objective function then decomposes into a sum over the parameters:

$$\hat{L}(w) = L(w^{*}) + \sum_i \left[ \frac{1}{2} H_{i,i}\,(w_i - w_i^{*})^{2} + \lambda\,|w_i| \right] \qquad (4)$$

This approximate cost function has an analytical solution of the following form:

$$w_i = \operatorname{sign}(w_i^{*})\,\max\!\left( |w_i^{*}| - \frac{\lambda}{H_{i,i}},\; 0 \right)$$

This equation shows that L1 regularization favors sparsity, and sparse models are usually preferable to more complex ones [7, 19].
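A tiny sketch of this per-parameter update is shown below; the weight, Hessian-diagonal, and λ values are hypothetical and only illustrate how small weights are driven exactly to zero.

```python
import numpy as np

def l1_soft_threshold(w_star, h_diag, lam):
    """Closed-form minimizer of the per-parameter quadratic approximation with an
    L1 penalty: w_i = sign(w_i*) * max(|w_i*| - lam / H_ii, 0)."""
    return np.sign(w_star) * np.maximum(np.abs(w_star) - lam / h_diag, 0.0)

# Hypothetical values: the small second weight is zeroed out, illustrating sparsity.
print(l1_soft_threshold(np.array([0.8, -0.05, 0.3]), np.array([1.0, 1.0, 2.0]), lam=0.1))
# -> [ 0.7  -0.    0.25]
```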
2.3 Dropout
Dropout is a regularization technique for neural network models proposed by Srivastava et al. [9], in which randomly selected neurons are ignored ("dropped out") during training. Applying dropout to a neural network amounts to sampling a "thinned" network from it; the thinned network consists of all the units that survived dropout (Fig. 2). Training a neural net with n units can thus be seen as training a collection of 2^n possible thinned neural networks. These networks all share weights, so the total number of parameters is still O(n²), or less. For each presentation of each training case, a new thinned network is sampled and trained.
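The following sketch illustrates the per-layer Bernoulli masking described above for a single hidden layer (training-time forward pass only). The layer sizes and the ReLU activation are placeholder choices, and δ ~ Bernoulli(p) is treated here as the probability of keeping a unit, matching the formulas in this subsection.

```python
import numpy as np

def dropout_forward(y_prev, W, p=0.5, training=True):
    """One layer with dropout applied to its inputs:
    delta ~ Bernoulli(p), y_tilde = delta * y_prev, z = W @ y_tilde, y = a(z)."""
    if training:
        delta = (np.random.rand(*y_prev.shape) < p).astype(float)   # 1 = keep, 0 = drop
        y_prev = delta * y_prev
    else:
        y_prev = p * y_prev            # at test time, scale by p instead of sampling
    z = W @ y_prev
    return np.maximum(z, 0.0)          # placeholder activation a(.) = ReLU

y0 = np.random.rand(784)               # e.g. a flattened 28 x 28 input
W1 = 0.01 * np.random.randn(100, 784)
print(dropout_forward(y0, W1).shape)   # -> (100,)
```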
Fig. 2. An example of a neural network architecture trained with dropout. Each hidden neuron is randomly dropped out according to a Bernoulli distribution with parameter p.
Let's consider a neural network with K layers, where j ∈ {0, ..., K − 1} is the index of a network layer; indices j = 0 and j = K − 1 represent the input layer and the output layer, respectively. Let Z^(j) denote the input vector to layer j and, similarly, y^(j) the output vector of layer j. In addition, W^(j) denotes the weight parameters at layer j. Based on the operation of the MLP network, the input and output vectors at the hidden layers (j = 1, ..., K − 2) can be computed by:

$$Z^{(j+1)} = W^{(j+1)}\, y^{(j)}, \qquad y^{(j+1)} = a\big(Z^{(j+1)}\big)$$

where a(·) is an activation function. When dropout is applied, the operation becomes:

$$\delta^{(j)} \sim \mathrm{Bernoulli}(p), \quad \tilde{y}^{(j)} = \delta^{(j)} \odot y^{(j)}, \quad Z^{(j+1)} = W^{(j+1)}\, \tilde{y}^{(j)}, \quad y^{(j+1)} = a\big(Z^{(j+1)}\big)$$

where ⊙ is element-by-element multiplication. The objective function to minimize is:

$$\min_{W}\; \mathbb{E}_{\delta \sim \mathrm{Bernoulli}}\big[L(W;\, \delta)\big]$$

Suppose that L(W) = ‖y − XW‖²₂; when X is dropped, the objective becomes:

$$\min_{W}\; \mathbb{E}_{\delta \sim \mathrm{Bernoulli}}\big[\|y - (\delta \odot X)W\|_2^2\big]$$
This is equivalent to:

$$\min_{W}\; \|y - pXW\|_2^2 + p(1-p)\,\|\Gamma W\|_2^2$$

where $\Gamma = \big(\operatorname{diag}(X^{T}X)\big)^{1/2}$. If we consider $\tilde{W} = pW$, the problem becomes:

$$\min_{\tilde{W}}\; \|y - X\tilde{W}\|_2^2 + \frac{1-p}{p}\,\|\Gamma \tilde{W}\|_2^2$$

From this expression, we can see that dropout is equivalent in this case to L2 regularization with hyperparameter λ = (1 − p)/p; note that if p = 0.5 (the value suggested by several authors for optimal results), we get λ = 1.
3 Comparison of L1 and L2 Effect

The main intuitive difference between L1 and L2 regularization is that L1 regularization attempts to estimate the median of the data, while L2 regularization attempts to estimate the mean of the data, in order to avoid overfitting. We present their differences according to the following criteria:
– Robustness: robustness is defined as resistance to outliers in a dataset. The L2-norm is more sensitive to outliers than the L1-norm, since it squares the error. Therefore, L1 may be a better choice where outliers may be safely and effectively ignored; if outliers should be taken into account, then the L2-norm is better.
– Stability: stability is resistance to a small horizontal adjustment of a datum. The L2-norm is more stable to a small adjustment of a data point because the L2-norm is continuous, whereas the absolute value in the L1-norm makes it less smooth with respect to such adjustments.
– Computational efficiency: L1 is a non-differentiable piecewise function and has no analytical solution, whereas L2 does; therefore, L2 is more computationally efficient. It is worth noting that L1 has the sparsity property, which allows it to be used along with sparse algorithms and makes the calculation more efficient.
– Solution uniqueness: the L2-norm is the Euclidean distance, the shortest path between two points, and it has only one solution, while L1 may have more than one.
– Sparsity: the built-in feature selection property explains why L1 is sparser than L2.
4 Experiment

This section describes the experimental procedure and shows the results of the comparison between L2 regularization and dropout. In order to measure the advantage of each method for training neural networks, several single-hidden-layer network architectures of varying complexity were used to classify the MNIST dataset of handwritten digit images.
In addition, the L2-regularized and dropout versions of each single-hidden-layer network architecture were trained with the same learning rate parameter (α = 1). The digit images were normalized and centered in a fixed-size image field of 28 × 28, as shown in Fig. 3.
Fig. 3. An example of neural network architecture trains with dropout. Each of hidden neuron is randomly dropped out with Bernoulli distribution (p).
In our experiment, we consider a number of hidden neurons between 10 and 100 as not too complex, while a number of neurons between 100 and 1000 is relatively complex for the MNIST classification problem. Note that the network architecture notation used in this study gives the number of neurons in the order: input layer, hidden layer, output layer. The experimental results shown in Table 1 are based on the following parameters. For L2 regularization, the regularization parameter λ is fixed at 1 × 10⁻⁴; this value is imposed during the optimization to balance the reconstruction error and the squared sum of the weight parameters in the total loss function. For dropout training, we choose a probability of dropping a neuron of p = 0.5 (values between 0.4 and 0.7 are used in practice). We can notice that the accuracy of L2 regularization grows in the case of a small network architecture and degrades as the network becomes larger. Similarly, L2 regularization achieves better prediction accuracy than dropout for a small number of hidden neurons. Although the performance of dropout training is also reduced when the networks are larger, its rate of change is slightly slower and more robust than that of L2 regularization.
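A minimal Keras sketch of the two configurations compared here (a 784-n-10 network with either an L2 penalty of 1×10⁻⁴ or dropout with p = 0.5) is shown below. The optimizer, activation function, number of epochs, and batch size are illustrative placeholders and may differ from the original experimental setup.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_mlp(n_hidden, method="l2"):
    """784-n_hidden-10 network regularized with either L2 (lambda = 1e-4) or dropout (rate = 0.5)."""
    model = tf.keras.Sequential()
    if method == "l2":
        model.add(layers.Dense(n_hidden, activation="sigmoid", input_shape=(784,),
                               kernel_regularizer=regularizers.l2(1e-4)))
    else:
        model.add(layers.Dense(n_hidden, activation="sigmoid", input_shape=(784,)))
        model.add(layers.Dropout(0.5))      # rate = probability of dropping a unit
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

(x_tr, y_tr), (x_te, y_te) = tf.keras.datasets.mnist.load_data()
x_tr, x_te = x_tr.reshape(-1, 784) / 255.0, x_te.reshape(-1, 784) / 255.0
model = build_mlp(100, method="dropout")          # e.g. the 784-100-10 architecture
model.fit(x_tr, y_tr, epochs=5, batch_size=128, validation_data=(x_te, y_te))
```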
Table 1. Classification performance of L2 regularization and dropout on the MNIST database
Network architecture | L2 regularization: % accuracy | % rate of change | Dropout: % accuracy | % rate of change
784-10-10    | 92.98 | –     | 89.48 | –
784-20-10    | 94.88 | 2.04  | 92.02 | 2.84
784-30-10    | 95.52 | 0.67  | 93.43 | 1.53
784-40-10    | 96.04 | 0.54  | 94.01 | 0.62
784-50-10    | 96.42 | 0.40  | 94.43 | 0.45
784-100-10   | 96.71 | –     | 95.72 | –
784-200-10   | 96.80 | 0.09  | 96.73 | 1.06
784-300-10   | 96.83 | 0.03  | 96.81 | 0.08
784-400-10   | 96.81 | −0.02 | 97.00 | 0.20
784-500-10   | 96.74 | −0.07 | 97.04 | 0.04
784-600-10   | 96.57 | −0.18 | 97.09 | 0.05
784-700-10   | 96.65 | 0.08  | 97.09 | 0.00
784-800-10   | 96.59 | −0.06 | 97.16 | 0.07
784-900-10   | 96.32 | −0.28 | 97.20 | 0.04
784-1000-10  | 95.86 | −0.48 | 97.10 | −0.10
5 Conclusion

Dropout and the L1 and L2 norms are methods often used to improve the generalization of models. The main idea of dropout is to randomly drop neurons using Bernoulli trigger variables during training, so that the network contains unreliable neurons, preventing co-adaptation between them. In contrast, L1 and L2 penalize the weights so as to bias learning toward the axes of the most relevant variables. The major advantage of dropout is that it allows a single network to model a large number of different sub-networks through a simple and inexpensive means of training and testing. The experimental results show that dropout training in a large network not only provides a better performance improvement but is also more robust than L2 regularization. On the other hand, L2 regularization gives better predictive accuracy than dropout in a small network, since model averaging improves the overall performance only when the number of sub-models is large and each of them differs from the others.
References 1. He, J., Jia, X., Xu, J., Zhang, L., Zhao, L.: Make 1 regularization effective in training sparse CNN. Comput. Optim. Appl. 77(1), 163–182 (2020). https://doi. org/10.1007/s10589-020-00202-1
2. Liu, M., Shi, J., Li, Z., Li, C., Zhu, J., Liu, S.: Towards better analysis of deep convolutional neural networks. arXiv:1604.07043 Cs, May 2016. http://arxiv.org/ abs/1604.07043. Accessed 27 Dec 2021 3. Wang, Y., Bian, Z.-P., Hou, J., Chau, L.-P.: Convolutional neural net-works with dynamic regularization. arXiv:1909.11862 Cs, December 2020. http://arxiv.org/ abs/1909.11862. Accessed 26 Dec 2021 4. Mikolajczyk, A., Grochowski, M.: Data augmentation for improving deep learning in image classification problem. In: 2018 International Interdisciplinary Ph.D. Workshop (IIPhDW), pp. 117–122, May 2018. https://doi.org/10.1109/IIPHDW. 2018.8388338 5. Yao, Y., Rosasco, L., Caponnetto, A.: On early stopping in gradient descent learning. Constr. Approx. 26(2), 289–315 (2007). https://doi.org/10.1007/s00365-0060663-2 6. Cortes, F.N., et al.: L2 regularization for learning kernels. https://arxiv.org/ftp/ arxiv/papers/1205/1205.2653. Accessed 27 Dec 2021 7. Jaiswal, S., Mehta, A., Nandi, G.C.: Investigation on the effect of L1 an L2 regularization on image features extracted using restricted Boltzmann machine. In: 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 1548–1553, June 2018. https://doi.org/10.1109/ICCONS.2018. 8663071 8. Moore, R.C., DeNero, J.: L1 and L2 regularization for multiclass Hing loss models, p. 5 (2011) 9. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting, p. 30 (2014) 10. Wan, L., Zeiler, M., Zhang, S., Cun, Y.L., Fergus, R.: Regularization of neural networks using DropConnect. In: Proceedings of the 30th International Conference on Machine Learning, pp. 1058–1066, May 2013. https://proceedings.mlr.press/ v28/wan13.html. Accessed 27 Dec 2021 11. Hssayni, E., Joudar, N.-E., Ettaouil, M.: KRR-CNN: kernels redundancy reduction in convolutional neural networks. Neural Comput. Appl. (1), 1–12 (2021). https:// doi.org/10.1007/s00521-021-06540-3 12. Bergmeir, C., Ben´ıtez, J.M.: On the use of cross-validation for time series predictor evaluation. Inf. Sci. 191, 192–213 (2012). https://doi.org/10.1016/j.ins.2011.12.028 13. Xie, L., Wang, J., Wei, Z., Wang, M., Tian, Q.: DisturbLabel: regularizing CNN on the loss layer, pp. 4753–4762 (2016). Accessed 14 May 2022 14. Cortes, C., Mohri, M., Rostamizadeh, A.: L2 regularization for learning kernels, p. 8 (2009) 15. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016) 16. Deng, X., Mahmoud, M.A.B., Yin, Q., Guo, P.: An efficient and effective deep convolutional kernel pseudoinverse learner with multi-filter. Neuro-Computing 457, 74–83 (2021). https://doi.org/10.1016/j.neucom.2021.06.041 17. Kulkarni, P., Zepeda, J., Jurie, F., P´erez, P., Chevallier, L.: Learning the structure of deep architectures using L1 regularization, Swansea, United Kingdom, September 2015. https://doi.org/10.5244/C.29.23 18. Kumar, A., Shaikh, A.M., Li, Y., Bilal, H., Yin, B.: Pruning filters with L1-norm and capped L1-norm for CNN compression. Appl. Intell. 51(2), 1152–1160 (2020). https://doi.org/10.1007/s10489-020-01894-y 19. Schmidt, M., Fung, G., Rosales, R.: Fast optimization methods for L1 regularization: a comparative study and two new approaches. Mach. Learn. ECML 4701 (2007). https://doi.org/10.1007/978-3-540-74958
Shipment Consolidation Using K-means and a Combined DBSCAN-KNN Approach Ouafae El Bouhadi(B) , Abdellah Azmani, and Monir Azmani Computer Science, Systems and Telecommunications Laboratory (LIST), Faculty of Sciences and Techniques, Abdelmalek Essaadi University, Tangier, Morocco [email protected], {a.azmani,m.azmani}@uae.ac.ma
Abstract. Grouping the orders of different customers in the same vehicle has a considerable effect on the optimization of the supply chain [1]. The objective of this paper is to consolidate the orders of customers whose routes converge, i.e., they have the same point of departure and arrival or these points are close to each other. In addition, this grouping takes into account the time windows imposed by the customers, so that the delivery times will not be exceeded. In this paper, the division of orders into groups that have similar characteristics is determined using the K-Mean [2] and DBSCAN [3] algorithms. The latter are widely used in the clustering of massive databases and have shown their efficiency compared to other clustering methods [4]. Keywords: Grouping of orders · Delivery of goods · Clustering · K-means · DBSCAN
1 Introduction

In the last decades, the freight transport sector has experienced an increase in demand for transport services [5] and, consequently, an increase in demand for infrastructure, with the resulting environmental and economic effects. Hence the importance of grouping orders, which allows a reduction in the number of shipments and an optimization of vehicle filling. In order to perform this grouping, we have opted for the use of clustering, which is a data mining approach [6]. This approach aims at dividing large data into similar groups [7]. Clustering algorithms are numerous, including K-Means, K-medoids, DBSCAN, Mean Shift, OPTICS, agglomerative hierarchical clustering, etc. In this paper, we have opted for K-Means and DBSCAN because of their speed in clustering data (Erman et al., 2006). These two algorithms have been applied in several fields such as marketing [8, 9], biology [10] and governance [11]. As for their application in the field of logistics, they have been widely used by several researchers, notably for demand planning in the semiconductor supply chain [12, 13], customer selection for routing [14], optimal design of transport distance [15], and identification of dangerous urban places [16]. In this context, we exploit the potential of clustering for order grouping with the objective of optimizing delivery operations.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 451–465, 2023. https://doi.org/10.1007/978-3-031-26384-2_39
2 Background 2.1 Order Grouping Order grouping or shipment consolidation are among the terms used to refer to the assembly of two or more orders to be shipped on the same vehicle [17]. Orders that belong to the same group share close or similar geographic coordinates, also they have the same scheduled delivery time. One of the major objectives of grouping is to reduce the number of shipments. This has a considerable impact on the three levels of logistics sustainability, namely [18, 19]: • on the economic level it leads to a reduction in transport costs. It also allows a reduction in the size of the fleet and the size of the workforce, therefore, eliminating their associated costs. • on the social level it implies a reduction in congestion, one of the social benefits of which is the reduction of stress among drivers • on the environmental level it contributes to the reduction of CO2 emissions. 2.2 Data Mining Data mining is the method of using machine learning algorithms to extract knowledge from a huge dataset [20]. To investigate the data, a variety of approaches are utilized, including classification, clustering, prediction, association rules, and time series analysis. [21, 22] The technique used in this work is Clustering, In fact it is a widespread method in the field of data mining that has attracted the interest of several researchers in this area [23]. 2.3 Clustering Clustering is the division of a data collection into groups. Each cluster contains a group of similar data, and these are different from the data in other clusters [24]. This is part of unsupervised data mining which allows training unlabeled and undivided data into training and test data [25]. There are several types of Clustering [6] group them into five categories: Partitional Clustering. This type allows the division of data objects into k disjoint clusters. The division of the data is based on the calculation of the similarity distance. This split ensures that each cluster has at least one object and that each object can only belong to one cluster. “K-Means” and “K-Medoids” are among the algorithms based on this type of clustering [26, 27]. Hierarchical Clustering. It establishes nested partitions from a data set. These partitions produce a bottom-up (agglomerative) or top-down (divisive) hierarchy. The first approach considers each data object as a single cluster. Then, it identifies each pair of clusters that are close and merges them. As for the second one, it gathers all the data objects in a single cluster. Then it divides the cluster into two dissimilar clusters, and continues in a recursive manner until partitioning all objects into separate clusters [28]. ROCK, BIRCH are examples of hierarchical clustering.
Density-based clustering. It determines clusters based on a set of maximal data objects connected by density (amount of data objects within a specified radius.). This approach requires the specification of two parameters Eps (radius) and MinPts (minimum number of objects required to establish a cluster). Examples of this type of clustering are DBSCAN and OPTICS [29]. Clustering based on the grid. It performs clustering operations on a grid structure. This is done by transforming the space of data objects into a number of cells that constitutes the grid structure. This method is characterized by the speed of processing time, because it does not depend on data objects but on the space of cells that surrounds the data objects [29]. Examples of grid-based algorithms are CLIQUE and STING. Model-based clustering. It assigns a model for each cluster, and then finds the appropriate adaptation. EM, SOM, and COBWEB are among the model-based clustering algorithms [6].
2.4 K-Means This approach attempts to divide the data into K clusters.. So that the data in each cluster share similar characteristics and they have different characteristics compared to other clusters. The implementation of K-means should take into consideration the following rules [25]: • the K cluster must be defined. • the attributes must be numerical. The implementation of K-means usually goes through the following steps [22, 30]: • Step 1: selection of number of clusters. • Step 2: Random generation of clusters according to the number of k, with the definition of centroids for each cluster (a centroid is the average of different given objects belonging to a cluster). • Step 3: Assignment of each data object to the closest centroid. • Step 4: Recalculation of the new centroid positions. • Step 5: Steps 3 and 4 should be repeated until convergence is reached (centroids do not change anymore). 2.5 Approaches for the Selection of Cluster Number K To determine the number of cluster K, several methods have been suggested in the literature, namely: “Elbow method, by rule of thumb, Silhouette, Cross-validation, Information Criterion Approach” [31]. In this paper, we have used the oldest method Elbow and the silhouette method.
Elbow method. The main idea of this method is to start by initializing K = 2 as the initial cluster number. Then, in each iteration, the number of k increases by 1 until reaching the maximum number of k specified. Finally, the selection of optimal number of k corresponds to the kink in the curve (where the cost decreases rapidly) of the variance plotted against the specified (maximum) number of K (Shi et al., 2021). Silhouette method. The silhouette method evaluates the quality of the cluster using a silhouette coefficient. The calculation of this coefficient is done using the separation and cohesion methods. The closer the result of this calculation is to 1, the better the cluster quality is. The choice of K is done with the help of a graph where the silhouette coefficient presents the Y axis and the value of K presents the X axis. The optimal value of K corresponds to the maximum value of silhouette coefficient [32].
2.6 DBSCAN This algorithm requires two input parameters to perform data clustering namely, epsilon and minimum points [3]. Epsilon (eps) is the maximum distance between points belonging to the same cluster. Minpts is the number of minimum points of a cluster [3]. By using DBSCAN, the number of clusters is determined by feeding the algorithm with the value of the two parameters entered by the user [7]. The main idea of this algorithm is to browse the set of data points in order to calculate their number of neighbors. If this number of neighbors is lower than the Minpts the point is considered as outlier. Otherwise, it will be the starting point of a new cluster [33]. The steps of DBSCAN are listed as follows [7, 34]: • Step1: Selection of the object at random from an unprocessed dataset. • Step2: Selection of the set of neighbors using eps, finally, if the number of neighbors satisfies Minpts, add the object and its neighbor object to a new cluster. • Step3: Processing of the other objects that are not part of the central object or the neighboring objects. If the selected candidate object satisfies the central point conditions then it will form a new cluster and subsequently add its neighbors. Step 3 is repeated until the different data objects are processed. If the processed object was not assigned to any cluster, then it is an outlier. 2.7 KNN The k nearest neighbor (KNN) algorithm introduced by [35] is one of the classification algorithms. It consists in selecting a number k of nearest neighbors for a data point [36]. It is a nonparametric method, which is part of the supervised learning methods. KNN is widely used due to its simplicity and effectiveness. The main idea of KNN, is to predict the class membership of unlabeled data by exploiting labeled training data [37]. For an unlabeled data object, KNN searches for the K nearest neighbors among the training set and subsequently assigns it to its majority membership class of K neighbors.
455
The steps of KNN are as follows [38]: • Step 1: Definition of the number K of neighbors. • Step 2: Determine the distance between the unlabeled input and the other data objects. • Step 3: Selection of the K neighbors based on the calculated distance and calculation of the number of objects belonging to each class. • Step 4: Assign the new entry to the most represented class among the K neighbors.
3 Methodology The approach used consists in applying the K-means method for grouping orders to be delivered on the same vehicle. This approach begins with the recovery of data related to the history of orders made. Then proceed to the application of spatial and temporal clustering based on the geolocation of customers and delivery time. Regarding the implementation of Clustering models, we used the Scikit-learn library. Figure 1 represents the steps of the approach used.
Fig. 1. Process of the proposed approach.
3.1 Data Collection and Pre-processing The data used in this study is e-commerce data from actual Olist Store orders placed in Brazil. The database contains 100,000 orders covering the period from 2016 to 2018. It is available at this link. The database contains several information related to the orders such as: order status, customer location, supplier location, estimated delivery time, order price, payment, freight performance, product characteristics, etc.
Table 1. Database of orders (excerpt). Columns: order_id, latitude_customer, longitude_customer, order_estimated_delivery_date, latitude_seller, longitude_seller. Seven sample orders (row indices 0, 7, 9, 12, 13, 17, 20) are shown, with order IDs 00e7ee1b050b8499577073aeb2a297a1, 5efb55c3cc550a13a9af885e225f8d9e, 6da4587a1165373a6932574c423a0791, f3d15fc6394b30a6a35e4c3886208777, 87b1b0ac27740690a2f04d333724c05f, 364ced4c215dfadfacf5ee28844f4c83 and fa85a1c9a92b2fc2b5c6f61e4688f8b3. All of these orders come from the same seller, located at latitude −23.486111 and longitude −46.366721, with estimated delivery dates between 2017-11-15 and 2017-11-29.
Although the size of the database is large, we selected only the orders from a period of 20 days that belong to a single supplier. In the context of our study, we only need the information related to the location of the customer and the estimated delivery time. Table 1 describes our database.

3.2 Application of K-means for the Grouping of Orders

Choice of the number of clusters. In order to find the optimal value of k, we used the elbow method with the inertia metric, represented in the inertia graph (Fig. 2), and the silhouette method, represented in Fig. 3. For the first method, the value at the elbow point is K = 62. For the second method, the value of K corresponds to the maximum value of the silhouette coefficient; therefore K = 62 with a silhouette coefficient of 0.6410990412354504. To conclude, the two methods give the same optimal value of k.
Fig. 2. Elbow for the selection of K
Cluster visualization. In this step, we implemented K-means with the number of clusters K = 62 found in the last step. Figure 4 visualizes the 62 clusters obtained after the implementation of K-means.
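A minimal scikit-learn sketch of this step is shown below: scan k while recording the inertia (for the elbow plot) and the silhouette score, then fit the final model with the selected k. The feature preparation, the scanned range of k, and the random data are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# X: one row per order -> [latitude_customer, longitude_customer, delivery_time_feature]
X = np.random.rand(800, 3)            # placeholder for the preprocessed order features

inertias, silhouettes = {}, {}
for k in range(2, 80):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_                       # used for the elbow plot
    silhouettes[k] = silhouette_score(X, km.labels_)

best_k = max(silhouettes, key=silhouettes.get)      # highest silhouette (62 in the paper)
final_model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)
groups = final_model.labels_                        # cluster index = shipment group of each order
```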
Fig. 3. Silhouette score for the selection of K.
3.3 Application of DBSCAN for Order Bundling Choice of parameters by “Knee method”. To apply DBSCAN, we must first choose the optimal values of the two parameters epsilon (eps) and minimum points (Minpts). To identify these parameters, we used “Knee method”, whose goal is to determine the average distance between each data object and its k closest neighbors. After drawing the k-distance curve, the value of eps is obtained by selecting the ‘knee’ point where a change occurs in the k-distance graph [39]. In our case, the value of eps is approximately 0.1 as shown in Fig. 5. Choice of the parameters by an iterative approach. In order to choose the value of eps, we have opted for the use of an iterative approach. This approach consists in feeding our model in each iteration with new parameters eps and Minpts. The best values of eps and Minpts correspond to the one of the model that gave the highest silhouette score. The eps values selected for iteration ranged from 0.05 to 0.19 at intervals of 0.01, and the Minpts values ranged from 2 to 20.
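The sketch below mirrors both parameter-selection strategies described above: the sorted k-distance curve used by the knee method, and the iterative scan of (eps, min_samples) pairs scored by the silhouette coefficient. The feature matrix, the number of neighbors used for the k-distance curve, and the choice of computing the silhouette only on clustered points are assumptions of this sketch.

```python
import numpy as np
from itertools import product
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score
from sklearn.neighbors import NearestNeighbors

X = np.random.rand(800, 3)                             # placeholder order features

# "Knee method": sorted distance to the k-th nearest neighbour; eps is read at the knee.
k_dist = np.sort(NearestNeighbors(n_neighbors=4).fit(X).kneighbors(X)[0][:, -1])

# Iterative approach: keep the (eps, min_samples) pair with the best silhouette score.
best = (None, -1.0)
for eps, minpts in product(np.arange(0.05, 0.20, 0.01), range(2, 21)):
    labels = DBSCAN(eps=eps, min_samples=minpts).fit_predict(X)
    mask = labels != -1                                 # silhouette computed on clustered points
    if mask.sum() > minpts and len(set(labels[mask])) > 1:
        score = silhouette_score(X[mask], labels[mask])
        if score > best[1]:
            best = ((eps, minpts), score)
print(best)    # the paper reports eps = 0.16, min_samples = 2, silhouette ~= 0.527
```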
Fig. 4. Visualization of the clusters provided by K-means
Fig. 5. Choice of eps by “Knee method”
Table 2 illustrates the result of the iterative approach for the best combination of eps and Minpts.
Table 2. Result of the iterative approach

Best silhouette score: 0.5269925131400003; min_samples: 2; eps: 0.16
Fig. 6. Visualization of the clusters provided by DBSCAN
Cluster visualization. The Fig. 6 shows the result of the DBSCAN implementation with eps = 0.16 and Minpts = 2. The values represented in color correspond to the 78 clusters found and those in black correspond to outliers. These points do not belong to any cluster, because they are far from the centers of the different clusters. In order to process these values, we adopted the KNN (k nearest neighbor) method. The objective of this method is to assign these outliers to the clusters that belong to their K nearest neighbors.
3.4 Application of KNN for the Assignment of Outliers to Groups of the Nearest Orders. We implemented the KNN classification model following these steps: • Import of the previous database by adding the column that contains the DBSCAN cluster labels (from 0 to 78). Division of the data into input data (longitude, latitude, and delivery time) and the output attribute (DBSCAN cluster). • Division of the database into two subsets: one set dedicated to training which represents 80% of data, and the other dedicated to testing which represents 20% of data. • Definition of KNN model. Figure 7 represents the result of the combined approach: DBSCAN-KNN.
Fig. 7. Visualization of the clusters provided by DBSCAN_KNN
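A minimal sketch of the final assignment step is given below: a KNN classifier is trained on the points already clustered by DBSCAN, then each outlier is assigned to the cluster of its nearest labelled neighbors. The value of k is a placeholder, and the 80/20 train/test split used in the paper for evaluating the classifier is omitted here for brevity.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def assign_outliers(X, dbscan_labels, k=5):
    """Assign DBSCAN outliers (label -1) to the cluster of their k nearest clustered neighbours."""
    clustered = dbscan_labels != -1
    knn = KNeighborsClassifier(n_neighbors=k).fit(X[clustered], dbscan_labels[clustered])
    labels = dbscan_labels.copy()
    if (~clustered).any():
        labels[~clustered] = knn.predict(X[~clustered])   # outliers get the majority class of neighbours
    return labels
```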
4 Discussion

The purpose of the study is to consolidate the shipment of orders from different customers in order to optimize the delivery operations of the goods. To achieve this, the article exploited the order database of the Olist store located in Brazil. After preprocessing the data, the article opted for the two clustering methods K-means and DBSCAN to perform the grouping. The comparison of the clustering quality (Table 3) for the two methods was based on the silhouette score¹ and the Calinski-Harabasz score². This comparison shows that K-means has the highest values for both scores, with 0.6410 and 901.946 respectively, hence the effectiveness of K-means.

Table 3. Comparison of K-means and DBSCAN

Metric                  | K-means            | DBSCAN
Silhouette score        | 0.6410990412354504 | 0.5269925131400003
Calinski-Harabasz score | 901.9460972615148  | 312.171041423039
5 Conclusion

This paper shows the importance of order consolidation in the field of goods delivery, which lies in its considerable effect on the sustainability of the supply chain (economic, social, and environmental). The study used both the K-means and DBSCAN clustering methods, which provided different numbers of clusters: 62 clusters for K-means and 78 for DBSCAN. Each cluster shares similar characteristics, namely the geolocation of the arrival points (longitude and latitude of customers) and the time window. DBSCAN presented limitations in terms of the existence of several outliers that do not belong to any cluster. To overcome this problem, the article opted for the KNN method to assign these orders to the closest cluster. Among the perspectives identified to improve this grouping is to perform a classification checking the compatibility of the clustered orders based on the nature and type of products, in order to avoid the risk of damaging the goods. Another perspective is to apply other clustering methods and select the best one.
Shipment Consolidation Using K-means and a Combined DBSCAN-KNN Approach
463
References 1. Nananukul, N.: Clustering model and algorithm for production inventory and distribution problem. Appl. Math. Model. 37(24), 9846-9857 (2013). https://doi.org/10.1016/j.apm.2013. 05.029 2. Macqueen, J.: Some methods for classification and analysis of multivariate observations. In: 5-th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281-297 (1967) 3. Ester, M., Kriegel, H., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise . undefined, 1996. https://www.semanticscholar.org/ paper/A-Density-Based-Algorithm-for-Discovering-Clusters-Ester-Kriegel/5c8fe9a0412a 078e30eb7e5eeb0068655b673e86 4. Chakraborty, S., Nagwani, N.K., Dey, L.: Performance comparison of incremental K-means and incremental DBSCAN algorithms . ArXiv14064751 Cs, juin 2014. http://arxiv.org/abs/ 1406.4751 5. Nowakowska-Grunt, J., Strzelczyk, M.: The current situation and the directions of changes in road freight transport in the European Union. Transp. Res. Procedia 39, 350-359 (2019). https://doi.org/10.1016/j.trpro.2019.06.037 6. Benabdellah, A.C., Benghabrit, A., Bouhaddou, I.: A survey of clustering algorithms for an industrial context. Procedia Comput. Sci. 148, 291-302 (2019). https://doi.org/10.1016/j. procs.2019.01.022 7. Monalisa, S., Kurnia, F.: Analysis of DBSCAN and K-means algorithm for evaluating outlier on RFM model of customer behavior. TELKOMNIKA Telecommun. Comput. Electron. Control 17(1), 110 (2019). https://doi.org/10.12928/telkomnika.v17i1.9394 8. Hossain, A.S.M.S.: Customer segmentation using centroid based and density based clustering algorithms . In: 2017 3rd International Conference on Electrical Information and Communication Technology (EICT), pp. 1-6 (2017). https://doi.org/10.1109/EICT.2017.827 5249 9. Sembiring Brahmana, R.W., Mohammed, F.A., Chairuang, K.: Customer segmentation based on RFM model using K-means, K-medoids, and DBSCAN methods . Lontar Komput. J. Ilm. Teknol. Inf. 11(1), 32 (2020). https://doi.org/10.24843/LKJITI.2020.v11.i01.p04 10. Lurie, I., Lytvynenko, V., Osypcnko, V., Voronenko, M.: The use of inductive methods for determination of the binding affinity of interacting biological molecules . In: 2018 IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT) 1, pp. 1-5 (2018). https://doi.org/10.1109/STC-CSIT.2018.8526753 11. Rodin, A.: Growing small businesses using software system for intellectual analysis of financial performance . In: 2018 14th International Conference on Advanced Trends in Radioelecrtronics, Telecommunications and Computer Engineering (TCSET), pp. 217-222 (2018). https://doi.org/10.1109/TCSET.2018.8336190 12. Govindaraju, P., Achter, S., Ponsignon, T., Ehm, H., Meyer, M.: Comparison of two clustering approaches to find demand patterns in semiconductor supply chain planning. In: 2018 IEEE 14th International Conference on Automation Science and Engineering (CASE), pp. 148-151 (2018). https://doi.org/10.1109/COASE.2018.8560535 13. Ponsignon, T., Govindaraju, P., Achter, S., Ehm, H., Meyer, M.: Finding demand patterns in supply chain planning [Nachfragemuster in der Lieferkette erkennen]. Atp Mag. 60(08), 54-61 (2018). https://doi.org/10.17560/atp.v60i08.2360 14. León Villalba, A.F., Cristina González La Rotta, E.: Comparison of Dbscan and K-means clustering methods in the selection of representative clients for a vehicle routing model. 
In: 2020 Congreso Internacional de Innovación y Tendencias en Ingeniería (CONIITI), pp. 1-6 (2020). https://doi.org/10.1109/CONIITI51147.2020.9240399
15. Li, J.: Optimal design of transportation distance in logistics supply chain model based on data mining algorithm. Clust. Comput. 22(2), 3943–3952 (2018). https://doi.org/10.1007/s10586018-2544-x 16. Holmgren, J., Knapen, L., Olsson, V., Masud, A.P.: On the use of clustering analysis for identification of unsafe places in an urban traffic network. Procedia Comput. Sci. 170, 187-194 (2020). https://doi.org/10.1016/j.procs.2020.03.024 17. Ülkü, M.A.: Analysis of Shipment Consolidation in the Logistics Supply Chain. University of Waterloo (2009) 18. Ali Memon, M., Shaikh, A., Sulaiman, A., Alghamdi, A., Alrizq, M., Archim鑔e, B.: Time and quantity based hybrid consolidation algorithms for reduced cost products delivery. Comput. Mater. Contin. 69(1), 409-432 (2021). https://doi.org/10.32604/cmc.2021.017653 19. Alnahhal, M., Ahrens, D., Salah, B.: Modeling freight consolidation in a make-to-order supply chain: a simulation approach. Processes 9(9), 9 (2021). https://doi.org/10.3390/pr9091554 20. Imron, M., Hasanah, U., Humaidi, B.: Analysis of data mining using K-means clustering algorithm for product grouping. IJIIS Int. J. Inform. Inf. Syst. 3(1), 12-22 (2020). https://doi. org/10.47738/ijiis.v3i1.3 21. Awangga, R.M., Pane, S.F., Tunnisa, K., Suwardi, I.S.: K means clustering and meanshift analysis for grouping the data of coal term in puslitbang tekMIRA. TELKOMNIKA Telecommun. Comput. Electron. Control 16(3), 1351 (2018). https://doi.org/10.12928/telkomnika.v16i3. 8910 22. Pandey, A., Malviya, K.: Enhancing test case reduction by k-means algorithm and elbow method. Int. J. Comput. Sci. Eng. 6, 299-303 (2018). https://doi.org/10.26438/ijcse/v6i6. 299303 23. Walse, R.S., Kurundkar, G.D., Bhalchandra, P.U.: A Review: Design and Development of Novel Techniques for Clustering and Classification of Data. Int. J. Sci. Res. Comput. Sci. Eng. 06(01), 19-22 (2018) 24. Aldino, A.A., Darwis, D., Prastowo, A.T., Sujana, C.: Implementation of K-means algorithm for clustering corn planting feasibility area in south lampung regency. J. Phys. Conf. Ser. 1751(1), 012038 (2021). https://doi.org/10.1088/1742-6596/1751/1/012038 25. Khairani, N., Sutoyo, E.: Application of K-means clustering algorithm for determination of fire-prone areas utilizing hotspots in West Kalimantan Province. Int. J. Adv. Data Inf. Syst. 1, 9-16 (2020). https://doi.org/10.25008/ijadis.v1i1.13 26. Boomija, M.D.: Comparison of partition based clustering algorithms . J. Comput. Appl., p. 4 (2008) 27. Sardar, T.H., Ansari, Z.: Partition based clustering of large datasets using MapReduce framework: An analysis of recent themes and directions. Future Comput. Inform. J. 3(2), 247-261 (2018). https://doi.org/10.1016/j.fcij.2018.06.002 28. Popat, S.K.: Review and Comparative Study of Clustering Techniques (2014). https://www. semanticscholar.org/paper/Review-and-Comparative-Study-of-Clustering-Popat/12b7cc398 d67b2a17ace0b0b79363e9a646f8bcb 29. Shah, G.H., Bhensdadia, C.K., Ganatra, A.P.: An Empirical Evaluation of Density-Based Clustering Techniques 2(1), 8 (2012) 30. Bandyopadhyay, S.K., Paul, T.U.: Segmentation of Brain Tumour from MRI image – Analysis of K- means and DBSCAN Clustering. Int. J. Res. Eng. Sci. IJRES 1(1), 10 (2013) 31. Kodinariya, T., Makwana, P.: Review on determining of cluster in K-means clustering. Int. J. Adv. Res. Comput. Sci. Manag. Stud. 1, 90-95 (2013) 32. Saputra, D.M., Saputra, D., Oswari, L.D.: Effect of distance metrics in determining K-value in K-means clustering using elbow and silhouette method. 
presented at the Sriwijaya International Conference on Information Technology and Its Applications (SICONIAN 2019), Palembang, Indonesia (2020). https://doi.org/10.2991/aisr.k.200424.051
33. Dudik, J.M., Kurosu, A., Coyle, J.L., Sejdi´c, E.: A comparative analysis of DBSCAN, Kmeans, and quadratic variation algorithms for automatic identification of swallows from swallowing accelerometry signals. Comput. Biol. Med. 59, 10-18 (2015). https://doi.org/10.1016/ j.compbiomed.2015.01.007 34. Chang, D., Ma, Y., Ding, X.: Time series clustering based on singularity. Int. J. Comput. Commun. Control 12, 790 (2017). https://doi.org/10.15837/ijccc.2017.6.3002 35. Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46(3), (1992). https://www.tandfonline.com/doi/abs/https://doi.org/10.1080/00031305. 1992.10475879 36. Giri, K., Biswas, T., Sarkar, P.: ECR-DBSCAN: An Improved DBSCAN based on Computational Geometry 6 (2021). https://doi.org/10.1016/j.mlwa.2021.100148 37. Taunk, K., De, S., Verma, S., Swetapadma, A.: A Brief Review of Nearest Neighbor Algorithm for Learning and Classification, p. 1260 (2019). https://doi.org/10.1109/ICCS45141.2019.906 5747 38. Kaushal, C., Koundal, D.: Recent trends in big data using Hadoop. Int. J. Inform. Commun. Technol. IJ-ICT 8, 39 (2019). https://doi.org/10.11591/ijict.v8i1.pp39-49 39. Gaonkar, M.N., Sawant, K.: AutoEpsDBSCAN : DBSCAN with Eps automatic for large dataset. J. Comput. Sci. IJCSIS 2(2), 7 (2013)
A New Approach to Protect Data In-Use at Document Oriented Databases

Abdelilah Belhaj(B), Karim El Bouchti, Soumia Ziti, and Chaimae

Intelligent Processing Systems and Security (IPPS) Team, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
[email protected]
Abstract. NoSQL databases are often vulnerable to internal and external attacks. Attackers can easily access sensitive data in different ways, particularly in critical or strategic environments. Data encryption is considered an efficient and effective security method for protecting sensitive data in transit and at rest. However, most databases encrypt data in motion and in storage at rest, but they do not cover encrypting data while it is in use in memory. In this work, we focus on NoSQL document databases by proposing a protection model for sensitive data while it is in use by the database. Our approach transparently and efficiently encrypts sensitive data before sending it to the database. The obtained results show that the proposed model adds a security layer which can be considered a standard solution against possible attacks on an active database instance. Keywords: Big data · Document-oriented · NoSQL database encryption · Database keys protection · Database security model
1 Introduction
The Not Only SQL (NoSQL) database refers to any non-relational database. It is an approach to database design that allows storing and querying data outside traditional modeling [1]. The schema-less architecture of the NoSQL model offers great flexibility for updating data and scalability without redesigning the structure. There are four main types of NoSQL database models, distinguished from each other by the method used to store the data: the key-value model, the document-oriented model, the column-oriented model and the graph-oriented model [2]. The document-oriented database model is based on a document data type such as JSON or BSON. It stores the data as pairs of keys and values [3]. However, as a NoSQL model, the document-oriented model suffers from many security issues that need to be addressed, including the classic database vulnerabilities, the most relevant of which is the injection © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 466–475, 2023. https://doi.org/10.1007/978-3-031-26384-2_40
attack [4]. A close inspection shows that attacks by administrators are considered a major threat that can affect data integrity and reliability and are not easy to detect. In addition, several security challenges and measures should be considered, involving authentication, confidentiality and storage encryption [5]. A first examination of the literature shows that encryption has been used as a strong solution to such problems. Concretely, it relies principally on algorithms and encryption keys to encrypt and decrypt data. According to [6], several NoSQL database encryption techniques and technologies can be employed, considering the following data states [6] (Fig. 1):
• Encryption in transit/transport (TLS)
• Encryption at rest
• Encryption in use (no standard solution)
Fig. 1. The three data states
Several models for encrypting NoSQL databases have been proposed in the literature. Ghazi et al. have proposed Database Security as a Service (DB-SECaaS) to provide authentication, fine-grained authorization and encryption as a service on a document-oriented database hosted in the cloud [7]. The authors of [8] have proposed the SDDB (Secure Document Database) model, a proxy server between the user application and the DBMS that rewrites queries so they can be executed on the encrypted data. This model relies on the CryptDB approach for querying encrypted data, proposed by Popa et al. (2011) [9]. Another innovative encryption model, suggested by the authors of [10], uses an additive homomorphic asymmetric encryption system to encrypt user data in MongoDB. The aim of this work is to contribute to these activities by proposing a model based on CSFLE (Client-Side Field Level Encryption), which is only available in MongoDB Enterprise 4.2 or later and MongoDB Atlas 4.2 or later clusters. The rest of the paper is organized as follows. Section 2 presents DBMS (Database Management System) level encryption. Section 3 gives details on the CSFLE (Client-Side
Field Level Encryption) technique. Section 4 provides the proposed model and a case study for the scheme application. Implementation of the model is defined in Sect. 5. Finally, Sect. 6 concludes the paper.
2 Materials and Methods
In this section, we give a concise overview of the techniques and methods employed to encrypt NoSQL databases. There are three ways to encrypt data in a NoSQL database.
2.1 Encryption in Transit/Transport
Transport Layer Security (TLS) and the Secure Sockets Layer (SSL) are the most widely used encryption protocols for transacting data between the DBMS and the application server in order to send and receive data over networks.
2.2 Encryption at Rest
In order to prevent unauthorized persons from accessing the files on disk, protecting data at rest has become even more important for the database. It prevents threats of physical access to the disk where data is stored. The procedure is completely transparent for the application. Several encryption algorithms can be adopted and supported, including AES-256 in CBC mode, which is the default algorithm used by MongoDB, and AES-256 in GCM mode.
2.3 Client-Side Field Level Encryption
Client-side Field Level Encryption (FLE) allows specifying the sensitive fields of a document that should be encrypted. FLE is supported by MongoDB, which provides the ability to selectively encrypt document fields using a particular key for each. The process is completely transparent for the server since it is separated from the database. However, CSFLE is only available in MongoDB Enterprise 4.2 and MongoDB Atlas.
3 Client-Side Field Level Encryption
In this section, we review the Client-Side Field Level Encryption (CSFLE) supported by MongoDB [11], which allows users to encrypt data at the field level, protecting sensitive information both from administrators who may have access to the data at rest or in transit, and from any other legitimate users while the data is being used on the server.
3.1 CSFLE and Key Management
CSFLE adopts an envelope strategy based on two types of encryption keys: the DEK (Data Encryption Key) and the MK (Master Key). DEK keys are used to encrypt and decrypt data, while the MK is used to encrypt and decrypt DEK keys. Client-side field level encryption can rely on a Key Management Service (KMS) that allows creating, deleting and controlling the Master Key. The following providers can be used (Fig. 2):
• Amazon Web Services KMS
• Azure Key Vault
• Google Cloud Platform KMS
• Locally Managed Key
Fig. 2. Strategy based on envelope encryption
3.2 Process of Executing a CSFLE Query
Since we focus on document-oriented databases, we chose MongoDB [11], an open-source NoSQL document-based database storing data as BSON (Binary JSON) documents. To illustrate this approach, we consider an example: a collection named Patients containing three sensitive fields, SSN, Mobile and Email, and the procedure of executing a search query that finds a document using the "SSN" value stored as ciphertext in the database and sends the result back in plaintext to the client (Fig. 3).
Fig. 3. Client-side field level encryption
1. The client sends a request to the MongoDB driver, which interprets it to determine whether encrypted fields are involved in the filter. The driver then requests the encryption keys of those fields from the key manager.
2. The key manager sends the keys back to the driver.
3. The driver encrypts the sensitive fields and then submits the request with the sensitive fields as ciphertext.
4. The server sends the results of the request, in ciphertext, back to the driver.
5. The results of the request are decrypted with the keys and sent back to the client.
In this work, we are particularly interested in the local management of the encryption keys, both in their generation and in their use.
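The paper's prototype uses the MongoDB Java driver; purely as an illustration of how a driver is configured so that encryption and decryption stay on the client side with a locally managed master key, the sketch below shows the analogous setup in the Python driver (pymongo). The database and collection names and the schema-map remark are assumptions, not the authors' code, and automatic encryption additionally requires the driver's encryption extra and MongoDB Enterprise or Atlas, as noted above.

```python
import os
from pymongo import MongoClient
from pymongo.encryption_options import AutoEncryptionOpts

# Local (non-KMS) master key: 96 random bytes kept outside the database.
local_master_key = os.urandom(96)
kms_providers = {"local": {"key": local_master_key}}

# Collection where the driver stores the encrypted data encryption keys (DEKs).
key_vault_namespace = "encryption.__keyVault"

auto_encryption_opts = AutoEncryptionOpts(
    kms_providers,
    key_vault_namespace,
    # A JSON-Schema map telling the driver which fields of which collections
    # to encrypt automatically could be passed here via schema_map=...
)

# With these options, filter values on encrypted fields are encrypted before
# being sent to the server, and results are decrypted when they come back.
client = MongoClient("mongodb://localhost:27017",
                     auto_encryption_opts=auto_encryption_opts)
patients = client["hospitalDB"]["Patients"]  # illustrative names
```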
4 Proposed Model for Locally Managed Key
In this section, we propose our model. We first recall that CSFLE requires an external key manager for the creation and management of the MK (Master Key), and that the DEK (Data Encryption Key) must be stored in encrypted form in a collection. This can expose the keys to various dangers, and they can therefore be recovered by malicious attackers. Since we are only concerned with locally managed keys, we envisage creating and managing the encryption keys without using an external manager and without having to store them in a file. Concretely, the model we propose generates the encryption key of each field according to the following relationship: KeyField(i) = Hash(CollectionName ⊕ Field(i))
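A minimal sketch of this per-field key derivation follows, assuming SHA-256 as the hash function and a byte-wise XOR of the UTF-8 encoded collection and field names; the paper does not fix either choice.

```python
import hashlib

def derive_field_key(collection_name: str, field_name: str) -> bytes:
    """Derive a per-field data encryption key as Hash(collection XOR field).

    Assumptions (not fixed by the paper): SHA-256 as the hash and a byte-wise
    XOR of the two UTF-8 encoded names, right-padded with zero bytes.
    """
    a = collection_name.encode("utf-8")
    b = field_name.encode("utf-8")
    width = max(len(a), len(b))
    xored = bytes(x ^ y for x, y in zip(a.ljust(width, b"\0"), b.ljust(width, b"\0")))
    return hashlib.sha256(xored).digest()  # 32 bytes, usable e.g. as an AES-256 key

# Example: keys for the sensitive fields of the "Client" collection
key_ssn = derive_field_key("Client", "SSN")
key_mobile = derive_field_key("Client", "Mobil")
key_email = derive_field_key("Client", "Email")
```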
To visualize this approach, we give the following procedure. First, we consider a collection named Client in a database named CustomersDB. It contains the fields "firstName" and "lastName" as well as the sensitive fields "SSN", "Mobil" and "Email". This is schematized in JSON format as follows (Table 1):
CustomersDB.Client {
  "_id": { "$oid": "61ad13a58100841af52935a7" },
  "firstName": "belhaj",
  "lastName": "youssef",
  "SSN": "999-25-666",
  "Mobil": "06152652626",
  "Email": "[email protected]"
}
Table 1. The document to insert in a collection before encryption.

ID: 61ad13a58100841af52935a7
firstName: Belhaj
lastName: Youssef
SSN: 999-25-666
Mobil: 06152652626
Email: [email protected]
Then, the sensitive fields must be encrypted in the application, before the document is sent to the database, using the encryption keys generated according to the above relationship. For each field, the keys are given by:
KeySSN = Hash(Client ⊕ SSN)
KeyMobile = Hash(Client ⊕ Mobile)
KeyEmail = Hash(Client ⊕ Email)
Finally, the document is stored in the database in the following form.
{ "_id": { "$oid": "61acb79263d56f4dcd3a204c" }, "firstName": "Belhaj", "lastName": "Youssef", "SSN": "qZB+9OkYig6fR3Pdke2P1w==", "Mobil": "Sz2chIq1bKg4YIyBeoWlGA==", "Email": "ErhF+3+BIUn7arP8SQpWN2fEGCxInAAy6H/PzVQF/Y0=" } This approach can be represented by the following figure (Fig. 4):
Fig. 4. The model proposed for CSFLE
5 Implementation of the Approach Model
To implement this model, we chose MongoDB as the document-oriented database and the Java driver to connect to the database and use MongoDB's main library. Two basic operations have been implemented, consisting of reading and writing a document containing encrypted fields in a MongoDB document-oriented database. Below, we provide the algorithm Write_Encrypted_Document, which inserts a document containing encrypted fields into a collection, followed by another algorithm, Find_Encrypted_Document, needed to find a document from an encrypted field.
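The algorithm listings are not reproduced here; the following hedged sketch (in Python rather than the Java driver used by the authors) illustrates the two operations under the proposed model. AES-256 in CBC mode with a fixed IV is assumed so that equality search on ciphertext remains possible; the helper names and cryptographic choices are illustrative, not the authors' implementation.

```python
import base64
import hashlib
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from pymongo import MongoClient

SENSITIVE_FIELDS = ("SSN", "Mobil", "Email")
ZERO_IV = bytes(16)  # fixed IV -> deterministic ciphertext, so equality search
                     # works, at the cost of revealing when two values are equal

def derive_field_key(collection: str, field: str) -> bytes:
    # Per-field key: Hash(collection XOR field), as in the proposed model.
    a, b = collection.encode(), field.encode()
    w = max(len(a), len(b))
    xored = bytes(x ^ y for x, y in zip(a.ljust(w, b"\0"), b.ljust(w, b"\0")))
    return hashlib.sha256(xored).digest()

def encrypt_value(key: bytes, value: str) -> str:
    padder = padding.PKCS7(128).padder()
    data = padder.update(value.encode("utf-8")) + padder.finalize()
    enc = Cipher(algorithms.AES(key), modes.CBC(ZERO_IV)).encryptor()
    return base64.b64encode(enc.update(data) + enc.finalize()).decode("ascii")

def write_encrypted_document(coll, doc: dict, keys: dict) -> None:
    # Encrypt the sensitive fields in the application before sending the document.
    out = dict(doc)
    for field in SENSITIVE_FIELDS:
        out[field] = encrypt_value(keys[field], doc[field])
    coll.insert_one(out)

def find_encrypted_document(coll, field: str, value: str, keys: dict):
    # The filter value is encrypted with the same per-field key, so the server
    # only ever compares ciphertexts.
    return coll.find_one({field: encrypt_value(keys[field], value)})

client = MongoClient("mongodb://localhost:27017")
clients = client["CustomersDB"]["Client"]
keys = {f: derive_field_key("Client", f) for f in SENSITIVE_FIELDS}
write_encrypted_document(clients, {"firstName": "Belhaj", "lastName": "Youssef",
                                   "SSN": "999-25-666", "Mobil": "06152652626",
                                   "Email": "[email protected]"}, keys)
print(find_encrypted_document(clients, "SSN", "999-25-666", keys))
```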
6 Results and Discussions
The results obtained when implementing our solution allow us to develop a transparent encryption model at the application level that adds a layer of security on the server side. Thus, the data in memory is in encrypted form. It is not accessible to an administrator, who cannot see the data in plaintext, in particular the sensitive fields SSN, Mobil and Email given in Table 2. Therefore, it considerably reduces the attack surface (Fig. 5).
Table 2. Client collection after encryption

Field     | Document 1                                  | Document 2
ID        | 61ad13a58100841af52935a7                    | 61ad13e940a4ab21a93754d6
firstName | Belhaj                                      | Abid
lastName  | Youssef                                     | Amine
SSN       | qcTfcw1u8NBYwrb2VL7gGg                      | 70IoLv1n9y4mhZIw
Mobil     | Ezj5rZkZ63EvzLYbTqbvgA                      | U1CEsPMzPo8eY3kV59LFdQ
Email     | qkzdcA4FQoxVRMHMPXuISfxvSAYC7T2oPe5X2imMmgc |
Fig. 5. Encrypting the 3 sensitive fields before they are sent to the database
7 Conclusion
In this work, we have presented a model that fulfills database security requirements for querying encrypted data in a document-oriented database. In particular, we have proposed a new method to generate the encryption keys. This method provides a transparent encryption which can be considered a standard solution to protect data in use. This work raises certain open questions. Possible future work could focus on applying the model to graph-oriented and column-oriented databases. We hope to address such questions in future work.
References
1. Han, J., Haihong, E., Le, G., Du, J.: Survey on NoSQL database. In: 2011 6th International Conference on Pervasive Computing and Applications (ICPCA), pp. 363–366. IEEE, Port Elizabeth (2011)
2. Atzeni, P., Bugiotti, F., Cabibbo, L., Torlone, R.: Data modeling in the NoSQL world. Comput. Stand. Interfaces 67, 103149 (2020)
3. Harrison, G.: Document databases. In: Next Generation Databases NoSQL, NewSQL, and Big Data, pp. 53–63. Apress, Berkeley (2015). https://doi.org/10.1007/978-1-4842-1329-2_4
4. Blanco, C., et al.: Security policies by design in NoSQL document databases. J. Inf. Secur. Appl. 65, 103120 (2022)
5. Jahid, S., Borisov, N.: PIRATTE: proxy-based immediate revocation of attribute-based encryption. arXiv preprint arXiv:1208.4877 (2012)
6. Tian, X., Huang, B., Wu, M.: A transparent middleware for encrypting data in MongoDB. In: 2014 IEEE Workshop on Electronics, Computer and Applications, pp. 906–909 (2014). https://doi.org/10.1109/IWECA.2014.6845768
7. Ghazi, Y., Masood, R., Rauf, A.: DB-SECaaS: a cloud-based protection system for document-oriented NoSQL databases. EURASIP J. Inf. Secur. 2016, 16 (2016)
8. Almarwani, M., Konev, B., Lisitsa, A.: Fine-grained access control for querying over encrypted document-oriented database. In: Mori, P., Furnell, S., Camp, O. (eds.) ICISSP 2019. CCIS, vol. 1221, pp. 403–425. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49443-8_19
9. Popa, R.A., Redfield, C., Zeldovich, N., Balakrishnan, H.: CryptDB: protecting confidentiality with encrypted query processing. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pp. 85–100. ACM (2011)
10. Xu, G., Ren, Y., Li, H., Liu, D., Dai, Y., Yang, K.: CryptMDB: a practical encrypted MongoDB over big data. In: 2017 IEEE International Conference on Communications (ICC), pp. 1–6 (2017). https://doi.org/10.1109/ICC.2017.7997105
11. https://docs.mongodb.com/drivers/security/client-side-field-level-encryption-local-key-to-kms/
A Dual Carriageway Smart Street Lighting Controller Based on Multi-variate Traffic Forecast

Fouad Agramelal(B), Mohamed Sadik, and Essaid Sabir

NEST Research Group, LRI Lab., ENSEM, Hassan II University of Casablanca, Casablanca, Morocco
{fouad.agramelal,m.sadik,e.sabir}@ensem.ac.ma

Abstract. Streetlights play an essential role in the safety and well-being of road users. By 2025 the expected number of installed streetlights may reach over 350 million worldwide, with a high density of these lamps installed in cities and rural areas. However, due to energy consumption and installation difficulties, light poles on highways are primarily established only in conflict zones such as entrances and exits. Therefore, the chances of accidents occurring are much higher. In order to increase safety and reduce energy consumption, in this paper we propose a separate light controller for stand-alone street lights based on future traffic forecasts and fuzzy logic. The controller adapts light according to traffic demand in each carriageway while considering the battery state and future solar irradiance during the next day. First, several traffic forecast models were tested and validated. Then simulations of the system were carried out in Matlab. The obtained results indicate that the designed light controller is capable of lowering energy consumption, thus prolonging the system's autonomy while at the same time assuring road safety.

Keywords: Smart street light controller · Traffic forecast · LSTM · Fuzzy logic

1 Introduction
The growing rate of population and urbanization worldwide causes vast amounts of energy usage. Streetlights (SL), an integral part of cities, consume more than 19% of the global electric production [2]. According to reports, there are currently more than 304 million streetlights in use around the world, with expectations to reach 352 million by 2025 [1]. The development of LED luminaires reduced energy consumption significantly and opened other frontiers to apply various control methods. On the other hand, the lack of illumination during nighttime is a significant cause of traffic accidents, with more than 50% of traffic deaths occurring at night [11]. In this context, illumination on highways is almost non-existent, except in some peculiar zones such as entrances and exits, where there can be traffic conflict, or on some urban highways. According to some research findings, adequate c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 476–488, 2023. https://doi.org/10.1007/978-3-031-26384-2_41
illumination at nighttime can reduce casualties by up to 35%, contrary to low illumination or the absence of street lighting [4]. Thus, installing light lamps on highways significantly reduces the death toll and increases safety. Nevertheless, connecting street lamps to the grid not only requires additional wiring but is also challenging to install in some remote areas while at the same time causing additional greenhouse emissions since fossil fuels generate the majority of the electric power in grids. The shortcomings listed above can be overcome by using stand-alone PVpowered street lamps, as they entail no electricity bills and are ecologically friendly due to the usage of unlimited solar energy. Nevertheless, the unwise way of operating these systems, such as maintaining full brightness at low traffic volumes, or disregarding the adverse climate and weather conditions, leads to wasteful energy usage and quickly depletes batteries. So in the design process of these systems, it is crucial to only provide light according to demand, while considering battery level and future energy production by solar panels in the upcoming days. This control is achieved thanks to the integration of communication devices and remote sensing, giving the streetlights an intelligent behavior. In this study, we suggest the usage of a mixture between traffic forecast and fuzzy logic controller to extend the time usage of highway PV street lamps. The brightness of luminaires is changed in accordance with the battery charge state, traffic flow, and solar irradiance. The traffic is predicted using multivariate forecast models and a real-world dataset, while the solar irradiance is taken from the local weather forecast. The rest of this paper is organized as follows: Section 2 discusses related works. Section 3 gives an overview of the proposed light system. Section 4 describes the development of several traffic forecasting models for each carriageway, followed by a comparison study to assess the best models. Section 5 outlines the design of the fuzzy light controller. Lastly, in Sect. 6, a model to simulate the system is presented, along with simulation results against other light controllers. Then we conclude the paper with some remarks and future work.
2 Related Work
Smart streetlamps are light systems that can monitor the environment along with their functional parameters. They can also collect and communicate data and make adequate decisions to control the light level in response to different situations and light demands. In the literature, several studies used wireless communication devices and sensors to enable the remote monitoring and control of luminaries. For instance, in [8], the authors proposed using WSN to detect the presence of vehicles on highways. In their approach, the road is divided into sections, where each section represents the distance between two adjacent street lamps. The light is then either turned on/off or dimmed in case the vehicle counter in each section is equal to 0. The simulation findings indicate that turning off or dimming light lamps in the case of empty sections can save up to 57% of
energy. However, turning off the lamps may endanger the safety of road users in the occurrence of a miss detection by sensors. In [6], the authors presented a traffic-aware light management system, where an adaptive dimming profile is applied around the user upon its detection by sensors. The authors suggested partitioning the lit portion of the road around the user into zones, wherein, in the case of a motorist, street lamps within a 100 m distance of the subject are turned fully on. Simulations of their approach show significant energy reduction. Nevertheless, this approach is challenging to apply when there is high activity on the road. Other approaches rely on adjusting the dimming level in accordance with traffic flow. In [9], a light control based on local regulations is implemented, where a downgrade/upgrade of the lighting class is applied when there is a change in the traffic count. Whereas other authors propose the use of heuristic methods such as neural networks or fuzzy logic controllers, for example, in [10], a traffic forecast model using an ensemble method was used to provide traffic volume for the next hour. The light is then adjusted accordingly with the obtained value of traffic. In [12], the output power of a PV panel is forecasted using an LSTM model by using data on solar radiation and weather to make the forecasts. The authors then applied several methods to calculate the dimming level of street lamps for the upcoming nights. The results indicate a low probability of discharging the battery below 30%. While in [13], an FLC is used in hybrid wind-solar LED street lamps on highways. The controller uses the level of battery and wind speed to adjust the luminosity of lamps. This approach is proven to be flexible. Nevertheless, in both papers, the authors disregarded the traffic demand on the road. Alternatively, to overcome some of the shortcomings of previous works, in this paper, the light level of PV-powered SL is continuously adjusted based on traffic forecast in both directions of a dual carriageway while managing the battery consumption and taking into account future predictions of solar radiance.
3 System Overview
An overview of the proposed system is represented in Fig. 1. In this system, each stand-alone PV-powered SL is equipped with a WSN device to enable traffic monitoring and transmission of data to other lamps. Lamps on each carriageway are grouped in clusters and controlled via a master street lamp equipped with a long-distance device and with more computing capabilities. The primary role of the master lamp is to communicate with the base station and perform necessary light adjustments for its corresponding lamp cluster. After forecasting the traffic flow for the next hour, Eq. 2, the system’s fuzzy logic controller takes the forecasted values of traffic Xt , the battery level Ent , and the solar irradiance for the next day irrd+1 , Eq. 3. then estimates the appropriate lighting level to apply during that hour. Since there can be a different energy level stored within each street lamp due to dust accumulating on PV panels, sun exposure, etc., and in order to ensure a uniform light distribution on the road surface, the master lamp reads the minimum stored energy level from all street lamps and uses it as a basis for calculation.
Fig. 1. Overview of the proposed system.
Yt = {Xt, St, Occt}    (1)

Xt = forecast(Yt−1, Yt−2, ..., Yt−n)    (2)

φt = FLC(Xt, Ent, irrd+1)    (3)

4 Traffic Forecast

4.1 Data Repository
In order to evaluate an appropriate multivariate traffic forecast model, we use an open data repository, the California transportation Performance Measurement System (PeMS). In this repository, traffic flow data and other metrics are collected from loop detection stations. Data were collected from the SR51-S freeway in both directions, each direction containing three lanes. The collected data are the hourly average traffic flow, expressed in veh/h; the hourly average speed of vehicles, expressed in miles/hour; and the hourly occupancy, representing the fraction of time the detector is occupied by vehicles above it. The data range from 30 May 2021 to 28 February 2022, making 6577 observations across eight months. Before searching for a forecast model, the dataset's integrity must be ensured by replacing incorrect or missing values with neighboring data in temporal order [14]. Afterward, the data is split into 80% for training/validation, while the rest is held out for testing. The data is then rearranged into an F×W×S data structure, with F denoting the number of features used, in this case 3 (i.e., speed, occupancy, traffic flow), Eq. 1; W the size of the time window; and S the total number of samples after the data is rearranged.
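A minimal sketch of how the hourly observations can be rearranged into the F×W×S structure described above, with a one-hour-ahead traffic flow target; the column order and the random placeholder data are assumptions.

```python
import numpy as np

def make_windows(series: np.ndarray, window: int):
    """series: array of shape (T, 3) holding [flow, speed, occupancy] per hour.
    Returns X of shape (samples, window, 3) and y, the traffic flow one hour ahead."""
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t])
        y.append(series[t, 0])          # column 0 assumed to be traffic flow
    return np.asarray(X), np.asarray(y)

# Example with a 48-hour window and an 80/20 train/test split
data = np.random.rand(6577, 3)          # placeholder for the PeMS observations
X, y = make_windows(data, window=48)
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
```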
4.2 Performance Metrics
To evaluate the accuracy of forecasts, we make use of three well known metrics, the root-mean square error (RMSE), the mean absolute percentage error (MAPE), and mean absolute error (MAE). These metrics are defined using the following equations:
RMSE = [ (1/N) Σ_{t=1}^{N} (Xt − X̂t)² ]^(1/2)    (4)

MAE = (1/N) Σ_{t=1}^{N} |Xt − X̂t|    (5)

MAPE = (100%/N) Σ_{t=1}^{N} |Xt − X̂t| / Xt    (6)

where X̂t is the observed value, Xt is the corresponding predicted value of traffic flow at time t, and N is the number of predictions.

4.3 Traffic Forecast Models
ANN and DNN: ANNs are heuristic approaches inspired by the learning process of the human brain. These networks are composed of interconnections between artificial neurons (i.e., units), where each neuron processes its inputs and forwards the output to other neurons; essentially, its output is the weighted sum of its inputs passed through an activation function. The simplest way to combine these neurons is the feed-forward method, in which information is only propagated forward from the input units towards the output units. In an ANN architecture, neurons are distributed over three distinctive layers: an input layer, a hidden layer and an output layer (Fig. 2). By adding other hidden layers, the model is able to extract and model complex features and non-linear relationships between its inputs and outputs. These models are referred to as Deep Neural Networks (DNN) because of the deep stack of layered neurons.
Fig. 2. Neural network model.
LSTM: Long short-term memory (LSTM) networks are regarded as an efficient algorithm for forecasting time series. This model was developed to overcome the short memory problem of Recurrent Neural Networks (RNN) and to handle their vanishing gradient problem. The main difference with other neural network models is that the LSTM possesses a memory: it feeds the output back to its inputs, and by exploiting this feedback the model is able to take the effect of distant past data into account when evaluating the output. Figure 3 depicts the structure of an LSTM cell, where Ct denotes the cell state, ht the hidden cell state, and Xt is, in our case, the time series input data. An LSTM cell contains additional gates, namely the forget (ft), input (it) and output (outt) gates. These gates determine which signals are transmitted to other nodes by controlling the signal flow through a sigmoid activation function. The main equations of an LSTM cell are listed below:

ft = σ(wf · [ht−1, xt] + bf)    (7)

it = σ(wi · [ht−1, xt] + bi)    (8)

gt = tanh(wg · [ht−1, xt] + bg)    (9)

Ct = ft ∗ Ct−1 + it ∗ gt    (10)

outt = σ(wo · [ht−1, xt] + bo)    (11)

ht = outt ∗ tanh(Ct)    (12)
Fig. 3. LSTM cell model.
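To make Eqs. (7)-(12) concrete, a plain NumPy sketch of a single LSTM cell update is given below; the weight and bias containers are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell update following Eqs. (7)-(12).
    W holds the weight matrices (for the f, i, g, o gates) applied to [h_{t-1}, x_t],
    b the corresponding bias vectors."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate, Eq. (7)
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate,  Eq. (8)
    g_t = np.tanh(W["g"] @ z + b["g"])         # candidate,   Eq. (9)
    c_t = f_t * c_prev + i_t * g_t             # cell state,  Eq. (10)
    out_t = sigmoid(W["o"] @ z + b["o"])       # output gate, Eq. (11)
    h_t = out_t * np.tanh(c_t)                 # hidden state, Eq. (12)
    return h_t, c_t
```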
CNN: Convolutional neural networks, also known as ConvNets, are a subclass of neural networks well known for their applications to grid-like data such as images. These models are also efficiently applied to the forecasting of time series. In contrast to other neural network models, CNNs are mainly composed of a convolutional layer and a pooling layer. The convolutional layer is used to extract features from the data, whilst the pooling layer is used to reduce the dimension of the extracted features and the network's training cost. In the literature, some authors considered transforming time series data into images and then applying a CNN with a 2D convolutional layer to make forecasts [3]. In this work we keep the data as a time series by using the same input shape as the other proposed models, and we use a combination of 1D convolution and 1D max-pooling layers followed by dense layers to make traffic flow predictions (Fig. 4).
Fig. 4. Convolutional neural network pipeline.
4.4 Traffic Forecast Results
Hyperparameters and model design variables are two types of parameters commonly involved in deep learning models. The model design variables include the number of neurons, the number of layers, the loss function, the activation function and the optimizer. Hyperparameters, on the other hand, are values that affect the learning process of the model, such as the batch size, the number of epochs and the learning rate. Each of these parameters is subject to optimization because it permits adequately tailoring the model's behavior to the data. For each of the 1-h traffic forecast models, we used grid search to find the optimal parameters and hyperparameters for each time window. The values of RMSE, MAE and MAPE obtained for each time window on the left and right carriageways are reported in Table 1 and Table 2 respectively. From the obtained results, and in both scenarios, we can see that a time window of 48H outperforms the other time windows for each traffic model. Thus, in the case of the left carriageway, the LSTM model surpasses its counterparts with an RMSE of 196.03 veh/h, a MAPE value of 5,95% and an MAE value of 136,47
Table 1. Left carriageway forecast scores. ANN RMSE 72H 292,26
DNN
LSTM
CNN
MAE
MAPE RMSE MAE
MAPE RMSE
MAE
MAPE RMSE
MAE
236,03
13,26
10,05
157,83
6,85
226,79 11,32
252,64 185,03
142,32 6,39
216,46
312,1
MAPE
48H 205,92 149,95 7,6
198
196,03 136,47 5,95
282,63 210
9,72
24H 206,92
151,71
7,3
222,54 157,99
7,11
220,67
153,79
6,79
332,8
247,2
14,24
12H 272,15
205,03
13,29
266,9
12,57
214,49
151,13
7,16
339,39
251,43 12,73
10H 264,91
192,26
8,62
261,05 185,9
7,44
245,56
186,94
10,69
402,32
283,85 12,03
204,84
Table 2. Right carriageway forecast scores. ANN RMSE 72H 292,26
DNN
LSTM
CNN
MAE
MAPE RMSE MAE
MAPE RMSE MAE
MAPE RMSE
MAE
MAPE
236,06
13,26
10,05
6,85
226,79
11,32
252,64 185,03
216,6
157,83
312,1
48H 196,94 136,91 7,7
189,4 132,27 6,71
207,1 147,46 6,87
236,91 178,24 12
24H 222,65
159,58
8,44
215,55 153,04
8,02
206,94 147,52
7,69
319,15
230,03
12H 272,15
205,03
13,29
266,9
12,57
214,49 151,13
7,16
339,39
251,43
12,73
10H 264,91
192,26
8,62
261,05 185,9
7,44
245,56 186,94
10,69
402,32
283,85
12,03
204,84
11,51
Fig. 5. Comparison between forecast models on the right carriageway.
veh/h. Whilst in the case of the right carriageway, the DNN model surpasses the other models, with an RMSE of 189,4 veh/h, a MAPE value of 6,71% and an MAE value of 132,27 veh/h. A graphical comparison using the best models for each of the right and left carriageways is depicted in Fig. 5 and Fig. 6 respectively. Based on these results, a 48H LSTM model is used to predict traffic for the left carriageway, and a 48H DNN model for the right carriageway. The optimal hyperparameters of each model are reported in Table 3.
Fig. 6. Comparison between forecast models on the left carriageway.

Table 3. Optimal training parameters of the best forecast models.

Carriageway | Model | Layer 1 | Layer 2 | Dropout | Epochs | Batch | RMSE   | MAE    | MAPE
Right       | DNN   | 100     | 50      | 0.2     | 200    | 50    | 189.4  | 146.46 | 6,71
Left        | LSTM  | 100     | 50      | 0.1     | 100    | 72    | 196,03 | 136.47 | 5,95
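Assuming a Keras implementation (the paper does not state which framework was used for the forecast models), the two selected architectures could be expressed as follows, with layer sizes, dropout rates, epochs, batch sizes and the 48-h input window taken from Table 3; the placement of the dropout layer and the activation functions are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

WINDOW, FEATURES = 48, 3   # 48-h time window; features: flow, speed, occupancy

# Left carriageway: two-layer LSTM (100, 50 units), dropout 0.1
lstm_model = tf.keras.Sequential([
    layers.Input(shape=(WINDOW, FEATURES)),
    layers.LSTM(100, return_sequences=True),
    layers.LSTM(50),
    layers.Dropout(0.1),
    layers.Dense(1),                      # traffic flow one hour ahead
])

# Right carriageway: DNN with two hidden layers (100, 50 units), dropout 0.2
dnn_model = tf.keras.Sequential([
    layers.Input(shape=(WINDOW, FEATURES)),
    layers.Flatten(),
    layers.Dense(100, activation="relu"),
    layers.Dense(50, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(1),
])

for model in (lstm_model, dnn_model):
    model.compile(optimizer="adam", loss="mse", metrics=["mae", "mape"])
# lstm_model.fit(X_train, y_train, epochs=100, batch_size=72, validation_split=0.2)
# dnn_model.fit(X_train, y_train, epochs=200, batch_size=50, validation_split=0.2)
```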
5 Light Controller
Fuzzy logic controllers are well-known heuristic methods based on human reasoning. These models are easy to design for complex problems, based on the expertise of a skilled human operator, by using a set of linguistic variables and rules rather than a comprehensive grasp of the system's internal laws and dynamics. The main steps of the FLC are summarized as follows: (1) Fuzzification: crisp inputs are transformed into fuzzy sets. (2) Inference engine: a fuzzy output is obtained by combining the stored IF-THEN rules given by the human expert with the membership functions. (3) Defuzzification: in this step, the fuzzy output obtained by the inference engine is converted into a crisp value. In this paper, three crisp inputs are used for the suggested FLC: the battery's remaining energy Ent, the forecasted traffic flow Xt, and the expected solar irradiance during the next day irrd+1, while the output is the dimming command of the street lamps. First, the inputs were scaled from 0 to 100%; then triangular membership functions were used to convert inputs into fuzzy sets and then back to a crisp output, with high, medium and low as linguistic variables describing both inputs and outputs. We used a Mamdani fuzzy inference system, with the 'and' method being 'min' and the aggregation method being 'max', and we utilized the centroid method for defuzzification. A depiction of the control surfaces of the suggested FLC is shown in Fig. 7a and Fig. 7b.
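The controller itself was designed in Matlab; purely as an illustration, the sketch below reproduces the same structure (inputs scaled to 0-100%, triangular membership functions, low/medium/high terms, Mamdani inference, centroid defuzzification) with the scikit-fuzzy library. The three rules shown are a simplified stand-in for the full rule base.

```python
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

universe = np.arange(0, 101, 1)   # all variables scaled to 0-100 %

traffic = ctrl.Antecedent(universe, "traffic")
battery = ctrl.Antecedent(universe, "battery")
irradiance = ctrl.Antecedent(universe, "irradiance")
dimming = ctrl.Consequent(universe, "dimming")   # centroid defuzzification by default

# Triangular membership functions with low / medium / high linguistic terms
for var in (traffic, battery, irradiance, dimming):
    var["low"] = fuzz.trimf(var.universe, [0, 0, 50])
    var["medium"] = fuzz.trimf(var.universe, [0, 50, 100])
    var["high"] = fuzz.trimf(var.universe, [50, 100, 100])

# A few example Mamdani rules (min for AND and max aggregation are the defaults)
rules = [
    ctrl.Rule(traffic["high"] & battery["high"], dimming["high"]),
    ctrl.Rule(traffic["low"] & battery["low"] & irradiance["low"], dimming["low"]),
    ctrl.Rule(traffic["medium"] | irradiance["medium"], dimming["medium"]),
]

flc = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
flc.input["traffic"] = 70      # forecasted traffic, % of maximum flow
flc.input["battery"] = 55      # remaining battery energy, %
flc.input["irradiance"] = 30   # expected solar irradiance next day, %
flc.compute()
print(flc.output["dimming"])   # crisp dimming command
```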
Fig. 7. Surface view of the suggested FLC (a) Light level in accordance with Traffic volume and Battery level (b) Light level in accordance with Traffic volume and solar irradiance during the next day.
6 System Assessment
In this section, we evaluate and compare the proposed light control approach against: (i) Traditional light control (i.e., no dimming). (ii) adaptive control scheme presented in [10]. (iii) adaptive Light control based on the Italian TAI regulations, where the light level is classified per traffic volume [9]. To this end we used Matlab to simulate the system. First a 30 W LED module is established using a 5 × 3 “LUXEON Rebel” LED points matrix [7], where each LED module is modeled using the following equation: Vo = RD Io + Vth
(13)
where V0 is the rated voltage, RD the dynamic resistance, I0 the rated current and Vth the threshold voltage of the LED module. Then, by exploiting the datasheet, a link between flux and current is established and the following regression equation is obtained: Id = 66.9713φ³ + 38.3589φ² + 595.0029φ − 0.59462
(14)
where Id represents the rated current of one LED point and φ is the emitted light flux. Then the instantaneous power of the entire LED module is calculated by: Pt = Vo ∗ I0 = Vo ∗ (Id ∗ 3)
(15)
Although sizing the different parts of a stand-alone SL system depends on several aspects such as night-time duration, solar irradiance, etc., we consider that the LED lamps are active on average for 10 h a day, with a required autonomy of 3 days, based on the sizing used in [5]. A 12 V–100 Ah battery is chosen, with a total energy given by:
E0 = C0 U0    (16)
with U0 the battery's voltage, and C0 the battery's capacity [12].
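A short sketch of how Eqs. (13)-(16) can be chained to update the remaining battery energy for a given dimming level; the LED electrical constants and the milliampere unit assumed for the regression output are placeholders to be identified from the LUXEON Rebel datasheet, not values from the paper.

```python
# Battery sizing per Eq. (16): 12 V, 100 Ah
U0, C0 = 12.0, 100.0
E0_WH = C0 * U0                       # total stored energy in Wh

# Placeholder LED-module electrical constants (to be taken from the datasheet)
R_D, V_TH = 1.0, 2.75                 # dynamic resistance (ohm), threshold voltage (V)

def led_point_current(phi: float) -> float:
    """Eq. (14): rated current of one LED point for a light flux phi in [0, 1]."""
    return 66.9713 * phi**3 + 38.3589 * phi**2 + 595.0029 * phi - 0.59462

def module_power(phi: float) -> float:
    """Eqs. (13) and (15): instantaneous power of one LED module, with I0 = 3 * Id."""
    i_d = led_point_current(phi) / 1000.0   # assume the regression returns mA
    i_0 = 3 * i_d
    v_0 = R_D * i_0 + V_TH                  # Eq. (13)
    return v_0 * i_0                        # Eq. (15)

# Discharge loop: subtract the instantaneous power every second
energy_wh = 0.8 * E0_WH                     # 80 % initial state of charge
phi = 0.6                                   # dimming command from the FLC
for _ in range(3600):                       # one hour of operation
    energy_wh -= module_power(phi) / 3600.0 # W for one second = W/3600 Wh
print(f"Remaining charge: {100 * energy_wh / E0_WH:.1f} %")
```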
Fig. 8. Evolution of battery charge over time using the suggested light controllers for (a) the left carriageway and (b) the right carriageway.
Fig. 9. Comparison of normalized energy consumption between different light control schemes.
Once the traffic volume is forecasted, the FLC controller reads the battery energy level and the solar irradiance forecast to calculate the lighting flux. The instantaneous power is then obtained by Eq. 15, and the remaining energy in the battery is computed each second by subtracting the calculated instantaneous power from the stored energy. In some cases, the power generated by the PV modules may not be sufficient to charge the batteries. For example, winter brings shorter days, less solar exposure due to cloud cover, and longer nighttime, all of which increase energy demand. A simulation showcasing this scenario, comparing the proposed light controller against the other control schemes, is shown in Fig. 8, for a low irradiance expectation, a night-time duration of approximately 15 h, and an initial battery charge of 80%. The remaining energy in the left carriageway (Fig. 8a) and right carriageway (Fig. 8b) using the FLC is 63,04% and 64,21% respectively, while 54.58% and 56% remain in the case of control (iii), and 52,1% and 53,9% in the case of control (ii). In the case of no dimming, the remaining energy is 42.5% for both carriageways. A comparison of normalized
energy consumption over N street lamps between all the light control schemes is depicted in Fig. 9. Since the dimming command is different in each carriageway, the FLC light scheme saves up to 58% of energy in contrast to traditional light control, and up to 30% and 24% compared with the other light controllers. Consequently, the suggested light controller allows a significant energy reduction, thus enabling street lamps to operate for longer periods of time even in adverse weather circumstances.
7 Concluding Remarks
In this paper, we presented a separate-carriageway light controller for PV-powered street lamps, based on fuzzy logic and future traffic forecasts. The aim of the controller is to prolong the usage of the system by handling energy consumption according to the battery charge, traffic demand and future solar irradiance. To this end, multi-variate LSTM and DNN traffic forecast models were validated and tested for each carriageway, using real-world traffic data and proper metrics. Then a model of the system was established in a Matlab environment. Simulations of the proposed control method show that the controller can adjust the lamp brightness effectively in accordance with traffic demand and battery level while considering future energy production. A comparison against traditional light control shows an energy reduction of up to 58%, and against other state-of-the-art methods, the energy savings reach 24%. In future work, we will focus on incorporating forecast models to estimate the energy production of the PV panels and on testing the utility of the light provided by the system to road users.
References 1. “Global LED & Smart Street Lighting Market (2015–2025)” Kernel Description. https://www.prnewswire.com/news-releases/global-led-smart-street-lightingmarket-2015-2025-300277486.html. Accessed 30 Mar 2022 2. “Light’s Labour’s Lost” – Policies for Energy-efficient Lighting. https://www. iea.org/news/lights-labours-lost-policies-for-energy-efficient-lighting. Accessed 03 Apr 2022 3. Yu, C.C., Wen, C.H.: “Constructing a stock-price forecast CNN model with gold and crude oil indicators[Formula presented]”. Appl. Soft Comput. 112, 107760 (2021). issn: 15684946. https://doi.org/10.1016/j.asoc.2021.107760 4. Jackett, M., Frith, W.: Quantifying the impact of road lighting on road safety—a New Zealand Study. IATSS Res. 36(2), 139–145 (2013) 5. Kiwan, S., Mosali, A.A., Al-Ghasem, A.: Smart solar-powered LED outdoor lighting system based on the energy storage level in batteries. Buildings 8(9) (2018). issn: 20755309. https://doi.org/10.3390/buildings8090119 6. Lau, S.P., et al.: A traffic-aware street lighting scheme for Smart Cities using autonomous networked sensors. Comp. Electr. Eng. 45, 192–207 (2015) 7. LUMILEDS, luXEon Rebel ES Kernel Description. https://lumileds.com/wpcontent/uploads/files/DS61.pdf. Accessed 30 Mar 2022
8. Mustafa, A.M., Abubakr, O.M., Derbala, A.H., Ahmed, E., Mokhtar, B.: Towards a smart highway lighting system based on road occupancy: model design and simulation. In: Sucar, E., Mayora, O., Mu˜ noz de Cote, E. (eds.) Applications for Future Internet. LNICST, vol. 179, pp. 22–31. Springer, Cham (2017). https://doi.org/10. 1007/978-3-319-49622-1 4 9. Petritoli, E., et al.: Smart lighting as basic building block of smart city: an energy performance comparative case study. Measure. J. Int. Measur. Confederation 136, 466–477 (2019). issn: 02632241. https://doi.org/10.1016/j.measurement.2018.12. 095 10. Pizzuti, S., Annunziato, M., Moretti, F.: Smart street lighting management. Energy Efficiency 6(3), 607–616 (2013) 11. The Most Dangerous Time to Drive. https://www.nsc.org/road-safety/safetytopics/night-driving. Accessed 30 Mar 2022 12. Tukymbekov, D., et al.: Intelligent autonomous street lighting system based on weather forecast using LSTM. Energy 231, 120902 (2021). issn: 03605442. https:// doi.org/10.1016/j.energy.2021.120902 13. Wadi, M., et al.: Smart hybrid wind-solar street lighting system fuzzy based approach: case study Istanbul-Turkey. In: Proceedings - 2018 6th International Istanbul Smart Grids and Cities Congress and Fair, ICSG 2018, pp. 71–75 (2018). https:// doi.org/10.1109/SGCF.2018.8408945 14. Zheng, Z., et al.: LSTM network: a deep learning approach for short-term traffic forecast. IET Image Process. 11(1), 68–75 (2017). issn: 17519659. https://doi.org/ 10.1049/iet-its.2016.0208
Blockchain-Based Self Sovereign Identity Systems: High-Level Processing and a Challenges-Based Comparative Analysis Bahya Nassr Eddine(B) , Aafaf Ouaddah, and Abdellatif Mezrioui INPT, RAISS Laboratory, Rabat, Morocco {nassreddine.bahya,a.ouaddah,mezrioui}@inpt.ac.ma
Abstract. Traditional identity management systems trust centralized certification authorities (CAs) to manage public keys and authenticate the mapping between users and their respective keys. This presents several security concerns as a CA is a single point of failure in the system. Besides, managing public keys by a centralized CA is becoming costly regarding the current growth in users and the distributed systems they use. Also, centralized identity management systems lack interoperability and present privacy concerns. Self-Sovereign Identity (SSI) systems aim to address these issues by providing decentralized identity ecosystems that facilitate the registration and exchange of identity attributes, and the propagation of trust between participating entities, without needing to rely on a central authority. Blockchain technologies improve security in the SSI systems by allowing control of the storage and disclosure of credentials and identity information. They improve the integrity, confidentiality, and interoperability of users’ information. This paper highlights the challenges that an Identity Management System (IdM) must overcome, and how blockchain-based IdMs leverage blockchain technology to meet these challenges. It also presents a comparative analysis of three SSI ecosystems based on the above-mentioned challenges. An SSI layered architecture model detailing the actors, objects, components, and processes is then proposed. Keywords: Self-Sovereign identity · Blockchain · Identity management system
1 Introduction Service virtualization is predominant in more and more areas and more and more countries, especially with the Covid-19 pandemic which has imposed several health restrictions, including remote access to several services. This virtualization requires the exchange of identification data to ensure that an entity is what it claims to be. These personal data were traditionally managed by third parties. In such situations, the identity owner has no control over the protection of his data against theft or misuse. In this direction, several essential laws [1] that explain the successes and failures of digital identity systems are proposed. In this work, we identify ten important challenges Digital IdMs © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 489–500, 2023. https://doi.org/10.1007/978-3-031-26384-2_42
should address: They should allow identity owners to control the use of their identity data without involving any intermediary. IdMs must also ensure trustworthiness by allowing entities to prove that they are who they claim to be. This is achieved through authentication and protection of sensitive information. As no single IdM is sufficient on its own in all contexts, interoperability between existing IdMs is then an additional challenge that can be observed. Also, an IdM must allow the portability of users’ identities from one platform to another. This can be enabled by standardizing the data involved such as the identifiers and the credentials. Privacy must also be effective in such a system. This can be ensured if personally identifiable information (PII) does not allow any correlation with identities. Furthermore, in addition to having to be sustainable, an identity system must allow users to have a user experience consistent with their needs and expectations so that users can understand the implications of their interactions with the system. Recovery, namely the ability to easily and securely recover keys and credentials also remains a challenge in an IdM. The last challenge is the cost. Indeed, to be robust and effective, an identity system must be adopted by as many people and organizations as possible around the world. To do this, it must be free or offered at nominal cost. The next section cites some related works. It presents a comparative analysis of three SSI ecosystems based on the above-mentioned challenges. It also gives an overview of certain layered architecture models of SSI. Section 3 explains how blockchains enable IdMs to overcome most of the challenges mentioned. Section 4 is about the architecture and main components of a blockchain-based SSI system, followed by the design of a blockchain-based SSI layered architecture model. Finally, a conclusion will quote some future research directions.
2 Related Work Comparative Analysis. Civic [4] and uPort [3], now Serto, are both self-sovereign identity platforms that allow individuals to create and manage their own digital identity. uPort offers two product suites: uPort Serto, and uPort Open that gives components and tools that help building products to help preserving privacy during exchange between entities. Civic [4], which operates over identity.com as its decentralized identity verification marketplace uses smart contracts to give Users, Requesters, and Validators access to reusable identity verification powered by CVC tokens. Sovrin [2] is an open-source SSI ecosystem that provides tools and libraries to create private and secure identity management solutions that run on Sovrin’s identity network. Sovrin source code was developed by Evernym and then donated to the Sovrin Foundation that governs the Sovrin IdM. Here is a comparison of these three ecosystems in terms of the mentioned challenges: User control: All three platforms give users control over their own digital identity and the ability to choose how they present themselves online. However, storage in Sovrin can take place in a Cloud Agent protected against unauthorized access. Also, Civic requires access to Google Drive to store an encrypted backup of the user’s wallet for recovery purposes. Trustworthiness: Civic and uPort both use blockchain technology to store and manage identity information, which can help ensure the integrity and verifiability of the
data. Sovrin uses a distributed ledger technology called Hyperledger Indy, which is also designed to be secure and verifiable. Privacy: All three platforms prioritize privacy. Sovrin supports anonymous credentials based on zero-knowledge protocols. The uPort Registry stores the mapping of uPort identities to the identity attributes as JSON, which may leak information even if encrypted. In Civic, the user's verified credentials (VC) are encrypted and stored on the user's device. This information is represented in a Merkle tree and only root hashes are signed and recorded to the blockchain. However, an encrypted backup of the user's wallet is stored in the user's Google Drive account for recovery purposes. Interoperability is limited for all three ecosystems, which are built on W3C and DIF standards. They all are interoperable with DID-based SSI systems that support the DIF universal resolver; further alignment with other IdMs is required. It is not possible to comment on the sustainability of the three compared ecosystems, as they are emerging systems. Their sustainability will depend on a variety of factors, including their business models, adoption rates, and the support of the wider community. User experience: The user experience of these ecosystems will depend on a variety of factors, including their design and ease of use, as well as the specific context in which they are being used. Recovery: Sovrin, like uPort, uses a quorum-based key recovery solution. The Civic App requires access to Google Drive to store an encrypted backup of the user's wallet to allow users to recover their account in case they delete the App or switch phones. Cost: Credential issuance on Sovrin is free. However, small fees are charged for writing on the public ledger [23]. Identity creation on uPort is free, as are additional functionalities that do not require on-chain transactions. But due to the Ethereum cost of gas, all uPort transaction fees on the Ethereum blockchain are paid by the owner of the identity signing the transaction. Civic's pricing policy is set out in [24]. It sets platform subscription fees that depend on the number of active users, and fees by type of use. Civic promises lower fees thanks to its integration with Solana. Considering the above, Sovrin seems to be the solution that best respects the challenges among the three compared. This is because the uPort registry stores information in JSON format and so does not adhere to the principles of the ownership, user control, and privacy challenges. The same holds for Civic, due to the need to back up the user's wallet to their Google Drive account. However, all three ecosystems require improvements in the user experience they provide and in the cost they require. They also need additional alignment with other IdMs than DID-based ones. SSI Layered Architecture Models. Several works have proposed layered architecture models of SSI. An SSI layered model inspired by the OSI model has been defined at the Internet Identity Workshop (IIW) 17th session [5]. The proposed set of layers is required to maintain interoperability and portability up to the Application layer. Even though the proposed model separates layers, layers may be grouped for simplicity, scalability, or to use other standards than the DID.
Another approach to the SSI layered model has been proposed by the ToIP Foundation [6] that defines a dual-stack, each made up of four layers, for digital trust, one stack for technology, and the other for governance. It should be noted that many standards and protocols are being developed to enable true interoperability by many groups such as the World Wide Web Consortium (W3C), the Decentralized Identity Foundation (DIF), and the Sovrin Governance Framework working group [7]. In [8] is defined an SSI model made up of three layers: Regulation as the first layer, Technology as the second layer, and Trust Frameworks as the third layer. Our proposed SSI layered architecture model will be detailed in the fourth section.
3 How Does SSI Leverage Blockchain Technology?
A blockchain is a secure, shared, distributed and fault-tolerant ledger that facilitates the process of registering and tracking resources without the need for a central trusted authority [9]. A blockchain provides a transparent, immutable and reliable way to process the exchange of cryptographic keys. Applying blockchains to IdMs is therefore likely to make these systems more efficient [10]. The immutability of the blockchain and the distributed consensus eliminate the role of central authorities in verifying credentials, even for smart contracts, which enable the deployment and execution of immutable programs. Since a blockchain is a tamper-resistant ledger, decentralized authentication can be established through the verification of key ownership. The distributed nature and the event recording properties of blockchains, combined with their immutability and irreversibility, provide a strong non-repudiation instrument for any data in the ledger. Data is stored within transactions in the blockchain in a distributed fashion, ensuring its persistence. Interactions in the blockchain are transparent, as the state of the ledger, along with every transaction among entities, can be verified by an authorized entity. Self-sovereign identity systems use blockchains and distributed ledgers to make it possible to look up decentralized identifiers without involving a central repository. Blockchains don't solve online identity issues on their own, but they allow entities to prove things about themselves using decentralized, verifiable credentials just as they do offline. For example, in an SSI ecosystem, a digital copy of a person's certificate can be stored in their wallet. The owner can then share the signed copy, or verifiable credential, with another entity involved in the system, such as an employer.
4 Blockchain-Based SSI Architecture Self-Sovereign Identity is built on two standards defined by the World Wide Web Consortium (W3C): The Decentralized Identifiers (DIDs) and the Verifiable Credentials (VC).
In this section, we will provide a high-level overview of a blockchain-based SSI processing that is illustrated by Fig. 2 where the actors, objects, components, and main processes involved in such a system are detailed. We then propose a layered model of SSI architecture illustrated by Fig. 3. 4.1 High-Level Overview of Blockchain-Based SSI Processing 4.1.1 Involved Actors As mentioned in Fig. 2, three actors are involved in a blockchain-based SSI: the id holder, the issuer, and the verifier. The id holder is required to provide his personal information to prove his identity with the Credential Issuer, to obtain a validated Verifiable Credential. This validation is obtained via the signature of that credential by the Issuer DID. This signed credential can be securely stored in the Credential Holder’s Wallet and be associated with his DID. The Credential Holder can choose to present only the statement of the credential, the claim, that is required by the Verifier without invalidating the signature of the Issuer. The Verifier can then verify the authenticity of the identity and claim by verifying the Holder’s and Issuer’s DID signatures (contained in the claim) against the blockchain. All the actors’ DIDs are registered in the blockchain. 4.1.2 Objects There are three major objects in a blockchain-based SSI: keys, Decentralized Identifiers (DIDs), and Credentials. In [11], the authors illustrate the lifecycles of that objects and their interconnection in SSI. Decentralized Identifiers (DID): In blockchain-based SSI, entities are identified by DIDs. In W3C specification [12] DID is a URI composed of three parts as illustrated by Fig. 1: the scheme “did:”, a method id (eg. “ethr” for Ethereum, “btdr” for Bitcoin), and a unique method-specific identifier generated by the DID method specification. That DID method defines how a specific type of DID and its associated DID document are created, resolved, updated, and deactivated adequately to a specific distributed ledger or network.
Fig. 1. DID syntax (W3C specification)
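For concreteness, an illustrative DID and a minimal DID document following the W3C data model are shown below; the identifier and key material are made-up example values, and real documents vary by DID method.

```python
# Illustrative DID (the W3C example method; real DIDs use methods such as did:ethr)
did = "did:example:123456789abcdefghi"

# Minimal DID document the above DID could resolve to (W3C DID core data model);
# the verification key value below is a made-up example.
did_document = {
    "@context": "https://www.w3.org/ns/did/v1",
    "id": did,
    "verificationMethod": [{
        "id": did + "#keys-1",
        "type": "Ed25519VerificationKey2020",
        "controller": did,
        "publicKeyMultibase": "z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
    }],
    # Keys that may be used to prove control of the DID (DID Auth)
    "authentication": [did + "#keys-1"],
}
```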
The DID is recorded on the blockchain. It is resolved to the DID document which follows formats such as JavaScript Object Notation (JSON). To allow interoperability across ledgers and DID methods, Decentralized Identity Foundation’s (DIF) Universal Resolver [13] provides a unified interface that can be used
494
B. Nassr Eddine et al.
to resolve any kind of decentralized identifier. This is achieved through an architecture consisting of “drivers” that can be implemented in DID-based IdM. The system is then able to link the Universal Resolver to the systems specific DID method to read the DID Document. A driver can be added via a Java API or a Docker container. The Universal Resolver now supports around 30 different DID methods (such as Sovrin, Bitcoin Reference, Ethereum uPort, Jolocom, Blockstack, Veres One…) [14]. Pairwise Pseudonymous: for security and privacy considerations, it can be important to have Directed Identity. This means that when interacting with different entities, to present yourself using pairwise-pseudonymous identifiers that are not linked to the primary identifier, which enables users to maintain a level of anonymity. For example, Sovrin offers the concept of microledgers that allow the generation of DIDs which are not registered in the blockchain. They only are shared between the identity owner and a relying party through peer-to-peer communication. Under the Bitcoin Improvement Protocol, Hierarchical deterministic (HD) wallets can be used to derive unlinkable identifiers from a single master key. Credentials: A Claim is a statement concerning a subject made by an issuer. It is a part of a credential. For example, the age or the function can be a claim. A credential is a set of one or more claims with their metadata. A credential is made by an issuer and is associated with an identifier. A verifiable credential is a set of tamper-evident claims and metadata that cryptographically prove who issued it. W3C [15] defines a specification for cryptographically secure, privacy-respecting, and machine-verifiable credentials on the Web. The exchange of Verifiable Credentials associated with a DID is related to DID Auth. [16] presents three ways of conceiving the relationship between DID Auth and Verifiable Credentials exchange. DID Auth and Verifiable Credentials exchange could be thought to be separate from each other: in this approach, when interacting with each other, entities first authenticate (mutually, or just in one direction). Then, a protocol for the exchange of Verifiable Credentials can be executed. Or Verifiable Credentials exchange can be thought of as an extension to DID Auth: a single protocol is used for both authentication and for proving possession of Verifiable Credentials, using an “optional field” in the protocol. Finally, it is possible to consider DID Auth as an exchange of a self-issued Verifiable Credential that states that you are who you claim to be. 4.1.3 Components Self-sovereign identity can be ensured through the interworking of key components that we list below: Blockchain. The blockchain acts as a replacement for the registration authority in a classic IdM. It gives the possibility to the user to record proofs in decentralized ledger such as digital signatures and timestamps so that any authorized entity can verify them against it. The Id holder Agent. A component (mobile application or browser) that performs almost all operations for DIDs, including DIDs creation, which is the same as a wallet
application in blockchain technology [17]. It can store the verifiable credentials locally in the wallet or push them to an identity hub [18]. The id holder agent mediates the communication between the id holders, the issuers, and the verifiers.
The Universal Resolver. To allow interoperability across ledgers and DID methods, the Decentralized Identity Foundation's (DIF) Universal Resolver [13] provides a unified interface that can be used to resolve any kind of decentralized identifier. This is achieved through an architecture consisting of "drivers" that can be implemented in DID-based IdM. The system is then able to link the Universal Resolver to the system's specific DID method to read the DID Document. A driver can be easily added via a Java API, a Docker container, or a remote HTTP GET call. The Universal Resolver now supports more than 30 DID methods (such as Sovrin, Bitcoin Reference, Ethereum uPort, Jolocom, Blockstack, Veres One…) [14].
Off-chain Repositories. As blockchains are generally public and immutable, no credentials or personal data should be stored in these ledgers; only fingerprints can. Indeed, there are mainly two different ways of storing claims and credentials: the registry model and the non-registry model. In the registry model, claim fingerprints are stored in the blockchain. uPort, for example, has a public profile that includes names, profile pictures, and public keys. In the non-registry model, credentials/claims may be stored locally in the wallet itself by the user agent or by a third-party custodian to whom the subject has delegated this role. With this approach, user rights such as privacy, control over personal data, and the right to be forgotten [19] can hardly be guaranteed. uPort, for example, uses a smart contract, named uPort Registry, that maintains a mapping of user identity to hashes of claims that are stored off-chain as JSON documents in an IPFS data store. The claim fingerprints are used by the relying party to verify claim integrity. The blockchain timestamp property protects claims and their signatures from secret modification. Blockstack, on the other hand, opted for centralized storage providers such as Amazon S3, Dropbox, and Google Drive. Even though this helps prevent potential data loss, as these systems are highly redundant, it makes us question whether the user has real, full control over the privacy and the security of their identity attributes.
4.1.4 Processes
The blockchain-based IdM lifecycle consists of four phases: registration, authentication, issuance, and verification.
Registration. DID registration [20] is the process of creating a DID on a distributed ledger and associating it with one or more public keys. The user agent generates a key pair. It then submits the public key along with the DID and the DID Document as a proposed transaction to the blockchain. The built-in consensus mechanism is then used to validate, distribute, and immutably write the new information block (DID and DID Document) to all the distributed nodes.
Authentication. With DIDs, authentication means proving ownership of a DID. This is established by proving control of the private key associated with a public key relative to the DID. The public key is stored as a value of the identifier on the blockchain. This concept has been described as Decentralised Public Key Infrastructure.
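As a rough illustration of these two phases, the sketch below generates the key pair that a holder's agent would create at registration and then performs the signature check behind a DID Auth challenge-response. The use of the Python cryptography package and of Ed25519 keys is an assumption made for the example; DID Documents, ledger writes, and key recovery are left out of scope.

```python
# Minimal sketch of registration (key generation) and DID Auth
# (challenge-response); assumes the 'cryptography' package and Ed25519 keys.
import os
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Registration side: the holder's agent creates the key pair locally; the
# public key is what would be anchored with the DID on the ledger.
private_key = Ed25519PrivateKey.generate()   # stays in the holder's wallet
public_key = private_key.public_key()        # published via the DID Document

# Authentication side: the verifier issues a random challenge (e.g. via a QR
# code), the holder signs it, and the verifier checks the signature against
# the public key resolved from the holder's DID.
challenge = os.urandom(32)
signature = private_key.sign(challenge)

try:
    public_key.verify(signature, challenge)
    print("DID Auth succeeded: the holder controls the private key")
except InvalidSignature:
    print("DID Auth failed")
```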
The way the user holds the private key associated with its key pair differs depending on the IdM. For both uPort and Blockstack, private keys are stored on the device that the identity was created on, and the user himself is responsible for key recovery and mobility. DID Auth is a mechanism by which any entity can cryptographically prove ownership of a DID. Such authentication can be established [21] through a challenge-response cycle that includes resolving a DID to its DID Document. These cycles rely on exchanges of tokens and objects that are transported using combinations of mechanisms like QR code scanning, the HTTP protocol, biometrics [22], or NFC technology. The way and the format of the challenge vary depending on the situation. For example, users may come across a "Sign in with DID Auth" button or a QR code on a website. It is possible to combine DID Auth with a higher-layer interaction such as the exchange of Verifiable Credentials, to simultaneously prove control of a DID and offer Verifiable Credentials for some transaction-specific purpose. DID Auth has many different forms of interaction; [16] describes 10 different architectures of DID Auth. For example, uPort authentication uses a QR code communication channel for the challenge and HTTP POST for the response. This is combined with a higher-layer interaction for the exchange of Verifiable Credentials. Since authentication relies on the keys held by the identity owner, and since key losses are likely, key recovery/replacement is a necessity for a working SSI system. In DID-based systems, this is possible when authentication is separate from authorization, thus allowing others to change the DID Document by changing the authentication key after the loss of the private key. For Blockstack, private keys can be recreated through mnemonic phrases, typically of 12 words, that are used as a seed to generate the keys. uPort, like Sovrin, uses a quorum-based key recovery solution: delegates are selected in advance to vote for the replacement of the user's public/private key pair. However, the delegates could be one vector of attack, since their uPortIDs are openly linked to the user's uPortID. If an attacker can compromise an uPort application and replace the delegates, the uPortID is compromised permanently.
Issuance. Whenever credentials are issued, the issuer records the status of those assets in the decentralized ledger along with cryptographic proof of issuance and a timestamp. The status of credentials can be changed by the issuer or by any entity authorized by it (e.g., from active to revoked), according to applicable rules. It should be noted that for some blockchain-based SSI systems, the issuance of credentials may incur transaction costs to write to these registries. Users would bear these costs.
Verification. Verifiers check against the blockchain the cryptographic proofs left by issuers in order to verify the credential the id holder presents to them. Verification should include checking the status of verifiable credentials to ensure that they are active and not revoked or suspended [8].
4.2 Blockchain-Based SSI Layered Model
Below is the layered model of blockchain-based SSI that we propose, in seven layers.
Fig. 2. Blockchain-based SSI high level processing
Fig. 3. Blockchain-based SSI layered model (layers, from top to bottom: Applications, Governance, Credentials, DID Authentication, DID Resolution, DID Operations, Blockchain)
Applications Layer: This layer concerns human interaction (UI/UX) with the SSI ecosystem through API calls and library integration.
Governance Layer: An SSI ecosystem requires a governance structure that makes it secure and trustworthy. This requires the implementation of governance frameworks that specify policies, requirements, principles, and standards [8].
Credentials Layer: This layer covers all the operations related to the credential and claim lifecycle: their request, their issuance and signature, their storage, their disclosure, their verification, their revocation, and their expiration.
Referring to the architecture illustrated in Fig. 2, the "Issuance" and "Verification" processes take place in this layer.
DID Authentication Layer: This layer describes the authentication methods using DIDs. Referring to the architecture illustrated in Fig. 2, the "Authentication" process takes place in this layer.
DID Resolution Layer: The DID Resolver is used to convert the DID to its associated DID document. DID resolution can be achieved through a native resolution method defined in a DID method specification or through the DIF Universal Resolver.
DID Operations Layer: This layer defines how a specific type of DID (for a specific distributed ledger) and its associated DID document are created on the blockchain, how the DID is resolved to its associated DID Document, and how a DID can be updated and deactivated. These CRUD operations are defined in the DID method specification [12]. Referring to the architecture illustrated in Fig. 2, the "Registration" process takes place in this layer.
Blockchain Layer: Every DID-compliant blockchain has an associated DID "method", which is a specification that governs how DIDs are anchored on the ledger. It should be noted that the DID method can sometimes separate the storage of the DID and the DID document from the blockchain; this storage is then managed outside of the ledger.
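To illustrate the kind of object handled by the Credentials layer, the snippet below shows the general shape of a verifiable credential following the W3C data model [15]; the DIDs, claim values, dates, and proof value are placeholders invented for the example, not the output of a real issuer.

```python
# Illustrative shape of a verifiable credential (claims + metadata + proof);
# all identifiers and values below are placeholders.
verifiable_credential = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential"],
    "issuer": "did:example:issuer-university",
    "issuanceDate": "2022-01-01T00:00:00Z",
    "credentialSubject": {
        # the claims: statements about the subject, identified by its DID
        "id": "did:example:holder-alice",
        "degree": "Bachelor of Science",
    },
    "proof": {
        # the issuer's signature over the credential, verifiable against the
        # public key resolved from the issuer's DID Document
        "type": "Ed25519Signature2018",
        "verificationMethod": "did:example:issuer-university#keys-1",
        "proofValue": "z3FXQ...placeholder...",
    },
}
```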
5 Conclusion
Self-Sovereign Identity does not necessarily guarantee that identity owners have total control over their identities or that they have absolute trust in the system. Indeed, some risks must be considered, such as:
• Credentials and claims are protected from outside sources. However, they are not protected against inside actors such as the issuer or any entity that holds the issuer's signing key;
• Even if issuers and verifiers have obtained the identity holder's data with the identity holder's consent, they may decide to keep it for other purposes, such as marketing use;
• Users are also required to store their verified credentials in a secure wallet on their devices. The scenario where users move their verified credentials from one device to another, or lose their trusted device, is another challenge.
SSI should also meet international compliance requirements. For example, in Morocco, Law No. 09-08 concerns the protection of individuals regarding the processing of personal data, and Law No. 53-05 covers the electronic exchange of legal data. In the European Union, the electronic identification and trust services (eIDAS) Regulation formulates requirements relating to the mutual recognition of electronic identification means, as well as of electronic signatures, for exchanges between
public sector organizations and users in the Member States. We also find the General Data Protection Regulation (GDPR), which aims to protect users by allowing them to control their identity data. It includes the right of access, consent, data minimization, portability, and the right to be forgotten. However, blockchains are immutable and are used to store cryptographic proofs. It is, therefore, necessary to ensure that the storage of personal data or PII in the ledger is avoided. The issue of interoperability and portability in IdM has also not been widely addressed for SSI systems and remains an open issue for digital identity. The possibility of integrating blockchain-based IdM with existing solutions may lead to wider adoption of SSI. The user experience is another challenge that still needs to be addressed by blockchain-based IdM. Research in this field is still at an early stage and needs more work.
References
1. Cameron, K.: The Laws of Identity (2005). https://www.identityblog.com/stories/2005/05/13/TheLawsOfIdentity.pdf
2. The Sovrin Foundation: Sovrin™: A Protocol and Token for Self-Sovereign Identity and Decentralized Trust (2018)
3. Lipińska, A.: uPort Serto Ecosystems: Creating trusted data networks between businesses and individuals (2019). https://medium.com/uport/uport-serto-ecosystems-creating-trusted-data-networks-between-businesses-and-individuals-ff21c9368d3b
4. Ligham, V.: CVC Token Transfer and Identity.com Nonprofit (2021). https://www.civic.com/blog/cvc-token-transfer-and-identity-com-nonprofit/
5. Terbu, O.: The Self-sovereign Identity Stack (2019). https://medium.com/decentralized-identity/the-self-sovereign-identity-stack-8a2cc95f2d45
6. Trust over IP Foundation: Introducing the Trust over IP Foundation, white paper (2020)
7. Sovrin Foundation: Taking the Sovrin Foundation to a Higher Level: Introducing SSI as a Universal Service (2020). https://sovrin.org/taking-the-sovrin-foundation-to-a-higher-level-introducing-ssi-as-a-universal-service/
8. López, M.A.: IDB: Self-Sovereign Identity. The Future of Identity: Self-Sovereignty, Digital Wallets, and Blockchain (2020)
9. Gisolfi, D.: Self-sovereign identity: Why blockchain? (2018). https://www.ibm.com/blogs/blockchain/2018/06/self-sovereign-identity-why-blockchain/
10. Ferdous, M.S., Chowdhury, F., Alassafi, M.O.: In Search of Self-Sovereign Identity Leveraging Blockchain Technology. IEEE Access 7 (2019)
11. Liu, Y., et al.: Design-Pattern-as-a-Service for Blockchain-based Self-Sovereign Identity. In: IEEE Software Special Issue on Blockchain and Smart Contract Engineering (2020)
12. W3C Community Group: Decentralized Identifiers (DIDs) v1.0, Core architecture, data model, and representations (2020). https://www.w3.org/TR/did-core/
13. Sabadello, M.: A Universal Resolver for self-sovereign identifiers (2017). https://medium.com/decentralized-identity/a-universal-resolver-for-self-sovereign-identifiers-48e6b4a5cc3c
14. Sabadello, M.: The Universal Resolver Infrastructure (2020). https://medium.com/decentralized-identity/the-universal-resolver-infrastructure-395281d2b540
15. W3C Community Group: Verifiable Credentials Data Model 1.0 (2019). https://www.w3.org/TR/vc-data-model/
16. Sabadello, M., et al.: Introduction to DID Auth. White paper, Rebooting the Web of Trust VI (2018)
17. Matsuzaki, T.: Walkthrough of Decentralized Identity (DID) Network (2019). https://tsmatz.wordpress.com/2019/12/24/decentralized-identifiers-did-tutorial/
18. Roon, M.: DIF Identity Hubs (2019). explainer.md, decentralized-identity/identity-hub repository, GitHub
19. Kondova, G., Erbguth, J.: Self-Sovereign Identity on Public Blockchains and the GDPR. In: SAC '20
20. DID Registration. https://didproject.azurewebsites.net/docs/registration.html
21. Lesavre, L., Varin, P., Mell, P., Davidson, M., Shook, J.: A Taxonomic Approach to Understanding Emerging Blockchain Identity Management Systems. NIST Cybersecurity White Paper (2020)
22. Six Principles for Self-Sovereign Biometrics (2019). https://github.com/WebOfTrustInfo/rwot6-santabarbara/blob/master/draft-documents/Biometrics.md
23. Sovrin Foundation: Write To The Sovrin Public Ledger! https://sovrin.org/issue-credentials/
24. Civic Technologies: Flexible Pricing that Scales. https://www.civic.com/pricing/
Impact of Machine Learning on the Improvement of Accounting Information Quality Meryem Ayad1(B) , Said El Mezouari1 , and Nassim Kharmoum2
1 Research Laboratory in Finance, Accounting, Management and Information Systems and Decision Support, ENCG, Hassan First University of Settat, P.O. Box 577, Settat, Morocco
{m.ayad,said.elmezouari}@uhp.ac.ma
2 National Center for Scientific and Technical Research (CNRST), Rabat, Morocco
[email protected]
Abstract. Nowadays, the Accounting field is in constant evolution due to Artificial Intelligence's new technologies, such as Machine Learning. On the other hand, stakeholders such as investors, corporate managers, and creditors rely vitally on the information provided by Accounting, known as Accounting Information, in making better business decisions. To make such decisions, the production of high-quality Accounting Information by companies is essential and is the objective of Financial Accounting. The main aim of this work is to explore, analyze, and discuss the impact of Machine Learning algorithms on improving Accounting Information quality. It does so through a bibliometric analysis conducted on 114 publications to identify key research trends. Afterward, as financial statements are the primary source of Accounting Information, we analyzed case studies on the impact of Machine Learning algorithms on financial statements. Machine Learning, specifically classification algorithms, is among the main trends and plays a significant role in fraud detection in financial statements, thus improving Accounting Information reliability.
Keywords: Machine Learning algorithms · Accounting Information · Financial statements · Bibliometric analysis · Review
1 Introduction
In the business world, Artificial Intelligence and Machine Learning techniques have greatly improved the efficiency of enterprise Accounting and financial management and gradually shifted traditional financial Accounting to modern financial management and Accounting [1]. Indeed, the growing advancement of artificial intelligence has brought tremendous changes to Accounting [2] and has a significant role in improving business efficiency, reducing work errors, and preventing and controlling business risks [3]. On the other hand, stakeholders such as investors, suppliers, managers, and creditors make crucial business decisions
based on Accounting Information [4,5]. For example, investors use Accounting Information to decide if they should invest in a company or not [6], and creditors need Accounting Information to evaluate the creditworthiness of the company [7]. Therefore, each decision requires accurate, high-quality, timely, and reliable information, because every wrong or misleading decision comes at a high price [8]. The generation of Accounting Information occurs through financial Accounting, which is a form of Accounting that has the role of recording the organization's transactions with its external environment to periodically and systematically determine the financial status and the results of the operations performed. All Financial Accounting Information is published in conformity with legislation [9]. The scope of this paper is to explore and discuss how Machine Learning impacts the improvement of the quality of Accounting Information. To do so, we first briefly introduce the concepts related to Accounting Information and Machine Learning algorithms, then use a bibliometric analysis to reveal research trends, areas, and trajectories linked to Machine Learning algorithms and Accounting Information, and finally analyze empirical evidence from case studies in order to identify the impact of Machine Learning algorithms on the improvement of Accounting Information quality.
2 Methodology
In this article, we will respond to the following three research questions: RQ1: What are Accounting Information and Machine Learning? RQ2: What are the existing literature trends and research avenues on Machine Learning applied to Accounting Information? RQ3: How does Machine Learning impact the improvement of the quality of Accounting Information? To answer these research questions, we use the following methodology. RQ1 is addressed through a review in which we define the underlying concepts, namely Accounting Information and the characteristics that determine its quality for better decision making, and briefly introduce Machine Learning. RQ2 is addressed through a bibliometric analysis using the VOSviewer software. To do so, we first selected Scopus as the most recognized database for our research; second, we performed the search for articles using an advanced search in which the keywords shown in Fig. 1 must appear simultaneously to match our query. The Scopus output contains 190 publications. To ensure that our analysis focuses on the most relevant publications, 76 articles that were not directly related to our research questions were filtered out, so only 114 papers are included in the analysis. We exported a file listing the year, title, abstract, and keywords, and then forwarded this file to the VOSviewer software.
Fig. 1. Query string from the online database Scopus including alternative terms
To highlight research trends, we focused our analysis on the occurrence and co-occurrence of three variables, namely author keywords, index keywords, and title and abstract terms. RQ3 is addressed by analyzing empirical evidence from different case studies conducted on the impact of Machine Learning algorithms on financial statements, the main source of Accounting Information. The selection of case studies is based on their relevance and link to RQ3.
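For readers unfamiliar with the mechanics behind such keyword maps, the short sketch below shows how occurrence and co-occurrence counts can be derived from a Scopus export. This is only an illustration of the idea, since VOSviewer performs these computations internally, and the file name and "Author Keywords" column header are assumptions about the export format.

```python
# Illustrative keyword (co-)occurrence counting from a Scopus CSV export;
# VOSviewer does this internally, so this sketch is for exposition only.
from collections import Counter
from itertools import combinations

import pandas as pd

records = pd.read_csv("scopus_export.csv")        # hypothetical export file
occurrence = Counter()
co_occurrence = Counter()

for cell in records["Author Keywords"].dropna():  # assumed column name
    keywords = sorted({k.strip().lower() for k in cell.split(";") if k.strip()})
    occurrence.update(keywords)
    co_occurrence.update(combinations(keywords, 2))   # unordered keyword pairs

# Keep only keywords with at least 3 occurrences, as in the analysis below.
frequent = {k for k, n in occurrence.items() if n >= 3}
links = {pair: n for pair, n in co_occurrence.items()
         if pair[0] in frequent and pair[1] in frequent}

print(occurrence.most_common(5))
print(sorted(links.items(), key=lambda kv: -kv[1])[:5])
```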
3 Definition of Concepts Underlying Accounting Information and Machine Learning
In this section, we will respond to the first research question: What are Accounting Information and Machine Learning?
3.1 Accounting Information
Accounting Information is information that is generated from business transactions and is used to make business decisions by all parties who have an interest in a business. Its sources are mainly the financial statements, including the balance sheet, the income statement, and the cash flow statement. For stakeholders, Accounting Information is an indispensable element in the process of making important business decisions. To meet their needs, the Financial Accounting Standards Board [10] presented the main characteristics that Accounting Information should satisfy, namely: understandability to decision-makers, relevance, reliability, and comparability. These qualities enable decision makers to recognize the difference between good and bad information, and therefore between more useful and less useful information [11].
3.2 Machine Learning
Machine Learning, as defined by Baştanlar et al. [12], is a subset of AI that enables computers to make successful predictions using past experience. Machine learning systems learn by using algorithms to interpret data from the world around us, predict outcomes, and learn from successes and failures to enhance decision making [13]. Machine Learning techniques can be broadly classified into two main categories, as schematized in Fig. 2; the classification depends on whether the output values are required to be present in the training data [12]. Other categories exist as well, such as Reinforcement Learning.
Fig. 2. Machine Learning algorithms classification (Source: author’s own)
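The distinction drawn in Fig. 2 between the two main categories can be illustrated with a minimal, self-contained sketch: a supervised classifier is trained on labeled examples, while an unsupervised algorithm groups the same points without any labels. The tiny numeric dataset is invented purely for demonstration.

```python
# Supervised vs. unsupervised learning on a toy dataset (illustration only).
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]   # toy feature vectors
y = [0, 0, 1, 1]                                        # labels (supervised case only)

# Supervised: the model learns a mapping from features to the given labels.
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[0.85, 0.15]]))          # -> [1]

# Unsupervised: the model groups the same points without seeing any labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
```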
4 Literature Trends and Research Avenues
In this section, we address the second research question: What are the existing literature trends and research avenues on Machine Learning applied to Accounting Information? To identify and analyze research trends and avenues related to Machine Learning and Accounting Information, we used a bibliometric analysis procedure as outlined by Donthu et al. [14]. This choice is explained by the popularity of bibliometric analysis in business research and its advancement [14]. To run the bibliometric analysis, we used the VOS (Visualisation of Similarities) technique through the VOSviewer software, as introduced by van Eck and Waltman [15]. To highlight research trends, we focused our analysis on the occurrence and co-occurrence of three variables, namely author keywords, index keywords, and title and abstract terms. To get an overview of the current trend regarding our subject, we have added the analysis shown in Figs. 6, 7 and 8, conducted by Scopus following our search request as given in Fig. 1.
4.1 Analysis and Discussion of Author Keywords
The bibliometric analysis is based on 324 author keywords, using 3 as the minimum number of occurrences. The VOSviewer outcome, as visualized in Fig. 3, is a network made of 19 nodes, divided into four thematic clusters with different colors.
1. Analysis of Findings
– The more a keyword is mentioned, the bigger its node [14]. Therefore, from the 114 publications selected for analysis, we found that the keyword "Machine Learning" is the most occurring word (60), followed by Artificial Intelligence (13); Data Mining (13); Accounting (13); Fraud detection (8); Financial Statement (7);
– The links between the nodes show the co-occurrence between the keywords [14]. Hence, "Artificial Neural Networks" (ANNs), usually called Neural Networks (NN), a Machine Learning algorithm; "Bankruptcy Prediction"; "Prediction Models" and "Deep Learning" are closely related concepts and have been used together in the 114 publications multiple times, which shows their connectedness;
– "Data Mining"; "Financial Statement" and "Fraud Detection" have been clustered together in the green group, which means that they co-occurred in publications multiple times;
– Keywords such as "Machine Learning"; "Natural Language Processing"; "Classification"; "Text Mining"; "Feature Selection"; "Financial Statement Fraud" are linked together in the red group;
– "Blockchain" and "Accounting" seem to be interesting research avenues to be explored further.
Fig. 3. Our network visualization of the Occurrence and Co-occurrence of the author’s keyword
2. Discussion of Findings
Figure 3 shows three key research thematic trends, illustrating the existing relationship between Machine Learning algorithms and Accounting Information:
– We can view the first cluster (red group) as the potential impact of "Text Mining", "Classification Models" and "Natural Language Processing" on fraud detection in "Financial Statement", the main Accounting Information source;
– We can view the second cluster (gold group) as the potential impact of "Deep Learning" and "Artificial Neural Networks (ANNs)" on "Bankruptcy Prediction"; this prediction is done by calculating several financial ratios based on financial statements;
– We can consider the third cluster (green group) as the potential impact of "Data Mining" on fraud detection in the "financial statement", the main Accounting Information source.
4.2 Analysis and Discussion of Index Keywords
The bibliometric analysis is based on 846 index keywords, using 3 as the minimum number of occurrences. The VOSviewer outcome, as visualized in Fig. 4, is a network made of 77 nodes, divided into seven thematic clusters with different colors.
Fig. 4. Our network visualization of the Occurrence and Co-occurrence of the index keyword
1. Analysis of Findings
– In the 114 publications selected for analysis, the keyword "Machine Learning" is the most used keyword (46), followed by "Financial Statements" (42); "Learning Systems" (38); "Finance" (30); "Learning Algorithms" (26); "Data Mining" (16); "Artificial Intelligence" (16); "Forecasting" (15); "Decision Trees" (13); "Machine Learning Techniques" (13); "Support Vector Machines" (12); "Decision Making" (11);
– Machine Learning techniques, namely Supervised Learning, have emerged in the index keywords through their algorithms: Support Vector Machines; Decision Trees (13); Neural Networks (12); Logistic Regression (4); Support Vector Regression (4); Discriminant Analysis (3) appear to be elements that are used in the Accounting field;
– "Data Mining"; "Financial Fraud Detection"; "Financial Statements Fraud"; "Feature Selection" are connected concepts and have been indexed together;
– As observed in the author keywords visualization, "Neural Networks"; "Bankruptcy Prediction"; "Prediction Models" and "Deep Learning" are closely related concepts and have been used together in the 114 publications multiple times;
– Two different technologies of Artificial Intelligence, namely "Machine Learning" and "Expert Systems", are linked with "Risk Assessment" and "Decision Making";
– "Engineering Education" and "Machine Learning" appear to be an interesting research area for further study.
2. Discussion of Findings
– Figure 4 also highlights some main research thematic trends that illustrate the existing relationship between ML algorithms and Accounting Information through financial statement fraud detection: this concerns data mining's potential impact on financial statement fraud detection, and the potential use of neural networks and deep learning in bankruptcy prediction.
4.3 Analysis and Discussion of Title and Abstract Terms
The bibliometric analysis is based on 2954 terms, using 5 as the minimum number of occurrences. VOSviewer extracted 115 nodes; each node is a selected term that showed up at least five times in the title or abstract terms. To obtain a relevant result, we excluded 68 terms irrelevant to the purpose of this research. The final output, visualized in Fig. 5, is divided into four thematic clusters with different colors, analyzed and discussed as follows:
1. Analysis of Findings
– "Machine" is the most used term (57), followed by "Financial Statements" (37); "Machine Learning" (35); "Accuracy" (29); "Accounting" (27); "Fraud" (22);
– One of the Accounting Information characteristics is detected, namely "Reliability";
– Some interesting concepts, namely "Machine Learning"; "Reliability"; "Accounting Information"; "Decision Making" and "Investor", co-occurred many times;
Fig. 5. Our network visualization of the Occurrence and co-occurrence of Title and abstract terms
– "Data Mining" and "Fraudulent Financial Statement" also co-occurred multiple times in the title and abstract terms of the 114 publications;
– A subset of supervised Machine Learning, namely Classification, is linked to "Accuracy"; "Financial Statement Fraud"; "Detection" and "Auditor";
– Emerging concepts have appeared concurrently in the selected title and abstract terms, namely "Decision Tree"; "Support Vector Machine"; "Neural Network"; "Financial Ratio".
2. Discussion of Findings
Figure 5 indicates some emerging main research thematic trends that illustrate the existing relationship between ML algorithms and Accounting:
– Reliability appears to be one of the main characteristics of Accounting Information that is linked to Machine Learning;
– As revealed in the co-occurrence of the author keywords, classification models have a potential impact on financial statement fraud detection (blue group);
– As revealed in the co-occurrence of the author keywords, data mining has a potential impact on financial statement fraud detection (green group).
4.4 Scopus Analysis Related to Our Research Request
To get an overview of the current trend regarding our subject, we have added the analysis conducted by Scopus following our search request as given in Fig. 1. The 114 publications, published between 2002 and 2021, can be discussed as follows:
Fig. 6. Repartition of publication by country
– As Fig. 6 reveals, China ranks first as a document producer, followed by the US;
Fig. 7. Repartition of publication by year
– Figure 7 shows an upward trend in the number of articles published per year since 2006;
Fig. 8. Repartition of publication by author
– According to Fig. 8, Sotiris Kotsiantis is the author who has contributed the most in this area of research.
5 Impact of Machine Learning on the Improvement of the Quality of Accounting Information for Decision Making
In this section, we will provide answers to the third research question: How does Machine Learning impact the improvement of the quality of Accounting Information? To answer this question, we analyzed empirical evidence from different case studies conducted on the impact of Machine Learning algorithms on financial statements, the main source of Accounting Information. From the 114 publications, we selected the most relevant case studies that have a direct link with RQ3. Table 1 shows a summary of the empirical evidence.
5.1 Impact on the Improvement of Accounting Information Reliability
Reliability denotes that Accounting Information is reasonably free from error and bias and faithfully represents what it purports to represent [10]. After full-text screening of the case studies that we synthesized in Table 1, ML algorithms have been used in the detection of financial statement fraud. Several studies have used either financial variables [16–18] such as profitability, activity, asset structure, liquidity, and cash flow, or combined financial variables with non-financial variables related to the corporate governance structure [19], or combined financial variables with linguistic variables [20]; these variables are derived from financial statements. These studies [16, 19, 20] then applied classification methods to detect financial fraud.
Table 1. Case studies on the application of ML in financial statements

Hajek and Henriques (2017). Data: 622 annual reports. ML algorithms: fourteen Machine Learning techniques. Findings: the Bayesian belief network (BBN) performs better than the other Machine Learning methods.

Hamal et al. (2021). Data: financial statements of 341 Turkish companies. ML algorithms: Machine Learning classification (support vector machine, Naive Bayes, artificial neural network, k-nearest neighbor, random forest, …). Findings: Random Forest and Bagging classifiers have better performance than other classifiers in detecting fraudulent financial statements.

Yao et al. (2018). Data: 120 fraudulent financial statements. ML algorithms: feature selection and Machine Learning classification. Findings: this study used feature selection and Machine Learning classification to propose an optimized financial fraud detection model and indicated that random forest has a strong performance in the detection.

Huang et al. (2008). Data: the annual financial statements of Taiwan's companies. ML algorithms: artificial neural networks. Findings: an ML algorithm, namely neural networks, is combined with a hybrid financial analysis model for business failure prediction; this kind of model is capable of providing a very high prediction accuracy rate of business failure.

Song et al. (2014). Data: financial data from Chinese companies. ML algorithms: four classifiers (logistic regression, back-propagation neural network, decision tree, and support vector machine). Findings: results show that the four classifiers, combined with financial and non-financial risk factors, are a competitive tool for risk assessment of financial statement fraud.

Kirkos et al. (2007). Data: 76 Greek manufacturing firms. ML algorithms: Decision Trees, Neural Networks, and Bayesian Belief Networks. Findings: the authors agree with prior research about the usefulness of ML algorithms in the identification of fraudulent financial statements; in terms of performance, the Bayesian Belief Network outperforms the other two models and achieves outstanding classification accuracy.

Kotsiantis et al. (2006). Data: 164 Greek firms. ML algorithms: decision trees, Artificial Neural Networks, Bayesian networks. Findings: the authors confirm the usefulness and compare the performance of Machine Learning techniques in detecting fraudulent financial statements by using published financial data; in terms of performance, the decision tree learner achieved the best results.
This detection enables the financial statements to provide a faithful reflection to the stakeholders of the economic and financial situation of the enterprise and, therefore, impacts the financial statements' reliability. Moreover, the identification of fraudulent financial statements using ML algorithms ensures the trustworthiness of the financial statements, which refers to complete financial statements that are free from errors. Furthermore, applying classification models to evaluate the risk of fraud in a financial statement [21] supports companies in avoiding or minimizing fraud in the financial statements and thus improves their reliability. Song et al. [21] demonstrate that classification algorithms (logistic regression, back-propagation neural network, decision tree, and support vector machine) can help evaluate the risk of financial statement fraud and support stakeholders in reducing financial risk and, therefore, making better decisions. Song et al. [21] viewed the detection of fraud in financial statements as a typical classification problem: Machine Learning methods are used to detect fraud in financial statements by capturing the changes and relationships between groups of account balances.
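A minimal sketch of this classification framing is given below: financial ratios derived from the statements serve as features and a classifier is trained to flag suspicious reports. The synthetic data, the ratio names, and the choice of a random forest are assumptions made for illustration, not the exact setup of the cited studies.

```python
# Illustrative fraud-detection classifier on synthetic financial-ratio data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# hypothetical ratios: profitability, leverage, liquidity, asset turnover
X = rng.normal(size=(n, 4))
# synthetic label: "fraudulent" reports loosely tied to leverage and liquidity
y = (X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```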
5.2 Impact on the Improvement of Accounting Information Relevance
One of the Accounting Information characteristics is "relevance", which means possessing predictive value, in other terms, allowing users to form expectations about the outcomes of past, present, and future events [10]. The prediction of business failure is an important topic for stakeholders, and many models have therefore been developed to predict it. [22] proposed and tested a hybrid financial analysis model composed of static and trend analysis models using neural networks, producing good prediction accuracy. Therefore, using a neural network as a Machine Learning algorithm to model business failure prediction supports the ability of Accounting Information to predict business failure and, ultimately, to be more relevant in decision making.
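The following sketch makes the same idea concrete for failure prediction: a few standard ratios are computed from statement line items and fed to a small neural network. The toy figures and the network configuration are assumptions made for illustration; they do not reproduce the hybrid static/trend model of the cited study.

```python
# Illustrative business-failure prediction from financial ratios with a small
# neural network; the firm data below are invented toy values.
from sklearn.neural_network import MLPClassifier

def ratios(stmt: dict) -> list:
    """Derive simple predictive ratios from balance-sheet / income items."""
    return [
        stmt["current_assets"] / stmt["current_liabilities"],  # liquidity
        stmt["total_debt"] / stmt["total_assets"],             # leverage
        stmt["net_income"] / stmt["total_assets"],              # profitability
    ]

firms = [
    ({"current_assets": 80, "current_liabilities": 40, "total_debt": 50,
      "total_assets": 200, "net_income": 15}, 0),   # 0 = healthy
    ({"current_assets": 30, "current_liabilities": 60, "total_debt": 160,
      "total_assets": 180, "net_income": -20}, 1),  # 1 = failed
    # ... in practice many labelled firms are needed for a meaningful model
]
X = [ratios(statement) for statement, _ in firms]
y = [label for _, label in firms]

model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(X, y)
print(model.predict([ratios(firms[1][0])]))
```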
6 Conclusion
Accounting Information is critical in the decision-making process. For this reason, enhancing its quality is a top priority for both company managers and external stakeholders. In this paper, we explore and analyze the impact of Machine Learning algorithms on improving the quality of Accounting Information through two analyses. The first one is a bibliometric analysis highlighting the occurrence and co-occurrence of three variables (author keywords, index keywords, and title and abstract terms); it allows us to identify a link between ML algorithms, mostly those related to classification models, and the detection of fraud in financial statements and the prediction of bankruptcy. The second analysis was conducted on case studies to reveal empirical evidence, enabling us to confirm the link identified in the bibliometric analysis;
indeed, Machine Learning algorithms, mainly those related to classification models, are well suited to and perform significantly in detecting fraudulent financial statements by using financial ratios broadly used as indicators of fraud risk. This detection improves the quality of Accounting Information by allowing it to present a faithful representation of the financial situation of the company and thus improves its reliability. Secondly, the empirical evidence indicates that models combining ANNs have high performance in the prediction of bankruptcy; this helps Accounting Information to have predictive value and makes it more relevant in decision making. Based on this, we can conclude that Machine Learning algorithms have a significant impact on the reliability of financial statements, the primary source of Accounting Information, as well as on their relevance and their predictive value regarding future events that may influence stakeholders' decisions.
References
1. Hou, X.: Design and application of intelligent financial accounting model based on knowledge graph. Mobile Inf. Syst. 2022 (2022)
2. Zhang, Y., Xiong, F., Xie, Y., Fan, X., Gu, H.: The impact of artificial intelligence and blockchain on the accounting profession. IEEE Access 8, 110461–110477 (2020)
3. Shi, Y.: The impact of artificial intelligence on the accounting industry. In: Xu, Z., Choo, K.K., Dehghantanha, A., Parizi, R., Hammoudeh, M. (eds.) Cyber Security Intelligence and Analytics. CSIA 2019. Advances in Intelligent Systems and Computing, vol. 928, pp. 971–978. Springer, Cham (2020). 10.1007/978-3-030-15235-2_129
4. Collier, P.M.: Accounting for Managers: Interpreting Accounting Information for Decision Making. John Wiley & Sons, Chichester (2015)
5. Hope, O.K., Thomas, W.B., Vyas, D.: Stakeholder demand for accounting quality and economic usefulness of accounting in US private firms. J. Account. Public Policy 36(1), 1–13 (2017)
6. Lambert, R., Leuz, C., Verrecchia, R.E.: Accounting information, disclosure, and the cost of capital. J. Account. Res. 45(2), 385–420 (2007)
7. Neogy, D.: Evaluation of efficiency of accounting information systems: a study on mobile telecommunication companies in Bangladesh. Global Disclosure Econ. Bus. 3(1) (2014)
8. Buljubašić, E., Ilgün, E.: Impact of accounting information systems on decision making: case of Bosnia and Herzegovina. Europ. Research. Series A (7), 460–469 (2015)
9. Munteanu, V., Zuca, M., Tinta, A.: The financial accounting information system central base in the managerial activity of an organization. J. Inf. Syst. Oper. Manag. 5(1), 63–74 (2011)
10. Financial Accounting Standards Board: Scope and implications of the conceptual framework project. Financial Accounting Standards Board (1976)
11. Djongoué, G.: Qualité perçue de l'information comptable et décisions des parties prenantes. Ph.D. thesis, Bordeaux (2015)
12. Baştanlar, Y., Özuysal, M.: Introduction to machine learning. In: miRNomics: MicroRNA Biology and Computational Analysis, pp. 105–128 (2014)
13. Abdi, M.D., Dobamo, H.A., Bayu, K.B.: Exploring current opportunity and threats of artificial intelligence on small and medium enterprises accounting function; evidence from south west part of Ethiopia, Oromiya, Jimma and SNNPR, Bonga. Acad. Acc. Financ. Stud. J. 25(2), 1–11 (2021)
14. Donthu, N., Kumar, S., Mukherjee, D., Pandey, N., Lim, W.M.: How to conduct a bibliometric analysis: an overview and guidelines. J. Bus. Res. 133, 285–296 (2021)
15. Van Eck, N., Waltman, L.: Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 84(2), 523–538 (2010)
16. Hamal, S., Senvar, Ö.: Comparing performances and effectiveness of machine learning classifiers in detecting financial accounting fraud for Turkish SMEs. Int. J. Comput. Intell. Syst. 14(1), 769–782 (2021)
17. Kotsiantis, S., Koumanakos, E., Tzelepis, D., Tampakas, V.: Predicting fraudulent financial statements with machine learning techniques. In: Antoniou, G., Potamias, G., Spyropoulos, C., Plexousakis, D. (eds.) Advances in Artificial Intelligence. SETN 2006. Lecture Notes in Computer Science, vol. 3955, pp. 538–542. Springer, Heidelberg (2006). 10.1007/11752912_63
18. Kirkos, E., Spathis, C., Manolopoulos, Y.: Data mining techniques for the detection of fraudulent financial statements. Expert Syst. Appl. 32(4), 995–1003 (2007)
19. Yao, J., Zhang, J., Wang, L.: A financial statement fraud detection model based on hybrid data mining methods. In: International Conference on Artificial Intelligence and Big Data (ICAIBD) 2018, pp. 57–61. IEEE (2018)
20. Hajek, P., Henriques, R.: Mining corporate annual reports for intelligent detection of financial statement fraud: a comparative study of machine learning methods. Knowl.-Based Syst. 128, 139–152 (2017)
21. Song, X.P., Hu, Z.H., Du, J.G., Sheng, Z.H.: Application of machine learning methods to risk assessment of financial statement fraud: evidence from China. J. Forecast. 33(8), 611–626 (2014)
22. Huang, S.M., Tsai, C.F., Yen, D.C., Cheng, Y.L.: A hybrid financial analysis model for business failure prediction. Expert Syst. Appl. 35(3), 1034–1040 (2008)
NLP Methods’ Information Extraction for Textual Data: An Analytical Study Bouchaib Benkassioui1(B) , Nassim Kharmoum2 , Moulay Youssef Hadi1 , and Mostafa Ezziyyani3 1 LARI Laboratory, Ibn Tofail University, Kenitra, Morocco
[email protected]
2 National Center for Scientific and Technical Research (CNRST), Rabat, Morocco
3 Faculty of Sciences and Techniques of Tangier, Abdelmalek Essaadi University, Tetouan, Morocco
Abstract. Information Extraction (IE) is the process of automatically extracting pertinent information from unstructured or semi-structured data, and it typically involves the analysis of human language text through natural language processing (NLP). Rule-based methods (RBM), supervised-learning-based methods, and unsupervised-learning-based methods are the three basic families of methods used by IE systems. This work aims to explore and analyze the various approaches and to illustrate the difficulties encountered when using textual data of different forms, domains, and dataset sizes in existing information extraction work employing the various categories of IE methods. This study therefore presents an analytical study of the different information extraction methods used for the analysis of textual data.
Keywords: Information Extraction · Natural Language Processing · Textual Data · Rule-Based Methods · Learning-Based Methods
1 Introduction
Recent technological advancements and the information explosion have raised the need for processing and analyzing massive amounts of unstructured data in a variety of forms [1]. Support for this issue is offered through information extraction (IE). IE refers to the Natural Language Processing (NLP) task of automatically extracting meaningful information, in various fields, from unstructured or semi-structured data and storing it in organized formats to make this data accessible for study or decision-making [2]. Researchers have suggested a number of ways and methods to extract information from data using IE systems. These methods can be classified into rule-based methods (RBM), supervised learning-based methods, and unsupervised learning-based methods [2].
First, based on the type of the input dataset, diverse rules are created in rule-based procedures, such as pattern matching, parsing, regular expressions, and syntactic simplification. The textual information or content of documents is then extracted using these rules [3]. Second, the supervised-learning-based methods consist of three parts: a machine-learning approach, a deep-learning approach, and a language-model-based approach [3]. Machine-learning-based information extraction techniques use machine-learning models to automatically extract the essential syntactic and semantic features of the text instead of relying on manual annotation or hand-crafted rules [4]. Deep learning is highly scalable, since it can process enormous amounts of data and produce more useful results; it has a substantially larger number of parameters compared to traditional machine learning models [4]. Third, language-model-based techniques have demonstrated enhanced effectiveness in many NLP tasks, since these language models consider context information when describing the characteristics [5]. In addition, we can also consider unsupervised learning as an ML technique that does not require users to supervise the model in order for it to operate and learn on its own; as a result, the model can recognize interesting patterns that were previously ignored [3]. Another approach, known as the hybrid approach, combines different methodologies, such as rule-based approaches and machine-learning approaches, and additionally applies a few rules to extract relationships [6]. Numerous studies have been done in this area to learn how to automate obtaining the necessary information from various forms of data using NLP's information extraction methods; we focus on textual data in this study. Our contribution therefore consists of investigating and evaluating the many IE methods and strategies that address the difficulties and issues associated with complex text data, and we provide a thorough review of the state of the art in the literature, with a focus on the application of NLP, machine learning, and deep learning techniques to extract meaningful information using the various IE methodologies. The contributions of this paper are summarized as follows: the second section details the Information Extraction methods. The third section presents the various approaches studied for Information Extraction from textual data. The analysis and discussion of all the results obtained from the study of the different approaches are the subject of the fourth section. Finally, the conclusion of our study and our future work are presented in the fifth section.
2 Information Extraction for Textual Data
Information extraction systems can be classified into two key subcategories of methods: rule-based methods (RBM) and learning-based methods (LBM) [2]. Learning-based approaches can, in turn, be divided into supervised, semi-supervised, and unsupervised methods [2].
Semi-supervised learning can also be incorporated into supervised learning algorithms. Three subclasses of supervised learning-based techniques exist: traditional machine learning, deep-learning-based techniques, and language-model-based techniques [3]. The next subsections discuss the various feature representation and learning methods. A graphical presentation of the different information extraction methods for textual data is shown in Fig. 1.
Fig. 1. The representation of different information extraction methods (a taxonomy: Rule-Based Methods (RBM) and Learning-Based Methods (LBM), with LBM split into Supervised LBM, comprising ML Methods, DL Methods and Language-Based Methods, and Unsupervised LBM)
2.1 Rule-Based Methods
Although rule-based methods rely on explicit rules developed by experts in a particular field, they frequently show better processing performance in terms of recall and precision. This family of methods mainly relies on the domain, which means that IE performance is highly dependent on the rules specified for the extraction of a certain item; these rules frequently place an undue emphasis on specialist expertise [7]. Even though the rule-based method has not been widely used lately to extract knowledge from texts, it has frequently been successful in the past. These methods are not very common these days because, since they extract data with user-defined rules and patterns, they are frequently not very computationally efficient compared to machine learning models [3].
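A minimal sketch of this rule-based style is shown below: hand-written regular expressions pull typed fields out of raw text. The patterns and the sample sentence are invented for illustration; real systems combine many such rules with dictionaries, gazetteers, and parsing.

```python
# Illustrative rule-based extraction with regular expressions.
import re

text = "Contact Dr. Amal on 2023-05-14 at amal.benk@example.org for the report."

rules = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "person": re.compile(r"\b(?:Dr|Prof|Mr|Ms)\.\s+[A-Z][a-z]+\b"),
}

extracted = {label: pattern.findall(text) for label, pattern in rules.items()}
print(extracted)
# {'date': ['2023-05-14'], 'email': ['amal.benk@example.org'], 'person': ['Dr. Amal']}
```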
2.2 Learning-Based Methods
Supervised Learning-Based Methods. The supervised learning approach depends on a training set of labeled examples from a particular topic. By utilizing machine learning rules and techniques to extract the required information, such systems automatically learn extractors for relations. The fundamental issue with these approaches is how time- and effort-intensive it can be to create an appropriately tagged corpus. On the other hand, if training data are available, these systems can easily be converted to a different domain [6].
Machine Learning-Based Methods. For automatic information extraction, machine-learning-based methods make use of ML algorithms (e.g., Support Vector Machine (SVM), Naive Bayes (NB), and k-nearest neighbors (kNN)). A large amount of training data must be collected for the model to learn the rules, and the accuracy of the results depends on this training data, a finite resource; these approaches belong to the shallow learning techniques, with limited modeling and representational capacities [7]. In general terms, machine-learning-based methods perform better than linguistic-based and pattern-based ones. The benefit of these methods is that they are easily adaptable to new tasks, domains, or datasets. But for training and testing, machine-learning-based methods rely heavily on annotated corpora, and corpus annotation is an expensive task that often requires a lot of time and effort [8].
Deep Learning-Based Methods. Deep learning can be used to learn complex functions that are composed from a set of simpler ones, using a nonlinear combination of parameters deduced from training data. As a result, deep learning can describe more of the semantics in texts while modeling the text data using machine learning's distributed representation theory. Deep learning is currently being used more and more in many NLP applications, including machine translation, question answering, language modeling, text classification, named entity recognition, and relation classification [9]. For a variety of tasks, deep learning methods including Convolutional Neural Networks (CNN), Dense Neural Networks (DNN), Recurrent Neural Networks (RNN), Deep Reinforcement Learning (DRL), autoencoders, etc. are used for extracting knowledge from unstructured text input for a range of purposes [3].
Language Model-Based Methods. Language-model-based techniques built on supervised learning models have demonstrated enhanced performance in various NLP tasks, since these language models take contextual information into consideration when describing the features. To perform relation classification, a classifier is then built on top of the language model output. ULMFiT (Universal Language Model Fine-tuning), ELMo (Embeddings from Language Models), and BERT (Bidirectional Encoder Representations from Transformers) are the most popularly used in NLP tasks [3, 5].
Language models like BERT have typically outperformed other ML and deep learning algorithms for IE because of their ability to learn from the context. The pre-trained BERT model performs well in the methods that we have studied [4, 10, 11].
Unsupervised Learning-Based Methods. Unsupervised learning-based methods initially learn general extraction rules for a given domain and then automatically generate relation extraction rules. Unsupervised approaches automatically generate relation-specific extraction rules based on a set of generic patterns before learning domain-specific extraction rules, successively repeating the process step by step. Another name for this is the self-supervised learning technique [6]. It differs significantly from supervised learning methods since it employs corpus bootstrapping techniques that rely on a small set of seed rules discovered through an annotated system [12].
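As an illustration of how such pre-trained language models are used in practice, the sketch below runs an off-the-shelf NER pipeline from the Hugging Face transformers library; the default model is downloaded on first use, and the example sentence is arbitrary.

```python
# Illustrative language-model-based extraction: off-the-shelf transformer NER.
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")
sentence = ("Ada Lovelace worked with Charles Babbage in London "
            "on the Analytical Engine.")

for entity in ner(sentence):
    # each entity carries the surface text, a type label and a confidence score
    print(entity["word"], entity["entity_group"], round(float(entity["score"]), 2))
```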
3 Information Extraction Approaches
Our goal in this study is to get a solid understanding of the most recent automatic information extraction methodologies and approaches for unstructured textual data, as well as to offer recommendations for future research. Table 1 provides a summary of this work along with descriptions of the datasets, IE methods, and techniques employed. Table 2 provides information on the outcome model, the performance, and the limitations reported. Recent studies have been conducted in this field. According to the information extraction techniques used, the studied approaches fall into four categories: rule-based, machine learning, deep learning, and hybrid approaches.
3.1 Rule-Based Approaches
In this field of research, various studies have used this approach to extract information from text data. The study [13] (Gorjan, P., et al.) investigates the extraction of entities related to food concepts, extracting different biomedical entities and finding relations between them using a rule-based named-entity recognition method called FoodIE. The authors evaluate and compare two datasets: the first consists of 200 recipes taken from two separate user-based sites, Allrecipes and MyRecipes, and the second consists of 1000 recipes obtained from Allrecipes. The study [14] (Gregor, W., et al.) studied the multilingual IE pipeline of an open-source software, New/s/leak 2.0, to automatically analyze very large quantities of unstructured textual data for investigative journalism, based on the automatic extraction of entities and their co-occurrence in documents, using a rule-based approach with named entity recognition (NER). The authors use 27000 documents and Wikipedia articles related to the topic of World War II in four languages (English, Spanish, Hungarian, and German). The survey [15] (Gorjan, P., et al.) created rule-based food named-entity recognition (NER) algorithms to automatically extract food information from textual data. Four approaches were discussed: FoodIE, NCBO, NCBO (OntoFood), and NCBO (FoodON). They concluded from their comparison that FoodIE delivers reliable results. The model
was trained using the FoodBase Corpus and was able to recognize entities from dietary suggestions. They employ the FoodBase Corpus, a collection of 1000 recipes taken from the popular recipe-sharing social network Allrecipes that have been annotated with food concepts.
3.2 Machine Learning Approaches
The study [16] (Imam, A.T., et al.) employed this approach by utilizing machine-learning techniques. SyAcUcNER (System Actor Use-Case Named Entity Recognizer) is a named entity recognition technique that aims to automate the extraction of Software Requirements Specification (SRS) elements from unstructured English user requirements descriptions. SyAcUcNER uses a Support Vector Machine (SVM) machine learning classifier to extract the system, actor, and use-case entities. The data set used to train the SVM included 66 English language statements compiled from a variety of books, articles, and software.
3.3 Deep Learning Approaches
In the study [4] (Phillip, R., et al.), a collection of 12 cardiovascular themes was automatically extracted from German discharge letters using an automated approach to information extraction for unstructured clinical data from the cardiology domain. BERT, a deep learning architecture based on transformers, is used in the project for language representation. They employ a corpus of around 200000 German discharge letters in binary MS doc format. Employing state-of-the-art deep learning models, the study [10] (K. Sugimoto et al.) investigated an information model for internal radiology reports, using BiLSTM-CRF, BERT, and BERT-CRF in their experiments to extract clinical keywords from free-text radiology reports. The deep learning models were compared, and chest computed tomography (CT) reports from another institution were used to evaluate the generalizability of the model. They use two datasets: an in-house dataset with 118078 chest CT reports stored in the radiology information system, and an external dataset with 77 chest CT reports. The study [17] (John X. Qiu, et al.) investigates an IE system for pathology reports using semi-supervised deep learning with an attention-based convolutional autoencoder. The authors performed a set of experiments comparing supervised training augmented with unlabeled data at 1%, 5%, 10%, and 50% of the original data size. They use a corpus containing 374,899 cancer pathology reports from the Louisiana SEER cancer registry. The investigation [11] (Friedrich, A., et al.) develops a new information extraction use case from the materials science domain and suggests a number of new, challenging information extraction tasks. They create an annotation scheme for marking information on studies involving solid oxide fuel cells in academic articles, such as the materials used and the measurement conditions. They use recurrent neural networks to handle the growing task complexity and neural-network-based models with BERT embeddings to achieve significant performance gains. They compare the neural network models with traditional classification models, such as the logistic regression classifier and the support
vector machine (SVM). The dataset used is the SOFC-Exp corpus (solid oxide fuel cells), which consists of 45 open-access academic publications annotated by subject specialists.
3.4 Hybrid Approaches
The study [18] (Agnieszka, W., et al.) discusses the difficulties of using unstructured text files in the recruitment process. For the automated information extraction from Polish resume documents in the IT recruiting process, the authors suggest a hybrid solution based on NER, combining findings from overlapping NER tool modules (Liner2, NERF, and Babelfy), NER based on the CRF (Conditional Random Field) algorithm, dictionary methods, and regular expressions. The suggested model implements a multi-module system for complicated linguistic connections in Polish and low-resource language dictionaries. The prototype used in this study does not take into account outliers in later stages of analysis, and named entity recognition (NER) is still a challenge, particularly for languages with limited resources. These are the study's main shortcomings. According to the study [19] (Rishabh, R., et al.), ProfileGen is a technique for extracting information from the web that may be used to build a profile of a particular person. The system uses machine learning techniques based on a recurrent neural network (RNN) with gated recurrent units (GRU) to arrange the phrases into a coherent biographical narrative. After acquiring information, the initial stage is to rank items according to rules based on named entity recognition (NER) and coreference resolution. The dataset consists of 1185 biographical paragraphs from Wikipedia.

Table 1. Comparison of the studied approaches' methods and techniques.

Studied approach | Dataset | Methods | Techniques
Phillip, R., et al. [4] | Corpus of about 200000 German discharge letters from the cardiology field, in binary MS doc format | Deep learning approach | NER, BERT, ML-CRF, RNN-LSTM
Kento, S., et al. [10] | In-house dataset: 118,078 chest CT reports; external dataset: 77 chest CT reports | Deep learning approach | BiLSTM-CRF, BERT, BERT-CRF
Annemarie, F., et al. [11] | 45 open-access scientific articles related to SOFC | Deep learning approach | BiLSTM-CRF, BERT embedding, BERT-CRF, RNN
Gorjan, P., et al. [13] | 200 recipes from Allrecipes and MyRecipes; 1000 recipes from Allrecipes (Groves, 2013) | Rule-based approach | NER
Gregor, W., et al. [14] | 27000 documents from Wikipedia about World War II | Rule-based approach | NLP tools
Popovski, G., et al. [15] | FoodBase Corpus: 1000 recipes taken from Allrecipes | Rule-based approach | NER
Ayad Tareq, I., et al. [16] | 66 English language statements (books, literature) | Machine-learning approach | SVM, NER
Qiu, J., et al. [17] | 374899 cancer pathology reports from the Louisiana SEER cancer registry | Deep learning approach | CNN attention-based auto-encoder
Agnieszka, W., et al. [18] | CV data in binary PDF files from an IT recruiting company | Hybrid approach | NER
Rishabh, R., et al. [19] | Wikipedia biographies: 1185 biographical texts | Hybrid approach | RNN-GRU; word embedding; LexRank; NER; coreference resolution
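As an illustration of the rule-based strategy surveyed in Sect. 3.1, the following minimal sketch combines a dictionary (gazetteer) rule with a regular-expression rule, in the spirit of systems such as FoodIE; the food terms, the quantity pattern, and the example sentence are illustrative assumptions, not the published rules of any cited system.

```python
import re

# Illustrative gazetteer of food terms (assumption: not the FoodIE lexicon).
FOOD_TERMS = {"olive oil", "garlic", "chicken breast", "sea salt", "basil"}

# Simple pattern rule: a number followed by a measurement unit, e.g. "2 cups", "250 g".
QUANTITY_RULE = re.compile(r"\b\d+(\.\d+)?\s*(g|kg|ml|l|cup|cups|tbsp|tsp)\b", re.I)

def rule_based_food_ner(text: str):
    """Return (span_text, label) pairs found by dictionary and regex rules."""
    entities = []
    lowered = text.lower()
    for term in sorted(FOOD_TERMS):              # dictionary (gazetteer) rule
        start = lowered.find(term)
        if start != -1:
            entities.append((text[start:start + len(term)], "FOOD"))
    for match in QUANTITY_RULE.finditer(text):   # pattern (regex) rule
        entities.append((match.group(0), "QUANTITY"))
    return entities

if __name__ == "__main__":
    sentence = "Mix 2 cups of flour with garlic, sea salt and olive oil."
    print(rule_based_food_ner(sentence))
    # e.g. [('garlic', 'FOOD'), ('olive oil', 'FOOD'), ('sea salt', 'FOOD'), ('2 cups', 'QUANTITY')]
```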
4 Analysis and Discussion
In this section, we analyze and discuss the different approaches studied above. After the analysis, we can classify the above work into four categories: the first is about the size of the dataset, the second is based on the IE methods used in Table 1, the third is the performance achieved, and the last covers each study's limitations, as shown in Table 2. According to the analytical study presented in Tables 1 and 2, the four major types of studied approaches to information extraction are rule-based approaches, machine-learning approaches, deep learning approaches, and hybrid approaches. Each of these methods has advantages and limitations. No approach could be categorized as broad and applicable in every scenario. In this work, we see NER as a method of obtaining a structured form; it is employed with all RBM [13–15], in the hybrid approaches [18, 19], and with the deep learning
approach in [4]. Corpus-based NER methods usually achieve the best performance [13]. The issue of finding and categorizing predetermined notions is addressed in [13]. According to [15], NCBO (OntoFood) is the NER method with the weakest evaluation metrics, because the OntoFood ontology does not adequately cover the food domain. BERT is a deep learning architecture for language representation based on transformers [4] and is used in most deep learning approaches with good performance [4, 10, 11]. The performance of the rule-based methods was better [13–15], whereas the machine learning-based approach [16] and the deep learning-based approaches [4, 11, 17] required more work. Deep learning and machine learning algorithms function well with limited data sets. The minimal dataset size required for machine learning models is not stated explicitly; it depends on how complicated the suggested model is. A first test using the developed model allowed for a rough estimation of the size of the necessary dataset. The comparative study addresses the advantages and limits of each technique, pointing out that IE is a community-based process and that the needs of the user, the nature of the work, and the complexity of the dataset all have an impact on the approach that is selected for each task. Despite their inherent limitations and difficulties, deep learning approaches generate impressive results for huge datasets. They can generalize learning and have the distinctive ability to use unlabeled input during training. Deep learning includes several hidden layers, which allows it to learn many features. For pattern recognition, these methods work better.
Table 2. Comparison of the results of the studied approaches.

Studied approach | Models | Results | Limitations
Phillip, R., et al. [4] | An automated approach of IE using German cardiology data | Micro-average F1-score: 86% | No prior research found on concept extraction from German discharge letters in the cardiovascular domain using pre-trained and fine-tuned language models
Kento, S., et al. [10] | Information model for extracting clinical terms from radiology data | In-house dataset: micro F1-score 95.36%; dataset from another institution: micro F1-score 94.62% | Only one other institutional dataset was used to assess generalizability; radiology reports for other body areas have not been examined
Annemarie, F., et al. [11] | An annotation scheme | Experiment: precision 81.1%, recall 75.6%, F1 78.3%; No-Experiment: precision 96.6%, recall 97.5%, F1 97.1% | Complexity of the task; some categories remain challenging
Gorjan, P., et al. [13] | FoodIE: system for food information extraction | Dataset with 200 recipes: precision 97%, recall 94%, F1 score 96%; dataset with 1000 recipes: precision 97%, recall 94%, F1 score 96% | Few NLP tools are available for information extraction of food entities; insufficient annotated corpora for training corpus-based NER methods
Gregor, W., et al. [14] | IE pipeline of New/s/leak 2.0 | Clean and preprocess large datasets | (not given)
Popovski, G., et al. [15] | FoodIE: rule-based food named-entity recognition system | FoodIE: F1 96.05%, precision 97.8%, recall 94.37%; SNOMED CT: F1 63.75%, precision 91.53%, recall 48.91%; OntoFood: F1 32.62%, precision 85.48%, recall 20.16%; FoodON: F1 63.90%, precision 79.22%, recall 53.54% | The food domain is not fully covered by the OntoFood ontology
Ayad Tareq, I., et al. [16] | SyAcUcNER (System Actor Use-Case NER) | Weighted averages: precision 76.2%, recall 76%, F-measure 72.1% | Unstructured style of the written SRS
Qiu, J., et al. [17] | Semi-supervised information extraction | Improved micro-F scores at the smaller 1% and 5% training fractions | The size of the vocabulary had little bearing on classification success
Agnieszka, W., et al. [18] | CV parsing system | Compared to using separate common tools, keyword recognition increased by 60% to 160% | The prototype excludes outliers in further steps of the analysis; NER is still a challenge, particularly for languages with limited resources
Rishabh, R., et al. [19] | ProfileGen, an IE system for profile generation from web sources | Weighted average: precision 50%, recall 49%, F1-score 49% | Absence of applications describing the person's profile, such as social media data, biographical overview, and biographical corpus
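Since BERT-style encoders underlie most of the deep learning systems compared above, the following hedged sketch shows how a transformer encoder can be set up for token-level extraction with the Hugging Face transformers library; the label set and example sentence are illustrative, the classification head is untrained here, and the sketch does not claim to reproduce any of the cited pipelines.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Illustrative label set (assumption): BIO tags for a single entity type.
labels = ["O", "B-ENT", "I-ENT"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels)
)  # the token-classification head is randomly initialised; it would be
   # fine-tuned on an annotated corpus before real use

text = "The patient was discharged after coronary angiography."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits            # shape: (1, seq_len, num_labels)

predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, predictions):
    print(token, labels[int(label_id)])        # untrained output is arbitrary
```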
5 Conclusion and Future Work
According to the analytical study, we propose a rule-based method for small datasets and a learning-based method for large datasets for extracting information from unstructured textual data. In addition to the methods and algorithms utilized with each strategy, the comparative study looks at the many benefits and limits of four methods: rule-based methods, machine-learning methods, deep-learning methods, and hybrid methods. The user's needs, the dataset, and the task all have a significant role in the strategy choice for each activity. In future work, we will implement our own IE system, benefiting from the advantages of these studies and addressing their shortcomings.
Abbreviations
In this document, the following abbreviations are used:
IE: Information Extraction
NLP: Natural Language Processing
RBM: Rules-Based Methods
LBM: Learning-Based Methods
ML: Machine Learning
DL: Deep Learning
SVM: Support Vector Machine
NB: Naive Bayes
KNN: K-Nearest Neighbors
CNN: Convolutional Neural Networks
DNN: Dense Neural Networks
RNN: Recurrent Neural Networks
DRL: Deep Reinforcement Learning
ULMFit: Universal Language Model Fine-tuning
ELMO: Embeddings from Language Models
BERT: Bidirectional Encoder Representations from Transformers
NER: Named Entity Recognition
SyAcUcNER: System Actor Use-Case Named Entity Recognizer
SRS: Software Requirements Specification
BiLSTM: Bidirectional Long Short-Term Memory
LSTM: Long Short-Term Memory
CT: Computed Tomography
SOFC: Solid Oxide Fuel Cells
CRF: Conditional Random Field
GRU: Gated Recurrent Unit
References 1. Okurowski, M.E.: Information extraction overview. National Computer Security Center Fort George G Meade Md (1993) 2. Adnan, K., Akbar, R.: An analytical study of information extraction from unstructured and multidimensional big data. J. Big Data 6(1), 1–38 (2019). https://doi.org/10.1186/s40537019-0254-8 3. Bose, P., Srinivasan, S., Sleeman, W.C., IV., Palta, J., Kapoor, R., Ghosh, P.: A survey on recent named entity recognition and relationship extraction techniques on clinical texts. Appl. Sci. 11(18), 8319 (2021) 4. Richter-Pechanski, P., Geis, N.A., Kiriakou, C., Schwab, D.M., Dieterich, C.: Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models. Dig. Health 7, 20552076211057664 (2021) 5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018) 6. Devisree, V., Raj, P.R.: A hybrid approach to relationship extraction from stories. Procedia Technol. 24, 1499–1506 (2016) 7. Zhong, B., Wu, H., Li, H., Sepasgozar, S., Luo, H., He, L.: A scientometric analysis and critical review of construction related ontology research. Autom. Constr. 101, 17–31 (2019) 8. Segura-Bedmar, I., Martínez, P., de Pablo-Sánchez, C.: A linguistic rule-based approach to extract drug-drug interactions from pharmacological documents. In BMC bioinformatics (Vol. 12, No. 2, pp. 1–11). BioMed Central (December 2011) 9. Zhong, B., et al.: Deep learning-based extraction of construction procedural constraints from construction regulations. Adv. Eng. Inform. 43, 101003 (2020) 10. Sugimoto, K., et al.: Extracting clinical terms from radiology reports with deep learning. J. Biomed. Inform. 116, 103729 (2021) 11. Friedrich, A., et al.: The SOFC-exp corpus and neural approaches to information extraction in the materials science domain. arXiv preprint arXiv:2006.03039 (2020) 12. Abdelmagid, M., Ahmed, A., Himmat, M.: Information Extraction methods and extraction techniques in the chemical document’s contents: Survey. ARPN J. Eng. Appl. Sci. 10(3), 1068–1073 (2015)
13. Popovski, G., Kochev, S., Korousic-Seljak, B., Eftimov, T.:. FoodIE: a rule-based namedentity recognition method for food information extraction. In: ICPRAM, pp. 915–922, February 2019 14. Wiedemann, G., Yimam, S.M., Biemann, C.: A multilingual information extraction pipeline for investigative journalism. arXiv preprint arXiv:1809.00221 (2018) 15. Popovski, G., Seljak, B.K., Eftimov, T.: A survey of named-entity recognition methods for food information extraction. IEEE Access 8, 31586–31594 (2020) 16. Imam, A.T., Alhroob, A., Alzyadat, W.: SVM machine learning classifier to automate the extraction of SRS elements. Int. J. Adv. Comput. Sci. Appl. (IJACSA) (2021) 17. Qiu, J.X., et al.: Semi-supervised information extraction for cancer pathology reports. In: 2019 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), pp. 1–4. IEEE, May 2019 18. Wosiak, A.: Automated extraction of information from Polish resume documents in the IT recruitment process. Procedia Comput. Sci. 192, 2432–2439 (2021) 19. Ranjan, R., Vathsala, H., Koolagudi, S.G.: Profile generation from web sources: an information extraction system. Soc. Netw. Anal. Min. 12(1), 1–12 (2021). https://doi.org/10.1007/s13278021-00827-y
Handwriting Recognition in Historical Manuscripts Using a Deep Learning Approach
Hassan El Bahi(B)
Laboratory of Computer and Systems Engineering L2IS, Cadi Ayyad University, Marrakech B.P. 511, 40000, Morocco
[email protected]
Abstract. The ancient manuscripts are considered one of the most important and valuable treasures of the cultural heritage of nations. A great effort has been made in recent decades to digitize and make the contents of these manuscripts available to everyone, but only a very small percentage of these manuscripts are accessible to the public. In this context, we propose in this paper a system for the recognition of handwritten text in ancient manuscripts. The system begins with a preprocessing step which aims to improve the quality and eliminate degradations in the input images. Next, a convolutional neural network (CNN) model is adapted to perform the step of extracting the most relevant features from each text word image. Then, the classification phase is carried out using a recurrent neural network (RNN) with long short-term memory (LSTM) blocks and a connectionist temporal classification (CTC) layer, which predicts the sequence of characters that represents the text included in each input word image. The experimentation of the proposed system is performed on the ESPOSALLES dataset, and the performance results show promising recognition rates.
Keywords: Ancient manuscripts · Handwritten text recognition · Convolutional neural network · Long short-term memory
1 Introduction
Ancient manuscripts are considered one of the most important and priceless treasures of cultural heritage. In recent years, with the era of digitization, a great effort has been made to make the content of these manuscripts available to the public, but only a very small percentage of them are currently accessible. The automatic processing and recognition of the texts of these manuscripts are essential tasks for archiving and making the contents of these ancient manuscripts available to people around the world. However, these two tasks are very difficult because of several problems such as: cursive writing, faded ink, noise, presence of ink stains, complex backgrounds, word size variability, degraded quality, etc.
According to the literature, several works have been proposed to ensure text recognition in images of old manuscripts or modern documents. Generally, in both cases of text images, two steps are the most important in the recognition process: the feature extraction step and the classification step. The techniques used for feature extraction can be divided into two categories, the first is based on Handcrafted features such as the use of histogram of oriented gradients (HOG) [1] or invariant feature transform (SIFT) [2]. The second category of techniques is based on deep learning approaches like convolutional neural network (CNN). The CNN model is adapted in several text recognition works to ensure the automatic extraction of features such as: English handwritten text images [3], French handwritten text images [4], etc. The next step in text recognition systems is classification which takes as input the extracted features and predicts as output the sequence of characters contained in the text image. Classical classification approaches consist in using classifiers such as: support vector machine [5] or multilayer perceptron (MLP) [6]. In recent years, with the improvement in computing time of computers and the availability of a huge quantity of data the most used classifier that obtains the best recognition performance is recurrent neural network (RNN). This classifier is used in several text recognition works such as: recognition of Arabic text [7], recognition of text in documents acquired by smartphone [8], recognition of historical Greek manuscripts [9], etc. To overcome all the problems already mentioned, we propose in this work a system for recognizing handwritten text in old images. The system starts with a pre-processing step, in the next step a convolutional neural network model is used to extract features from each image, finally a recurrent neural network is used to classify the image and gives as output the text contained in the image. The experimentation of the proposed system is performed on the ESPOSALLES dataset [10], and the performance results show promising recognition rates. The rest of this article is organized as follows. In Sect. 2, we detail each step of the proposed system. Then, in Sect. 3 we present the results of experiments performed on the ESPOSALLES dataset. Finally, we conclude this work and discuss some future work in Sect. 4.
2 Proposed System
This part is dedicated to the presentation of the different stages of the system proposed to ensure the recognition of handwritten text from old documents. Fig. 1 shows the architecture of the proposed system. In the following, each step of this architecture will be detailed.
Fig. 1. Architecture of the proposed handwritten text recognition system.
2.1 Pre-processing
This step is very important given the poor and degraded quality of historical images written several centuries ago. To reduce this problem, we have used a first operation which performs the binarization of the input image, converting it from a color image to a binary image with two colors. This preserves the most important information in the image: the text, which will appear in white, and the background, which will appear in black. The second operation, which is no less important than the first, is normalization. This operation aims to unify the size of all word images to a fixed size, which will improve recognition performance by avoiding problems with image size variability. In this work, we normalized the height of each word image to 80 pixels. The width of the image is chosen automatically according to the height, respecting the aspect ratio to avoid deforming the image. To unify the width, we add a bar of black pixels to the right side of the image. As a result, all images after this step will have a fixed size of 80×460 (see Fig. 2).
Fig. 2. Pre-processing step. (a): Input image. (b): Binarization. (c): Normalization.
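A minimal sketch of the two pre-processing operations described above, binarization followed by height normalization and right-side padding to the fixed 80×460 size, using OpenCV; the choice of Otsu thresholding and the file name are assumptions for illustration, not necessarily the authors' exact implementation.

```python
import cv2
import numpy as np

TARGET_H, TARGET_W = 80, 460

def preprocess_word_image(path: str) -> np.ndarray:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Binarization: dark text becomes white (255), background black (0).
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Normalization: fix the height to 80 px, keep the aspect ratio for the width.
    h, w = binary.shape
    new_w = min(TARGET_W, max(1, int(w * TARGET_H / h)))
    resized = cv2.resize(binary, (new_w, TARGET_H))
    # Pad the right side with black pixels up to the fixed 80x460 canvas.
    canvas = np.zeros((TARGET_H, TARGET_W), dtype=np.uint8)
    canvas[:, :new_w] = resized
    return canvas

# Example with an assumed file name:
# word = preprocess_word_image("word_001.png")
```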
2.2 Feature Extraction
Feature extraction is a crucial step because choosing the most relevant image features will facilitate the recognition step. In recent years, the most used technique in almost all recognition systems is the convolutional neural network, since it allows features to be extracted in an automatic and efficient way. This performance inspires us to adapt a CNN in our system to perform the feature extraction step. In this work, we relied on the VGG-16 architecture [11], because of its simplicity and its use in various text recognition systems [8, 12], to form an architecture capable of extracting features in our case of old images. Fig. 3 presents the adapted architecture. The normalized image from the previous step is presented as input to the architecture; then, through two convolution layers, 64 kernels move over the input image to produce a total of 80 × 460 × 64 feature maps. After the convolution layers, a max pooling layer with a window of 2 × 2 pixels is applied, which reduces the feature maps to 40 × 230 × 64. Then two convolution layers are added, this time with 128 kernels, producing 40 × 230 × 128 feature maps, followed by a max pooling layer which reduces the result to 20 × 115 × 128. After that, the model uses three more convolution layers with 256, 256, and 512 kernels respectively. After each of these three convolution layers, a max pooling layer is used to reduce the size of the feature maps, which become respectively 10 × 57 × 512, 5 × 57 × 512, and 1 × 56 × 512. The initial architecture of VGG16 has 3 fully connected layers that classify the image; these 3 layers are replaced with the recurrent neural network that will perform the recognition task during the next classification step.
Fig. 3. Representation of the CNN architecture used in the feature extraction step.
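The following hedged Keras sketch shows a VGG-style stack of the kind described above, turning an 80×460 word image into a sequence of feature vectors for the recurrent classifier; the number of filters and the pooling windows are illustrative choices and do not reproduce the authors' exact layer dimensions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_feature_extractor():
    inputs = layers.Input(shape=(80, 460, 1))                # normalized word image
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)             # 40 x 230 x 64
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)             # 20 x 115 x 128
    x = layers.Conv2D(256, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=(2, 1))(x)             # halve height only: 10 x 115 x 256
    x = layers.Conv2D(256, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=(2, 1))(x)             # 5 x 115 x 256
    x = layers.Conv2D(512, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=(5, 1))(x)             # 1 x 115 x 512
    # Collapse the height so each column becomes one feature vector (timestep).
    outputs = layers.Reshape((115, 512))(x)
    return tf.keras.Model(inputs, outputs, name="cnn_feature_extractor")

# build_feature_extractor().summary()
```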
2.3 Text Classification
The result of the feature extraction step is a sequence of feature vectors that represents the set of text word images. Each vector of the sequence represents an isolated word image; this vector will be the input of an RNN model which predicts the transcription of each vector by recognizing the sequence of characters that forms the word in the input image. Traditional RNNs had a limitation in learning and encountered the problem of vanishing gradients during the learning phase if the time sequence was too long. To address this problem, the LSTM block was introduced [13] to provide the RNN with the ability to model and learn dependencies between data even if the time sequence of the data is very long. Each LSTM block is composed of three gates: input gate, forget gate and output gate. These three gates cooperate to produce the output result; the equations that govern the operation of these gates are as follows:

$$i_t = \sigma(w_i [h_{t-1} + x_t] + b_i) \quad (1)$$
$$f_t = \sigma(w_f [h_{t-1} + x_t] + b_f) \quad (2)$$
$$o_t = \sigma(w_o [h_{t-1} + x_t] + b_o) \quad (3)$$
where $i_t$, $f_t$ and $o_t$ represent respectively the input, forget and output gates, and $\sigma$ refers to the sigmoid function. $w_i$, $w_f$ and $w_o$ refer respectively to the weight matrices of the input, forget and output gates, $h_{t-1}$ denotes the previous LSTM output, $x_t$ represents the input at the current time step, and $b_i$, $b_f$ and $b_o$ represent the biases of the corresponding gates. In this work, we use a bidirectional LSTM which offers, in addition to the exploitation of past results, the use of future outputs in order to make the model more efficient at predicting the result at the current moment. The number of LSTM blocks and the number of RNN layers will be detailed and discussed in the results section.
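A small numerical sketch of Eqs. (1)-(3), assuming the bracket is read as the concatenation of the previous output h_{t-1} with the current input x_t (a common convention); the dimensions and random weights are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden, inp = 4, 3                       # illustrative sizes
h_prev = rng.standard_normal(hidden)     # previous LSTM output h_{t-1}
x_t = rng.standard_normal(inp)           # current input x_t

# Assumption: the bracket in Eqs. (1)-(3) is read as concatenation [h_{t-1}, x_t].
concat = np.concatenate([h_prev, x_t])   # shape (hidden + inp,)

w_i, w_f, w_o = (rng.standard_normal((hidden, hidden + inp)) for _ in range(3))
b_i, b_f, b_o = (np.zeros(hidden) for _ in range(3))

i_t = sigmoid(w_i @ concat + b_i)        # input gate,  Eq. (1)
f_t = sigmoid(w_f @ concat + b_f)        # forget gate, Eq. (2)
o_t = sigmoid(w_o @ concat + b_o)        # output gate, Eq. (3)

print(i_t.round(3), f_t.round(3), o_t.round(3))  # gate activations in (0, 1)
```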
2.4 Transcription Layer
Normally, the use of an RNN requires a word or text-line image segmented into individual character images in order to predict the class of each character image. Unfortunately, this is impossible in the case of text images, especially ancient manuscripts, given the writing style, the overlapping of characters and the degraded quality of the documents. To meet this need, Graves et al. [14] introduced a new layer named the Connectionist Temporal Classification (CTC) layer. This layer makes the RNN-LSTM model able to predict the results even when the input data is not segmented. The CTC layer takes as input the LSTM transcription result at each time step, and then predicts the final recognition result by computing the sequence with the highest conditional probability. It is therefore sufficient, when training the model, to present the sequence of feature vectors accompanied by the sequence of characters that forms the word contained in the input word image.
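To illustrate what the CTC layer produces at inference time, the sketch below applies the standard best-path (greedy) decoding rule: take the most probable class at each time step, collapse repeated symbols, then drop the blank. The toy probability matrix and character set are assumptions.

```python
import numpy as np

CHARS = list("abcd")          # illustrative character set
BLANK = len(CHARS)            # CTC blank index (assumed to be the last class)

def ctc_greedy_decode(probs: np.ndarray) -> str:
    """probs: (timesteps, num_classes) softmax outputs from the RNN-LSTM."""
    best_path = probs.argmax(axis=1)
    decoded, previous = [], None
    for idx in best_path:
        if idx != BLANK and idx != previous:   # collapse repeats, skip blanks
            decoded.append(CHARS[idx])
        previous = idx
    return "".join(decoded)

# Toy example: 6 timesteps over 5 classes (4 characters + blank).
toy = np.array([
    [0.8, 0.1, 0.0, 0.0, 0.1],   # 'a'
    [0.8, 0.1, 0.0, 0.0, 0.1],   # 'a' (repeat, collapsed)
    [0.1, 0.0, 0.0, 0.0, 0.9],   # blank
    [0.1, 0.7, 0.1, 0.0, 0.1],   # 'b'
    [0.0, 0.1, 0.8, 0.0, 0.1],   # 'c'
    [0.1, 0.0, 0.0, 0.0, 0.9],   # blank
])
print(ctc_greedy_decode(toy))    # -> "abc"
```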
3 Results
In order to evaluate the performances of the proposed recognition system, we performed experiments on the public ESPOSALLES [10] dataset. In the rest of this section, we first present this dataset and the evaluation metric, then we present the results of the quantitative and qualitative experiments obtained by the system.
3.1 ESPOSALLES Dataset and Evaluation Metric
The ESPOSALLES [10] database consists of records of old marriage certificates written in Spanish between 1617 and 1619 that were archived and preserved in Barcelona Cathedral. The database contains a total of 39527 images of isolated words which are extracted from these record collections. The database images are separated into three sets: a training set (28346 images), a validation set (3155 images) and a test set (8026 images). To assess the efficiency of the system, we calculate the text recognition results using the Character Error Rate (CER) metric. This metric compares the word predicted by the system and the ground truth word, then calculates the number of characters that need to be inserted, deleted or substituted in order to transform the predicted word into the ground truth word. The value of this metric is between 0 and 1: 0 if the two words are identical, and 1 if there is no correspondence between the two words.
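The CER described above can be computed as the Levenshtein (edit) distance between the predicted word and the ground-truth word divided by the length of the ground truth, i.e. CER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions and N is the length of the ground-truth word; a minimal sketch (not the authors' evaluation script) follows.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimal insertions, deletions and substitutions."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, start=1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution / match
    return dp[len(b)]

def cer(predicted: str, ground_truth: str) -> float:
    return edit_distance(predicted, ground_truth) / max(1, len(ground_truth))

print(cer("Palaudaries", "Palaudaries"))  # 0.0  (perfect match)
print(cer("Palandaries", "Palaudaries"))  # one substitution -> ~0.09
```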
3.2 Quantitative Evaluation
As we detailed in the proposed system section, we use the CNN model for the feature extraction phase, and the classification phase is ensured using the RNN-LSTM model. In order to get the best results, we conducted several experiments on the dataset using several parameters and configurations at the level of the architecture of the two models. Since the CNN model inspired by VGG16 has proven its performance in several recognition systems, we have concentrated our efforts on the classification phase by choosing several architectures of the RNN-LSTM model. The configuration of these architectures is based on two parameters: the number of hidden layers and the number of LSTM blocks. Thus, we designed two RNN-LSTM models: a first model with a single hidden layer and a second with two hidden layers. Next, we ran a series of experiments with each model, choosing 128, 256 or 512 LSTM blocks. All models are trained on the training set, validated on the validation set, and the final results are computed on the test set, using the Adam optimization algorithm [15] and a learning rate of 0.001. The results obtained by the different configurations are presented in Table 1.

Table 1. The effect of the number of hidden layers and LSTM memory blocks on the handwriting recognition rate (CER, %).

Nbr. of hidden layers | 128 LSTM blocks | 256 LSTM blocks | 512 LSTM blocks
1 layer | 2.74 | 1.60 | 1.94
2 layers | 2.40 | 2.37 | 2.07
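A hedged sketch of the best-performing configuration in Table 1, a single bidirectional hidden layer with 256 LSTM blocks on top of the CNN feature sequence, with the Adam optimizer at a learning rate of 0.001; the character-set size is an assumption, and the CTC loss wiring of Sect. 2.4 is only indicated in the comments.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CHARS = 60                    # assumed size of the character set
TIMESTEPS, FEATURES = 115, 512    # feature-sequence shape from the CNN sketch above

feature_seq = layers.Input(shape=(TIMESTEPS, FEATURES))
# Single hidden layer with 256 LSTM blocks, bidirectional (best row of Table 1).
x = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(feature_seq)
char_probs = layers.Dense(NUM_CHARS + 1, activation="softmax")(x)  # +1 for CTC blank

recognizer_head = tf.keras.Model(feature_seq, char_probs, name="bilstm_ctc_head")
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
# Training would minimise the CTC loss between char_probs and the target
# character sequences (e.g. via tf.nn.ctc_loss), as described in Sect. 2.4.
recognizer_head.summary()
```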
As shown in Table 1, the experimental results clearly demonstrate that the RNN-LSTM configuration with a single hidden layer achieved a character error rate of 2.74%, 1.60% and 1.94% with respectively 128, 256 and 512 LSTM blocks, while the configuration with two hidden layers achieved a character error rate of 2.40%, 2.37% and 2.07% with respectively 128, 256 and 512 LSTM blocks. Thus, the best recognition performance is obtained by using a single hidden layer with 256 LSTM memory blocks. As can be seen, using more hidden layers and more LSTM blocks does not necessarily mean that the recognition results will be improved. We recall that these results are obtained on isolated images of handwritten words. The impact of hidden layers and LSTM memory blocks will become important when dealing with images of lines of text that consist of multiple words in the same text image. In this case, the use of several hidden layers and memory blocks will increase the capacity of the RNN-LSTM to memorize more contextual information, which improves the recognition rate.
3.3 Qualitative Evaluation
To visualize the performance of the proposed system, we present in Table 2 qualitative recognition results. The system works well and manages to recognize the handwritten words in the old images in different cases: presence of noise, complex background, word size variability, degraded quality and even the presence of ink stains in the initial image.

Table 2. Examples of old handwritten words recognized correctly.

Input images: six word images from the dataset (not reproduced here)
Predicted words: Bara | Dit | franch | Luys | Palaudaries | texidor

4 Conclusion
In this paper, we have proposed a recognition system that works on old handwritten images, the recognition of which is one of the most difficult processes in the field of image recognition. The system begins with a pre-processing step that eliminates degradation. Then, a CNN model inspired from VGG16 is adapted in the feature extraction step. In the classification step, several RNN-LSTM based architectures are explored and tested to predict the character sequence of each word image. Finally, the CTC layer converts the sequence of predictions into the final sequence of characters. Experiments on the ESPOSALLES dataset indicate that the proposed system achieves a high recognition rate in one of the most difficult text images. In future work, we plan based on these text recognition results to propose a system that aims to recognize named entities (NER) and identify the relationship between words in order to extract the most relevant information from ancient manuscripts.
References 1. Deore, S.P., Pravin, A.: Histogram of oriented gradients based off-line handwritten devanagari characters recognition using SVM, K-NN and NN Classifiers. Rev. d’Intelligence Artif. 33(6), 441–446 (2019) 2. Surinta, O., Karaaba, M.F., Schomaker, L.R.B., et al.: Recognition of handwritten characters using local gradient feature descriptors. Eng. Appl. Artif. Intell. 45, 405–414 (2015) 3. Roy, R.K., Mukherjee, H., Roy, K., et al.: CNN based recognition of handwritten multilingual city names. Multimedia Tools Appl. pp. 1–17 (2022) 4. Dutta, K., Krishnan, P., Mathew, M., et al.: Improving CNN-RNN hybrid networks for handwriting recognition. In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR). pp. 80–85. IEEE (2018) 5. Elleuch, M., Kherallah, M.: An improved Arabic handwritten recognition system using deep support vector machines. In: Computer Vision: Concepts, Methodologies, Tools, and Applications, pp. 656–678. IGI Global (2018) 6. Mondal, T., Bhattacharya, U., Parui, S.K., et al.: On-line handwriting recognition of Indian scripts-the first benchmark. In: 2010 12th International Conference on Frontiers in Handwriting Recognition, pp. 200–205. IEEE (2010) 7. Butt, H., Raza, M.R., Ramzan, M.J., et al.: Attention-based CNN-RNN Arabic text recognition from natural scene images. Forecasting 3(3), 520–540 (2021) 8. El bahi, H., Zatni, A.: Text recognition in document images obtained by a smartphone based on deep convolutional and recurrent neural network. Multimedia Tools Appl. 78(18), 26453–26481 (2019) 9. Markou, K., et al.: A convolutional recurrent neural network for the handwritten text recognition of historical Greek manuscripts. In: Markou, K., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol. 12667, pp. 249–262. Springer, Cham (2021). 10.1007/978-3-030-68787-8 18 10. Forn´es, A., Romero, V., Bar´ o, A., et al.: ICDAR2017 competition on information extraction in historical handwritten records. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). pp. 1389–1394. IEEE (2017) 11. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) 12. Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transa. Patt. Anal. Mach. Intell. 39(11), 2298–2304 (2016) 13. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) 14. Graves, A., Fern´ andez, S., Gomez, F., et al.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 369–376 (2006) 15. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Artificial Intelligence for a Sustainable Finance: A Bibliometric Analysis Rania Elouidani(B) and Ahmed Outouzzalt Fsjes, Ibn Zohr University, Agadir, Morocco [email protected]
Abstract. The ecological emergency implies a drastic change in our production and consumption patterns. Artificial intelligence (AI) is rapidly opening up a new frontier in business, corporate practice and government policy. Machine intelligence and robots with deep learning capabilities have had a profound impact on business, government and society. They are also influencing major trends in global sustainability. As the AI revolution transforms our world, it could herald a utopian future where humanity coexists harmoniously with machines. By examining existing literature, bibliometric analysis is an excellent way for undertaking a quantitative study of academic output to address research trends in a certain field of research. This paper seeks to investigate the role of emerging artificial intelligence techniques in supporting sustainable finance, to assess their progress, and to explain the research trend over the last decade using bibliometric analysis. The findings show that, despite a significant rise in the number of publications since 2017, author collaboration is minimal, particularly at the international level. Furthermore, the findings provide an overview of the topic’s interdisciplinary study. By presenting new insights and crucial concepts, the authors hoped to contribute to the theoretical development of artificial intelligence’s use in sustainable development at the financial sector level. Artificial intelligence approaches are being widely deployed as viable replacements to traditional methods, with promising outcomes. This article has theoretical as well as practical consequences, as it gives researchers an overview of the theoretical evolution and intellectual framework for undertaking future study in this field. Keywords: Artificial intelligence · Sustainable development · Green finance · Bibliometric study · FinTech
1 Introduction
Over the past years, the relationship between artificial intelligence (AI) and the concept of sustainable finance has interested researchers around the world. The current general trend in the economy is towards the digitalization of all sectors of activity (FinTech), which leads finance researchers to study the impact of new technologies, artificial intelligence (AI), on sustainable development in the financial field. Academic studies have been devoted to exploring the role of new AI technologies in improving the foundations of sustainable finance. AI is increasingly used for decision support, forecasting and
anticipation. It is thus a strong opportunity to support companies and more generally societies in a sustainable development approach. Transitioning to sustainability involves accomplishing the Sustainable Development Goals, converting the economy into a green, low-carbon, resource-efficient economy, and fighting climate change. Sustainable finance, green finance, and climate financing promote this process. The finance industry is gradually adjusting to the continuing shift. It is known as the Quiet Revolution in finance since it is still forming on the periphery of the conventional economy [1]. Developing a new business model is often a challenge for companies and can fail [2]. To effectively implement sustainability strategies and practices, companies need to reflect on their values and be ready to seize opportunities related to sustainability [3]. AI is one of the most essential accelerators for establishing a sustainable company strategy [4]. However, to our knowledge, there is no comprehensive bibliometric analysis of the evolution of the application of artificial intelligence techniques in support of sustainable finance, and we have sought to fill the gaps in this area of research. This effort is motivated by the realization that this research area has evolved greatly over the last few years and that it is essential to examine and define the research trend using bibliometric analysis and visualization. Given the significance of this issue, the purpose of this research is to examine the current academic literature on integrating the idea of sustainable development into the financial sector via the use of intelligent procedures, with the following particular objectives:
1. Identify bibliometric patterns (co-authorship, geographical region of authorship, co-citation, co-occurrence, etc.) in the application of artificial intelligence to sustainable finance.
2. Provide a synthesis of research works from many disciplines of business, finance, management, and the environment, as well as computer science, in order to conduct interdisciplinary research on bankruptcy prediction modeling.
3. Discuss, based on the acquired data and knowledge, the unexplored regions and consider prospective future study possibilities in order to develop a better understanding and insight into this research issue.
2 Literature Review
2.1 The Concept of Sustainable Finance
Finance is the collection of systems that supply the working capital required by the economy. It plays a crucial part in the competitiveness and prosperity of a nation since its goals are to guarantee the most efficient use of resources and so enhance the well-being of everyone [5]. However, the highest standard of living is only attainable if the deteriorating environment and dwindling natural resources are taken into consideration. Financing a socially acceptable ecological transition is consequently a significant issue for our societies' futures. In order to promote the real economy and long-term initiatives, sustainable finance encourages financial operations that include ESG criteria [6]: environmental, social, and
governance requirements. These criteria include study of the implications of a company's actions in terms of carbon emissions, biodiversity conservation, waste management, etc.; social impacts; and the set of regulations that govern the manner in which a company is regulated and operated.
2.2 Artificial Intelligence for a Sustainable Finance
The Brundtland Report states that technology can be "managed and enhanced to usher in a new era of economic growth" [7]. Not only is the significance of technology in supporting and promoting sustainable development still stressed thirty years later, but it is also increasing speed. In 2015, for instance, the World Economic Forum emphasized the need for technological transformation, demonstrating how certain aspects, such as artificial intelligence, could radically alter society over the next decade [8]. Considered one of the most significant technologies for the advancement of sustainability [10], AI offers new methods of producing, delivering, and extracting value among stakeholders, affecting all aspects of business models [11]. Accordingly, the Global Reporting Initiative [12] has highlighted that advances in knowledge and technology not only contribute to broader economic development, but also support the resolution of "risks and threats to the sustainability of our social relations, environment and economies". The adoption of advanced technologies implies a disruption and redesign of activities [13]. John McCarthy invented the term artificial intelligence (AI) at the first academic meeting on the issue at Dartmouth College in 1956 to advance the notion of thinking machines [14]. Today, artificial intelligence refers to a wide range of computer technologies and applications, including neural networks, voice and pattern recognition, genetic algorithms, and deep learning [15]. The absence of a precise and commonly acknowledged definition makes it difficult to get a good picture of the AI field [10]. AI is nonetheless defined by the Oxford English Dictionary as "the theory and development of computer systems capable of doing tasks that ordinarily require human intellect, such as visual perception, voice recognition, decision-making, and language translation." The contrast between strong and weak AI is well recognized [16]. Strong AI refers to nonexistent systems having complete human or superhuman intellect [17]. Weak AI, or narrow AI, refers to activities that need distinctive human talents, such as visual perception, contextual comprehension, probabilistic reasoning, and management of complexity [18]. Expert systems and knowledge-based systems, such as those used for medical diagnosis, loan approval, insurance underwriting, and decision support, are the only AI-based commercial applications now in use [19]. Intriguingly, AI enables new methods of producing and sharing value, which has an effect on company structures [20] and is a significant source of innovation [21]. AI, for instance, provides several significant research possibilities to alter the role of resources in a business model [10]. Conforming to the digital transformation of businesses, we are seeing a significant degree of automation of physical chores and manufacturing processes via the use of pragmatic technologies [19]. On the plus side, AI will increase company intuition and individual inventiveness [22]. On the negative side, the workplace will experience a paradigm change as robots and sophisticated technology displace human workers [23].
Similarly, AI is making goods and services smarter by introducing new uses and features [24]. Smartphones, for instance, already contain AI functions such
as voice recognition and comprehension, word completion while writing, and the provision of tailored advice.
3 Research Methodology
This article is a bibliometric literature review. The literature search is limited to journal articles and reviews published between 2010 and 2022 in the SCOPUS database. A set of keyword combinations was used as search criteria in the initial search to locate international articles relating to the research topic. These criteria are based on a previous systematic literature review conducted by several authors providing the following combination of main keywords: artificial intelligence, sustainable development, green finance, bibliometric study, FinTech, and ESG. The above sequence of keywords was used to obtain bibliometric data from the four most-used databases (Scopus, Pubmed, Dimensions and Lens). The Scopus online database was chosen for the study because it has the greatest amount of documents (n = 63), and the bibliographic data, i.e. year of publication, number of publications, document type, countries/territories of origin, and institutions, were recorded. The "Analyze" and "Create Citation Report" functionalities of the online web platform Scopus were utilized to do basic analysis. After inputting such terms, the search results are filtered by only evaluating papers from 2010 to 2022. Regarding language selections, no languages have been specified, despite the fact that the bulk of results are written in English. As we aimed to present an overview of research conducted in several disciplines, we did not apply a filter based on field.
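As an illustration of how the descriptive counts reported later in Sect. 4.1 can be obtained from a Scopus record export, the following sketch assumes a CSV export with "Year" and "Document Type" columns (assumed names, since the exact export layout is not given) and tabulates publications per year and per document type with pandas.

```python
import pandas as pd

# Assumed export: scopus.csv with "Year" and "Document Type" columns.
records = pd.read_csv("scopus.csv")

per_year = records["Year"].value_counts().sort_index()
per_type = records["Document Type"].value_counts(normalize=True).mul(100).round(1)

print(per_year)   # number of documents published each year
print(per_type)   # share of articles, conference papers, reviews, ... (%)
```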
4 Results of the Analysis
Intelligent methodologies were used to perform a bibliometric analysis on supporting sustainable development in finance in order to notice and analyze the publishing patterns in this research field. The VOSviewer program was used to investigate the development of published documents, co-authorship, geographical location (country/territories) of authors, co-citation, and co-occurrence… VOSviewer is one of the most popular computer applications that provides "visualization methods that may be used to map the ever-expanding domain structure of scientific fields and to help the retrieval and categorization of information" [25]. It was selected because it excels at visualizing huge bibliometric maps in a simple and intelligible way [26]. Following will be a discussion of the study's findings on the specified criteria.
4.1 Initial Analysis of Results
Figures 1 and 2 show the distribution of output over the period 2010–2020. 63 papers were published by journals indexed in the Scopus database during the study period. This number of papers is shared between different research fields. Thus, computer science takes the first place with a rate of 17.6%, followed by business, management and accounting, while mathematics occupies the last position with 2.1%.
Fig. 1. Publication trends by year (Source: Based on data from Scopus database)
Fig. 2. Publications by subject area (Source: Based on data from Scopus database)
Figure 1 shows that, in terms of publications, the number of documents is increasing, with 22 documents in 2021 compared to 2 in 2010; we also observe no publications in the years 2012 and 2013. The highest growth rate (266.66%) was recorded in the period 2019–2021, and the number of published documents decreased over the years 2011 to 2013. Figure 3 reports the distribution of documents by type for the 63 publications. It mainly comprises "Article" at 42.9%, with 27 documents, followed by "Conference Paper" (38.1%) with 24 publications, and "Review" (4.8%) represented by 4 documents. Figure 4 shows the most productive countries/territories. Indeed, China (n = 11) is the most productive country, followed by the United States (n = 9) and Russia (n = 7).
Fig. 3. Publications by type (Source: Based on data from Scopus database)
Fig. 4. Publications by country/territory (Source: Based on data from Scopus database)
The main keywords covered by the articles reported for China are artificial intelligence, finance, electronic commerce, financial risks, sustainable development, and management science. Figure 5 shows the most productive authors in our research topic. Indeed, Ceocea, Hassan, Jayawickrama, Olan, Suklan and Vladareanu are tied for first place with 2 publications each, followed by other authors with only one publication each. This confirms the novelty of the subject and the progressive increase in production in recent years.
Fig. 5. Publications by author (Source: Based on data from Scopus database)
4.2 In-depth Analysis of the Results
Co-authored research is a crucial component of bibliometric studies, and the amount of cooperation is a measure of the status of research in a particular subject [27]. From the standpoint of individuals and nations, this section examines the collaborative strength and research groups of users of smart approaches for enhancing the foundations of sustainable finance. Before continuing with the network interpretation of co-authorship and co-citation, it is important to distinguish between the two concepts. The objective of co-authorship analysis is to determine the extent of research cooperation in a certain area [28]; co-authorship means authoring a book, paper, report, etc. with another individual or individuals [29]. Co-citation is a kind of document connection, defined as the frequency with which two pieces of previous literature are mentioned together in subsequent writing [30].
Fig. 6. Individual co-authorship map (Source: Developed by the Author on VOSviewer)
For data selection and thresholds, the minimum number of documents and citations for an author is one and zero, respectively. 14 of the 166 writers meet the requirements,
and their co-author network is shown in Fig. 6, where each node represents an author and the lines and distances indicate their connection. When two nodes are closer to one another, they are more likely to have a strong connection. Authors with a greater influence, as measured by citations and publications, are represented by bigger nodes. A link is a connection or relationship between two items, and the thicker the line used to represent the link in the map visualization, the stronger the link. The connections in Fig. 6 represent the co-authorship relationships between researchers. Each connection has a strength that indicates the number of articles coauthored by two scholars [31]. The total link of a node is the sum of the link strengths of that node on all other nodes [32]. Link strength may be used as a quantitative metric to characterize the connection between two components. As seen in Fig. 6, there are three clusters, each represented by a different color. The two biggest nodes for Zhang and Liu have the greatest citation weight and overall link strength. The remainder of the writers have substantially lower citation and publication weights and fewer collaboration connections.
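A small sketch of how the link strength and total link strength described above can be computed from the author lists of a set of papers; the author lists below are fictitious and purely illustrative.

```python
from collections import Counter
from itertools import combinations

# Fictitious author lists, one per paper.
papers = [
    ["Zhang", "Liu", "Wang"],
    ["Zhang", "Liu"],
    ["Olan", "Jayawickrama", "Suklan"],
]

# Link strength of a pair = number of papers the two authors co-authored.
link_strength = Counter()
for authors in papers:
    for a, b in combinations(sorted(set(authors)), 2):
        link_strength[(a, b)] += 1

# Total link strength of an author = sum of link strengths over all partners.
total_link_strength = Counter()
for (a, b), weight in link_strength.items():
    total_link_strength[a] += weight
    total_link_strength[b] += weight

print(link_strength[("Liu", "Zhang")])   # 2
print(total_link_strength["Zhang"])      # 3  (Liu: 2, Wang: 1)
```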
Fig. 7. Country co-authorship overlap visualization map (Source: Developed by the Author on VOSviewer)
Country co-author analysis is a useful type of co-author analysis because it may reveal the level of communication between countries and the most influential countries in a field [33]. A visualization map of the national co-authorship network overlay was developed and is illustrated in Fig. 7. Regarding data selection and criteria, the minimal number of articles and citations for a nation is two and zero, respectively. Eight out of nine nations meet the requirements. Notably, the visualization of the overlay is based on the average year of publication, with the colours of the items decided by their score, from blue (lowest score) to yellow (highest score). China, the United States, and Russia are the nations and areas with the greatest number of publications, as seen in Fig. 7. The United States is the research center with the biggest overall number of linkages in this topic. On this map, the size of each node is proportional to the number of articles. In other words, while the United States is not the biggest node on the globe (it is slightly smaller than China), it has the greatest overall linkage strength for the broadest connection and cooperation with diverse nations and regions across continents. After China, Russia is the second-largest node in Asia and
is strongly connected to other Asian nations, such as India and Malaysia. This finding indicates that geographical closeness tends to increase the collaborative and cooperative relationships between writers in this area of research. Regarding the average year of publication as seen in Fig. 7, India and Bahrain (in yellow) have a large amount of publications after 2021, which may suggest that academics in these two nations have shown a growing interest in this study area in recent years. China, Italy, and Malaysia (colored in blue) all exhibit a publishing peak prior to 2018, which may indicate their past contribution and significance to this topic. The United States and Russia (in green) exhibit a significant rise in the number of publications from 2019 to 2020. The authors create a map based on a co-occurrence matrix, separating the keywords into three categories (see Fig. 8) with a minimum of five occurrences per term. 10 keywords out of 620 that fit the criteria are shown as nodes. Due to differences in how writers characterize a word (as singular or plural, with or without the hyphen), elements that have been represented in various ways may be tallied independently. To get more precise findings, the authors used the thesaurus feature of the VOS reader and combined the various keyword types. The keyword "artificial intelligence" has the greatest overall link strength and frequency. Other frequently occurring terms include "sustainable development" and "financing." The artificial intelligence node has thick lines linking to the remaining keywords. Co-citation is a kind of document linking, defined as the frequency with which two papers are referenced jointly by other publications [34].
A co-citation map consists of nodes representing journal articles and edges reflecting the co-occurrence of nodes and/or articles in the article’s reference list [35]. Consequently, the authors undertook
Fig. 8. Keyword co-occurrence map (Source: Developed by the Author on VOSviewer)
a co-citation research in line with the literature on smart techniques for strengthening sustainable financial foundations. In VOSviewer, a reference co-citation map was generated based on bibliographic data (see Fig. 9), and the minimum criterion for an author’s number of citations was set at 6, which only 10 out of 11 writers satisfied. In Fig. 9, it can be seen that all nodes are of equal importance. This can be explained by the fact that the research topic is innovative and not yet well addressed by researchers. In fact, when searching the bibliographic database, the number of results found is only 63 documents of different types spread over different research areas. To conclude this in-depth analysis, we have tried to analyse the topic more precisely by focusing only on articles, conference papers and reviews published between 2021 and 2022 (the period of growth in the number of publications) in the SCOPUS database that belong to the following fields: Business, Management and Accounting; Economics, Econometrics and Finance; and Decision Sciences. For this, we used the following search structure: (TITLE-ABS-KEY (sustainable AND finance) AND TITLE-ABS-KEY (artificial AND intelligence)) AND (LIMIT-TO (SUBJAREA, “BUSI”) OR LIMIT-TO (SUBJAREA, “ECON”) OR LIMIT-TO (SUBJAREA, “DECI”)) AND (LIMIT-TO (PUBYEAR, 2022) OR LIMIT-TO (PUBYEAR, 2021)) AND (LIMIT-TO (DOCTYPE, “ar”) OR LIMIT-TO (DOCTYPE, “cp”) OR LIMIT-TO (DOCTYPE, “re”)).
Fig. 9. Reference co-citation map (Source: Developed by the Author on VOSviewer)
This query returned nine documents. The objective of this last analysis is to shed light on the different results obtained by the authors in the economic and financial field. The following table summarises these results in descending order of the publication date of each document (Table 1).
Table 1. Summary of the results of the analysis of each author published on the SCOPUS database between 2021 and 2022 in the economic and financial field.
Luo, Suyuan; Choi, Tsan-Ming (2022)
• Most publications are conceptual
• Categorisation of related studies into three broad areas, namely information technologies for supply chain operations, technologies for sustainable operations and technologies for production operations
Olan, F., Arakpogun, E.O., Jayawickrama, U., Suklan, J., Liu, S. (2022)
• AI offers significant economic opportunities and provides the most efficient use of supply networks
• It also provides a theoretical contribution to CS financing and broadens the managerial implications for improving performance
Kumar, S., Sharma, D., Rao, S., Lim, W.M., Mangla, S.K. (2022)
• The study gathers the most influential publications, journals, the most influential authors, institutions, and countries, as well as methodological choices and research contexts for long-term finance research
• It gives an overview of seven major themes in long-term finance research, including socially responsible investing, climate finance, green finance, impact investing, carbon finance, energy finance, and long-term finance governance
• It makes some recommendations for future research on long-term finance
Milana, C., Ashta, A. (2021)
• There are productive uses of AI that affect the daily life of the average consumer
• AI poses formidable challenges to the maintenance of jobs
• The literature reviewed gave hope for improved efficiency, new data, information, advisory and management services, risk mitigation and some unanswered questions about the negative impacts on sustainable growth and increased economic welfare
Ha, S., Marchetto, D.J., Dharur, S., Asensio, O.I. (2021)
• Human expert annotators outperform the crowd in general in terms of accuracy and F1 score
• Improvements in the quality of training data also produce more accurate and reliable detection
• When models are trained on high-quality training data organised by experts, surprisingly, transformative neural networks can outperform even human experts
Han, D., Ding, L. (2021)
• According to the findings, the ensemble learning model has a prediction accuracy of 98.08 percent, which is higher than the single model. This discovery could pave the way for a new financial risk prediction path for manufacturing businesses, allowing them to better foresee financial risk and promoting long-term, steady growth
Pustokhina, I.V., Pustokhin, D.A., Mohanty, S.N., García, P.A.G., García-Díaz, V. (2021)
• In the FinTech setting, the research proposes a revolutionary artificial intelligence aided IoT based FCP (AIAIoT-FCP) paradigm
• The resulting values demonstrated that the AIAIoT-FCP model outperformed state-of-the-art approaches by a significant margin
Qi, X., Hu, H. (2021)
• The number of suppliers under corporate oversight is negatively related to financial performance
• The publication offers development recommendations for managing the real estate industry’s ecological supply chain
Olan, F., Liu, S., Suklan, J., Jayawickrama, U., Arakpogun, E. (2021)
• The study establishes a conceptual framework and subsequently develops it to a meta-framework based on theoretical contributions recognized through literature
• The potential contributions of AI-driven supply chain networks present a long-term funding source for the food and beverage supply chain
5 Discussion of the Analysis From the initial analysis and the in-depth analysis of the subject, carried out according to the bibliometric research method, we have drawn the following conclusions: Firstly, the relationship between artificial intelligence and sustainable finance is a recent topic with innovative aspects. Research in this area is still underdeveloped and the number of productions in each field of study is insufficient, especially since it is an important topic these days. Secondly, the first two papers on the subject were published only in 2010, namely “Financial distress prediction of China listed companies based on SD model and LSSVM” and “Decision support system framework for the implementation of Enterprise Resource Planning (ERP) system” published by Wang, Z.-N., Wu, Y.-Y. and Khalil, M.A.T., Dominic, P.D.D., Bin Hassan, M.F. respectively.
Thirdly, it was found through this bibliometric analysis that research in this topic has been growing since 2018, and it is still increasing. The authors are aware of the importance of the topic, given the abnormal climate changes and the scarcity of natural resources. Fourth, the co-occurrence analysis reveals that the most frequently used keywords are: “artificial intelligence”, “sustainable development” and “finance”. All of these are closely related to the topic, as they show that the use of AI is oriented towards a multitude of domains, whether related or not, such as finance and sustainability. Finally, the analysis showed us that the research on this subject in the economic and financial field is still insufficient in terms of the number of publications per year, a number that has seen its greatest growth after 2020. The study also revealed that all authors agree on the importance of AI techniques in improving the economic sector.
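The keyword co-occurrence analysis mentioned in this discussion can be made concrete with a short sketch: the snippet below counts pairwise keyword co-occurrences across a set of documents, applies a minimum-occurrence threshold, and merges spelling variants through a small thesaurus, mirroring the VOSviewer workflow reported in this study. The document list, thesaurus entries and the lowered threshold are illustrative assumptions, not data from the analysed corpus.

from collections import Counter
from itertools import combinations

# Illustrative author-keyword lists (assumed), one list per document.
documents = [
    ["artificial intelligence", "sustainable development", "financing"],
    ["artificial intelligence", "green finance"],
    ["AI", "sustainable development"],
]

# Small thesaurus merging keyword variants, as done with VOSviewer's thesaurus file.
thesaurus = {"ai": "artificial intelligence"}
MIN_OCCURRENCES = 1  # the study used 5; set to 1 so the toy data yields output

normalised = [sorted({thesaurus.get(k.lower(), k.lower()) for k in doc}) for doc in documents]
occurrences = Counter(k for doc in normalised for k in doc)
kept = {k for k, n in occurrences.items() if n >= MIN_OCCURRENCES}

# Co-occurrence matrix restricted to the kept keywords.
cooccurrence = Counter()
for doc in normalised:
    for a, b in combinations([k for k in doc if k in kept], 2):
        cooccurrence[(a, b)] += 1

print(cooccurrence.most_common(10))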
6 Conclusion
The purpose of this bibliometric study on the implementation of new smart techniques for sustainable finance is to identify and evaluate research trends in the field, and to present the development of published articles, co-authorship, the geographical area (country/territory) of authors, co-occurrence analysis, and text mining. Findings from this study may be helpful to researchers in this area; however, there are several caveats that need to be addressed. Bibliometric analysis has limitations because of its quantitative character, which does not take into account the context or the authors’ intentions when referring to other research. The second limitation is that a single scientific database (SCOPUS) was used as the source. In light of this, a bibliometric evaluation based on other data sources may be included in future investigations. Objective quantitative co-occurrence approaches may be used to deal with the third limitation, which is keyword collection. In particular, a forward-backward search technique that automatically collects subject keywords from literature databases using meta keywords and a co-occurrence/LDA-topic definition list might be the focus of future investigation. The major contribution of this study is to provide new insights and key information on bibliometric trends in artificial intelligence in relation to the development of green finance. Artificial intelligence approaches are being widely deployed as viable replacements for traditional methods, with promising outcomes. To discover new niches in this subject for future research, it is vital to understand the trend of studies in this topic and to determine its intellectual structure. As a result, the findings of this article provide a summary of interdisciplinary research conducted since 2010. This approach has contributed to a better knowledge of the subject by revealing new information about the application of under-utilized alternative machine learning techniques. Lastly, the study adds to the field’s theoretical development by assisting graduate students and researchers in identifying essential research subjects and discovering potential prospects [36].
References ˙ 1. Ryszawska, B.: Sustainable finance: paradigm shift. In: Bem, A., Daszy´nska-Zygadło, K., Hajdíková, T., Juhász, P. (eds.) Finance and Sustainability. Springer Proceedings in Business and Economics. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92228-7_19 2. Geissdoerfer, M., Vladimirova, D., Evans, S.: Sustainable business model innovation: a review. J. Clean. Prod. 198, 401–416 (2018). https://doi.org/10.1016/j.jclepro.2018.06.240. S0959652618318961 3. Fleming, P., Watson, S.J., Patouris, E., Bartholomew, K.J., Zizzo, D.J.: Why do people file share unlawfully? A systematic review, meta-analysis and panel study. Comput. Hum. Behav. 27, 535–548 (2017) 4. Manyika, J., et al.: A Future That Works: Automation Employment and Productivity. McKinsey & Company, New York (2017) 5. “Qu’est-ce que la finance durable». Observatoire de la finance durable. https://observatoire delafinancedurable.com/fr/presentation/la-finance-durable/#:~:text=La%20finance%20dura ble%20regroupe%20diff%C3%A9rentes%20pratiques&text=Il%20permet%20d’int%C3% A9grer%20des,compte%20des%20crit%C3%A8res%20extra%2Dfinanciers. Accessed 01 Mar 2022 6. “ESG”. Novethic. https://www.novethic.fr/lexique/detail/esg.html. Accessed 02 Mar 2022 7. United Nations: Report of the world commission on environment and development (1987) 8. Espinel, V., Brynjolfsson, E., Annunziata, M., Brechbuhl, H., Cao, R., Crawford, S., et al.: Deep Shift: Technology Tipping Points and Societal Impact. World Economic Forum, Geneva (2015) 9. Panetta, K.: Gartner top 10 strategic technology trends for 2019 (2018). https://www.gartner. com. Accessed 11 Mar 2019 10. Ransbotham, S., Gerbert, P., Reeves, M., Kiron, D., Spira, M.: Artificial intelligence in business gets real. MIT Sloan Manag. Rev. (2018). https://doi.org/10.1145/2063176.2063177 11. Global Reporting Initiative: RG. Sustainability reporting guidelines (2011) 12. Manyika, J.: What’s Now and Next in Analytics AI and Automation. McKinsey Global Institute, San Francisco (2017) 13. Marr, B.: The key definitions of artificial intelligence (AI) that explain its importance. Forbes (2018). https://www.forbes.com/sites/bernardmarr/2018/02/14/the-key-def initions-of-artificial-intelligence-ai-that-explain-its-importance/#4f4220b4f5d8. Accessed 9 Apr 2019 14. Bughin, J., Hazan, E., Ramaswamy, S., Chui, M., Allas, T., Dahlström, P., et al.: Artificialintelligence: the next digital frontier? McKinsey Global Institute (2017). https://doi.org/10. 1016/S1353-4858(17)30039-9 15. Jarrahi, M.H.: Artificial intelligence and the future of work: human-AI symbiosis in organizational decision-making. Bus. Horiz. 61(4), 577–586 (2018) 16. Stanford University: Artificial intelligence and life in 2030: one hundred year study on artificial intelligence. Stanford University (2016). https://ai100.stanford.edu 17. KPMG: Trust in artificial intelligence (2018a). https://doi.org/10.1177/1064804618818592 18. Hengstler, M., Enkel, E., Duelli, S.: Applied artificial intelligence and trust—the case of autonomous vehicles and medical assistance devices. Technol. Forecast. Soc. Chang. 105, 105–120 (2016) 19. Russel, S., Norvig, P.: Artificial Intelligence a Modern Approach, 3rd edn. Pearson, London (2010). https://doi.org/10.1017/S0269888900007724 20. Duan, Y., Edwards, J.S., Dwivedi, Y.K.: Artificial intelligence for decision-making in the era of Big Data—evolution, challenges and research agenda. Int. J. Inf. Manag. 48, 63–71 (2019)
21. KPMG: Rethinking the value chain. A study on AI, humanoids and robots (2018b). https:// assets.kpmg.com/content/dam/kpmg/xx/pdf/2018/09/rethinking-the-value-chain.pdf 22. Biloslavo, R., Bagnoli, C., Edgar, D.: An eco-critical perspective on business models: the value triangle as an approach to closing the sustainability gap. J. Clean. Prod. 174, 746–762 (2018) 23. Baccala, M., et al.: 2018 AI predictions—8 insights to shape business strategy (2018). https:// doi.org/10.1007/s12193-015-0195-2 24. Ferràs-Hernández, X.: The future of management in a world of electronic brains. J. Manag. Inq. 27(2), 260–263 (2018) 25. Huang, M.H., Rust, R.T.: Artificial intelligence in service. J. Serv. Res. 21(2), 155–172 (2018) 26. Rosenberg, R.S.: The social impact of intelligent artefacts. AI Soc. 22(3), 367–383 (2008) 27. Makridakis, S.: The forthcoming artificial intelligence (AI) revolution: its impact on society and firms. Futures 90, 46–60 (2017) 28. Borner, K., Chen, C., Boyack, K.: Visualizing knowledge domains. Annu. Rev. Inf. Sci. Technol. 37(1), 179–255 (2005) 29. Waltman, L., Van Eck, N.J., Noyons, E.C.: A unified approach to mapping and clustering of bibliometric networks. J. Informet. 4(4), 629–635 (2010). https://doi.org/10.1016/j.joi.2010. 07.002 30. Jiménez-Reyes, P., Samozino, P., Brughelli, M., Morin, J.B.: Effectiveness of an individualized training based on force-velocity profiling during jumping. Front. Physiol. 7, 677 (2017). https://doi.org/10.3389/fphys.2016.00677 31. Liao, L., Lin, T., Zhang, Y.: Corporate board and corporate social responsibility assurance: evidence from China. J. Bus. Ethics 150(1), 211–225 (2018) 32. Cambridge University Press: Cambridge Online Dictionary, Cambridge Dictionary Online, from the website temoa: Open Educational Resources (OER) (2008) Portal at. http://temoa. tec.mx/node/324. Accessed 23 Apr 2008 33. Small, H.: Co-citation in the scientific literature: a new measure of the relationship between two documents. J. Am. Soc. Inf. Sci. 24(4), 265–269 (1973) 34. Van Eck, N., Waltman, L.: VOSviewer Manual (2019). Vosviewer.Com. https://www.vosvie wer.com/documentation/Manual_VOSviewer_1.6.10.pdf. Accessed 27 Mar 2022 35. Pinto, M., Pulgarín, A., Escalona, M.I.: Viewing information literacy concepts: a comparison of two branches of knowledge. Scientometrics 98(3), 2311–2329 (2013). https://doi.org/10. 1007/s11192-013-1166-6 36. Liao, H., Tang, M., Luo, L., Li, C., Chiclana, F., Zeng, X.: A bibliometric analysis and visualization of medical big data research. Sustainability 10(2), 166 (2018) 37. Fahimnia, B., Sarkis, J., Davarzani, H.: Green supply chain management: a review and bibliometric analysis. Int. J. Prod. Econ. 162, 101–114 (2015) 38. Koseoglu, M.: Growth and structure of authorship and co-authorship network in the strategic management realm: evidence from the Strategic Management Journal. BRQ Bus. Res. Quart. 19(3), 153–170 (2016)
Geoparsing Recognition and Extraction from Amazigh Corpus Using the NooJ Complex Annotation Structures Bouchra Ladouzi(B) , Azeddine Rhazi, and Ali Boulaalam DAKS, University Cadi Ayyad, Marrakesh, Morocco [email protected], [email protected]
Abstract. This paper deals with a novel toponym recognition approach for various geoparsing purposes and presents a method for classifying and extracting toponyms as named entities (NEs) from Amazigh corpora. The research starts with the implementation of the identifying rules (local grammar) and the annotation of a toponym recognizer tool (TR) that is able to find references and semantic relations to places, organizations, persons and historical meaning use. In order to extract the correct Amazigh named entities, we use a complex annotation system as a text annotation structure (TAS) with two tasks divided into four phases: Clustering (files), Segmentation (POS tagging for each form), Classification (categories & types) and Extraction (identification). We decided to model various text constraints of toponyms as ambiguous senses, and each phase plays a crucial role in the overall performance. We use the NooJ platform and its structural annotation engine, which requires a separate analysis of these NE structures and represents more than one annotation at a given location in the text. The appropriate samples used in the study show the importance of the complex annotation and the useful technique of classification processes to improve Amazigh text analysis from a historical and anthropological meaning-based approach. Experimental results show that the proposed method has a higher toponym recognition ratio in comparison to the previous studies. Keywords: Toponym · Geoparsing · Named Entities · Amazigh Text · Extraction · Classification · NooJ
B. Ladouzi, A. Rhazi and A. Boulaalam - Discourse Analysis and Knowledge Systems (DAKS), Cadi Ayyad University, Morocco.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 552–559, 2023. https://doi.org/10.1007/978-3-031-26384-2_47
1 Introduction
1.1 Context and Related Work
In recent years, toponymy has been a very active field of research due to the increasing use of several types and amounts of data, whether as location information retrieval or as Named Entities (NEs) designating the geoparsing of persons, places or organizations; toponyms also refer to more historical, cultural and anthropological meanings. However, little research has been
carried out to study toponym recognition from Amazigh historical texts. Once the emergency-related corpora are filtered out, geo-spatial textual information is processed in Natural Language Processing (NLP) monitoring systems to recognize and extract toponym units from texts (Suwaileh et al. 2020), which necessitates new guidelines, a consolidation of metrics and a detailed toponym taxonomy with implications for Named Entity Recognition and its geotagging (Gritta et al. 2020) in different contexts and text disambiguation, within various applied purposes such as geographical information retrieval, information extraction and pragmatic analysis of some specific corpora of historical texts (Rupp et al. 2013). Furthermore, the toponym plays an important role as an element that helps communities restore their collective memory and reconstruct the components of their cultural identity, which are integrated within its different dimensions, such as the psycho-physiological dimension, the socio-economic dimension, the religious-doctrinal dimension, and the historical dimension. An example of these geoparsing named entities spotted in Amazigh texts is the traditional use of the ‘Argan Tree’, which carries various ethnographical and anthropological meanings such as “tarant n lbas” (‘argana of misery’). In this work on the automatic recognition of toponyms from Amazigh texts, we referred to several studies concerning complex text annotation that have been carried out using different approaches in order to create the NooJ grammar rules and then the local grammar to build text annotation structures (Rhazi et al. 2018). In this paper, we present a method for classifying and extracting toponyms from Amazigh corpora and deal with a novel toponym recognition approach for various geoparsing purposes; we start with the implementation of the identifying rules (local grammar) and the annotation of a toponym recognizer tool (TR) that is able to find references and semantic relations to places, organizations, persons and historical meaning use. In order to extract the correct Amazigh named entities, we use a complex annotation system as a text annotation structure (TAS) with two tasks divided into four phases: Clustering (files), Segmentation (POS tagging for each form), Classification (categories & types) and Extraction (identification) (Diab 2009); and we decided to model various text constraints of toponyms as ambiguous senses. Each phase plays a crucial role in the overall performance. We use the NooJ platform as a valuable linguistic development environment on the one hand, and as a structural annotation engine (Silberztein 2015) that handles high-frequency units, which requires a separate analysis of these Named Entity structures (ENAMEX) and represents more than one annotation at a given location in the text, on the other hand (Ben Hammadou 2010). The appropriate samples used in the study show the importance of the complex annotation and the useful technique of classification processes to improve Amazigh morpho-syntactic text analysis (Nejm 2013) from a historical and anthropological meaning-based approach. Experimental results show that the proposed method has a higher toponym recognition ratio in comparison to previous studies.
Toponyms, which occur frequently in the Amazigh corpus, require morphological rules, lexical rules and syntactic rules (local grammar). Furthermore, to formalize these mentioned classes, we have relied on the works of (Boukhris et al. 2008) and those of (Oulhaj 2000). Following these works on the Amazigh toponym dictionary, we have derived, based on this scheme, a set of rules that are structured according to morphological rules (gender, number, and state) and syntactic grammars (transducers and graphs).
2 Approach and Method In our work we adopted a specific approach using Texts Annotation structures (TAS) as tool implemented in NooJin order to build a named-entity recognizer from the ground up and to apply a simple query to a corpus and build its corresponding concordance that aims to solve the problem of semantic ambiguity of some toponyms structures. Our approach consists of three main phases: the classification, the extraction and the recognition. This specific approach, presented in three main steps as in the following Fig. 1:
1 NooJ is a linguistic environment used to construct large-coverage formalized descriptions of languages as electronic dictionaries, resources and local grammars (graphs), and is used as a corpus processing system for various corpora and discourse analysis.
Fig. 1. The approach, presented in three main steps: extraction; classification & recognition
2.1 Clustering and Segmentation
Clustering is a NooJ method that groups unstructured data into clusters using its algorithms in order to filter corpora. In this context, we are interested in the automatic annotation of Amazigh corpora implemented in the NooJ platform (Haddar 2016); the segmentation phase will, on the one hand, reduce the complexity of the analysis and, on the other hand, improve the NooJ platform functionalities. Also, we achieved our annotation phase by identifying different types of toponym ambiguities, and then an appropriate set of rules is proposed because some geoparsing tools are already in use (Rupp et al. 2013). In this phase we adopted part-of-speech (POS) tagging to tokenize the Amazigh toponym structure (Fig. 2).
2.2 Extraction and Classification
One of the most common problems in the extraction process is feature extraction; this feature vector is used to recognize structures and classify them (Silberztein 2015). Through the NooJ functionalities, after constructing the morpho-syntactical rules implemented in the NooJ dictionary and following our mentioned approach, the TAS engine offers a classification of the toponym structures within their historical meaning for places and proper names in the text (Gritta 2020 and Rhazi 2018) (see Fig. 3 and Table 1).
2.3 Recognition
Toponyms derive from a founder, a famous person, or the origin of settlers; some names also derive from features of the physical environment and the historical and religious cultures, which the toponym recognizer tool (TR) uses for recognition in the text by referring to places via semantic relations.
2.4 Morphological Rules
The morphological rules of Amazigh derivation and inflection (Nejm et al. 2013) must be extended to include other rules, such as the case of named entities and geoparsing units, by formalizing, modeling and developing the corpus for evaluation. In the following we present an extract of Amazigh toponyms (Fig. 2).
Table 1. Features of the physical environment, the historical and the religious cultures.
Typology of Amazigh Toponyms: Gronyms, Oronyms, Econyms, Limnonyms, Hebonyms, Others (anthropological references).
Samples of Toponyms: the annotated entry “Targant n ifrd” (the pond (lake) of Argan), decomposed as targant (argan) + n + ifrd, carrying entity tags such as <ORG> and Place in the NooJ annotation output.
Fig. 2. Extracts of Amazigh dictionaries
2.5 Grammar Rules
The grammar rules: Fig. 1 shows the main graph allowing the recognition of the functional relation between ENAMEX (toponym) named entities. Each path of this graph represents a different pattern.
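As a rough illustration of what such a local grammar encodes, the sketch below uses a regular expression to capture the frequent compound-toponym pattern noun + genitive particle "n" + noun (as in "targant n ifrd"). The tiny lexicon and the pattern itself are simplified assumptions for illustration only and do not reproduce the NooJ transducers.

import re

# Assumed mini-lexicon of Amazigh nouns that can head or complete a toponym.
NOUNS = {"targant", "ifrd", "tsawnt"}
noun = "|".join(sorted(NOUNS))

# Compound-toponym pattern: NOUN + "n" + NOUN.
pattern = re.compile(rf"\b({noun})\s+n\s+({noun})\b", re.IGNORECASE)

text = "... targant n ifrd ..."
for match in pattern.finditer(text):
    print("TOPONYM:", match.group(0))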
3 Experimentations and Preliminary Results
To experiment with our proposed method, we used a test corpus from historical Amazigh texts that contains toponym units such as “TARGANT N DDU TSAWNT”, “TABIHIT” and “TALHUSEINT” (see the Electronic Dictionary for Amazigh Proper Nouns ‘EDIcAMPN’). The following sequence shows the local grammar of a candidate toponym (Nejm et al. 2013) (Figs. 4 and 5).
Fig. 3. Graph representation of local grammar “compound noun”.
Fig. 4. Example of lexical analysis of Amazigh text.
Fig. 5. Concordance of the Toponym: “targant n ifëÄ”.
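A concordance like the one in Fig. 5 can be approximated with a simple keyword-in-context extraction. The snippet below is an illustrative sketch (the window size and the placeholder sentence are assumptions) rather than the NooJ concordance engine.

def concordance(text, target, window=30):
    """Return keyword-in-context lines for every occurrence of target."""
    lines = []
    start = 0
    low = text.lower()
    while (pos := low.find(target.lower(), start)) != -1:
        left = text[max(0, pos - window):pos]
        right = text[pos + len(target):pos + len(target) + window]
        lines.append(f"{left:>{window}} [{target}] {right}")
        start = pos + len(target)
    return lines

sample = "placeholder context targant n ifrd placeholder context"  # assumed stand-in for a corpus sentence
for line in concordance(sample, "targant n ifrd"):
    print(line)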
4 Conclusion and Perspectives
In this work in progress, we have described the performance of the NooJ Text Annotation Structure to highlight the recognition of geoparsing structures as historical place names in the Amazigh corpus. This approach shows an ability to model structures through:
• The development of a system supporting the TAS tool and obtaining highly accurate precision, built mainly in the NooJ platform.
• The modeling of various text constraints, mainly the semantic relations produced by different types of toponyms.
• The potential of NooJ as an annotator system that is versatile and easy to use and to optimize.
• The importance of segmentation for each toponym form and the useful TAS process, by including additional morpho-syntactical rules and electronic dictionaries for geoparsing units to model the corpus for evaluation.
• The ability of local grammars and finite-state transducers to detect and locate annotations in the text structure using the NooJ platform.
• The development of new treebanks and a new system for syntactic and pragmatic parsing of Amazigh geoparsing.
References Hammadou, A.B., Piton, O., Fehri, H.: Multilingual Extraction of functional relations between Arabic Named Entities Using NooJ Platform, HAL Id: 00547940 (2010). https://hal.archivesouvertes.gr/hal-00547940 Diab, M.: Second generation tools (AMIRA 2.0): fast and robust tokenization, POS tagging, and base phrase chunking. In: MEDAR 2nd International Conference on Arabic Language Resources and Tools, Cairo, Egypt (April 2009) Nejm, F.Z., Boulaknadel, S.: Morphological analysis of the standard Amazigh using NooJ. In: Proceeding of TALN 2013, vol. 1 (2013) Gritta, M., Pilehvar, M.T., Collier, N.: A pragmatic guide to geoparsing evaluation toponyms, named entity recognition and pragmatics. Lang. Resour. Eval. 54, 683–712 (2020) Laparra, E., Bethard, S.: A dataset and evaluation framework for complex geographical description parsing. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 936–948 (2020) Hammouda, N.G., Haddar, K.: Integration of a segmentation tool for Arabic Corpora in NooJ platform to build an automatic annotation tool. In: Barone, L., Monteleone, M., Silberztein, M. (eds.) Automatic Processing of Natural-Language Electronic Texts with NooJ. NooJ 2016. Communications in Computer and Information Science, vol. 667. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-55002-2_8 Mesfar, S.: Morphological grammar for standard Arabic Tokenization. In: Varadi, T., Kuti, J., Silberztein, M. (eds.) Application of Finite State Language Processing, Selected Papers from the 2008 International NooJ Conference, pp. 108–120. Cambridge Scholars Publishing, Newcastle upon tyne (2010) Silberztein, M.: La formalisation des langues: l’approche de NooJ, Coollection science cognitives et management des connaissances, ISTE édition 2015, London (2015)
NooJ Manual (2003). www.nooj4nlp.net Rhazi, A., Boulaalam, A.: Corpus-based extraction and translation of Arabic Multi-Words Expressions (MWEs). In: Mbarki, S., Mourchid, M., Silberztein, M. (eds.) NooJ 2017. CCIS, vol. 811, pp. 143–155. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73420-0_12 Rupp, C.J., et al.: Customising geoparsing and georeferencing for historical texts. In: Conference Paper (October 2013). https://doi.org/10.1109/BigData.2013.6691671
Agent-Based Merchandise Management and Real-Time Decision Support Systems Mostafa Tagmouti(B) and Aziz Mabrouk RE Information Systems Engineering, Tetouan, Morocco {motagmouti,amabrouk}@uae.ac.ma
Abstract. The objective of our participation in the Advanced Intelligent Systems for Sustainable Development Conference is to present our approach to manage merchandise transport in urban area, because Merchandise transport is one of the key factors of economic and social development, which requires intelligent management based on new information and communication technologies. We conducted this research within the framework of a research subject which concerns “Agent-based merchandise management and real-time decision support systems”. This research subject is based on the convergence of two axes: - Exploiting the potential of distributed artificial intelligence with intelligent multi-agent systems. - The aggregation of supplier and customer databases from an urban area into a decision support system. In this context that this article fits, since it proposes an approach making it possible to model the phenomenon of transport of goods based on agents, which will allow the interrogation of supplier information systems and feed the distributed database of the decision support system. This system through the “Anytime” algorithm will generate the appropriate times to transport the goods, the shortest route to reach the customer and the recommended delivery time to avoid peak traffic hours. Keywords: Urban mobility · Freight transport · Geographic location · Decision support systems · Anytime algorithm · Multi-agent systems
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 560–566, 2023. https://doi.org/10.1007/978-3-031-26384-2_48
1 Introduction
There is currently an increase in road traffic in the world due to the increase in the number of people and the diversity of categories of road users (public, private transport, etc.), which generates many mobility problems such as congestion, pollution and road accidents. Among the categories of transport that have a negative impact on the fluidity of traffic and urban mobility in general, we cite “goods transport”. Freight transport is one of the important links in economic development, and intelligent management of freight transport is one of the manifestations of smart cities. It is in this context that this article fits, since the intelligent management of the transport of goods requires the presence of relevant and useful information in real time for suppliers and customers. This information can only be exploited through a distributed database fed by the information systems of the various suppliers of a specific region (departure time, place of departure, time of arrival, place of arrival, delivery time, etc.) as well as a decision support system that analyzes all this data and plans deliveries of goods at the right time, without disturbing traffic or causing congestion, using the potential of artificial intelligence with intelligent multi-agent systems that are able to provide this information to the decision support system in real time. The decision support system, through “anytime” algorithms, will generate the appropriate schedules for the transport of goods, the shortest way to reach the customer, and the recommended delivery time so as to avoid peak hours in traffic. This paper includes the following sections: an introduction to the subject with the problem and the methodology, a section on the state of the art, then the approach adopted and finally a conclusion.
Problematic: The transport of goods is a persistent necessity, especially in the city. So the question is how to meet this need without blocking traffic or causing congestion and road accidents. Many approaches have been adopted to address this problem, such as dividing urban areas into sub-areas according to their specificities (road traffic, number of customers, etc.) and scheduling deliveries according to these specificities. But the idea that seems irreplaceable to us is the importance of having relevant, useful and real-time information for suppliers and customers; this information can only be exploited through a distributed database fed by the information systems of the various suppliers of a specific region (departure time, place of departure, time of arrival, place of arrival, provisional duration of delivery, etc.), as well as a decision support system that will analyze all this data and plan deliveries of goods at the right time, taking into account traffic fluctuations in the area concerned.
2 Methodology
To manage the transport of goods in an urban area without blocking road traffic or causing congestion and accidents, we propose a system that will generate the appropriate delivery schedules for customers depending on traffic and on the nature and duration of the delivery, based on information provided by suppliers and customers.
State of the art: The treatment of the freight management topic in the literature has evolved over time. Initially, authors focused on organizational and structural solutions; for example, (Lopez 2017) in his thesis proposed two organizational solutions:
– Provide delivery areas dedicated to the loading or unloading of the delivery so as not to disturb traffic with double-lane parking.
– Schedule delivery periods in off-peak hours, identified from information on standard days (model days, to predict traffic conditions).
(Mañuzuri et al. 2005) identified five solutions to improve the freight management process:
1- Public infrastructures, such as Unified Distribution Centres (CDU), allow the pooling of logistics platforms and thus limit the number of loads/unloadings.
2- Management of road use, such as multi-use.
3- Access conditions such as taxes, overnight deliveries, and reservation systems for delivery areas (Fig. 1).
4- Traffic management, such as the information available on real-time traffic status.
5- Incentives and enforcement, such as low greenhouse gas emission vehicles.
Fig. 1. Goods delivery operation
After that, the majority of freight management solutions focused on the impact of delivery on traffic flow. For example, the LWR model (Lighthill and Whitham 1955; Richards 1956) made it possible to reproduce the propagation of congestion due to such a constraint. (Kaldeftiras et al. 2013) focused on the impact of double-lane parking of freight vehicles, which produces a 10–15% increase in downtime and air pollution. In the event that double parking is strictly absent (through penalty or prevention strategies), the average speed can increase by 44% and the time taken to stop can decrease by 33% and 47% respectively. Also, the parking search time for light vehicles and the walking time increase. But at present, and especially during the last decade, research on this subject has focused on the use of new information and communication technologies. For example, (Taniguchi et al. 2019) have proposed a range of technologies that could improve the modelling of city logistics, such as intelligent transport systems, the Internet of Things, big data and artificial intelligence. New technologies such as IoT, AI or big data are essential to establish the logistics of smart cities (Taniguchi et al. 2014) as they allow the development of a freight management platform capable of collecting, storing and analyzing data. Intelligent multi-agent systems promote collaboration and coordination between the different parties interacting in the cargo transport operation; in (Taniguchi et al. 2014), five parties have been mentioned in the modelling of a freight transport network: stakeholders, shippers, cargo, administrators and residents, and this thanks to a multi-level agent-based simulation that will be able to predict the impact of logistics by establishing appropriate structural, organizational and managerial measures, which is one of the recommendations for the smart cities of tomorrow.
3 Proposed Approach
The proposed approach prioritizes the real-time aspect in the processing of requests and also the relevance of the information provided. Since it is based on the information systems of the suppliers, from these systems we can extract the schedules of their interventions and the quantity/type of goods transported, and thus deduce the provisional duration of the operation at the customers' premises as well as the geographical position of the customers to be delivered. Thus, given the complexity of the data involved in this system, modelling based on the multi-agent paradigm is essential in our case; Fig. 2 shows the different interactions between the agents of the system.
Fig. 2. Interactions between system agents
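A minimal sketch of the interactions pictured in Fig. 2 is given below; the class names, message contents and the scoring of delivery slots are illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass

@dataclass
class SupplierAgent:
    name: str
    ready_slots: list          # departure times the supplier can serve

@dataclass
class CustomerAgent:
    name: str
    preferred_slots: list      # time slots when the customer can receive goods

class DepartureDestinationAgent:
    """Intermediary agent coordinating suppliers and customers."""
    def propose_delivery(self, supplier, customer, traffic_load):
        # Keep only slots acceptable to both sides, then prefer the slot
        # with the lowest (assumed) traffic load.
        common = [s for s in supplier.ready_slots if s in customer.preferred_slots]
        if not common:
            return None
        return min(common, key=lambda slot: traffic_load.get(slot, 1.0))

traffic = {"08:00": 0.9, "11:00": 0.4, "15:00": 0.6}   # assumed congestion scores
broker = DepartureDestinationAgent()
slot = broker.propose_delivery(
    SupplierAgent("S1", ["08:00", "11:00"]),
    CustomerAgent("C1", ["11:00", "15:00"]),
    traffic,
)
print("Proposed delivery slot:", slot)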
Supplier Agent: The objective of suppliers is to transmit the goods to the customers concerned at the agreed time. This agent must have a certain autonomy, since it must be able to identify the quantities available and, when a stock shortage is anticipated, announce the next delivery based on a decision support system that takes into account the state of traffic when responding to the request.
Customer Agent: The customer agent must be able to provide the appropriate slots to recover the goods, according to the specificities of the company and also according to the state of the road traffic and its geographical location. This information shall be made available to those responsible for the decision support system.
Departure-Destination Agent: This is the intermediary agent between the Supplier Agent and the Customer Agent, coordinating between customers and suppliers. The collected information is grouped in a decision support system that provides the adequate time for the operation so as not to hinder traffic; since the real-time aspect is essential in this kind of system, we propose the «anytime» algorithm.
The “Anytime” algorithm: An «anytime» algorithm (Fig. 3) exchanges execution time for a higher quality of result. Indeed, in an «anytime» algorithm, the longer the time left to perform a task, the better the quality of the output result will be. Fig. 3 shows the corresponding pseudocode, ANYTIME-TSP(V, iter), which constructs a tour.
… B > G, then map the pixel with the character ‘S’;
If G > R > B, then map the pixel with the character ‘G’;
If G >= B >= R, then map the pixel with the character ‘H’;
If B >= R >= G, then map the pixel with the character ‘B’;
If B >= G >= R, then map the pixel with the character ‘C’;
Else, then map the pixel with the character ‘N’;
This modification is required to make the resulting strings of the same size. To generate the color string for each image, we should resize the images uniformly. The method of bi-cubic interpolation described in [Gonzales92] is used.
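A minimal sketch of this color string coding step is given below. It implements only the mapping rules listed above; the leading rules, which are cut off in this copy, are not reconstructed, and such pixels simply fall through to the default character, so the function is an illustrative assumption rather than the complete published rule set.

def csc_char(r, g, b):
    """Map one RGB pixel to a color-string character (partial rule set, see note above)."""
    if g > r > b:
        return "G"
    if g >= b >= r:
        return "H"
    if b >= r >= g:
        return "B"
    if b >= g >= r:
        return "C"
    return "N"  # default rule; also covers the rules truncated in this copy

def color_string(pixels):
    """Encode a 32x32 image (an iterable of (R, G, B) tuples) as a 1024-character string."""
    return "".join(csc_char(r, g, b) for r, g, b in pixels)

print(color_string([(10, 200, 50), (5, 5, 250), (100, 100, 100)]))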
This modification is required to make the resulting strings of the same size. To generate the color string for each image, we should resize the images uniformly. The method of bi-cubic interpolation described in [Gonzales92] is used.
5
The Proposed Method
In this work, we propose a color-based descriptor using the Octree color quantization method to encode pixels into color strings. The steps of the method are (a sketch of the octree insertion step follows the list):
1. Get the image;
2. Resize the image to 32 × 32 (bi-cubic interpolation) for fast calculation;
3. Insert pixels into the Octree tree with K = 512 and maximum depth 8;
4. Encode the image using color strings;
Fig. 7. The steps followed to extract the OCQD feature.
5.1 Color Strings Comparison
Once an image is represented by a color string, a similarity measure is needed in order to express the similarity of two string features. We compare the strings s1 with s2 and return the matching weight W, then we compute the similarity Sim:

W(s1, s2) = Σ_{i=0}^{N−1} (s1(i) − s2(i))   (10)

Sim(s1, s2) = 1 − W(s1, s2) / N   (11)

where N is the length of the strings, which are of equal size, N = 32 × 32 = 1024. The more similar the images are, the higher the matching weight W and the lower the similarity Sim.
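A minimal sketch of this comparison is shown below. It interprets the matching weight W as a per-position count of matching characters, which is one plausible reading of Eq. (10) as reproduced here; treat that interpretation, and the short sample strings, as assumptions.

def matching_weight(s1, s2):
    """Eq. (10), read here as the number of positions where the two color strings agree."""
    assert len(s1) == len(s2)
    return sum(1 for a, b in zip(s1, s2) if a == b)

def similarity(s1, s2):
    """Eq. (11): Sim = 1 - W/N, so more similar strings give a lower value."""
    n = len(s1)
    return 1 - matching_weight(s1, s2) / n

q = "RRGGBBHH"           # assumed short strings; the paper uses N = 1024
d = "RRGGBCHN"
print(similarity(q, d))  # 1 - 6/8 = 0.25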
6 Experimental Results
The results and discussion of the proposed content-based image retrieval color descriptor are given in this section. The proposed content-based image retrieval system has been implemented using Python (Python 3.8.6), and the performance of the proposed system is analysed using the precision and recall evaluation metrics. For the dataset, we have used the Wang/Corel dataset [Wang01] (Fig. 8).
6.1 Image Dataset
In this work, we use the Wang dataset for our CBIR system's image retrieval. It contains 1000 images of the Corel stock image database which have been manually selected and which form 10 classes of 100 images each.
6.2 Performance Evaluation
In order to evaluate the performance of our descriptor, we use the precision measure. The values are calculated using the following equation:

Precision = RRI / TRI   (12)

where RRI is the retrieved relevant images, and TRI the total retrieved images in the dataset.
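A small sketch of how this precision is computed for one query is given below; the class labels and the top-L ranking are illustrative assumptions matching the Wang dataset layout of 10 classes of 100 images.

def precision_at_l(query_class, retrieved_classes, l):
    """Precision = relevant retrieved / total retrieved, over the top-l results."""
    top = retrieved_classes[:l]
    relevant = sum(1 for c in top if c == query_class)
    return relevant / len(top)

# Assumed ranking of class labels returned for a 'horse' query.
ranking = ["horse"] * 8 + ["elephant", "horse", "mountain", "horse"]
print(precision_at_l("horse", ranking, 10))  # 9/10 = 0.9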
Fig. 8. Sample of each class of Wang dataset.
6.3 Image Retrieval
In order to validate the performance of the proposed method, we perform tests on the Wang dataset, adopting a strategy of measuring the precision of the top L (10, 15, and 20) results for every search query over the 100 similar images of each image class in RGB space; a similar strategy has been used in [Meskaldji09]. We compare our proposed method with the color histogram (CH) [Meskaldji09] to highlight the performance (Table 1).
Table 1. The average precision of the top 25 queries' results for each class with L = 10, 15 and 20.
Class     | L = 10 CH | L = 10 OCQD | L = 15 CH | L = 15 OCQD | L = 20 CH | L = 20 OCQD
Africa    | 0.971 | 0.356 | 0.946 | 0.288 | 0.935 | 0.27
Building  | 0.648 | 0.576 | 0.610 | 0.466 | 0.576 | 0.426
Beach     | 0.412 | 0.644 | 0.368 | 0.578 | 0.338 | 0.54
Bus       | 0.78  | 0.752 | 0.738 | 0.693 | 0.708 | 0.622
Dinosaurs | 1.0   | 1.0   | 0.986 | 1.0   | 0.956 | 1.0
Food      | 0.896 | 0.576 | 0.858 | 0.6   | 0.818 | 0.58
Elephants | 0.52  | 0.768 | 0.48  | 0.704 | 0.431 | 0.664
Flowers   | 0.667 | 1.0   | 0.597 | 1.0   | 0.528 | 1.0
Horses    | 1.0   | 1.0   | 0.962 | 1.0   | 0.941 | 1.0
Mountains | 0.488 | 0.844 | 0.442 | 0.752 | 0.420 | 0.694
Average   | 0.738 | 0.751 | 0.699 | 0.708 | 0.665 | 0.679
Fig. 9. Results of a query of the class horse, with the search query in the upper left corner.
Fig. 10. Results of a query of the class flower, with the search query in the upper left corner.
As we can notice, the precision of the proposed method using OCQ and CSC is better than that of the color histogram for the different numbers of desired results L = 10, 15 and 20. These optimistic results again validate the good precision of the OCQD descriptor (Figs. 9, 10 and 11).
Fig. 11. Precision curve with total retrieved images for Wang database
6.4 Comments and Discussion
Color is the most commonly used visual feature in CBIR. The features used in image representation are a significant factor. In CBIR systems, retrieval effectiveness and feature vector dimension are also critical. As we have seen, good retrieval performance was obtained with CSC coupled to the OCQ algorithm. Because the proposed method is color-based, CQ contributes to image retrieval performance, but CSC is the core of the approach that allows OCQ to treat all images uniformly. The number of colors we fix for the OCQ algorithm can directly influence the retrieval results. We should also note the limitation in computation speed, which must be improved since Octree currently forces us to read the image twice.
7 Conclusion
To summarize this research, we think that applying a color quantization method such as OCQ to extract the color features is, in general, able to characterize the image's content and gives better retrieval precision compared to the color histogram technique. Our future activity will concentrate on improving the precision of similarity search, memory storage and execution time. Also, we will explore other metaheuristics in CBIR to construct more descriptors.
References [Broek05] van den Broek, E.L.: Human-centered content-based image retrieval. Neurosci. Res. Commun., 19 (2005)
[Brucker92] Brucker, P.: On the complexity of clustering problems. In: ACM Transactions on Graphics Optimization and Operations, vol. 11, no. 4, October 1992 and Operations, ACM Transactions on Graphics, Vol. 11, No. 4, October 1992 [Chawki15a] Youness, C., Elasnaoui, K., Ouanan, M., Aksasse, B.: 2-D ESPRIT method and Zernike moments based on CBIR. In: Recent Advances on Systems, Signals, Control, Communications and Computers, pp. 308– 313 (2015) [Chawki15b] Youness, C., Elasnaoui, K., Ouanan, M., Aksasse, B.: New method of content based image retrieval based on 2-D ESPRIT method and the Gabor filters. TELKOMNIKA Indones. J. Electr. Eng. 15, 313–320 (2015). https://doi.org/10.11591/telkomnika.v15i2.8377 [Chawki18] Chawki, Y., El Asnaoui, K., Ouanan, M., Aksasse, B.: Content-based image retrieval using Gabor filters and 2-D ESPRIT method. In: Ezziyyani, M., Bahaj, M., Khoukhi, F. (eds.) AIT2S 2017. LNNS, vol. 25, pp. 95–102. Springer, Cham (2018). https://doi.org/10.1007/978-3319-69137-4 10 [Celebi11] Celebi, M.E.: Improving the performance of k-means for color quantization. Image Vis. Comput. 29(1), 260–271 (2011). https://doi.org/10. 1016/j.imavis.2010.10.002 [Direkoglu11] Direko˘ glu, C., Nixon, M.S.: Shape classification via image-based multiscale description. Pattern Recognit. 44, 2134–2146 (2011) [Dubey21] Dubey, S.R.: A decade survey of content based image retrieval using deep learning. IEEE Trans. Circuits Syst. Video Technol. (2021). ISSN 1558-2205. https://doi.org/10.1109/tcsvt.2021.3080920 [Elasnaoui15] Elasnaoui, K., Youness, C., Aksasse, B., Ouanan, M.: A new color descriptor for content-based image retrieval: application to COIL-100 13, 472–479 (2015) [Elasnaoui16a] Elasnaoui, K., Youness, C., Aksasse, B., Ouanan, M.: Efficient use of texture and color features in content based image retrieval (CBIR). Int. J. Appl. Math. Stat. 54, 54–65 (2016) [Elasnaoui16b] Elasnaoui, K., Youness, C., Aksasse, B., Ouanan, M.: A content based image retrieval approach based on color and shape 29, 37–49 (2016) [Flusser00] Flusser, J.: On the independence of rotation moment invariants. Pattern Recognit. 33, 1405–1410 (2000). https://doi.org/10.1016/S00313203(99)00127-2 [Garry82] Garry, M.R., Johnson, D.S., Witsenhausen, H.S.: The complexity of the generalized Lloyd-Max problem. IEEE Trans. Inf. Theory IT 28, 255–256 (1982) [German18] Valenzuela, G., et al.: Color quantization using coreset sampling. In: IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2096–2101 (2018) [Gervautz88] Gervautz, M., Purgathofer, W.: A simple method for color quantization: octree quantization. In: Magnenat-Thalmann, N., Thalmann, D. (eds.) New Trends in Computer Graphics, pp. 219–231. Springer, Heidelberg (1988). https://doi.org/10.1007/978-3-642-83492-9 20 [Girgis14] Girgis, M., Reda, M.S.: A study of the effect of color quantization schemes for different color spaces on content-based image retrieval. Int. J. Comput. Appl. 96, 1–8 (2014). https://doi.org/10.5120/16843-6699 [Gonzales02] Gonzales, R.C., Woods, R.E.: Digital Image Processing, 2nd edn. Prentice-Hall Inc., New Jersey (2002)
[Gonzales92] Gonzalez, R.C., Woods, R.E.: Digital Image Processing. AddisonWesley Publishing Company (1992) [Haralick73] Haralick, R.M., Shanmugam, K., Dinstein, I.H.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. 6, 610–621 (1973) [Hu62] Hu, M.: Visual pattern recognition by moment invariants. IRE Trans. Inf. Theory 8, 179–187 (1962) [Hua18] Hua, J.-Z., Liu, G.-H., Song, S.-X.: Content-based image retrieval using color volume histograms. Int. J. Pattern Recognit. Artif. Intell. 33 (2018). https://doi.org/10.1142/S021800141940010X [Huang97] Huang, J., Kumar, S.R., Mitra, M., Zhu, W.-J., Zabih, R.: Image indexing using color correlograms. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 762–768 (1997). https://doi.org/10.1109/CVPR.1997.609412 [Jenni15] Jenni, K., Mandala, S., Sunar, M.S.: Content based image retrieval using colour strings comparison. Procedia Comput. Sci. 50, 374–379 (2015) [Lin09] Lin, C., Su, C.-H.: Using color strings comparison for video frames retrieval. In: International Conference on Information and Multimedia Technology, pp. 211–215. IEEE (2009) [Liu18] Liu, L., Chen, J., Fieguth, P., Zhao, G., Chellappa, R., Pietik¨ainen, M.: From BoW to CNN: two decades of texture representation for texture classification. Int. J. Comput. Vis. 127(1), 74–109 (2018). https://doi. org/10.1007/s11263-018-1125-z [Ma11] Ma, Z., Zhang, G., Yan, L.: Shape feature descriptor using modified Zernike moments. Pattern Anal. Appl. 14, 9–22 (2011). https://doi. org/10.1007/s10044-009-0171-0 [Machhour20] Machhour, N., M’barek, N.: Content based image retrieval based on color string coding and genetic algorithm, 1–5 (2020). https://doi.org/ 10.1109/IRASET48871.2020.9091984 [Megiddo84] Megiddo, N., Supowit, K.J.: On the complexity of some common geometric location problems. SIAM J. Comput. 13, 182–196 (1984) [Meskaldji09] Meskaldji, K., Boucherkha, S., Chikhi, S.: Color quantization and its impact on color histogram based image retrieval (2009). https://doi. org/10.1109/NDT.2009.5272135 [Ojala02] Ojala, T., Pietik¨ ainen, M., M¨ aenp¨ aa ¨, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24, 971–987 (2002) [Ouhda18] Ouhda, M., Elasnaoui, K., Ouanan, M., Aksasse, B.: A content based image retrieval method based on k-means clustering technique. J. Electron. Commer. Organ. 16, 82–96 (2018). https://doi.org/10.4018/ JECO.2018010107 [Ouhda19] Ouhda, M., Elasnaoui, K., Ouanan, M., Aksasse, B.: Content-based image retrieval using convolutional neural networks (2019). https:// doi.org/10.1007/978-3-319-91337-7 41 [Wang01] Wang, J.Z., Li, J., Wiederhold, G.: Simplicity: semantics-sensitive integrated matching for picture libraries. IEEE Trans. Pattern Anal. Mach. Intell. 23(9), 947–963. http://wang.ist.psu.edu/docs/related/ [Wu92] Wu, X.: Color quantization by dynamic programming and principal analysis, University of Western Ontario (1992)
Release Planning Process Model in Agile Global Software Development Hajar Lamsellak(B) , Amal Khalil , Mohammed Ghaouth Belkasmi , Oussama Lamsellak , and Mohammed Saber SmartICT Lab, Universit´e Mohammed Premier/Mohammed First University Oujda, ENSA Oujda, 60000 Oujda, Morocco [email protected]
Abstract. Global Software Development (GSD) refers to the practice of geographically distributing software development teams with intended goals including reducing labor costs, increasing software development capacity, and achieving productivity over 24 h. The Software Release Planning process addresses decisions associated with selecting and assigning features to create a sequence of consecutive product releases that fulfill significant technical, resource, budget, and risk constraints. In the literature, numerous release planning processes have been proposed for distributed settings; however, release planning processes suited to the Global Software Development context that support the agile way of working and planning remain rare. In this paper, we present an Agile Release Planning Process Model for GSD projects to assist the agile release planning process adopted in distributed environments. The proposed model contains two main activities: the High-Level Design and Architecture and the Feature Development Process; two repositories: the Design and Knowledge repositories; and some supporting practices.
Keywords: Global Software Development · Agile · Project Planning · Release Planning
1 Introduction
Global projects enable the unification of highly expert team members dealing with the same project without necessitating the relocation of teams to other countries, leading to lower costs [4]. Global Software Development (GSD) is a way of producing software where stakeholders from remote locations and possibly different backgrounds are involved in the software development life cycle [3]. In large organizations, there are multiple levels of planning which are conducted on different time horizons and by different actors [7]. In Agile planning, we adopt a three-level planning model comprising pre-planning, release planning, and iteration planning phases [8].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 756–762, 2023. https://doi.org/10.1007/978-3-031-26384-2_66
Our previous work [6] highlighted the first level of Agile planning, focusing on the pre-planning phase by describing its application within Agile Global Software Development projects, whereas in this paper we focus on the release planning phase in this context. Software Release Planning is the problem of finding the most suitable combination of features to implement in a sequence of releases [10]. It is a crucial step in software development, which happens to be extremely complex given the necessity to reconcile multiple decision-making criteria, especially business value, effort, and cost [2]. Successful release planning is a significant success factor in agile software development projects and also a challenging aspect of agile adoption in market-driven product development [7]. In the literature, proposed release planning processes suited to the Global Software Development context that support the Agile way of working and planning remain rare. This work describes the application of the release planning process within the Agile global software development context. We present in detail the different activities included, as well as the repositories used. Within the remaining sections of this paper:
– We discuss the background and related work in Sect. 2.
– We present the release planning phase for Agile GSD projects in Sect. 3.
– In Sect. 4, we discuss the usefulness of the proposed release planning model.
– Finally, we conclude the paper with a number of our future works.
2 Background and Related Work
Incremental software development provides customers with parts of a system early, so they receive both a sense of value and the opportunity to provide feedback early in the process [10]. Therefore, achieving higher flexibility and better satisfaction of the customer [5]. In adopting this process, requirements are delivered in releases and so the decision to make on which requirements should be delivered in which release [5]. Software Release Planning (SPR) addresses decisions associated with selecting and assigning features to create a sequence of consecutive product releases that satisfies important technical, resource, budget, and risk constraints [10]. SRP is the problem of choosing which features or requirements will be included in the next release or releases. It consists of finding the most effective combination of features to implement during a sequence of releases [2]. It seeks to maximize business value and stakeholder satisfaction. It is an important step in software development, which happens to be extremely complex given the need to reconcile multiple decision-making criteria such as business value, effort, and cost while considering several constraints including feature precedence and resource availability [2]. During Software Release Planning, all functionalities contained within the product backlog are broken into a variety of iterations, then iterations are assigned to releases. Each release delivers a working product increment [8]. The dates for release milestones, moreover as for the last product release, will be
identified. The general project cost is often determined from the labor cost of the Agile team and also the number of identified iterations [8]. In Global Software Development, Software Release Planning starts with a high-level design and architecture; an expectation of the time of completion; the decision of the number of releases and the objectives of each release, and early identification of the possible dependencies to facilitate effective subsystems and task allocation, so that the distributed team members can work independently but collaboratively [9]. Numerous are the GSD release planning processes proposed within the literature, and this work [2] reviewed most of them. However, release planning in the Global Software Development context that supports the agile way of working remains rare. This paper presents a process model for release planning in GSD projects based on high-level design and architecture, feature development, release management, and design and knowledge repositories, in addition to several supporting activities.
3 Release Planning Phase in GSD
Software Release Planning is the problem of finding the most effective combination of features to implement in a sequence of releases [10]. It aims to maximize both business value and stakeholders' satisfaction without ignoring the constraints enforced by the availability of suitable resources and the presence of dependencies between features, among other constraints [10]. Several factors make Software Release Planning a computationally complicated problem: the number of features and their interdependencies, the number of stakeholders involved, their different levels of priority and their conflicting interests, the variety of variables to be considered (especially business value, effort, and cost), and the uncertainty and incompleteness of the available information [1]. Every release delivers a working product increment [8]. The output of release planning is a release plan, based on which the dates for the release milestones, as well as for the final product release, can be identified [8]. During release planning within collocated agile teams, the project manager, product owner, project team, and other stakeholders break the functionality in the product backlog into several iterations, ensuring that each one can be completed within the duration of an iteration, and then assign iterations to releases [8]. As for Agile Global Software Development comprising distributed teams, the release planning process should be adapted to account for the different distances imposed by this type of development: geographical, temporal, socio-cultural, and organizational distances, which otherwise prevent distributed teams from applying agile practices. In this paper, we designed the release planning framework shown in Fig. 1. It has two components: Activities and Repositories. In the following subsections, we describe the application of this phase within the GSD context, and we present the different activities included as well as all the repositories used.
Fig. 1. Release Planning in GSD framework
3.1 Release Planning Activities in GSD
• High-Level Design and Architecture The design of the software architecture and associated task dependencies play a significant role in reducing several GSD challenges, especially communication and coordination [11]. These problems are highlighted when there is a necessity to transfer knowledge between sites, particularly when software artifacts are assigned to different sites depending on each other [11]. In GSD, the most challenges in software architecture design derive from the distributed nature of this development type, particularly the organization and teams [11]. These challenges include considering organization structure within the design, finding the problems that affect how distributed teams can best work, managing awareness in distributed teams, including awareness of architecting practices and guidelines, and defining clear ways of working, including quality and management practices is especially important to make sure that responsibilities regarding these issues are clear and are handled with due diligence [11]. The found recommended practices include the need for a well-thought-out work distribution that mirrors the structure of the product and the structure of the organization. Work should be broken into manageable pieces. Moreover, having clearly defined design practices and interfaces will enhance the loose coupling and support the aims of work distribution. The need for communicating the architecture across different sites should be recognized, and different views used as needed. The distributed nature of the organization should be considered when assigning architects and organizing the design work [11].
In the case of first-time collaboration, the full project team may be collocated for discussing architecture. Short demo cum training sessions will be organized to brief team members about the process, tools, techniques, and terminologies to be utilized in the project to avoid hesitation, confusion, and misunderstanding [9]. This collocation enables architectural knowledge dissemination and an understanding of the rationale of the architecture. If the collocation of the whole team isn’t possible because of budget constraints, all Teams Product Owners (TPOs) can travel to meet and the rest of the team is collaborating virtually via tools [9]. After this collocated design phase, the architecture team should participate in daily scrum meetings whenever needed to resolve confusion or convey architectural modifications to distributed members. The architecture team should also resolve the fulfillment of architecture rules. Design and coding standards are stored in a globally accessible Design repository and they should be understood and followed by all team members throughout the project [9]. • Feature Development Process The feature development process is a continuous, iterative, and incremental process. During it, the feature decisions are made. The feature idea is refined and evaluated, a technical specialist is responsible for writing a brief single presentation slide to present an initial idea of what the feature is and how much it would cost to implement. When this is ready for each feature, a meeting is organized between team members to decide on features to take into further refinement. Then those selected features are added to the product backlog [1]. The chief product owner prioritizes each feature against the other features in the product backlog. If he decides that the priority is low and there are no available resources, the feature has to wait until resources become available [1]. The chief product owner selects a development team or teams and a product owner. A virtual team consisting of members from the team and the product owner. They are supported by a technical specialist, and the team begins by depicting information required to decide to develop the feature or not [1]. To keep the single presentation slide accessible by all team members and also by the newcomers to the project they should be maintained in the knowledge repository. Additionally, the rationale behind choosing feasible features to add to the release backlog should be clear, expressed, and kept in the knowledge repository. Release Management Process. The release project management process is completed once for each version of the software. Two simultaneous releases are usually under development at the same time. One release is more focused on new functionality and the other is more focused on maintenance, but both contain new functionality and updates [1]. The planning of a new version release officially begins with a meeting in which the key inputs are financial information and product roadmap [1].
Based on these inputs, and on information from previous and ongoing release projects, a tentative release budget and release scope are decided for the release. Then, a decision is made on which features should be included in the next release. This decision is based on which features have passed. Features that have not yet passed can also be included if the development progress information implies that the feature will be ready in time for the release [1].
Supporting Practices
Team Collocation. To support team members in learning the different cultures, processes, and technologies involved, collocation of the entire project team is possible for discussing architecture, as is organizing short demo training sessions to brief team members about the method, tools, techniques, and terminologies to be employed in the project to avoid hesitation, confusion, and misunderstanding [9]. A record of these meetings will be kept within the knowledge repository so that it can be referred to afterward. In case the collocation of the whole project team is not possible due to budget constraints, all Team Product Owners can be collocated by traveling to the collaboration site, with the rest of the team members collaborating virtually via tools [9].
Informal Team Building. Organizing informal team-building activities during this collocation can increase team acquaintance and cultural awareness, thus helping communication, coordination, and collaboration [9].
3.2 Release Planning Repositories
During the Release Planning phase, the following two repositories are used:
Design Repository. A design repository contains the design and architecture of the product. The architecture team can store concise and prioritized design decisions and their rationale in this repository [9].
Knowledge Repository. Release planning artifacts should be stored in this repository so that they remain accessible; decisions related to feature selection and feature priority should be stored here as well [9].
4 Discussion
In this paper, we described a release planning process model for Agile Global Software Development. Based on the application of agile practices on distributed projects, the model is composed of high-level design and architecture, feature development process, and release project management with interactions with repositories in addition to some supporting activities. The included practices almost force distributed team members to communicate and lead to successful distributed collaboration. We hope that the process model in this paper is useful for project managers to assist them during the release planning phase in an agile distributed setting.
5 Conclusion
Software release planning (SRP) consists of selecting which features or requirements will be included in the next release or releases. It is a crucial step in software development, which happens to be extremely complex given the need to reconcile multiple decision-making criteria including business value, effort, and cost, while considering several constraints comprising feature precedencies, and resource availability. Numerous are the GSD release planning processes proposed within the literature. However, release planning processes in the Global Software Development context including the agile way of working are rare. The objective of this paper is to present an agile process model for release planning in GSD projects composed of a high-level design and architecture, a feature development process, and a release management process, in addition to some supporting activities.
References 1. Ruhe, G.: Product Release Planning: Methods. Tools and Applications. CRC Press, Boca Raton (2010) 2. Franch, X., Rufian, G., Ameller, D., Farr´e, C.: A survey on software release planning models. In: Product Focused Software Process Improvement (2016) 3. Chadli, S.Y., et al.: Identifying risks of software project management in global software development: an integrative framework. In: 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA) (2016) 4. Belkasmi, M.G., et al.: Global it project management: an agile planning assistance. In: Advances in Smart Technologies Applications and Case Studies (2020) 5. Greer, D., Ruhe, G: Software release planning: an evolutionary and iterative approach. Inf. Softw. Technol. (2004) 6. Lamsellak, H., Metthahri, H., Belkasmi, M.G., Saber, M.: Pre-planning process model in agile global software development. In: Ben Ahmed, M., Boudhir, A.A., Karas, I.R., Jain, V., Mellouli, S. (eds.) SCA 2021. LNNS, vol. 393, pp. 393–400. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-94191-8 31 7. Heikkil¨ a, V.T., Paasivaara, M., Lassenius, C., Engblom, C.: Continuous release planning in a large-scale scrum development organization at Ericsson. In: Baumeister, H., Weber, B. (eds.) XP 2013. LNBI, vol. 149, pp. 195–209. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38314-4 14 8. Udo, N., Koppensteiner, S.: An agile guide to the planning processes. In: R Congress 2009 (2009) PMIGlobal 9. Suman, U., Jain, R.: An adaptive agile process model for global software development. Int. J. Comput. Sci. Eng. (IJCSE) (2017) 10. Ruhe,G., Saliu, M.O.: The art and science of software release planning. IEEE Softw., 47–53 (2005) 11. Sievi-Korte, O., Beecham, S., Richardson, I.: Challenges and recommended practices for software architecting in global software development. Inf. Softw. Technol., 234–253 (2019)
Developing a New Indicator Model to Trade Gold Market Oumaina Nadi(B) , Karim Elbouchti, Oussama Mohamed Reda, Chaimae Ahout, and Soumia Ziti Intelligent Processing and Security of Systems, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco [email protected]
Abstract. Investing in the financial markets demands the ability to change perspectives over time. Gold has many benefits for investors: its intrinsic value makes it ideal for moving wealth internationally and investing long term, and although the value of gold goes up and down with the market, it is never worthless. That is why traders must have a verifiable, quantifiable, consistent, and objective strategy in order to be successful gold traders. A strategy helps the trader to make a trading decision (buying or selling) based on predefined rules and criteria in the gold market. It frequently uses technical indicators with the objective of identifying trading opportunities. In this article, we propose a detailed indicator model for a strategy that predicts the gold price and arrives at a purchase or sale decision using information given by two main indicators: fractals and Triangular Moving Average Centered Bands. This strategy has provided us with good results: the total number of winning trades is higher than the number of losing trades. In the performed research we used the MQL4 language to develop the indicators and technical data from a candlestick chart using MetaTrader 4 for XAU/USD. Keywords: Trading gold · Trading strategies · Technical indicators in the finance market
1 Introduction The gold market is one of the markets that offer high liquidity and excellent opportunities to profit in nearly all environments due to its unique position within the world’s economic and political systems. It refers to the buying and selling of gold worldwide, this yellow metal has been used as a trading commodity for millennia and even today its intrinsic value makes it ideal for moving wealth internationally and investing long term. It comprises a broad range of participants that includes physical players (producers, refiners, fabricators, and end-users…), and financial intermediaries (banks…). All these participants either seek to trade physical gold, gain exposure to the gold price, or transfer price risk. In addition, gold trading performs an important function in facilitating price discovery and provides an important function in offering to finance, providing trading liquidity, and offering broader services. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 763–769, 2023. https://doi.org/10.1007/978-3-031-26384-2_67
Meanwhile, being a successful trader in the gold market requires having a strong and objective strategy. A wide range of strategies has been developed in past years; these strategies are divided into many types, but they are based largely on either technical or fundamental analysis. Normally, a trading strategy is a plan that employs analysis to identify specific market conditions and price levels; it specifies at which price points you will enter. Strategies frequently use technical indicators in an objective manner to determine entry, exit, and/or trade management rules. A strategy specifies the exact conditions under which trades are established (called setups) as well as when positions are adjusted and closed. In this way, indicators can be used to generate buy and sell signals or indicate trends or patterns in the market. Many indicators and strategies have already been developed to help traders become successful and achieve regular profits in every market. Some strategies are developed based on fundamental analysis, while some are based on technical analysis [1]. Watthana Pongsena, Prakaidoy Ditsayabut, Panida Panichkul, Nittaya Kerdprasop, and Kittisak Kerdprasop proposed in their article [1] a novel Expert Advisor (EA) that trades automatically following a trading strategy; this strategy is a combination of technical analysis, including the ancient but powerful Japanese candlestick patterns, and modern technical indicators. Beyond what was already defined, our new indicator model is developed based on using technical indicators [2–4] to focus on market strikes and their movements; we are going to combine multiple indicators to predict the gold price.
2 Materials and Methods

2.1 Technical Analysis

Technical analysis is the study of past market action to forecast future prices. It is based on the following three premises [5]:
• Market action discounts everything: any factor that can affect prices is already reflected in the price.
• Prices move in trends: the purpose of technical analysis is to detect a price trend in the early phases of its development.
• History repeats itself: technical analysis uses patterns that have shown success in the past and assumes they will work in the future.
Unlike fundamental analysts, who attempt to evaluate a security's intrinsic value based on financial or economic data, technical analysts focus on patterns of price movements, trading signals, and various other analytical charting tools to evaluate a security's strength or weakness.

2.2 Technical Analysis Tools

There are several dozen technical analysis tools, including a range of indicators and chart patterns, that must be learned if anyone plans to trade based on technical analysis. In this research, we build our model based on chart patterns using a Japanese candlestick chart along with two other technical indicators.
Japanese Candlestick Patterns. Chart analysis is one of the main approaches used in technical analysis, it involves the detection of patterns in price charts. Mainly there are 4 main types of charts used by traders: Bar chart, line chart, candlestick chart, and point and figure chart. In this article, we are going to see just the Japanese candlestick chart. A Japanese candle is the most used representation by traders, it can show four price points (open, close, high, and low) as shown in the figure throughout a certain period of time, the trader specifies if he wants to obtain information of a month, a day, a week, 1 h, 5 min (Fig. 1).
Fig. 1. Japanese Candlestick (Source)
In a candlestick chart, there are two parts: the thin line (shadow) represents the price range between high and low whereas, the wider portion (real body) represents the price range between open and close. If the close is higher than the open, the real body is white (green) which shows a price increase. If the close is lower than the open, the real body is black (red) which shows a price decrease. Technical Indicators. The use of technical analysis indicators for trading is widely known and discussed [6], it provides opportunities for creating profitable and successful trading strategies for traders working with technical analysis. Technical indicators consist of mathematical formula(s) which are applied to price time series data to produce another time-series data [5] that help the trader to make a buying or selling decision. In this paper, we have employed two technical indicators in our trading strategy, which are the Triangular Moving Average Centered Bands (TMA) and fractals. Triangular Moving Average Centered Bands. TMA Centered Bands is an indicator that tells us the overall direction of the trend by plotting bands. It draws three bands on the
chart, forming a triangular band: a top band, a lower band, and a middle band. The TMA can be calculated with the use of the following formula:

TMA = (SMA1 + SMA2 + SMA3 + SMA4 + ... + SMAN) / N    (1)
Figure 2 below shows the generated TMA Centered Bands on a chart.
Fig. 2. TMA Centered Bands on a chart (Source: Metatrader 4 software)
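As an illustration, here is a minimal Python sketch of Eq. (1) and of one way centered bands around the TMA could be derived. The band-offset rule (a multiple of the smoothed absolute deviation) is an assumption, since the paper does not give the offset formula, and the actual indicator was developed in MQL4.

```python
import numpy as np

def sma(values, period):
    """Simple moving average over a sliding window (NaN-padded at the start)."""
    values = np.asarray(values, dtype=float)
    out = np.full(len(values), np.nan)
    for i in range(period - 1, len(values)):
        out[i] = np.mean(values[i - period + 1:i + 1])
    return out

def tma(values, period):
    """Triangular moving average: the SMA of an SMA, as in Eq. (1)."""
    return sma(sma(values, period), period)

def tma_centered_bands(close, period=20, band_width=2.0):
    """Middle band = TMA of the close; upper/lower bands offset by a fixed
    multiple of the smoothed absolute deviation (assumed offset rule)."""
    middle = tma(close, period)
    dev = sma(np.abs(np.asarray(close, dtype=float) - middle), period)
    upper = middle + band_width * dev
    lower = middle - band_width * dev
    return lower, middle, upper

if __name__ == "__main__":
    prices = np.cumsum(np.random.randn(200)) + 1800.0  # synthetic XAU/USD-like series
    low_b, mid_b, up_b = tma_centered_bands(prices)
    print(low_b[-1], mid_b[-1], up_b[-1])
```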
Fractals. A fractal is a basic repeating pattern that isolates potential turning points on a price chart. The indicator draws arrows to indicate the existence of a pattern. An up arrow marks the location of a bearish fractal, which means the price could move lower, while a down arrow marks the location of a bullish fractal, which means the price could move higher. Figure 3 below shows the generated fractals on a chart.
Fig. 3. Fractals on XAU/USD chart (Source: Metatrader 4 software)
2.3 Trading System Model

In this section, we describe the proposed indicator model. Our indicator model takes as inputs the price of the currency pair (XAU/USD) along with the values provided by both indicators: the TMA Centered Bands and the fractals. After retrieving these three values, the algorithm provides as output either a buy signal or a sell signal, as shown in the figure below (Fig. 4).
Fig. 4. Indicator model
The buy signal is marked by a red up arrow under a bearish candlestick, while the sell signal is marked by a green down arrow above a bullish candlestick. The chart in Fig. 5 shows the generated indicator on the hourly values of XAU/USD.
Fig. 5. Our indicator model on XAU/USD chart (Source Metatrader 4 software)
2.4 Trading System Strategy

The indicator is made to provide trading decisions that help the trader to open either a new buy position or a new sell position according to the following strategy.
On the one hand, we can have a buy decision when:
• The price is below the lower_band.
• The candlestick before the current one must achieve the following formula: Open − Close < 0
• The candlestick before the current one must respect the following formula: Close < lower_band and Open <= lower_band
• The fractals represented by the arrows below the candlesticks (bullish fractals) must satisfy: fractal_down > 0
On the other hand, we can have a sell decision when:
• The price is above the top_band.
• The candlestick before the current one must achieve the following formula: Open − Close > 0
• The candlestick before the current one must respect the following formula: Close > top_band and Open >= top_band
• The down fractals represented by the downwards pointing arrows must be: fractal_up > 0
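The decision rule above was implemented in MQL4; purely as an illustration, a Python sketch of the same logic is shown below. The buffer names (fractal_up, fractal_down), the use of the candle at index i-1 as "the candlestick before the current one", and the exact form of the buy-side fractal condition are assumptions, not the paper's identifiers.

```python
def gold_signal(i, open_, close, lower_band, top_band, fractal_up, fractal_down):
    """Return 'buy', 'sell' or None for candlestick i, following the rules above.
    open_/close are price arrays; lower_band/top_band come from the TMA bands;
    fractal_up/fractal_down are assumed fractal buffers (0 when no fractal)."""
    prev = i - 1  # the candlestick before the current one

    # Buy: price under the lower band, previous candle closing above its open
    # and under the band, with a fractal confirming the turning point.
    if (close[i] < lower_band[i]
            and open_[prev] - close[prev] < 0
            and close[prev] < lower_band[prev] and open_[prev] <= lower_band[prev]
            and fractal_down[prev] > 0):
        return "buy"

    # Sell: mirror conditions against the top band.
    if (close[i] > top_band[i]
            and open_[prev] - close[prev] > 0
            and close[prev] > top_band[prev] and open_[prev] >= top_band[prev]
            and fractal_up[prev] > 0):
        return "sell"

    return None
```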
3 Results and Discussion

The experimental results are obtained using the data set of the currency pair (XAU/USD). This data set covers the period May–June and was obtained from MetaTrader 5 software. The number of trades is summarized in the following table (Table 1).
Table 1. Results.
After applying the trading decisions, we obtained the following profits (Table 2):
Table 2. Results.
4 Conclusion and Future Work

To conclude, our new indicator model is developed based on using technical indicators to predict the gold price; it is a combination of two technical indicators: fractals and TMA Centered Bands. The results presented in this article confirm the power of this indicator model. However, we could improve the performance of this model by developing a trading robot based on this new indicator. Basically, this robot will trade automatically: it can analyze the data and trade by following a given trading instruction. In other words, this robot will help us avoid several problems, such as the emotional problems faced by most traders.
References 1. Pongsena, W., Ditsayabut, P., Panichkul, P., Kerdprasop, N., Kerdprasop, K.: Developing a forex expert advisor based on Japanese candlestick patterns and technical trading strategies. Int. J. Trade Econ. Finance 9(6), 2022 (2018) 2. AbuHamad, M., Mohd, M., Salim, J.: Event-driven business intelligence approach for real-time integration of technical and fundamental analysis in forex market. J. Comput. Sci. 9(4), 488 (2013) 3. Yazdi, S.H.M., Lashkari, Z.H.: Technical analysis of forex by MACD indicator. Int. J. Hum. Manag. Sci. 1(2), 2320–4044 (2022) 4. Krishnan, R., Menon, S.S.: Impact of currency Pairs, time frames and technical indicators on trading profit in forex spot market. Int. J. Bus. Insights Transform. 2(2) (2009) 5. Ozturka, M., Hakki Toroslua, I., Fidan, G.: Heuristic based trading system on Forex data using technical indicator rules. Appl. Soft Comput. J. (2016). https://doi.org/10.1016/j.asoc.2016. 01.048. p. 4 (2022) 6. Vajda, V.: Could a trader using only “old” technical indicator be successful at the Forex market? Emerg. Mark. Queries Finance Bus. (2022). https://doi.org/10.1016/S2212-5671(14)005152,pp.318-319
A New Model Indicator to Trade Foreign Exchange Market Chaimae Ahouat1(B) , Karim El Bouchti1 , Oussama Mohamed Reda2 , Oumaima Nadi2 , and Soumia Ziti2 1 Intelligent Processing and Security of Systems Research Team (IPSS), Rabat, Morocco
[email protected]
2 Algorithms, Networks, Intelligent Systems, and Software Engineering Research Team, Rabat,
Morocco {o.reda,s.ziti}@um5r.ac.ma
Abstract. The Foreign Exchange (FOREX) market is a financial market with a daily volume of around $6.6 trillion. Given the importance of Forex as the largest market in the world, traders need to have a trading plan in order to make profits and be successful. A forex trading strategy defines a system that a forex trader uses to determine when to buy or sell a currency pair. There are various forex strategies that traders can use, including technical analysis or fundamental analysis. A good forex trading strategy allows a trader to analyze the market and confidently execute trades with sound risk management techniques. In this paper, we propose a Foreign Exchange strategy for currency markets based on a combination of two technical indicators, the Relative Strength Index and the Triangular Moving Average Bands, as well as the price value. The combination of these indicators is governed by specific rules for buying and selling positions. The proposed model provides a good number of sell and buy signals depending on the model inputs. We have used MetaTrader 4 to implement our model, which makes it easy to automate rules and indicator combinations. In addition, this lets traders backtest trading strategies to see how they would have performed in the past. The experimental results show that the proposed model provides interesting results. Keywords: Forex · Trading Strategies · Technical Analysts · Technical Indicators
1 Introduction

The Foreign Exchange (FX in short) market is the biggest financial market in the world, with daily transactions exceeding $6.6 trillion [1]. In FX, currencies are exchanged simultaneously between two parties. The participants in FX are widespread, including banks, corporations, broker-dealers, individuals, etc. EUR/USD is the most traded currency pair in the FX market. Many practitioners are closely interested in price forecasting in FX. A forecast represents an expectation about a future value or values of a variable [2]. The expectation is constructed using an information set selected by the forecaster. Based on the information © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 770–777, 2023. https://doi.org/10.1007/978-3-031-26384-2_68
set used by the forecaster, there are two pure approaches to forecasting foreign exchange rates: the fundamental approach and the technical approach. The fundamental approach [3–5] deals with macroeconomic factors to explain and forecast changes in price. These fundamental economic variables are taken from economic models [6]; usually included variables are GNP, consumption, trade balance, inflation rates, interest rates, unemployment, productivity indexes, etc. In general, the fundamental forecast is based on structural (equilibrium) models. The technical approach [6–10] aims to forecast price changes using historical market data [11]. Technical analysis looks for the repetition of specific price patterns [12]; it is an art, not a science. Computer models attempt to detect both major trends and critical, or turning, points, and these turning points are used to generate trading signals: buy or sell signals. On the other hand, an indicator is not a trading strategy. While an indicator can help traders identify market conditions, indicators are programs that allow you to create variations of existing indicators, or completely new ones, based on whatever set of rules you wish, and to display them on the charts or below them in the data windows. Indicators cover such things as moving averages, histograms, plotting text above/below bars, and changing bar colors based on the coded rules; they are adjusted every time a new tick arrives but, depending on the nature of the data being processed, may need a certain number of bars to have been posted. A strategy, in contrast, is a trader's rule book, and traders often use multiple indicators to form a trading strategy. Strategies are programs designed to automate the trading process by identifying where to enter trades, how many lots to buy/sell, where to place stops/targets, and how to trail any position to maximize potential profit; they generally update every time a new tick arrives. In this research, we aim to develop a novel technical indicator based on the combination of two technical analyses, the ancient but powerful Japanese candlestick patterns and modern technical indicators together with the price value; the indicators used are the Relative Strength Index (RSI) [12] and the Triangular Moving Average Bands. This indicator has provided us with great results using MetaTrader 4 [13]: the total number of winning trades is higher than the number of losing trades, and it offers an important winning rate.
2 Materials and Methods

In this research, we have developed our indicator on a chart using Japanese candlestick patterns.

2.1 Trading Strategy Using Japanese Candlestick Patterns

Put simply, candlesticks are a way of communicating information about how price is moving. The candlestick is not the only representation: you also have the line chart, bar chart, and Renko charts. Candlesticks are just one type of method that you can use. It is very popular because it reveals quite a bit of information on the chart. The objective of the candlestick pattern technician is to identify the underlying psychology within the pattern. This will include the highs, lows, and the opens and
closes, especially relative to the previous candlesticks. You see, when a candlestick attempts new highs and fails to close at those highs you can get some clues as to who’s in charge at the moment, the bulls or the bears. There are single candlestick clues, as well as 2, 3 and even 4 candlestick patterns that will reveal a lot about who’s in charge (Fig. 1).
Fig. 1. Schema of Japanese candlestick
Combine these patterns with some specific indicators and you are well on your way to identifying short-term market tops and bottoms and some very specific entry and exit points.

2.2 Technical Indicators

Charts always have a story to tell. However, from time to time those charts may be speaking a language you do not understand, and you may need some help from an interpreter. Technical indicators are the interpreters of the Forex market. They look at price information and translate it into simple, easy-to-read signals that can help you determine when to buy and when to sell a currency pair. Technical indicators are based on mathematical equations that produce a value that is then plotted on your chart. In this paper, we have employed two technical indicators in our trading strategy, which are the Relative Strength Index (RSI) and the Triangular Moving Average Centered Bands (TMA).

2.3 Relative Strength Index

The Relative Strength Index (RSI) is an oscillator that can be used for measuring the speed and change of price movements. The value of the RSI can oscillate between 0 and 100. In Forex, for example, an RSI value above 70 indicates that the currency pair is
overbought (sell or short signal), while the currency pair is oversold when the RSI value is below 30 (buy or long signal). The RSI can be calculated with the use of the following formula at instant T:

RSI(n) = (RS(n) / (1 + RS(n))) · 100

where (Fig. 2):

RS(n) = (the average increase for n periods) / (the average fall for n periods)
Fig. 2. Relative Strength Index indicator (RSI)
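For illustration, a short Python sketch of the RSI formula above, using a plain average of the last n gains and falls; Wilder's smoothed variant used by many platforms would differ slightly.

```python
import numpy as np

def rsi(close, n=14):
    """Relative Strength Index of the last n price changes.
    RS(n) = average gain over n periods / average loss over n periods,
    RSI(n) = 100 * RS(n) / (1 + RS(n))  -- equivalent to 100 - 100/(1+RS)."""
    close = np.asarray(close, dtype=float)
    deltas = np.diff(close[-(n + 1):])          # the last n price changes
    gains = deltas[deltas > 0].sum() / n        # average increase
    losses = -deltas[deltas < 0].sum() / n      # average fall (positive number)
    if losses == 0:                             # no falls at all: fully overbought
        return 100.0
    rs = gains / losses
    return 100.0 * rs / (1.0 + rs)

# Example: an RSI above 70 is read as overbought (sell signal),
# below 30 as oversold (buy signal).
print(rsi([1.10, 1.11, 1.12, 1.13, 1.12, 1.14, 1.15, 1.16, 1.15,
           1.17, 1.18, 1.19, 1.20, 1.21, 1.22], n=14))
```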
2.4 Triangular Moving Average Centered Bands

The triangular moving average (TMA) shows the average price of an asset over a specified number of data points, usually several price bars. The purpose of the TMA is to double-smooth the price data, which produces a line that does not react as quickly as a simple moving average would. The TMA will not react quickly in volatile market conditions, meaning that it will take longer for the TMA line to change direction. The triangular moving average is an average of an average over the last N candlesticks, called the period (P). Before giving the formula of the TMA, we should calculate the Simple Moving Average (SMA):

SMA = (P1 + P2 + P3 + P4 + ... + PN) / N

N: number of candlesticks, or the period chosen for the calculation.
The TMA is provided by taking the average of all SMA as below (Fig. 3): TMA = (SMA1 + SMA2 + SMA3 + SMA4 + ...SMAN)/N
Fig. 3. TMA Centered Bands Indicator
2.5 Indicator Model Description

The indicator model we propose takes as inputs the price value, the TMA Centered Bands value, and the RSI. The indicator model retrieves these three values and decides to provide either a signal to buy or a signal to sell the currency pair according to a strategy that we will explain later (Fig. 4).
Fig. 4. Indicator Model
The buy signal is marked by a red up arrow under a bearish candlestick, while the sell signal is marked by a green down arrow above a bullish candlestick (Fig. 5).
Fig. 5. Test Indicator
2.6 Description of the Indicator

Our trading instructions are described as follows.
Sending a Sell/Short Ticket When: We can have a sell decision when:
– The price is above the band_up.
– The candlestick i before the current one must achieve the following formula: Open[i] − Close[i] > 0
– The candlestick before the current one should respect the following formula: Close[i + 1] > band_up and Open[i] >= band_up
– The value of RSI >= 70
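A small Python sketch of the sell rule just listed is given below (the original runs as an MQL4 indicator in MetaTrader 4). The array names and the use of i-1 for the previous candlestick are assumptions; the buy rule that follows mirrors this check against the lower band with an oversold RSI reading.

```python
def fx_sell_signal(i, open_, close, band_up, rsi_value):
    """Check the sell/short conditions listed above for candlestick i.
    open_/close are price arrays, band_up is the upper TMA band array and
    rsi_value is the current RSI reading; index handling is an assumption."""
    prev = i - 1
    return (close[i] > band_up[i]                  # price above the upper band
            and open_[prev] - close[prev] > 0      # previous candle is bearish
            and close[prev] > band_up[prev]        # it closed above the band
            and open_[prev] >= band_up[prev]       # and opened at or above it
            and rsi_value >= 70)                   # RSI confirms overbought
```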
Sending a Buy/Long Ticket When: We can have a buy decision when:
– The price is below the band_dw.
– The candlestick before the current one must achieve the following formula: Open[i] − Close[i] < 0
– The candlestick before the current one should respect the following formula: Close[i] < band_dw and Open[i] <= band_dw
– The value of RSI <= 30

BLEU combines the modified n-gram precisions pn with a brevity penalty BP:

BLEU = BP · exp( Σ n=1..N wn · log pn )

BP = 1 if c > r; BP = exp(1 − r/c) if c ≤ r
where:
r: count of words in the reference.
c: count of words in the candidate.
N: number of n-grams; we usually use uni-grams, bi-grams, 3-grams, and 4-grams.
wn: weight for each modified precision; by default N is 4, so wn is 1/4 = 0.25.
pn: modified precision, computed as

pn = ( Σ C ∈ {Cand} Σ n-gram ∈ C Count_clip(n-gram) ) / ( Σ C' ∈ {Cand} Σ n-gram' ∈ C' Count(n-gram') )    (16)

where:
n-gram: number of grams.
Cand: candidates.

Count_clip = min(Count, Max_Ref_Count)    (17)
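For reference, a self-contained Python sketch of the modified n-gram precision, the brevity penalty, and their combination for a single candidate/reference pair; the paper reports scores computed with the standard (multi-reference, corpus-level) definition, so this is only the simplest case.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision p_n: candidate counts are clipped by the
    maximum number of times each n-gram appears in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    total = max(sum(cand_counts.values()), 1)
    return clipped / total

def bleu(candidate, reference, N=4):
    """BLEU = BP * exp(sum_n w_n log p_n) with uniform weights w_n = 1/N."""
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))   # brevity penalty
    precisions = [modified_precision(candidate, reference, n) for n in range(1, N + 1)]
    if min(precisions) == 0:                              # avoid log(0)
        return 0.0
    return bp * math.exp(sum(math.log(p) / N for p in precisions))

cand = "ما هي عاصمة المملكة المغربية".split()
ref = "ما هي عاصمة المملكة المغربية ؟".split()
print(round(bleu(cand, ref), 4))   # shorter candidate, so BP < 1
```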
Count: the maximum number of times a candidate n-gram occurs in any single reference translation.
Max_Ref_Count: the maximum number of n-gram occurrences in any reference count.
Count_clip: the minimum of Count and Max_Ref_Count; it clips the total count of each candidate word by its maximum reference count.
– ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics rather than just one. We cover the main ones that are most likely to be used, ROUGE-N and ROUGE-L. For both of these metrics, we calculate the recall, precision, and F1-score [13].
ROUGE-N: where N represents the n-gram that we are using, it measures the number of matching n-grams between the model-generated text and a reference:

Recall = N_n-grams-in-Mod-Ref / N_n-grams-in-Ref    (18)

Precision = N_n-grams-in-Mod-Ref / N_n-grams-in-Mod    (19)

F1-Score = 2 · (Precision · Recall) / (Precision + Recall)    (20)

where:
N_n-grams-in-Mod-Ref: number of n-grams found in both the model output and the reference.
N_n-grams-in-Ref: number of n-grams in the reference.
N_n-grams-in-Mod: number of n-grams in the model output.
ROUGE-L: measures the longest common sub-sequence (LCS) between our model output and the reference. All this means is that we count the longest sequence of tokens that is shared between both:

Recall = N_LCS-in-Mod-Ref / N_n-grams-in-Ref    (21)

Precision = N_LCS-in-Mod-Ref / N_n-grams-in-Mod    (22)

F1-score: we use the same formula as in Eq. (20), where:
N_LCS-in-Mod-Ref: number of n-grams in the LCS of the model output and the reference.
N_n-grams-in-Ref: number of n-grams in the reference.
N_n-grams-in-Mod: number of n-grams in the model output.
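A matching Python sketch of ROUGE-N and ROUGE-L for one generated/reference pair, following Eqs. (18)–(22); the LCS length is computed with the usual dynamic-programming table.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(model, reference, n=1):
    """ROUGE-N recall, precision and F1 from matching n-gram counts."""
    m, r = Counter(ngrams(model, n)), Counter(ngrams(reference, n))
    overlap = sum(min(c, r[g]) for g, c in m.items())
    recall = overlap / max(sum(r.values()), 1)
    precision = overlap / max(sum(m.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return recall, precision, f1

def lcs_length(a, b):
    """Length of the longest common sub-sequence between token lists a and b."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            table[i][j] = table[i - 1][j - 1] + 1 if x == y else max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def rouge_l(model, reference):
    """ROUGE-L recall, precision and F1 based on the LCS."""
    lcs = lcs_length(model, reference)
    recall = lcs / max(len(reference), 1)
    precision = lcs / max(len(model), 1)
    f1 = 0.0 if lcs == 0 else 2 * precision * recall / (precision + recall)
    return recall, precision, f1

gen = "ما هي عاصمة المغرب".split()
ref = "ما هي عاصمة المملكة المغربية".split()
print(rouge_n(gen, ref, n=1), rouge_l(gen, ref))
```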
4.3 Results
To our knowledge, there is no study in the literature based on machine learning for Arabic question generation. Below, we compare the metric results obtained from the Seq2Seq model without and with attention. Table 1 shows the comparison results using the sequence-to-sequence model without attention; the results obtained remain unsatisfactory because Seq2Seq models
suffer from the disadvantage that the output sequence depends only on the context defined by the hidden state in the final output of the encoder. Table 2 shows the comparison results using the sequence-to-sequence LSTM model with attention for the three scoring functions. From these results, we can notice that the Seq2Seq BiLSTM → LSTM model with attention using the Bilinear function gives a better result compared with the other models and functions. Table 3 shows the comparison results for the sequence-to-sequence GRU model with attention for the three score functions. We notice from the results presented in Table 3 that the Seq2Seq BiGRU → GRU model with attention using the multilayer perceptron function gives a better result compared with the other models and functions.

Table 1. Sequence-to-sequence model results without attention

| Model | BLEU | ROUGE-1 P | ROUGE-1 R | ROUGE-1 F1 | ROUGE-2 P | ROUGE-2 R | ROUGE-2 F1 | ROUGE-3 P | ROUGE-3 R | ROUGE-3 F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 BiLSTM → LSTM | 7,79 | 0,40 | 1,42 | 0,23 | 0,0 | 0,0 | 0,0 | 0,40 | 1,42 | 0,23 |
| 2 BiGRU → GRU | 7,39 | 0,40 | 1,42 | 0,23 | 0,0 | 0,0 | 0,0 | 0,40 | 1,42 | 0,23 |
Table 2. Sequence-to-sequence LSTM model results with attention for the three scoring functions

| Model | BLEU | ROUGE-1 P | ROUGE-1 R | ROUGE-1 F1 | ROUGE-2 P | ROUGE-2 R | ROUGE-2 F1 | ROUGE-3 P | ROUGE-3 R | ROUGE-3 F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| BiLSTM → LSTM (MLP) | 10,28 | 17,31 | 18,78 | 17,43 | 5,24 | 5,36 | 5,95 | 17,24 | 18,66 | 17,30 |
| BiLSTM → LSTM (Bilinear) | 13,05 | 20,94 | 22,43 | 21,72 | 8,09 | 8,69 | 8,60 | 21,20 | 22,67 | 21,60 |
| BiLSTM → LSTM (Dot Product) | 11,12 | 19,15 | 20,71 | 20,45 | 4,79 | 5,47 | 5,81 | 19,07 | 20,64 | 20,21 |
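The three score functions compared in Tables 2 and 3 correspond to the dot-product, bilinear (general), and multilayer-perceptron (concat) attention scores of Luong et al. [14]. A minimal NumPy sketch of these scores and the resulting attention weights is shown below; the dimensions, weight initialization, and names are illustrative and are not taken from the paper's implementation.

```python
import numpy as np

d = 128                                   # hidden size (illustrative)
rng = np.random.default_rng(0)
h_t = rng.standard_normal(d)              # decoder hidden state at step t
h_s = rng.standard_normal((10, d))        # encoder hidden states (10 source steps)

W_bilinear = rng.standard_normal((d, d)) * 0.01
W_mlp = rng.standard_normal((d, 2 * d)) * 0.01
v_mlp = rng.standard_normal(d) * 0.01

def score_dot(h_t, h_s):
    return h_s @ h_t                                  # dot product

def score_bilinear(h_t, h_s):
    return h_s @ (W_bilinear @ h_t)                   # general / bilinear

def score_mlp(h_t, h_s):
    concat = np.concatenate([np.tile(h_t, (len(h_s), 1)), h_s], axis=1)
    return np.tanh(concat @ W_mlp.T) @ v_mlp          # multilayer perceptron (concat)

def attention_weights(scores):
    e = np.exp(scores - scores.max())                 # softmax over source positions
    return e / e.sum()

for fn in (score_dot, score_bilinear, score_mlp):
    print(fn.__name__, attention_weights(fn(h_t, h_s)).round(3))
```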
Figures 5 and 6 show the BLEU and ROUGE metric results in graph form. From these graphs, we have the following findings to note. First, as can be observed, Seq2Seq without attention offers poor performance; the performance of this model is far below the results of the other models. This result is expected, as Seq2Seq without attention generates the sentences from a single context vector which does not contain all the relevant information. However, when attention is added to the model, we obtain encouraging results. Figure 7 presents an example of a question produced by our best model, together with the original question and the passage
Table 3. Sequence-to-sequence GRU model results with attention for the three score functions

| Model | BLEU | ROUGE-1 P | ROUGE-1 R | ROUGE-1 F1 | ROUGE-2 P | ROUGE-2 R | ROUGE-2 F1 | ROUGE-3 P | ROUGE-3 R | ROUGE-3 F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 BiGRU → GRU (MLP) | 17,45 | 30,99 | 32,50 | 31,00 | 15,70 | 16,40 | 15,92 | 30,53 | 32,05 | 30,43 |
| 2 BiGRU → GRU (Bilinear) | 17,01 | 28,08 | 31,04 | 27,60 | 9,69 | 12,93 | 9,45 | 28,14 | 31,19 | 27,19 |
| 2 BiGRU → GRU (Dot Product) | 13,90 | 27,14 | 31,46 | 25,44 | 10,90 | 15,45 | 12,02 | 27,01 | 31,26 | 25,17 |
Fig. 5. Obtained results from the Seq2seq model with and without attention according to the BLEU metric
Fig. 6. Obtained results from the Seq2seq model with and without attention according to the ROUGE metric
used for generation. The green text represents the answer and the red text is the words that exist in both the original and the generated questions.
Fig. 7. Example of question generated by Seq2seq with attention
5 Conclusion and Perspectives
In this paper, we proposed an Arabic question generation system based on word embeddings, the Seq2Seq architecture, and the attention mechanism. The attention mechanism gives the neural network the ability to focus on the most important parts of the input passage and answer in order to generate new questions efficiently. We performed various experiments with Seq2Seq models without attention and Seq2Seq models that contain an attention mechanism. The results show that models with attention perform better in the task of Arabic question generation. As perspectives, we would like to investigate this introductory work on question generation for the Arabic language further. We believe that there is still a lot of room for improvement in the performance of these models, for instance by using more, and more balanced, data to train the model and by applying other architectures and other word embeddings.
References 1. Hamza, A., En-Nahnahi, N., Ouatik, S.E.A.: Contextual word representation and deep neural networks-based method for Arabic question classification. Adv. Sci. Technol. Eng. Syst. J. 5(5), 478–484 (2020). https://doi.org/10.25046/aj050559
2. Alazani, S.A., Mahender, C.N.: Rule based question generation for Arabic text: question answering system. In: Proceedings of the International Conference on Data Science, Machine Learning and Artificial Intelligence, pp. 7–12 (2021) 3. Alwaneen, T.H., Azmi, A.M., Aboalsamh, H.A., Cambria, E., Hussain, A.: Arabic question answering system: a survey. Artif. Intell. Rev. 55(1), 207–253 (2022). https://doi.org/10.1007/s10462-021-10031-1 4. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014) 5. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017). https:// transacl.org/ojs/index.php/tacl/article/view/999 6. Bousmaha, K.Z., Chergui, N.H., Mbarek, M.S.A., Belguith, L.H.: AQG: Arabic question generator. Rev. d’Intelligence Artif. 34(6), 721–729 (2020) 7. Chali, Y., Hasan, S.A.: Towards topic-to-question generation. Comput. Linguist. 41(1), 1–20 (2015). https://doi.org/10.1162/COLI a 00206. https://aclanthology. org/J15-1001 8. Du, X., Shao, J., Cardie, C.: Learning to ask: neural question generation for reading comprehension. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, pp. 1342–1352. Association for Computational Linguistics, July 2017. https://doi.org/ 10.18653/v1/P17-1123. https://aclanthology.org/P17-1123 9. Hamza, A., Alaoui Ouatik, S.E., Zidani, K.A., En-Nahnahi, N.: Arabic duplicate questions detection based on contextual representation, class label matching, and structured self attention. J. King Saud Univ. Comput. Inf. Sci. (2020). https://doi. org/10.1016/j.jksuci.2020.11.032. https://www.sciencedirect.com/science/article/ pii/S1319157820305735 10. Hamza, A., En-Nahnahi, N., Zidani, K.A., El Alaoui Ouatik, S.: An arabic question classification method based on new taxonomy and continuous distributed representation of words. J. King Saud Univ. Comput. Inf. Sci. 33(2), 218–224 (2021). https://doi.org/10.1016/j.jksuci.2019.01.001. https://www.sciencedirect. com/science/article/pii/S1319157818308401 11. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) 12. Labutov, I., Basu, S., Vanderwende, L.: Deep questions without deep understanding. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, pp. 889–898. Association for Computational Linguistics, July 2015. https://doi.org/10.3115/v1/P15-1086. https://aclanthology.org/P15-1086 13. Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out, Barcelona, Spain, pp. 74–81. Association for Computational Linguistics, July 2004. https://aclanthology.org/W04-1013 14. Luong, M.T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015) 15. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Bengio, Y., LeCun, Y. (eds.) 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, 2–4 May 2013, Workshop Track Proceedings (2013). http://arxiv.org/abs/1301. 3781
16. Mozannar, H., Maamary, E., El Hajal, K., Hajj, H.: Neural Arabic question answering. In: Proceedings of the Fourth Arabic Natural Language Processing Workshop, Florence, Italy, pp. 108–118. Association for Computational Linguistics, August 2019. https://doi.org/10.18653/v1/W19-4612. https://aclanthology. org/W19-4612 17. Pan, L., Lei, W., Chua, T., Kan, M.: Recent advances in neural question generation. CoRR abs/1905.08949 (2019). http://arxiv.org/abs/1905.08949 18. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, pp. 311–318. Association for Computational Linguistics, July 2002. https://doi. org/10.3115/1073083.1073135. https://aclanthology.org/P02-1040 19. Rus, V., Wyse, B., Piwek, P., Lintean, M., Stoyanchev, S., Moldovan, C.: The first question generation shared task evaluation challenge, January 2010 20. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112 (2014) 21. Yao, X., Bouma, G., Zhang, Y.: Semantics-based question generation and implementation. Dialogue Discourse 3, 11–42 (2012)
Microservice-Specific Language, a Step to the Low-Code Platforms Mehdi Ait Said(B) , Abdellah Ezzati, and Sara Arezki Faculty of Sciences and Technologies, Settat, Morocco {m.aitsaid,a.ezzati,s.arezki}@uhp.ac.ma
Abstract. Web, mobile, and desktop applications are essential in today's modern information society and in different business domains. Companies based on a Domain-Driven approach therefore expend a lot of human resources trying to satisfy these demands. This has resulted in a significant evolution of two standards: (i) the software architecture, recently embodied in the Microservices Architecture, and (ii) the development techniques, recently personified in the No-Code/Low-Code movement, which is led by Domain-Driven development and Model Driven Architecture. In this paper, we analyze the source code of 50 microservice-based projects covering 50 different ideas to present the current state of the microservices architecture. Then, we propose a new approach for integrating the microservice architecture with the emerging Low-Code development trend to create a platform that makes it simple and quick for programmers and even citizen developers to develop, test, and deploy applications. By using a new Microservice-Specific Language, we achieved encouraging results, optimizing away up to 95.81% of hard-coding. Keywords: Microservices Architecture · Software engineering · Low-Code · No-Code · Model Driven Architecture
1 Introduction Most businesses are moving their information systems into the Cloud as a result of the emergence of distributed architectures recently represented in the Microservice Architectures (MSAs). Numerous enterprises have adapted their software to the new style of development, to the MSAs, including Spotify, Twitter, Ebay, and others [7, 11, 16]. Due to its architectural style, which is especially suited to the needs of cloud infrastructures and new applications, the MSA has evolved from only an auxiliary technique to a self-sufficient software engineering architecture style for considerably large complete projects. This is because it allows businesses to integrate what they have built with what they are currently making and what they will create in the future. Agility, flexibility, modularity, and evolution are just a few advantages the MSA may offer [5, 22]. On the other hand, the new "No-Code/Low-Code" movement, which is driven by the Domain-Driven Development (DDD) and Model Driven Architecture (MDA) approach [1], enables to encapsulate of the development knowledge, requirement skills, and programming frameworks within the background and instead provides platform users with © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 817–828, 2023. https://doi.org/10.1007/978-3-031-26384-2_72
human-friendly tools that they can calmly and effortlessly manipulate to create software quickly [4]. Our primary motivation is to develop a software engineering tool that frees developers from writing code line by line and from requiring coding skills. Instead, it enables stakeholders (non-software developers) to build complete software by dragging and dropping software components (UI/UX, third-party packages, and modules) and connecting them. This paper's objective is to present the outcomes of our integration of the MSA with the emerging Low-Code movement. The primary contributions of this research are: (i) the introduction of the current incarnation of MSA used in current projects by examining 50 microservice-based projects of various ideas; and (ii) a novel method that enables developers and other stakeholders to quickly, skillfully, and expertly build, test, and deploy complete web, mobile, and desktop applications by integrating the flexibility, adaptability, modularity, evolution, agility, and maintainability of the MSA with the new Low-Code development style through a Microservice-Specific Language (MSSL). The study's intended audience includes developers interested in improving their performance and development tools when creating software, and researchers interested in contributing to this field of study (facilitating and automating software development). The remainder of the paper is structured as follows. First, we provide the fundamental ideas about the original form of MSA in Sect. 2 before presenting the current form of MSA. Next, Sect. 3 presents the emerging Low-Code trend. In Sect. 4, we outline our approach, and in Sect. 5 a case study illustrates it. Then, in Sect. 6, we compare our approach to the more conventional MDD approach. In Sect. 7, we present the related research on automating and simplifying software development based on MSA and Low-Code. Finally, in Sect. 8, we outline future work and conclude the paper.

1.1 Original Form of MSA

The microservice architectural style was first mentioned by Fowler and Lewis on March 25, 2014 [16]. It is a technique for creating a single piece of software using a collection of components or tiny services called microservices, each of which runs in its own environment and communicates with other microservices via lightweight mechanisms. Achieving a great deal of flexibility, adoption, and evolution is the primary goal of MSA [3]. However, due to the absence of clear academic standards for developing MSA-based projects, there is still an insufficient understanding of how to transition to the MSA style [10]. Therefore, the MSA approach might be considered a particular application of Service-Oriented Architecture (SOA), even though there has not been widespread consensus on a particular academic or industrial description [8, 10]. Monolithic architecture prevents the independent execution of its modules because all functionality and dependencies are contained within a single program [9]. This kind of architecture is tightly coupled and strongly aligned, so a single process handles all requests. Conversely, the MSA features a totally automated deployment process and is built with industrial capabilities in mind. Due to their size, microservices are more fault-tolerant and easier to maintain, as the failure of one service won't bring down
the entire system, as could occur with a monolithic architecture. As a result, this design approach enables the creation of flexible, modular, and scalable architectures [16]. The continuous development requirements of large projects cannot be supported by monolithic architectures. Consequently, many massive systems have evolved over the past few years, moving from standalone monolithic software based on highly coupled and interdependent components, to SOA, and then to a set of small, independent, loosely coupled services. Later, these services were named microservices (see Fig. 1). An important mapping study was presented by Davide et al. in 2018 [17]. To map out current tools and methodologies, they described the various MSA-style principles and practices reported in 42 case studies. In addition, they identified several widely accepted MSA foundations and patterns, as well as a summary of the benefits, drawbacks, and lessons learned for each practice from the case studies, which, given the space limitations of this paper, we are unable to discuss here.
Fig. 1. Migrating from monolithic to MSA-based software [9].
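To make "communicates via lightweight mechanisms" concrete, here is a toy sketch of two independently running services talking over HTTP. Flask and the requests library are used purely for illustration; the service names, routes, and ports are invented and do not come from any of the surveyed projects.

```python
# catalog_service.py -- one small, independently deployable service
from flask import Flask, jsonify

app = Flask(__name__)
PRODUCTS = {1: {"id": 1, "name": "gold coin", "price": 1800.0}}

@app.get("/products/<int:pid>")
def get_product(pid):
    # Expose a tiny HTTP API instead of sharing code or a database.
    return jsonify(PRODUCTS.get(pid, {})), (200 if pid in PRODUCTS else 404)

if __name__ == "__main__":
    app.run(port=5001)
```

A second service then consumes it over the network rather than through an in-process call:

```python
# order_service.py -- another service consuming the catalog over HTTP
import requests

CATALOG_URL = "http://localhost:5001"   # assumed address of the catalog service

def create_order(product_id, quantity):
    resp = requests.get(f"{CATALOG_URL}/products/{product_id}", timeout=2)
    resp.raise_for_status()
    product = resp.json()
    return {"product": product["name"], "total": product["price"] * quantity}

if __name__ == "__main__":
    print(create_order(1, 2))
```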
1.2 Current Form of MSA

The present form of MSA was recognized, categorized, and reviewed by Paolo et al. in 2017 from the following three perspectives: publishing trends, research priorities, and potential for industry application [3]. Their study's findings showed a gap in the advertisements of microservices, which prompted us to concentrate our early research on the business and industrial aspects of microservices development. This ultimately led us to analyze 50 MSA-based projects covering 50 distinct ideas (Table 1). (Every
project chosen was created between 2018 and 2020 by businesses and independent developers.) Here is what we discovered:
• Each concept is based on the same idea described by Fowler and Lewis.
• The same developers completed 10% of the projects from start to finish (projects where microservices were recycled were counted as long as the same source or developers created them).
• In 90% of the projects, pre-built microservices from service providers outside the development teams, such as the AWS Amplify services, were used.
Fig. 2. The MSA as it now stands.
Thus, we came to the conclusion that the microservices open the gates to an unheardof rapidity in the building and deploying of software through the pre-built and readyto-use microservices proposed by vendors/providers; (Fig. 2) illustrates the MSA in its current incarnation. Currently, developers aim to use pre-built microservices as much as possible rather than designing every component of their projects from scratch. For instance, why am I creating a Roles system? Can I use the Amplify Roles service instead? When I’m able to use Amazon S3 services, why am I constructing my own file system? Especially when the cost is so inexpensive. But for those who are not software engineers or have limited knowledge about development, the most significant questions are: – QUESTION 1: Why do I require complicated development languages and methodologies when I can connect a bunch of microservices using basic programming concepts to create a completed application ready to be deployed? – QUESTION 2: Or can I create a project that can be released quickly by utilizing pre-built and ready-to-use tools that allow me to set and edit microservices without requiring programming skills?
In the next section of this paper, we will present an answer to these questions by proposing a novel approach to building applications that combines the Microservices Architecture and No-Code/Low-Code techniques.

Table 1. The 50 selected ideas: News; Ecommerce; Job recruitment; Video streaming; Storytelling; Funny clips; Sports live score; Online games; Music streaming; Invoicing; Travel niche booking; Price comparison directory or marketplace; Property rental; Wedding planner; Books; Proposal making online; Travel booking; Directory; Event management; Astrology; Food delivery web app; Bus booking; Dating; Raise a fund; Latest business; Travel tips; Professional networking; Customized printing; Deal and coupon; What's trending; Buy & sell anything; Hire a digital contractor; All kinds of services; Ratings and reviews; Book review; Find a co-founder; Career advisor; Music learning; Advertisement; Social networking; Auction; Online teaching; Affiliate; Consulting; Truck loader services; Logging; Dropshipping; Question-answer; Product price comparison; Stock market.
2 Low-Code
The Low-Code movement is a modern development style and phenomenon that enables programmers and other stakeholders to quickly and skillfully build, test, and deploy complete web, mobile, and desktop applications, or any type of software, without needing any knowledge of conventional programming frameworks or languages, or any development skills. This is accomplished by creating platforms that let users connect software components via a human-friendly tool, such as drag-and-drop, to create a complete project, test it, and deploy it. The only thing
the user sees is a human-friendly tool, such as drag-and-drop or a graphical user interface, that enables the connection and testing of the components and modules of the software [4]. By sparing developers from having to write any software code, Low-Code tools enable them to construct, test, and deploy software more quickly. The research firm Forrester coined the term "low-code" in 2014. The demand for software proficiency has risen quickly across all types of companies as a result of the significant digitization that has taken place in every facet of life over the past few years [6]. For instance, recent Gartner research shows that low-code platforms will account for more than 65% of all software development activity by 2024. Similarly, Forrester anticipates $21 billion in investment in the low-code business by 2022 [12, 13]. Although the low-code movement might have started recently, it actually stems from the Model Driven Architecture (MDA) idea that the Object Management Group (OMG) first proposed in 2001 [14]. MDA gives businesses the ability to combine their existing infrastructure with both current and future projects. In MDA development, models are transformed first to create a Platform-Independent Model (PIM) from a Computation Independent Model (CIM), and then again to create one or more Platform-Specific Models (PSM) from the PIM [15, 19]. Model-Driven Development (MDD) automatically generates all or most of the software components through a sequence of transformation steps between different levels of abstraction; it relies on pre-built and ready-to-use graphical models, software components, and services, not just UML models, so that users can graphically create complex software. Thus, the low-code movement was created to eliminate hand-coding and hard-coding and to give stakeholders (non-developers) the opportunity to develop whole software systems without any programming frameworks or languages, or any development skills or programming knowledge.
3 Integration Approach
To produce the functionality and operations wanted in a piece of software or an application, developers usually write lines of code. Therefore, developers must be highly skilled in technical knowledge, software architecture, and deployment environments in order to successfully complete this process. Our approach is to integrate the MSA's flexibility, adoption, modularity, evolution, agility, and maintainability with the new Low-Code development style through a Microservice-Specific Language (MSSL) that deals directly with microservices rather than objects or modules, encapsulates all the hard-coding that is done in the background, and allows developers and non-developers to connect software components via a human-friendly tool, our MSSL, to create a complete project, test it, and deploy it. We divided the project into two phases to accomplish this goal:
• Phase 1: Create the Command Line Interface (CLI) and our MSSL, which will interpret and manage user commands such as producing, testing, and upgrading functionalities or graphical elements.
• Phase 2: Create a production platform that combines the MSSL and CLI established in Phase 1 with Low-Code technology (we plan to build Phase 2 in our upcoming work).
Our MSSL is a JSON-based descriptive and declarative language that manages microservices (regardless of whether they were built internally, externally by providers, or both) and was constructed according to MDD principles to produce the backend, or logical part, that will be connected to graphical interface components in the platform through APIs (Phase 2) that we will build in the future. The initial prototype (Fig. 3) is built around ten modules named engines:
Fig. 3. 10 engines that make up our MSSL.
The user input is interpreted and processed by the interpretation engine, which then passes the information to the modeling module and the microservices/third-parties (MTP) module; in Phase 2, this input will also be integrated with graphical interface components. The models module performs an analysis, extracts all required models, links them to the chosen database, and performs all necessary actions (data insert, data select, ...). The MTP module analyzes, extracts, and links all required microservices and packages with the models using the communication module, which relies on the required SDKs or APIs. The logical module then analyzes all the necessary functionalities, actions, and events, links them with the graphical interface components that the user selects using drag-and-drop tools, sends a report to the build module, and begins extracting the project's source code in the language or framework that the user has requested. Because the platform will be free and open source, developers will be able to provide add-ons that make it simpler for users to utilize the platform. The files module aids the build module in producing project files, and the plugin module manages user extensions.
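To make the flow through these engines concrete, the following minimal Python sketch mimics how an interpretation engine might parse a JSON-based description and dispatch it to models, MTP, and build engines. The JSON fields and function names are our own illustrative assumptions, not the actual MSSL syntax or implementation.

```python
import json

# Hypothetical MSSL-style description: the field names below are illustrative
# assumptions, not the published MSSL grammar.
EXAMPLE_MSSL = """
{
  "project": "demo-shop",
  "target": {"language": "php", "framework": "laravel"},
  "microservices": [
    {"name": "users", "provider": "internal", "models": ["User", "Role"]},
    {"name": "storage", "provider": "aws-s3", "models": []}
  ]
}
"""

def interpretation_engine(source: str) -> dict:
    """Parse the JSON description and hand it to the downstream engines."""
    spec = json.loads(source)
    models = models_engine(spec)
    services = mtp_engine(spec)
    return build_engine(spec, models, services)

def models_engine(spec: dict) -> list:
    """Extract every model declared by internally built microservices."""
    return [m for ms in spec["microservices"]
            if ms["provider"] == "internal"
            for m in ms["models"]]

def mtp_engine(spec: dict) -> list:
    """Collect external (third-party) microservices to be wired in via SDKs/APIs."""
    return [ms["name"] for ms in spec["microservices"]
            if ms["provider"] != "internal"]

def build_engine(spec: dict, models: list, services: list) -> dict:
    """Report what the generator would produce for the requested target."""
    return {"target": spec["target"], "models": models, "external_services": services}

print(interpretation_engine(EXAMPLE_MSSL))
```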
4 Case Study
To demonstrate our MSSL, we present in this section a case study of a straightforward e-commerce web application. This e-commerce application is based on four microservices:
1. User microservice: in charge of managing, authenticating, and granting access to users and their privileges.
2. Product microservice: in charge of handling product data.
3. Organizing microservice: in charge of arranging the application's products (e.g., by Tag).
4. Reaction microservice: in charge of handling customer feedback.
The code in Fig. 4 is the textual syntax of this e-commerce prototype, which will generate four microservices: one for managing users, a second for organizing products by tags, a third for handling the logic and CRUD operations of products together with payment gateways, and a fourth for collecting customer feedback (the code is currently generated only in PHP with the Laravel framework, but in the future we plan to integrate the most widely used languages and frameworks, particularly Java with the Spring Boot framework).
Fig. 4. The MSSL code for building the e-commerce web site.
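As a rough illustration of the style of declaration the case study relies on, the following hypothetical, simplified specification is written as a Python dictionary; the keys, role names, and grouping are assumptions inferred from the description that follows, not the exact MSSL code shown in Fig. 4.

```python
# Hypothetical, simplified declaration of the four case-study microservices.
# Keys and values are illustrative assumptions, not the MSSL grammar of Fig. 4.
ecommerce_spec = {
    "target": {"language": "php", "framework": "laravel"},
    "microservices": [
        {"name": "users",
         "roles": ["Administrator", "Seller", "Customer"],
         "passports": ["Phone", "Instagram", "Twitter"]},
        {"name": "organizing", "entities": ["Tag"]},
        {"name": "products",
         "entities": ["Product"],
         "features": ["crud", "payment-gateway"]},
        {"name": "feedback", "entities": ["Review", "Comment"]},
    ],
}
```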
Our MSSL builds a user authorization service based on three authorization roles (Administrator, Seller, and Customer) and three authentication passports (Phone, Instagram, and Twitter), implementing the routes, controllers, services, and middlewares required for all authentication and authorization features, including gates, policies, login, registration, 2-step verification, password reset, phone verification, and more, together with an essential graphical interface component. The CLI can be used to create users; for instance, the first user created will be an administrator. On the other hand, our MSSL builds a product management system with all the essential CRUD functions, including reviews, comments, and tags (for example, get a product by tag, or get all comments for a product). Suppose you forgot to add or update some functionality.
In that case, you could also utilize the CLI to build additional functions or operations that you did not specify in the MSSL code, such as getting the most popular product. Our tool's ability to switch between several levels of abstraction is just one of its advantages. In this example, the Content class, which is an Entity, is used. It inherits certain operations, routes, methods, and properties, such as "created_at", "updated_at", and "deleted_at". The Entity class can be used to alter models and features in many ways. Microservices provide a wide range of functions and can be separated and customized. Table 2 lists, for each of the four microservices, the number of lines of code in our Microservice-Specific Language textual syntax, together with the number of lines of PHP with the Laravel framework that must be written manually for the same microservice. Our efforts to apply the best programming practices and to exclude blank lines reduced the number of manually produced lines of code. In order to demonstrate the difference between the number of lines of code required to specify the same microservices in PHP and in our Microservice-Specific Language, we compared both approaches. As the table shows, the number of lines typed manually is significantly higher than the number of lines created using our Microservice-Specific Language: overall, 95.81% fewer lines of code had to be written manually.

Table 2. Comparison of the number of lines for our MSSL and for manually written PHP code.
Microservice | MSSL lines of code | PHP lines of code | Comparative ratio
Users microservice | 10 | 193 | 94.82%
Collections microservice | 11 | 113 | 90.27%
Product microservice | 24 | 478 | 94.98%
Feedback microservice | 2 | 337 | 99.41%
Total | 47 | 1,121 | 95.81%
5 Comparison
We compared our MSSL with manually written PHP code in Sect. 4, and the results are as satisfactory as we would expect; our MSSL is designed to generate code for high-level languages and frameworks. In this section, we contrast our MSSL with conventional Domain-Specific Languages (DSLs) based on MDD. MDD is an approach to creating software rapidly, efficiently, and with the least amount of resources. The creation of software models is the primary goal of the MDD methodology. Before any code is written, a software system's functionality is represented
in a series of models. All MDD-based DSLs are capable of generating code from models; as a result, they are only able to generate code for functional requirements like login, registration, subscription, payment, etc. The communication, association, and logic between these functionalities are left to be coded manually by developers. In our approach, every microservice is constructed according to MDD, and our MSSL is based on MSA. Every feature within every microservice can be customized because our MSSL builds microservices from a series of models. For example, we can alter the authentication process so that the responsible models change automatically without modifying the model itself, as shown in the case study in Sect. 4 of this paper. More than that, our MSSL can control the interaction, logic, operations, and relationships between models to create microservices. It is also able to manage relationships between microservices and their communication using different strategies such as REST APIs.
6 Related Work
While reviewing the state of the art in this field, we found many research publications that handle the specification of various MSA layers using the Model-Driven Development paradigm. The remaining paragraphs of this section therefore highlight the related research in this field and contrast it with our methodology. The authors of [2, 20, 21] introduce MicroBuilder [18], a model-driven engineering tool for the specification of software architectures that follow the Representational State Transfer (REST) MSA design. It provides a powerful and straightforward DSL for specifying the REST style of MSA, and the tool consists of two modules, MicroDSL and MicroGenerator. The level of abstraction separates our approach from theirs: their tool requires in-depth familiarity with development languages and techniques, whereas our method makes it simple and quick for developers and citizen developers alike to create, build, and distribute software. Sagitec's Software Studio is a low-code tool and platform highlighted by the authors in [6]. Thanks to its extensive quality assurance activities, demand tracking, analytics for fraud detection, and other features, it enables even non-software developers to build apps quickly, lowering the cost of doing so and increasing investment returns. However, it still has shortcomings in several domains because it lacks a robust abstraction able to generate software across all business domains.
7 Conclusion and Future Work
In this paper, we present the current form of the Microservice Architecture by analyzing 50 unique Microservice Architecture-based projects from 2018 to 2020. We provide a novel method to speed up the software development process by integrating the flexibility, adoption, modularity, evolution, agility, and maintainability of the Microservice Architecture with the new Low-Code development style through a Microservice-Specific Language, which handles microservices rather than objects, encapsulates all hard-coding that takes place in the background, spares developers from having to write any software code, and enables them to construct, test, and deploy
software more quickly. When validating the initial prototype on an e-commerce web application project, we found that the number of manually written lines of code dropped by 95.81% with our Microservice-Specific Language. On the other side, Forrester anticipates that by 2022 the low-code industry will account for $21B in spending [13]. The market therefore appears promising, especially given the early feedback we received from our prototype. We intend to advance our approach by adding various features, developing the workshop platform, and offering a graphical user syntax. Additionally, we plan to support more languages and frameworks such as Java EE and .NET.
References 1. Hailpern, P., Tarr, P.: Model-driven development: the good, the bad, and the ugly. IBM Syst. J. 45(3), 451–461 (2006) 2. Branko, T.., Vladimir, D., Slavica, K., Gordana, M.: Development and evaluation of microbuilder: a model-driven tool for the specification of rest microservice software architectures. Enterp. Inf. Syst. 12, 1034–1057 (2018) 3. Paolo, F.D., Ivano, M., Patricia, L.: Architecting with microservices: a systematic mapping study. J. Syst. Softw. 150, 77–97 (2019) 4. Margaret, R.: Low-code and no-code development platforms. Retrieved from Techtarget (2020): https://searchsoftwarequality.techtarget.com/definition/lowcode-no-code-develo pment-platform. Accessed 19 Sept 2022 5. Paolo, F.D.: Architecting microservices. In: IEEE International Conference on Software Architecture Workshops (2017) 6. Rachit, A., Nayan, G., Tapan, M.: Sagitec Software Studio (S3) - a low code application development platform. In: 2020 IEEE International Conference on Industry 4.0 Technology (2020) 7. Yahia, E.B.H., Réveillère, L., Bromberg, Y.-D., Chevalier, R., Cadot, A.: Medley: an eventdriven lightweight platform for service composition. In: Bozzon, A., Cudre-Maroux, P., Pautasso, C. (eds) Web Engineering. ICWE 2016. Lecture Notes in Computer Science(), vol 9671, pp. 3–20. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-38791-8_1 8. Davide, T., Valentina, L., Claus, P.: Continuous architecting with microservices and DevOps: a systematic mapping study. In: Muñoz, V., Ferguson, D., Helfert, M., Pahl, C. (eds.) Cloud Computing and Services Science. CLOSER 2018. Communications in Computer and Information Science, vol. 1073, pp. 126–151. Springer, Cham. https://doi.org/10.1007/978-3-03029193-8_7 9. Francisco, P., Gastón, M., Hernán, A.: Migrating from monolithic architecture to microservices: a rapid review. In: 38th International Conference of the Chilean Computer (2019) 10. Zimmermann, O.: Microservices tenets: agile approach to service development and deployment. Comput. Sci. Res. Dev. 32, 301–310 (2017) 11. Villamizar, M., et al.: Evaluating the monolithic and the microservice architecture pattern to deploy web applications in the cloud. In: 10th Computing Colombian Conference (10CCC). IEEE, pp. 583–590 (2015) 12. Paul, V., Kimihiko, I., Mark, D., Jason, W., Yefim, N.: Gartner magic quadrant for enterprise low-code application platforms. Retrieved from Gartner (2019). https://www.gartner.com/en/ documents/3956079/magic-quadrant-for-enterprise-low-code-application-platf. Accessed 19 Sept 2022 13. John, R., Rob, K., Christopher, M., Sara, S., Christine, T.: The Forrester Wave™: low-code development platforms for AD&D Pros. Forrester Research Inc., 13 March 2019
14. MDA® - The architecture of choice for a changing world. https://www.omg.org/mda/. Accessed 19 Sept 2022 15. Yassine, R., Youssef, H., Abdelaziz, M.: Model transformation with ATL into MDA from CIM to PIM structured through MVC. In: The 7th International Conference on Ambient Systems, Networks and Technologies (2016) 16. Lewis, J., Martin, F.: Microservices. A definition of this new architectural term. Retrieved from MartinFowler.com (2014). https://martinfowler.com/articles/microservices.html. Accessed 03 Sept 2022 17. Davide, T., Valentina, L., Claus, P.: Architectural patterns for microservices: a systematic mapping study. In: Proceedings of the 8th International Conference on Cloud Computing and Services Science, pp. 221–232 (2018) 18. MicroBuilder Overview. https://thoughtworksinc.github.io/microbuilder/1-overview.html. Accessed 19 Sept 2022 19. Mostapha, M., Yassine, R., Hadi, Y., An approach for transforming CIM to PIM up to PSM in MDA. In: The 11th International Conference on Ambient Systems, Networks and Technologies (2020) 20. Branko, T., Vladimir, D., Slavica, K., Gordana, M.: A model-driven approach to microservice software architecture establishment. In: Federated Conference on Computer Science and Information Systems (2018) 21. Branko, T., Vladimir, D., Slavica, K., Gordana, M.: MicroBuilder: a model-driven tool for the specification of REST microservice architectures. In: International Conference on Information Science and Technology (2017) 22. Paolo, F. D., Ivano, M., Patricia, L.: Research on architecting microservices: trends, focus, and potential for industrial adoption. In: IEEE International Conference on Software Architecture (2017)
Teaching Soft Skills Online, What Are the Most Appropriate Pedagogical Paradigms? Najem Kamal(B) , Ziti Soumia, and Zaoui Seghroucheni Yassine Intelligent Processing and Security of Systems, Faculty of Sciences, Mohammed V University in RABAT, Rabat, Morocco {najem.kamal,s.ziti,y.zaoui}@um5r.ac.ma
Abstract. Soft skills are difficult to assess and teach. These skills are neither completely tangible nor measurable. Yet, they are essential qualities for any organization that seeks to innovate and boost the productivity of its work teams. The educational environment is constantly developing, and although institutions attach increasing importance to the teaching of soft skills, these efforts remain insufficient according to several research works. In this paper we propose an approach for the teaching and learning of soft skills in an adaptive learning system that is based on learning styles and the recommendations of competency-based learning. Keywords: Soft skills · Competency-based learning · Learning styles
1 Introduction
Soft skills are human and social skills that can be developed by individuals. They are highly varied and include the individual qualities of the learner. At the same time, no one can deny that the employability of university graduates is one of the main purposes of the programs established by a higher education system, and soft skills play a very important role in assessing the productivity of new applicants. The online teaching of soft skills is not immune to this challenge; it even suffers from additional problems, especially those related to the motivation of learners to perform learning tasks and the implementation of pedagogical methodologies that suit their learning styles. Several research studies have shown the effectiveness of systems recently called adaptive learning systems. These systems focus mainly on the individual learning characteristics of learners to provide personalized learning. [1] defines e-learning as the use of Internet technologies to provide various solutions that improve learners' knowledge and performance. The use of the Internet in education has therefore renewed attention to the courses taught, the strategies for delivering those courses, and the learner's ability to learn. In fact, [2] has shown that strategies can be adjusted according to the learning styles and preferences of learners.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 829–840, 2023. https://doi.org/10.1007/978-3-031-26384-2_73
Personalization is also synonymous with adaptability: personalization means modeling solutions to meet the current needs of an individual, that is, a system that adapts to the needs of different learners. It is therefore a question of being able to adapt teaching styles to the different needs of learners' learning styles, by theorizing the methodology to be implemented and adapting the models that allow the methods to meet the needs of each learner. A large number of learning style models and theories have been developed [3], mainly by Felder and Silverman [7], Kolb [4], Brusilovsky [6], and Honey and Mumford [5]. A comprehensive and clear learning style model has not been identified [8]. Throughout this article, we redefine the pedagogical model for the case of an adaptive learning system that includes soft skills learning. The rest of the paper is organized as follows: the next section presents different works on the topic of adaptive learning systems. The third section is devoted to the presentation of the theory of learning styles, followed by an overview of the major ALS and their adaptation strategies; then, in Sect. 5, we present an approach for teaching soft skills in an adaptive learning system. Finally, some conclusions are drawn.
2 Related Works
The traditional approach to teaching soft skills focuses on the application of architectures designed for traditional teaching, but does not emphasize the new parameters that emerge in the case of adaptive teaching. [9] states that 21st century skills increasingly demand creativity, perseverance, and problem solving combined with performing well as part of a team. Indeed, there are several problems regarding the interaction between the learner and the system in traditional e-learning: [10, 11] state that traditional e-learning systems tend to offer identical courses and similar learning material in the same order for all users, which can be problematic and lead to learner dissatisfaction and increased dropout rates. This problem has prompted many authors to ask whether online teaching approaches require adjustments to address the teaching of soft skills. [6, 12–14] discussed adaptive frameworks with the components needed to provide adaptation to different learning styles. [15] adds that adaptive models and frameworks can be used to shape the design and development of adaptive e-learning systems, taking into account their main components. These models contain the key components for producing adaptation, depending on the user, their choices, motivations and interests, and the way in which information is conveyed to them. [16] brings together the essential elements of adaptive systems (Fig. 1).
Fig. 1. General diagram of a user adaptive system [Jameson 2009].
3 Learning Theories
The topic of defining learning has been much discussed and defined in several ways by learning psychologists and theorists. [17] presents a definition that integrates the main conceptual approaches to learning: "Learning is a lasting change in behavior, or in the ability to behave in a given way, that results from practice or other forms of experience." In this section, therefore, we do not claim to provide an unconditional and definitive definition of learning. Rather, we seek to define it from the point of view of the learner who, in turn, constructs his own knowledge and develops his own learning methods in order to apply them in various learning situations for his soft skills. It is therefore necessary to assess learning styles in order to meet the needs of the learners, and this is where we need to move away from the classic approach that focuses on the transmission of knowledge from the teacher to the learner, towards an approach that allows the learner to construct their own learning, especially for social skills that require commitment and dedication on the part of the learner. At the same time, [18] adds that "Learner-centered instruction means that learners take the initiative and responsibility for determining their learning needs based on their own schedule and pace." The learner-centered teaching approach ensures effectiveness in online learning if we are able to replicate the complexity of the situations that learners face, given their diverse learning styles. Indeed, [19] confirms that people differ in their personalities, abilities, experiences, skills, and learning styles and preferences. Traditional e-learning systems generally do not take these features into account and do not allow for truly personalized and adaptive learning. The evolution of learning theories can be seen as moving from general theories designed to explain the many ways in which learning occurs to theories of adaptive learning.
4 Adaptive Learning Systems
Adaptive learning is a presentation of educational content that is tailored to the learner's pedagogical needs, level of understanding of the course being taught, and learning style. [14] adds that an adaptive learning system can adapt its response to various circumstances. Specifically, this kind of education system aims to develop and implement a framework for individualized instruction that accounts for individual differences in real time. Although various adaptive learning systems meet the needs of learners, it is certain that new technologies open up new opportunities to innovate and offer new approaches adapted to current contexts. These, along with conflict resolution training and the development of group learning skills in students, certainly require a comprehensive instructional plan and strategies, such as the more innovative strategies of the PBL approach. [21, 22] add that it is essential to note that adaptability must be applied with caution: what works in one area may not necessarily be appropriate in another. Among the features recommended by [1] for effective adaptive e-learning is the need for a clear engagement with a learner model, a domain model, and an adaptation model, models that we develop later.
5 Approach
Teaching soft skills works effectively online if we manage to reproduce the complexity of the situations faced individually by learners, depending on each person's learning abilities. It is not limited to evaluating students and giving them a grade, but rather consists of giving each learner personalized feedback and opportunities to improve their skills in order to master the desired objective. Many adaptive e-learning systems are based on different architectures to suit the needs of learners. Each architecture takes into consideration specific criteria depending on the technologies used and the primary characteristics of the learners. Generally, the design of adaptive frameworks must incorporate components that answer three main questions [23]:
• What can we adapt? (Domain Model)
• What are we adapting to? (Learner Model)
• How can we adapt? (Adaptation Model)
Figure 2 represents an abstract architecture that includes the main components of adaptive systems: Domain Model, Adaptation Model, Learner Model.
Fig. 2. Schematic representation of the structure of adaptive systems.
Based on these three main models, we can address the main needs of learners in an adaptive e-learning context:
Learner Model: [24] states that a learner model is "what enables a system to deal with a student"; it takes in the learner's characteristics and preferences to model the learner's profile and make a content recommendation. This allows for the design of a learner model that is the representation of the specific characteristics that may be relevant to a personalized interaction. In an adaptive e-learning system, the learner model can follow implicit methods focusing on the outcome of the learner's interactions with the interfaces, as well as collecting data on their ability to understand and interact with the learning content. Explicit methods, through questionnaires or surveys addressed to learners, may be more reliable and accurate; however, learners may be reluctant to provide the requested information. Following this model, the interface has the ability to provide a learning path that is tailored to the learner, and then to improve as the learner's experience progresses. Table 1 gives a summary of learner models in existing systems.
Adaptation Model: The adaptation model acts as a link between the learner model and the domain model by matching the appropriate learning tools to the learners' motivations. This model is constantly updated and modified in a transparent manner, as the model changes in response to feedback generated by the learner model (Table 2).
Domain Model: [25] indicates that the domain model is the result of capturing and structuring knowledge related to a specific domain; [26] adds that the model is an abstract representation of a part of the real world. It is composed of a set of knowledge elements of the domain (Table 3).
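A minimal sketch of how these three components might interact in code is given below; the class names, learning-style labels, and matching rule are illustrative assumptions, not a prescribed implementation.

```python
from dataclasses import dataclass, field

@dataclass
class LearningObject:
    """Domain model element: a piece of soft-skills content with a delivery style."""
    topic: str
    style: str          # e.g. "visual", "verbal", "active"

@dataclass
class LearnerModel:
    """Learner model: preferences inferred from questionnaires or interactions."""
    preferred_style: str
    history: list = field(default_factory=list)

    def update(self, obj: LearningObject, success: bool) -> None:
        # Implicit update: reinforce the style of objects the learner succeeds with.
        self.history.append((obj.topic, success))
        if success:
            self.preferred_style = obj.style

class AdaptationModel:
    """Adaptation model: matches domain content to the learner profile."""
    def select(self, learner: LearnerModel, domain: list) -> LearningObject:
        matching = [o for o in domain if o.style == learner.preferred_style]
        return (matching or domain)[0]

# Toy usage: recommend a negotiation exercise to a visually oriented learner.
domain = [LearningObject("negotiation", "visual"), LearningObject("negotiation", "verbal")]
learner = LearnerModel(preferred_style="visual")
print(AdaptationModel().select(learner, domain).style)   # -> "visual"
```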
Table 1. Learner model
System | Learner characteristics modelled | Model representation | Learning style model | Data elicitation method
MASPLANG | Knowledge, Learning Style | Overlay Model, Inferred Model | Felder-Silverman Model | Explicit (questionnaire), Implicit (student actions)
INSPIRE | Knowledge, Learning Style | Overlay Model | Honey and Mumford Model | Explicit (questionnaire), Manual
iWeaver | Preferences, Learning Style | [not specified] | The Dunn & Dunn Model | Explicit (questionnaire)
TANGOW | Knowledge, Learning Style | Overlay Model | Felder-Silverman Model: only 2 dimensions (understanding and perception) | Explicit (questionnaire)
AHA! | Preferences, Learning Style | Overlay Model | Combination of several models | Manual
eTeacher | Performance, Learning Style | Bayesian Network | Felder-Silverman Model: only 3 dimensions (perception, processing and understanding) | Explicit (questionnaire), Implicit (student actions)
WELSA | Learning Style | Stereotype Model, Inferred Model | Unified Learning Style Model (ULSM) | Implicit (student actions)
Protus | Knowledge, Learning Style | Inferred Model | Felder-Silverman Model | Explicit (questionnaire), Implicit (student actions)
LearnFit | Preferences, Learning Style | Bayesian Network | Myers-Briggs Type Indicator (MBTI) | Explicit (questionnaire), Implicit (student actions)
The main objective in using these models is to provide a conceptual framework that we can use to design different adaptive learning systems in e-learning. However, the systems designed and implemented so far do not meet the needs of teaching soft skills online.
Table 2. Adaptation methods
Figure 3 shows the adaptive structure that implements the three main components of adaptability: Learner Model, Domain Model, and Adaptation Model. [27] integrates two secondary components into an adaptive e-learning system: the interaction module and the interaction data modeling component.
Table 3. Domain model
System | Model representation | Learning object standard | Application domain
MASPLANG | Hierarchical network: concepts, procedures, nodes and their relationship links | None | Computer Networks: TCP/IP protocols
INSPIRE | Hierarchical network: goals (topics to be learned), concepts (related lessons) and educational materials (facts, procedures, exercises) | ARIADNE metadata standard | Computer Architecture
iWeaver | Hierarchical network: seven lessons | None | Interactive Multimedia and Web Design
TANGOW | Hierarchical network: tasks, sub-tasks and educational materials | None | Theory of Computation
AHA! | Hierarchical network: concepts and their relationships (prerequisite) | None | Adaptive Hypermedia
eTeacher | Hierarchical network: course, unit, topics and reading materials | None | Artificial Intelligence
WELSA | Hierarchical network: chapter, sections, sub-sections and learning objects | Dublin Core and Ullrich Instructional Ontology [60] | Artificial Intelligence (constraint satisfaction problems)
Protus | Hierarchical network: topics, lessons, and educational materials | None | Principles of Programming (Java)
LearnFit | Hierarchical network: course, chapter, concept and learning object | IEEE LOM standard | Introduction to PHP Programming
This framework is limited in the case of an adaptive learning system for soft skills, since it does not give enough importance to the type of content taught and the methodology provided. On the other hand, [28] presents an adaptive e-learning framework named LearnFit (see Fig. 4) based on the micro-adaptive approach. Like the previous system, it first
Fig. 3. An adaptive e-learning framework.
collects the preferences of each learner using the Myers-Briggs Type Indicator (MBTI). In this system, each learner profile is associated with an appropriate learning methodology to adapt the format of the content to be delivered. However, the design of the course content by the teacher is not taken into account in these adaptive learning frameworks for the e-learning of soft skills.
Fig. 4. System architecture of LearnFit.
We need to consider adding an element that allows adaptation to the course content and validates the best teaching methodology for soft skills. In this regard, we propose the addition of an instructional model in order to achieve better adaptation for the adaptive e-learning of soft skills.
5.1 Instructional Model
Teaching models are a guide for educators to plan instruction [29]. It is fundamental to determine the best teaching model for the learners to ensure that it is appropriate for
classroom practice. At the same time, it must also be appropriate for the flow of the program. This model is a learner-centered, teacher-directed method that draws on the learners' expertise, together with techniques gathered from neurology and cognitive science research to enhance teacher instruction [14]. In order to improve our adaptive framework for implementing soft skills in adaptive teaching systems, teachers themselves need to plan and implement soft skills in their curricula and meet the requirements of learners. Soft skills implementation methodologies can be initiated by teaching soft skills formally and independently of the courses taught, around practical situations, which allows learners to devote dedicated time to learning soft skills. They can also be introduced by including them in existing courses; this could be a practical solution for teachers as it would require only minor changes to the structure of the courses. This model, in conjunction with the other main models, could integrate learning content for soft skills into adaptive learning systems, as represented by the adaptive framework in Fig. 5.
Fig. 5.
6 Conclusion
In this paper we have presented the essentials of an approach aimed at teaching and learning soft skills based on the strengths of adaptive learning systems. The next step is to define the learning objects according to the recommendations of competency-based learning and the learning styles. Later, we intend to implement the learning objects within an adaptive learning system that we will design as our study progresses.
References 1. Rosenberg, M.J.: E-Learning: Strategies for Delivering Knowledge in the Digital Age, vol. 9. McGraw-Hill, New York (2001) 2. Evers, V., Cramer, H., van Someren, M., Wielinga, B.: Interacting with adaptive systems. In: Babuška, R., Groen, F.C.A. (eds.) Interactive Collaborative Information Systems. Studies in Computational Intelligence, vol 281, pp. 299–325. Springer, Heidelberg (2010). https://doi. org/10.1007/978-3-642-11688-9_11 3. Popescu, E.: Diagnosing students learning style in an educational hypermedia system. In: Cognitive and Emotional Processes in Web-Based Education: Integrating Human Factors and Personalization Advance in WebBased Learning B Series. IGI Globol, pp. 187–208 (2009)
4. Kolb, D.A.: Experiential learning: Experience as the source of learning and development (1984) 5. Honey, P., Mumford, A.: Learning styles questionnaire. Organization Design and Development, Incorporated (1989) 6. Brusilovsky, P., Maybury, M.T.: From adaptive hypermedia to the adaptive web. Commun. ACM 45(5), 30–33 (2002) 7. Felder, R.M., Silverman, L.K.: Learning and teaching styles in engineering education. Eng. Educ. 78(7), 674–681 (1988) 8. Brusilovsky, P.: Methods and techniques of adaptive hypermedia. User Model Adapt. User Interact. 6(2), 87–129 (1996) 9. Duncan, A.: U.S. Secretary of Education (2009). http://www2.ed.gov/news/pressreleases/ 2009/02/02262009.html 10. David, H., Mirjam, K.: State of the art of adaptivity in e-learning platforms. In: Workshop at Adaptivity and User Modeling in Interactive Systems ABIS 2007 (2007) 11. Sun, P.-C., Tsai, R.J., Finger, G., Chen, Y.-Y., Yeh, D.: What drives a successful e-Learning? An empirical investigation of the critical factors influencing learner satisfaction. Comput. Educ. 50(4), 1183–1202 (2008) 12. Roy, S., Roy, D.: Adaptive e-learning system: a review. Int. J. Comput. Trends Technol. 1, 115–118 (2011) 13. Mohammad, A., Rachid, A., Robert, H.: Adaptivity in ELearning systems. In: The 8th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS 2014). Birmingham, United Kingdom, pp. 79–86 (2014). https://doi.org/10.1109/CISIS.2014.12 14. Peter, B.: Adaptive hypermedia for education and training. Adapt. Technol. Train. Educ. 2012(46), 46–68 (2012) 15. De Bra, P., Houben, G.-J., Wu, H.: AHAM: a Dexter-based reference model for adaptive hypermedia. In: Proceedings of the tenth ACM Conference on Hypertext and hypermedia: Returning to Our Diverse Roots: Returning to Our Diverse Roots, pp. 147–156 (1999) 16. Dieterich, H., Malinowski, U., Kühme, T., Hufschmidt, M.S.: State of the art in adaptive user interfaces. In: Hufschmidt, S.M., Kühme, T. Malinowski, U. (eds.) Adaptive User Interfaces - Results and Prospects. Elsevier Science Publications (1993) 17. Knutov, E., De Bra, P., Pechenizkiy. M.: AH 12 years later: a comprehensive survey of adaptive hypermedia methods and techniques. New Rev, Hypermedia Multimedia 15(1), 5–38 (2009) 18. Jameson, A.: Adaptive interfaces and agents. In: Human-Computer Interaction: Design Issues, Solutions, and Applications, vol. 105 (2009) 19. Schunk, D.H.: Learning Theories: An Educational Perspective. Macmillan, New York (1991) 20. Zhang, D.: Virtual Mentor and the lab system-toward building an interactive, personalized, and intelligent e-learning environment. J. Comput. Inf. Syst. 44(3), 35–43 (2004) 21. Brusilovsky, P.: Adaptive hypermedia. User Model. User-Adapt. Interact. 11, 87–110 (2001) 22. De Bra, P., Aroyo, L., Chepegin, V.: The next big thing: adaptive web-based systems. J. Digital Inf. 5(1) (2004) 23. Van Velsen, L., Van Der Geest, T., Klaassen, R., Steehouder, M.: User-centered evaluation of adaptive and adaptable systems: a literature review. Knowl. Eng. Rev. 23(03), 261–281 (2008) 24. Weibelzahl, S.: Evaluation of adaptive systems. User Model. 2001, 292–294 (2001) 25. Deep, S., Mohd Salleh, B., Othman, H.: Study on problem-based learning towards improving soft skills of students in effective communication class. Int. J. Innov. Learn. 25(1), 17–34 (2019) 26. Knutov, E., De Bra, P., Pechenizkiy, M.: AH 12 years later: a comprehensive survey of adaptive hypermedia methods and techniques. New Rev. Hypermedia Multimed. 15(1), 5–38 (2009)
27. Brusilovsky, P., Schwarz, E., Weber, G.: ELM-ART: an intelligent tutoring system on World Wide Web. In: Frasson, C., Gauthier, G., Lesgold, A. (eds.) Intelligent Tutoring Systems. ITS 1996. Lecture Notes in Computer Science, vol. 1086, pp. 261–269. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-61327-7_123 28. Self, J.A.: The defining characteristics of intelligent tutoring systems research: ITSs care, precisely. Int. J. Artif. Intell. Educ. 10, 350–364 (1999) 29. Clark, M., et al.: Automatically structuring domain knowledge from text: an overview of current research. Inf. Process. Manag. 48(3), 552–568 (2012) 30. Alshammari, M., Anane, R., Hendley, R.: Adaptivity in ELearning systems. In: The 8th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS 2014), Birmingham, United Kingdom, pp. 79–86 (2014). http://dx.doi.org/10.1109/CISIS. 2014.12 31. Bachari, E.E., Abelwahed, E.H., Adnani, M.E.: E-learning personalization based on dynamic learners. Preference (2011) 32. Connell, J.D.: The global aspects of brained- based learning. Educational Horizons, pp. 28–38 (2009)
A Comparative Review of Tweets Automatic Sarcasm Detection in Arabic and English Soukaina Mihi(B) , Brahim Ait Ben Ali, and Nabil Laachfoubi IR2M Laboratory, Faculty of Science and Techniques, University Hassan First of Settat, Settat, Morocco [email protected]
Abstract. Sentiment analysis has become a prevalent issue in the research community, with researchers employing data mining and artificial intelligence approaches to extract insights from textual data. Sentiment analysis has progressed from the simple task of classifying evaluations as positive or negative to sophisticated work that requires a fine-grained multimodal analysis of emotions, manifestations of aggression, sarcasm, hatred, and racism. Sarcasm occurs when the intended message differs from the literal meaning of the words employed; generally, the content of the utterance is the opposite of the context. Sentiment analysis tasks are hampered when a sarcastic tone is present in user-generated content. Thus, automatic sarcasm detection in textual data dramatically affects the performance of sentiment analysis models. This study aims to explain the basic architecture of a sarcasm detection system and the most effective techniques for detecting sarcasm in the English language; then, for the Arabic language, we identify the gap and the challenges. Keywords: sarcasm · automatic · sentiment analysis · deep learning · Arabic
1 Introduction
Internet users can express their opinions and appreciations through posts or written comments in text, image, or video format on social networking platforms. Users and organizations have made it a habit to mine data on the Internet to evaluate it and find out what people think about a particular item, product, or brand. Sentiment analysis is a branch of study that provides techniques for conducting this research and automatically extracting classes of favorable or unfavorable opinions, or going beyond that to assign an intensity to a text: neutral, extremely positive, slightly negative, and so on. Social media users frequently use irony to soften the blow of their statements or to add a comic aspect to their persona. Even for humans, sarcasm is difficult to detect; it confuses the class assignment of a text and might lead to a false judgment. For example, in the statement "it's this kind of surprise that I like, the price of diesel has exceeded gasoline, a first", "I like" is a positive expression that implies that what follows will be pleasant, yet the rising price of diesel is a
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 J. Kacprzyk et al. (Eds.): AI2SD 2022, LNNS 637, pp. 841–849, 2023. https://doi.org/10.1007/978-3-031-26384-2_74
negative fact that consumers do not appreciate. The negative context completely changes the sentiment class. Sarcasm is a form of irony in which people communicate their negative feelings through positive or intensified positive language. People from numerous cultures use sarcasm in everyday life to attain diverse communicative purposes. For example, the comment “I am happy to be able to work and spend money on transportation” is used as self-deprecating. At the same time, the post “That is just what I needed today” is a brooding expression about an adverse event. Thus, a person could use sarcasm to express attitudes, be humorous, mock others, create reactions, or act aggressively. As a result, speaker intent and attitudes might be thought of as being on a scale from negative to upbeat. Since sarcasm indicates sentiment, identifying sarcasm in a text is essential for determining the text’s true sentiment. The challenges of sarcasm and the importance of sarcasm detection in sentiment analysis have sparked interest in automatic sarcasm detection as a study topic. Automatic sarcasm detection refers to computational methods for determining whether a given text is sarcastic. As a result, the comment “Well, at least your mum thinks you are pretty.” should be classified as sarcastic, whereas the sentence "Well, at least your mom believes you will be late" should be classified as non-sarcastic. Because sarcasm can be presented in various ways, this task is tough to handle. Automatic sarcasm detection is primarily a classification problem of polarity (sarcastic, non-sarcastic). The automated process involves several lexical or pragmatic feature extraction, rule-based methods, and learning techniques. The lack of appropriate labeled data in Natural Language Processing, especially for the Arabic language, is a barrier to sarcasm detection. The deep learning approaches have emerged and shown excellent outcomes for English. Deep learning models are helpful because of their capability to learn independently. In order to inform and guide researchers and practitioners in this area, we carefully review articles on sarcasm detection in English. Furthermore, we examine some of the most well-known methods for detecting sarcasm recently applied to the English language. Finally, we discuss the obstacles that Arabic sarcasm detection algorithms face and future developments in this field.
2 Methodology Automatic sarcasm detection frameworks are based on four key components depicted in Fig. 1: data collection, data preparation, feature engineering, and classification.
Fig. 1. Components of automatic sarcasm detection frameworks
2.1 Data Collection
Data collection is the first step in any sarcasm detection project. On social networks, access to data is hindered by accessibility and privacy restrictions. Popular datasets are commonly extracted from Twitter. Twitter developer APIs provide tools for constructing a Twitter dataset; they can generate either a broad, untargeted sample of what is available or a set of targeted samples based on hashtags, usernames, and location. In order to collect data with sarcastic intent, researchers often use hashtags such as "#sarcasm", "#sarcastic", and "#irony".
2.2 Data Preparation
Data preparation is the process of reducing noise in the collected data. The data collected from various platforms, such as Twitter, Maktoob, and Facebook, is dispersed and unstructured. As a result, data preprocessing is one of the most critical steps in detecting sarcasm. The most prominent preprocessing techniques are tokenization, stop word removal, and lemmatization or stemming. Transforming sentences into words is referred to as "tokenization". Terms are converted into their stem or root form during stemming and lemmatization. Stop words, such as articles, are removed during stop word removal. Another data preprocessing technique used for detecting sarcasm is POS (part-of-speech) tagging.
2.3 Feature Engineering
Several methods are used at this stage to extract features from a cleaned dataset. Sarcasm detection systems employ methods based on the frequency of words to assign a relative score, such as TF-IDF, N-grams, and bag of words. However, because sarcasm detection is context-dependent, methods based on context, such as word embeddings and transformers, yield better results.
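As a concrete illustration of the preparation and frequency-based feature-engineering steps described above, the following minimal Python sketch tokenizes tweets, removes stop words, stems the remaining terms, and turns the result into TF-IDF vectors. The library choices (NLTK and scikit-learn) and the example tweets are our own assumptions, not tools prescribed by the surveyed works.

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("stopwords", quiet=True)

tweets = [
    "Great, another Monday morning meeting #sarcasm",
    "Congratulations to the whole team on the release!",
]

tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True)
stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(tweet: str) -> str:
    """Tokenize, drop stop words (hashtag tokens are dropped by the isalpha filter), stem the rest."""
    tokens = tokenizer.tokenize(tweet)
    kept = [t for t in tokens if t.isalpha() and t not in stop_words]
    return " ".join(stemmer.stem(t) for t in kept)

cleaned = [preprocess(t) for t in tweets]

# Frequency-based features: unigrams and bigrams weighted by TF-IDF.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
features = vectorizer.fit_transform(cleaned)
print(features.shape)
```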
844
S. Mihi et al.
Feature selection is a dimensionality reduction technique in which relevant features are chosen and unnecessary or redundant features are removed. Reducing the input dimensionality can increase performance by lowering the model's learning time and complexity or by enhancing the generalization and accuracy of the classification. In addition to reducing the overall cost of measurement, finding appropriate features helps improve the understanding of sarcastic utterances.
2.4 Classification
The detection of sarcasm can be reduced to a binary classification task if there are two categories: sarcastic or not. Alternatively, it becomes a multi-way classification when there are several categories of sentiment, including sarcasm. Sarcasm classification can be performed by four families of methods: machine learning methods, rule-based methods, deep learning methods, and hybrid methods. Classification approaches are further discussed in Sect. 4.
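Continuing the previous sketch, a binary sarcasm classifier of the machine-learning family could, for instance, be trained on TF-IDF features with a linear SVM. The data split, toy texts, labels, and hyperparameters below are illustrative assumptions, not settings taken from the surveyed papers.

```python
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# cleaned_tweets: preprocessed texts; labels: 1 = sarcastic, 0 = non-sarcastic
cleaned_tweets = ["great anoth monday morn meet", "congratul whole team releas",
                  "love wait hour bus rain", "thank everyon came talk"]
labels = [1, 0, 1, 0]

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("svm", LinearSVC(C=1.0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    cleaned_tweets, labels, test_size=0.5, random_state=0, stratify=labels)

model.fit(X_train, y_train)
print(f1_score(y_test, model.predict(X_test)))
```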
3 Datasets
In this section, we present a selection of English and Arabic datasets and compare their size, collection method, and use for automatic sarcasm detection.
3.1 English Datasets
Twitter is a famous microblogging network within the Natural Language Processing community. Users post tweets, which are 140-character messages about a variety of topics ranging from personal life to politics. Twitter provides an API for extracting messages, making it easier to create datasets of short utterances. We observed that the first English dataset for sarcasm identification dates back to 2010, when Davidov et al. [1] employed the Amazon Mechanical Turk service with 15 annotators to label a dataset collected from Twitter. They created a gold standard corpus combined with self-annotated tweets based on the "#sarcastic" hashtag. However, hashtags in the middle or at the beginning of tweets result in noisy annotations. To address this issue, the authors of [2] filtered out all tweets in which the hashtags of interest were not located at the very end of the message, in order to keep only tweets that express sarcasm, like "thanks. I can always count on you for comfort :) #sarcasm". The final balanced dataset consists of 2700 tweets divided into 3 categories: positive, negative, and sarcasm. The work done in [3] arises from the contrast of positive words with negative content. It consists of 1600 tweets with the "#sarcasm" and "#sarcastic" hashtags and 1600 other tweets. To ensure high-quality annotation, three annotators manually labeled a gold standard from the whole dataset, resulting in 713 sarcastic tweets. A similar approach presented in Sem-Eval-2018 Task 3 [4] resulted in a balanced dataset that contains 4792 sarcastic and non-sarcastic tweets. These tweets were initially collected based on the hashtags "#sarcasm", "#irony", and "#not", regardless of their position in the tweet, and were then annotated by three linguistics students using a fine-grained annotation scheme.
Whether using manual annotation or a supervised hashtag approach, determining whether a statement is intended to be sarcastic or is merely perceived as such by the reader is difficult. The research of Oprea et al. [5] involves creating a dataset of intended sarcasm called the iSarcasm dataset. For data collection, the authors asked Twitter users to reference one tweet meant to be sarcastic and three others that did not hide any ironic intent. They thus built a dataset of cleaned survey responses containing 777 sarcastic and 3707 non-sarcastic tweets. At the same time, they relied on annotators to annotate these tweets in order to compare the two views provided by the tweet creators and the external annotators. Evaluating state-of-the-art models on the new dataset showed low performance compared to the perceived-sarcasm dataset: the best model gave an F1 score of 0.364 against 0.616 for the same manually annotated dataset. Because most of the work on automatic sarcasm detection focused on English, some academics have attempted to apply it to other resource-poor languages like Czech. The authors of [6] used the Twitter search API to collect 780k English tweets. A distant supervision system based on the "#sarcasm" hashtag was utilized to detect sarcastic tweets. The analysis produced two distributions of 100k filtered tweets, one balanced and one imbalanced. Other studies concerned low-resource languages compared with English, such as Hindi [7], Spanish [8], and Filipino [9].
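The hashtag-position filter used in [2] can be sketched as follows; this is a simplified illustration, and the exact rules of the original work may differ.

```python
SARCASM_TAGS = ("#sarcasm", "#sarcastic")

def ends_with_sarcasm_tag(tweet: str) -> bool:
    """Keep a tweet only if a sarcasm hashtag is its final token."""
    tokens = tweet.strip().split()
    return bool(tokens) and tokens[-1].lower() in SARCASM_TAGS

tweets = [
    "thanks. I can always count on you for comfort :) #sarcasm",   # kept
    "#sarcasm is hard to detect automatically",                     # dropped
]
sarcastic = [t for t in tweets if ends_with_sarcasm_tag(t)]
print(len(sarcastic))   # -> 1
```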
Whether using a manual annotation or a supervised hashtag approach, determining whether a statement is intended to be sarcastic or is perceived as such by the reader is difficult. Vlad Oprea’s et al. [5]. Research involves creating a dataset of intended sarcasm called the iSarcasm dataset. The authors asked Twitter users to reference one tweet meant to be sarcastic and three others that did not hide any ironic intent for data collection. Thus, they built a dataset of cleaned survey responses containing 777 sarcastic and 3707 non-sarcastic tweets. At the same time, they relied on annotators to annotate these tweets to compare the two views provided by the tweets creator and the external annotators. The compilation of state-of-the-art models on the new dataset showed low performance compared to the perceived dataset. The best model gave an F1 score of 0.364 against 0.616 for the same manually annotated dataset. Because most of the work on automatic sarcasm detection focused on English, some academics have attempted to apply it to other resource-poor languages like Czech. The authors [6] used the Twitter search API to collect 780k English tweets. A distant supervision system based on the “#sarcasm” hashtag was utilized to detect sarcastic tweets. The analysis produced two distributions of 100k balanced and imbalanced filtered tweets. Other studies concerned low-resource languages by comparing them with English such as Hindi in [7], Spanish [8] and Filipino [9]. 3.2 Arabic Datasets The problem of automatic sarcasm detection is a very recent issue in Arabic. Arabic is one of the most morphologically complex languages due to its inflectional nature. Given the high immersion of Arabic in social networks, several shared tasks have been organized to handle sarcasm detection in Arabic. Not far away, the first shared task on irony detection in English was organized during Sem-Eval 2018 [4]. Two classifications were addressed: the first one is the detection of sarcastic tweets, and the other one, in addition to the classification of tweets, finding the type of irony expressed. The best F1 score obtained on the former task is 0.71 against 0.51 for the latter. The 11th meeting of Forum for Information Retrieval Evaluation 2019 [10] was organized in India, and it involved the first sarcasm detection task in the Arabic language. The forum shared the IDAT dataset that consisted of 5030 tweets collected using (Masquerade, Mockery, Taunt). Arabic hashtags Ten participants submitted their run for the binary classification task, their approaches consisted of machine and deep learning methods with different techniques for feature weighting ranging from traditional bag of words to sophisticated embeddings. The best F1 score measure was obtained by an ensemble model [11] combined with TF-IDF ngram features showing that traditional ensemble methods outperformed the deep learning techniques for this task. The paper [12] introduced a domain-specific corpus collected from political tweets. Initially, the authors used the names of political candidates of the presidential elections in Egypt and the US and then collected the tweets containing them based on the same hashtags as in the paper [10] for sarcastic tweets and without those hashtags for nonsarcastic tweets. The imbalanced dataset resulting from the collection and cleaning process contains 5479 tweets, of which 1733 are ironic. It contains a mix of modern standard Arabic and dialectal Arabic, especially Egyptian. For validation purposes, the
authors constructed four groups of features (surface, sentiment, shifter, and internal context features) and obtained satisfactory results, similar to those obtained for English, French, and Japanese, with an F1 score of 0.72. The Sixth Arabic Natural Language Processing Workshop (WANLP 2021) [13] hosted two shared tasks with 40 participating teams; one of these tasks was the identification of sarcasm and sentiment in the Arabic language. ArSarcasm [14] is the name of the shared dataset. This dataset contains 10547 tweets, of which 1687 are sarcastic, and the tweets are labeled for sentiment and dialect. The approach adopted for the constitution of the ArSarcasm dataset is different from the previous ones. In effect, the authors considered the re-annotation of datasets commonly used in the literature for the task of subjectivity and sentiment classification, namely ASTD [15] and SemEval 2017 [16]. The authors used a crowdsourcing method, sharing guidelines with Arabic annotators to label the datasets according to three labels: sentiment, sarcasm and dialect type. Abuteir et al. [17] collected a balanced dataset consisting of 10000 sarcastic tweets that were posted between 2010 and 2020. The dataset is about sports and politics. It was compiled using hashtags synonymous with sarcasm or irony expressed in standard Arabic or Arabic dialect and was automatically annotated by the tweeters. Likewise, the authors of [18] presented the first Moroccan dataset that also handles sarcastic tweets. The Moroccan Sentiment Twitter Dataset (MSTD) contains 2188 sarcastic tweets manually annotated by three native speakers. A similar dataset was collected in the context of COVID-19, called AraCOVID19-SSD. This dataset considers sentiment analysis and sarcasm detection of over 300k collected tweets containing terms related to the COVID-19 virus in the Arabic language. The distribution of the dataset is imbalanced, with 1802 sarcastic tweets and 3360 non-sarcastic tweets. Experiments showed that the SVM classifier and the AraBERT [19] transformer yielded very successful results for the sarcasm detection task, reaching an F1-score of 0.95.
4 Classification Approaches
This section presents a selection of works related to the detection of sarcasm in Arabic and English, focusing on the classification methods and the results obtained.
4.1 English
Rule-based classification models are based on the correlation between indicators or rules and the presence of sarcastic intent. Thus, we seek syntactic, semantic, or other markers to judge a text as sarcastic. As such, the paper [20] investigates the effect of detecting sarcastic tweets on the sentiment analysis results. The proposed system recognized the presence of sarcasm within a tweet by using a hashtag tokenizer and then compiled several rules deduced from a fine-grained analysis of collected tweets. Furthermore, the producers of the Riloff dataset [3] considered learning positive sentiments and negative situations, with the assumption that a sarcastic tweet is expressed by a positive expression contrary to a situation which is perceived to be negative. The most common method for detecting sarcasm is machine learning. The majority of models in the literature are based on SVM classifiers, naive Bayes, or logistic regression. The work in [21] evaluated an SVM-based classifier with four features: lexical,
pragmatic, explicit incongruity and implicit incongruity, on two datasets, and compared the results with the best reported evaluations. Their SVM-based model showed an improvement in F1 score. Correspondingly, the authors of [22] investigated the representativeness of different patterns for irony detection. They performed classification using Naïve Bayes and Decision Tree on two distributions of datasets, balanced and imbalanced. The obtained results improved each time a feature was added, which generates a probabilistic behavior that is very dependent on the selected features. With the growing success of deep learning in Natural Language Processing applications, multiple papers addressing various deep learning algorithms for sarcasm detection have been proposed in the last five years. Le Hoang et al. [23] proposed a deep learning model based on a hybrid of soft attention-based bidirectional long short-term memory and a convolutional neural network, with semantic word embeddings using GloVe. They benchmarked the proposed model on two large-scale datasets, including 15000 sarcastic tweets, and obtained an accuracy of 97.87%. In another work, Ghosh et al. [24] compared a recursive SVM model with a set of features against a deep learning model for modeling the semantics of words in documents containing sarcasm, and evaluated the obtained results against the approach of Tsur et al. [1].
4.2 Arabic
The contribution described in [25] presented a hybrid system based on rule-based features and an ensemble model including embeddings from MarBERT [26] and Mazajak [27]. The weighted prediction was calculated from three models: Gaussian Naïve Bayes, Bi-LSTM, and MarBERT. The model was evaluated for sarcasm detection and sentiment analysis, giving a lower F1 score for sarcasm detection, with 0.51 against 0.71 for sentiment categorization. Meanwhile, Allam et al. [28] proposed a machine learning model for the same tasks. They used traditional machine learning algorithms for the classification phase, including SVM, Logistic Regression, Naïve Bayes and Stochastic Gradient Descent. Their model outperformed other deep learning models for sarcasm detection, providing further evidence that SVM is an excellent choice for sentiment analysis tasks. Similarly, the paper [29] used a Weka classifier model for sarcasm detection on a dataset collected from Saudi tweets. The experiments achieved an F1-score of 0.676 by exploiting some syntactic features used in sarcastic tweets. Recently, SemEval 2022 included the iSarcasmEval shared task dealing with intended sarcasm in Arabic and English, following the work of [5], where the dataset is labeled by the tweet authors themselves. Intended sarcasm is a new field of research in both the Arabic and English languages.
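Although the surveyed systems differ in features and languages, most of the classical machine learning baselines mentioned above, for both English and Arabic, share the same skeleton: vectorize tweets with TF-IDF n-grams and train a linear classifier such as an SVM. The following minimal sketch illustrates that generic pipeline with scikit-learn; the file name and column names are placeholders, not those of any dataset cited in this review.

```python
# Illustrative TF-IDF + linear SVM baseline of the kind surveyed above.
# Assumes a CSV with hypothetical "text" and "sarcastic" (0/1) columns.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

df = pd.read_csv("tweets.csv")  # placeholder file, not a dataset cited here
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["sarcastic"], test_size=0.2,
    stratify=df["sarcastic"], random_state=0)

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),  # word uni- and bi-grams
    ("svm", LinearSVC(C=1.0)),
])
model.fit(X_train, y_train)
print("F1 on the held-out split:", f1_score(y_test, model.predict(X_test)))
```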
5 Conclusion
Sarcasm is a figurative form of writing very common on social networks. It constitutes a challenge for researchers in Natural Language Processing because of the nuances of expression and context it represents. In the last five years, research in sarcasm detection
has increased exponentially due to the importance of this task for the proper interpretation and analysis of subjectivity and feelings. This work has discussed the advances in automatic sarcasm detection by highlighting the difference between English and Arabic. Publications and shared tasks show the growing interest in the Arabic language despite the scarcity of large-scale and publicly available resources and datasets.
References 1. Davidov, D., Tsur, O.: Semi-supervised recognition of sarcastic sentences in Twitter and Amazon, pp. 107–116 (2010) 2. González-ibáñez, R.: Identifying sarcasm in Twitter : a closer look. identifying sarcasm in Twitter : a closer look, March 2014 (2011) 3. Riloff, E., Qadir, A., Surve, P., De Silva, L., Gilbert, N., Huang, R.: Sarcasm as contrast between a positive sentiment and negative situation, pp. 704–714 (2013) 4. Van Hee, C., Lefever, E.: SemEval-2018 task 3 : irony detection in English tweets, pp. 39–50 (2018) 5. Oprea, S.V.: iSarcasm: A Dataset of Intended Sarcasm (2016) 6. Habernal, I., Hong, J., Republic, C., Science, C.: Sarcasm detection on Czech and English Twitter, pp. 213–223 (2014) 7. Swami, S., Khandelwal, A., Singh, V., Sarfaraz, S.: A corpus of English-Hindi code-mixed tweets, pp. 1–9 (2018) 8. Walker, T.M., Justo, R.: Detection of Sarcasm and Nastiness : New Resources for Spanish (2018) 9. Samonte, M.J.C., Dollete, C.J.T., Capanas, P.M.M., Flores, M.L.C., Soriano, C.B.: Sentencelevel sarcasm detection in English and Filipino Tweets, pp. 181–186 (2018) 10. Ghanem, B., Karoui, J., Benamara, F., Rosso, P.: IDAT @ FIRE2019 : overview of the track on irony detection in Arabic tweets, pp. 12–15 (2019) 11. Khalifa, M., Hussein, N.: Ensemble learning for irony detection in Arabic tweets, pp. 12–15 (2019) 12. Karoui, J., Zitoune, F.B., Moriceau, V.: SOUKHRIA: towards an irony detection system for Arabic in social media. Procedia Comput. Sci. 117, 161–168 (2017) 13. WANLP 2021: The Sixth Arabic Natural Language Processing Workshop Proceedings of the Workshop (2021) 14. Abu-farha, I., Magdy, W.: From Arabic Sentiment Analysis to Sarcasm Detection : The ArSarcasm Dataset, pp. 32–39 (2020) 15. Nabil, M., Aly, M., Atiya, A.: ASTD: Arabic Sentiment Tweets Dataset, pp. 2515–2519 (2015) 16. Rosenthal, S., Farra, N., Nakov, P.: SemEval-2017 Task 4 : Sentiment Analysis in Twitter SemEval-2017 Task 4 : Sentiment Analysis in Twitter (2017) 17. Abuteir, M.M., Elsamani, E.S.A.: Automatic Sarcasm Detection in Arabic Text : A Supervised Classification Approach (2021) 18. Mihi, S., Ait, B., El, I., Arezki, S., Laachfoubi, N.: MSTD: Moroccan sentiment Twitter dataset”. Int. J. Adv. Comput. Sci. Appl. 11(10), 363–372 (2020) 19. Antoun, W., Baly, F., Hajj, H.: AraBERT: transformer-based model for Arabic language understanding (2020) 20. Maynard, D.: Who cares about sarcastic tweets ? Investigating the impact of sarcasm on sentiment analysis This is a repository copy of Who cares about sarcastic tweets ? Investigating the impact of sarcasm on sentiment analysis. White Rose Research Online URL for this paper , March 2014 (2019)
21. Joshi, A., Sharma, V., Bhattacharyya, P.: Harnessing context incongruity for sarcasm detection 2003, 757–762 (2015) 22. Reyes, A., Rosso, P., Veale, T.: A multidimensional approach for detecting irony in Twitter, pp. 239–268 (2013) 23. Son, L.E.H., Nayyar, A.: Sarcasm detection using soft attention-based bidirectional long short-term memory model with convolution network. IEEE Access 7, 23319–23328 (2019) 24. Ghosh, A., Veale, T.: Fracking sarcasm using neural network, pp. 161–169 (2016) 25. Gaanoun, K.: Sarcasm and sentiment detection in Arabic language : a hybrid approach combining embeddings and rule-based features, no. Figure 1, pp. 351–356 (2021) 26. Abdul-mageed, M., Elmadany, A., Moatez, E., Nagoudi, B.: ARBERT & MARBERT: deep bidirectional transformers for Arabic, no. i (2020) 27. Farha, I.A.: Mazajak : an online Arabic sentiment analyser, pp. 192–198 (2019) 28. Allam, A.H., Abdallah, H.M., Amer, E., Nayel, H.A.: Machine learning-based model for sentiment and sarcasm detection, Ml, pp. 386–389 (2021) 29. Al-ghadhban, D.: Arabic Sarcasm Detection in Twitter (2017)
Mobile Payment as a Lever for Financial Inclusion Hanane Azirar1(B) , Bouchra Benyacoub1 , and Samir Aguenaou2 1 USMBA, Fez, Morocco [email protected] 2 Al Akhawayn University, Ifrane, Morocco [email protected]
Abstract. Over the past decade, the international community and countries around the world have undertaken concerted efforts to develop inclusive finance. The challenge is to create a financial system that is accessible to all while promoting stable, sustainable and equitable progress. Even in countries where financial inclusion is quite high, mobile money can work. Nonetheless, bank account ownership, ATM access, mobile banking and online banking are inversely related to mobile money ownership. In this context, Morocco launched mobile payments in 2016. Many of the strategies implemented so far have struggled to overcome the challenges. Nevertheless, since the 2000s, the emergence of mobile money has given finance a key role in this effort. Nowadays, it is possible to make some initial evaluations of the progress. This study assesses the effect of mobile money on reducing inequalities in access to digital financial services. A model composed of 7 variables was elaborated based on the SPSS 22 statistical software in order to draw up a questionnaire, which was randomly sent out to a sample population in Morocco, of which 156 people responded. The analysis reveals that mobile money significantly reduces inequalities in access to financial services. Indeed, the study shows that there are two main variables which have an effect on reducing inequalities in access to financial services, namely age and income. It also shows that lack of trust, supply and financial literacy also play a determining role in financial exclusion. Keywords: Morocco · Mobile money · Crisis · Reducing inequalities · Access to financial services · Financial inclusion
1 Introduction
According to the World Bank indicators, the growth rate (GDP%) of sub-Saharan African countries dropped from 5.5% to 2.28% between 2010 and 2019, a downturn which is due to a persistently high rate of poverty. According to the World Bank's report on Poverty and Shared Prosperity 2018: Completing the Poverty Puzzle, more than half of the world's poor lived in sub-Saharan Africa, and it was in this region, along with South Asia, that 85% of poor people were concentrated. The remaining 15%, or
about 106 million people, lived in the other four regions of the world. In all regions except sub-Saharan Africa, average poverty rates range from 1.5 to 12.4%, whereas in sub-Saharan Africa about 41% of the population lives below the international poverty line, set at $1.90 per person per day, based on 2011 purchasing power parity (PPP) conversion rates. Several theoretical works have addressed the impacts related to the adoption of mobile money. In this context, the uneven distribution of wealth in African countries has the major effect of generating even more inequality. To mitigate this conundrum, several international institutions and entities (World Bank, IMF, UNDP, OECD…) have presented development strategies so that financial development becomes a catalyst for reducing financial exclusion and inequality. The literature (Aker and Mbiti 2010; Aker and Blumenstock 2015) has addressed the different mechanisms by which mobile money is expected to bring about economic benefits. Over the last decade, Morocco has undertaken several initiatives to expand access to different financial services for the benefit of broader segments of the population, individuals as well as small businesses. Indeed, financial inclusion has come to be regarded as a necessity for development, especially as far as the use of digital financial services through a cell phone or a smartphone is concerned. This is considered a major input for the development of financial policies, so that the country may build a financially more inclusive and sustainable system. In 2014, a first step was taken by Bank Al Maghrib (BAM) to develop a strategy for the development of mobile payment systems, after introducing payment institutions as a new category of financial sector players into the national banking law. During the Covid-19 health crisis, under lockdown conditions intended to reduce the number of infections, the use of monetary and financial transactions through a cell phone became a necessity. Indeed, the impact of the use of mobile money in times of crisis, in urban and rural areas and across the socio-economic situations of the population, has been highly positive in reducing inequality and financial exclusion, contributing to a more inclusive and sustainable system. These elements bring up two questions: did the crisis have an effect on the use of mobile money to reduce inequality in access to financial services? And did mobile money have an effect on the level of financial inclusion? To answer these questions, this paper examines the crisis period in order to determine the effect the crisis had on mobile payment, as well as its effect on reducing inequality in access to financial services in Morocco, among a sample population of 170 people. The first section presents the theoretical framework of mobile money; the second section presents the methodological framework and the current state of the use of mobile money in Morocco and the reduction of inequality during the Covid-19 crisis. The last section is devoted to the results and discussion, and finally a conclusion of this study is given.
2 Theoretical Framework
2.1 Literature Review
There has been a multiplicity of recent economic literature dealing with the advantages of financial inclusion for improving people's living conditions as well as their further economic development. Most of these analyses point out the factors that influence the adoption of formal banking services such as mobile money. Thanks to its versatility, the cell phone is widely used as a tool that simplifies access to financial services, even for a growing number of people who are currently not included in the banking system. The capacity of mobile financial services to become an instrument which allows greater access to the financial system for people who do not hold bank accounts is the object of several articles in the literature. Most of these studies address the issue of the adoption of mobile money, a recent financial innovation that provides financial transaction services through people's cell phones, especially for poor people throughout the world. Mobile phone technology is widespread in the developing world, "leapfrogging" and bypassing the services provided by the formal banking sector due to the precariousness of institutional infrastructures and the high cost of traditional banking services (Aron, Leapfrogging: a Survey of the Nature and Economic Implications of Mobile Money, January-2017) (Aron, Mobile money and the economy: a review of the evidence 2018). Along the same line of thought, one study examined the major determining factors behind the adoption of mobile money and the policies to be enacted in order to reduce the bottlenecks caused by the weak inclusion of digital finance in the UEMOA zone (Senou 2019). Another similar study was carried out on the determining factors behind the process of adoption of mobile money and on whether its use has helped households in Togo become more resilient to expected and unexpected living expenses. The results from ordered and sequential logit models showed that during the adoption process, households tended to benefit from weak social links, such as religious groups and other informal groups, to enhance the transition over to mobile money. Likewise, the fact of being a client of a bank or of a microfinance institution constitutes a powerful channel to pass from one period to another. Furthermore, the results also showed that households that used mobile money appeared to be more resistant not only to climate shocks such as droughts, floods, soil degradation and a lack of soil fertility, but also to other non-climate problems that affected them. However, the greatest contrasts are seen when individuals are classified into underprivileged groups such as rural populations, women, the under-educated and the impoverished (Afawubo et al. 2019). Also, another study looked into whether providing information on the complete range of services offered by the platform would bring about an increase in the use of mobile money. To that end, the study was carried out in the Ashanti region in Ghana. Effectively, education related to mobile money had a significant positive impact on its use for recent transactions. However, there is no conclusive evidence that the workplace had a significant impact on the opening of new mobile money accounts or on the proportion of transactions carried out via mobile money. Likewise, in terms
of impact, weak and unsteady results were observed in the case of remittances sent after the interventions (Apiors and Suzuki 2022). According to Lenka (Lenka 2021), the financial sector can be examined from two different points of view: financial development (depth and liquidity) and financial inclusion (access to services). The literature presents several definitions of financial inclusion. According to the Banque Centrale du Maroc (Banque Centrale du Maroc 2022), this is a multi-faceted concept consisting of several components whose degree of importance is determined by the context of each individual country, namely:
– Access: the capacity to use the services offered by formal institutions.
– Quality: how well such financial services and products respond to customers' needs. This is reflected in the customers' experience as shown through their attitudes and opinions towards available services and products.
– Use: how often these financial services and products are used.
– Wellbeing: the impact the use of these services and products has on the customers' quality of life.
Theoretical fundamentals. The theory of restricted access/limited control (RALC) over privacy (confidentiality) recognizes that when a parameter or a privacy zone is configured to restrict outside access to personal information, strict controls are necessary. The privacy policy of a web site protects user information in particular situations by controlling outside access. The implementation of RALC enables the creation of an online privacy policy that is broad enough to handle a wide range of digital transactions linked to privacy. These user digital protection measures could create the proper atmosphere for transactions on mobile money platforms in order to promote greater financial inclusion (Moor, The Ethics of Privacy Protection, 1990) (Moor, Towards a theory of privacy, 1997). According to Louise Malady (Louise Malady, Building Consumer Demand for Digital Financial Services: The New Regulatory Frontier 2014), mobile money services have shown a great potential for bringing about financial inclusion. The image of mobile financial service providers, the level of customer trust and the value of the services offered influence the adoption of mobile money services and products. However, the use of mobile money financial services introduces new operational and technical risks for banking entities, such as new forms of fraud and corruption, inappropriate product design and system breakdowns.
2.2 The Concept of Mobile Money
Brack (2013) defined mobile banking as, in the strict sense of the term, financial services offered by banks via cell phone. In this case, it is used mainly for balance inquiries, bill payment and money-transfer services. In a broader sense, the concept extends to all financial services that can be offered, with or without a bank account, via mobile, SMS or smartphone applications by any institution approved for this purpose. Mobile payment is defined as "a transfer of funds, in exchange for a good or service, where the cell phone is used to initiate and confirm the payment" (Tobbin 2011, p. 188).
So mobile money is a new mode of payment: a financial service provided by credit institutions (banks) through the use of a cell phone or other mobile device. These transactions pass through mobile operator networks and also cover several types of payments through Internet channels. Financial operations over the cell phone take two essential forms, namely the exchange of financial information and the carrying out of financial transactions. Mobile money is therefore a means that promotes financial inclusion and allows agents (individuals or companies) to achieve easy access to financial services from dedicated applications, accessible at any time regardless of their area of residence.
2.3 State of the Art of Mobile Payment in Morocco
In 2018, the mobile payment mode was launched in Morocco. Its generalized use at all levels, however, has encountered many obstacles, since its full application requires a technical and structural infrastructure. In several countries, mobile payments have been an important, even central, vector of financial inclusion. They have achieved very high levels of account penetration and have been positioned as the main vehicle for "financialization" for a significant portion of the population. The success of this model can be attributed to the attractiveness of the value proposition for users, particularly for less affluent households:
This committee was given the task of defining the necessary guidelines for the market to ensure the proper function of inter-operated transactions, conducting syndication and refining the business model for an effective launch of the solution in 2018. At the same time, BAM has granted licenses to payment institutions whose applications have been examined and evaluated in accordance with legal and regulatory requirements. These institutions rely on principal retail agents to strengthen the capillarity of their networks and to serve under-privileged or even excluded segments of the informal sector. In addition, it should be noted that the typologies of payment accounts have been defined, with simplified KYC levels compared to bank accounts and differentiated according to the amounts deposited, (The national strategy for financial inclusion 2019). Mobile money is among the new technologies that promote high savings penetration. In this sense, several works (Asongu 2018) have shown that the use of mobile money reduces income inequality in developing countries. Therefore, we find four main channels of transmission through the use of mobile money: transfer, credit and savings, payment and insurance. These channels of transmission have an important role in universal financial inclusion for individuals and SMEs. 2.4 Mobile Payment and Financial Inclusion in Morocco The national strategy for financial inclusion aims to move a large category of the informal sector into the country’s formal economy. For several years now, Morocco has made financial inclusion one of the levers of the country’s economic and social development through facilitating access of the vulnerable population groups (women, SMEs, youth) to diverse financial services. In order to increase financial inclusion, mobile payment systems were launched in Morocco in 2016. These have had a very high penetration in Morocco and has enabled a greater accessibility to the unbanked population. In 2018, the central bank (BAM) and the National Telecommunications Regulatory Agency set the regulatory groundwork for mobile payment, 18 approvals having been granted so far. In 2019, the objectives were set to further improve access to financial services and increase financial account penetration from 34% to 47% of the adult population within five years. Morocco has thus become the 18th country in Orange’s Africa and Middle East zone to offer the “dematerialized wallet solution,” as noted by the mobile operator Orange.Moreover, the objective of inclusive finance goes beyond a simple combination of the rate of banking and the penetration rate of telecom lines. The inclusive model has a strategic challenge: to accelerate the widespread use of digital financial services. For example, the operator Inwi was the first to deploy mobile money in Morocco in September 2019. Among the advantages of mobile money are its low cost, immediacy and wide distribution. In other words, digitization systematically ensures the removal of many obstacles through low-cost transactional services.
In Morocco, the health crisis has accelerated the development of the mobile payment business, with 1.5 million mobile wallets having become operational by the end of September, according to Bank Al Maghrib figures. Therefore, we can say that mobile money is a major stimulating factor towards access to financial services which consist of banking transactions carried out over a cell phone. This is a promising way to overcome the low penetration of formal financial services and remove the barriers imposed by the traditional channels that penalize a large customer base, both individuals and businesses.
3 Methodological Framework
3.1 Presentation of Variables
To conduct our analysis (mobile payments as a lever for financial inclusion), we found it useful to use two types of variables. The first is a dependent variable, that of financial inclusion, which we will try to explain. The second is made up of seven explanatory or independent variables behind the use of mobile money: Educational level, Gender, Location, Socio-professional category, Financial education, Age and Income level.
3.2 Analysis Tools
The analysis is performed with the SPSS 22 software using the following tests:
– The Chi-square test of statistical inference, which measures the degree of association between categorical variables, and the Phi and Cramér contingency coefficient, which assesses the intensity and strength of the relationship.
– The ANOVA method, developed by Fisher, which makes it possible to account for variations in the dependent variable by a single explanatory factor. The analysis is completed by the R2 coefficient of determination, which gives an idea of the existence of a relationship among variables.
3.3 Characteristics of the Sample
The questionnaire was randomly addressed to 250 people through several channels (email, social networks, etc.). 170 responses were received, of which 58.8% were from men and 41.2% from women. 156 respondents (91.8%) had a bank or mobile account, while 14 people (8.2%) do not have a bank account. So the sample to be estimated consists of 156 people. This is a simple random sample based on information collected via questionnaires.
3.4 Results and Discussion
The first point of this section presents the results of the estimations, the second presents the test of the hypotheses made in this work, and the third point focuses on the discussion of the results.
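Before turning to the results, the association measures described in Sect. 3.2 can also be computed outside SPSS with standard Python libraries. The sketch below is only an illustration: the file name and column names are hypothetical, not those of the actual survey.

```python
# Chi-square test, Cramér's V and one-way ANOVA for survey variables.
# Column names ("gender", "income_level", "uses_mobile_money") are hypothetical.
import numpy as np
import pandas as pd
from scipy import stats

def cramers_v(x, y):
    """Chi-square test of independence plus Cramér's V effect size."""
    table = pd.crosstab(x, y)
    chi2, p, dof, _ = stats.chi2_contingency(table)
    n = table.to_numpy().sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k)), p

df = pd.read_csv("survey.csv")  # placeholder export of the questionnaire
v, p = cramers_v(df["gender"], df["uses_mobile_money"])
print(f"Cramér's V = {v:.2f}, p-value = {p:.3f}")

# One-way ANOVA across income groups (illustrative use of the Fisher test).
groups = [g["uses_mobile_money"].to_numpy() for _, g in df.groupby("income_level")]
f_stat, p_anova = stats.f_oneway(*groups)
print(f"ANOVA F = {f_stat:.2f}, p-value = {p_anova:.3f}")
```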
From the results obtained during the econometric analysis, we can state that all variables have a non-significant relationship (p less than 0.05). This was confirmed by the tests of the strength of relationship (Cramér's V/phi, coefficient of determination R), which indicated that the strength of relationship is low, except for the Gender variable, which has a moderate influence over the use of mobile money. From the estimation results, we can conclude that there were two main variables that influence the use of mobile money during the Covid-19 crisis, namely gender, which has a medium relationship, and income, which has a strong relationship with the use of mobile money and therefore financial inclusion. The use of mobile payments is a lever for financial inclusion and for the reduction of income inequality and of inequality in access to financial services. The context of the crisis has accelerated the digitalization of financial services. The results of this study show a significant relationship between gender and the use of mobile money (58.8% men, 41.2% women in a sample of 170 people). Mobile money has significantly included a large part of the population (Thulani et al. 2014). This result is in line with those reported by Demir et al. (2020). Regarding the relationship between income and inequality reduction, studies (Demir et al. 2020) have shown that mobile money promotes financial inclusion and significantly reduces income inequality in the 140 countries examined. In our study, 42.4% of respondents (out of the 170 people in the sample) have a high income (more than 7000 dh), which confirms the strong relationship between income level and inequality reduction through increased use of mobile payment. The national financial inclusion strategy 2019 has considered financial education as an important component to support the use of mobile money. 50% of our sample have never heard of mobile payment and 73.5% need financial education; this is confirmed by the strong preference for cash. Among the major obstacles to the use of mobile payment we find:
– low financial literacy
– provisioning
– a lack of trust in this type of payment
– technical problems related to network instability
Therefore, financial education is an important lever to promote the use of mobile payments. In our study sample consisting of 170 people, 82.4% had security concerns and 49.4% had privacy concerns regarding mobile payments. In the context of the Covid-19 crisis, the penetration of mobile money has been accelerated in Morocco. At least judging by Bank Al-Maghrib: 1.5 million mobile wallets were operational by the end of September 2020. It was also confirmed by the results of our model’s estimation that income and gender have a major effect on the use of mobile payment and on the reduction of inequalities in access to financial services.
4 Conclusion
This study has shown that although the financial reforms undertaken have not yet fully achieved their intended results, they have been conducive to the development of an increasingly greater financial inclusion in Morocco. Indeed, in the context of the Covid-19 crisis, the penetration of mobile money was considerably accelerated in Morocco: judging by Bank Al-Maghrib's figures, there were 1.5 million mobile wallets operational by the end of September 2020. The results of our model's estimation confirmed that income and gender have an effect on the acceptance of mobile payment systems and on the reduction of inequalities in access to financial services, thus enhancing financial inclusion via mobile payment. The analysis of these results leads us to recommend three points to mobile money users:
– To increase the use of mobile money, revenues must be increased through financial development strategies.
– To develop strategies to further integrate women into the financial system through the use of mobile payments.
– To generalize financial education.
These are among the objectives of Morocco's national financial inclusion strategy and are pillars for promoting the new development model. At the end of this study, however, it must be noted that the lack of available data about mobile money is a major limitation. Although the sample covers 156 people, it has barely provided any solid data on these variables. Further study would be even more interesting if it could include a larger sample population of Moroccans and a wider study period to better determine the effect over time. Moreover, theoretically, education, self-employment, and growth are not the only channels through which mobile money reduces inequality in access to financial services, so further studies could explore this avenue of research. In addition, the first of such avenues to be considered would be to extend the current research to other segments of the population. A second way in which this work could evolve would consist of using a qualitative methodology based on interviews to address the same issue.
References Adeleye, I., Debrah, Y.A., Nachum, L.: Management of financial institutions in Africa: emerging themes and future research agenda. Africa J. Manag. 5(3), 215–230 (2019). https://doi.org/10. 1080/23322373.2019.1657766 Afawubo, K., Couchoro, M.K., Agbaglah, M., Gbandi, T.: Mobile money adoption and households’ vulnerability to shocks: evidence from Togo. Appl. Econ. 52(10), 1141–1162 (2020). https:// doi.org/10.1080/00036846.2019.1659496 Ali, G., Dida, M.A., Elikana Sam, A.: A secure and efficient multi-factor authentication algorithm for mobile money applications. Future Internet 13(12), 299 (2021). https://doi.org/10.3390/fi1 3120299
Apiors, E.K., Suzuki, A.: Effects of mobile money education on mobile money usage: evidence from ghana. Eur. J. Dev. Res. (2022). https://doi.org/10.1057/s41287-022-00529-x Aron, J.: Mobile money and the economy : a review of the evidence. World Bank Res. Observer 33(2), 135–188 (2018). https://doi.org/10.1093/wbro/lky001 Ashenafi, F.: The role of mobile money in financial inclusion in the SADC region (2016). https:// doi.org/10.13140/RG.2.2.26994.71369 Bidiasse, H., Mvogo, G.P.: Les déterminants de l’adoption du mobile money: L’importance des facteurs spécifiques au Cameroun. Revue d’économie industrielle 165, 85–115 (2019). https:// doi.org/10.4000/rei.7845 Boukhatem, J., Mokrani, B.: Effets directs du développement financier sur la pauvreté : Validation empirique sur un panel de pays à bas et moyen revenu. Mondes en développement 160(4), 133–148 (2013). https://doi.org/10.3917/med.160.0133 Bounie, D., Bourreau, M., François, A., Verdier, M.: La détention et l’usage des instruments de paiement en France. Revue d’économie financière 91(1), 53–76 (2008). https://doi.org/10. 3406/ecofi.2008.5057 Calderon, C., Kambou, G., Korman, V., Kubota, M., Cantu Canales, C.: Une analyse des enjeux façonnant l’avenir économique de l’afrique. World Bank (2019). https://doi.org/10.1596/9781-4648-1510-2 Chaix, L.: Le paiement mobile : Modèles économiques et régulation financière. Revue d’économie financière 112(4), 277–298 (2013a). https://doi.org/10.3917/ecofi.112.0277 Chaix, L., Torre, D.: Le double rôle du paiement mobile dans les pays en développement. Revue économique 66(4), 703–727 (2015a). https://doi.org/10.3917/reco.664.0703 Chamboko, R., Guvuriro, S.: The role of betting on digital credit repayment, coping mechanisms and welfare outcomes: evidence from Kenya. Int. J. Finan. Stud. 9(1), 10 (2021). https://doi. org/10.3390/ijfs9010010 Cicchiello, A.F., Kazemikhasragh, A., Fellegara, A. M., Monferrà, S.: Gender disparity effect among financially included (and excluded) women in Middle East and North Africa. Econ. Bus. Lett. 10(4), 342–348 (2021). https://doi.org/10.17811/ebl.10.4.2021.342-348 Coulibaly, S.S.: A study of the factors affecting mobile money penetration rates in the West African Economic and Monetary Union (WAEMU) compared with East Africa. Finan. Innov. 7(1), 1–26 (2021). https://doi.org/10.1186/s40854-021-00238-0 Demirguc-Kunt, A., Klapper, L., Singer, D., Ansar, S., Hess, J.: The global findex database 2017: measuring financial inclusion and the fintech revolution. World Bank, Washington, DC (2018). https://doi.org/10.1596/978-1-4648-1259-0 Dissaux, T.: Socioéconomie de la monnaie mobile et des monnaies locales au Kenya : Quelles innovations monétaires pour quel développement ? Revue de la régulation 25 (2019). https:// doi.org/10.4000/regulation.15139 Fox, M., Van Droogenbroeck, N.: Les nouveaux modèles de mobile Banking en Afrique : Un défi pour le système bancaire traditionnel ? Gestion 2000 34(5), 337–360 (2018). https://doi.org/ 10.3917/g2000.345.0337 Gruber, H., Koutroumpis, P.: Mobile telecommunications and the impact on economic development: MOBILES AND GROWTH. Econ. Policy 26(67), 387–426 (2011). https://doi.org/10. 1111/j.1468-0327.2011.00266.x Guermond, V.: Whose money? Digital remittances, mobile money and fintech in Ghana. J. Cult. Econ. 1–16 (2022). https://doi.org/10.1080/17530350.2021.2018347 Hamdan, J.S., Lehmann-Uschner, K., Menkhoff, L.: Mobile money, financial inclusion, and unmet opportunities. Evidence from Uganda. J. Dev. Stud. 1–21 (2021). 
https://doi.org/10.1080/002 20388.2021.1988078 Ifediora, C., et al.: Financial inclusion and its impact on economic growth: empirical evidence from sub-Saharan Africa. Cogent Econ. Finan. 10(1), 2060551 (2022). https://doi.org/10.1080/ 23322039.2022.2060551
Jack, W., Suri, T.: Risk sharing and transactions costs: evidence from kenya’s mobile money revolution. Am. Econ. Rev. 104(1), 183–223 (2014). https://doi.org/10.1257/aer.104.1.183 Kameswaran, V., Hulikal Muralidhar, S.: Cash, digital payments and accessibility : a case study from metropolitan India. Proc. ACM on Hum.-Comput. Interact. 3(CSCW), 1–23. https://doi. org/10.1145/3359199 Kedir, A., Kouame, E.: FinTech and women’s entrepreneurship in Africa : the case of Burkina Faso and Cameroon. J. Cult. Econ. 1–16 (2022). https://doi.org/10.1080/17530350.2022.204 1463 Kiendrebeogo, Y., Minea, A.: Accès aux services financiers et réduction de la pauvreté dans les PED. Revue économique 64(3), 483–493 (2013). https://doi.org/10.3917/reco.643.0483 Kim, K.: Assessing the impact of mobile money on improving the financial inclusion of Nairobi women. J. Gend. Stud. 31(3), 306–322 (2022). https://doi.org/10.1080/09589236.2021.188 4536 Lenka, S.K.: Relationship between financial inclusion and financial development in India: Is there any link? J. Public Aff. (2021). https://doi.org/10.1002/pa.2722 Lenka, S.K., Barik, R.: Has expansion of mobile phone and internet use spurred financial inclusion in the SAARC countries? Finan. Innov. 4(1), 1–19 (2018). https://doi.org/10.1186/s40854-0180089-x Lutfi, A., Al-Okaily, M., Alshirah, M. H., Alshira’h, A.F., Abutaber, T.A., Almarashdah, M.A.: Digital financial inclusion sustainability in jordanian context. Sustainability 13(11), 6312 (2021). https://doi.org/10.3390/su13116312 Malady, L., Buckley, R.P.: Building consumer demand for digital financial services the new regulatory Frontier. SSRN Electron. J. (2014). https://doi.org/10.2139/ssrn.2478482 Mawejje, J., Lakuma, P.: Macroeconomic effects of Mobile money: evidence from Uganda. Fina. Innov. 5(1), 1–20 (2019). https://doi.org/10.1186/s40854-019-0141-5 Mercanti-Guérin, M.: Crise du secteur bancaire et portrait de la banque idéale : Une étude menée auprès des jeunes consommateurs. La Revue des Sciences de Gestion 249–250(3), 57 (2011). https://doi.org/10.3917/rsg.249.0057 Natile, S.: Regulating exclusions? Gender, development and the limits of inclusionary financial platforms. Int. J. Law Cont. 15(4), 461–478 (2019). https://doi.org/10.1017/S17445523190 00417 Oborn, E., Barrett, M., Orlikowski, W., Kim, A.: Trajectory dynamics in innovation: developing and transforming a mobile money service across time and place. Organ. Sci. 30(5), 1097–1123 (2019). https://doi.org/10.1287/orsc.2018.1281 Bongomin, G.O.C., Ntayi, J.M.: Mobile money adoption and usage and financial inclusion: mediating effect of digital consumer protection. Digital Policy, Regul. Govern. 22(3), 157–176 (2020). https://doi.org/10.1108/DPRG-01-2019-0005 Robert, T.S.:. L’inclusion Financiere Et Le Paiement Mobile En Zone CEMAC. Eur. Sci. J. ESJ 15(7). https://doi.org/10.19044/esj.2019.v15n7p101 Sakyi-Nyarko, C., Ahmad, A.H., Green, C.J.: The gender-differential effect of financial inclusion on household financial resilience. J. Dev. Stud. 1-21 (2022). https://doi.org/10.1080/00220388. 2021.2013467 Senou, M.M., Ouattara, W., Acclassato Houensou, D.: Is there a bottleneck for mobile money adoption in WAEMU? Trans. Corp. Rev. 11(2), 143–156 (2019). https://doi.org/10.1080/191 86444.2019.1641393 Senyo, P.K., Karanasios, S., Gozman, D., Baba, M.: FinTech ecosystem practices shaping financial inclusion: the case of mobile money in Ghana. Eur. J. Inf. Syst. 31(1), 112–127 (2022). 
https:// doi.org/10.1080/0960085X.2021.1978342 Serbeh, R., Adjei, P.O.-W., Forkuor, D.: Financial inclusion of rural households in the mobile money era: Insights from Ghana. Dev. Pract. 32(1), 16–28 (2022). https://doi.org/10.1080/096 14524.2021.1911940
Shah, K., Mare, S., Anderson, R.: Understanding mobile money grievances from tweets. In: Proceedings of the Tenth International Conference on Information and Communication Technologies and Development, pp. 1–6 (2019). https://doi.org/10.1145/3287098.3287123 Walker, T., McGaughey, J., Goubran, S., Wagdy, N. (eds.): Innovations in Social Finance. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72535-8 Wiafe, E.A., Quaidoo, C., Sekyi, S.: Monetary policy effectiveness in the advent of mobile money activity: Empirical evidence from Ghana. Cogent Econ. Finan. 10(1), 2039343 (2022). https:// doi.org/10.1080/23322039.2022.2039343 World Bank: East Asia & Pacific. In: World Bank, Global Economic Prospects, June 2015: The Global Economy in Transition, pp. 107–117. The World Bank (2015). https://doi.org/10.1596/ 978-1-4648-0483-0_ch2_EAP
New Approach to Interconnect Hybride Blockchains Hajji Mohammed Amine1(B) , Ziti Soumia1 , Nassim Kharmoum1,2 , Labrim Hicham3 , and Ezziyani Mostafa4 1 Faculty of Sciences, IPSS Team, Mohammed V University in Rabat, Rabat, Morocco
[email protected]
2 National Center for Scientific and Technical Research (CNRST), Rabat, Morocco 3 ENSA, Ibn Tofail University, Kenitra, Morocco 4 Faculty of Sciences and Techniques of Tangier, Abdelmalek Essaadi University, Tetouan,
Morocco
Abstract. The world of blockchain is evolving day by day and keeps revealing new solutions in the world of information technology, since it offers an inescapable answer to the problems of security and decentralization of data (communications, transfers and transactions without going through a third party) while benefiting from the security and reliability of data transfers. This is what gave birth to decentralized applications, and what drove the evolution of blockchain from one version to another. The main remaining problem of this new technology, however, is its inability to create inter-communication between different, heterogeneous blockchains (such as the Bitcoin blockchain and Ethereum). Keywords: Blockchain · Meta-Model · Interconnect · Hybride · Core-Shell · Inter-communication · Blockchain bridge
1 Introduction
Blockchain [1] is a network of information-containing blocks in which each block records recent transactions; once a block is added, it is stored permanently on the blockchain network and can no longer be updated or deleted. Currently, there are four versions of blockchain:
• Version 1.0: Cryptocurrency
• Version 2.0: Smart Contracts
• Version 3.0: Introduction to Dapps
• Version 4.0: Blockchain for industry
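The immutability mentioned above can be illustrated with a toy hash-chained ledger. The sketch below is a didactic illustration of the general principle only, not the data structure of any particular blockchain version listed above.

```python
# Toy hash-chained ledger: altering any stored block invalidates every later link,
# which is why recorded data cannot simply be updated or deleted.
import hashlib
import json

def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

chain, prev = [], "0" * 64
for i, tx in enumerate(["A->B:5", "B->C:2", "C->A:1"]):
    block = {"index": i, "tx": tx, "prev_hash": prev}
    prev = block_hash(block)
    chain.append(block)

def is_valid(chain) -> bool:
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

print(is_valid(chain))       # True
chain[0]["tx"] = "A->B:500"  # tamper with an old block
print(is_valid(chain))       # False: the broken hash link exposes the change
```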
The interoperability of blockchains, in other words inter-blockchain [2] communication, is the crucial problem of the decentralized internet, because their designs are intrinsically irreconcilable: any interaction must be customized.
Today, thousands of protocols coexist, and each one defines its own technical specificities and functionality: validation consensus and nodes, systems of oracles and IoT, block size, transactions, security and speed, level of transparency or confidentiality, transaction costs, exchange rate, etc. However, the blockchain world is still unable to create inter-communication between different heterogeneous blockchains. This is the crucial problem of the decentralized internet, because their designs are intrinsically irreconcilable: any interaction must be personalized, and every blockchain project has specific features and defines many parameters unique to the project. In this article, we start with this new approach by introducing the generic meta-model that we are working on, then the Core-Shell [3–6] structure, and finally how we can combine them in order to create bridges between different blockchains and interconnect them, while respecting the foundations of blockchain, specifically the security aspect. This paper is structured as follows: the second section presents the approach and the main idea of how we can resolve the problem of communication between different blockchains, followed by a discussion and finally a conclusion.
2 The Approach
The idea is to offer a hybrid blockchain capable of interconnecting different blockchains through bridges using the core-shell architecture; this solution would address the trilemma of existing blockchains: "scalability, security, decentralization". A core-shell structure (see Fig. 1) can be defined as a nanostructure [7] or nanoparticle (in the physics sense) that is encapsulated and covered by an outer shell: it is composed of two layers, a core and a shell, where the shell protects the core in order to provide stability and security against agglomeration and coalescence from any reactions. Core-shell structures can be considered hybrid systems; the structure can exhibit different traits, such as semi-conductivity, metallicity and magnetism, and these attributes can come from the shell or the core materials, or both [3–5].
Fig. 1. Structure Core-Shell [8].
Our vision consists of two main ideas. The first is to create a meta-model of the blockchain in order to obtain a common model between heterogeneous blockchains (this common model abstracts away all the protocols that make heterogeneous blockchains different, where otherwise any intercommunication would have to be customized). The second idea is to use the core-shell structure to interconnect the different blockchains, exploiting the particularity of the core-shell, as illustrated in Fig. 3: the core cannot be accessed without passing through the shell, where all the main functions are implemented (verifying the blockchain and creating its abstract model). That is what makes the core-shell the best environment to ensure security inside the core and to make interoperability and transactions between the different verified and abstracted blockchains safe and possible.
2.1 Creation of a Generic Meta-model Blockchain
If we focus on the model proposed in the article of Satoshi Nakamoto, which explains how to create a peer-to-peer electronic cash system [9], we can create a meta-model without using any specific protocols, in order to obtain a common model (Fig. 2).
Fig. 2. Model of transactions in the Bitcoin blockchain [9]
The first step is to create a new environment, or structure, to which we can connect all the blockchains we want to interconnect; the best environment in which to implement this connection safely is the core-shell. The reason for using the core-shell structure is that it is composed of two layers, shell and core: to access the core, one must pass through the shell, where we implement the validators that abstract the blockchain, validate the model and grant access.
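One possible way to read this common model is as a small set of protocol-agnostic types onto which any chain is mapped before it enters the core. The sketch below is an assumption about what such a meta-model could look like; the class names, fields and the adapter are ours, for illustration only, not a specification from this work.

```python
# Hypothetical protocol-agnostic meta-model: only fields shared by most chains are
# kept, so protocol-specific details are abstracted away before entering the core.
from dataclasses import dataclass, field
from typing import List

@dataclass
class AbstractTransaction:
    sender: str      # address or public key, normalized to a string
    receiver: str
    amount: float    # value in the chain's native unit

@dataclass
class AbstractBlock:
    height: int
    prev_hash: str
    transactions: List[AbstractTransaction] = field(default_factory=list)

@dataclass
class AbstractChain:
    chain_id: str    # label only, e.g. "bitcoin" or "ethereum"
    blocks: List[AbstractBlock] = field(default_factory=list)

def abstract_from_raw(chain_id: str, raw_blocks: list) -> AbstractChain:
    """Illustrative adapter: map raw block dicts (hypothetical keys) onto the meta-model."""
    chain = AbstractChain(chain_id=chain_id)
    for b in raw_blocks:
        blk = AbstractBlock(height=b["height"], prev_hash=b["prev_hash"])
        blk.transactions = [AbstractTransaction(t["sender"], t["receiver"], t["amount"])
                            for t in b["transactions"]]
        chain.blocks.append(blk)
    return chain
```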
2.2 Core Shell Structure in Global View
The core-shell structure comprises two levels:
• Shell level: validators. The main functions are implemented at this level. It contains the logic that allows validators to agree on a state of a blockchain and to grant that blockchain permission to access the core. Only the abstract model of a blockchain that has been verified, validated and stripped of all protocols that are not common with the model defined earlier is allowed through to the core.
• Core level: at this level we build a peer-to-peer network that allows the users of the different blockchains to inter-communicate with each other using remote procedure calls and blockchain bridges. Like physical bridges, a blockchain bridge connects two separate blockchain networks; it works in many ways and is also referred to as a 'cross-chain bridge'.
Let us develop more specifically the idea of the blockchain bridges that will exist at the core level. First of all, blockchain networks include a global community of nodes interacting with each other in a single environment for the management, validation and storage of data exchanges. The distinctive features of blockchain networks separate them from each other and create distinct communities. For example, every blockchain network has a consensus model, which is key to ensuring that all nodes can agree on specific transactions. The solution is therefore to use a blockchain bridge [10], because bridges enable the cross-chain transfer of information. Bridges have many types of designs and intricacies; they fall into two categories: trusted and trustless bridges.
– Trusted bridges: the first type of blockchain bridge is the trusted bridge. It is essentially a protocol governed by a centralized approach, an operator or an entity. Trusted blockchain bridges got their name because users must trust the reputation or identity of a centralized bridge and deposit their money on it. Some trusted blockchain bridge examples have shown evidence of user-friendly interfaces, which can help encourage more users.
– Trustless bridges: the second option is the trustless blockchain bridge. Compared to the trusted blockchain bridge, the trustless variant uses algorithms and smart contracts on the blockchain network. Therefore, a trustless blockchain bridge does not need central intermediaries or enforcers: users do not need to trust any central authority. A trustless bridge also gives users complete transparency.
In our case we will use a trustless bridge, because the shell already takes care of the security aspect, so the blockchain projects that exist inside the shell, and in the core in particular, are already verified and secure.
Fig. 3. Schema summarizing the inter-communication between four blockchains
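The shell-then-core control flow summarized in Fig. 3 can be sketched in a few lines of Python. The validator interface and the lock-and-mint transfer shown below are assumptions used for illustration only, not a specification of the proposed system; the abstract chain is simply expected to expose a chain_id, as in the meta-model sketch of Sect. 2.1.

```python
# Control-flow sketch: a chain enters the core only after the shell's validators
# approve its abstract model; transfers inside the core follow an assumed
# lock-and-mint pattern typical of trustless bridges.
class Shell:
    def __init__(self, validators):
        self.validators = validators  # callables: abstract chain -> bool

    def admit(self, abstract_chain) -> bool:
        return all(check(abstract_chain) for check in self.validators)

class CoreBridge:
    def __init__(self):
        self.chains = {}   # chain_id -> admitted abstract chain
        self.locked = {}   # (chain_id, asset) -> locked amount

    def register(self, shell: Shell, abstract_chain) -> None:
        if not shell.admit(abstract_chain):
            raise PermissionError("rejected by shell validators")
        self.chains[abstract_chain.chain_id] = abstract_chain

    def transfer(self, src: str, dst: str, asset: str, amount: float) -> dict:
        # lock on the source side, mint a wrapped representation on the destination
        key = (src, asset)
        self.locked[key] = self.locked.get(key, 0.0) + amount
        return {"minted_on": dst, "asset": f"wrapped-{asset}", "amount": amount}
```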
3 Discussion
Our approach is new because it combines two different domains: physics, with the idea of the core-shell, and computer science, with the idea of abstracting and creating a new model of the blockchain and using blockchain bridges. To our knowledge, no previous article has discussed an approach that combines these two domains. To summarize, the whole idea is to create a system capable of providing a safe environment in which to connect different blockchain projects. The first step is to create a meta-model of the blockchain, which gives us the possibility to interconnect easily inside the core, where the blockchain bridges are created. In the shell we find the validators that allow a blockchain project to access the ecosystem; this step is very important to ensure the security of the environment inside the shell, so that trustless bridges can be used with low risk. The principal advantage of our idea is the security aspect: because there are two layers, the next layer cannot be accessed without permission from the validators. These layers can also be extended further if there are many blockchain projects, so all the communication inside the shell remains secure with a low risk of hacks.
There are still some risks, especially when using blockchain bridges, such as the risk of a hack of the whole ecosystem or at the validator level, as well as smart contract risks; these risks will be addressed step by step at the implementation level.
4 Conclusion
In this paper we introduced a new approach to interconnect different blockchains, specifically heterogeneous blockchains such as the Bitcoin blockchain and Ethereum or the Solana blockchain, using a physics-inspired structure called the core-shell structure together with blockchain bridges, in order to build multiple bridges between heterogeneous blockchains and address this trilemma. Even if the blockchain bridge might seem like the most practical choice for extracting actual value benefits, bridges present certain risks, such as smart contract risk, technology risk and censorship risk. The implementation needs more reflection in order to cover all aspects of the blockchain, such as the security aspect and the decentralized aspect. The new approach will be presented in another article, which will describe how to make this structure work with this technology and how to create a meta-model containing the main components of a blockchain in order to obtain a common model.
References 1. Bashir, I.: Mastering blockchain. Packt Publishing Ltd. (2017) 2. Qasse, I., Talib, M.A., Nasir, Q.: Toward inter-blockchain communication between hyperledger fabric platforms. In: Hu, M., Rehman, Svetinovic, D., Salah, K., Damiani, E. (eds.) Trust Models for Next-Generation Blockchain Ecosystems. EICC, pp. 251–272. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-75107-4_10 3. Singh, S., Kaur, V., Kumar, N.: Core–shell nanostructures: an insight into their synthetic approaches. In: Metal Semiconductor Core-Shell Nanostructures for Energy and Environmental Applications, pp. 35–50. Elsevier (2017) 4. Aouini, S., Ziti, S., Labrim, H., Bahmad, L.: Monte Carlo study of the core-shell phase transitions in nanostructure systems. J. Supercond. Novel Magn. 31(4), 1095–1100 (2017). https://doi.org/10.1007/s10948-017-4282-3 5. Aouini, S., Ziti, S., Labrim, H., Bahmad, L.: Monte Carlo study of core/shell polymer nanostructure systems. Solid State Commun. 267, 57–62 (2017) 6. Labrim, H., et al.: Ground state study of a double core-shell dendrimer nanostructure. arXiv preprint arXiv:1708.03764 (2017) 7. Balhaddad, A.A., Garcia, I.M., Mokeem, L., Alsahafi, R., de Melo, M.A.S.: Metal oxide nanoparticles and nanotubes: ultrasmall nanostructures to engineer antibacterial and improved dental adhesives and composites Bioeng. Basel. Switz. 8(10), 146 (2021) 8. Schärtl, W.: Current directions in core–shell nanoparticle design. Nanoscale 2(6), 829–843 (2010) 9. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system. Decentralized Business Review, 21260 (2008) 10. Stone, D.: Trustless, privacy-preserving blockchain bridges, arXiv:2102.04660 (2021)
A Variable Neighborhood Search (VNS) Heuristic Algorithm Based Classifier for Credit Scoring

Mohamed Barhdadi(B), Badreddine Benyacoub, and Mohamed Ouzineb

National Institute for Statistics and Applied Economics, Rabat, Morocco
[email protected], {bbenyacoub,m.ouzineb}@insea.ac.ma
Abstract. Credit scoring models have been commonly used by lenders and financial institutions to grant credit and have been investigated to minimize the risk of borrower default. Numerous modeling approaches have been developed to evaluate the creditworthiness of borrowers. In this paper, we propose a new credit scoring model based on one of the heuristic search methods, the variable neighborhood search (VNS) algorithm. Optimizing over the VNS neighborhood structures is a useful method for solving credit scoring problems. The main idea of the proposed algorithm is to explore the neighborhood structures of VNS while simultaneously generating an optimal plane which is used to build a discriminant function. The experimental results demonstrate that the VNS method has good prediction performance on real datasets and also improves the overall classification performance on credit scoring problems which are difficult to optimize.

Keywords: Credit scoring · Classification · VNS · Optimisation

1 Introduction
Nowadays, people in our society have become avid consumers, a fact which compels individuals to seek alternative financing solutions to cover their various expenses. One of these is taking out loans. Credit can be the ideal solution; otherwise, it becomes a risky operation if the applicant fails to meet the payment deadlines [1]. Credit scoring is a system of analysis developed with statistical methods, operations research, artificial intelligence and machine learning techniques that makes it possible to distinguish between "good" and "bad" loans, based on information about the borrower [2]. Due to the increase in demand for consumer credit, along with the rapid development of information storage infrastructure, the importance of credit scoring has become more and more significant and has gained more and more attention. Credit scoring was developed in order to explore the relationship between the dependent variable describing the risk of a consumer defaulting on a loan, and independent variables characterizing the information of consumers (e.g. age, number of previous loans, salary, housing, etc.) [3].
Many credit scoring models have been proposed, based on statistical methods, which include discriminant analysis, logistic regression [4], nonlinear regression, classification trees and the nearest-neighbor approach [5], and on non-statistical methods, which include linear programming, integer programming, neural networks [6], genetic algorithms, expert systems and so forth, in order to trade with consumers whose credit evaluations are good and to gain reasonable benefits [7,8]. The successful use of credit scoring models in commercial banks, brokerage houses and other financial organizations depends on sophisticated algorithms that can make appropriate decisions on whether or not to do business with credit applicants. Having estimated their credit overall, the organizations measure the likelihood that customers will not pay for their goods or services on time or will deliberately fall into arrears. Variable neighborhood search (VNS) is a metaheuristic proposed by Mladenovic and Hansen in 1997 [9,10]. VNS systematically changes neighborhood structures during the search for an optimal (or near-optimal) solution, which makes it less likely to be trapped in a local optimum or to suffer from the lack of an effective local search mechanism. The resulting training algorithm is able to rapidly locate good solutions, even in a difficult search space. It is a method for optimizing numerical functions and real-world problems, and it has been successfully applied to many hard combinatorial optimization problems such as vehicle scheduling, vehicle routing and timetabling. Hence, this paper develops a VNS model for practical personal credit scoring problems. The paper is organized as follows: Section 2 presents the credit scoring problem, Sect. 3 describes the VNS method and presents the proposed model for credit scoring, Sect. 4 reports computational experiments that demonstrate the effectiveness of our methodology on a set of benchmark datasets, and Sect. 5 comes up with final conclusions.
2 Background

2.1 The Credit Scoring Problem
Credit agencies must assess the level of risk associated with granting a loan to a new customer. To do this, they rely on the old credit records in their possession, using classification models that can distinguish between good and bad customers. The credit scoring problem is a two-group classification problem assuming that there are two well-defined populations, G1 and G2 (good loans vs. bad loans, respectively). We assume that the information about the input data is known in advance. In this learning category, the objective is usually to provide the best discriminant function using the input data, by measuring p discriminatory variables or attributes for each member of either population. Each member represents a customer; each client is characterized by a set of attributes (information): X = (X1, X2, ..., Xp) (age, account, job, income, residence and so on). This information is recorded directly from the different customers. Given a sample E that includes n customers, this sample consists of g good customers and b bad ones, i.e. n = g + b and E = G1 ∪ G2; each
customer i gives answers: $X_i = (X_{i1}, X_{i2}, \ldots, X_{ip})$. The classification model used by the credit institution must differentiate between data associated with a good customer and data related to a bad one. To do this, each attribute is assigned a weight $w = (w_1, w_2, \ldots, w_p)$, and a threshold $c$ is sought to separate the good customers from the bad ones. If $\sum_{j=1}^{p} w_j x_{ij} \ge c$ then customer $i$ is a good one, and he is a bad one in case he satisfies $\sum_{j=1}^{p} w_j x_{ij} \le c$.
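As a small illustration of this decision rule (not taken from the paper; the weights, cutoff and attribute values below are made-up numbers), the score of an applicant is simply the weighted sum of its attributes compared with the cutoff c:

```python
import numpy as np

# Illustrative weights w, cutoff c and applicant attributes x (made-up values).
w = np.array([0.4, -0.2, 0.7])   # one weight per attribute
c = 0.5                          # decision-rule cutoff
x = np.array([1.0, 0.3, 0.6])    # attributes of one applicant

score = float(w @ x)             # sum_j w_j * x_ij
label = "good" if score >= c else "bad"
print(score, label)
```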
2.2 Problem Formulation
The mathematical programming approach was first applied to classification by Mangasarian (1965) [11]. Freed and Glover (1981) [12] presented an evaluation of the LP approach in discriminant analysis. A classification approach has been proposed based on the idea of reducing misclassification by minimizing the overlaps between the two groups [13]. In a linear system, the separating hyperplane maximizes the distance between the two objectives. It is not always possible to perfectly separate the good customers from the bad ones. We can tolerate eventual errors by introducing positive values $a_i$; then we will have $\sum_{j=1}^{p} w_j x_{ij} \ge c - a_i$ (if customer $i$ is a good one) and $\sum_{j=1}^{p} w_j x_{ij} \le c + a_i$ (if customer $i$ is a bad one). The objective will then be to find the values of the weight vector $w$ and the value of the separator $c$ for which $\sum_{i=1}^{n} a_i$ is minimal. The final model is then:
$$\min \; \sum_{i=1}^{n} a_i \qquad (1)$$

subject to:

$$\sum_{j=1}^{p} w_j x_{ij} \ge c - a_i \quad \forall i \in G1 \qquad (2)$$

$$\sum_{j=1}^{p} w_j x_{ij} \le c + a_i \quad \forall i \in G2 \qquad (3)$$

$$\sum_{j=1}^{p} \Big( n_B \sum_{i \in G1} x_{ij} - n_G \sum_{i \in G2} x_{ij} \Big) w_j = 1 \qquad (4)$$

$$a_i \ge 0 \;\; \forall i, \quad c \in \mathbb{R}^{*}, \quad w_j \in \mathbb{R} \;\; \forall j. \qquad (5)$$
The objective function aims at minimizing the sum of the distances between the clients from the population and the hyperplane corresponding to the discriminant function. Constraint (2) ensures that the client will belong to group 1 (good loans), while constraint (3) ensures that the client will be assigned to
group 2 (bad loans). Equation (4) defines the normalization constraint; it is needed to avoid a trivial solution. For a classification problem with two distinct groups given by a set of observations (i = 1, 2, ..., n) from group G1 and group G2, the objective is to determine a separating hyperplane represented by a linear discriminant function expressed as Z = w1 X1 + w2 X2 + ... + wp Xp. This function provides the boundary between the two groups and will be used to classify new observations. The goal of solving the model is to estimate the parameters, namely a weighting vector W = (w1, w2, ..., wp) and a decision-rule cutoff value c, so as to minimize the number of misclassifications for a given dataset. In the following, we focus on a specific combinatorial optimization methodology using VNS to adjust the parameters that improve the classification performance.
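As a sketch of how the linear program (1)-(5) could be solved in practice (this is our illustration rather than the authors' code; it assumes SciPy's HiGHS-based linprog is available), the decision vector stacks the weights w, the cutoff c and the slacks a:

```python
import numpy as np
from scipy.optimize import linprog

def fit_lp_scorecard(X_good, X_bad):
    """Solve model (1)-(5): minimize sum(a_i) subject to the separation constraints.
    X_good, X_bad: arrays of shape (n_G, p) and (n_B, p)."""
    nG, p = X_good.shape
    nB = X_bad.shape[0]
    n = nG + nB
    # Decision vector z = [w_1..w_p, c, a_1..a_n]
    cost = np.concatenate([np.zeros(p + 1), np.ones(n)])
    A_ub, b_ub = [], []
    # (2): -w.x_i + c - a_i <= 0 for good customers
    for k, xi in enumerate(X_good):
        row = np.concatenate([-xi, [1.0], np.zeros(n)])
        row[p + 1 + k] = -1.0
        A_ub.append(row); b_ub.append(0.0)
    # (3): w.x_i - c - a_i <= 0 for bad customers
    for k, xi in enumerate(X_bad):
        row = np.concatenate([xi, [-1.0], np.zeros(n)])
        row[p + 1 + nG + k] = -1.0
        A_ub.append(row); b_ub.append(0.0)
    # (4): normalization  sum_j (nB * sum_{G1} x_ij - nG * sum_{G2} x_ij) w_j = 1
    norm = nB * X_good.sum(axis=0) - nG * X_bad.sum(axis=0)
    A_eq = [np.concatenate([norm, [0.0], np.zeros(n)])]
    b_eq = [1.0]
    # (5): w and c free, a_i >= 0
    bounds = [(None, None)] * (p + 1) + [(0, None)] * n
    res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=bounds, method="highs")
    return res.x[:p], res.x[p]   # weight vector w and cutoff c

# Tiny made-up example with 3 attributes
rng = np.random.default_rng(0)
w, c = fit_lp_scorecard(rng.normal(1.0, 1.0, (20, 3)), rng.normal(-1.0, 1.0, (15, 3)))
print(w, c)
```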
3 The Proposed VNS Model
VNS is a metaheuristic method used for optimization problems. It looks for a local or global minimum starting from an initial solution W0. It creates a neighborhood of the current solution and, if this neighborhood contains a solution better than the current one, VNS updates W0. It restarts until the stopping condition is met (defined as a maximum number of global iterations without an improvement in the best known solution). The algorithm needs an initial solution W0; this solution is taken to be the null vector W0 = (0, 0, ..., 0). The proposed VNS uses three neighborhood structures N1, N2 and N3, detailed in the next subsection; a minimal sketch of this search loop is given below.
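The following is a minimal sketch of such a VNS loop (our illustration; the function names, the misclassification cost and the random moves are assumptions, with the moves standing in for N1-N3 defined in Sect. 3.1):

```python
import random

def misclassification_cost(w, c, samples):
    """Number of misclassified customers; samples = [(x, label)], label in {"good", "bad"}."""
    errors = 0
    for x, label in samples:
        score = sum(wj * xj for wj, xj in zip(w, x))
        if (label == "good" and score < c) or (label == "bad" and score >= c):
            errors += 1
    return errors

def vns(w0, c, samples, neighborhoods, max_no_improve=50):
    """Basic VNS: try the neighborhoods in order, return to the first one after
    each improvement, stop after max_no_improve iterations without improvement."""
    best_w = list(w0)
    best_cost = misclassification_cost(best_w, c, samples)
    no_improve = 0
    while no_improve < max_no_improve:
        improved = False
        for move in neighborhoods:               # e.g. [n1_move, n2_move, n3_move]
            candidate = move(best_w)
            cand_cost = misclassification_cost(candidate, c, samples)
            if cand_cost < best_cost:
                best_w, best_cost = candidate, cand_cost
                improved = True
                break                            # restart from the first neighborhood
        no_improve = 0 if improved else no_improve + 1
    return best_w, best_cost

# Stand-ins for the neighborhood moves (random perturbations of one or two weights).
def n2_move(w):
    w = list(w)
    i = random.randrange(len(w))
    w[i] += random.uniform(-1, 1)
    return w

def n1_move(w):
    w = list(w)
    i, j = random.sample(range(len(w)), 2)
    r = random.uniform(-1, 1)
    w[i] += r
    w[j] -= r
    return w
```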
3.1 Neighborhood Structures
The neighborhood structure N1 is based on a swap operation which strips a randomly chosen amount r from one location and adds it to another. Structure N1: let W be a given vector W = (w1, w2, ..., wp) and i, j two integers such that 1 ≤ i, j ≤ p; wi becomes wi + r and wj becomes wj − r. So N1 = {(w1, ..., wi + r, ..., wj − r, ..., wp); ∀ i ≠ j}. The neighborhood structure N2 is obtained by adding a randomly chosen amount r to one component of the solution. Let W be a given vector W = (w1, ..., wp) and i a given integer with 1 ≤ i ≤ p; wi becomes wi + r. So N2 = {(w1, ..., wi + r, ..., wp); ∀i}. The N3 neighborhood is practically the same as N1, except that this time the randomly chosen amount is stripped from one location and then added in equal parts to two other locations. Let W be a given vector such that W = (w1, w2, ..., wp), where i, j, k are three different integers such that 1 ≤ i, j, k ≤ p.

... > 0.5 and X9 > 0.5 gives a very simple combination of three features and their thresholds for classifying credit applicants as good clients with high precision and recall.
Fig. 2. RuleFit results on the Australian Credit Dataset
4.3 RuleFit
As identified in Fig. 2, the features X2 and X4 were the most significant, and many features such as X1, X5, X6, etc. play no role in any decision rule, indicating that their importance in the dataset is relatively low. We further find that the majority of the most significant rule combinations contain the X2 and X4 features. The first rule means that IF X2 > 0.26999999582767487 and X4 ...

... > 0 are the corresponding weightings. Note that the value of CIec(t), like that of the other combined indicators, is by construction in the range
$CI^{e}_{i} \in [-1, 1]$.

• Environment Combined Indicator
$$CI^{en}_t := \Big(\sum_{i=1}^{m} \beta_i\Big)^{-1} \sum_{i=1}^{m} \beta_i \,\mathrm{sign}\big(I^{en}_i(t) - S^{en}_i + \tau_t\big) \qquad (3)$$
where the $I^{en}_i(t)$ are the chosen environment indicators, $I^{en}_{\max,i}$ their maximal magnitudes, and $S^{en}_i$ their norms. The positive scalars $\beta_i > 0$ are the corresponding weightings.
• Social Combined Indicator
$$CI^{so}_t := \Big(\sum_{i=1}^{l} \gamma_i\Big)^{-1} \sum_{i=1}^{l} \gamma_i \,\mathrm{sign}\big(I^{so}_i(t) - S^{so}_i + \tau_t\big) \qquad (4)$$
where the $I^{so}_i(t)$ are the chosen social indicators, $I^{so}_{\max,i}$ their maximal magnitudes, and $S^{so}_i$ their norms. The positive scalars $\gamma_i > 0$ are the corresponding weightings. The value of each composite index varies between −1 and 1; it depends on the value of each indicator, the tolerance and the impact factor. Once these composed indicators are defined, we are in a position to introduce the State Sustainability Index (SSI), which provides a clear evaluation of the state of sustainability of the whole tourism destination. This evaluation index is defined by:
$$GSI_t = \Big(\sum_{i=1}^{3} \delta_i\Big)^{-1} \big(\delta_1\, CI^{ec}_t + \delta_2\, CI^{en}_t + \delta_3\, CI^{so}_t\big) \qquad (5)$$
where 0 < δ1, 0 < δ2 and 0 < δ3 are the sustainability index weightings.

Model Properties and Interpretation
The proposed model is useful in practice thanks to several convenient properties that allow a clear determination of the current state of sustainability. Precisely, as mentioned before, all the returned values range in [−1, 1] (a scale that can be used instead of the classical percentage). Moreover, it can easily be seen that any indicator taking the value −1 is not respected, whereas a value of 1 means that it is respected within the predefined tolerance. The model can also be used to clearly detect the best cases of sustainability, which are:
• CIec_t = 1 ⇔ all economic indicators are respected
• CIen_t = 1 ⇔ all environment indicators are respected
• CIso_t = 1 ⇔ all social indicators are respected
In contrast, the worst cases of sustainability are:
• CIec_t = −1 ⇔ all economic indicators are not respected
• CIen_t = −1 ⇔ all environment indicators are not respected
• CIso_t = −1 ⇔ all social indicators are not respected
The previous considerations also apply to the global sustainability index as follows:
• GSI_t = 1 ⇔ CIec_t = 1, CIen_t = 1 and CIso_t = 1
• GSI_t = −1 ⇔ CIec_t = −1, CIen_t = −1 and CIso_t = −1
As can be deduced from the proposed model, if the global sustainability index lies strictly between −1 and 1, i.e. −1 < GSI_t < 1, then we have neither completely deteriorated sustainability nor perfect sustainability; that is, at least one indicator is necessarily not respected. More precisely, if the output value of GSI_t is close to 1 then the sustainability only needs small improvements, but if the value of GSI_t is far from 1 or close to −1 then the sustainability is drastically damaged, which calls for radical changes
and improvements in at least one of the economic, environment and social components of the tourist destination under consideration.

Sustainability Diagnosis and Its Evaluation
The capacity of a tourist destination to be sustainable involves the interaction of the three essential components of development, namely the economy, the environment and society. Their dynamic states are determined by the associated parameters and indicators. The implementation of the concept of sustainability, in general, is still unclear and poses a problem of operationalization (see [38]), especially since achieving proper sustainable development requires the three pillars to remain balanced and to achieve the aimed success separately and simultaneously, although they represent different categories. This is hard to achieve without concretely evaluating the state of a tourist site, monitoring it and finally acting in a precise, targeted and time-saving way. In order to deal with the sustainability of a destination site, we define a sustainability assessment system based on parameters assessed by stakeholders through certain indicators, in accordance with benchmarks agreed between stakeholders. In the sequel, we discuss how such a model has been developed based upon the proposed method. The performance of our model relies on the identification of the essential touristic parameters, the determination of the key-role indicators and the characterization of the different interactions between them, according to the following table, which contains some output information from the model to be discussed further.

Sustainability Adjustment
The adjustment process is basically carried out after the sustainability diagnosis, but of course this does not prevent immediate further corrections from being made to face all possible unpredictable fluctuations in the durability of the performance, as witnessed during the COVID-19 epidemic period. This adjustment process of tourism sustainability consists of evaluating consecutive situations resulting from old decisions, to be corrected or improved in case of defective situations. Practically, this should be done to adjust and improve the current actions at period tk with regard to the nearest future actions at period tk+1. This process consists of evaluating the return values of the model with regard to the corresponding indicators and their standard values, within a concrete tolerance. As a result, the resulting adjustments are issued in the form of recommendations and indications to the stakeholders according to their share of responsibility. The adjusted indicators are revised according to the three following criteria:
– Worst sustainability: this situation occurs at time $t_k$ if we get $\dfrac{I^{*}_i(t_k)}{S^{*}_i + \tau_{t_k}} < 1$; the immediate recommendation is then to correct the underlying defective indicators by improving their next state as follows:

$$I^{*}_i(t_{k+1}) + \tau_{k+1} = I^{*}_i(t_k) + \tau_{t_k} + a, \quad a > 0 \qquad (6)$$
The parameter a represents a relative magnitude added to the defective indicators in order to achieve the aimed performance and objectives, according to the designer and/or the responsible stakeholders. Other objectives of the adjustment process are:
• Minimize the tolerance until it is cancelled. Necessarily, the tolerance should be chosen as a decreasing sequence that tends to 0, that is τ → 0 with τi > τi+1 > τi+2 > ... > 0.
• Stabilize or increase the value of the standards at each time period tk for the elementary composite indices, which consequently improves the general sustainability status.
A small computational sketch of this evaluation-and-adjustment check is given below.
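The sketch below illustrates Eqs. (2)-(5) and the detection of defective indicators feeding the adjustment step. It is our illustration, not the authors' code; the per-indicator absolute tolerance (5% of each indicator value, in the spirit of Table 3), the placeholder values and the default weightings are assumptions.

```python
def combined_indicator(values, standards, weights, tols):
    """Weighted sign aggregation as in Eqs. (2)-(4): an indicator contributes +1
    if it is respected within its tolerance, -1 otherwise."""
    sign = lambda x: 1.0 if x >= 0 else -1.0
    num = sum(w * sign(v - s + t) for v, s, w, t in zip(values, standards, weights, tols))
    return num / sum(weights)

def global_sustainability_index(ci_ec, ci_en, ci_so, deltas=(2.5, 5.0, 2.5)):
    """Eq. (5): weighted average of the three combined indicators."""
    d1, d2, d3 = deltas
    return (d1 * ci_ec + d2 * ci_en + d3 * ci_so) / (d1 + d2 + d3)

def defective_indicators(values, standards, tols):
    """Indices of indicators that are not respected and should be adjusted."""
    return [i for i, (v, s, t) in enumerate(zip(values, standards, tols)) if v - s + t < 0]

# Placeholder economic data (values, standards, weights) in the spirit of Tables 2-3.
eco_values = [245366, 154567, 50000, 11]
eco_standards = [400000, 150000, 4500, 10]
eco_weights = [4, 3, 5, 3]
eco_tols = [0.05 * v for v in eco_values]   # assumed 5% absolute tolerance per indicator

ci_ec = combined_indicator(eco_values, eco_standards, eco_weights, eco_tols)
print(ci_ec, defective_indicators(eco_values, eco_standards, eco_tols))
```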
4 Case Study: Perdicaris Park at Tangier

In the sequel, we apply our model to a tourist site in Morocco. Specifically, this site is situated in Tangier, a maritime city located in the north-west of Morocco, facing the Strait of Gibraltar, which separates the Atlantic Ocean from the Mediterranean Sea. For centuries, Tangier has been a multicultural city and has served as a crossroads between the Middle East, Africa and Europe. The excellent location of Tangier, with its two coastlines overlooking the Atlantic and the Mediterranean, is a major asset for the development of national and international tourist activities. The local tourist industry is mainly based on seaside activities, but the historical sites of the city, such as the medina, the Caves of Hercules, the Roman tombs and the Phoenician remains, are also of interest. In addition, the natural spaces made up of forests on both shores are of great interest, such as Perdicaris Park, considered in this paper. The development of Perdicaris Park, which is characterized as an essential pillar of the region's tourism activity, has been made possible by its various natural and historical assets, as well as its coastlines.

Perdicaris as an Ecological Destination
Perdicaris Park (depicted in Fig. 2) consists of a large park covering nearly 70 ha, also known as the Rmilat forest. It can certainly be considered a real botanical park containing hundreds of indigenous and exotic species such as eucalyptus, dragon trees, crown trees, cork oak, parasol pine, pinion pine, etc. For this reason, this area was declared in 1993 a Site of Biological and Ecological Interest (SEBI). The park also directly borders the Mediterranean Sea and the Atlantic Ocean and offers a breathtaking view of the Strait of Gibraltar. This tourist site is located only about 4 km from downtown Tangier. It experiences a huge number of visitors, which can increase the risk of degradation of this environmentally rich but fragile park.

Perdicaris Park Model Determination
The ecological and cultural potential of Perdicaris Park, as well as the offered services and the influences on the local community, are the most important factors to consider in the associated model. The evaluation and adjustment depend on stakeholders' approvals. The model parameters are detailed in the tables below. The performance assessment is carried out at two successive times, tk and tk+1.
Fig. 2. Perdicaris park
The features of Perdicaris Park are mainly based on its ecological and historical potential; the offered services and the relationship with the local community are also of importance. Taking these considerations into account, data were collected according to the selected indicators to be fed into the evaluation model given by Eqs. (2)-(5). The primary investigation of Perdicaris Park was performed through questionnaires and meetings with some of the involved actors from the Tangier Observatory of Environmental Protection and Historic Monuments (see [39]). Other complementary sources of information have been taken from the Tangier Delegation of Tourism. The gathered information was then analyzed and classified with regard to the appropriate chosen indicators, with their associated standards and the selected tolerance. The challenge of good monitoring of Perdicaris Park is to maintain the best indicators while improving the defective ones, in order to defend and protect this fragile site against possible internal deteriorations and external disturbances and to achieve the desired sustainability objectives. In Tables 2 and 3, the measurements and the values of the standard norms, tolerances, weighting factors and other criteria are detailed. It is worth mentioning that each of the three main sustainability components has corresponding weightings which reflect their importance and priorities, considered in Tables 2 and 3 according to the successive time periods tk (first year) and tk+1 (second year) (Table 1).
Table 1. Summary table

1. Determination of indicators
2. Classification
3. Quantification
4. Comparison of indicators to their standards
5. Output results of all indicators
6. Obtaining the output results of the general sustainability index at time tk (CIec, CIen or CIso, and GSI)
7. Final treatment step: assessment and adjustment of the state of sustainability at time tk (perfect sustainability, moderately sustainable, completely unsustainable)
Table 2. Measured indicators

Component | Settings | Indicators (I) | Values
Economic | Turnover | Tourism turnover | 245366
Economic | Visitors flow | Number of visitors | 154567
Economic | Number of tourists | Number of potential customers | 50000
Economic | Local products, crafts | Number of units per business | 11
Environment | Pollution | Air pollution CO2 | 38
Environment | Tourist waste | Quantity of waste collected per inhabitant | 35
Environment | Biodiversity | Number of species (fauna, flora) | 200
Environment | Natural resources | Total forest area | 67
Social | Security | Frequency of prevention and control | 25
Social | Employability | Employability of local people | 50%
Social | Costume, Traditions | Number of cultural and artistic activities of the native people | 15
Social | Unemployment rate | Number of active women per 100 | 30
Table 3. Determination of the model parameters, standards and weightings

Columns: S | τ | α, β, γ | δ1, δ2, δ3 | I − S + τ | Model output
400000
0,05
12268,30
α1
4
-142365,7
-1
150000
0,05
7728,35
α2
3
12295,35
1
4500
0,05
2500,00
α3
5
48000
1
10
0,05
0,55
α4
3
1,55
1
9,9
1
11,75
1
-10
-1
Evaluation of the economic component,
2,5
0,73
30
0,05
30
5
4
25
0,05
25
5
7
220
0,05
220
5
6
70
0,05
70
5
5
0,35
1
Evaluation of the environment component,
5
= 0,73
23
0,05
23
5
4
3,25
1
56
0,05
56
5
4
-0,035
-1
15
0,05
15
5
3
0,75
1
50
0,05
50
5
5
-18,5
-1
Evaluation of the social component,
2,5
0,44
Global Sustainability Index, GSI = 0,66
5 Results and Discussion

Defective indicators directly influence the whole sustainability of the tourist site, and they can interact with respected indicators even when they do not belong to the same category. In fact, any deficiency revealed by the indicators can affect human resources and overall resource management, deteriorate the tourist site, harm the well-being of the local community and even reduce tourism revenue. Moreover, such an undesired situation can have a harmful effect on stakeholders' decisions. Recently, in January 2022, in the city of Tangier, as part of the management and adjustment processes, the "Msarya F'Mdina" action was launched, an initiative organized within the framework of a partnership between the prefecture of the Tangier-Tetouan-Al Hoceima Region and the municipality of Tangier, to promote tourist activity in the city by helping to increase the number of visitors, which had experienced a significant drop in the past. The output results concerning the Perdicaris destination have been computed from the model with respect to the adopted tolerances and weighting factors shown in Tables 2 and 3. On the one hand, the weighting factor has been taken stronger on the environmental component CIen (δ2 = 5) because of its importance. On the other hand, the economic and social components CIec and CIso have equal, weaker weighting factors (δ1 = δ3 = 2,5). Then, based on the collected data and the indicators, the determination of the sustainability state of Perdicaris Park for the initial year 2022 has been carried out in order to reflect the resulting interactions between the various components of sustainability through the other parameters, as previously discussed. Note that the indicators responsible for "unsustainability", according to their assessed values, can be easily identified from Tables 2 and 3.
As depicted in Table 2, we have found that several indicators are not respected with regard to the standard norms within the given tolerance. Consequently, our evaluation model automatically qualifies the Perdicaris site as not completely sustainable. Of course, this is seen from the value of the global sustainability index, which is about GSI = 0,66, far from the perfect value 1. In other words, we can express the sustainability of the Perdicaris site as a percentage: the output result represents only 66% of the perfect sustainability performance. This undesired situation has to be avoided in the next period, concerning 2023, since there is still a risk of a decline into unsustainability if the responsible key indicators are not adequately taken into account, by adjusting them and by correcting on the ground the real causes behind such unsustainability. This successive process of evaluation and adjustment is based on the collected data and the outputs of the model in Tables 2 and 3. Precisely, one has to improve the defective indicators concerning the tourism turnover, the number of species of fauna and flora, and the employability of local people, as well as increase the number of active women workers. Further improvement can also be achieved by decreasing the tolerance values, for instance from 0,05 to 0,04 as a target.
6 Conclusion

The contribution of this research consists in the sustainability assessment and adjustment of a tourist destination with regard to the real sustainability pillars, namely the economic, environmental and social contexts. It is shown how one can achieve the desired overall performance of a tourist site. The theory of indicators has been applied to design an efficient, original model that exploits the collected data through combined composite indicators for the purpose of assessing and monitoring the state of sustainability, which greatly helps decision making. Moreover, the proposed model has been applied to a real case, the Perdicaris tourist site in Tangier, Morocco, providing a clear and systematic evaluation and assessment.
References 1. Brundtland, G.H.: Our common future—call for action*. Environ. Conserv. 14(4), 291–294 (1987) 2. Ruhanen, L., Weiler, B., Moyle, B. D., McLennan, C.J.: Trends and patterns in sustainable tourism research: a 25-Year bibliometric analysis. J. Sustain. Tourism 23(4), 517–535 (2015). https://doi.org/10.1080/09669582.2014.978790 3. Brink, B.J.E., Hosper, S.H., Colijn, F.: A quantitative method for description & assessment of ecosystems: the AMOEBA-approach. Mar. Pollut. Bull. 23, 265–270 (1991) 4. Ko, T.G.: Development of a tourism sustainability assessment procedure: a conceptual approach. Tour. Manage. 26(3), 431–445 (2005) 5. Joseph, A., di Tollo, G., Pesenti, R.: Fuzzy multi-criteria decision-making: an entropy-based approach to assess tourism sustainability: Tourism Economics (2019a) 6. Sayabek, Z., et al.: Fuzzy logic approach in the modeling of sustainable tourism development management. Polish J. Manage. Stud. 19(1), 492–504 (2019) 7. Claire, B.: La théorie de la viabilité au service de la modélisation mathématique du développement durable. Application au cas de la forêt humide de Madagascar. phdthesis. Université Blaise Pascal - Clermont-Ferrand II (2011)
8. Timothy, J.T., Johnston, R.: Tourism Sustainability, Resiliency and Dynamics: Towards a More Comprehensive Perspective: Tourism and Hospitality Research (2008) 9. Eva, K., Mihaela, N.: Dynamics of a tourism sustainability model with distributed delay. Chaos Solitons Fractals 133, 109610 (2020) 10. Christoph, B., Patrick, J.: Measuring the immeasurable — A survey of sustainability indices. Ecol. Econ. 63(1), 1–8 (2007) 11. Thomas, M.P., Kates, R.W: Characterizing and Measuring Sustainable Development (2003). https://doi.org/10.1146/annurev.energy.28.050302.10555 12. Asmelash, A.G., Kumar, S.: The structural relationship between tourist satisfaction and sustainable heritage tourism development in Tigrai. Ethiopia. Heliyon 5(3), e01335 (2019) 13. Franceschini, F., Galetto, M., Maisano, D.: Quality management and process indicators. In: Designing Performance Measurement Systems. MP, pp. 1–20. Springer, Cham (2019). https:// doi.org/10.1007/978-3-030-01192-5_1 14. Singh, R.: An overview of sustainability assessment methodologies. Ecol. Indicat. 9(2), 189– 212 (2009). https://doi.org/10.1016/j.ecolind.2008.05.011 15. Blanc, I., Friot, D., Margni, M., Jolliet, O.: Towards a new index for environmental sustainability based on a DALY weighting approach. Sustain. Dev. 16(4), 251–260 (2008) 16. Selvet, N., Zhechev, V.: Could happiness be an assessment tool in sustainable tourism management? Adv. Hospital. Tourism Res. 8(2), 338–370 (2020) 17. Conaghan, A., Hanrahan, J., Mcloughlin, E.: The sustainable management of a tourism destination in Ireland: a focus on county clare. Adv. Hospital. Tourism Res. 3(1), 62–87 (2015) 18. Bertalanffy, L.V.: Théorie générale des systèmes, p. 17. Dunod, Paris (1973) 19. Moine, A.: Le territoire comme un système complexe. Des outils pour l’aménagement et la géographie. In Septièmes Rencontres de Théo Quant (pp. http-thema) (2005) 20. Tazim, J., Higham, J.: Justice and ethics: towards a new platform for tourism and sustainability. J. Sustain. Tourism (2020) 21. Alison, C., Jobbins, G.: Governance capacity and stakeholder interactions in the development and management of coastal tourism: examples from Morocco and Tunisia. J. Sustain. Tourism (2009) 22. Budowski, G.: Tourism and environmental conservation: conflict, coexistence, or symbiosis? Environ. Conserv. 3(1), 27–31 (1976) 23. Fu, Y., Kong, X., Luo, H., Yu, L.: Constructing composite indicators with collective choice and interval-valued TOPSIS: the case of value measure. Soc. Indic. Res. 1–19 (2020). https:// doi.org/10.1007/s11205-020-02422-8 24. Bell, S., Morse, S.: Sustainability Indicators: Measuring the Immeasurable? Earthscan, London Sterling, VA. Routledge (2012) 25. Mohamed, H., Rachid, E.-D.: Assessing sustainable tourism: trends and efforts in essaouira in morocco as a coastal city. Int. J. Sustain. Manage. Inform. Technol. 5(1), 23 (2019). https:// doi.org/10.11648/j.ijsmit.20190501.14 26. John, S.: Tourism geography: emerging trends and initiatives to support tourism in Morocco. J. Tourism Hospital. 5(3) (2016) 27. Saltelli, A.: Composite indicators between analysis and advocacy. Soc. Indic. Res. 81(1), 65–77 (2007) 28. Fernandez, E.J., Ruiz Martos, M.J.: Review of some statistical methods for constructing composite indicators. Stud. Appl. Econ. 38(1) (2020). https://doi.org/10.25115/eea.v38i1. 3002 29. Alyson, W.: Sustainability Indicators and Sustainability Performance Management, vol. 129 (2002)
30. Opschoor, H., Reijnders, L.: Towards sustainable development indicators. In: Kuik, O., Verbruggen, H. (eds.) In Search of Indicators of Sustainable Development, pp. 7–27. Springer Netherlands, Dordrecht (1991). https://doi.org/10.1007/978-94-011-3246-6_2 31. Singh, R.K., Murty, H.R., Gupta, S.K., Dikshit, A.K.: An overview of sustainability assessment methodologies. Ecol. Ind. 1(15), 281–299 (2012) 32. Maggino, F. (ed.): Complexity in Society: From Indicators Construction to their Synthesis. SIRS, vol. 70. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60595-1 33. Matúš, M., Gajdošík, T.: Bigger and Complex Is Not Always Better. The Creation of Composite Sustainable Tourism Indicator to Compare Destination Development. Advances in Hospitality and Tourism Research (AHTR) (2021) 34. Gökay, K.I., A¸skun, V.: Artificial Intelligence in Tourism: A Review and Bibliometrics Research. Advances in Hospitality and Tourism Research (AHTR) (2020) 35. Tanguay, G.A., Rajaonson, J.: Une analyse de l’application d’indicateurs de développement durable aux villes québécoises. CIRANO (2012) 36. Allan, E., et al.: Land use intensification alters ecosystem multifunctionality via loss of biodiversity and changes to functional composition. Ecol. Lett. 18(8), 834–843 (2015) 37. Tyrrell, T.J., Johnston, R.J.: Tourism sustainability, resiliency and dynamics: towards a more comprehensive perspective. Tour. Hosp. Res. 8(1), 14–24 (2008) 38. Torres-Delgado, A., Saarinen, J.: Utiliser des indicateurs pour évaluer le développement du tourisme durable: Un bilan. Géographies du tourisme 16(1), 31–47 (2014) 39. The Tangier Observatory of Environmental Protection and Historic Monuments. https://mar sadtanger.org/
Author Index
A Abdelwahed, El Hassan 788 abdelwahed, Namir 1 Aboulaich, Rajae 668 Abouloifa, Houria 206 Adib, Abdellah 778, 903 Agoujil, Said 383 Agramelal, Fouad 476 Aguenaou, Samir 850 Ahouat, Chaimae 770 Ahout, Chaimae 763 Ahsain, Sara 578 Ait Ben Ali, Brahim 841 Ait Kbir, M’hamed 578 Ait Lahcen, Ayoub 698 Ait Rai, Khaoula 933 Ait Rami, Mustapha 949 Ait Said, Mehdi 817 Akil, Siham 903 Aksasse, Brahim 741 Alaoui, Hicham Tahiri 196 Alaoui, Youssef Lamrani 903 Amal, Battou 181 Amamou, Youssef 318 Amine, Hajji Mohammed 862 Ammari, Mohammed 136, 718 Aouicha, Mohamed Ben 878 Ardchir, Soufiane 92, 940 Arezki, Sara 817 Ariss, Anass 77 Arkiza, Mariam 698 Arwa, Allinjawi 604 Asri, Hiba 344 Assad, Noureddine 155 Ayad, Habib 778 Ayad, Meryem 501 Ayoub, Mniai 411 Azdimousa, Hassan 122 Azirar, Hanane 850 Aziz, Essakhi 587 Azmani, Abdellah 364, 396, 451, 921
Azmani, Monir 364, 396, 451 Aznag, Khalid 419 Azouaoui, Ahmed 26, 196 Azouazi, Mohamed 92 Azrour, Mourade 383 Azzouazi, Mohamed 940 B Badir, Hassan 627, 639, 802 Badreddine, EL Gougi 231 Bahaj, Mohamed 206 Baqach, Adil 122 Barhdadi, Mohamed 868 Battou, Amal 122 Bayoude, Kenza 940 Belhaj, Abdelilah 466 Belhiah, M. 710 Belkacemi, Mourad 680 Belkasmi, Mohammed Ghaouth 756 Belkhedar, Ghizlane 686 Ben Allal, Laïla 136, 718 Benkassioui, Bouchaib 515 Benmarrakchi, Fatimaezzahra 189 Bensalah, Nouhaila 778 Bentayeb, Youness 639 Benyacoub, Badreddine 868 Benyacoub, Bouchra 850 Benzyane, Manal 383 Borrohou, Sanae 627 Bouchti, Karim El 145 Boulaalam, Ali 552 Boulezhar, Abdelkader 587 Bounabat, Bouchaib 56 Boutaibi, O. 710 Bouzidi, Morad 662 Burmani, Nabil 802 C Chaimae 466 Chao, Jamal 949 Charrat, Loubna 888
Cherrat, Loubna 46, 168 Chetoui, Ismail 15 Chkouri, Mohamed Yassin 265 Chriatt, Mohammed 718
F Fakih Lanjri, Asmaa 718 Fath Allah, Rabie 718 Fatima, Taif 1 Fissoune, Rachida 627
D Dahani, Khawla 668 Dahbi, Aziz 155 Daif, Abderrahmane 92 Datsi, Toufik 419 Dhaiouir, Ilham 46
G Gadi, Taoufiq 599 Ghanimi, Fadoua 651
E Ech-Charrat, Mohammed Rida 168 El Adnani, Mohammed 15 El Bachari, Essaid 15 El Bahi, Hassan 528 El Bouchti, Karim 252, 466, 680, 770 El Bouhadi, Ouafae 451 El Boujnouni, Mohamed 308 El Fakir, Adil 903 El Farouk, Abdelhamid Ibn 778 El Fazziki, Aziz 217 El Ghazouani, Mohamed 217 El Hassan, Abdelwahed 15 El Kafi, Jamal 196 El Khanboubi, Yassine 217 El Mamoune, Soumaya 888 El Mezouari, Said 501 El Oirrak, Ahmed 419 El Yessefi, Abderrahim 168 Elbouchti, Karim 763 Elghoumari, Yassine 92 Elmamoune, Soumaya 168 Elouidani, Rania 536 En Nahnahi, Noureddine 802 En-Nahnahi, Noureddine 639 Ennejjai, Imane 77 Er-raha, Brahim 344 Es-Sadek, Mohamed Zeriab 145 Essayad, Abdessalam 295 Es-Swidi, Ayoub 92 Ezzaim, Aymane 155 Ezzati, Abdellah 817 Ezziyyani, Mostafa 46, 77, 145, 168, 515, 888
H Hachad, Lamiae 651 Hachad, Tarik 651 haddadi, Anass El 344 Hadi, Moulay Youssef 515 Hafidi, Bezza 567 Hafidi, Meriem 788 Haidine, Abdelfatteh 155 Hajjaj, Mouna 272 Hakkal, Soukaina 698 Haloui, Samir 949 Hamza, Alami 802 Hanane, Chliah 181 Hanine, Yahya 903 Haqiq, Abdelhay 56 Hasidi, Oussama 788 Hicham, Labrim 862 Hind, Ait Mhamed 729 Hnini, Khadija 344 Houda, Anoun 231 I Ibourk, Aomar 284, 344 Ikidid, Abdelouafi 217 J Jebari, Khalid 35, 318 Jebraoui, Smahane 567 K Kamal, Najem 829 Karczmarek, Pawel 411 Kassimi, Moulay Abdellah Khaldi, Mohamed 46 Khalid, Jebari 411 Khalil, Amal 756 Khaoula, Marhane 1
295
Kharmoum, Nassim 77, 145, 501, 515, 862 Khawlah, Altuwairqi 604 Khrouch, Sarah 168 L Laachfoubi, Nabil 841 Labrim, Hicham 680 Ladouzi, Bouchra 552 Lafkiar, Said 802 Laguidi, Ahmed 651 Lamjid, Asmaa 680 Lamsellak, Hajar 756 Lamsellak, Oussama 756 Larbi, Hassouni 231 Lechheb, Houda 272 Loukili, Manal 65 Lyhyaoui, Abdelouahid 686 M Mabrouk, Aziz 560 Marhraoui, Mohamed Amine 354 Mariam, Zaghli 729 Mehdi, Chrifi Alaoui 442 Mensouri, Doae 364 Mensouri, Houssam 396 Merrouchi, Mohamed 599 Messaoudi, Fayçal 65 Mezrioui, Abdellatif 489 Mgarbi, Hanae 265 Mihi, Soukaina 841 Mohamed, Azzouazi 1 Mohamed, Ettaouil 442 Mohamed, Reda Oussama 680 Mohammed, Ridouani 231 Mostafa, Ezziyani 862 N Nadi, Oumaima 770 Nadi, Oumaina 763 Nassr Eddine, Bahya 489 Nemiche, Mohamed 567 Nihal, Abuzinadah 604 Nour-Eddine, Joudar 442 O Ouaadi, Ismail 284, 344 Ouaddah, Aafaf 489 Ouakil, Hicham 272
Ouanan, Mohammed 741 Oukhouya, Lamya 344 Oumaira, Ilham 698 Oussaleh Taoufik, Amina 921 Outouzzalt, Ahmed 536 Ouzineb, Mohamed 868 Ouzzif, Mohammed 432 Q Qaffou, Issam 912, 933 Qassimi, Sara 788 Qazdar, Aimad 788 R Rahmaoui, Othmane 432 Rahmouni, M. 710 Reda, Oussama Mohamed 252, 763, 770 Retal, Sara 145 Rhalem, Wajih 77, 145 Rhazi, Azeddine 552 Riadsolh, Anouar 680 Riffi, Mohammed Essaid 662 S Saadouni, Chaimae 252 Saber, Mohammed 756 Sabir, Essaid 476 Sadgal, Mohamed 217 Sadik, Mohamed 476 Sadiq, Abdelalim 651 Salma, Kammoun Jarraya 604 Samar, Alkhuraiji 604 Sbai, Asma 344 Sekkate, Sara 903 Silkan, Hassan 587 Skittou, Mustapha 599 Sliman, Rajaa 26 Souali, Kamal 432 Soulami, Naila Belhaj 122 Soumia, Ziti 829, 862 T Tabaa, Mohamed 627 Tagmouti, Mostafa 560 Tahiri, Abderrahim 265 Talbi, C. 710 Tarik, Mouna 35 Tkiouat, Mohamed 903
Trabelsi, Salma 878 Tsouli Fathi, Maroi 136 Tsouli Fathi, Ramz 136
Y Yassine, Asmae 662 Yassine, Zaoui Seghroucheni 829 Yessefi, Abderrahim El 888
Z Zahaf, Sahbi 878 Zaouiat, Charafeddine Ait 217 Zekkouri, Hassan 741 Zeroual, Imad 383 Zerouani, Hajar 56 Ziti, Soumia 77, 145, 252, 466, 680, 710, 763, 770 Zouitni, Mohamed 802