Lecture Notes in Networks and Systems 748
Pablo García Bringas · Hilde Pérez García · Francisco Javier Martínez de Pisón · Francisco Martínez Álvarez · Alicia Troncoso Lora · Álvaro Herrero · José Luis Calvo Rolle · Héctor Quintián · Emilio Corchado Editors
International Joint Conference 16th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2023) 14th International Conference on EUropean Transnational Education (ICEUTE 2023) Proceedings
Lecture Notes in Networks and Systems
748
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas— UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).
Editors Pablo García Bringas Faculty of Engineering University of Deusto Bilbao, Spain Francisco Javier Martínez de Pisón Department of Mechanical Engineering University of La Rioja Logroño, Spain Alicia Troncoso Lora Data Science and Big Data Lab Pablo de Olavide University Seville, Spain José Luis Calvo Rolle Department of Industrial Engineering University of A Coruña A Coruña, Spain Emilio Corchado Faculty of Science University of Salamanca Salamanca, Spain
Hilde Pérez García School of Industrial, Computer and Aerospace Engineering University of León León, Spain Francisco Martínez Álvarez Data Science and Big Data Lab Pablo de Olavide University Seville, Spain Álvaro Herrero Applied Computational Intelligence University of Burgos Burgos, Spain Héctor Quintián Department of Industrial Engineering University of A Coruña A Coruña, Spain
ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-3-031-42518-9 ISBN 978-3-031-42519-6 (eBook) https://doi.org/10.1007/978-3-031-42519-6 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This volume of Lecture Notes in Networks and Systems contains accepted papers presented at the 16th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2023) and the 14th International Conference on EUropean Transnational Education (ICEUTE 2023). These conferences were held in the beautiful city of Salamanca, Spain, in September 2023. The aim of the CISIS 2023 conference is to offer a meeting opportunity for academic and industry-related researchers belonging to the various, vast communities of computational intelligence, information security, and data mining. The need for intelligent, flexible behaviour by large, complex systems, especially in mission-critical domains, is intended to be the catalyst and the aggregation stimulus for the overall event. After a thorough peer-review process, the CISIS 2023 International Program Committee selected 19 papers, which are published in these conference proceedings. In this edition, one special session was organised: New Methods and Models to Study the Spread of Malware and Fake News. The aim of the ICEUTE 2023 conference is to offer a meeting point for people working on transnational education within Europe. It provides a stimulating and fruitful forum for presenting and discussing the latest works and advances on transnational education within European countries. In this edition, two special sessions were organised: Using Machine Learning Techniques in Educational and Healthcare Settings: A Path Towards Precision Intervention, and Innovation in Computer Science Higher Education. In the case of ICEUTE 2023, the International Program Committee selected 15 papers, which are also published in these conference proceedings. The selection of papers was extremely rigorous in order to maintain the high quality of the conferences. We want to thank the members of the Program Committees for their hard work during the reviewing process. This is a crucial process for creating a high-standard conference; the CISIS and ICEUTE conferences would not exist without their help. CISIS 2023 and ICEUTE 2023 enjoyed outstanding keynote speeches by distinguished guest speakers: Prof. Oscar Cordón at the University of Granada (Spain), Prof. Paulo Novais at the University of Minho (Portugal), Prof. Michał Woźniak (Poland), and Prof. Hujun Yin at the University of Manchester (UK). CISIS 2023 has teamed up with the "Logic Journal of the IGPL" (Oxford University Press) for a suite of special issues, including selected papers from CISIS 2023. Particular thanks go as well to the conference's main sponsors: Startup Olé; the CYLHUB project, financed with NEXT-GENERATION funds from the European Union and channelled by the Junta de Castilla y León through the Regional Ministry of Industry, Trade and Employment; the BISITE research group at the University of Salamanca; the CTC research group at the University of A Coruña; and the University of Salamanca. They jointly contributed in an active and constructive manner to the success of this initiative. We would like to thank all the special session organisers, contributing authors, as well as the members of the Program Committees and the Local Organizing Committee
for their hard and highly valuable work, which has contributed to the success of the CISIS and ICEUTE 2023 events. September 2023
José Luis Calvo Rolle Francisco Javier Martínez de Pisón Pablo García Bringas Hilde Pérez García Francisco Martínez Álvarez Alicia Troncoso Lora Álvaro Herrero Héctor Quintián Emilio Corchado
CISIS 2023
Organisation General Chair Emilio Corchado
University of Salamanca, Spain
Program Committee Chairs Jose Luis Calvo Rolle Francisco Javier Martínez de Pisón Pablo García Bringas Hilde Pérez García Francisco Martínez Álvarez Alicia Troncoso Lora Álvaro Herrero Héctor Quintián Emilio Corchado
University of A Coruña, Spain University of La Rioja, Spain University of Deusto, Spain University of León, Spain Pablo Olavide University, Spain Pablo Olavide University, Spain University of Burgos, Spain University of A Coruña, Spain University of Salamanca, Spain
Program Committee Adam Wójtowicz Agustín Martín Muñoz Álvaro Herrero Álvaro Michelena Grandío Amparo Fuster-Sabater Tomasz Andrysiak Angel Arroyo Angel Martin Del Rey Borja Sanz Carlos Pereira Ciprian Pungila Cristina Alcaraz
Poznań University of Economics and Business, Poland C.S.I.C., Spain University of Burgos, Spain University of A Coruña, Spain C.S.I.C., Spain University of Technology and Life Sciences, Poland University of Burgos, Spain University of Salamanca, Spain S3Lab-University of Deusto, Spain ISEC, Portugal West University of Timișoara, Romania University of Malaga, Spain
Daniel Urda Eneko Osaba Enrique Onieva Esteban Jove Fernando Ribeiro Fernando Tricas Francisco Martínez-Álvarez Francisco Zayas-Gato Guillermo Morales-Luna Héctor Quintián Hugo Sanjurjo-González Hugo Scolnik Javier Nieves Jesús Díaz-Verdejo Jose Barata Jose Carlos Metrolho Jose Luis Calvo-Rolle José Luis Casteleiro-Roca José Luis Imaña Jose M. Molina Jose Manuel Lopez-Guede Josep Ferrer Juan J. Gude Juan Jesús Barbarán Lidia Sánchez-González Luis Hernandez Encinas Luis Alfonso Fernández Serantes Manuel Castejón-Limas Manuel Graña Marc Ohm Martin Dobler Michael Hellwig Michal Wozniak Michał Choraś Michał Woźniak Mike Winterburn Miriam Timiraos Díaz
University of Burgos, Spain TECNALIA Research and Innovation, Spain University of Deusto, Spain University of A Coruña, Spain EST, Portugal University of Zaragoza, Spain Pablo de Olavide University, Spain University of A Coruña, Spain CINVESTAV-IPN, Mexico University of A Coruña, Spain University of Deusto, Spain ARSAT SA, Argentina Azterlan, Spain University of Granada, Spain NOVA de Lisboa University, Portugal IPCB, Portugal University of A Coruña, Spain University of Coruña, Spain Complutense University of Madrid, Spain University Carlos III of Madrid, Spain University of the Basque Country, Spain University of the Balearic Islands, Spain University of Deusto, Spain University of Granada, Spain University of León, Spain C.S.I.C., Spain FH-Joanneum University of Applied Sciences, Austria University of León, Spain University of the Basque Country, Spain University of Bonn and Fraunhofer FKIE, Germany Vorarlberg University of Applied Sciences, Austria Vorarlberg University of Applied Sciences, Austria Wroclaw University of Technology, Poland Bydgoszcz University of Science and Technology, Poland Wroclaw University of Technology, Poland Technological University of the Shannon, Ireland University of La Coruña, Spain
Nuno Lopes Ovidiu Cosma Pablo García Bringas Paweł Ksieniewicz Pedro Hecht Petrica Pop Rafael Alvarez Rafael Corchuelo Rafał Kozik Raúl Durán Robert Burduk Roberto Casado-Vara Rogério Dionísio Salvador Alcaraz Sorin Stratulat Steffen Finck Stephen McCombie Wenjian Luo
Polytechnic Institute of Cavado and Ave, Portugal Technical University Cluj Napoca, Romania University of Deusto, Spain Wrocław University of Science and Technology, Poland University of Buenos Aires, Argentina Technical University of Cluj-Napoca, North University Center at Baia Mare, Romania University of Alicante, Spain University of Seville, Spain Bydgoszcz University of Science and Technology, Poland University of Alcalá, Spain University of Wroclaw, Poland University of Burgos, Spain Instituto Politécnico de Castelo Branco, Portugal Miguel Hernandez University, Spain Université de Lorraine, Metz, France Vorarlberg University of Applied Sciences, Austria NHL Stenden University of Applied Sciences, Netherlands Harbin Institute of Technology, China
CISIS 2023: Special Sessions New Methods and Models to Study the Spread of Malware and Fake News Program Committee Ángel Tocino (Organiser) Ángel Martin del Rey (Organiser) Roberto Casado Vara (Organiser) Elisa Frutos José Diamantino Hernández Guillén Luxing Yang Mercedes Maldonado-Cordero
University of Salamanca, Spain University of Salamanca, Spain University of Burgos, Spain University of Salamanca, Spain University of Extremadura, Spain Deakin University, Australia University of Salamanca, Spain
Qingyi Zhu Raul Felipe-Sosa Samir Llamazares-Elias
Chongqing University of Posts and Telecommunications, China Autonomous University of Chiapas, Mexico University of Salamanca, Spain
CISIS 2023 Organising Committee Chairs Emilio Corchado Héctor Quintián
University of Salamanca, Spain University of A Coruña, Spain
CISIS 2023 Organising Committee Álvaro Herrero Cosio José Luis Calvo Rolle Ángel Arroyo Daniel Urda Nuño Basurto Carlos Cambra Leticia Curiel Beatriz Gil Raquel Redondo Esteban Jove José Luis Casteleiro Roca Francisco Zayas Gato Álvaro Michelena Míriam Timiraos Díaz Antonio Javier Díaz Longueira
University of Burgos, Spain University of A Coruña, Spain University of Burgos, Spain University of Burgos, Spain University of Burgos, Spain University of Burgos, Spain University of Burgos, Spain University of Burgos, Spain University of Burgos, Spain University of A Coruña, Spain University of A Coruña, Spain University of A Coruña, Spain University of A Coruña, Spain University of A Coruña, Spain University of A Coruña, Spain
ICEUTE 2023
Organisation General Chair Emilio Corchado
University of Salamanca, Spain
Program Committee Chairs José Luis Calvo Rolle Francisco Javier Martínez de Pisón Pablo García Bringas Hilde Pérez García Francisco Martínez Álvarez Alicia Troncoso Lora Álvaro Herrero Héctor Quintián Emilio Corchado
University of A Coruña, Spain University of La Rioja, Spain University of Deusto, Spain University of León, Spain Pablo Olavide University, Spain Pablo Olavide University, Spain University of Burgos, Spain University of A Coruña, Spain University of Salamanca, Spain
Program Committee Alessandra Raffaetà Álvaro Michelena Grandío Ana Rosa Pereira Borges Antonio Morales-Esteban Antonio Javier Díaz Longueira Carlos Pereira Daniela Zaharie David Méndez Dragan Simic Eloy Irigoyen Esteban Jove Estibaliz Apiñaniz Francisco Martínez-Álvarez Francisco Zayas-Gato Gloria Gratacós Héctor Quintián
Ca’ Foscari University of Venice, Italy University of A Coruña, Spain Coimbra Polytechnic Institute, Portugal University of Seville, Spain University of A Coruña, Spain ISEC, Portugal West University of Timisoara, Romania Francisco de Vitoria University, Spain University of Novi Sad, Serbia University of the Basque Country, Spain University of A Coruña, Spain University of the Basque Country, Spain Pablo de Olavide University, Spain University of A Coruña, Spain Villanueva University, Spain University of A Coruña, Spain
Hugo Sanjurjo-González J. David Nuñez-Gonzalez Jairo Ortiz-Revilla Jorge Barbosa Jose Luis Calvo-Rolle José Luis Casteleiro-Roca Jose Manuel Lopez-Guede José Manuel Galán José-Lázaro Amaro-Mellado Juan J. Gude Julián Estévez Laura Fernández-Robles Laura Melgar-García Lidia Sánchez-González Luis Alfonso Fernández Serantes Manuel Castejón-Limas Maria Jose Marcelino Maria Victoria Requena Marián Queiruga-Dios Matilde Santos Peñas Miguel Carriegos Miguel Ángel Queiruga-Dios Miriam Timiraos Díaz Monika Ciesielkiewicz Pablo García Bringas Paola Clara Leotta Paulo Moura Oliveira Richard Duro Sorin Stratulat
University of Deusto, Spain University of the Basque Country, Spain University of Burgos, Spain Coimbra Polytechnic Institute, Portugal University of A Coruña, Spain University of Coruña, Spain University of the Basque Country, Spain University of Burgos, Spain University of Seville, Spain University of Deusto, Spain University of the Basque Country, Spain University of León, Spain Pablo de Olavide University, Spain University of León, Spain FH-Joanneum University of Applied Sciences, Austria University of León, Spain University of Coimbra, Portugal University of Seville, Spain Francisco de Vitoria University, Spain Complutense University of Madrid, Spain RIASC, Spain University of Burgos, Spain University of A Coruña, Spain Villanueva University, Spain University of Deusto, Spain University of Catania, Italy UTAD University, Portugal University of A Coruña, Spain Université de Lorraine, France
ICEUTE 2023: Special Sessions Using Machine Learning Techniques in Educational and Healthcare Settings: A Path Towards Precision Intervention Program Committee María Consuelo Saiz Manzanares (Organiser) Estrella Fernández
University of Burgos, Spain University of Oviedo, Spain
Francisco Alcantud Leandro Almeida Luis Jorge Martin Antón María Camino Escolar Llamazares Miguel Angel Carbonero Martin Paula Molinero-González
University of Valencia, Spain University of Minho, Portugal University of Valladolid, Spain University of Burgos, Spain University of Valladolid, Spain University of Valladolid, Spain
Innovation in Computer Science Higher Education Program Committee Alicia Troncoso (Organiser) David Gutiérrez-Avilés (Organiser) Francisco Martínez Álvarez (Organiser) José F. Torres (Organiser) Laura Melgar-García (Organiser) María Martínez-Ballesteros (Organiser) Antonio Morales-Esteban Khawaja Asim
Pablo de Olavide University, Spain University of Seville, Spain Pablo de Olavide University, Spain Pablo de Olavide University, Spain Pablo de Olavide University, Spain University of Seville, Spain University of Seville, Spain PIEAS, Pakistan
ICEUTE 2023 Organising Committee Chairs Emilio Corchado Héctor Quintián
University of Salamanca, Spain University of A Coruña, Spain
ICEUTE 2023 Organising Committee Álvaro Herrero Cosio José Luis Calvo Rolle Ángel Arroyo Daniel Urda Nuño Basurto Carlos Cambra Leticia Curiel Beatriz Gil Raquel Redondo Esteban Jove José Luis Casteleiro Roca Francisco Zayas Gato
University of Burgos, Spain University of A Coruña, Spain University of Burgos, Spain University of Burgos, Spain University of Burgos, Spain University of Burgos, Spain University of Burgos, Spain University of Burgos, Spain University of Burgos, Spain University of A Coruña, Spain University of A Coruña, Spain University of A Coruña, Spain
Álvaro Michelena Míriam Timiraos Díaz Antonio Javier Díaz Longueira
University of A Coruña, Spain University of A Coruña, Spain University of A Coruña, Spain
Contents
CISIS Applications

Accountability and Explainability in Robotics: A Proof of Concept for ROS 2- And Nav2-Based Mobile Robots . . . . . 3
Laura Fernández-Becerra, Miguel A. González-Santamarta, David Sobrín-Hidalgo, Ángel Manuel Guerrero-Higueras, Francisco J. Rodríguez Lera, and Vicente Matellán Olivera

Reducing the Security Margin Against a Differential Attack in the TinyJambu Cryptosystem . . . . . 14
A. Fúster-Sabater and M. E. Pazo-Robles

Fuzzing Robotic Software Using HPC . . . . . 23
Francisco Borja Garnelo Del Río, Francisco J. Rodríguez Lera, Camino Fernández Llamas, and Vicente Matellán Olivera

Intrusion and Fault Detection

Intrusion Detection and Prevention in Industrial Internet of Things: A Study . . . . . 37
Nicholas Jeffrey, Qing Tan, and José R. Villar

A Novel Method for Failure Detection Based on Real-Time Systems Identification . . . . . 49
Álvaro Michelena, Antonio Díaz-Longueira, Míriam Timiraos, Héctor Quintián, Óscar Fontenla Romero, and José Luis Calvo-Rolle

Systematic Literature Review of Methods Used for SQL Injection Detection Based on Intelligent Algorithms . . . . . 59
Juan José Navarro-Cáceres, Ignacio Samuel Crespo-Martínez, Adrián Campazas-Vega, and Ángel Manuel Guerrero-Higueras

Impact of the Keep-Alive Parameter on SQL Injection Attack Detection in Network Flow Data . . . . . 69
Ignacio Samuel Crespo-Martínez, Adrián Campazas-Vega, Ángel Manuel Guerrero-Higueras, Claudia Álvarez-Aparicio, and Camino Fernández-Llamas

SWAROG Project Approach to Fake News Detection Problem . . . . . 79
Rafał Kozik, Joanna Komorniczak, Paweł Ksieniewicz, Aleksandra Pawlicka, Marek Pawlicki, and Michał Choraś

Neural Networks

Analysis of Extractive Text Summarization Methods as a Binary Classification Problem . . . . . 91
Joanna Komorniczak, Szymon Wojciechowski, Jakub Klikowski, Rafał Kozik, and Michał Choraś
QuantumSolver Composer: Automatic Quantum Transformation of Classical Circuits . . . . . 101
Daniel Escanez-Exposito and Pino Caballero-Gil

Bytecode-Based Android Malware Detection Applying Convolutional Neural Networks . . . . . 111
Alberto Miranda-Garcia, Iker Pastor-López, Borja Sanz Urquijo, José Gaviria de la Puerta, and Pablo García Bringas

Prediction of Water Usage for Advanced Metering Infrastructure Network with Intelligent Water Meters . . . . . 122
Łukasz Saganowski and Tomasz Andrysiak

Phishing URL Detection with Prototypical Neural Network Disentangled by Triplet Sampling . . . . . 132
Seok-Jun Bu and Sung-Bae Cho

Special Session 1: New Methods and Models to Study the Spread of Malware and Fake News

Finding and Removing Infected T-Trees in IoT Networks . . . . . 147
Marcos Severt, Roberto Casado-Vara, Angel Martín del Rey, Esteban Jove, Héctor Quintián, and Jose Luis Calvo-Rolle

Critical Analysis of Global Models for Malware Propagation on Wireless Sensor Networks . . . . . 157
A. Martín del Rey, E. Frutos Bernal, R. Macías Maldonado, and M. Maldonado Cordero

Benchmarking Classifiers for DDoS Attack Detection in Industrial IoT Networks . . . . . 167
Marcos Severt, Roberto Casado-Vara, Angel Martín del Rey, Nuño Basurto, Daniel Urda, and Álvaro Herrero

A Q-Learning Based Method to Simulate the Propagation of APT Malware . . . . . 177
Jose Diamantino Hernández Guillén and Ángel Martín del Rey
On the Statistical Analysis of an Individual-Based SI Model for Malware Propagation on WSNs . . . . . 187
E. Frutos-Bernal, A. Martín del Rey, and Miguel Rodríguez-Rosa

Stability Analysis of a Stochastic Malware Diffusion SEIR Model . . . . . 197
Samir Llamazares-Elías and Angel Tocino

General Track

Modelling and Simulation of Wind Energy Systems: Learning-by-Doing in a Master's Course . . . . . 207
Lía García-Pérez and Matilde Santos

Personalised Recommendations and Profile Based Re-ranking Improve Distribution of Student Opportunities . . . . . 217
Čeněk Žid, Pavel Kordík, and Stanislav Kuznetsov

AIM@VET: Tackling Equality on Employment Opportunities Through a Formal and Open Curriculum About AI . . . . . 228
Abraham Prieto, Sara Guerreiro, and Francisco Bellas

System Identification and Emulation of a Physical Level Control Plant Using a Low Cost Embedded System . . . . . 238
Daniel Méndez-Busto, Antonio Díaz-Longueira, Álvaro Michelena, Míriam Timiraos, Francisco Zayas-Gato, Esteban Jove, Elena Arce, and Héctor Quintián

A Simulation Platform for Testing Negotiation Strategies and Artificial Intelligence in Higher Education Courses . . . . . 248
Adrián Heras, Juan M. Alberola, Victor Sánchez-Anguix, Vicente Julián, and Vicent Botti

Special Session 1: Using Machine Learning Techniques in Educational and Healthcare Settings: A Path Towards Precision Intervention

Eye-Tracking Technology Applied to the Teaching of University Students in Health Sciences . . . . . 261
María Consuelo Sáiz-Manzanares, Irene González-Díez, and Carmen Varela Vázquez

En_Línea. An Online Treatment to Change Lifestyle for People with Overweight and Obesity. A Pilot Study . . . . . 272
Carmen Varela, Irene González-Diez, María Consuelo Sáiz-Manzanares, and Carmina Saldaña
Use of Eye-Tracking Methodology for Learning in College Students: Systematic Review of Underlying Cognitive Processes . . . . . 279
Irene González-Diez, Carmen Varela, and María Consuelo Sáiz-Manzanares

Using Machine Learning Techniques in eEarlyCare Precision Diagnosis and Intervention in 0–6 years Old . . . . . 294
María Consuelo Sáiz-Manzanares

A Machine-Learning Based Approach to Validating Learning Materials . . . . . 306
Frederick Ako-Nai, Enrique de la Cal Marin, and Qing Tan

Special Session 2: Innovation in Computer Science Higher Education

Association Rule Analysis of Student Satisfaction Surveys for Teaching Quality Evaluation . . . . . 319
Manuel J. Jiménez-Navarro, Belén Vega-Márquez, José María Luna-Romera, Manuel Carranza-García, and María Martínez-Ballesteros

Robustness Analysis of a Methodology to Detect Biases, Inconsistencies and Discrepancies in the Evaluation Process . . . . . 329
Jose Divasón, Francisco Javier Martínez-de-Pisón, Ana Romero, and Eduardo Sáenz-de-Cabezón

Evaluation of the Skills' Transfer Through Digital Teaching Methodologies . . . . . 340
Javier Díez-González, Paula Verde, Rubén Ferrero-Guillén, Rubén Álvarez, Nerea Juan-González, and Alberto Martínez-Gutiérrez

Educational Innovation Project in the Field of Informatics . . . . . 350
Jose Manuel Lopez-Guede, Javier del Valle, Ekaitz Zulueta, Unai Fernandez-Gamiz, Josean Ramos-Hernanz, Julian Estevez, and Manuel Graña

Explainable Artificial Intelligence for Education: A Real Case of a University Subject Switched to Python . . . . . 358
Laura Melgar-García, Ángela Troncoso-García, David Gutiérrez-Avilés, José Francisco Torres, and Alicia Troncoso

Author Index . . . . . 369
CISIS Applications
Accountability and Explainability in Robotics: A Proof of Concept for ROS 2- And Nav2-Based Mobile Robots

Laura Fernández-Becerra, Miguel A. González-Santamarta, David Sobrín-Hidalgo, Ángel Manuel Guerrero-Higueras, Francisco J. Rodríguez Lera, and Vicente Matellán Olivera

Universidad de León, Campus de Vegazana s/n, 24071 León, Spain
[email protected], {mgons,dsobh,am.guerrero,fjrodl,vmato}@unileon.es
https://robotica.unileon.es/
Abstract. Using mobile robots in environments with humans has become increasingly common, raising concerns over security and safety. The lack of capabilities to justify their decisions and behaviours to non-expert users poses a significant challenge in gaining their trust, especially in safety-critical situations. Therefore, explaining why a robot carries out a specific and unexpected action is crucial to comprehend the cause of a failure when attempting to achieve a goal or complete a task. Besides, it is essential to make such explanations understandable to people, providing appropriate, effective, and natural responses to their queries. This work depicts a Proof of Concept of an accountability and explicability engine on ROS-based mobile robots. Our solution has two components: first, a black box-like component to provide accountability, and another for generating natural language explanations from data in the black box. Initial results show that it is possible to get accountable data from robot actions to obtain understandable explanations.
Keywords: Robotics · accountability · explicability · ROS 2

1 Introduction
The number of mobile robots deployed in public spaces is increasing. Security and safety concerns appear when there are people in the loop. Once robots arrive at hospitals or care centers, it becomes necessary to clarify the reasons that have produced a specific situation. Therefore, including accountability and explainability engines onboard the robot is mandatory. Such engines will record evidence of any behaviour, allowing the robot to explain any decision it takes. Designers, manufacturers, and developers involved in robot deployment should know that their actions carry responsibility [15] for generating trustworthy robotic behaviours.
Every trustworthy robot deployed in a public space should provide transparency. Transparency is often seen as a means to provide accountability and shows that something is done with due diligence [3]. Furthermore, several authors have worked on the accountability processes associated with robotics [2,8]. Thus, a transparent logging system is the first step towards a trustworthy robot. However, such transparency should be adapted to the different stakeholders interacting with the robot [12]. This work depicts a Proof of Concept (PoC) of an accountability engine that supports explainability. We propose a black box-like component for ROS 2-based robots that allows for getting natural language explanations from ROS component outputs. Specifically, we explain navigation decisions from LIDAR and Nav2 outputs. The remainder of this paper is organized as follows. The next section presents related works. Section 3 poses the technologies and algorithms implemented in this paper. Section 4 presents the evaluation process carried out. Section 5 closes with a conclusion and overviews future work.
2 Related Works
There are different approaches to providing black box recorders for robotics. Winfield [13] has been exploring the concept of the ethical black box for robotics. He presents a theoretical approach and its limits. However, he proposes to avoid recording the internal low-level decision-making process of the embedded AI, comparing it with an aircraft autopilot. Given the current steps to push robot autonomy, the authors do not share this decision, and we consider it necessary to explore this direction. One of the latest Winfield works [14] proposes a Draft Standard for an Ethical Black Box Module focused on Social Robots. This standard tracks his proposal and does not include information about robot decisions or other internal events. However, the black box is not enough to explain what generated a particular robot's behaviour. Therefore, other research explores the concept of explainability in autonomous robots (XAR) [1,10]. They define four requirements to reach it: an interpretable decision-making space; an estimation of the model of others (particularly the human); an estimation of the information the user needs; and an engine to deliver explanations to humans. Sado et al. add to these requirements the combination of both verbal and non-verbal communication means and the ability of an agent to continuously learn to construct natural and effective explanations [9]. Authors in [11] propose an explainable social robot behavior architecture to improve human-robot interaction. Despite the effectiveness of their proposal, its evaluation highlights the need to provide the robot with capabilities that allow it to modify its decisions dynamically in response to changes in the environment. Olivares-Alarcos et al. [7] propose alternatives based on more elaborate information, such as ontologies, usually classified as a knowledge source in a robot. They present a set of algorithms for providing explanations from the OCRA ontology. However, this information is not initially intended to be stored in a black box-like device.
This research aims to explain autonomous robot behaviour by translating internal log-like information gathered by the robot's black box component into formatted sentences that are easy to understand by stakeholders.
3 Materials and Methods
To evaluate our proposal, we conducted an experiment involving simple navigation tasks. This section provides a detailed description of the methods and elements used in its development. The experimentation was executed entirely in Gazebo, an open-source solution for 3D robotics simulations. To provide a realistic scenario, we used a hospital simulation environment made available by Amazon Web Services (AWS)1. Furthermore, we selected RB-1, an autonomous mobile robot based on the ROS 2 platform produced by Robotnik [4]. Using these components allowed us to assess the effectiveness of our proposal accurately.

3.1 ROS 2
Robot Operating System (ROS) is the most commonly used framework in robotics. It consists of a set of software libraries and tools to build robot applications. However, difficulties in satisfying real-time embedded systems requirements and the need to improve its multi-platform capabilities promoted a significant upgrade to ROS 2, whose major change lies in using the Data Distribution Service (DDS) [6]. Nodes are the fundamental building block of ROS and ROS 2 applications. They consist of individual processes, each of them performing specific tasks and communicating with each other via messages. Data can be moved between nodes using different methods, including topics, services, and actions. Topics are used for message passing, whereas services provide synchronous request-reply interactions. Actions, on the other hand, are intended for long-running tasks that require feedback. This work focuses, in particular, on the ability to navigate. To this end, we have used Nav2, a collection of tools for ROS 2 considered the successor of the ROS Navigation Stack. Nav2 provides the robot with the ability to perform complex navigation tasks. Planning, control, localization, and visualization are among its main functionalities. Furthermore, it uses behaviour trees to create customized navigation behaviour by orchestrating many independent modular servers intended to compute a path, control effort, or recovery. This fact implies high flexibility in the navigation task and in the specification of complex robot behaviours [5].
1 https://github.com/aws-robotics/aws-robomaker-hospital-world
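As a concrete illustration of these concepts, the following minimal rclpy node subscribes to the /scan topic used later in this work. This is our own sketch based on the standard ROS 2 Python API, not code from the paper's repository; the node name is arbitrary.

```python
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import LaserScan


class ScanListener(Node):
    """Minimal ROS 2 node: subscribes to /scan and logs the closest range."""

    def __init__(self):
        super().__init__('scan_listener')
        # Topics decouple publishers and subscribers; 10 is the queue depth.
        self.create_subscription(LaserScan, '/scan', self.on_scan, 10)

    def on_scan(self, msg: LaserScan):
        # ranges may contain inf for "no return"; fine for a sketch.
        self.get_logger().info(f'Closest obstacle: {min(msg.ranges):.2f} m')


def main():
    rclpy.init()
    node = ScanListener()
    rclpy.spin(node)  # process callbacks until shutdown (Ctrl-C)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```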
3.2 Algorithms for Accountability and Robot Explanation Generation
Knowing the origin, causes, and reasons guiding an autonomous system's behaviour requires new functionalities that help users understand the system's actions, changes in decision-making, and task completion. To this end, we propose a Proof of Concept of an accountability and explicability engine for ROS 2-based robots in the navigation domain. The algorithms supporting this approach are included in this section. These functionalities are fully implemented at GitHub2, licensed under GPLv3. Algorithm 1 includes the steps involved in the accountable information recording process. During regular operation, ROS 2 nodes publish data through ROS 2 topics that other nodes consume. Such data include perception data from sensors or commands to actuators from decision-making nodes. We aim to record such data in Rosbag files, the file format used in ROS and ROS 2 for storing data such as topic messages. The input parameter topics_set contains the ROS 2 topics whose messages are needed to guarantee the navigation's reproducibility and to build explanations that include causality between the performed events. These comprise the topics providing the information related to the initial pose of the robot, the map used, the odometry, the linear and angular velocity, the robot's position and orientation, its physical properties, its kinematic structure, and the obstacles, free space, and other features of the environment that affect robot navigation. The recording service can be selectively activated to avoid storing non-essential information while detecting anomalous behaviour. In addition to saving the whole navigation information, it can be triggered when an obstacle is within a specific distance or when one of the nodes included in the Nav2 behaviour tree fails, ensuring that information is saved only during unexpected situations. The explanations are generated using the readings from the topics corresponding to the laser sensor, the robot's planned trajectory, and the status of the action in charge of processing a new goal in the navigation process. Table 1 includes the questions that can be answered via the proposed explanation service.

Table 1. Questions to be answered through the explanation service.

ID | Question | Algorithm
Q1 | What is the current navigation status? | Algorithm 2
Q2 | Why have you changed the planned route? | Algorithm 3
Algorithm 2 is based on the study of the result of the ROS 2 action server NavigateToPose, used for commanding a robot to move to a specified pose. While 2
2 https://github.com/uleroboticsgroup/nav2_accountability_explainability
executing the planned trajectory, the feedback from this action is examined through the mentioned algorithm by studying the messages returned by the /navigate_to_pose/_action/status topic. To check the reason that may have caused a route change in the robot's trajectory, we have developed Algorithm 3. By default, Nav2 guarantees to find the shortest path under any condition. The /plan topic includes the set of poses computed to reach a specific goal pose. If this initial path changes due to an external circumstance, the Euclidean distance between all the poses included in the pre-computed path will increase. This fact, along with the values obtained through the /scan topic, will allow us to determine whether the appearance of an obstacle has caused the route modification.

3.3 Benchmark Description
The experiment conducted in this proposal involved the robot's navigation between two points and analyzing the generated messages during execution. As mentioned earlier, this work has two main objectives: (i) to develop a black box-like component to gather ROS messages related to the robot's behaviour, and (ii) to develop a component to provide explainability capabilities. The black box component is designed as a set of Rosbags recorded during the robot's navigation process. These Rosbags are generated from the information provided by selected topics of interest, which are specified in the input of Algorithm 1, shown below.

Algorithm 1 Accountable information recording
Input: topics_set = [/initialpose, /map, /odom, /cmd_vel, /amcl_pose, /tf, /tf_static, /robot_description, /global_costmap/costmap, /camera/image_raw, /local_costmap/costmap, /behavior_tree_log, /scan, /plan, /navigate_to_pose/_action/status], service_call_value
Output: rosbag
1: Boolean recording ← false
2: // Recording process attending to a service call.
3: recording ← service_call_value
4: // Recording process attending to Nav2 Behaviour Tree nodes' status.
5: for event in /behavior_tree_log.msg.event_log do
6:   if event.node_status = FAILURE and not recording then
7:     for message in topics_set do
8:       rosbag ← rosbag ∪ message
9:       bt_failed_nodes ← bt_failed_nodes ∪ event.node_name
10:      recording ← true
11:    end for
12:  end if
13:  if event.current_status = (RUNNING or SUCCESS) and event.node_name ∈ bt_failed_nodes then
14:    bt_failed_nodes ← bt_failed_nodes − event.node_name
15:    recording ← false
16:  end if
17: end for
18: // Recording process attending to obstacles' detection with the /scan topic.
19: for msg in /scan do
20:   obstacle_distance ← min(/scan.last_msg.ranges)
21:   if obstacle_distance < OBSTACLE_DISTANCE_THRESHOLD and not recording then
22:     for message in topics_set do
23:       rosbag ← rosbag ∪ message
24:       recording ← true
25:     end for
26:   else
27:     if obstacle_distance >= OBSTACLE_DISTANCE_THRESHOLD and recording then
28:       recording ← false
29:     end if
30:   end if
31: end for
32: return rosbag

The experimentation consisted of two test cases. In both, the robot was positioned at the same point (A) and ordered to reach a second point (B). Figure 1 presents a visualization of the navigation tasks performed during the experiment.

Test 1. In this test, represented by a green arrow in Fig. 1, the robot navigates from point A to point B along an obstacle-free path without any issues. During the execution of this test, no events are triggered except for the start of navigation and the indication of successful completion.

Test 2. In the second test, the robot is again assigned to navigate from point A to point B. However, an external agent closes one of the apartment doors during the execution of this test, blocking the path the robot followed in the first test. This unexpected event forces the robot to recalculate the route required by its task. This way, the robot heads to point B's destination through a further door. The red arrow represents this route in Fig. 1.

During the tests, the robot records data from ROS topics in Rosbag files to provide explainability, which refers to the robot's ability to explain its decisions in a way humans can understand. For instance, in Test 2, an explanation of interest might be why the robot went out into the corridor to reach point B. The expected explanation would be similar to "because the robot encountered an obstacle and had to re-plan its route". To achieve such explanations, the content of the black box must be translated into a natural language that is understandable to human users. By doing so, the robot can provide clear explanations when an event occurs, significantly improving the trust relationship between humans and robots.
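A minimal Python sketch of the event-triggered recording idea in Algorithm 1 is shown below. It is our own illustration, not the authors' implementation: it uses the rosbag2_py writer API as documented for ROS 2 Humble (the TopicMetadata constructor changed in later distributions), records only the /scan topic for brevity, and uses a placeholder threshold value.

```python
import rclpy
from rclpy.node import Node
from rclpy.serialization import serialize_message
from sensor_msgs.msg import LaserScan
import rosbag2_py

OBSTACLE_DISTANCE_THRESHOLD = 1.0  # metres; illustrative placeholder


class SelectiveRecorder(Node):
    """Writes /scan messages to a bag only while an obstacle is close."""

    def __init__(self):
        super().__init__('selective_recorder')
        self.writer = rosbag2_py.SequentialWriter()
        self.writer.open(
            rosbag2_py.StorageOptions(uri='black_box_bag',
                                      storage_id='sqlite3'),
            rosbag2_py.ConverterOptions('', ''))
        self.writer.create_topic(rosbag2_py.TopicMetadata(
            name='/scan', type='sensor_msgs/msg/LaserScan',
            serialization_format='cdr'))
        self.create_subscription(LaserScan, '/scan', self.on_scan, 10)

    def on_scan(self, msg: LaserScan):
        # Trigger condition from Algorithm 1: obstacle closer than threshold.
        if min(msg.ranges) < OBSTACLE_DISTANCE_THRESHOLD:
            self.writer.write('/scan', serialize_message(msg),
                              self.get_clock().now().nanoseconds)


def main():
    rclpy.init()
    rclpy.spin(SelectiveRecorder())


if __name__ == '__main__':
    main()
```

A full recorder in the spirit of Algorithm 1 would create one such topic entry per element of topics_set and add the behaviour-tree failure trigger; the structure stays the same.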
Algorithm 2 Explanation to the question "What is the current navigation status?"
Input: /navigate_to_pose/_action/status, previous_goal_id, finished_goals
Output: String answer
1: answer ← "No navigation is running."
2: for status in /navigate_to_pose/_action/status.msg.status_list do
3:   current_goal_id ← status.goal_info.goal_id
4:   if current_goal_id != previous_goal_id and current_goal_id not in finished_goals then
5:     answer ← "Navigation to a new goal has started."
6:   else
7:     answer ← "Navigation to the goal "
8:     if status.status = STATUS_EXECUTING then
9:       answer ← answer + "is in progress."
10:    end if
11:    if status.status = STATUS_SUCCEEDED then
12:      answer ← answer + "has succeeded."
13:    end if
14:    if status.status = STATUS_CANCELED then
15:      answer ← answer + "was cancelled."
16:    end if
17:    if status.status = STATUS_ABORTED then
18:      answer ← answer + "has aborted."
19:    end if
20:    if status.status = STATUS_SUCCEEDED or status.status = STATUS_CANCELED or status.status = STATUS_ABORTED then
21:      finished_goals ← finished_goals ∪ current_goal_id
22:    end if
23:  end if
24:  return answer
25: end for
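The status codes consumed by Algorithm 2 are the standard ROS 2 action goal states from action_msgs. A compact Python sketch of the mapping follows; it is our illustration of the idea, not the project's code, and the sentence strings mirror those in Algorithm 2.

```python
from action_msgs.msg import GoalStatus

# Sentences for the running and terminal states used in Algorithm 2.
STATUS_SENTENCES = {
    GoalStatus.STATUS_EXECUTING: 'Navigation to the goal is in progress.',
    GoalStatus.STATUS_SUCCEEDED: 'Navigation to the goal has succeeded.',
    GoalStatus.STATUS_CANCELED:  'Navigation to the goal was cancelled.',
    GoalStatus.STATUS_ABORTED:   'Navigation to the goal has aborted.',
}

TERMINAL = (GoalStatus.STATUS_SUCCEEDED, GoalStatus.STATUS_CANCELED,
            GoalStatus.STATUS_ABORTED)


def explain_status(status_list, previous_goal_id, finished_goals):
    """Translate GoalStatusArray entries into a natural-language sentence."""
    answer = 'No navigation is running.'
    for status in status_list:
        goal_id = bytes(status.goal_info.goal_id.uuid)
        if goal_id != previous_goal_id and goal_id not in finished_goals:
            answer = 'Navigation to a new goal has started.'
        else:
            answer = STATUS_SENTENCES.get(status.status, answer)
            if status.status in TERMINAL:
                finished_goals.add(goal_id)  # remember completed goals
    return answer
```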
4 Results
In this section, we analyze the experiment’s results. Table 2 provides a comparison of the global metrics for the three different approaches to the recording service as described in 3.2: (i) full recording, (ii) recording based on laser scanner events, and (iii) recording based on the status of Nav2 behavior tree nodes. It was found that saving data only when a node from the Nav2 behavior tree fails is the optimal approach for storage size and execution time variables. Most recorded messages are related to odometry, velocity, transforms, camera images, behavior tree status, and laser scanner data, as shown in Table 3. However, depending on the robot’s obstacle detection threshold or the obstacles in the scenario, the scan detection approach may store duplicate information. This fact can lead to the inclusion of multiple identical messages from topics like /map and /robot description in nearly every created Rosbag file as a means of ensuring reproducibility.
Algorithm 3 Explanation to the question "Why have you changed the planned path?"
Input: /plan, /scan, previous_distance
Output: String answer
1: answer ← "I have not changed the planned path."
2: distance ← get_EuclideanDistance(/plan.msg.poses)
3: if distance > previous_distance * DISTANCE_THRESHOLD then
4:   answer ← "I have changed the planned path"
5:   new_plan ← true
6:   if min(/scan.last_msg.ranges) < OBSTACLE_DISTANCE_THRESHOLD then
7:     answer ← answer + " because there was an obstacle."
8:   end if
9: else
10:  if new_plan then
11:    answer ← answer + " Then, I followed a new path to the goal pose."
12:    new_plan ← false
13:  end if
14: end if
15: return answer
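The core of Algorithm 3 is the path-length comparison: sum the Euclidean distances between consecutive poses of the nav_msgs/Path published on /plan, and flag a re-plan when that sum grows beyond a factor of the previous value. A plain-Python sketch of that check (our illustration; both threshold constants are placeholders, not the values used in the experiments):

```python
import math

DISTANCE_THRESHOLD = 1.5            # path-growth factor; placeholder
OBSTACLE_DISTANCE_THRESHOLD = 1.0   # metres; placeholder


def path_length(path_msg):
    """Sum of Euclidean distances between consecutive nav_msgs/Path poses."""
    pts = [(p.pose.position.x, p.pose.position.y) for p in path_msg.poses]
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def route_changed(path_msg, scan_msg, previous_distance):
    """Return (answer, new_distance) following the logic of Algorithm 3."""
    distance = path_length(path_msg)
    if distance > previous_distance * DISTANCE_THRESHOLD:
        answer = 'I have changed the planned path'
        # Attribute the change to an obstacle if the laser confirms one.
        if min(scan_msg.ranges) < OBSTACLE_DISTANCE_THRESHOLD:
            answer += ' because there was an obstacle.'
        return answer, distance
    return 'I have not changed the planned path.', distance
```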
Fig. 1. Navigation examples: 1) the green arrow shows a robot that finds the door open and is able to traverse between rooms; 2) the red arrow shows a robot that finds the door closed and needs to reach the final point through an alternative path.
Information about the robot’s path and data collected by its scanner over time are depicted in Fig. 2. The blue line corresponds to the distance between the poses forming the path, while the orange line represents the minimum distance detected by the scanner. To facilitate comparison, both distances have been scaled between 0 and 1. Finally, the dashed line identifies the moment the scanner detects an obstacle, triggering the path to re-plan and causing a significant increase in the distance between the path’s points.
Table 2. Global metrics for black box recording approaches.

Approach             | Size (MB) | Duration (s) | No. Rosbag files | No. messages
Full                 | 2132.21   | 508.30       | 1                | 31687
Scan detection       | 1808.41   | 387.95       | 17               | 134417
Behavior tree status | 1167.04   | 175.23       | 2                | 30688
Table 3. Number of messages, average rate, and size of topics for black box recording approaches. Each cell lists No. msgs / Av. R. (Hz) / Size (MB).

Topic Name                       | Full                  | Scan detection        | Behavior tree status
/initialpose                     | 0 / 0 / 0             | 0 / 0 / 0             | 0 / 0 / 0
/map                             | 1 / 0 / 0.29          | 14 / 0 / 4.08         | 2 / 0 / 0.58
/odom                            | 10913 / 23.82 / 7.53  | 41589 / 67.15 / 28.72 | 6825 / 31.25 / 4.71
/cmd_vel                         | 6992 / 14.69 / 0.35   | 37893 / 57.71 / 1.88  | 5717 / 21.03 / 0.28
/amcl_pose                       | 197 / 0.44 / 0.07     | 1579 / 2.43 / 0.55    | 180 / 1.75 / 0.06
/tf                              | 4111 / 27.74 / 0.56   | 15153 / 35.56 / 2.08  | 12077 / 87.61 / 1.45
/tf_static                       | 1 / 0 / 0.02          | 15 / 0 / 0.02         | 1 / 0 / 0
/robot_description               | 1 / 0 / 0.02          | 14 / 0 / 0.30         | 1 / 0 / 0.02
/global_costmap/costmap          | 98 / 0.21 / 28.57     | 660 / 0.84 / 192.40   | 63 / 0.29 / 18.37
/camera/image_raw                | 2054 / 1.95 / 2031.05 | 1307 / 1.09 / 1292.40 | 1120 / 1.34 / 1107.49
/local_costmap/costmap           | 199 / 0.44 / 0.70     | 1301 / 1.76 / 4.60    | 125 / 0.55 / 0.44
/behavior_tree_log               | 1281 / 2.52 / 0.16    | 8529 / 10.74 / 0.99   | 891 / 2.94 / 0.11
/scan                            | 5411 / 11.83 / 45     | 23053 / 38.44 / 191.71| 3388 / 17.74 / 28.17
/plan                            | 426 / 0.83 / 17.89    | 3298 / 4.41 / 88.68   | 296 / 1.03 / 5.36
/navigate_to_pose/_action/status | 2 / 0 / 0             | 12 / 0 / 0            | 2 / 0 / 0
Fig. 2. Path distance and Scan Min Distance Plot.
Our proposal includes a ROS 2 explainability service that can infer new information from recorded data or selected topics in real time. To evaluate the effectiveness of our approach, we used Algorithms 2 and 3 in Test 2, where the robot's path was modified due to an obstacle. The generated explanations address the four main scenarios involved in the navigation process: (i) the goal pose has not been sent, and the robot is in its initial pose, (ii) navigation is in progress, (iii) an obstacle is detected, and (iv) the end of the navigation process. Graphical representations of these states can be found in the GitHub repository described in Sect. 3.2. We collected the answers provided by our service to
the questions in Table 1 to assess the effectiveness of our approach. The results of the explainability service are presented in Table 4.

Table 4. Answers provided by the explainability service.

Sce. | Q1. What is the current navigation status? | Q2. Why have you changed the planned path?
i    | No navigation is running                   | I have not changed the planned path
ii   | Navigation to the goal is in progress      | I have not changed the planned path
iii  | Navigation to the goal is in progress      | I have changed the planned path because there was an obstacle. Then, I followed a new path to the goal pose
iv   | Navigation to the goal has succeeded       | I have changed the planned path because there was an obstacle. Then, I followed a new path to the goal pose

5 Conclusions and Future Work
This work presents a Proof of Concept of an accountability and explicability engine for ROS-based mobile robots. Our approach includes a black box component for accountability and a natural language explanation generator, demonstrating the possibility of obtaining understandable explanations for robot actions using the raw data stored in Rosbag files. Initial results have shown that this proposal can provide natural and meaningful explanations, improving human interaction. However, it is essential to note that an accountability system should not rely solely on a single black box recorder. It should be part of a broader safety management system for autonomous robots. To guarantee security and reliability in robots deployed in public spaces, a comprehensive model is necessary. This model should include risk assessment, system design, testing, real-time monitoring, and robot maintenance. Additionally, it is necessary to explore other approaches for generating more accurate and context-aware explanations. This improvement will enhance the efficiency and effectiveness of the system and increase confidence in using robots in safety-critical situations. Future work includes extending our system to support features such as anti-tampering and expanding the explanation generation process to cover more ROS 2 topics and domains. Further research is required to assess the suitability of a more general solution across complex scenarios. By including a wider range of events and explanations, this proposal has the potential to enhance the trust and acceptance of robots in a variety of applications and contexts.

Acknowledgements. Miguel A. González-Santamarta acknowledges an FPU fellowship provided by the Spanish Ministry of Universities (FPU21/01438). This work has been partially funded by EDMAR: Grant PID2021-126592OB-C21 funded by MCIN/AEI/10.13039/501100011033.
References

1. Anjomshoae, S., Najjar, A., Calvaresi, D., Främling, K.: Explainable agents and robots: results from a systematic literature review. In: 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), Montreal, Canada, May 13–17, 2019, pp. 1078–1088. International Foundation for Autonomous Agents and Multiagent Systems (2019)
2. Arkin, R.C.: Accountable autonomous agents: the next level (2009)
3. Felzmann, H., Fosch-Villaronga, E., Lutz, C., Tamo-Larrieux, A.: Robots and transparency: the multiple dimensions of transparency in the context of robot technologies. IEEE Robot. Autom. Mag. 26(2), 71–78 (2019)
4. Guzmán, R., Navarro, R., Cantero, M., Ariño, J.: Robotnik—professional service robotics applications with ROS (2). In: Koubaa, A. (ed.) Robot Operating System (ROS). SCI, vol. 707, pp. 419–447. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54927-9_13
5. Macenski, S., Martin, F., White, R., Ginés Clavero, J.: The marathon 2: a navigation system. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2020)
6. Maruyama, Y., Kato, S., Azumi, T.: Exploring the performance of ROS2. In: Proceedings of the 13th International Conference on Embedded Software, EMSOFT 2016. Association for Computing Machinery, Inc. (2016). https://doi.org/10.1145/2968478.2968502
7. Olivares-Alarcos, A., Foix, S., Alenyà, G.: Knowledge representation for explainability in collaborative robotics and adaptation (2021)
8. Rodríguez-Lera, F.J., González Santamarta, M.Á., Guerrero, Á.M., Martín, F., Matellán, V.: Traceability and accountability in autonomous agents. In: Herrero, Á., Cambra, C., Urda, D., Sedano, J., Quintián, H., Corchado, E. (eds.) CISIS 2019. AISC, vol. 1267, pp. 295–305. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-57805-3_28
9. Sado, F., Loo, C.K., Liew, W.S., Kerzel, M., Wermter, S.: Explainable goal-driven agents and robots - a comprehensive review. ACM Comput. Surv. 55(10), 1–41 (2023). https://doi.org/10.1145/3564240
10. Sakai, T., Nagai, T.: Explainable autonomous robots: a survey and perspective. Adv. Robot. 36(5–6), 219–238 (2022). https://doi.org/10.1080/01691864.2022.2029720
11. Stange, S., Hassan, T., Schröder, F., Konkol, J., Kopp, S.: Self-explaining social robots: an explainable behavior generation architecture for human-robot interaction. Front. Artif. Intell. 5, 87 (2022). https://doi.org/10.3389/frai.2022.866920
12. Weller, A.: Transparency: motivations and challenges (2017). https://arxiv.org/abs/1708.01870
13. Winfield, A.F.T., Jirotka, M.: The case for an ethical black box. In: Gao, Y., Fallah, S., Jin, Y., Lekakou, C. (eds.) TAROS 2017. LNCS (LNAI), vol. 10454, pp. 262–273. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64107-2_21
14. Winfield, A.F., van Maris, A., Salvini, P., Jirotka, M.: An ethical black box for social robots: a draft open standard. arXiv preprint arXiv:2205.06564 (2022)
15. Yazdanpanah, V., Gerding, E., Stein, S., Dastani, M., Jonker, C.M., Norman, T.: Responsibility research for trustworthy autonomous (2021). https://doi.org/10.48448/w5y9-qk13
Reducing the Security Margin Against a Differential Attack in the TinyJambu Cryptosystem

A. Fúster-Sabater and M. E. Pazo-Robles

Institute of Physical and Information Technologies, CSIC, 144 Serrano, 28006 Madrid, Spain
[email protected]
Abstract. TinyJambu is one of the 10 finalists in the NIST Lightweight Cryptography (LWC) Standardization Project. This Authenticated Encryption with Associated Data algorithm is very fast and extremely small in terms of the hardware needed for its implementation. In this work, we study a differential cryptanalytic attack against TinyJambu. Our analysis goes deeper than previous works found in the literature and obtains better differential probabilities than those of other studies. More precisely, we develop a differential forgery attack against nonce and associated data with probability 2−66.7712, which is much better than the probability 2−80 obtained by the designers themselves and better than the best probability 2−70.12 obtained by other authors. In brief, we have reduced the security margin against a forgery attack for this lightweight cryptosystem proposal. Keywords: Differential cryptanalysis · Lightweight Cryptography · TinyJambu · Gurobi
1 Introduction
The Internet of Things (IoT) is one of the keywords in information technologies and computer science. In the near future, IoT technology will be used more and more to connect devices of very different types. Some of them use powerful processors and can perform the same cryptographic algorithms as standard PCs. However, many other devices use extremely low-power microcontrollers that can devote only a small fraction of their computational power to security features. It is in this kind of hardware that lightweight cryptography is intended to operate. In lightweight cryptography, the concern is not only security but also performance. With small hardware, say around 2000–2500 GE (Gate Equivalents), the challenge is not only throughput but also resisting attackers with unconstrained computational power. In brief, these small IoT devices must withstand such attackers despite their limited resources. In August 2018, the National Institute of Standards and Technology (NIST) “initiated a process to solicit, evaluate, and standardize lightweight cryptographic algorithms that
were suitable for use in constrained environments where the performance of current NIST cryptographic standards is not acceptable”, see [1]. The TinyJambu algorithm was one of the 10 finalists in that call [2], and the fastest of them all. Indeed, its sponge structure combined with its keyed permutation makes it behave like a real stream cipher, unifying simplicity and speed in a single device. These stream-cipher characteristics give it a leading role in lightweight cryptography. The main concern in stream cipher design is to generate, from a short and truly random key, a long pseudorandom sequence called the keystream sequence. In emission, bits of the plaintext (original message) are bitwise XORed with bits of the keystream sequence to generate the ciphertext. In reception, bits of the ciphertext are bitwise XORed with the same keystream bits to recover the corresponding plaintext. In the original TinyJambu algorithm, the designers [3] show that a differential forgery attack against nonce and associated data succeeds with probability at most 2^−80, where 80 is the number of NAND gates obtained in their best differential trail. Later in [4], the authors show that, under specific conditions, some NAND gates in the differential trail are correlated, so the forgery attack against nonce and associated data succeeds with probability 2^−70.68. In this paper, we make use of the same refined model developed in [4], but look for the maximum number of distinct trails with the minimum number of uncorrelated NAND gates. In fact, our contribution goes further into the differential cryptanalysis of the TinyJambu mode of operation with 384 rounds given by Saha et al. in [4]. Thanks to a more exhaustive search of differentials, we will show a type 3 differential trail on the full 384 rounds with final probability 2^−66.77, which improves on the results given in that reference. In the updated TinyJambu design (September 2020), the number of rounds was increased to 640. Our results are unlikely to become a security problem for the entire 640 rounds of the TinyJambu algorithm. Nevertheless, they reduce the security margin given by the designers (first version of TinyJambu, 384 rounds) by 14 bits. We achieved these results by introducing some changes to the programs in [4], using a desktop PC with an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz, 8 cores, 16.0 GB of RAM and the Microsoft Windows 10 Pro operating system. Note that this is the kind of PC that anybody could actually have at home. We also use Python 3.11 (64-bit) and Gurobi Optimizer version 10.0.1 build v10.0.1rc0 (win64) [7]. The work is organized as follows. In Sect. 2, we describe the TinyJambu authenticated encryption algorithm with associated data, a candidate in the NIST Lightweight Cryptography Standardization Project. Next, in Sect. 3, we review the different security analyses of this lightweight cryptosystem found in the literature. In Sect. 4, we reduce the security margin of TinyJambu by improving the search for optimal differential trails and, consequently, the final differential probability. Finally, Sect. 5 presents conclusions and future work.
2 The TinyJambu Authenticated Encryption with Associated Data Mode
TinyJambu was one of the 10 finalists in the search for a cryptographic standard within the so-called Lightweight Cryptography project. Because its structure is a classical sponge with a secret permutation providing authenticated encryption with associated data, TinyJambu is very fast. It uses a secret keyed permutation Pk,n in the form of a Non-Linear Feedback Shift Register (NLFSR). Such an NLFSR is based on a 128-bit register with a NAND gate as its unique non-linear component. In the way it operates, we may consider TinyJambu a sequential mode of encryption similar to a stream cipher, since the sponge structure allows that. The encryption mode is shown in Fig. 1. It uses a permutation Pk,n, where n denotes the number of rounds (shifts of the NLFSR) and k represents the key. The permutation is always the same, but it is used with different numbers of rounds during the encryption process. The associated data (AD), the plaintext (M) and the tag (T) are all processed analogously on the sponge structure. In this way, it is very easy to obtain the ciphertext (C).
Fig. 1. The TinyJambu mode in a classical sponge with a 128-bit state and keyed permutation.
2.1 The Encryption Algorithm TinyJambu-128
The TinyJambu mode accepts three different key sizes, namely 128, 192 and 256 bits, with a 64-bit tag and a 128-bit NLFSR state. The Pk,n permutation is based on an NLFSR with only one NAND gate as its nonlinear component. The performance is very high in terms of throughput.
Fig. 2. The 128-bit Nonlinear Feedback Shift Register in TinyJambu
This Pk,n permutation runs n rounds (shifts) of the 128-bit NLFSR in Fig. 2, updating the state.
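To make the state update concrete, the following is a minimal bit-level sketch (in Python) of the keyed permutation Pk,n, assuming the feedback taps of Fig. 2, which are made explicit in Eq. (1) of Sect. 3; it is an illustration of the NLFSR shift, not the designers' reference implementation.

```python
def tinyjambu_permutation(state, key, n_rounds):
    """Sketch of P_{k,n}: a 128-bit NLFSR with one NAND gate per round.

    state: list of 128 bits (S_0 ... S_127); key: list of key bits.
    Each round computes the feedback bit
        S_{128+j} = S_j XOR S_{47+j} XOR NAND(S_{70+j}, S_{85+j})
                    XOR S_{91+j} XOR k_j
    and shifts the register by one position.
    """
    s = list(state)
    for j in range(n_rounds):
        nand = 1 ^ (s[70] & s[85])              # the only nonlinear component
        feedback = s[0] ^ s[47] ^ nand ^ s[91] ^ key[j % len(key)]
        s = s[1:] + [feedback]                  # one shift of the NLFSR
    return s
```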
The encryption process can be divided into four phases:

2.1.1 Initialization of the Encryption Process
During this stage, the key and nonce are initialized by using different Pk,n permutations. Key setup: the 128 bits of the state are set to 0, and Pk,1024 is used to update the state. Nonce setup: done in three steps to process the 96-bit nonce; in each step, Pk,384 is used.

2.1.2 Processing the Associated Data
The frame bits are used to process these data. In this case, the authors use a FrameBits value of 3. The state of the permutation is updated using Pk,384. For the purpose of the example taken from [3]:

for i from 0 to ⌈ADlength/32⌉:
    S{36…38} = S{36…38} ⊕ FrameBits{0…2}
    Update the state using Pk,384
    S{96…127} = S{96…127} ⊕ AD{32i…32i+31}
end for
2.1.3 The Encryption Process
For encryption, a FrameBits value of 5 is used, and the state is updated using the Pk,1024 permutation. We introduce 32-bit message blocks as plaintext sequentially and obtain 32-bit ciphertext blocks as in Fig. 1. For the purpose of the example taken from [3]:

for i from 0 to ⌈Mlength/32⌉:
    S{36…38} = S{36…38} ⊕ FrameBits{0…2}
    Update the state using Pk,1024
    S{96…127} = S{96…127} ⊕ M{32i…32i+31}
    C{32i…32i+31} = S{64…95} ⊕ M{32i…32i+31}
end for
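For illustration, a schematic Python sketch of this 32-bit block encryption loop follows, reusing the permutation sketch above. The bit ordering of FrameBits and the packing of message words are simplifying assumptions, so the sketch shows the data flow of Fig. 1 rather than a specification-exact implementation.

```python
def encrypt(state, key, message_bits):
    """Schematic sketch of the TinyJambu encryption phase (Sect. 2.1.3).

    message_bits: plaintext as a flat bit list whose length is a
    multiple of 32. Returns the updated state and the ciphertext bits.
    """
    frame_bits = [1, 0, 1]                       # FrameBits value 5 (assumed LSB first)
    ciphertext = []
    for i in range(0, len(message_bits), 32):
        block = message_bits[i:i + 32]
        for t in range(3):                       # S_{36..38} ^= FrameBits_{0..2}
            state[36 + t] ^= frame_bits[t]
        state = tinyjambu_permutation(state, key, 1024)
        for t in range(32):
            state[96 + t] ^= block[t]            # absorb the plaintext block
            ciphertext.append(state[64 + t] ^ block[t])  # C = S_{64..95} XOR M
    return state, ciphertext
```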
2.1.4 Finalization: Tag Generation
To generate the 64-bit tag, a FrameBits value of 7 is used, and the state is updated using the Pk,1024 permutation and the Pk,384 permutation. Note that two different permutations are used depending on which part of the tag is being computed. For the purpose of the example taken from [3]:
S{36…38} = S{36…38} ⊕ FrameBits{0…2}
Update the state using Pk,1024
T{0…31} = S{64…95}
S{36…38} = S{36…38} ⊕ FrameBits{0…2}
Update the state using Pk,384
T{32…63} = S{64…95}
After this description, we proceed with the security analysis of TinyJambu.
3 Security of TinyJambu
The first security evaluation of TinyJambu was provided by the designers in the original document [3] submitted to the NIST Lightweight Cryptography (LWC) Standardization Project [1]. In fact, they applied techniques of differential and linear cryptanalysis supported by Mixed Integer Linear Programming (MILP) as in [5, 6]. Later, a more precise security evaluation of this cryptosystem was reported in [4]. In the following subsections, we analyze these evaluations in detail.

3.1 Number of Active AND Gates
We denote by Si and Si* (i = 0, 1, …, 127) the binary contents of two different initial states of TinyJambu. Thus, the initial differential between them is defined as ΔSi = Si ⊕ Si* for (i = 0, 1, …, 127), where the symbol ⊕ represents the XOR logic operation. After l rounds, both states have been shifted jointly according to the keyed permutation Pk,n described in Sect. 2, that is,

S128+j = S0+j ⊕ S47+j ⊕ NAND(S70+j, S85+j) ⊕ S91+j ⊕ kj   (j = 0, 1, …, l),
(1)
(similarly for Si∗ ). In the same way, the differential register satisfies its own recurrence relationship, giving rise to the differential trail S 128+j = S 0+j ⊕ S47+j ⊕ S70+j S85+j ⊕ S 91+j (j = 0, 1, . . . , l). (2) As the only nonlinear component per round of TinyJambu is the NAND logic operation, S70+j S85+j , then the differential (S70+j S85+j ) can be a good measure of how the differences propagate through the permutation along l rounds. On the other hand, since the complementation of the logic product is defined as ab = ab + 1, we can easily omit the constant 1 by replacing the NAND gate for an AND gate without affecting the results of this differential analysis. Finally, we call active AND gate to the differentials S70+j S85+j = 1, as they propagate the actual differences that allow this type of cryptanalytical technique. In brief, we try to find differential trails that, after a number l of rounds (in practice l = 384), minimize the number X of active AND gates, as a trail with a score of X can be satisfied with probability p = 2−X . As long as the probability p ≥ 2−64 , it allows to launch a forgery attack that breaks the 64-bit security claimed by the cryptosystem designers [3].
3.2 Refined Model for Differential Trails: Correlated AND Gates
In the submission document of TinyJambu [3], the designers searched for differential trails with a minimum number of active AND gates. Nevertheless, they considered every AND gate independently and treated each gate separately. Then, in [4], the authors introduced a new model that takes into account the first-order correlations between AND gates that frequently occur in TinyJambu. The main fact motivating this refinement in the counting of active AND gates is that the same binary value entering the AND gate at a particular round will enter the same gate again 15 rounds later. More precisely, according to the TinyJambu recurrence relationship, S85+j is an input of the AND gate together with S70+j, and 15 rounds later it is an input of the same gate together with S100+j. This raises the question of the correlation between a · b and b · c for some binary values a, b, c. Let Δ(ab) and Δ(bc) denote the output differences of the AND gate for a · b and b · c, respectively. We can consider both differences jointly. In [4, Subsect. 3.3], it is proved that if (Δa, Δb, Δc) = (1, 0, 1) and b = 1, then Δ(ab) = Δ(bc) = 1. Under these conditions, both differences propagate jointly and we count them as a single active AND gate. In terms of the state bits of TinyJambu, the previous conditions can be written as follows: if (ΔS70+j, ΔS85+j, ΔS100+j) = (1, 0, 1) and S85+j = 1, then Δ(S70+j · S85+j) = Δ(S85+j · S100+j) = 1. As we count both correlated active gates as a single active AND gate, we reduce the number X of active gates through l rounds, and the success probability p = 2^−X of a forgery attack is consequently incremented.
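This joint propagation is easy to verify exhaustively. Under input differences (Δa, Δb, Δc) = (1, 0, 1), both AND-output differences reduce to the value of the shared bit b, as the short check below confirms (a sketch written for this exposition, not code from [4]).

```python
# Exhaustive check of the correlated-gate condition over all states:
# with (da, db, dc) = (1, 0, 1), Delta(ab) = ((a^1) & b) ^ (a & b) = b
# and Delta(bc) = (b & (c^1)) ^ (b & c) = b, so both differences always
# coincide and are active exactly when b = 1.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            d_ab = ((a ^ 1) & b) ^ (a & b)
            d_bc = (b & (c ^ 1)) ^ (b & c)
            assert d_ab == d_bc == b
print("correlated AND gates verified")
```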
3.3 Differential Probabilities
In the submission document of TinyJambu [3], the designers searched for differential trails under four different constraints on the active-bit positions of the input/output states Si and Si* through the permutation Pk,n.

Type 1: input differences only in the 32 most significant bits (MSBs) of the differential; no constraint on the output difference.
Type 2: no constraints on the input; output differences only in the 32 MSBs.
Type 3: both the input and output differences only in the 32 MSBs.
Type 4: no constraints.

Type 3 is the most relevant scenario for the security of TinyJambu since, as described in Fig. 1, this cryptosystem processes the nonce in multiple blocks. In fact, it absorbs a 96-bit nonce in three blocks, so the input and output differences of the optimal trail can be injected in the first two blocks and a new value chosen for the third block. Differential trails of type 3 correspond to that situation, and differential cryptanalysis can exploit the three blocks of the nonce for a forgery attack. In [3], the designers claim that the maximum probability of a type 3 differential trail is p = 2^−80, which is sufficient to guarantee the 64-bit security mentioned before. Nevertheless, in [4] the authors, by means of their refined model, found a better differential trail for l = 384 rounds, given in Table 1 (in hexadecimal format). It consists of 88 active AND gates with 14 correlated active ANDs. Therefore, this differential trail propagates with a probability of p = 2^−(88−14) = 2^−74. Later, they evaluated its differential probability by identifying multiple trails with the same input and output differential masks (the Input and Output rows of Table 1), whose distribution is depicted in Table 2.

Table 1. A differential trail type 3 with probability 2^−74 for 384 rounds.

ΔS127…0   (Input)    01004800 00000000 00000000 00000000
ΔS255…128            81044c80 24080304 d9200000 22090000
ΔS383…256            81004082 00010200 83000010 26090240
ΔS511…384 (Output)   81004082 00000000 00000000 00000000
Table 2. Differential probabilities of multiple trails with the same input/output differential masks.

Probability  2^−74  2^−75  2^−76  2^−77  2^−78  2^−79  2^−80
# Trails     1      5      9      14     20     24     30
Finally, summing up all these probabilities, they get the differential probability

p = 1·2^−74 + 5·2^−75 + 9·2^−76 + 14·2^−77 + 20·2^−78 + 24·2^−79 + 30·2^−80.

Thus, the final differential probability is

p = 2^−70.68,   (3)

which is much higher than the probability p = 2^−80 obtained by the designers.
4 Reducing the Security Margin of TinyJambu
In this section, we introduce the main contributions of this work, which can be summarized as follows: 1. Proceeding in a similar way as before, we improve the previous results given in references [3] and [4], obtaining a better differential probability. 2. We reduce the security margin of TinyJambu by 14 bits against a forgery attack. More precisely, we have searched for differential trails whose number X of active AND gates satisfies the inequality X < 74, thus reducing the minimum number of gates given in Table 1 for l = 384 rounds. Our results are described in the following items. 1) We have carried out a more detailed and deeper search of differential trails by using the Gurobi Optimizer [7] and the refined model developed in [4]. We have explored a wider interval of possible solutions provided by Gurobi, as illustrated in the sketch below.
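The following hypothetical sketch shows how such a wider search can be set up with Gurobi's solution pool in Python; the MILP variables and constraints encoding Eq. (2) and the refined AND-gate model are assumed to be built elsewhere (following [4, 5]), so only the pool parameters that broaden the search are shown.

```python
import gurobipy as gp

model = gp.Model("tinyjambu_type3_trails")
# ... variables and constraints for the difference propagation (Eq. (2))
#     and the refined correlated-AND counting would be added here ...
model.setParam("PoolSearchMode", 2)    # systematic search for the n best solutions
model.setParam("PoolSolutions", 200)   # keep a wide interval of candidate trails
model.setParam("PoolGap", 0.10)        # also admit trails slightly above the optimum
model.optimize()

for k in range(model.SolCount):        # inspect every stored trail
    model.setParam("SolutionNumber", k)
    print(k, model.PoolObjVal)         # active-AND-gate count of trail k
```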
2) We have found a differential trail with X = 71 active AND gates, as can be seen in Table 3. In fact, we obtained a type 3 differential trail with 84 active AND gates and 13 correlated ANDs. Table 3 lists the whole differential propagation in order to provide information about the intermediate differences. 3) Later, we identified multiple trails with the same input ΔS127…0 and output ΔS511…384 differentials (Table 3) whose number X of active AND gates lies in the range X ∈ [71, 75]. The distribution of such trails is depicted in Table 4. It is easy to see that the number of differential trails associated with each value of X is greater than the number of trails obtained in [4] and depicted in Table 2. Combining both effects (a greater number of trails with smaller values of X), we get a final differential probability of value

p = 5·2^−71 + 10·2^−72 + 22·2^−73 + 18·2^−74 + 16·2^−75 = 18.75 · 2^−71.

Thus, the final differential probability is

p = 2^−66.7712.
(4)
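Both final probabilities are easy to re-check numerically; the short computation below sums the trail distributions of Table 2 and Table 4 and reproduces the exponents −70.68 and −66.7712.

```python
from math import log2

table2 = {74: 1, 75: 5, 76: 9, 77: 14, 78: 20, 79: 24, 80: 30}   # from [4]
table4 = {71: 5, 72: 10, 73: 22, 74: 18, 75: 16}                 # this work

for name, dist in (("Table 2", table2), ("Table 4", table4)):
    p = sum(n * 2.0 ** -x for x, n in dist.items())
    print(f"{name}: p = 2^{log2(p):.4f}")
# Output: Table 2: p = 2^-70.6832  and  Table 4: p = 2^-66.7712
```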
4) Our final differential probability is much greater than the probability p = 2^−80 obtained by the designers and also greater than the probability p = 2^−70.68 obtained by the authors of [4]. The new differential probability increases the success probability of a cryptanalytic attack. 5) The new differential probability is, from a numerical point of view, close to satisfying the inequality p ≥ 2^−64. 6) We have used the Gurobi Optimizer [7] more efficiently than the previous authors, combining it with programs written in Python (Python 3.11, 64-bit). In type 3 differences, the input differential consists of the differences ΔS127…96 while the output differential consists of the differences ΔS511…480, as the remaining values equal 0. As a summary, we can say that we have built Table 3 and Table 4 with their corresponding numerical results. We found several trails with a higher probability for type 3 differential cryptanalysis than those obtained by other authors.

Table 3. A differential trail type 3 with probability 2^−71 for 384 rounds.

ΔS127…0   (Input)    048a2000 00000000 00000000 00000000
ΔS255…128            44800001 12000986 48800020 91440000
ΔS383…256            40800441 00008100 0880000c 12009120
ΔS511…384 (Output)   40800441 00000000 00000000 00000000
Since our computational resources are quite limited (just a desktop PC), a more powerful computational scenario could yield an even better differential probability that would lead, in turn, to a real forgery attack, breaking the claimed 64-bit security of TinyJambu.
Table 4. Differential probabilities corresponding to multiple trails with the same input and output differential masks.

Probability  2^−71  2^−72  2^−73  2^−74  2^−75
# Trails     5      10     22     18     16
5 Conclusions and Future Work
In this work, we have analyzed the security of the TinyJambu cryptosystem against a possible forgery attack. Based on a refined model to count the number of active AND gates, we have searched for differential trails with a minimum number of gates compared with other analyses and numerical results found in the literature. These better differential trails allow us to compute a final differential probability that increases the success probability of a hypothetical forgery attack. In the future, we intend to execute the programs used in this study in a more adequate computational environment in order to bring the differential probability closer to the bound required for a successful cryptanalytic attack. Moreover, the results obtained here can be extrapolated to the updated version of TinyJambu with 640 rounds in the keyed permutation. The study of the relationship between the number of rounds and the minimum number of active AND gates is also one of our priorities for near-future work.

Acknowledgements. This work is part of the R+D+i grant P2QProMeTe (PID2020-112586RB-I00), funded by MCIN/AEI/10.13039/501100011033.
References
1. National Institute of Standards and Technology: Lightweight Cryptography (LWC) Standardization Project (2019). https://csrc.nist.gov/projects/lightweight-cryptography. Accessed 30 Apr 2023
2. NIST Lightweight Cryptography Finalists. https://csrc.nist.gov/Projects/lightweight-cryptography/finalists. Accessed 24 Apr 2023
3. Wu, H., Huang, T.: TinyJAMBU: A Family of Lightweight Authenticated Encryption Algorithms. The NIST Lightweight Cryptography (LWC) Standardization Project (A Round-2 Candidate) (2020). https://csrc.nist.gov/CSRC/media/Projects/lightweight-cryptography/documents/round-2/spec-doc-rnd2/TinyJAMBU-spec-round2.pdf
4. Saha, D., Sasaki, Y., Danping, S., Sibleyras, F., Sun, S., Zhang, Y.: On the security margin of TinyJAMBU with refined differential and linear cryptanalysis. IACR Trans. Symmetric Cryptol. 2020(3), 152–174 (2020)
5. Mouha, N., Wang, Q., Gu, D., Preneel, B.: Differential and linear cryptanalysis using mixed-integer linear programming. In: Wu, C.K., Yung, M., Lin, D. (eds.) Inscrypt 2011, LNCS, vol. 7537, pp. 57–76. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34704-7_5
6. Teng, W., Salam, I., Yau, W.C., Pieprzyk, J., Phan, R.C.: Cube attacks on round-reduced TinyJAMBU. Sci. Rep. 12, 5317 (2022). https://doi.org/10.1038/s41598-022-09004-3
7. Gurobi Optimizer. http://www.gurobi.com/
Fuzzing Robotic Software Using HPC

Francisco Borja Garnelo Del Río(B), Francisco J. Rodríguez Lera, Camino Fernández Llamas, and Vicente Matellán Olivera

Universidad de León, Campus de Vegazana s/n, 24071 León, Spain
[email protected], {fjrodl,cferll,vmato}@unileon.es
https://robotica.unileon.es/

Abstract. Developing secure systems is particularly important in autonomous systems, where testing and debugging robotic platforms are very complex. This paper presents a novel HPC pipeline for fuzzing the software systems that control robots. Speeding up the fuzzing process allows more testing to be performed in less time. By using parallel processing, HPC systems can perform computations much faster and more efficiently than traditional personal computers; accordingly, the paper presents an updated version of the fuzzing pipeline for HPC environments. To validate the pipeline, a containerized version of the state-of-the-art (SOTA) fuzzer RoboFuzz has been implemented to run in HPC, and its performance is empirically evaluated against a personal computer and the original approach. The full empirical approach is available to other researchers on GitHub.

Keywords: Robotics · fuzzing · HPC · RoboFuzz · Software security · Singularity · ROS 2 · Simulation

1 Introduction
Fuzz testing, also known as fuzzing, was first used in a study of UNIX security in the 1990s [12, 13]. It is a highly effective software testing technique employed to discover security vulnerabilities, programming errors, and other potential issues in software systems. By subjecting the target system to a multitude of random, malformed, or unexpected input data, fuzzing aims to provoke unanticipated responses, crashes, or unexpected behaviors, thus revealing weaknesses that can be exploited or lead to system instability. Fuzz testing techniques can be applied to evaluate the robustness and security of the software components governing a robot’s operation [8]. Robotic systems typically consist of numerous interconnected modules such as control algorithms, sensor data processing, communication protocols, and user interfaces. The complexity of these systems makes them highly susceptible to software bugs and vulnerabilities, which can lead to malfunctions or even catastrophic failures, with high impact nowadays, for instance, in collaborative environments. Fuzz testing in robotics can be applied at various levels of granularity, from individual components or modules to the entire system. For instance, fuzzing
can be used to test the robustness of specific algorithms, such as localization or path planning, by providing them with unexpected or corrupt sensor data. At a higher level, fuzz testing can be applied to the communication protocols and data exchange between different modules within a robotic system, simulating failures or attacks on the communication channels. Thus, it is necessary to speed up the process of evaluating any piece of source code that can be deployed on the robot. Complex tasks or algorithms with large datasets should be tested in reasonable windows of time, which leads to the use of High-Performance Computing (HPC) systems. HPC can significantly enhance the fuzz testing process by providing the computational resources and parallel processing capabilities needed to efficiently explore the vast input space of complex systems, such as those found in robotics. The use of HPC for fuzz testing can improve several key aspects of the fuzzing process:

– Scalability: Fuzz testing can generate a large number of test inputs, and processing these inputs can be computationally expensive, especially for complex systems. HPC provides the necessary computing power to enable parallel execution of test cases, which dramatically increases the scalability of the fuzzing process.
– Speed: By leveraging the parallel processing capabilities of HPC systems, fuzz testing can be performed much faster than in traditional single-node systems.
– Real-time monitoring and analysis: The use of containers on HPC enables real-time monitoring and analysis of the fuzz testing process, providing researchers with insights into the system’s behavior and vulnerabilities as they are discovered. This can help guide the testing process, enabling more targeted and effective testing strategies.
– Large-scale simulations: HPC resources can be used to run large-scale simulations of complex robotic systems, allowing for more realistic and comprehensive fuzz testing scenarios.
– Enhanced collaboration: HPC resources often come with advanced collaborative tools and frameworks, enabling researchers and developers from different locations to collaborate more effectively on fuzz testing projects. This can lead to better knowledge sharing, faster identification of vulnerabilities, and more efficient development of fixes and patches.

Different authors have proposed the use of HPC in robotics. Camargo et al. [3] have proposed an HPC-ROS package for enhancing robot performance in different run-time robot processes. Arumugam et al. [1] use HPC to parallelize the FastSLAM algorithm using a Hadoop system. In the same way, this work presents how state-of-the-art fuzzing tools for the Robot Operating System (ROS) can be used in HPC systems. This means that fuzzing could be included as part of the test and simulation phase of a robotic DevOps flow [11].
1.1 Contribution and Hypotheses
The goal of this paper is to provide empirical evidence to categorize and assess a state-of-the-art fuzzing tool for software developed using the Robot
Operating System (ROS). Thus, the research question faced in this work can be expressed as: What are the implications of using HPC systems when fuzzing ROS applications?
This question arises from the following hypothesis and questions:

1. H1: Distributing the fuzzing workload across multiple processing units can significantly increase overall fuzzer performance.
2. Q1: What are the key concepts associated with fuzzing in HPC?
3. Q2: What is the performance of the fuzzer when running on a personal computer or an HPC computer?

The main contributions of this work are:

– Empirical evaluations that allow us to answer the hypothesis above.
– A set of publicly available Singularity containers deploying a ROS 2 fuzzer.
– Lessons learned from using a SOTA fuzzer in an HPC environment.

The remainder of this paper is organized as follows. The next section presents the state of the art, focusing on the fuzzing problem. Section 3 explains the technologies explored in this research. Section 4 presents the evaluation process carried out. The lessons learned in the process are summarised in Sect. 5 and the paper closes with some conclusions in Sect. 6.
2 State of the Art
Fuzz testing has been advancing rapidly over the last two years as a means to evaluate robotics software. Woodlief et al. [16] present the PHYS-FUZZ tool. They aim to perform fuzzing of robotic software that integrates traditional fuzzing with the physical attributes and hazards associated with the context of mobile robots. Xie et al. [17] designed ROZZ, a multi-dimensional generation method to generate effective test cases for ROS programs. Again, they consider multiple dimensions such as configuration parameters, messages from robot sensors, or human input introduced from GUIs, command lines, and ROS services. In addition, they also propose mutation strategies, which could of course be extended with machine learning approaches [15]. Kim et al. [7] illustrate the effectiveness of their tool, RoboFuzz, tested in a case study based on a robotic manipulator system, and compare its performance with other state-of-the-art fuzz testing tools. As soon as fleets of diverse robots are integrated, and robot dimensions, dynamic characteristics of the environment, trajectory estimation, and control decisions must be considered, together with machine learning approaches, it becomes necessary to optimize this process using HPC. In this sense, several researchers have been studying how to introduce HPC in the field of robotics. Camargo-Forero is one of the researchers who has explored the use of HPC in different robotic environments, defining the concept of High-Performance Robotic Computing [4]. However, this approach explores the HPC process using embedded
solutions. We aim to focus on HPC on premises, outside the robot, as the goal is to take advantage of these infrastructures. Matellán et al. present in [10] a position paper overviewing the use of HPC for exploring computation in cloud systems with cybersecurity and explainability. Brewer et al. propose in [2] a set of software components deployed within Singularity containers that run the Robot Operating System (ROS) supported on the Message Passing Interface (MPI), with the idea of benchmarking HPC performance with multiple vehicle simulations. Franchi proposes in [5, 6] an approach to run Webots, a robot simulator well known in the robotics community, on a cluster to enhance overall experimental performance, reporting preliminary results of 2,304 runs on a cluster versus 74 runs on a personal computer over 12 h. Again, Franchi’s work relies on Singularity but also proposes using Docker containers for part of the simulation. These works have in common moving part of the robotic development flow to the cloud and, in those with empirical processes, relying on Singularity containers. This has motivated us to adopt Singularity [9] as our containerization approach. Moreover, it is the preferred containerization approach of SCYLE, the HPC center associated with our University. The authors intend to continue the definition of the HOUSE framework proposed in [14] by adapting the fuzzing process to current robotic environments, based mainly on ROS, for decoupled deployment in an HPC. The idea is to improve fuzzing processes in a multi-variable environment such as the one defined by robot contexts.
3 Materials and Methods
This section describes the experimental design and procedures used to carry out the research when deploying the fuzzer in the HPC.
3.1 Tools
The main tool used in the experiments is RoboFuzz [7], an autonomous fuzz testing tool for robotic systems. This tool uses machine learning techniques to generate diverse and valid test inputs for robotic systems and leverages system feedback to improve testing efficiency. It is available as a 20 GB Docker image that runs straightforwardly on a computer with Docker. SLURM (Simple Linux Utility for Resource Management) is also used in the experiments. SLURM is a widely used open-source workload manager and job scheduler for Linux and Unix-based clusters and supercomputers. It provides a flexible and scalable framework for managing and scheduling jobs across a large number of nodes, enabling efficient use of computing resources and facilitating the deployment of complex computational workflows.
3.2 Experimental Design
The main parameters of the experiments carried out are:

Duration: Three types of tests, of 1 h, 2 h, and 4 h, with durations increasing by a 2x growth factor, were chosen to model the results linearly and facilitate comparisons.
Iterations: Each experiment was repeated three times to minimize measurement errors and to smooth out potential interference from other jobs on the HPC nodes.
Parallelization: The fuzzer was tested with 5, 10, and 20 HPC jobs for each fuzzing test (MI2 and TB3). The SLURM tool was applied with default scheduling, assigning the workload to two different HPC nodes for 20 jobs and to one node for 5 and 10 jobs.

Three main factors have been considered to compare performance across the different experimental environments:

Exec (N_execs): In the context of fuzzing, an exec refers to a single iteration of the testing process. Specifically, an exec in RoboFuzz involves running the target system, publishing a mutated message, checking for errors, and terminating the target. Each exec is typically designed to test a specific aspect of the system’s behavior or vulnerability, with the results of multiple execs combined to generate a comprehensive assessment of the system’s security and robustness.
Round (N_rounds): A round of fuzzing consists of multiple execs performed under a specific mutation schedule. In RoboFuzz, the mutation schedule is controlled by a scheduler module that defines the sequence and parameters of mutations applied to the input messages. Each round is designed to test a specific set of vulnerabilities or edge cases in the target system, with the results of multiple rounds providing a complete picture of the system’s robustness and security.
Cycle (N_cycles): A cycle in RoboFuzz is a collection of several rounds, typically performed on a single input message. The number of rounds in a cycle is configurable and can be adjusted depending on the complexity and size of the input message. During a cycle, RoboFuzz applies a series of mutations to the input message and tests the system’s behavior under each mutation. When a new cycle begins, RoboFuzz moves on to the next message in the queue and repeats the process, continuing until all messages in the queue have been tested. The cycles are designed to thoroughly test the behavior of the system and identify potential vulnerabilities or weaknesses that may not have been detected in previous rounds or execs.

To normalize the data obtained from the experiments, a metric was created. This metric should preserve the order in the data obtained and provide similar values for the two experiments performed, taking into account that MI2 works with cycles and rounds, while TB3 uses execs too but provides no information about cycles. The result is the following metric:
score = (N_cycles × 5) + (N_rounds × 10) + N_execs

Two different robotics experiments from RoboFuzz have been used:

MoveIt (MI2): tests the MoveIt 2 library, a robotic manipulation library for ROS that implements fundamental robotic manipulation concepts.
Turtlebot 3 (TB3): tests a differential wheeled mobile robot equipped with a LiDAR sensor.

Each experiment was run on two different platforms:

Standalone (SDO): a virtual machine with 8 GB of RAM, 6 Intel Xeon E3-12xx v2 vCPUs (virtual CPUs) and Linux kernel 5.3.11-100.x86_64 (x86_64); when using containers, the distribution is indifferent. Local storage on mechanical hard disks.
High-Performance Computing (HPC): a computing cluster with Haswell nodes on bare metal, with 48 GB of RAM and 2 Intel Xeon E5-2630 v3 @ 3.20 GHz with a total of 16 cores and Linux kernel 3.10.0-1062.9.1.x86_64 (x86_64); when using containers, the distribution is indifferent. Network storage with cache on solid-state disks.

A correction factor is used to account for the impact of hardware differences, such as CPU clock speed or memory size, on the performance metric, and to ensure that the comparison focuses on performance differences between environments rather than hardware variations. We have used the Phoronix Test Suite v10.8.4 to calculate it. Phoronix test suites are designed to be easy to use and configure, providing a standardized and repeatable methodology and a detailed set of instructions for executing the tests and interpreting the results.
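A direct Python transcription of this metric, shown here only for illustration (the weights 5 and 10 are those defined above):

```python
def score(n_cycles, n_rounds, n_execs):
    """Normalized fuzzing metric of Sect. 3.2. TB3 reports no cycle
    information, so n_cycles is set to 0 for that experiment."""
    return n_cycles * 5 + n_rounds * 10 + n_execs

# Example: a hypothetical MI2 run with 3 cycles, 40 rounds and 120 execs
print(score(3, 40, 120))   # -> 535
```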
3.3 Data Collection and Analysis
The methods used to collect data, including any instruments or tools used, data sources, and data collection procedures, are presented in our GitHub repository (https://github.com/b0rh/HOUSE#robofuzz-code-and-documentation-experiment-results-and-download-singulary-container).
4 Results
First, we analyzed the performance on a personal computer, comparing Docker vs. Singularity. The measures obtained are presented in Table 1.
The results demonstrate that Docker outperforms Singularity in terms of our proposed score. Specifically, we found that the MI2 application ran twice as fast on Docker compared to Singularity, and that Docker was able to score twice as much with the available system resources. However, for the TB3 tests the difference was only slight under the same conditions.

Table 1. Score value statistics between containerization environments in SDO.

                Docker              Singularity
                MI2       TB3       MI2       TB3
Mean            431.667   425.556   299.444   412.667
Std. Deviation  303.614   236.547   296.821   228.065
Minimum         35.000    186.000   25.000    177.000
Maximum         960.000   724.000   900.000   713.000
Figure 1 illustrates the performance and the differences between Docker and Singularity. These findings suggest that the original Docker images are best suited to our specific use case and highlight the importance of selecting the right containerization tool for the specific requirements of a given application. It should be noted, however, that specific performance differences between Docker and Singularity may vary depending on the particular use case and configuration, and further testing may be necessary to determine the optimal tool for other applications. Secondly, the performance difference was evaluated when the Singularity container runs on an HPC facility. It is commonly understood that HPC outperforms personal computers; however, this depends on the kernel version, the node architecture, and the number of jobs.
score
800 600 400 200 0 docker-sdo-MI2
docker-sdo-TB3
singularity-sdo-MI2 singularity-sdo-TB3
environment-infrastructure-test
Fig. 1. Comparison between SDO environments and fuzzing tests using the average score of all single-task runs.
For instance, HPC systems may become outdated and lack the processing power and speed of newer computers. In addition, the architecture of the HPC system and the way it is configured can also affect its performance. As shown in Fig. 2, we can observe that an SDO computer outperforms a single HPC node. Finally, to determine the optimal number of jobs needed to achieve the same level of performance as a new-generation computer, additional experiments were necessary. Although previous data has shown that HPC systems can outperform new-generation computers under certain workloads, the specific conditions under which this is true are not reproduced here. Therefore, the same experiment was conducted with 5, 10, and 20 jobs. The MI2 results achieve almost the same performance with 5 jobs and far exceed it with 10 or 20 jobs. However, the same is not true for the TB3 tests, where 10 jobs are required to achieve similar performance and 20 jobs show no more than a 25% increase in score.
5 Discussion
This section details several problems encountered in the course of this work. These issues relate to the size of the container image, delays in container execution, concurrency problems, and the anomalies detected.
Fig. 2. The diagram presents a comparison between: (green bars) the average score of 2 h single tasks using a standalone computer (SDO), (red bars) the average score of 2 h single tasks using the Singularity environment in the HPC infrastructure, and (blue bars) parallel tasks using the score accumulated by the n parallel jobs in the HPC.
Image size was too large: The first issue found was that the RoboFuzz image size was too large. So, the first step was to take advantage of the properties of Singularity, and a fix based on a read-only file system is proposed, with a directory mapping between the container and the host that uses a unique identifier for each execution. This approach reduces the size of the image from 22.7 GB for the Docker image to 7.3 GB for the Singularity image and allows the entire execution context of RoboFuzz and ROS 2 to be stored independently, outside the container. This enables the clean-up of previous execution traces and the reuse of the same container.

Delay between the stop order and its execution: The SLURM queue manager used in the HPC has a delay of between 10 s and one minute to finish container executions. To improve the accuracy of the fuzzing benchmark, the timeout command was used to define a time limit for the execution of the main command passed as a parameter; it is also used to precisely control the times in the tests using Docker (see the sketch at the end of this section). The time it takes for a job to stop from the moment the stop is triggered, as defined at job creation, until all its processes are completed or killed on the nodes can range from 10 s to more than a minute. To address this issue, a timeout command was added as part of the execution of each fuzzing process to be measured, running in the same context as the fuzzer. Once the specified time has elapsed, the command terminates the process; since the fuzzer is a direct child of the timeout command, the time limit is enforced with millisecond precision. This problem also occurred with Docker and Singularity, where it could take a long time from the moment the container was given the stop command until it actually stopped.

The graphical session dependency: A regular graphical session is required to run some RoboFuzz examples. This limitation of the Docker container, and of native use, has been solved by using the Qt library with VNC as a backend. This approach allows tests and scripts that require a graphical session to be used in a scalable way in environments without traditional graphical sessions. Standard screen output using SLURM + Singularity in the HPC infrastructure does not work for some crawlers that gather output using Python. A Python directive was used to disable output buffering and write the data on the fly to a file mapped to the host directory, with a unique path for each instance to avoid concurrency issues.

Concurrency of files and blocking permissions: The permissions for directory bindings in Docker do not work properly. To run a script and output results, we propose embedding the script in a variable and passing it as part of the instantiation command.

Identified anomalies: The following anomalies were found during testing:
– Docker test pts/fs-mark performance: Due to limitations with unprivileged containers, it is not possible to run some of the Phoronix Test Suite (pts) tests.
– Unstable RoboFuzz MI2 test: After two hours of execution, the MI2 test, which implements the RoboFuzz example fuzzing tests for MoveIt 2 + Pandas, becomes unstable. Further work will explore these issues with long-term tests.
– Performance difference between the SDO and HPC Linux kernels: The SDO Linux kernel (5.3.11-100.x86_64) is more modern than the HPC Linux kernel (3.10.0-1062.9.1.x86_64), with almost 5 times more compute performance and almost 3 times more memory performance. The main cause is how the latest kernel versions patch security vulnerabilities in Intel processors and improve performance in container handling.

The results show that the system configuration, the applications, and the type of robot have a great impact on system performance. Starting with 20 HPC jobs characterized by the hardware presented here, the proposed H1 is confirmed. However, the same cannot be said for the lower numbers of jobs with the proposed fuzzing flow.
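As a concrete illustration of the time-bounding fix described above, the hypothetical sketch below launches a fuzzing run under the timeout command from Python; the RoboFuzz command line shown is illustrative, not the project's actual CLI.

```python
import subprocess

def run_bounded(cmd, seconds):
    """Run `cmd` under the coreutils `timeout` command so the limit is
    enforced in the same context as the fuzzer process itself."""
    return subprocess.run(["timeout", str(seconds)] + cmd,
                          capture_output=True, text=True)

# Hypothetical 2-hour TB3 run; replace with the real fuzzer invocation.
result = run_bounded(["python3", "robofuzz.py", "--target", "tb3"], 2 * 3600)
print(result.returncode)   # 124 indicates the time limit expired (coreutils convention)
```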
6 Conclusions
In conclusion, High-Performance Computing can significantly improve the fuzz testing process by providing the necessary computational resources, parallel processing capabilities, and advanced algorithms to explore the input space of complex systems, such as robotic software, more efficiently and effectively. The latest kernels offer significantly better performance than older ones thanks to improvements in container management and the patching of processor vulnerabilities. Moreover, parallelization, by splitting the workload across multiple threads, can significantly reduce the time it takes to complete a task.
References
1. Arumugam, R., et al.: DAvinCi: a cloud computing framework for service robots. In: 2010 IEEE International Conference on Robotics and Automation, pp. 3084–3089. IEEE (2010)
2. Brewer, W., Bretheim, J., Kaniarz, J., Song, P., Gates, B.: Scalable interactive autonomous navigation simulations on HPC. In: 2022 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1–7. IEEE (2022)
3. Camargo-Forero, L., Royo, P., Prats, X.: Towards high performance robotic computing. Robot. Auton. Syst. 107, 167–181 (2018)
4. Camargo-Forero, L., Royo, P., Prats, X.: Towards high performance robotic computing. Robot. Auton. Syst. 107, 167–181 (2018). https://doi.org/10.1016/j.robot.2018.05.011
5. Franchi, M.: Webots.HPC: a parallel robotics simulation pipeline for autonomous vehicles on high performance computing. arXiv preprint arXiv:2108.00485 (2021)
6. Franchi, M., et al.: Webots.HPC: a parallel simulation pipeline for autonomous vehicles. In: Practice and Experience in Advanced Research Computing, pp. 1–4 (2022)
7. Kim, S., Kim, T.: RoboFuzz: fuzzing robotic systems over robot operating system (ROS) for finding correctness bugs. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, pp. 447–458. Association for Computing Machinery, New York, NY, USA (2022). https://doi.org/10.1145/3540250.3549164
8. Kim, T., et al.: RVFuzzer: finding input validation bugs in robotic vehicles through control-guided testing. In: USENIX Security Symposium, pp. 425–442 (2019)
9. Kurtzer, G.M., Sochat, V., Bauer, M.W.: Singularity: scientific containers for mobility of compute. PLoS ONE 12(5), e0177459 (2017)
10. Matellán, V., Rodríguez-Lera, F.J., Guerrero-Higueras, Á.M., Rico, F.M., Ginés, J.: The role of cybersecurity and HPC in the explainability of autonomous robots behavior. In: 2021 IEEE International Conference on Advanced Robotics and Its Social Impacts (ARSO), pp. 1–5. IEEE (2021)
11. Mayoral-Vilches, V., García-Maestro, N., Towers, M., Gil-Uriarte, E.: DevSecOps in robotics. arXiv preprint arXiv:2003.10402 (2020)
12. Miller, B.P., Fredriksen, L., So, B.: An empirical study of the reliability of UNIX utilities. Commun. ACM 33(12), 32–44 (1990). https://doi.org/10.1145/96267.96279
13. Miller, B.P., et al.: Fuzz revisited: a re-examination of the reliability of UNIX utilities and services, p. 23 (1995)
14. Garnelo del Río, F.B., Rodríguez Lera, F.J., Esteban Costales, G., Fernández Llamas, C., Matellán Olivera, V.: HOUSE: marco de trabajo modular de arquitectura escalable y desacoplada para el uso de técnicas de fuzzing en HPC (2021)
15. Wang, Y., Jia, P., Liu, L., Huang, C., Liu, Z.: A systematic review of fuzzing based on machine learning techniques. PLoS ONE 15(8), e0237749 (2020)
16. Woodlief, T., Elbaum, S., Sullivan, K.: Fuzzing mobile robot environments for fast automated crash detection. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 5417–5423. IEEE (2021)
17. Xie, K.T., Bai, J.J., Zou, Y.H., Wang, Y.P.: ROZZ: property-based fuzzing for robotic programs in ROS. In: 2022 International Conference on Robotics and Automation (ICRA), pp. 6786–6792. IEEE (2022)
Intrusion and Fault Detection
Intrusion Detection and Prevention in Industrial Internet of Things: A Study

Nicholas Jeffrey1(B), Qing Tan2, and José R. Villar1

1 University of Oviedo, Oviedo, Spain
[email protected]
2 Athabasca University, Athabasca, Canada
Abstract. The Industrial Internet of Things (IIoT) brings the ubiquitous connectivity of the Internet of Things (IoT) to industrial processes, optimizing manufacturing and civil infrastructures with assorted “smart” technologies. This ubiquitous connectivity to industrial processes has increased the attack surface available to threat actors, with increasingly frequent cyber attacks on physical infrastructure resulting in significant economic and life safety consequences, due to service interruptions in power grids, oil distribution pipelines, etc. The difference between IoT and IIoT is largely one of degree: the consequences of service interruptions to IoT (e.g., home automation) are typically limited to mild inconvenience, while interruptions to IIoT environments (e.g., power grids) have more significant economic and life safety consequences. The field of Intrusion Detection Systems / Intrusion Prevention Systems (IDS/IPS) has traditionally focused on cyber components rather than physical components, which has resulted in threat detection capabilities in IIoT environments lagging behind their non-industrial counterparts, leading to increasingly frequent attacks by threat actors against critical infrastructure. This paper reviews the current state of IDS/IPS capabilities in industrial environments and compares their maturity and effectiveness with the more established IDS/IPS capabilities of non-industrial Information Technology (IT) networks. As a new contribution, this paper also identifies gaps in the existing research in the field, and maps selected challenges to potential solutions and/or opportunities for further research. Keywords: Industrial Internet of Things · IIoT · Cyber-Physical Systems · Intrusion Detection · Intrusion Prevention
1 Introduction
The fourth industrial revolution, commonly known as Industry 4.0, began in 2011 as a research initiative of the German government, with the goal of leveraging ubiquitous network connectivity and big data to improve industrial manufacturing processes [1]. Ubiquitous connectivity allowed manufacturing processes to be optimized through smart automation, Cyber-Physical Systems (CPS), big data, and artificial intelligence. Earlier generations of industrial control processes were isolated systems using primitive
relay logic, which soon gave way to the Industrial Internet of Things (IIoT), with hyperconnected sensors and actuators forming an intelligent CPS, a fusion of Information Technology (IT) and Operational Technology (OT) environments. This merging of IT and OT networks has tremendously increased economic activity and brought quality of life improvements, but is not without challenges. The rapid growth of IIoT has outpaced advancements in cybersecurity, with new threat models and security challenges that lack a unified framework for secure design, malware resistance, and risk mitigations. Much of the attention from academia and industry is focused on consumer-grade IoT devices (smart home automation, etc.). Industrial-grade IoT receives less attention from academia and industry, which is unfortunate, as the consequences of IIoT failure are much higher (power grid failure, oil pipeline shutdowns, train switching, etc.). This paper presents a study of the current state of IDS/IPS in IIoT environments and, as a new contribution, identifies outstanding challenges in the field, which include resource constraints, a lack of standardized communication protocols, extreme heterogeneity that hampers industry consensus, and different information security priorities between Operational Technology (OT) and Information Technology (IT) networks. These selected challenges are mapped to potential solutions and/or opportunities for further research.

In enterprise IT networks, it is common for IDS and IPS services to be co-located on the same device (perimeter firewall, centralized logging host, etc.). Depending on the risk tolerance level, a network may use a passive IDS that “sniffs” network traffic and sends alert notifications to the system operator, or may employ an active IPS that, in addition to performing detection, will actively block or shut down a detected intrusion. An active IPS is typically located on an inline device such as a perimeter firewall, and inspects all network traffic in real time, blocking any traffic that is detected as unwanted. For historical reasons, the operators of IIoT environments are much more risk-averse than the operators of IT networks, due to the significant economic and life safety concerns associated with false positives. Since availability is of paramount importance in IIoT environments, it is uncommon for an active IPS to be in use, as the system operators prefer to manually initiate any interruption or blocking of communications to avoid shutting down the IIoT environment due to a possible false positive detection.

Due to the historical design goals of Industrial Control Systems (ICS), observability of system state [3] has typically been limited to the current real-time status of a particular sensor or actuator, with relatively simple threshold-based alerts for the system operator. The historical assumption of an ICS running on an isolated and fully trusted network meant that intrusion detection and intrusion prevention (IDS/IPS) were not design priorities, leading to a lack of observability in the increasingly hostile network layer of the IIoT environment, making it difficult to detect threats and malicious activity in an increasingly connected world. As hostile attacks on IIoT environments become more frequent, the need for robust and accurate IDS/IPS has become increasingly critical to industry and life safety, as Critical National Infrastructure (CNI) becomes increasingly interconnected to public networks.
Therefore, further study is needed to advance the state of academic research on the issue, and to develop and apply preventative solutions for industry to ensure safe and secure IIoT environments.
The remainder of this paper is organized as follows: Sect. 2 provides a review of the areas covered in the existing literature, which allows the identification of gaps in current research. Section 3 illustrates the current challenges, with potential solutions for advancing the state of the art. Finally, Sect. 4 discusses the conclusions reached in this paper and identifies opportunities for future research.
2 Literature Analysis
The keywords described previously were used to search the literature from the various described sources. A total of 24 papers were selected and reviewed for this study. As a literature-review study, this section provides a statistical analysis of the existing research by publisher, publication type, publication year, and country of origin. The top 3 publishers (IEEE 52%, Springer 16%, ScienceDirect 16%, all others 16%) comprise the bulk of available research in this field and are all well-established academic publishers with robust levels of peer review and quality assurance. Most of the research in this area is published in academic journals (57%), with academic conferences a distant second (34%). The field of IIoT security is also heavily influenced by industry, but those efforts are typically short-term tactical responses to current market threats and opportunities. For competitive advantage and trade secret reasons, industry efforts are rarely shared with the broader community, with “security by obscurity” still a common tactic in industry. To maintain relevance in a rapidly changing field, the literature reviewed in this paper is from within the last 5 years, with most articles from the past 3 years. The term “IIoT” was coined in 2011 by German researchers and reached widespread adoption in 2015, so little research exists before that date. Earlier research related to IIoT existed in the fields of cybernetics, industrial process control, and control logic and engineering. The USA is the largest single source of research in the area, with the top 3 countries generating more research than all other countries combined (Figs. 1, 2 and 3).
Fig. 1. Publishers
As discussed in previous sections of this paper, a CPS can be considered as a combination of IT and OT networks. The field of IT networks has mature and robust IDS/IPS solutions, but there have been significant challenges [4] in effective intrusion detection on OT networks.
Fig. 2. Publication types
Fig. 3. Countries of publication
Seng et al. [5] focus on the stark divide between industry and academia in the design and operation of IDS/IPS for industrial networks. The oldest and most common method of intrusion detection on both IT and OT networks is signature-based, which uses a database of known patterns to recognize malicious traffic. Signature-based IDS are extremely effective at detecting known attacks, but are typically unable to detect novel or zero-day attacks, which means the signature database must be frequently updated. In contrast, anomaly-based detection typically uses AI/ML to model normal behaviour, then classifies any behaviour that falls outside those parameters as anomalous, and therefore a potential threat. A simplistic method of anomaly-based detection is to use predefined thresholds to define normal behaviour. In IT networks, these thresholds would apply to metrics such as processor or memory utilization, while OT networks would use physical environmental measurements such as temperature, pressure, voltage, etc. While threshold-based detection methodologies do not require the frequent database updates of signature-based methods, they offer similarly high accuracy for known threats and a relatively low administrative burden on the system operator maintaining the IDS. This combination of minimal administrative requirements and high accuracy for known threats has resulted in high levels of industry acceptance and adoption of this type of IDS/IPS. It is interesting to note that there is a noticeable disconnect between the operators of IT networks and the operators of OT networks, specifically relating to the adoption of signature-based vs threshold-based detection methodologies. The operators of IT networks typically have a professional background focused on corporate data networks, which tend to be relatively homogeneous in terms of operating systems and applications. Signature-based malware detection has been embraced in IT networks for
decades, and the occasional false positive due to inaccurate signature files is typically easily mitigated by reverting to a previous backup, without potential for physical damage to the IT network. In contrast, the operators of OT networks typically have a professional background in Controls Engineering, where the physical components of the CPS are much more important than the cyber components. This results in a stronger focus on the physical operating parameters of the CPS, which can be easily measured in terms of performance thresholds, but not so easily through signature-based analysis of network traffic. For this reason, the institutional knowledge of OT network operators has heavily favoured threshold-based detection methodologies, resulting in historical inertia that prefers their continued use, to the minimization or even exclusion of signature-based detection methodologies. Additionally, OT networks are typically less tolerant of false positive threat detections, because unplanned downtime of the physical industrial processes controlled by the CPS carries more significant economic and life safety consequences, which has contributed to this disconnect in the preferred anomaly detection methodologies between the operators of IT and OT networks. Seng et al. report that while the overwhelming majority of industry deployments of IDS/IPS in industrial environments use signature-based and threshold-based strategies, the academic community focuses almost entirely (97%) on behaviour-based detection strategies using AI/ML, which have very little uptake in industry. AI/ML algorithms are able to achieve very high accuracy rates (typically > 95%), but despite achieving high detection rates of both known and unknown threats, the proposed methods rarely progress from academia to industry. The extreme heterogeneity of IIoT environments is particularly challenging, as AI/ML models typically have low applicability to other environments, making widespread adoption difficult. Availability of representative datasets is another significant challenge, with most of the available research datasets being artificially generated by researchers, with varying levels of fidelity to real-world IIoT environments. These limitations translate to a high administrative cost in terms of time and expertise on the part of the system operator, which significantly hinders widespread adoption. The primary challenges identified by Seng et al. with behaviour-based IDS are the availability of exhaustive datasets containing accurately labeled data, and the ongoing ease of administrative maintenance, both of which are currently unsolved problems. Khraisat et al. [6] further develop the work of Seng et al. by building a taxonomy of current IDS/IPS techniques and comparing the different AI/ML techniques common in academic literature. Systematic comparisons of several ML algorithms are provided, each with varying advantages and disadvantages; a common theme is that an ensemble method combining multiple ML algorithms can leverage synergies, stacking different algorithms to achieve higher accuracy than any single algorithm. A disadvantage of this method is higher complexity and additional time and expertise requirements on the part of the system operator, which has hampered industry adoption. Vasan et al.
[7] further develop the work of Khraisat on ensemble learning models to improve IDS/IPS accuracy in IIoT environments, with a novel approach for feature selection by stacking heterogeneous features, achieving very high classification accuracy
(99.98%) with low computational overheads that can be handled by resource-constrained IoT devices. This ensemble learning model focuses on cross-platform malware, due to the rapid growth in IoT attacks against diverse processor architectures, driven by malicious actors refining their adversarial capabilities. The proposed ensemble model is referred to as MTHAEL (Malware Threat Hunting based on Advanced Ensemble Learning), and combines multiple weak learner algorithms to train a strong learner that generates enhanced predictions from the multiple weak learner predictions. MTHAEL generates a normal baseline through disassembly of the executable binary files on the IoT devices to extract the OpCode instructions (i.e. machine code, one level below assembly language) and determine which operations occur during normal operation. Operating at this extremely low instruction level provides cross-platform compatibility for intrusion detection, which is advantageous in the highly heterogeneous market of IoT devices. The primary advantage of this methodology is the ability to leverage the same IDS/IPS across a broad range of IoT devices, which helps reduce the administrative burden on the system operator by making the IDS/IPS available to a wider audience without additional customization. However, the initial disassembly of all application binaries is a laborious process that requires high levels of expertise, which is not common outside of academia, thereby limiting broad industry adoption. This type of IDS would best be deployed as a SaaS (Software as a Service) offering, offloading the ongoing maintenance of the IDS to a centralized subject matter expert that can leverage a federated dataset to provide rapid intrusion detection of both known and unknown threats. Abid et al. [8] propose a novel method of using a distributed IDS, framing the challenges as a big data problem, and using machine learning to sift through both legitimate and attack data from disparate sources in a centralized cloud-based environment, then feeding the analyzed/classified data back to the distributed nodes in the IDS to be acted upon. The value proposition of this method is to provide the Machine Learning (ML) model with richer data sources than can be obtained from a single viewpoint on the network, thus improving the classification accuracy of the IDS. While this approach is useful within a homogeneous IIoT environment, it has limited applicability outside a single organization due to the wide variation in IIoT architectures. This contrasts with IDS/IPS implementations in traditional enterprise IT networks, which tend to be more monocultural, and thus better able to leverage distributed IDS/IPS learning models. Bai et al. [9] propose an instruction-level methodology of malware detection in CPS/IIoT environments through the use of an out-of-band circuit board that collects power consumption details from the sensors and actuators in the CPS, and performs offline analysis to determine whether the components of the CPS are operating normally. This methodology avoids the more traditional signature-based detection of malware, opting for behaviour-based anomaly detection using side-channel characteristics such as power consumption, acoustics, response time fluctuations, etc. Analysis of side-channel characteristics essentially turns the entire CPS into a deterministic finite state machine, with the IDS/IPS considering any non-deterministic behaviour as malicious.
While this is a novel idea, requiring additional out-of-band hardware solely for behavioural monitoring adds significant complexity to a CPS, which is a considerable barrier to industry adoption. Additionally, this method suffers from low detection accuracy if the CPS
experiences normal load-based variation, since changing operational parameters can be incorrectly classified as non-deterministic behaviour; this methodology is therefore best suited to relatively small and static CPS/IIoT environments. Chavez et al. [10] propose a hybrid IDS that treats the cyber and physical portions of the CPS as distinct environments with different threat profiles, using a signature-based IDS for the cyber components, and a behaviour-based IDS for the physical components. The objective of this hybrid model is to increase detection accuracy by increasing the level of difficulty for a malicious actor to bypass the IDS on both the cyber and physical portions of the CPS. Additionally, by separating the detection methodologies for the IT and OT portions of the network, the update frequency of intrusion signatures can be independent, which is advantageous for the typically higher velocity of updates on the IT portion of the CPS. Additionally, the detection methodology for the OT portion of the CPS tends to be primarily threshold-based fault detection (i.e. voltages or temperatures outside defined ranges), rather than intrusion detection. This clear delineation between the IT and OT portions of the CPS allows more precise tuning of the alert thresholds to minimize false positives, which can have significant economic and life safety consequences in IIoT environments. Gu et al. [11] propose a classification framework that uses Convolutional Neural Networks (CNN) to learn representative attack characteristics from raw network traffic to train a network-based IDS. This methodology is designed to counter the common ML challenge of having insufficient training data due to the scarcity of malicious activity, as well as the heterogeneity of different IIoT environments preventing meaningful transfer of ML models between organizations. The proposed method attempts to minimize data preprocessing by automatically transforming the data into fixed formats that are transferable across different data types, then using a discriminator to determine whether the traffic is benign or malicious. The proposed use of CNN for automated feature extraction improves the accuracy of the ML model by mitigating the reduced accuracy typically introduced by unbalanced datasets, but still suffers from relatively high false positives and false negatives for previously unknown attacks. Rakas et al. [12] suggest that many IIoT environments are well-suited to intrusion detection via pattern recognition of a narrowly defined set of traffic patterns between specific hosts using specific communication protocols, with anything outside these narrowly defined boundaries considered anomalous and potentially hostile. This is essentially a threshold-based anomaly detection strategy, which works well for static environments in which all communication patterns are known in advance. This strategy suffers when the consumers of sensor data are dynamic, or when the amount of communication between nodes on the network varies due to external events that may be semi-random, such as weather fluctuations or human client behaviour. Ravikumar et al. [13] build on the work of Rakas et al. by proposing a distributed IDS for federated CPS environments such as interconnected smart power grids.
The proposed distributed IDS uses Ethernet switch port mirroring to capture network traffic flow data and consolidates that data in a centralized or cloud-based IDS environment for enhanced situational awareness of activity in a distributed and/or loosely coupled CPS. This allows IDS rules to be dynamically generated based on activity across a loosely coupled or distributed CPS such as a smart power grid, providing faster anomaly detection and
greater protection against cascading failures. On test datasets, detection accuracy for simulated anomaly events was very high, but the approach requires further development to be applicable to more dynamic live environments. Hwang and Lee [14] focus on improving anomaly detection by minimizing false alarms from sensors, adding an interpretation layer to AI predictions that are otherwise opaque to system operators, and thereby providing higher quality information to human security analysts. This combines unsupervised machine learning with supervised machine learning by automatically adding data labels based on statistical mean prediction errors. Data labeling is typically a high-cost effort in terms of human time and expertise, which this proposal attempts to mitigate by automatically applying labels based on statistical analysis, with the aim of reducing false positive alerts.
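To make the statistical-labeling idea above concrete, the following minimal sketch (our illustration, not code from [14]) flags samples whose model prediction error exceeds the mean error by a chosen number of standard deviations; the variable names and the sensitivity factor k are assumptions for illustration.

```python
import numpy as np

def label_by_prediction_error(errors, k=2.0):
    """Pseudo-label samples as anomalous when their prediction error
    exceeds the mean error by more than k standard deviations."""
    errors = np.asarray(errors, dtype=float)
    threshold = errors.mean() + k * errors.std()
    return errors > threshold  # True = anomalous

# Example: reconstruction errors from an unsupervised model
errs = [0.02, 0.03, 0.01, 0.02, 0.45, 0.02, 0.03]
print(label_by_prediction_error(errs))  # only the 0.45 sample is flagged
```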
3 Outstanding Challenges

The field of IIoT is still quite young, with the term first defined in 2011 and coming into widespread use by 2015. Although barely a decade old, IIoT has grown by leaps and bounds, largely mirroring advancements in microprocessor technology and the increased availability of high-speed wired and wireless networks. Due to this rapid rate of change, IIoT has a number of outstanding challenges, a selection of which are described below. A modern IIoT can be considered a combination of corporate computer networks and industrial control networks, sometimes referred to as IT (Information Technology) and OT (Operational Technology), each of which has different priorities. Traditional IT networks have used the so-called CIA (Confidentiality, Integrity, Availability) triad to define the organizational security posture, with each facet listed in order of importance. OT networks reverse that order [15, 16], with availability being the most important factor, followed by integrity, with confidentiality the least important facet of overall system security. This difference is largely due to IIoT growing out of earlier ICS networks used for industrial control processes, where availability was of the utmost importance, and integrity and confidentiality were rarely considered due to the usage of trusted and air-gapped isolated network environments. As OT networks merged with IT networks to form modern IIoT environments, those differing priorities have resulted in ongoing challenges that have yet to be fully resolved. IT networks heavily prioritize authentication (who you are) and authorization (what you are allowed to do), which roughly map to the confidentiality and integrity facets of the CIA triad of information security. However, OT networks have traditionally focused so heavily on the availability facet of the CIA triad that authentication and authorization were assumed to be true by virtue of physical access to the trusted and isolated OT network. This historical assumption of a fully trusted and isolated environment is no longer true after the interconnection of IT and OT networks, resulting in vulnerability to common network-based attacks such as DDoS, MitM, replay attacks, impersonation, spoofing, false data injection, etc. Compounding the problem, OT networks typically lack integration with antimalware programs, as well as detailed logging capabilities, making it difficult to observe potentially hostile activity on OT networks [12].
There are ongoing efforts [17] to extend the IDS/IPS capabilities of IT networks into OT networks, but the lack of standardized protocols and interfaces to the physical components of IIoT makes intrusion detection very challenging. Those IDS/IPS systems that have been extended into IIoT environments struggle with high levels of false positives and false negatives, due to the heterogeneous complexity of IIoT. While this extreme heterogeneity makes the field ripe with research opportunities, the lack of a common design framework [18, 19] or standardized communication protocols makes it difficult to leverage prior research and industry expertise to drive forward the current state of the art. In other words, the extreme heterogeneity tends to require "reinventing the wheel", resulting in duplication of effort across both academia and industry, which slows advancement in the field. The highly proprietary nature of IIoT products is due to their historical evolution from ICS, which were designed to operate on closed networks without interoperability or communication requirements with external networks. As OT and IT networks merged to become IIoT, the open standards and communication protocols used by IT networks were rapidly adopted by OT networks [20, 21], but there is still significant opportunity for improvement, particularly for the OT networks that have unexpectedly found themselves connected to public and untrusted networks, including the Internet. Recognizing that proprietary and heterogeneous communication protocols are a barrier to effective intrusion detection, as well as to the development of industry best practices for IIoT security postures, O-PAS (Open Process Automation Standard) [22] is an industry consortium effort attempting to standardize the wide range of proprietary IIoT environments into a set of open and collaborative standards for communication protocols and security postures, with the goal of increased efficiency from multivendor interoperability. By achieving broad consensus within the open industry consortium, a particular IIoT implementation can be designed to meet the O-PAS standard, which allows for standardized methods of threat detection, health monitoring, and security. Designing a CPS to support open standards yields significant benefits through an accelerated product development lifecycle, thanks to the reliance on pre-existing standards, as well as ongoing efficiencies throughout the lifecycle of the IIoT. The use of IDS/IPS for anomaly detection is well established in IT networks, but still suffers from excessive false positives in OT networks, which hinders adoption in IIoT. The use of machine learning models for anomaly detection shows promise, but also suffers from excessive false positives, due to the challenges of precisely defining abnormal behaviour. Additionally, challenges in obtaining sufficiently representative data for the learning model can make the detection algorithm opaque and unpredictable to the human operators of the IIoT, resulting in low confidence in the IDS alerts.
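As a concrete illustration of the ML-based anomaly detection discussed above, and of why precisely defining "abnormal" is hard, the following sketch trains an unsupervised detector on synthetic flow features; the feature set, library choice, and contamination rate are our assumptions, not part of any cited work.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Toy per-flow features: [packets/s, bytes/s, distinct destination ports]
rng = np.random.default_rng(1)
normal_flows = rng.normal([100, 5e4, 3], [10, 5e3, 1], size=(500, 3))

detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(normal_flows)

# predict() returns -1 for anomalous, 1 for normal; a port-scan-like flow
# should be flagged, while a flow near the training distribution is not.
print(detector.predict([[400, 2e5, 40], [102, 5.1e4, 3]]))
```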
4 Conclusions

IDS/IPS for IIoT environments remains a field in rapid development, due to multiple confounding factors, the most significant of which are outlined below.
Many IIoT environments grew out of legacy SCADA and ICS networks, which assumed operation in an isolated and fully trusted network. The current reality of ubiquitous connectivity to increasingly hostile networks is only grudgingly accepted by many IIoT network operators, who still see additional security requirements as an impediment to system availability. Due to the extreme diversity of IIoT environments, there is no one-size-fits-all intrusion detection model that can be generalized. Many researchers have proposed a generic or universal framework for intrusion detection in IIoT environments, but as detection methods increase in generality to aid rapid test case development, they necessarily lose real-world fidelity, making them less representative of the live IIoT. The reverse is also true: as intrusion detection accuracy for a real-world IIoT increases, its generic applicability to other IIoT environments decreases. The optimal solution seems to be a low-level generic framework with a modular architecture that allows plugins to be developed for the unique characteristics of a particular IIoT. Due to historical inertia from legacy ICS/SCADA systems, many IIoT environments are not built with observability as a design feature. Gathering telemetry data is frequently an afterthought, if it is considered at all. This lack of observability makes it challenging to detect anomalies, since a known good baseline dataset is often unavailable. This is particularly challenging for AI researchers, as the quality of training data for ML models is frequently lacking. Signature-based detection techniques such as traditional antivirus tools are less effective in OT networks than in IT networks, due to the lack of standardization, which includes heterogeneous sensors, different actuators, and various communication protocols. Behaviour-based detection methods (heuristic analysis, AI/ML) are less effective because IIoT environments often exhibit highly variable and unpredictable behaviours, such as power grid fluctuations, and can vary widely due to randomized events such as scheduled/unscheduled maintenance, weather events, etc. Threshold-based detection (e.g. power consumption outside of defined thresholds) is commonly used for physical fault detection, although less so for cyber intrusions. This is effectively a subset of behaviour-based analysis, with the system operator deciding what the acceptable ranges should be. Unfortunately, accurately determining the optimal threshold levels has proven elusive, especially because IIoT environments tend to be dynamic, with regular changes throughout the system life cycle. Future directions and opportunities for further research include embedding telemetry functionality at the design stage of the IIoT, which will provide higher quality baseline data, vital for AI-based models to accurately detect anomalous activity. For low-power sensor devices with constrained CPU/battery/bandwidth resources, embedding resource-intensive intrusion detection functionality may be impractical. A potential solution is moving IDS/IPS functionality up one level to the network layer (e.g. throttling access at the network perimeter to prevent DDoS attacks, using behaviour-based traffic analysis on an upstream firewall, etc.). There are recent enhancements in IDS/IPS (e.g. Cisco Talos cloud-based threat intelligence) designed for use in enterprise networks that can be adapted to IIoT. These IDS/IPS leverage network flow data gathered from the network edge, with the
resource-intensive data mining and intrusion detection performed outside of the resource-constrained IIoT. Industry practices appear to be evolving from self-contained IDS/IPS appliances to SIEM (Security Information and Event Management) products, which typically use a centralized logging host to collect data and generate statistical models to classify traffic as benign or potentially malicious. Industry and academia are not entirely aligned in this regard, with industry preferring the use of statistical models due to ease of data collection, while academia focuses on AI/ML models for anomaly detection. A particularly promising area of research is the development of a hybrid model for anomaly detection, which includes threshold-based anomaly detection for simple threats such as brute force password attacks or temperature extremes, along with signature-based detection for known threats, plus behaviour-based detection for unknown threats. Behaviour-based detection is the most difficult to perform with high accuracy, but there are promising options for using AI/ML to increase accuracy over time.

Acknowledgement. This research has been funded by the SUDOE Interreg Program (grant INUNDATIO), by the Spanish Ministry of Economics and Industry, grant PID2020-112726RB-I00, by the Spanish Research Agency (AEI, Spain) under grant agreement RED2018-102312T (IA-Biomed), and by the Ministry of Science and Innovation under CERVERA Excellence Network project CER-20211003 (IBERUS) and Missions Science and Innovation project MIG20211008 (INMERBOT). Also, by Principado de Asturias, grant SV-PA-21-AYUD/2021/50994.
References

1. Kagermann, H., Wahlster, W., Helbig, J.: Securing the future of German manufacturing industry: recommendations for implementing the strategic initiative Industrie 4.0. Final Report of the Industrie 4.0 Working Group, Acatech, National Academy of Science and Engineering, p. 678 (2013)
2. Al-Hawawreh, M., Sitnikova, E.: Developing a security testbed for industrial internet of things. IEEE Internet Things J. 8(7), 5558–5573 (2021). https://doi.org/10.1109/JIOT.2020.3032093
3. Wolf, M., Serpanos, D.: Safe and Secure Cyber-Physical Systems and Internet-of-Things Systems. Springer International Publishing, Cham (2020). https://doi.org/10.1007/978-3-030-25808-5
4. Huang, L., Zhu, Q.: A dynamic games approach to proactive defense strategies against advanced persistent threats in cyber-physical systems. Comput. Secur. 89, 101660 (2020). https://doi.org/10.1016/j.cose.2019.101660
5. Seng, S., Garcia-Alfaro, J., Laarouchi, Y.: Why anomaly-based intrusion detection systems have not yet conquered the industrial market? In: Foundations and Practice of Security: 14th International Symposium, FPS 2021, Paris, France, December 7–10, pp. 341–354 (2021). https://doi.org/10.1007/978-3-031-08147-7_23
6. Khraisat, A., Alazab, A.: A critical review of intrusion detection systems in the internet of things: techniques, deployment strategy, validation strategy, attacks, public datasets and challenges. Cybersecurity 4, 18 (2021). https://doi.org/10.1186/s42400-021-00077-7
7. Vasan, D., Alazab, M., Venkatraman, S., Akram, J., Qin, Z.: MTHAEL: cross-architecture IoT malware detection based on neural network advanced ensemble learning. IEEE Trans. Comput. 69(11), 1654–1667 (2020). https://doi.org/10.1109/TC.2020.3015584
8. Abid, A., Jemili, F., Korbaa, O.: Distributed architecture of an intrusion detection system in industrial control systems. In: ICCCI 2022: Communications in Computer and Information Science, vol. 1653. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-16210-7_39
9. Bai, Y., Park, J., Tehranipoor, M.: Real-time instruction-level verification of remote IoT/CPS devices via side channels. Discov. Internet Things 2, 1 (2022). https://doi.org/10.1007/s43926-022-00021-2
10. Chavez, A., et al.: Hybrid intrusion detection system design for distributed energy resource systems. In: 2019 IEEE CyberPELS (CyberPELS), Knoxville, TN, USA, pp. 1–6 (2019). https://doi.org/10.1109/CyberPELS.2019.8925064
11. Gu, H., et al.: DEIDS: a novel intrusion detection system for industrial control systems. Neural Comput. Appl. 34(12), 9793–9811 (2022). https://doi.org/10.1007/s00521-022-06965-4
12. Rakas, S.V.B., Stojanovic, M.D., Markovic-Petrovic, J.D.: A review of research work on network-based SCADA intrusion detection systems. IEEE Access 8, 93083–93108 (2020). https://doi.org/10.1109/ACCESS.2020.2994961
13. Ravikumar, G., Singh, A., Babu, J.R., Moataz, A., Govindarasu, M.: D-IDS for cyber-physical DER Modbus system - architecture, modeling, testbed-based evaluation. In: 2020 Resilience Week (RWS), Salt Lake City, ID, USA, pp. 153–159 (2020). https://doi.org/10.1109/RWS50334.2020.9241259
14. Hwang, C., Lee, T.: E-SFD: explainable sensor fault detection in the ICS anomaly detection system. IEEE Access 9, 140470–140486 (2021). https://doi.org/10.1109/ACCESS.2021.3119573
15. Ashibani, Y., Mahmoud, Q.H.: Cyber physical systems security: analysis, challenges and solutions. Comput. Secur. 68, 81–97 (2017). https://doi.org/10.1016/j.cose.2017.04.005
16. Yaacoub, J.-P.A., Salman, O., Noura, H.N., Kaaniche, N., Chehab, A., Malli, M.: Cyber-physical systems security: limitations, issues and future trends. Microprocess. Microsyst. 77, 103201 (2020). https://doi.org/10.1016/j.micpro.2020.103201
17. Qassim, Q.S., Jamil, N., Mahdi, M.N., Abdul Rahim, A.A.: Towards SCADA threat intelligence based on intrusion detection systems - a short review. In: 2020 8th International Conference on Information Technology and Multimedia (ICIMU), Selangor, Malaysia, pp. 144–149 (2020). https://doi.org/10.1109/ICIMU49871.2020.9243337
18. Kandasamy, K., Srinivas, S., Achuthan, K., Rangan, V.P.: IoT cyber risk: a holistic analysis of cyber risk assessment frameworks, risk vectors, and risk ranking process. EURASIP J. Inf. Secur. 2020(1), 1–18 (2020). https://doi.org/10.1186/s13635-020-00111-0
19. Darabseh, A., Freris, N.M.: A software-defined architecture for control of IoT cyberphysical systems. Cluster Comput. 22(4), 1107–1122 (2019). https://doi.org/10.1007/s10586-018-02889-8
20. Kabore, R., Kouassi, A., N'goran, R., Asseu, O., Kermarrec, Y., Lenca, P.: Review of anomaly detection systems in industrial control systems using deep feature learning approach. ENG 13(01), 30–44 (2021). https://doi.org/10.4236/eng.2021.131003
21. Sgueglia, A., Di Sorbo, A., Visaggio, C.A., Canfora, G.: A systematic literature review of IoT time series anomaly detection solutions. Futur. Gener. Comput. Syst. 134, 170–186 (2022). https://doi.org/10.1016/j.future.2022.04.005
22. Bartusiak, R.D., et al.: Open Process Automation: a standards-based, open, secure, interoperable process control architecture. Control Eng. Pract. 121, 105034 (2022). https://doi.org/10.1016/j.conengprac.2021.105034
A Novel Method for Failure Detection Based on Real-Time Systems Identification

Álvaro Michelena¹, Antonio Díaz-Longueira¹, Míriam Timiraos¹,², Héctor Quintián¹, Óscar Fontenla Romero³, and José Luis Calvo-Rolle¹

¹ Department of Industrial Engineering, University of A Coruña, CTC, CITIC, Calle Mendizábal s/n, 15403 Ferrol, A Coruña, Spain {alvaro.michelena,a.diazl,miriam.timiraos.diaz,hector.quintian,jlcalvo}@udc.es
² Fundación Instituto Tecnológico de Galicia, Department of Water Technologies, National Technological Center, Cantón Grande 9, Planta 3, C.P. 15003 A Coruña, Spain [email protected]
³ Faculty of Computer Science, University of A Coruña, LIDIA, Campus de Elviña, s/n, 15071 A Coruña, Spain [email protected]
Abstract. The current state of climatic emergency that we are suffering, combined with the increase in population, is causing water availability to decrease. Therefore, the water distribution and management systems in cities and industrial areas correspond to critical systems, and it is essential to ensure their robust operation. In this context, this paper presents a novel proposal for detecting real-time malfunctions in water resource management systems. The proposed method is based on combining the Recursive Least Squares method with hyperplane detection using regression methods. The proposal has been validated on a real dataset, and the results obtained have reached an 80% F1-score.

Keywords: Anomaly detection · Regression hybrid model · Online identification · Hyperplanes

1 Introduction
Climate change is responsible for various water-related problems, such as scarcity, poor quality, floods, and droughts [2]. Water is a strategic resource for socioeconomic development and environmental protection [1]. The impacts of climate change may be more severe in regions with low levels of water resources that suffer from frequent droughts, generating imbalances between water demand and availability [4]. Within Europe, Spain is highly vulnerable to possible climate changes due to the spatial and temporal irregularity of water resources, the high
degree of water use, and the associated socioeconomic impacts [3]. Regional climate models indicate temperature increases and decreases in precipitation, resulting in a marked decrease in water resources [5]. For all these reasons, and due to their great importance, water management and distribution systems are considered critical systems. Any kind of supply problem entails serious consequences, such as public health problems, among many others. Likewise, water is an indispensable resource in the supply chain of many production processes, so a cut in its supply could also cause significant economic losses. Therefore, it is essential to ensure high reliability and robustness in the operation of these systems. However, to achieve robust systems, it is not enough to have adequate infrastructure; it is also necessary to implement anomaly detection measures to avoid possible system failures or even cyber-attacks. Thus, this research proposes a new system for detecting anomalies and malfunctions in water distribution systems. The proposed method combines a real-time identification algorithm (Recursive Least Squares) with hyperplane detection based on the linear regression method. The present paper is structured as follows: following the introduction, Sect. 2 provides an overview of the case study. Subsequently, Sect. 3 introduces the suggested method for detecting anomalies. Section 4 outlines the conducted experiments and their corresponding results. Lastly, Sect. 5 encompasses the conclusions drawn from the study and presents potential avenues for future research.
2 Case Study
This section presents the water management and distribution model on which we have worked and the dataset used for the experiments and validation of the anomaly detection system.

2.1 Water Management and Distribution Mock-Up Used
The main goal of this study is to assess the efficacy of the proposed fault detection system in an actual scenario. The level control plant, situated in the Control Laboratory of the Faculty of Engineering at the University of A Coruña, serves as an industrial prototype. This system mimics the tank filling control system commonly found in water management and distribution systems. Figure 1 illustrates the physical representation of the mock-up. The plant consists of two tanks positioned at different heights. The upper tank (1) is responsible for regulating the filling level. To achieve this, the system utilizes water stored in the lower tank (2), connected to a centrifugal pump (linked to a 0.8 HP three-phase motor) (3) via a pipe. The pump propels the water to the upper tank, and subsequently, the water is discharged from the upper tank to the storage tank through two pipes, thereby completing the water operation cycle.
Fig. 1. Level control plant
To monitor and control the entire process, the system incorporates various sensors and actuators. Firstly, a Schneider Altivar 31 variable frequency drive (4) is employed to control the three-phase motor associated with the centrifugal pump. This drive provides a 0/10 V analog input, enabling regulation of the motor's rotational speed and, consequently, the water flow rate. Secondly, a Banner S18UUA ultrasonic sensor (5) is positioned at the top of the tank to measure the water level in the filling tank. This sensor produces a continuous 0/10 V output signal. Additionally, the plant includes two valves, one manual (7) and the other electric (6), which are controlled by a 0/10 V signal to regulate the water discharge flow to the lower tank. Furthermore, like any other industrial system, the control plant incorporates several safety features. These include an emergency stop button that halts the system completely when activated, and safety buoys that prevent water overflow from either of the tanks and safeguard against the water pump running dry. The control system employed is a virtual PID controller that obtains real-time data from the plant via a data acquisition card. The set point signal corresponds to the desired liquid level, while the process value indicates the actual liquid level within the tank. The computer generates the control signal, representing the centrifugal pump speed. Figure 2 illustrates the configuration of this single-input single-output (SISO) system, where the pumping speed is regulated to attain the desired liquid level in the tank. A National Instruments data acquisition card (model USB-6008) (8) was used to establish the connection between the plant and the computer.
Fig. 2. Control loop scheme
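For reference, the following minimal sketch shows a generic discrete positional PID of the kind the virtual controller in Fig. 2 implements; the gains are illustrative assumptions (the authors use an adaptive Dahlin PID, described in Sect. 2.2), while the 0.5 s sample time and the 0/10 V actuator range come from the plant description.

```python
class DiscretePID:
    """Minimal positional PID: u(k) = Kp*e(k) + Ki*sum(e)*Ts + Kd*(e(k)-e(k-1))/Ts.
    Gains here are illustrative, not the plant's actual tuning."""

    def __init__(self, kp, ki, kd, ts, u_min=0.0, u_max=10.0):
        self.kp, self.ki, self.kd, self.ts = kp, ki, kd, ts
        self.u_min, self.u_max = u_min, u_max  # 0/10 V drive input range
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, process_value):
        error = setpoint - process_value          # level error, in %
        self.integral += error * self.ts
        derivative = (error - self.prev_error) / self.ts
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(max(u, self.u_min), self.u_max)  # clamp to actuator range

# Example: one control step with a 0.5 s sample time
pid = DiscretePID(kp=0.8, ki=0.2, kd=0.05, ts=0.5)
pump_voltage = pid.step(setpoint=60.0, process_value=42.0)
```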
2.2 Dataset
The dataset used in this study was obtained by recording the control signal, set point, and process value during 35 min of regular plant operation. An adaptive PID controller based on the Dahlin PID was implemented to collect the data, with the manual valve fully open. The sampling time was set at 0.5 s, resulting in 4200 samples collected across different operating points ranging from 35% to 85% in increments of 10%. The range limitation was imposed by the plant's design, as it does not perform optimally below or above these percentage values. The 10% increment was chosen because it was determined that smaller increments did not significantly affect the system dynamics. The recorded data represents only the plant's correct operation, while this work aims to detect anomalies. Consequently, 30 anomalies were introduced by modifying the process value signal, deviating it by a random percentage between 4% and 10% of the total filling. This deviation range was selected because it was found to be the usual range in which the signals fluctuated when a malfunction occurred. When using the control loop, an increase or decrease in the signal measured by the level sensor also affects the control signal generated by the control algorithm. Therefore, to emulate the behaviour of the entire system as accurately as possible, the control signal was modified in the same samples but inversely to the modification generated in the fill level signal. In addition, since changes in the control signal have a more significant impact on the system output, the modification in the control signal was set to half of the percentage variation in the process value signal. These anomalies were designed to emulate minor signal interferences or issues such as valve obstruction, leaks, etc. The sensor output signal was also modified to simulate measurement failures of the ultrasonic sensor, taking on values ranging from 0% to 100%.
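A minimal sketch of the anomaly injection procedure just described, assuming the signals are stored as NumPy arrays in percentage units; the function and variable names are our own, not from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def inject_anomaly(process_value, control_signal, idx):
    """Deviate the level signal at samples `idx` by a random 4-10% of the
    total filling range, and shift the control signal inversely by half
    that percentage, mimicking the closed-loop reaction described above."""
    pv, u = process_value.copy(), control_signal.copy()
    deviation = rng.uniform(4.0, 10.0)   # percent of total filling
    sign = rng.choice([-1.0, 1.0])       # direction of the deviation
    pv[idx] += sign * deviation          # level is measured in percent
    u[idx] -= sign * (deviation / 2.0)   # inverse change, half magnitude
    return pv, u
```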
3 The Proposed Method
The primary objective of this research is to create an innovative anomaly detection system utilizing online identification algorithms. Broadly speaking, the system employs the Recursive Least Squares online identification algorithm to identify the dynamics of the system at a given time. The detected dynamics are then checked
to determine whether they belong to the hyperplane that reflects the system operation for that setpoint level. An anomaly or malfunction is detected if the detected dynamics do not belong to the hyperplane. Figure 3 shows a scheme of the proposed method, where SP is the desired fill level (measured as a percentage) selected by the user, e(k) is the error signal, u(k) is the control signal, y(k) is the level measured in the upper tank, and a0, a1 and b0 are the transfer function parameters identified by the RLS algorithm.
Fig. 3. Proposal scheme
3.1 Online Identification Process: Recursive Least Squares (RLS)
The Recursive Least Squares (RLS) method is widely used in online identification due to its simplicity and efficiency. Its main objective is to determine the transfer function parameters, represented by the θ vector, that minimize prediction errors and establish the strongest correlation between input and output signals of the system [1]. What makes RLS particularly advantageous is its ability to run on various devices with limited computational capabilities [7]. Consequently, the RLS method can be seamlessly integrated into multiple applications and systems. In this research, the tank fill level plant is modeled as a second-order transfer function with time delay (k = 1), as this tends to be the most suitable transfer function for many systems (Eq. 1) [8]. Therefore, the RLS algorithm is employed to extract the parameters a0, a1 and b0 from the input and output signals of the system.

$$G_p(z^{-1}) = \frac{b_0 z^{-k}}{1 - a_0 z^{-1} - a_1 z^{-2}} \qquad (1)$$
It is worth emphasizing that RLS is a recursive method that incorporates an exponential forgetting factor, denoted as λ, typically falling within the range
of λ ∈ [0.8, 1] [9]. The inclusion of this factor is crucial for achieving optimal performance of the algorithm, and its tuning plays a vital role as it directly influences the identification process. When λ takes on a lower value, the algorithm has less memory and higher sensitivity, which may lead to errors in the identified parameters due to increased susceptibility to system noise. Conversely, values closer to 1 reduce the sensitivity of the RLS algorithm by enhancing its memory capacity, thereby improving its robustness against system noise.
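A minimal sketch of RLS with exponential forgetting for the model in Eq. (1) is shown below; in the time domain, Eq. (1) corresponds to y(k) = a0·y(k−1) + a1·y(k−2) + b0·u(k−1), which defines the regressor. The initial covariance value and the class interface are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class RLSIdentifier:
    """Recursive least squares with exponential forgetting for the
    second-order model y(k) = a0*y(k-1) + a1*y(k-2) + b0*u(k-1),
    the time-domain form of Eq. (1) with delay k = 1."""

    def __init__(self, lam=0.95, n_params=3, p0=1000.0):
        self.lam = lam                   # forgetting factor, in [0.8, 1]
        self.theta = np.zeros(n_params)  # parameter estimates [a0, a1, b0]
        self.P = p0 * np.eye(n_params)   # covariance matrix (large initial value)

    def update(self, phi, y):
        """phi: regressor [y(k-1), y(k-2), u(k-1)]; y: measured output y(k)."""
        phi = np.asarray(phi, dtype=float)
        P_phi = self.P @ phi
        gain = P_phi / (self.lam + phi @ P_phi)
        self.theta = self.theta + gain * (y - phi @ self.theta)
        self.P = (self.P - np.outer(gain, P_phi)) / self.lam
        return self.theta

# Usage at each sample k (signals as arrays):
# rls = RLSIdentifier(lam=0.95)
# a0, a1, b0 = rls.update([y[k-1], y[k-2], u[k-1]], y[k])
```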
3.2 Hyperplane Fault Detection Module
The fault detection system is based on two differentiated blocks. The first block contains the hyperplanes associated with each setpoint level, and the second is a decision block used to check whether the dynamics detected by the identification method (represented by the point formed by the transfer function coefficients) belong to the hyperplane associated with that setpoint value.
3.3 Hyperplane Module
The hyperplane module is based on a hybrid structure, i.e., there is not a single hyperplane that serves for every set point; instead, different models are used, each associated with a specific fill value. Figure 4 shows an example of the hybrid structure used in our system, where the hyperplane selector chooses which model to use depending on the working filling level.
Fig. 4. Hybrid proposal structure
For each model, the linear regression method [6] will be used to obtain the hyperplane that best fits the system’s dynamics under study for a given filling value.
It should be noted that the RLS method returns three values: b0, associated with the numerator, and a0 and a1, the factors associated with the denominator. In this way, the regression model can have different configurations depending on the parameters chosen as input and output. In general terms, the model must predict one of the three weights of the transfer function from the other two. This research tests all possible configurations to obtain the best results. Figure 5 shows the three different configurations that will be evaluated.
Fig. 5. Possible configurations
In general terms, this block obtains each of the hyperplanes associated with the dynamics of the system at each filling level from the data generated by the RLS algorithm. An example of the operation of this block for different filling levels is shown in Fig. 6.
Fig. 6. Diagram of how each hyperplane is obtained
3.4 Decision Module
The decision module is used to check whether or not the point identified by the RLS method belongs to the hyperplane that captures the system's dynamics at the indicated filling level. For this purpose, this block compares the output predicted by the generated model (hyperplane) with the value determined by the RLS real-time identification algorithm. The data is classified as anomalous if the difference between the two values exceeds a certain threshold.
It is essential to underline that the threshold values selected are not arbitrary but have been calibrated based on the maximum distances measured between the hyperplanes derived from the usual operating dataset and the furthest points associated with each hyperplane. Figure 7 shows an example in three-dimensional space of how anomalous data are detected.
Fig. 7. An example of anomaly detection
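The following sketch illustrates the hyperplane and decision modules together, using the configuration that Sect. 4 later reports as best (predicting a0 from b0 and a1, with a 0.004 threshold); the data layout and function names are our assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

hyperplanes = {}  # one fitted hyperplane per setpoint (filling) level

def fit_level(level, coeffs):
    """coeffs: (n, 3) array of RLS output [a0, a1, b0] from normal operation."""
    X, y = coeffs[:, [2, 1]], coeffs[:, 0]   # inputs (b0, a1), target a0
    hyperplanes[level] = LinearRegression().fit(X, y)

def is_anomalous(level, a0, a1, b0, threshold=0.004):
    """Flag a sample when the identified a0 deviates from the hyperplane."""
    predicted_a0 = hyperplanes[level].predict(np.array([[b0, a1]]))[0]
    return abs(predicted_a0 - a0) > threshold
```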
4 Experimental Setup and Results
To validate the proposed anomaly detection method, multiple tests were conducted. Initially, it was necessary to determine the optimal values for the RLS forgetting factor and the distance threshold. The threshold value was determined by analyzing the original dataset without any added anomalies. For this purpose, it was necessary to train the regression models for each filling level and then measure the maximum distance between identified data points and their associated hyperplanes. Through this process, it was found that the maximum distances were less than 0.01, depending on the operating point chosen and the input/output model configuration. Subsequently, to tune the RLS algorithm's forgetting factor, different values were tested within the typical range of use, which, as discussed in Sect. 3.1, is λ ∈ [0.8, 1]. The system was executed on a dataset containing anomalies for the various parameter values mentioned above. Since anomaly detection is the objective, the F1-score metric was used to evaluate the results. The F1-score represents the harmonic mean of precision and recall; a value of 1 indicates that all anomalies were detected without classifying any normal data points as anomalies. The three possible configurations of the regression model were tested to compare the results obtained and determine which gives the best performance. Table 1 shows the results obtained for each configuration. As seen in Table 1, the method can achieve good performance, with F1-score values higher than 0.8 for several configurations. However, the best
Table 1. Results obtained

Inputs   Output   Threshold   F. factor   Recall   Precision   F1-score
a0, a1   b0       0.002       0.95        0.568    0.833       0.676
a0, a1   b0       0.002       0.98        0.885    0.767       0.821
a0, a1   b0       0.004       0.95        0.920    0.767       0.836
a0, a1   b0       0.004       0.98        0.950    0.633       0.760
a0, a1   b0       0.006       0.95        0.952    0.667       0.784
a0, a1   b0       0.006       0.98        0.944    0.567       0.708
b0, a0   a1       0.002       0.95        0.639    0.767       0.697
b0, a0   a1       0.002       0.98        0.840    0.700       0.764
b0, a0   a1       0.004       0.95        0.880    0.733       0.800
b0, a0   a1       0.004       0.98        0.857    0.600       0.706
b0, a0   a1       0.006       0.95        0.857    0.600       0.706
b0, a0   a1       0.006       0.98        0.882    0.500       0.638
b0, a1   a0       0.002       0.95        0.793    0.767       0.780
b0, a1   a0       0.002       0.98        0.880    0.733       0.800
b0, a1   a0       0.004       0.95        0.958    0.767       0.852
b0, a1   a0       0.004       0.98        0.900    0.600       0.720
b0, a1   a0       0.006       0.95        0.957    0.733       0.830
b0, a1   a0       0.006       0.98        0.842    0.533       0.653
result is obtained for the model configuration that predicts the parameter a0 of the transfer function, with an RLS forgetting factor of 0.95 and a decision threshold of 0.004. With this configuration, an F1-score of 0.852 is achieved.
5 Conclusions and Future Work
This article presents a new method for anomaly detection in water management and distribution systems. The proposed method combines the Recursive Least Squares (RLS) real-time identification algorithm with hyperplane detection based on regression methods. The proposal has been tested on a real system, in this case the laboratory filling level control plant. The results obtained reach more than 80% F1-score for several configurations of the proposal, indicating that this method can detect many of the anomalies in this type of system. However, precise adjustment of several parameters is required for optimal performance, which can be somewhat complex. In addition, a prior analysis is required to obtain the hyperplanes.
In future work, the suggested approach will be tested on a dataset containing a larger number of anomalies and a larger number of samples. It will also be implemented in additional laboratory facilities, including temperature control plants. Furthermore, it is crucial to perform a comparative analysis between the proposed method and other fault detection techniques, such as one-class or hybrid techniques, among other options.

Acknowledgement. Álvaro Michelena's research was supported by the Spanish Ministry of Universities (https://www.universidades.gob.es/), under the "Formación de Profesorado Universitario" grant with reference FPU21/00932. Míriam Timiraos's research was supported by the Xunta de Galicia (Regional Government of Galicia) through grants to industrial PhD (http://gain.xunta.gal/), under the "Doutoramento Industrial 2022" grant with reference 04 IN606D 2022 2692965. CITIC, as a Research Center of the University System of Galicia, is funded by the Consellería de Educación, Universidade e Formación Profesional of the Xunta de Galicia through the European Regional Development Fund (ERDF) and the Secretaría Xeral de Universidades (Ref. ED431G 2019/01).
References

1. Arnell, N.W.: Climate change and global water resources: SRES emissions and socioeconomic scenarios. Global Environ. Change 14(1), 31–52 (2004)
2. Brouwer, F., Falkenmark, M.: Climate-induced water availability changes in Europe. Environ. Monit. Assess. 13, 75–98 (1989)
3. Brunet, M., et al.: Generación de escenarios regionalizados de cambio climático para España (2009)
4. CEDEX: Evaluación del impacto del cambio climático en los recursos hídricos en régimen natural (2011)
5. European Commission, et al.: Adapting to climate change: towards a European framework for action. White Paper 147 (2009)
6. Groß, J.: Linear Regression, vol. 175. Springer Science & Business Media (2003)
7. Islam, S.A.U., Bernstein, D.S.: Recursive least squares for real-time implementation [lecture notes]. IEEE Control Syst. Mag. 39(3), 82–85 (2019)
8. Jove, E., Casteleiro-Roca, J.L., Quintián, H., Simić, D., Méndez-Pérez, J.A., Calvo-Rolle, J.L.: Anomaly detection based on one-class intelligent techniques over a control level plant. Logic J. IGPL 28(4), 502–518 (2020)
9. Zhang, H., Gong, S.J., Dong, Z.Z.: On-line parameter identification of induction motor based on RLS algorithm. In: 2013 International Conference on Electrical Machines and Systems (ICEMS), pp. 2132–2137. IEEE (2013)
Systematic Literature Review of Methods Used for SQL Injection Detection Based on Intelligent Algorithms

Juan José Navarro-Cáceres, Ignacio Samuel Crespo-Martínez, Adrián Campazas-Vega, and Ángel Manuel Guerrero-Higueras

Universidad de León, Campus de Vegazana s/n, 24071 León, Spain
[email protected], {icrem,acamv,am.guerrero}@unileon.es
Abstract. In this work, a systematic literature review of methods used for SQL injection detection based on intelligent algorithms, especially deep learning and machine learning, is carried out. We identify the main methods used, the algorithms that have been tested, and the main techniques used to process the data. To conduct the study, a quantitative review has been performed following the SLR methodology, which includes identification of the need for the review, definition of the search strategy, search and selection of articles, and data extraction and analysis. This work aims to determine whether it is possible to apply intelligent algorithms in this field and to collect the methods used so far.

Keywords: SQL Injection · SQL Injection Detection · Deep Learning · Machine Learning

1 Introduction
SQL injection attacks are one of the most frequent attacks on web applications [1]. Over the years, various techniques and tools have been proposed to prevent and detect this attack. Recently, methods based on artificial intelligence, such as deep learning and machine learning, have been successfully used to identify them [2]. In this literature review, we explore the application of artificial intelligence-based techniques to detect SQL injections, analyse the main methods that have been used, identify the best algorithms used, and define the main techniques used for data preprocessing. Overall, this literature review provides an overview of the current state of the art using deep learning and machine learning techniques. This work is relevant because there is no review of SQL injection detection covering deep learning and machine learning simultaneously. However, there are some reviews on machine learning methods [3], deep learning methods [4], or SQL injection attacks as a general topic [5]. Also, we have tried to choose articles whose datasets are composed of flow data.
The article is divided into three sections. First, the experimental section details the steps to collect and analyze data. Then, the results and discussion section presents the results of the research conducted and their interpretation. Finally, we present the conclusions of the review.
2 Experimental Section
This section details the steps involved in the research. In order to conduct the review, a research protocol has been developed following the systematic literature review (SLR) methodology [6], which includes identification of the need for the review, definition of the search strategy, search and selection of articles, and data extraction and analysis.

2.1 Identification of the Need
The motivation for this review is the lack of reviews about SQL injection detection using algorithms based on deep learning and machine learning simultaneously. Therefore, the main objective of this review is to perform this type of review. We have also defined some specific objectives:

– OBJ 1. Identify the main methods based on intelligent algorithms to detect this type of attack.
– OBJ 2. Identify the best algorithms that have been tested, based on the results obtained.
– OBJ 3. Define the main techniques used for data processing prior to training.

Then, we elaborated the research questions, which will guide the remaining process. The research questions, with their related objectives, are shown in Table 1.

Table 1. Research questions

P1. What are the main techniques used to perform data preprocessing? (Related objective: OBJ 3)
P2. What are the main deep learning and machine learning-based algorithms and techniques used? (Related objective: OBJ 1)
P3. Which algorithms have provided the best results? (Related objective: OBJ 2)
2.2 Article Search
To perform the review, we must start from a set of original articles related to SQL injection detection techniques based on deep learning or machine learning. To extract these articles, we must choose a database from which to retrieve them; there are several options, such as Google Scholar, Scopus, or Web of Science. In this article, we have used Scopus and Google Scholar to search for and select articles.
Then, a keyword-based search was carried out to look for articles related to the subject. The keywords used were: SQL Injection attack detection, SQL Injection detection, SQL Injection detection machine learning, and SQL Injection detection deep learning. A total of 22 articles were selected according to these keywords. From those 22 articles, filtering was performed using different inclusion and exclusion criteria, shown in Table 2. As a result, six research articles were obtained, which are shown in Table 3.

Table 2. Inclusion and exclusion criteria

CI 1: Research dedicated to detecting SQL attacks
CE 1: Articles before 2014
CE 2: Articles on SQL attack detection that do not use AI-based algorithms
CE 3: Articles from low-impact journals
Table 3. Final articles ID
Final articles
A1 A2 A3 A4 A5 A6
SQL injection detection using machine learning [7] Lstm-based SQL injection detection method for intelligent transportation system [8] Detection of SQL injection attacks: A machine learning approach [9] SQL injection attack detection and prevention techniques using deep learning [10] SQL Injection attack detection in network flow data [11] Detection of SQL injection based on artificial neural network [12]
2.3 Quality Assessment
Next, we performed a quality assessment based on criteria extracted from [13]. After evaluating each article, it was found that all articles passed this quality assessment and were relevant for continuing the research. An article was considered to pass the quality test if it scored 'Yes' on at least 10 criteria. The criteria and results are shown in Table 4.
Table 4. Quality criteria

Quality criteria | Score (Yes/No/Partial)
1. Are the research aims clearly specified? | Y
2. Was the study designed to achieve these aims? | Y
3. Are the techniques used clearly described, and is their selection justified? | P
4. Are the variables considered by the study suitably measured? | Y
5. Are the data collection methods adequately described? | Y
6. Is the data collected adequately described? | Y
7. Is the purpose of the data analysis clear? | Y
8. Are statistical techniques used to analyze data adequately described and their use justified? | Y
9. Are negative results (if any) presented? | P
10. Do the researchers discuss any problems with the validity/reliability of their results? | Y
11. Are all research questions answered adequately? | Y
12. How clear are the links between data, interpretation and conclusions? | Y
13. Are the findings based on multiple projects? | Y
2.4 Data Analysis
After obtaining the articles for the research, different aspects of each article were analyzed.

First, we analyze the different techniques used for data preprocessing. These techniques are used to clean and prepare the input data before it is used to train the model. The aim is to improve the data quality and make it more suitable for training. This preprocessing can involve several techniques, such as normalization or feature selection.

Then, the different techniques and algorithms used are analyzed. For example, in the case of machine learning, some of the most common algorithms are logistic regression (LR), k-nearest neighbours (K-NN), Support Vector Machines (SVM), Random Forests (RF) and ensemble learning algorithms like majority voting (VC), Bagged Trees or Boosted Trees. On the other hand, in the case of deep learning, the most common techniques are convolutional neural networks (CNN), recurrent neural networks (RNN) and multilayer perceptrons (MLP).

Finally, the results obtained with each algorithm are analyzed using different evaluation metrics. Evaluation metrics measure the effectiveness of the deep learning and machine learning models built. These metrics make it possible to compare different models and determine which one is the most suitable and effective. In this work, the metrics analyzed are accuracy, precision, recall and F1-score.
3 Results and Discussion
In this section, we will present the results obtained from the analysis of each article, answering the research questions.
3.1 Data Preprocessing
The following paragraphs describe the data preprocessing methods used in each article to answer the research question P1.

In A1, the preprocessing has two phases: feature extraction and then a tokenization process. First, feature extraction is performed by separating the elements of the SQL statement at the blank characters and counting the occurrences of each element. Then the tokenization consists of classifying each of these occurrences into different classes. For instance, the word "select" is classified as a keyword, the word "or" as a logical operator, and the character ";" as a punctuation mark.

In A2, the preprocessing consists of generalization, segmentation and word vector generation. Two methods are used to generate the word vectors: word2vec and Bag of Words (BoW).

In A3, they used a feature extraction technique based on the feature extractor provided by the MATLAB tool. They extract as features the presence or absence of comments in the request, the number of semicolons, the presence of always-true conditions, the number of commands per SQL statement, the presence of uncommon commands such as "drop" or "grant", and the presence of special keywords, such as those used to query the SQL version.

In A4, the first step was to clean the data. For this purpose, the HTTP messages were first decoded. This decoding refers to message transformation processes in case they contain data in base64, encoded in PHP, URL encoded, etc. Then a data generalization is performed to reduce the influence of special characters on the dataset information. In this case, they replaced the numbers with 0, replaced URLs with http://u, and separated the message into words.

In A5, preprocessing consists of three phases: data cleaning, parameter or dimensional reduction, and data standardization. In the first phase, they convert the IP characteristic from a character string to a numeric value and check whether any columns contain empty data. Then, in the dimensional reduction, they removed the features with a variance of 0 or that do not provide relevant information for training, such as the date of emission of the frame. Finally, the parameters are standardized using Min-Max scaling. The dataset of this article is composed of flow data.

In A6, they do a feature extraction. For each of the requests, they extract information about the length, the number of keywords (such as select, union, insert, delete, and, or, etc.), the sum of the keyword weights (each keyword is previously assigned a weight), the number of space characters, the number of special characters, the proportion of special characters in the request, and the proportion of the remaining characters in the request, excluding blank characters. Then, a standardization of the data is performed using either Min-Max scaling or a mapping of the characters that converts them into a numerical matrix following the ASCII code. The choice depends on the training algorithm used.

After analyzing the methods used in the selected articles, Fig. 1 shows a graph summarizing the techniques used.
Fig. 1. Preprocessing techniques
Therefore, answering the first research question, the most commonly used preprocessing techniques are data normalization, such as Min-Max scaling, and feature extraction, like parameter reduction.
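As an illustration of these two dominant steps, the following sketch (our own simplification, not code from the reviewed articles; the feature set loosely mirrors A6 and the scaling mirrors A5/A6) extracts simple lexical features from raw requests and normalizes them with Min-Max scaling:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

KEYWORDS = ("select", "union", "insert", "delete", "and", "or")

def extract_features(request):
    lowered = request.lower()
    return [
        len(request),                                # request length
        sum(lowered.count(k) for k in KEYWORDS),     # keyword occurrences
        request.count(" "),                          # space characters
        sum(not c.isalnum() and not c.isspace()
            for c in request),                       # special characters
    ]

requests = ["SELECT * FROM users WHERE id=1",
            "1' OR '1'='1' UNION SELECT password FROM users--"]
X = np.array([extract_features(r) for r in requests], dtype=float)
X_scaled = MinMaxScaler().fit_transform(X)           # Min-Max scaling to [0, 1]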
3.2 Algorithms Used
To answer the research question P2, we have classified the articles based on their principal topic: deep learning or machine learning. As a result of this analysis, we have obtained that one half of the articles are about deep learning and the other half about machine learning. The specific algorithms used in each article are described in the following paragraphs.

In A1, they used the Naïve Bayes (NB) algorithm, which belongs to the machine learning branch. In A2, a deep learning algorithm has been used, namely Long Short-Term Memory (LSTM). In A3, machine learning algorithms have been used; the algorithms were selected from those provided by MATLAB by default: Ensemble Boosted Trees, Ensemble Bagged Trees, Linear Discriminant, Cubic SVM and Fine Gaussian SVM. In A4, deep learning algorithms have been used; specifically, they have used CNN and MLP neural networks. In A5, they have used machine learning algorithms, which were LR, Perceptron + Stochastic Gradient Descent (SGD), KNN, RF, Linear Support Vector Classification (LSVC) and VC. In A6, the deep learning algorithms used were MLP and LSTM networks. In this case, each one received a different preprocessing, using the Min-Max standardization for the MLP algorithm and the transformation to word vectors for the LSTM algorithm.

Figure 2 and Fig. 3 show the different algorithms used by the selected articles and the proportion of each of them. In the case of machine learning, we have decided to group the algorithms into ensemble algorithms, SVM algorithms, tree algorithms, Bayesian classifiers, regressor algorithms and other algorithms.

Therefore, answering the second research question, the most commonly used algorithms for SQL attack detection have been MLP and LSTM in the case of deep learning-based research, and ensemble and SVM algorithms in the case of machine learning.
Fig. 2. Algorithms based on Deep Learning.
Fig. 3. Algorithms based on Machine Learning.
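For reference, a minimal sketch of an LSTM classifier of the kind used in A2 and A6 (our own illustration with assumed hyperparameters and the TensorFlow/Keras API, not the reviewed papers' code) is:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential([
    Embedding(input_dim=5000, output_dim=64),   # token ids -> word vectors
    LSTM(64),                                   # sequence -> fixed-size state
    Dense(1, activation="sigmoid"),             # benign/malicious probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(...) would then be called on tokenized, padded request sequences.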
3.3 Best Algorithms
To answer the research question P3, the results obtained with the algorithms used in each article are shown, according to the metrics defined in Sect. 2.4. In A1, the results obtained for the algorithm used are shown in Table 5.

Table 5. Results in A1

Algorithm | Accuracy | Precision | Recall | F1-score
NB | 0.933 | 1.0 | 0.89 | 0.942
In A2, the results obtained for the algorithm used are shown in Table 6.

Table 6. Results in A2

Algorithm | Vector generation | Accuracy | Precision | Recall | F1-score
LSTM | word2vec | 0.9347 | 0.9356 | 0.9243 | 0.9299
LSTM | BoW | 0.9193 | 0.9067 | 0.9291 | 0.9178
In A3, the results obtained for the algorithms used are shown in Table 7.

Table 7. Results in A3

Algorithm | Accuracy | Precision | Recall | F1-score
Ensemble Boosted Trees | 0.938 | 0.641 | 0.985 | 0.777
Ensemble Bagged Trees | 0.938 | 0.641 | 0.985 | 0.777
Linear Discriminant | 0.937 | 0.621 | 1.0 | 0.766
Cubic SVM | 0.937 | 0.631 | 0.985 | 0.769
Fine Gaussian SVM | 0.935 | 0.612 | 1.0 | 0.759
In A4, the results obtained for the algorithms used are shown in Table 8.

Table 8. Results in A4

Algorithm | Accuracy | Precision | Recall | F1-score
CNN | 0.9825 | 0.9747 | 0.9908 | 0.9826
MLP | 0.9858 | 0.9795 | 0.9923 | 0.9858
In A5, the results obtained for the algorithms used are shown in Table 9.

Table 9. Results in A5

Algorithm | Accuracy | Precision | Recall | F1-score
LR | 0.973 | 0.973 | 0.973 | 0.973
Perceptron+SGD | 0.963 | 0.963 | 0.963 | 0.963
VC | 0.856 | 0.848 | 0.856 | 0.852
RF | 0.840 | 0.832 | 0.840 | 0.836
LSVC | 0.834 | 0.824 | 0.834 | 0.829
KNN | 0.718 | 0.672 | 0.718 | 0.694
In A6, the results obtained for the algorithms used are shown in Table 10.

Table 10. Results in A6

Algorithm | Accuracy | Precision | Recall | F1-score
MLP (1 layer) | 0.9951 | 0.9997 | 0.9905 | 0.9951
MLP (2 layers) | 0.9936 | 1.0 | 0.9872 | 0.9936
MLP (3 layers) | 0.9967 | 1.0 | 0.9941 | 0.9970
LSTM | 0.9768 | 0.9986 | 0.9549 | 0.9763
Therefore, the machine learning algorithms that achieved the best results were the LR and Perceptron+SGD algorithms in A5 and NB in A1. On the other hand, the deep learning algorithms that obtained the best results were MLP networks in A6 and A4. Furthermore, according to these results, we can say that the deep learning algorithms obtained better results than those based on machine learning, as shown in Fig. 4.
Fig. 4. F1-score of the best algorithms.
3.4 Limitations
Finally, the major limitation of this work is the small number of articles chosen. The amount of research selected is too limited to support broad generalizations in the conclusions. However, it does demonstrate certain outcomes on a reduced scale, such as the distinct improvement obtained with deep learning methods as opposed to machine learning ones.
4 Conclusions
In conclusion, SQL injection detection using deep learning and machine learning techniques has been the subject of research in recent years, driven by the rise of artificial intelligence. Different algorithms and methods have been used to face this problem, and it has been shown that these techniques can obtain effective and accurate results in SQL injection detection. However, there are still limitations in this field, such as the need for higher quality datasets, as it has been observed that each piece of research uses very different data collection and preprocessing techniques. Overall, using these technologies to detect SQL injections is a promising area of research, and significant advances are expected soon. In this study, we have collected the techniques and algorithms used so far.
References

1. OWASP: Top ten most critical web application vulnerabilities (2005). https://owasp.org/www-project-top-ten/
2. Chandrashekhar, R., Mardithaya, M., Thilagam, S., Saha, D.: SQL injection attack mechanisms and prevention techniques. In: Thilagam, P.S., Pais, A.R., Chandrasekaran, K., Balakrishnan, N. (eds.) ADCONS 2011. LNCS, vol. 7135, pp. 524–533. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29280-4_61
3. Alghawazi, M., Alghazzawi, D., Alarifi, S.: Detection of SQL injection attack using machine learning techniques: a systematic literature review. J. Cybersecur. Priv. 2(4), 764–777 (2022)
4. Muslihi, M.T., Alghazzawi, D.: Detecting SQL injection on web application using deep learning techniques: a systematic literature review. In: 2020 Third International Conference on Vocational Education and Electrical Engineering (ICVEE), pp. 1–6. IEEE (2020)
5. Lawal, M., Sultan, A.B.M., Shakiru, A.O.: Systematic literature review on SQL injection attack. Int. J. Soft Comput. 11(1), 26–35 (2016)
6. Codina, L.: Revisiones bibliográficas sistematizadas: procedimientos generales y framework para ciencias humanas y sociales (2018)
7. Joshi, A., Geetha, V.: SQL injection detection using machine learning. In: 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), pp. 1111–1115 (2014)
8. Li, Q., Wang, F., Wang, J., Li, W.: LSTM-based SQL injection detection method for intelligent transportation system. IEEE Trans. Veh. Technol. 68(5), 4182–4191 (2019)
9. Hasan, M., Balbahaith, Z., Tarique, M.: Detection of SQL injection attacks: a machine learning approach. In: 2019 International Conference on Electrical and Computing Technologies and Applications (ICECTA), pp. 1–6 (2019)
10. Chen, D., Yan, Q., Wu, C., Zhao, J.: SQL injection attack detection and prevention techniques using deep learning. J. Phys.: Conf. Ser. 1757(1), 012055 (2021). https://doi.org/10.1088/1742-6596/1757/1/012055
11. Crespo-Martínez, I.S.: SQL injection attack detection in network flow data. Comput. Secur. 127, 103093 (2023). https://www.sciencedirect.com/science/article/pii/S0167404823000032
12. Tang, P., Qiu, W., Huang, Z., Lian, H., Liu, G.: Detection of SQL injection based on artificial neural network. Knowl.-Based Syst. 190, 105528 (2020). https://www.sciencedirect.com/science/article/pii/S0950705120300332
13. Cruz-Benito, J.: Systematic literature review and mapping (2016). https://repositorio.grial.eu/handle/grial/685
Impact of the Keep-Alive Parameter on SQL Injection Attack Detection in Network Flow Data

Ignacio Samuel Crespo-Martínez1(B), Adrián Campazas-Vega2, Ángel Manuel Guerrero-Higueras2, Claudia Álvarez-Aparicio2, and Camino Fernández-Llamas2

1 Supercomputación Castilla y León (SCAYLE), Campus de Vegazana s/n, 24071 León, Spain, [email protected]
2 Robotics Group, University of León, Campus de Vegazana s/n, 24071 León, Spain
Abstract. The recent increase in the frequency and diversity of cyberattacks has raised concerns among companies, organizations, and users. Web applications are particularly critical among the various targets as they provide users access to online data and financial transactions. SQL injections, which can be exploited to compromise the security of web infrastructures, represent a significant risk. While SQL injection detection has been a solved problem for scenarios where all network layer datagrams are analyzed, it remains a challenge in data-intensive networks that use lightweight protocols based on network flows, such as NetFlow. In this paper, we attempt to emulate a realistic SQL injection attack scenario, where an attacker tries to generate minimum noise in the network. To this end, we generated SQL injection attack datasets based on flows using the SQLMap tool and the keep-alive parameter. We evaluated several machine learning algorithms and achieved a detection rate higher than 97%, which can significantly improve the security of networks that handle significant traffic.
Keywords: Machine Learning · Netflow · SQLIA detection · Keep-alive · Network security

1 Introduction
Businesses, organizations, and users are increasingly concerned about cyberattacks, which have become more numerous and diverse in recent years. Web applications are particularly vulnerable to exploitation, as they offer various functions enabling users to access data or conduct financial transactions online. The primary security concern for web infrastructures is injections [5], particularly SQL injections, which allow attackers to access and potentially manipulate the application's database, including stealing or deleting data.
According to the Open Web Application Security Project (OWASP) Top 10 published in 2021 [12], injections pose a significant risk to web application security. This category encompasses various types of injections, including SQL injection and Cross-Site Scripting (XSS), among others. While SQL injections are commonly recognized, it is important to note that the OWASP classification considers injections as a whole, rather than singling out SQL injections alone. Therefore, it would be more accurate to state that injections, including SQL injection and other forms such as XSS, are identified as a critical risk to web application security according to OWASP's ranking. Similarly, the MITRE CWE Top 25 Most Dangerous Software Weaknesses [18] also ranks SQL injections as the third most serious threat.

The problem of detecting SQLIA (SQL Injection Attacks) using network packets and machine learning models has been resolved. Several studies have explored the use of machine learning to detect SQLIA through network packets. For example, in [19], the authors used multiple datasets to train different models, with the Random Forest (RF) model achieving an accuracy of 98.05%. The research conducted in [14] tested multiple algorithms and found ensemble methods to be the most effective, with an accuracy of 93.8%. Other studies focused on using only packet payloads and tokenized queries; for example, the authors of [11] propose a comprehensive hybrid approach with an accuracy of 99.2%. These studies demonstrate the effectiveness of machine learning in detecting SQLIA through network packets.

Detecting SQL injections can nevertheless pose a challenge in high-volume data networks, where analyzing the content of every network packet is unfeasible. In these types of networks, lightweight network flow-based protocols such as NetFlow, sFlow, or IPFIX are commonly used. A flow is defined as a collection of packets that traverse a specific observation point in the network within a defined time interval [8]. Flows are characterized by common features, such as the source and destination IP addresses, as well as the source and destination ports. These shared attributes allow for the grouping of packets into flows, facilitating the analysis of network traffic. One of the advantages of analyzing network flows is that they can provide information about network activity even when a significant portion of the traffic is encrypted, because network flows capture and store traffic features.

In the study [20], the performance of the Extra Trees ensemble classifier in detecting SQL injection attacks (SQLIA) using both packet-based and flow-based datasets was compared. The research revealed that the proposed model's detection rate and precision for SQLIA detection through the flow-based dataset were only 25% and 22%, respectively, which could be improved. This investigation suggests that the issue of detecting SQLIA through network flow data still needs to be solved, as no available solution has achieved a detection rate higher than 25%. However, research has demonstrated the feasibility of detecting SQL injection attacks using NetFlow network flows. In [10], the authors attempted to detect SQLIA in flow data by collecting network flow data for SQL injections in the three most widely used relational database engines, resulting in two datasets: one for training machine learning-based models and another for model validation.
The results indicate that detecting SQL injections using NetFlow Version 5-based flow data is possible. However, although the SQL injections used in the study were conducted via the automated Sqlmap tool, the parameters employed do not reflect a realistic scenario where attackers would attempt to carry out the attack while minimizing network noise.

In order to reduce noise during attacks, the keep-alive parameter is used. Within the Hypertext Transfer Protocol (HTTP), keep-alive [6] is a header field that establishes a persistent connection between the client and server. Upon receiving a client request, the server can issue a keep-alive header, specifying a set duration for the connection to remain open. The sustained connection enables the client to transmit multiple requests through the same connection, avoiding establishing a new connection for each request. This approach can enhance the communication's performance by diminishing the overhead of establishing new connections and reducing the data transfer latency. Malicious actors can also exploit keep-alive connections to maintain a solitary connection with the target system and repeatedly transmit requests, obviating the need to establish multiple connections. This tactic may enable attackers to circumvent detection by network security systems that are designed to detect atypical traffic patterns [15].

The present study aims to simulate a realistic environment of SQL injection attacks, in which the attackers aim to minimize network noise to avoid detection. To achieve this objective, we replicate the study conducted in [10] to investigate the impact of the keep-alive parameter on the detection of SQL injection attacks (SQLIAs) in network flows. Initially, we employ the models from the study conducted in [10], which were trained using datasets without the keep-alive parameter. This enables us to assess the ability of models trained without the keep-alive parameter to detect SQLIAs that were carried out with it. Subsequently, new machine learning algorithms are trained and tested using datasets generated with the keep-alive parameter, to compare the outcomes against the models proposed in [10]. The results indicate that the models perform well even with the keep-alive parameter activated. However, to effectively identify SQL injection attacks in real-world environments (where the keep-alive parameter is employed), it is necessary to train the algorithms with datasets generated using keep-alive.

The remainder of this paper is organized as follows: Sect. 2 outlines the materials, tools, and methodology employed in this study to collect datasets and conduct experiments. Section 3 presents and discusses the results of the experiments. Finally, Sect. 4 provides conclusions and outlines future research directions.
2 Materials and Methods
In this section, we elaborate on the methodology adopted, comprising the specification of the NetFlow protocol, the DOROTHEA framework used for acquiring NetFlow data, and the procedure for obtaining suitable flow datasets. Finally, we present our proposed evaluation approach.
2.1 NetFlow
NetFlow is a protocol created by Cisco Systems in 2004, designed to gather flow data, and has become a popular tool for networks that handle large traffic volumes. Other manufacturers, including Juniper and Enterasys Switches, also support this technology. NetFlow was initially introduced as a feature for Cisco routers, collecting IP traffic to provide administrators with a global view of their network activity. NetFlow has multiple versions, with V1, V5, and V9 being the most common.

NetFlow [7] creates unidirectional flows, generating two separate flows in network communication. The first flow collects packets with a source-destination address, while the second flow gathers reply packets with a destination-source address. After a period of inactivity, or when it has been active for a specific duration, a NetFlow stream expires. Both time periods can be customized. Although terminating active flows may seem contradictory, it allows flow analyzers to obtain information on long-lived flows and prevents infinite flows from going unanalyzed.
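To make the flow abstraction concrete, the following minimal sketch (our own; field names and timeout values are illustrative, not Cisco defaults) shows how packets can be grouped into unidirectional flows keyed by the 5-tuple, with the active and inactive timeouts described above:

ACTIVE_TIMEOUT = 1800    # max seconds a flow may stay active before export
INACTIVE_TIMEOUT = 15    # seconds without packets before the flow expires

flows = {}

def flow_key(pkt):
    # Unidirectional: reply packets (dst -> src) form a second, separate flow.
    return (pkt["src_ip"], pkt["dst_ip"],
            pkt["src_port"], pkt["dst_port"], pkt["proto"])

def process_packet(pkt, now, export):
    key = flow_key(pkt)
    flow = flows.get(key)
    if flow and (now - flow["last"] > INACTIVE_TIMEOUT
                 or now - flow["first"] > ACTIVE_TIMEOUT):
        export(flows.pop(key))   # expire the old flow and start a new one
        flow = None
    if flow is None:
        flow = flows[key] = {"first": now, "last": now, "dpkts": 0, "bytes": 0}
    flow["last"] = now
    flow["dpkts"] += 1           # packet count, cf. the NetFlow 'dpkts' field
    flow["bytes"] += pkt["length"]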
2.2 DOROTHEA
DOROTHEA [4] is a NetFlow data collection framework created by the authors. Based on Docker, it facilitates the construction of interconnected virtual networks that generate and collect flow data using the NetFlow protocol. DOROTHEA transmits network traffic packets to a NetFlow generator with the ipt_netflow sensor [1] installed. This sensor is a Linux kernel module that processes the packets and converts them into NetFlow flow data.

DOROTHEA provides a high degree of customization and scalability, allowing users to deploy nodes that generate synthetic network traffic, including benign and malicious types. The benign traffic generation nodes simulate user-generated traffic, such as web browsing, email transmission, or SSH connections, which are executed as Python scripts [13]. The procedure for generating malicious traffic in DOROTHEA shares a similar approach with benign traffic generation, but with the added feature of an isolated environment where all traffic is empirically labeled as malicious. The attacks are also executed as Python scripts, and users have the ability to customize or include their own scripts. The gateway functions analogously to the benign traffic generation process, which involves directing packets and aggregating flow data.
2.3 Data Gathering
The SQLmap tool offers a keep-alive parameter that can be employed to establish and maintain a persistent HTTP connection with the target server. This functionality permits the tool to dispatch multiple requests over the same connection, eliminating the need to establish a new connection for each subsequent request.
Leveraging the keep-alive parameter in SQLmap can enhance the speed and efficiency of the tool's HTTP requests by reducing the overhead incurred when establishing a new connection for each request. Nevertheless, it is essential to take into account that some servers may not support persistent connections or may restrict the number of connections that can be maintained. Thus, it is advisable to test and adjust the keep-alive parameter appropriately for each target server.

Given that the aim is to reproduce the experiment described in [10], data collection procedures were performed in the same way as outlined there. The only modification made was the inclusion of the keep-alive parameter, to accurately emulate a realistic scenario of SQL injection attacks.
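For reference, a hypothetical SQLmap invocation enabling persistent connections (the target URL is a placeholder, not one used in this study) could look as follows:

sqlmap -u "http://target.example/page.php?id=1" --keep-alive --batch

Here, --keep-alive instructs the tool to reuse the same HTTP connection across requests, while --batch runs it non-interactively with default answers.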
2.4 Evaluation
The confusion matrix is a powerful tool for computing Key Performance Indicators (KPIs) that are commonly used to identify the most accurate classification algorithm. The performance of the models was evaluated by measuring their accuracy score on the test sets, as shown in Eq. 1.

Accuracy = (TP + TN) / (TP + FP + TN + FN)    (1)

The number of true positives (TP) represents the number of malicious flows that were correctly identified as malicious. True negatives (TN) indicate the number of benign flows that were correctly identified as benign traffic. False positives (FP) correspond to the number of benign samples that were incorrectly classified as malicious. Finally, false negatives (FN) refer to the number of malicious samples that were wrongly classified as benign traffic. Since binary classifiers often predict the majority class, we also computed Detection Rate (DR), Recall (R), and F1 score for both classes - benign (0) and malicious (1) flow data. DR measures the accuracy of the positive predictions and is computed as shown in Eq. 2.

DR = TP / (TP + FP)    (2)

R, also known as sensitivity or true positive rate, is the ratio of positive instances correctly detected by the classifier and is computed as shown in Eq. 3.

R = TP / (TP + FN)    (3)

To combine detection rate and recall into a single metric for easy comparison of two classifiers, the F1 score (F1) is often used. It is the harmonic mean of DR and R, giving much more weight to low values than the regular mean. F1 is computed as shown in Eq. 4.

F1 = 2 × (DR × R) / (DR + R)    (4)
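A minimal sketch computing the four metrics above (Eqs. 1-4) directly from the confusion-matrix counts:

def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # Eq. 1
    dr = tp / (tp + fp)                          # Eq. 2: detection rate
    r = tp / (tp + fn)                           # Eq. 3: recall
    f1 = 2 * dr * r / (dr + r)                   # Eq. 4: harmonic mean of DR and R
    return accuracy, dr, r, f1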
2.5 Classification Models Fitting
The same experimentation has been carried out as in [10]. Moev is used for the training of the machine learning algorithms. The algorithms used are: KNN [17], LR [21], Linear Support Vector Classification (LSVC) [9], Perceptron with stochastic gradient descent (SGD) [2], and RF [3]. In addition to these algorithms, the Majority Voting (VC) [16] ensemble classifier is also used, which selects the class that gets the most votes, regardless of whether the total votes exceed 50%.
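A hedged scikit-learn sketch of this setup (our own illustration, not the exact Moev configuration or hyperparameters) follows:

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

estimators = [
    ("knn", KNeighborsClassifier()),
    ("lr", LogisticRegression(max_iter=1000)),
    ("lsvc", LinearSVC()),
    ("sgd", SGDClassifier(loss="perceptron")),   # Perceptron fitted with SGD
    ("rf", RandomForestClassifier()),
]
# Hard majority voting: the most-voted class wins, even without >50% of votes.
vc = VotingClassifier(estimators=estimators, voting="hard")
# vc.fit(X_train, y_train); y_pred = vc.predict(X_test)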
3 Results and Discussion
Table 1. Dataset volumetry

Dataset | Aim | Samples | Benign-malicious traffic ratio
D1 | Training | 400,003 | 50%
D2 | Test | 57,239 | 50%
D3 | Training | 14,760 | 50%
D4 | Test | 318 | 50%
Table 1 presents the volumetry of the datasets. The datasets D1 and D2 were used in [10] and generated without the keep-alive parameter. Meanwhile, D3 and D4 were generated using the same attacks but with the keep-alive parameter enabled. As illustrated, the number of samples is significantly reduced when SQLIA is performed with keep-alive. This effect arises from the usage of Keep-Alive functionality on a network connection, which maintains the TCP connection open after an HTTP request has been completed. This feature permits multiple additional HTTP requests to be issued using the same connection. Consequently, instead of establishing a new TCP connection for each HTTP request, the connection is retained open, thereby reducing the total number of TCP connections created and terminated on the network. The frequency of NetFlow flow generation is directly affected by this. With each new TCP connection established, a new NetFlow flow is generated. As such, the use of keep-alive connections that remain open for an extended duration results in a reduced number of NetFlow flows produced within the network. Table 2 displays the outcomes obtained in [10] for detecting SQLIA, in which algorithms were trained and validated using datasets lacking keep-alive functionality.
Table 2. Accuracy, Detection Rate, Recall and F1 score obtained in [10] for SQLIA detection

Algorithm | Accuracy | Class | DR | R | F1
LR | 0.973 | Benign (0) | 0.751 | 0.758 | 0.754
 |  | Malicious (1) | 0.756 | 0.748 | 0.752
 |  | Average | 0.753 | 0.753 | 0.753
Perceptron+SGD | 0.963 | Benign (0) | 0.969 | 0.971 | 0.970
 |  | Malicious (1) | 0.971 | 0.968 | 0.969
 |  | Average | 0.970 | 0.970 | 0.970
VC | 0.856 | Benign (0) | 0.758 | 0.943 | 0.840
 |  | Malicious (1) | 0.925 | 0.698 | 0.796
 |  | Average | 0.841 | 0.821 | 0.818
RF | 0.840 | Benign (0) | 0.000 | 0.000 | 0.000
 |  | Malicious (1) | 0.500 | 1.000 | 0.667
 |  | Average | 0.250 | 0.500 | 0.333
LSVC | 0.834 | Benign (0) | 0.348 | 0.296 | 0.320
 |  | Malicious (1) | 0.388 | 0.447 | 0.415
 |  | Average | 0.368 | 0.371 | 0.367
KNN | 0.714 | Benign (0) | 0.832 | 0.996 | 0.907
 |  | Malicious (1) | 0.996 | 0.799 | 0.887
 |  | Average | 0.914 | 0.898 | 0.897
The assessment was conducted using accuracy, detection rate, recall, and F1 score for both benign and malicious categories, as well as the average of both categories. Table 3 shows the findings from the validation of the algorithms that were trained in [10] and validated with a dataset that includes keep-alive functionality.

The KNN algorithm demonstrated the best overall performance among the tested algorithms, with an accuracy of 0.833 and an average F1 score of 0.829. Its detection rate for malicious traffic was also high, at 0.995, indicating its effectiveness in detecting malicious traffic. On the other hand, RF performs inadequately, with an accuracy of 0.189 and an average F1 score of 0.159. It fails to identify any malicious traffic, thus rendering it unsuitable for this task.

As illustrated, models trained without keep-alive connections exhibit a decline in their detection capability ranging from 34% to 78% when compared to models that incorporate keep-alive connections. The diminished detection capability of models trained without keep-alive connections is attributed to their incapacity to effectively process persistent connections. Such models may be ill-equipped to manage the higher frequency and volume of NetFlow flows generated through persistent connections.
Additionally, the use of keep-alive connections leads to the creation of larger flows, as measured by the 'dpkts' field of NetFlow data, which reflects the number of packets comprising each flow.

The performance of algorithms trained with datasets created using the keep-alive parameter is presented in Table 4. Perceptron+SGD exhibits the best overall performance, attaining an accuracy of 0.971 and an average F1 score of 0.970. KNN also performs well, with an accuracy of 0.897 and an average F1 score of 0.897, but its malicious traffic detection rate is slightly lower than that of Perceptron+SGD. VC and LR exhibit improved performance when trained with the keep-alive parameter, with average F1 scores of 0.818 and 0.753, respectively. However, VC shows a higher malicious traffic detection rate than LR, indicating that it is more effective in identifying malicious traffic.

Training with the keep-alive parameter results in significant improvements in model performance compared to models trained without this parameter. All models, except RF and LSVC, demonstrate enhanced performance when trained and validated using keep-alive datasets. The malicious traffic detection rates are similar to those presented in [10], in which models were trained and validated without keep-alive. Hence, we can deduce two conclusions. Firstly, it is feasible to identify connections that implement the keep-alive protocol, albeit necessitating the retraining of models with this parameter.

Table 3. Accuracy, Detection Rate, Recall and F1 score obtained using the models built in [10] and validated with the keep-alive datasets.

Algorithm | Accuracy | Class | DR | R | F1
KNN | 0.833 | Benign (0) | 0.751 | 0.997 | 0.857
 |  | Malicious (1) | 0.995 | 0.670 | 0.801
 |  | Average | 0.873 | 0.833 | 0.829
LR | 0.643 | Benign (0) | 0.616 | 0.758 | 0.680
 |  | Malicious (1) | 0.686 | 0.528 | 0.597
 |  | Average | 0.651 | 0.643 | 0.638
VC | 0.531 | Benign (0) | 0.510 | 0.551 | 0.528
 |  | Malicious (1) | 0.554 | 0.512 | 0.527
 |  | Average | 0.532 | 0.531 | 0.527
Perceptron+SGD | 0.440 | Benign (0) | 0.463 | 0.758 | 0.575
 |  | Malicious (1) | 0.336 | 0.123 | 0.180
 |  | Average | 0.400 | 0.440 | 0.377
LSVC | 0.371 | Benign (0) | 0.348 | 0.296 | 0.320
 |  | Malicious (1) | 0.388 | 0.447 | 0.415
 |  | Average | 0.368 | 0.371 | 0.367
RF | 0.189 | Benign (0) | 0.274 | 0.377 | 0.317
 |  | Malicious (1) | 0.000 | 0.000 | 0.000
 |  | Average | 0.137 | 0.189 | 0.159
Table 4. Accuracy, Detection Rate, Recall and F1 score obtained using the models trained with datasets created with the keep-alive parameter.

Algorithm | Accuracy | Class | DR | R | F1
Perceptron+SGD | 0.971 | Benign (0) | 0.969 | 0.971 | 0.970
 |  | Malicious (1) | 0.971 | 0.968 | 0.969
 |  | Average | 0.970 | 0.970 | 0.970
KNN | 0.897 | Benign (0) | 0.832 | 0.996 | 0.907
 |  | Malicious (1) | 0.996 | 0.799 | 0.887
 |  | Average | 0.914 | 0.898 | 0.897
VC | 0.821 | Benign (0) | 0.758 | 0.943 | 0.840
 |  | Malicious (1) | 0.925 | 0.698 | 0.796
 |  | Average | 0.841 | 0.821 | 0.818
LR | 0.753 | Benign (0) | 0.751 | 0.758 | 0.754
 |  | Malicious (1) | 0.756 | 0.748 | 0.752
 |  | Average | 0.753 | 0.753 | 0.753
RF | 0.500 | Benign (0) | 0.000 | 0.000 | 0.000
 |  | Malicious (1) | 0.500 | 1.000 | 0.667
 |  | Average | 0.250 | 0.500 | 0.333
LSVC | 0.371 | Benign (0) | 0.348 | 0.296 | 0.320
 |  | Malicious (1) | 0.388 | 0.447 | 0.415
 |  | Average | 0.368 | 0.371 | 0.367
4 Conclusions
The aim of this study is to establish a highly realistic scenario in which an attacker executes SQL injection attacks (SQLIA) with minimal network activity. To achieve this objective, two SQLIA datasets were created using the SQLmap tool with the keep-alive parameter enabled. The keep-alive parameter maintains an ongoing HTTP session between the requests issued by the tool, a functionality that enables attackers to avoid detection by intrusion detection systems.

First, the machine learning algorithms trained in [10] were tested using the datasets produced in this study, which were generated with the keep-alive parameter enabled. Afterward, the same machine learning algorithms were re-trained and validated with the datasets generated in this study.

The results obtained show that it is possible to detect SQLIA using network flows even when the attacks are performed stealthily with the keep-alive parameter active. To achieve this, the models utilized for detection must be trained with datasets that were collected while the keep-alive parameter was active. These results can enable network administrators to deploy these models to improve the security of the network and the users who use it.
References

1. Aabc/IPT-netflow: Netflow iptables module for Linux kernel (2022). https://github.com/aabc/ipt-netflow. Accessed 28 July 2022
2. Bottou, L.: Stochastic gradient learning in neural networks. Proc. Neuro-Nîmes 91(8), 12 (1991)
3. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
4. Campazas-Vega, A., Crespo-Martínez, I.S., Guerrero-Higueras, Á.M., Fernández-Llamas, C.: Flow-data gathering using netflow sensors for fitting malicious-traffic detection models. Sensors 20(24), 7294 (2020)
5. Chapin, N., Sethi, V.: The Web Application Hacker's Handbook: Finding and Exploiting Security Flaws. Wiley, Hoboken (2019)
6. Chetty, C.M.: Keep-alive mechanisms. In: Computer Networks: A Systems Approach, 5th edn., pp. 183–185. Morgan Kaufmann, Burlington (2011)
7. Claise, B., Sadasivan, G., Valluri, V., Djernaes, M.: Cisco systems netflow services export version 9. RFC 3954, Internet Engineering Task Force (2004)
8. Claise, B., Trammell, B., Aitken, P.: Specification of the IP flow information export (IPFIX) protocol for the exchange of flow information. RFC 7011 (Internet Standard), Internet Engineering Task Force, pp. 2070–1721 (2013)
9. Cortes, C., Vapnik, V.: Support vector machine. Mach. Learn. 20(3), 273–297 (1995)
10. Crespo-Martínez, I.S., Campazas-Vega, A., Guerrero-Higueras, Á.M., Riego-DelCastillo, V., Álvarez-Aparicio, C., Fernández-Llamas, C.: SQL injection attack detection in network flow data. Comput. Secur. 127, 103093 (2023)
11. Deriba, F.G., Salau, A.O., Mohammed, S.H., Kassa, T.M., Demilie, W.B.: Development of a compressive framework using machine learning approaches for SQL injection attacks. Przeglad Elektrotechniczny
12. OWASP Foundation: OWASP top ten (2022). https://owasp.org/www-project-top-ten/. Accessed 20 July 2022
13. Python Software Foundation: Python (2022). https://www.python.org/. Accessed 26 July 2022
14. Hasan, M., Balbahaith, Z., Tarique, M.: Detection of SQL injection attacks: a machine learning approach. In: 2019 International Conference on Electrical and Computing Technologies and Applications (ICECTA), pp. 1–6. IEEE (2019)
15. Karpe, S., Bansode, R., Mahajan, V.: Http keep-alive: a double-edged sword for attackers and defenders. Int. J. Adv. Sci. Technol. 30(6), 2209–2216 (2021)
16. Krishnaveni, S., Prabakaran, S.: Ensemble approach for network threat detection and classification on cloud computing. Concurrency Comput. Pract. Exp. 33(3), e5272 (2021)
17. Mitchell, H.B., Schaefer, P.A.: A "soft" k-nearest neighbor voting scheme. Int. J. Intell. Syst. 16(4), 459–468 (2001)
18. MITRE (2022). https://www.mitre.org/. Accessed 13 Sept 2022
19. Ross, K., Moh, M., Moh, T.-S., Yao, J.: Multi-source data analysis and evaluation of machine learning techniques for SQL injection detection. In: Proceedings of the ACMSE 2018 Conference, pp. 1–8 (2018)
20. Sarhan, M., Layeghy, S., Moustafa, N., Portmann, M.: NetFlow datasets for machine learning-based network intrusion detection systems. In: Deze, Z., Huang, H., Hou, R., Rho, S., Chilamkurti, N. (eds.) BDTA/WiCON 2020. LNICST, vol. 371, pp. 117–135. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72802-1_9
21. Wright, R.E.: Logistic regression. In: Reading and Understanding Multivariate Statistics (1995)
SWAROG Project Approach to Fake News Detection Problem

Rafał Kozik1,3(B), Joanna Komorniczak2,3, Paweł Ksieniewicz2,3, Aleksandra Pawlicka2,3, Marek Pawlicki1,3, and Michał Choraś1,3

1 Bydgoszcz University of Science and Technology, Bydgoszcz, Poland, [email protected]
2 Wrocław University of Science and Technology, Wrocław, Poland
3 University of Warsaw, Warszawa, Poland
Abstract. We often come across the seemingly obvious remark that the modern world is full of data. From the perspective of a regular Internet user, we perceive this as an abundance of content that we unintentionally consume every day, including links and amusing images that we receive from friends and content providers via webpages, social media, and other sources. Consequently, some of this information is only loosely related to the truth. This problem is one of the challenges the SWAROG project is intended to address. SWAROG is an ongoing Polish research project, which involves the creation of artificial intelligence algorithms for the automatic classification and detection of so-called fake news. In this paper, we report the recent project's achievements regarding fake news detection, analyse and discuss the pitfalls the existing solutions run into concerning data annotation, and explain the project approach to deliver services for determining the credibility of information published in public space.

Keywords: fake news · disinformation · natural language processing · machine learning

1 Introduction
It is critical to understand that fake news is more than a deception [12]. Every day, we accept hundreds of fresh news items from the outside world, which wears us out and makes us numb to stimuli. This is the right time to trick us with a fake sensation that sticks out among the headlines. Unfortunately, we cannot validate every piece of information we come across. A natural human instinct is to absorb any information that appears to be relevant or intriguing. Often we are not aware that we contribute to the disinformation of the community to which we belong. As a result, false news is the outcome of our collective error. In that regard, our research motivations are mostly driven by three challenges, which are:

– The evolution and characteristics of social networks, particularly their main flaw, which is that an expert's opinion is now just as valid and relevant as that of a non-expert, or even someone who deliberately misleads the audience.
– Changes in traditional media, both in relation to COVID-19 and dynamically shifting economic requirements. These changes and processes in the traditional media gradually shift the emphasis towards journalists working from a distance and force them to use online sources more frequently than they would otherwise.
– The effect of fake news and misinformation on a nation's and its people's sense of security. Social media networks, along with traditional media, have evolved into an element of state security, because fake news and disinformation spread by both internal and external actors can influence readers by sparking pointless debate on issues that are inherently unimportant to society. This degrades the quality of discourse, instills fear, and endangers the security of the state.

To combat the fake news phenomenon, in the SWAROG project we suggest several remedies. First, we promote an open architecture for IT systems that stop fake news. Second, we propose advanced machine learning techniques for identifying fake news based on text analysis in English. The project mission also includes the creation of a representative database of annotated Polish texts (containing both true and false information), representing the actual distribution of information. In that regard, we also focus on modern machine-learning techniques for Polish text analysis.
2 Related Work
In order to take advantage of each model's distinct strengths, researchers usually combine various deep learning techniques to tackle fake news. This section presents the most notable methods for detecting disinformation.

Convolutional Neural Networks (CNN) with margin loss have been suggested by the authors in [5] as a possible solution to the problem of fake news detection. They use a variety of embedding models, both static and non-static, to handle this issue. On two reputable datasets, ISOT and LIAR, their best design surpasses cutting-edge techniques. A Long Short-Term Memory (LSTM) network emerged as the most effective model for the task, according to a study by [11] that examined the outcomes of several techniques used to detect fake news on Thai news websites. In [15], the authors looked into how deep neural networks (such as long short-term memory and bi-directional LSTM) may automatically classify and recognize false information concerning the COVID-19 epidemic posted on social media platforms.

In [16], the authors have developed a novel strategy for classifying news articles as trustworthy or fraudulent by building multiple models utilizing document embeddings. They have also offered a benchmark for different architectures that recognize bogus news through binary or multi-labelled classification. They evaluated the models' accuracy, precision, and recall using a number of news corpora. They added that obtaining high accuracy does not mainly depend on the classification model's complexity.
In [8], Khan et al. examined the effectiveness of multiple traditional machine learning techniques as well as neural network models for textual fake news identification. In the experiments, the authors used three different datasets. They came to the conclusion that Naive Bayes with n-gram (bigram TF-IDF) features offered the best performance. In order to identify fake news with the highest degree of accuracy, the researchers in [7] developed an ensemble classification model. After feature extraction, the model classifies the vectors using three machine learning algorithms, namely: Decision Tree, Random Forest, and Extra Tree Classifier.

A substantially different approach is the system presented in [4]. In this article, the authors adapted a knowledge-based architecture built on keyword extraction. They gather narratives from multiple reliable information sources and extract the semantic and sentiment-relevant features. Moreover, they structure the information as a conceptual graph to generate trustworthy knowledge.
3 Project SWAROG: Data Annotation and Fake News Detection

3.1 Overview of the Project
SWAROG is an ongoing Polish research project, which involves the creation of artificial intelligence algorithms for the automatic classification and detection of so-called fake news. As a result of the project, a service based on artificial intelligence algorithms is created, which will be able to detect fake news – in particular on the Internet and social media. The solution will be prepared to take into account the current model of information distribution – in which the recipient more and more often functions in his closed information "bubble", and false information is generated by the synchronised activities of many small sources (e.g. fake Twitter accounts). The prepared solution will have a high business value both in Poland (support for Polish) and abroad (work in English).

The solution will be based on the latest achievements in the field of natural language processing, social network analysis and machine learning, mainly in the field of data classification. Advanced supervised learning models will be used, as well as classifier ensembles. The target solution will also take into account the streaming nature of fake news, and thus the non-stationarity of the model, i.e. the occurrence of the concept drift phenomenon, requiring the predictive model to adapt to the changing parameters of the stream of analysed news.
3.2 Innovative SWAROG Approach to Data Annotation
The phenomenon of deliberate disinformation, regardless of whether it is defined from the perspective of classic propaganda, the imposition of the so-called fog of war, or, today, as fake news, is a complex cultural concept with a fluid, dynamic definition. Due to its vagueness, automatic recognition systems require an unambiguous and readable source of bias to identify it as a concept and gain sufficient generalizing power. In this case, the only properly analyzed artificial intelligence methods are inductive learning models, which base the learning procedure on a pool of labeled observations.

However, observing publications from recent years and already available literature reviews, it should be noted that the annotation properties and strategies used in the most widely available benchmark sets for the problem are an exceedingly significant limitation for the reliability of the proposed models. The data used in the vast majority of publications are significantly burdened both by the tendency towards batch annotation, where the human expert's label applies not to individual documents but to strongly heterogeneous clusters from the same publisher, and by the discrete bias of a multidimensional phenomenon simplified to a binary label. The first of these problems can only be solved by an individual assessment of each article – which was included as a mandatory condition for the annotation carried out for the SWAROG project. The second problem – simplified annotation – is a much more significant and interesting challenge.
generalizing power. In this case, the only properly analyzed artificial intelligence methods are inductive learning models, which base the learning procedure on a marked observations pool. However, observing publications from recent years and already available literature reviews, it should be noted that the properties and strategies of annotation used in the most widely available benchmark sets of the problem are an exceedingly significant limitation for the reliability of the proposed models. The data used in the vast majority of publications are significantly burdened both by the tendency to batch annotation, where the designation of a human expert does not apply to individual documents, but to strongly heterogeneous clusters of the publishing house, and by the discrete bias of the multidimensional phenomenon, simplified to a binary label. The first of these problems can only be solved by an individual assessment of each article – which was included as a mandatory condition for the annotation carried out for the SWAROG project. The second problem – simplified annotation – is a much more significant and interesting challenge. In order to describe the annotation procedure, it is first necessary to describe the data acquisition strategy. The dataset’s development began with acquiring articles from 48 internet portals. A set of scrappers was used to collect the data. From December 2021 to October 2022, basic information was systematically collected on the articles, including their content, title, author, and publication date. Source portals concerned various topics – from standard information portals, through sports and economics, to culinary and gossip portals. Both international and local portals were included. As a result of a nearly one-year data acquisition process, over 430,000 records were collected. Due to the varying frequency of content published on the monitored websites, significant disproportions were visible in the collected data – from several documents per month to over five thousand, depending on the period and the source website. In order to ensure an even distribution in the pool of objects submitted for annotation, their number was reduced, retaining all documents from a given month and sources whose number did not exceed 110. When the base number of documents was greater, their number was reduced to 110 objects. In this way, the total pool of documents was reduced from over 430,000 to about 40,000, ensuring that documents from all sources and the entire acquisition period were preserved and that the distribution of documents was close to uniform. Since the phenomenon of fake news is a multi-factor concept, the content rating form was not limited to a one-dimensional binary response. A set of 13 questions has been developed to indicate signs of untrue content. Based on the earlier analysis of the fake news phenomenon, three groups of factors were distinguished: verification, manipulation, and metaphysical factors. Our set of questions is based on well-established fact-checking good practices. Verification factors concern the objective verification of the content presented in the articles and are the only group of factors requiring additional activity from the annotator in the form of confirmation of the content in other sources. The verification factors included the following questions:
– Is there at least one credible source that confirms all the information in the content?
– Is most of the information provided confirmed by reliable sources?
– Is none of the information confirmed by reliable sources?
– Does the content refer to current data (at the time the statement was created)?

Manipulation factors were to examine whether the author of the published content intended to deliberately mislead the reader or impose a certain point of view on the described situation. The pool of questions to examine the factors of manipulation consisted of the following:

– Is additional information required to understand the content correctly?
– Does the content contain inaccuracies?
– Does the content contain parts taken out of context?
– Does the author of the content use cherry-picking?
– Is the author trying to mislead the reader?
Metaphysical factors, similarly to manipulation factors, are supposed to affect the reader's emotions, depending on their views. The authors of the texts, knowing which group of users the published content reaches, can influence the process of disseminating fake news and give the content a specific overtone. The set of questions on metaphysical factors consists of the following:

– Is the content satirical?
– Does the author admit that the presented facts are fictitious?
– Does the content contain political promises?
– Does the content contain religious content?
The answers to the above-mentioned questions are not unambiguous, which means they will be characterized by the annotators' bias. It was therefore reasonable to examine the pool of annotators answering the analyzed questions. Similar to the phenomenon of fake news, it was decided to look at the annotators in a multidimensional way. The expert assessment metric was divided into two parts: basic and extended. The basic metrics included questions about gender, age, place of residence, voivodeship, education, and profession. The extended metrics took into account, among others, the political and religious views of experts and how they use traditional and social media. The extended metric question pool consisted of the following:

– Which political party did you vote for in the last election?
– Are you an active political activist?
– Are you an active social activist?
– Do you believe in the existence of an extrasensory world?
– Do you believe in the effectiveness of fortune-telling/astrological predictions?
– Do you participate in organized religious services?
– How do you use social media?
– How many hours a day do you spend browsing social media?
– Do you use fact-checking services?
– Do you verify information from several different sources?
– Do you consider traditional media (press, radio, television) a reliable information source?

The presented annotation strategy introduces one fascinating property to the obtained set. Due to the presence of information about the characteristics of the annotator, the final recognition system can include in its inference procedure not only knowledge about the concept but also knowledge about the user, if they agree. Therefore, the target response of the system will concern a distribution that potentially takes into account the specificity of the decision being made, considering not only objective premises but also the socio-political preferences of the user, giving them both a source of knowledge prepared especially for them and insight into a broad spectrum of views of the entire community on the analyzed topic. Thanks to such a strategy, the SWAROG system not only answers the dichotomous question of whether something is fake news or not, but a question that can be interpreted as: How would I evaluate the content after spending a few minutes analyzing its credibility?
Fig. 1. Correlation between individual features of annotators and individual answers regarding news
Figure 1 shows the result of the correlation of answers to questions (a) regarding annotators and (b) regarding news content. The results of a separate analysis showed that a broad group of annotators was obtained, both in terms of basic metrics such as gender, age, and place of residence, as well as extended metrics, which included, e.g., political and religious views and the way of using social media. The graphs in the figure show the absolute degree of correlation between the answers to the questions. Lighter color means a higher absolute correlation, and deep blue means no correlation.
In part (a) of the graph, on the left, it can be seen that place of residence and voivodeship, as well as political and social activity, are the most correlated. There is also a high correlation between belief in the extrasensory world and belief in fortune-telling, and between experts' age and time spent on social media. In part (b) of the graph, on the right, there is a strong relationship between the groups of verification factors (first four questions), manipulation factors (questions 5–9), and metaphysical factors (questions 10–13). The strongest absolute correlation is visible for the first and second questions, regarding confirming the credibility of the information presented in the article's content. It is also worth emphasizing the particularly strong absolute correlation of questions examining manipulation factors, e.g., inaccuracies in content and intentional misleading. There are also visible connections between articles containing fabricated facts and satire. The presented analysis results attest to the high quality of the obtained data set.

3.3 Feature Extraction and Machine Learning Pipeline
In the SWAROG project, we considered various types of feature extraction methods. In particular, the work was carried out in several directions: (i) classical methods based on word frequency (in particular TF-IDF), (ii) methods based on word embedding (in particular Word2Vec), (iii) methods based on the BERT language model (Bidirectional Encoder Representations from Transformers) (Fig. 2).
Fig. 2. The processing pipeline adopted for the fake news detection problem (input text is split into words, passed through transformer layers, average-pooled, and fed to a classification head with SoftMax output).
In addition to that, we have also investigated various techniques for contextual coding of the entire document using fixed-length vectors. Vector joining techniques (e.g. pooling), convolutional neural networks (CNNs), and bidirectional recurrent networks (Bidirectional LSTM) were considered. For the considered benchmark sets, the best extraction and spatial reduction technique turned out to be a combination of BERT word embeddings with the average pooling technique, as sketched below.
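To make the adopted extraction step concrete, the following is a minimal sketch of BERT-based embedding with average pooling; the HuggingFace transformers library and the bert-base-uncased checkpoint are illustrative assumptions, not necessarily the exact components used in the project.

# Sketch: fixed-length document vector from BERT token embeddings
# via average pooling over non-padding tokens.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def document_vector(text):
    inputs = tokenizer(text, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state    # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)    # mask out padding
    return ((hidden * mask).sum(dim=1) / mask.sum(dim=1)).squeeze(0)

The resulting 768-dimensional vector can then be passed to the classification head shown in Fig. 2.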
4 Experiments

4.1 Results Obtained on Benchmark Datasets
In the experiments, we used the 5 × 2 cross-validation technique. In this approach, standard two-fold cross-validation is applied five times, and the results are averaged; a minimal scikit-learn sketch of this protocol is given after Table 1. In addition, to illustrate the variability and significance of differences, we indicate the standard deviation from the mean value. In the evaluation, we considered several popular benchmark datasets concerning fake news, disinformation, and propaganda problems. The comparison of the F1-Score (obtained both for Fake and Real document types) is presented in Fig. 3, and Tables 1 and 2 report additional details on the performance (e.g. standard deviation). It can be noticed that the proposed approach achieves good results on most of the considered datasets. In particular, for COVIDFN, MMCOVID, QPROP, and GRAFN the weighted F1 score exceeds 90%. Moreover, the experiments revealed that the most challenging datasets are those referring to fact-checking problems (e.g. LIAR and FAKENEWSNET). In particular, for the LIAR dataset, the proposed method was able to achieve only 64% weighted F1-score.

Table 1. Performance of the proposed model on the considered datasets (F1 score reported).

Dataset            Fake         Real         W. Avg.
COVIDFN [2]        98.5 ± 0.1   68.7 ± 1.6   97.0 ± 0.1
MMCOVID [10]       85.5 ± 0.7   95.1 ± 0.1   92.4 ± 0.3
PUBHEALTH [9]      85.4 ± 0.3   75.5 ± 0.6   81.7 ± 0.4
QPROP [3]          70.3 ± 0.6   96.7 ± 0.1   93.7 ± 0.1
ISOT [1]           99.7 ± 0.0   99.7 ± 0.0   99.7 ± 0.0
GRAFN [13]         78.0 ± 0.2   94.7 ± 0.1   91.3 ± 0.1
NELA-GT [6]        72.2 ± 0.1   86.4 ± 0.0   81.5 ± 0.0
FAKENEWSNET [14]   58.2 ± 0.8   89.2 ± 0.2   81.6 ± 0.3
LIAR [17]          57.9 ± 0.8   68.9 ± 0.4   64.1 ± 0.5
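For reference, the 5 × 2 protocol can be reproduced with scikit-learn as in the sketch below; the classifier is a placeholder, not the project's actual pipeline.

# Sketch of 5x2 cross-validation: two-fold CV repeated five times,
# reporting the mean and standard deviation of the weighted F1 score.
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression

def five_by_two_f1(X, y):
    cv = RepeatedStratifiedKFold(n_splits=2, n_repeats=5, random_state=0)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                             scoring="f1_weighted", cv=cv)
    return scores.mean(), scores.std()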
4.2 Comparison with Other Techniques
In this experiment, we adapted other fake news detection techniques in order to compare the results across the various datasets. Inspired by the analysed state of the art, we considered:
– Convolutional Neural Networks (CNN) [5],
– Long Short-Term Memory (LSTM) [11], and
– Static embeddings followed by traditional ML techniques (Emb+Clf) [8].

The results revealed that the proposed approach to fake news detection surpasses the other techniques on almost all of the considered benchmarks (the CovidFN case being the exception).
Fig. 3. Comparison of F1 score (reported for labels Real and Fake) obtained for different benchmark datasets.

Table 2. Performance of the compared techniques on the considered datasets (F1 score reported).

Method     CovidFN      MMCovid       PubHealth    QPROP        ISOT         GRAFN
Emb+Clf    87.3 ± 0.6   63.3 ± 1.6    61.4 ± 1.2   35.5 ± 1.0   93.3 ± 0.1   77.8 ± 0.2
CNN        98.2 ± 0.1   67.3 ± 23.7   64.5 ± 5.2   59.9 ± 5.7   99.5 ± 0.1   88.6 ± 1.2
LSTM       98.0 ± 0.3   75.0 ± 2.4    67.0 ± 1.6   70.2 ± 4.9   99.1 ± 0.3   89.0 ± 1.8
Proposed   97.0 ± 0.1   92.4 ± 0.3    81.7 ± 0.4   93.7 ± 0.1   99.7 ± 0.0   91.3 ± 0.1

5 Conclusion
In this paper, we briefly presented the key assumptions and achievements of the SWAROG project with respect to fake news and disinformation challenges. The proposed solution involves the creation of artificial intelligence algorithms for automatic classification and detection. We have reported the project's recent achievements regarding fake news detection, analysed and discussed the limitations of existing approaches to fake news annotation, and presented the preliminary results.

Acknowledgments. This publication is funded by the National Center for Research and Development within the INFOSTRATEG program, application for funding number: INFOSTRATEG-I/0019/2021-00.
References

1. Ahmed, H., Traore, I., Saad, S.: Detecting opinion spams and fake news using text classification. Secur. Privacy 1(1), e9 (2018)
2. Banik, S.: COVID fake news dataset [data set]. Zenodo, Online (2020)
3. Barrón-Cedeño, A., Da San Martino, G., Jaradat, I., Nakov, P.: Proppy: organizing the news based on their propagandistic content. Inf. Process. Manag. 56(5), 1849–1864 (2019)
4. Martín, A.G., Fernández-Isabel, A., González-Fernández, C., Lancho, C., Cuesta, M., de Diego, I.M.: Suspicious news detection through semantic and sentiment measures. Eng. Appl. Artif. Intell. 101, 104230 (2021)
5. Goldani, M.H., Safabakhsh, R., Momtazi, S.: Convolutional neural network with margin loss for fake news detection. Inf. Process. Manag. 58(1), 102418 (2021)
6. Gruppi, M., Horne, B.D., Adalı, S.: NELA-GT-2021: a large multi-labelled news dataset for the study of misinformation in news articles (2022)
7. Hakak, S., Alazab, M., Khan, S., Gadekallu, T.R., Maddikunta, P.K.R., Khan, W.Z.: An ensemble machine learning approach through effective feature extraction to classify fake news. Future Gener. Comput. Syst. 117, 47–58 (2021)
8. Khan, J.Y., Khondaker, Md.T.I., Afroz, S., Uddin, G., Iqbal, A.: A benchmark study of machine learning models for online fake news detection. Mach. Learn. Appl. 4, 100032 (2021)
9. Kotonya, N., Toni, F.: Explainable automated fact-checking for public health claims. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 7740–7754. Association for Computational Linguistics, Online (2020)
10. Li, Y., Jiang, B., Shu, K., Liu, H.: MM-COVID: a multilingual and multimodal data repository for combating COVID-19 disinformation (2020)
11. Meesad, P.: Thai fake news detection based on information retrieval, natural language processing and machine learning. SN Comput. Sci. 2(6), 425 (2021)
12. Zyblewski, P., Wozniak, M., Ksieniewicz, P., Kozik, R.: SWAROG - fake news classification for the local context. In: Proceedings of the Basque Conference on Cyber-Physical Systems and Artificial Intelligence, pp. 135–140 (2022)
13. Risdal, M.: Getting real about fake news. Kaggle, Online (2016)
14. Shu, K., Mahudeswaran, D., Wang, S., Lee, D., Liu, H.: FakeNewsNet: a data repository with news content, social context and spatiotemporal information for studying fake news on social media (2019)
15. Tashtoush, Y., Alrababah, B., Darwish, O., Maabreh, M., Alsaedi, N.: A deep learning framework for detection of COVID-19 fake news on social media platforms. Data 7(5) (2022)
16. Truică, C.-O., Apostol, E.-S.: It's all in the embedding! Fake news detection using document embeddings. Mathematics 11(3) (2023)
17. Wang, W.Y.: "Liar, liar pants on fire": a new benchmark dataset for fake news detection. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 422–426. Association for Computational Linguistics, Vancouver, Canada (2017)
Neural Networks
Analysis of Extractive Text Summarization Methods as a Binary Classification Problem

Joanna Komorniczak¹, Szymon Wojciechowski¹, Jakub Klikowski¹(B), Rafał Kozik², and Michał Choraś²

¹ Pattern Recognition Team of Department of Computer Systems and Networks, Wroclaw University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wroclaw, Poland
{joanna.komorniczak,szymon.wojciechowski,jakub.klikowski}@pwr.edu.pl
² Bydgoszcz University of Science and Technology, Al. prof. S. Kaliskiego 7, 85-796 Bydgoszcz, Poland
{rkozik,chorasm}@pbs.edu.pl

Abstract. One of the critical challenges for natural language processing methods is the issue of automatic content summarization. The enormous increase in the amount of data delivered to users by news services leads to an overload of information without meaningful content. There is a need to generate an automatic text summary that contains as much essential information as possible while keeping the resulting text smooth and concise. Methods of automatic content summarization fall into two categories: extractive and abstractive. This work converts the task of extractive summarization to a binary classification problem. The research focused on analyzing various techniques for extracting the abstract in a supervised learning manner. The results suggest that this different view of text summarization has excellent potential.

Keywords: text summarization · extractive summary · classification

1 Introduction
The automatic generation of text summaries is motivated by the constantly increasing amount of content we encounter daily. Thanks to the Internet, publishing new content in text form has become widespread. This is also directly linked to the growing phenomenon of "fake news" [6,7,13]. The task of automatic text summarization focuses on developing content with reduced volume while preserving its semantic meaning and critical information [2]. According to generally accepted assumptions, the text summary should be at most half of the summarized content [5]. Current methods for summarizing content are divided into two approaches: abstract summaries and extractive summaries [1]. Extractive methods are based on selecting the most relevant parts of texts. On the other hand, abstract summarization algorithms try to extract the information contained in the content
and convey it in a different and shortened form. These methods are among the more complex techniques, so extraction approaches are more often chosen. Therefore, in this work, we focus mainly on extraction approaches. Text extraction abstracts are a frequently undertaken research topic [9]. Some algorithms that automatically summarize the original text are based on phrase frequency analysis [8]. Other approaches analyze the relationship between a set of texts and the terms they contain by creating a set of concepts – Latent Semantic Analysis [12]. Algorithms that take a probabilistic approach, such as Bayesian Topic Models [14], are also popular. One can also find many methods that rely on the representation of the text in the form of a graph [10], additionally including information about the similarity of individual sentences of the summarized text. In this work, we want to examine this problem from a different point of view. The task of extractive text summarization can be converted into a supervised machine learning problem. The essence of this transformation is binary classification, where a model tries to determine whether each of the original text's sentences should belong to the summary's content. A few recent works have considered the idea of employing classification for text summarization [10]. Among the existing approaches, one can distinguish those that look for specific dependencies between the sentences of the text and those that omit these dependencies. The former include Hidden Markov Models [11] and Conditional Random Fields [4]. Usually, methods that assume dependencies between sentences are more effective than others. Abandoning information about dependencies between sentences seems unrealistic when the goal is to summarize the document's overall content. The rest of the article consists of the following sections: Sect. 2 describes the methodology, Sect. 3 presents the experimental setup, and Sect. 4 analyzes the results. The entire paper concludes with a summary in Sect. 5.
2 Methodology

2.1 Data Analysis
The studied collection "BBC News Summary"¹ provided on the Kaggle platform contains 2225 BBC news articles from 2004–2005, with corresponding extraction summaries. Sequentially numbered text files with articles consist of a title, an introduction paragraph, and the main content. The articles represent five subject areas:
– business – 510 articles,
– entertainment – 386 articles,
– politics – 417 articles,
– sport – 511 articles,
– tech – 401 articles.
The corresponding abstract is also given in the form of a text file. It consists only of actual sentences selected from the original article. Figure 1 shows the distribution of article lengths and abstracts in each field.

¹ https://www.kaggle.com/datasets/pariza/bbc-news-summary
Fig. 1. Histogram showing the number of sentences in each article by category (left column) and the percentage of sentences from the original text included in the summary (right column)
3 Experimental Setup

As part of the research, experiments were performed to test the various configurations of the methods used to generate an automatic text summary. The experiments were conducted independently for each article category.

3.1 Data Preparation
In order to perform the summary extraction, it was necessary to transform the text and summary for the classification task. The original text was divided into sentences. The title in the first line of each article was ignored because it was not a sentence by any of the language rules. Double spaces were replaced with single spaces, some special characters such as quotation marks and hyphens were ignored, and newline characters were removed. The text of the summary was handled similarly. The first step was to verify the correct division into sentences. For each sentence of the summary, it was checked whether such a sentence was in the
initial content. If it was not, the file was skipped. Of the 401 files in the tech category, 347 remained; in the sport category, 442 of 511; in the politics category, 355 of 417; in the entertainment category, 332 of 386; and in the business category, 478 of 510. Errors resulted from mistakes in the content of the article, or from sentence constructions not taken into account during the division, such as an ellipsis in the middle of a sentence, a new sentence beginning not with a letter but with a number, or abbreviations such as "Co.". After being correctly divided into fragments, each sentence from the original text was given a label of 1 when contained in the summary and 0 otherwise. The data was transformed using three different vectorization methods: CV (Count Vectorizer), TF (Term Frequency), and TF-IDF (Term Frequency - Inverse Document Frequency). The first approach was vectorization in the context of a single article – the vectorization method was trained using the content of the entire single article, then subsequent sentences were transformed into a vector of numeric values. A numerical representation of each sentence was obtained along with the previously preserved label. Depending on the size of the file, patterns were described by an average of 100–200 features within a single article. The other approach was vectorization in the context of categories. Within each domain of articles, the content of all sentences was collected, all vectorization methods were trained using the resulting set, and the method trained on the entire category of sentences was then used to vectorize a single article. As a result of the transformation, each pattern in the category frame contained from about 9,500 features (for the sport category) to about 11,700 features (for the business category). A sketch of the labeling and single-article vectorization step is given below.
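The following is a minimal sketch of this preparation step for a single article, under simplifying assumptions: the regular-expression splitter stands in for the rule-based division described above, and TF-IDF is shown as one of the three vectorizers.

# Sketch: sentence labeling and single-article TF-IDF vectorization.
# The splitter is a simplification of the rules described in the text.
import re
from sklearn.feature_extraction.text import TfidfVectorizer

def split_into_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def build_article_dataset(article_text, summary_text):
    sentences = split_into_sentences(article_text)
    summary = set(split_into_sentences(summary_text))
    y = [1 if s in summary else 0 for s in sentences]
    vectorizer = TfidfVectorizer()             # trained on this article only
    X = vectorizer.fit_transform(sentences)    # one pattern per sentence
    return X, y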
3.2 Configuration
In order to conduct experiments on summary extraction, a set of algorithm parameters was assumed, and the corresponding configurations were used to solve the stated classification problem. All the listed algorithms were used with their standard hyperparameter configuration.
– Base classifier
  • Gaussian Naive Bayes (gnb)
  • k-Nearest Neighbors (knn)
  • MLP Classifier (mlp)
  • Decision Tree (dt)
– Feature extraction ([5, 15, 25])
  • Select k best (with χ² score)
  • Principal Components Analysis (PCA)
– Vectorizers
  • Count vectorizer (cv)
  • Term frequency - inverse document frequency vectorizer (tf-idf)
  • Term frequency vectorizer (tf)
– Vectorizers Context
  • Single Article
  • Article Domain

Each article is considered a single dataset and each sentence a pattern. Due to the small size of each article, the Leave-One-Out [3] experimental protocol was used. Leave-One-Out is a specific cross-validation method that facilitates the validation of small datasets: the test folds are divided into N one-element sets, where N is the number of samples in the dataset. The accuracy metric was used for algorithm comparison and extraction quality estimation; a minimal sketch of the protocol is given below.
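A minimal sketch of the Leave-One-Out evaluation for one article, assuming the X, y produced by the preparation step and the GNB base classifier:

# Sketch: Leave-One-Out evaluation of a single article.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.naive_bayes import GaussianNB

def loo_accuracy(X, y):
    X = np.asarray(X.todense())        # GaussianNB requires dense input
    y = np.asarray(y)
    hits = 0
    for train, test in LeaveOneOut().split(X):
        clf = GaussianNB().fit(X[train], y[train])
        hits += int(clf.predict(X[test])[0] == y[test][0])
    return hits / len(y)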
4 Results Analysis

4.1 Classification in the Context of a Single Article
Table 1 shows the classification quality for each article category, depending on the vectorization method used, in terms of the number of features after selection using the select k best method with the χ² score for a given base classifier. The table is divided into three main columns (the number of selected k best features), and each main column is divided into four sub-columns denoting the base classifier used. The rows are grouped into five categories, and each category is divided according to the vectorization method. The floating-point numbers in the tables show the values of the accuracy metric for each unit experiment. The metrics were calculated based on the results from each field obtained during the Leave-One-Out evaluation. In all categories – most often – the highest results were obtained for 15 and 25 features after selection, using the GNB classifier. Table 2 shows the quality of classification for each article domain, depending on the vectorization method, in terms of the number of components after extraction using the PCA method for a given base classifier. As before, the table is divided into three main columns (the number of k components), and each main column is divided into four sub-columns denoting the base classifier used. The rows are grouped into five categories denoting the domain of the articles, and each category is divided according to the vectorization method. In all categories, the best results were obtained for decision trees with 25 components extracted. Figure 2 shows Critical Difference (CD) diagrams, with a Nemenyi post-hoc test at a confidence level of 5%, for the method configurations studied. The columns show the vectorization methods, the rows show the article categories. In each case, regardless of the vectorization method and context, the select k best selection method gives significantly better results than PCA extraction. In each domain, there are no statistical differences among the best group of methods. The best results are given by the GNB and MLP classifiers in combination with selection of 15 or 25 features.
Table 1. Classification quality (accuracy) for the tested methods and feature selection: each base classifier (gnb, knn, mlp, dt) evaluated with select k best for k ∈ {5, 15, 25}, per article category (tech, sport, politics, entertainment, business) and vectorization method (cv, tf, tfidf).
Table 2. Classification quality (accuracy) for the tested methods and feature extraction: each base classifier (gnb, knn, mlp, dt) evaluated with PCA for k ∈ {5, 15, 25} components, per article category and vectorization method (cv, tf, tfidf).
In all domains, the PCA extraction method gives significantly worse results. It is assumed that this is due to the representation of the data after vectorization – most of the features of a given pattern before transformation are equal to zero, so they carry low discrimination ability. With PCA extraction, the quality of classification often falls below 0.5, an accuracy close to that of a random classifier. Only the values for 25-component extraction and classification using decision trees yield results around 0.6 for PCA. The two variants are sketched below.
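The two dimensionality-reduction variants compared above can be sketched as scikit-learn pipelines; the k = 25 setting follows the configuration in Sect. 3.2, and the dense conversion is an assumption needed because PCA and GaussianNB do not accept sparse input.

# Sketch of the two dimensionality-reduction variants compared above.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

to_dense = FunctionTransformer(lambda X: X.toarray(), accept_sparse=True)

# chi-squared selection of the 25 best features, then GNB
select_pipe = make_pipeline(SelectKBest(chi2, k=25), to_dense, GaussianNB())

# PCA extraction of 25 components, then a decision tree
pca_pipe = make_pipeline(to_dense, PCA(n_components=25),
                         DecisionTreeClassifier())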
Fig. 2. Critical difference diagram of the vectorization methods tested (horizontal axis) in each category (vertical axis).
4.2 Classification in the Context of Article Categories
In addition, an experiment was conducted in which vectorization was based on the entire article domain – each task was transformed using a vectorizer trained with the message content of the entire category to which it belonged. Due to the significantly increasing computational complexity resulting from the
number of features of each pattern (from about 9,000 to 12,000), it was decided to conduct the experiment only for the best method configuration from the previous experiment – the GNB classifier with selection of 25 features using the select k best method and the χ² test. Figure 3 shows the histogram of classification quality in the article context (orange) and in the category context (blue) for the business article category. The horizontal axis represents the classification quality, and the vertical axis represents the number of articles with a given classification quality value. It can be clearly seen that the classification quality of sentences for text summary generation is significantly better in the context of a single article. Importantly, the classifier processing patterns in the context of categories is not random (accuracy around 0.6), which means that vectorization in the context of categories also carries some information about the relevance of a particular sentence within the domain.
Fig. 3. Comparison of classification quality in the context of a single article (orange) and a category (blue).
5 Conclusions
This work's goal was to automatically summarize information extracted from online news sites. The data contained article bodies from the BBC News website divided into five categories – tech, politics, entertainment, sport, and business – together with their extractive summaries. The task of automatic summary generation was transformed into a binary classification of sentences within an article. This procedure allows an experimental evaluation of various vectorization, classification, and feature extraction methods. Within the scope of this work, implementations of the required methods were prepared, and experiments were planned to investigate this concept. An experimental evaluation was performed, and the results obtained were analyzed. The applied approach enabled satisfactory classification performance for feature selection and analysis of each sentence in the article context. The best method configuration was to select 15 or 25 features and use the classifier GNB
or MLP. The classification quality was much lower in the setup with feature extraction using the PCA method. The analysis of each sentence in the context of the entire article category leads to significantly worse results and an increase in computational complexity, which is a consequence of the high dimensionality of the problem after vectorization (about 12,000 attributes). The research conducted shows excellent potential for the presented approach. A valuable development of this research would be a broader look at the reference methods, allowing for better verification of the stated assumptions of this approach's concepts. During further development, focusing on systematizing the results by introducing the ROUGE metric would also be worthwhile. This metric would allow for more comprehensive evaluations of abstracts, regardless of how the abstract was produced.

Acknowledgments. This publication is funded by the National Center for Research and Development within the INFOSTRATEG program, application for funding number: INFOSTRATEG-I/0019/2021-00.
References

1. Automatic text summarization: a comprehensive survey. Expert Syst. Appl. 165, 113679 (2021)
2. Review of automatic text summarization techniques & methods. J. King Saud Univ. - Comput. Inf. Sci. 34(4), 1029–1046 (2022)
3. Fukunaga, K., Hummels, D.M.: Leave-one-out procedures for nonparametric error estimates. IEEE Trans. Pattern Anal. Mach. Intell. 11(4), 421–423 (1989)
4. Galley, M.: A skip-chain conditional random field for ranking meeting utterances by importance, pp. 364–372 (2006)
5. Gholamrezazadeh, S., Salehi, M.A., Gholamzadeh, B.: A comprehensive survey on text summarization systems. In: 2009 2nd International Conference on Computer Science and its Applications, pp. 1–6. IEEE (2009)
6. Ksieniewicz, P., Zyblewski, P., Borek-Marciniec, W., Kozik, R., Choraś, M., Woźniak, M.: Alphabet flatting as a variant of n-gram feature extraction method in ensemble classification of fake news. Eng. Appl. Artif. Intell. 120, 105882 (2023). https://doi.org/10.1016/j.engappai.2023.105882
7. Kula, S., Kozik, R., Choraś, M.: Implementation of the BERT-derived architectures to tackle disinformation challenges. Neural Comput. Appl. (2021). https://doi.org/10.1007/s00521-021-06276-0
8. Mohan Kalyan, V., Santhaiah, C., Naga Sri Nikhil, M., Jithendra, J., Deepthi, Y., Krishna Rao, N.V.: Extractive summarization using frequency driven approach. In: Mai, C.K., Reddy, A.B., Raju, K.S. (eds.) Machine Learning Technologies and Applications. AIS, pp. 183–191. Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-4046-6_18
9. Moratanch, N., Chitrakala, S.: A survey on extractive text summarization. In: 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP), pp. 1–6 (2017)
10. Nenkova, A., McKeown, K.: A survey of text summarization techniques. In: Aggarwal, C., Zhai, C. (eds.) Mining Text Data, pp. 43–76. Springer, Boston (2012). https://doi.org/10.1007/978-1-4614-3223-4_3
11. O'Leary, D., Conroy, J.: Text summarization via hidden Markov models and pivoted QR matrix decomposition (2001)
12. Ozsoy, M.G., Alpaslan, F.N., Cicekli, I.: Text summarization using latent semantic analysis, 37(4) (2011)
13. Szczepański, M., Pawlicki, M., Kozik, R., Choraś, M.: New explainability method for BERT-based model in fake news detection. Sci. Rep. 11(1), 23705 (2021). https://doi.org/10.1038/s41598-021-03100-6
14. Wang, D., Zhu, S., Li, T., Gong, Y.: Multi-document summarization using sentence-based topic models, pp. 297–300 (2009)
QuantumSolver Composer: Automatic Quantum Transformation of Classical Circuits Daniel Escanez-Exposito(B)
and Pino Caballero-Gil
Department of Computer Engineering and Systems, University of La Laguna, 38200 Tenerife, Spain
{jescanez,pcaballe}@ull.edu.es

Abstract. This paper describes the implementation of a software module that allows the generation of quantum circuits from the definition of their classical analog logic circuits. This tool confers a great power of abstraction to the user, who does not need to know any concept of quantum computing to implement quantum algorithms or quantum protocols. Thus, the proposal achieves its main objective by obtaining the quantum equivalent of several classical circuits in an intuitive and didactic way. Additionally, this composer module has been added to a library developed by the authors for quantum development. This is part of a work in progress, so the implementation of some relevant cryptographic protocols is planned to demonstrate the pedagogical and abstraction potential of the developed tool.

Keywords: Quantum Computing · Qiskit · Quantum Cryptography

1 Introduction
Encouraging the study of quantum technologies by users with or without computer science experience is one of the main purposes of the QuantumSolver library [1] developed by the authors of this work. This objective is addressed through two outstanding features of the tool, which refer to the ways of accessing the developed software. Firstly, it offers a web interface where it is possible to run the available predefined algorithms and obtain the results in a visual and intuitive way. Secondly, it has a command line interface, which is more intended for users with some computer knowledge. In previous papers [2–4], the QuantumSolver toolset has been progressively enriched with several quantum algorithms, such as random number generation, Deutsch-Jozsa, Bernstein-Vazirani, Grover, quantum teleportation and superdense coding, and with various cryptographic protocols such as BB84, E91, B92, and versions of Elgamal and quantum RSA. All of them can be run both on simulators and on quantum computers provided by IBM, obtaining results in a simple and straightforward manner. Initially, the toolset did not distinguish between algorithms and protocols. This document proposes an improvement and extension of the library, separating
it into four different modules: QuantumSolver Basic for simple algorithms that can be represented with a parameterized circuit, QuantumSolver Subroutine for those algorithms containing a quantum subroutine, QuantumSolver Crypto for quantum cryptography protocols, and QuantumSolver Composer for the generation of quantum circuits through the definition of classical circuits. This last feature is precisely one of the new contributions of this paper. Some similar works can be found in [5–7]. However, the proposal presented in this document takes a completely different and new approach, since it provides a useful tool to learn and program about quantum circuits through classical circuits. This paper is structured as follows. Section 2 contains the theoretical basis necessary to understand the decisions made. Section 3 explains the issues concerning the application design. Section 4 describes in detail the implementation performed and the functional structure. Finally, Sect. 5 closes the paper with a brief conclusion and future work.
2 Theoretical Basis

2.1 Classical Gates
A classical logic gate [8] represents a Boolean function that performs an operation on input data to produce output data. The domain size of an n-ary Boolean function is 2^n, since there are 2^n rows in its truth table. Since there are 2 ways to fill the output for each of the 2^n possible inputs, there are 2^(2^n) possible n-ary functions (for example, for n = 2 there are 16 of them). Some examples of the most commonly used logic gates are:
NOT. This unary gate carries out the negation of the bit it is applied to. Therefore, it has only two possible input values and two possible output values. Both are perfectly identified, since the function is bijective. This allows the input to be restored by reapplying, in this case, the same gate (see Table 1a).
OR. This gate performs the binary sum of two bits. There are four different possible combinations of the input values that generate the result of adding the two binary input values. However, only two possible output values exist, since only one bit is returned. This is why this application is not bijective and therefore, given a value of the output, it is impossible to recover the input (see Table 1b).
AND. The same considerations apply to the AND logic gate, whose operation is described by the binary multiplication operation. Note that when at least one of the two input bits is zero, the output will be zero and it will be impossible to know the values that produced it (see Table 1c).
XOR. In a very similar way to the OR logic gate, it also performs a sum of the input values, in this case the sum modulo two (1 + 1 = 0). Note that in this case, it is possible to recover the value of one of the inputs if the value of the output and the remaining input are known (see Table 1d).
Table 1. Classical truth tables

(a) NOT
Input  Output
0      1
1      0

(b) OR
Input1  Input0  Output
0       0       0
0       1       1
1       0       1
1       1       1

(c) AND
Input1  Input0  Output
0       0       0
0       1       0
1       0       0
1       1       1

(d) XOR
Input1  Input0  Output
0       0       0
0       1       1
1       0       1
1       1       0

2.2 Quantum Gates
A quantum logic gate [9] can be defined by its effect on the quantum basis states. It is worth noting that, unlike in the classical paradigm, there are infinitely many quantum gates for a single qubit. All quantum logic gates, whatever their number of inputs and outputs, must preserve the information of the system, since the information is considered energy that must be conserved by the unitary transformations applied. This is a fundamental constraint imposed by quantum mechanics, which makes their design more complex than in the classical case. For the sake of simplicity, and since this module does not consider the use of qubits in superposition states, the quantum gates used will be explained in terms of classical states.
Pauli-X (X). This is the quantum analog of the classical NOT gate. Therefore, it is possible to univocally know the input from the output (see Table 2a).
Controlled NOT (CNOT). This gate has two distinct inputs: control and target. The output it produces, on basis states, keeps the control unchanged, while on the target it performs a negation if the control is at one. Therefore, it can be interpreted as applying to the target a classical XOR between it and the control. It is a widely used gate to generate entanglement between two qubits. CNOT, like all quantum gates, must be reversible, so that given the output of the gate it must be possible to fully determine the state of the input (see Table 2b).
SWAP. It exchanges the values of the input states. It can be interpreted as a wire crossing (see Table 2c).
Toffoli Gate (CCNOT). Using the same approach as the CNOT gate, the CCNOT gate has two controls. Both must be at one to flip the target qubit (see Table 2d).
3 Design
Taking into account the aforementioned restrictions, it can be seen that, in general, the number of outputs of the implemented quantum logic gates is quite likely to increase compared to their classical analogs. In terms of design, the following aspects have been taken into account:
Table 2. Quantum truth tables

(a) X-Gate
Input  Output
|0⟩    |1⟩
|1⟩    |0⟩

(b) CNOT
Input             Output
Control  Target   Control  Target
|0⟩      |0⟩      |0⟩      |0⟩
|0⟩      |1⟩      |0⟩      |1⟩
|1⟩      |0⟩      |1⟩      |1⟩
|1⟩      |1⟩      |1⟩      |0⟩

(c) SWAP
Input             Output
Input1   Input0   Output1  Output0
|0⟩      |0⟩      |0⟩      |0⟩
|0⟩      |1⟩      |1⟩      |0⟩
|1⟩      |0⟩      |0⟩      |1⟩
|1⟩      |1⟩      |1⟩      |1⟩

(d) CCNOT
Input                        Output
Control1  Control0  Target   Control1  Control0  Target
|0⟩       |0⟩       |0⟩      |0⟩       |0⟩       |0⟩
|0⟩       |0⟩       |1⟩      |0⟩       |0⟩       |1⟩
|0⟩       |1⟩       |0⟩      |0⟩       |1⟩       |0⟩
|0⟩       |1⟩       |1⟩      |0⟩       |1⟩       |1⟩
|1⟩       |0⟩       |0⟩      |1⟩       |0⟩       |0⟩
|1⟩       |0⟩       |1⟩      |1⟩       |0⟩       |1⟩
|1⟩       |1⟩       |0⟩      |1⟩       |1⟩       |1⟩
|1⟩       |1⟩       |1⟩      |1⟩       |1⟩       |0⟩
– A class must be created that allows the modeling of a classical circuit within a quantum circuit.
– The behavior must be correct for quantum basis-state inputs; superposition states are not considered.
– The user must know in advance the number of qubits (or simply bits, considering the reversibility of the gates present in the library) needed in the circuit to be implemented.
– For a new programmer wishing to contribute to the library, it should be easy to define new gates from the existing software.
– Quantum logic gates (such as Pauli-X, CNOT, CCNOT, SWAP, etc.) shall be used to simulate classical behaviors.
– The structure should be clear and easy to understand to facilitate the addition of new gates to the library.
– The states of the registers must be easy to initialize by the user.
– The number of gate inputs shall be equal to the number of outputs.
– At least the basic classical gates (NOT, OR, AND) must be considered, from which a composition can be made to define any other gate.
– It is really useful to have the universal NAND gate as well.
– More complex operations, typical of real algorithms or protocols, such as shifting, swapping, copies of values, etc., must be contemplated.
– Each of these complex functionalities can be implemented from the composition of the simpler ones.
4 Implementation
A QSCircuit class, derived from the QuantumCircuit class of IBM's Qiskit Python library for circuit-level quantum software development, has been implemented. This class receives in its constructor, as a parameter, the number of qubits to be used, and for each one, a classical bit is reserved for measurement. It should be noted that the function nomenclature name_gateN has been used, where name_gate describes the gate operation and N is the number of qubits over which it is applied. At the moment, the following functionalities have been added:

4.1 Set and Reset
A set of functions related to circuit initialization has been implemented. On the one hand, the method reset1 forces the given qubit to the |0⟩ state, while the method set1 establishes it in the |1⟩ state. On the other hand, the function set_reg loads, using the previous functions, the value of the input register into the indicated qubits (see Fig. 1). A minimal sketch of these helpers is given below.
Fig. 1. Codes and circuits of reset1, set1 and set_reg (input “1101”) methods
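The following is a minimal sketch of these initialization helpers, assuming Qiskit; the method bodies are an approximation of the library's actual code (see Fig. 1 for the original).

# Sketch of QSCircuit initialization helpers on top of Qiskit.
from qiskit import QuantumCircuit

class QSCircuit(QuantumCircuit):
    def reset1(self, q):
        self.reset(q)                 # force qubit q to |0>

    def set1(self, q):
        self.reset(q)                 # force qubit q to |1>
        self.x(q)

    def set_reg(self, bits, qubits):
        # load a classical bit string, e.g. "1101", into the given qubits
        for bit, q in zip(bits, qubits):
            if bit == "1":
                self.set1(q)
            else:
                self.reset1(q)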
4.2 Basic Gates
To achieve the objectives related to the easy and intuitive composition of circuits through the software currently present in the library, the following set of basic gates has been implemented:
– not1: It performs an equivalent of the classical NOT on its single input by returning the negated value on its single output. This is possible since it is reversible: the input can be obtained univocally from the output. The only disadvantage of using this gate is that, to recover the original value, it must be reapplied. However, adding a little more complexity to the design of the final circuit improves its efficiency in terms of the total number of qubits used (see Fig. 2).
Fig. 2. Code and circuit of not1 method
– not2: It has two inputs: the value to negate and an auxiliary one that is first reset. It has two outputs: one returns the value to negate unchanged, and the other its negated version (see Fig. 3).
Fig. 3. Code and circuit of not2 method
– or3, and3: They perform the expected operation between the inputs of the first two argument qubits and return the output on the third. A 2-qubit version of these gates was tried, but it broke the quantum mechanical condition of reversibility, so it was not possible (see Figs. 4 and 5).
Fig. 4. Code and circuit of or3 method
Fig. 5. Code and circuit of and3 method
– nand3: From the and3 gate, its negated version was implemented to obtain the NAND gate, known for its universality, which allows expressing any classical circuit as a combination of these gates (see Fig. 6).
Fig. 6. Code and circuit of nand3 method
– xor2: It has two inputs, and its outputs are the first one unaltered and the second one the result of the XOR between the inputs. This makes the gate reversible, since the value of the second input can be recovered from the two outputs. Moreover, it can be intuited that the gate is really simple to implement by means of a single CNOT (see Fig. 7).
Fig. 7. Code and circuit of xor2 method
– xor3: To explicitly maintain the two inputs, a version of the xor gate with 3 inputs and 3 outputs has been created. It is very similar to the previous case, only with one more qubit to load the result, which is initialized to zero and receives the target of two CNOT gates, each with control on one of the inputs (see Fig. 8).
Fig. 8. Code and circuit of xor3 method
– swap2: The classical gate coincides with the quantum gate. It needs no special considerations since it is already reversible (see Fig. 9).
Fig. 9. Code and circuit of swap2 method
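As an illustration, the basic gates above reduce to a handful of Qiskit primitives. They are written here as free functions over a circuit for brevity, whereas the library implements them as QSCircuit methods; the or3 body in particular is one possible De Morgan construction, not necessarily the library's exact circuit (see the figures for the original code).

# Sketch of the basic reversible gates in terms of Qiskit primitives.
def not1(qc, a):
    qc.x(a)                  # in-place NOT

def xor2(qc, a, b):
    qc.cx(a, b)              # b <- a XOR b, a unchanged

def and3(qc, a, b, out):
    qc.ccx(a, b, out)        # out <- a AND b (out assumed to start at |0>)

def or3(qc, a, b, out):
    # OR via De Morgan: a OR b = NOT(NOT a AND NOT b); assumes out = |0>
    qc.x(a); qc.x(b)
    qc.ccx(a, b, out)
    qc.x(out)
    qc.x(a); qc.x(b)         # restore the inputs

def nand3(qc, a, b, out):
    and3(qc, a, b, out)
    qc.x(out)                # NAND = NOT AND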
4.3 Conditional Copy
The conditional copy operation clones the input register into the output register if the conditional qubit is set to one. This operation has been implemented using the Toffoli gate (see Fig. 10).
Fig. 10. Code and example circuit of copy_if method
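A sketch of this operation, one Toffoli per bit, with each destination qubit assumed to be initialized to |0⟩:

# Sketch: conditional copy of a source register into a destination
# register, gated on a condition qubit.
def copy_if(qc, cond, src_qubits, dst_qubits):
    for s, d in zip(src_qubits, dst_qubits):
        qc.ccx(cond, s, d)   # d <- s when cond is |1> (d assumed |0>)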
4.4 Shifting
Left and right displacements have been implemented by using swap gates in one direction or the other (see Fig. 11).
Fig. 11. Codes and circuits of shift_left and shift_right methods with 6 qubits
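A sketch of the displacements as chains of SWAP gates; note that, in this sketch, each shift acts as a rotation of the register by one position.

# Sketch: shifting a register by chaining SWAP gates.
def shift_left(qc, qubits):
    for i in range(len(qubits) - 1):
        qc.swap(qubits[i], qubits[i + 1])

def shift_right(qc, qubits):
    for i in range(len(qubits) - 1, 0, -1):
        qc.swap(qubits[i], qubits[i - 1])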
5 Conclusion and Future Work
In this work, the development of a useful module for direct translation between classical and quantum computation using the circuit paradigm has been presented. One of its objectives has been didactic, since both programming with this tool and inspecting a previously developed circuit are very simple. Furthermore, it helps to understand the basic relationships between qubits, using quantum gates analogous to classical ones. Although the code developed in the proposal is very easily understandable and manageable, it is not only a didactic tool, since its practical utility lies in the fact that it makes accessible challenges that at first could be considered intractable. For example, implementing AES in quantum circuits directly can be a daunting task. However, thanks to the approach of the developed tool, it can be achieved without facing the difficulties of the circuit paradigm of quantum computing. As future work, the implementation at the quantum circuit level of important cryptographic protocols, such as AES, will be carried out with this tool, to obtain their quantum analogs and to study the resulting circuits in an educational way. Thanks to these ongoing implementations, it will be possible to perform specific performance evaluations of the developed module. In this way, different evaluation metrics will endorse the usefulness of the tool and reveal its limitations and shortcomings.

Acknowledgement. This research has been supported by the Cybersecurity Chair of the University of La Laguna and the Eureka CELTIC-NEXT project C2020/2-2 IMMINENCE funded by the Centro para el Desarrollo Tecnológico Industrial (CDTI).
References

1. Escanez-Exposito, D.: QuantumSolver. https://github.com/jdanielescanez/quantum-solver. Accessed 06 Apr 2023
2. Escanez-Exposito, D., Caballero-Gil, P., Martin-Fernandez, F.: QuantumSolver: a quantum tool-set for developers. In: The 2022 World Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2022, p. 149 (2022)
3. Escanez-Exposito, D., Caballero-Gil, P., Martin-Fernandez, F.: Qiskit quantum hardware testing via implementations of QKD algorithms. In: Conference on Cryptographic Hardware and Embedded Systems (CHES) (2022). https://ches.iacr.org/2022/posters/
4. Escanez-Exposito, D., Caballero-Gil, P., Martin-Fernandez, F.: Study and implementation of an interactive simulation of quantum key distribution using the E91 cryptographic protocol. In: Int. Conf. Ubiquitous Comput. Ambient Intell. (UCAmI 2022), pp. 965–970 (2022)
5. Bennett, C.H.: Logical reversibility of computation. IBM J. Res. Dev. 17(6), 525–532 (1973)
6. Muthukrishnan, A.: An introduction to quantum computing, quantum information seminar, classical and quantum logic gates (1999)
7. Swathi, M., Rudra, B.: Implementation of reversible logic gates with quantum gates. In: 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), pp. 1557–1563. IEEE (2021)
8. Tokheim, R.L.: Digital Principles. McGraw-Hill, New York (1994)
9. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information: 10th Anniversary Edition. Cambridge University Press, Cambridge (2011)
Bytecode-Based Android Malware Detection Applying Convolutional Neural Networks

Alberto Miranda-Garcia(B), Iker Pastor-López, Borja Sanz Urquijo, José Gaviria de la Puerta, and Pablo García Bringas

University of Deusto, Bilbao, Spain
{miranda.alberto,iker.pastor,borja.sanz,jgaviria,pablo.garcia.bringas}@deusto.es
Abstract. Over the past decade, mobile devices have become an integral part of our daily lives. These devices rely on applications to deliver a diverse range of services and functionalities to users, such as social networks or online shopping apps. The usage of these applications has led to the emergence of novel security risks, facilitating the rapid proliferation of malicious apps. To deal with the increasing numbers of Android malware in the wild, deep learning models have emerged as promising detection systems. In this paper, we propose an Android malware detection system using Convolutional Neural Networks (CNN). To accomplish this objective, we trained three distinct models (VGG16, RESNET50, and InceptionV3) on the image representation of the Dalvik executable format. Our assessment, conducted on a dataset of more than 13000 samples, showed that all three models reached up to 99% detection performance on malicious Android applications. Finally, we discuss the potential benefits of employing this type of solution for detecting Android malware.

Keywords: Android · Malware detection · Deep learning · CNN · Bytecodes

1 Introduction
In the last few decades, the use of mobile devices has increased due to the large number of conveniences they offer. There is a wide variety of operating systems on the market, but among them there are two that stand out from the rest. Android and iOS account for 99.4% of the mobile OS market in 2022 [20]. Among them, Android leads with a 71.8% and IOS with 27.6%. Due to these figures, users of these operating systems have a greater choice of mobile applications as well as new functionalities on a regular basis. This aspect also implies a series of risks to be taken into account, as the increase in malware is closely linked to this growth. OS developers provide mechanisms to address risks, such as explicit permission requests for media access control. This requires knowledge to discern
between malicious and benign applications. New malware samples are increasingly sophisticated, evading antivirus detection through code obfuscation and encryption, which makes them harder to identify and detect. Over the years, malware has evolved and, with it, new types and families. Broadly speaking, we can define four categories of malware with the highest incidence rate nowadays: Spyware, Adware, Ransomware, and Banking Trojans. Spyware collects and transmits personal data to attackers, including login credentials, contacts, and browsing history, and can activate a device's camera or microphone [15]. Adware displays unwanted ads and can slow down device performance, often bundled with free apps [6]. Ransomware encrypts data and demands payment for decryption, locking users out of their device [4]. Banking Trojans steal financial information, including credit card numbers and bank account information [3]. The openness of the Android market poses a higher malware risk than iOS due to third-party stores: the official Android store applies anti-malware techniques, but the protections of third-party stores are unknown. iOS does not allow third-party stores except in the EU. The coexistence of app stores in Android does not have a big impact on risk. Research on preventing malware with AI-based techniques has increased, with machine learning showing high potential for detection, but requiring expert-defined features, which can be time-consuming and error-prone. Deep learning addresses this issue by learning abstract and non-linear patterns through a layered architecture, aiding generalisation and the identification of previously unconsidered features, leading to successful malware detection using image and text classification. This paper is organized as follows: in Sect. 2, we discuss related work; in Sect. 3, we present an overview of our approach; in Sect. 4, we present the experiments carried out and discuss our experimental results; finally, in Sect. 5, we draw conclusions and present possible further work.
2 Related Works
Since its launch in 2008, Android, the most popular mobile operating system, has been a prime target for malware creators [13]. Notably, DroidDream, discovered in 2010, could gain root access, steal user data, and install additional malware [13]. The discovery of GingerMaster in 2011 showed similar abilities [10]. Over the years, malware families such as HummingBad, CopyCat, and Xavier emerged, performing various malicious activities, including stealing user data and displaying ads. Recent years have seen more sophisticated Android malware, including banking trojans like Anubis, Cerberus, and Alien, which target banking apps and steal sensitive information [9]. Malware comes in various forms, including viruses, worms, trojans, and ransomware. Security researchers use two main methods to analyze and combat malware: static and dynamic analysis [14,22]. Static analysis examines the structure, content, and behavior of malware code without executing it, using tools such as disassemblers, debuggers, and manual inspection by experienced researchers
[22]. Dynamic analysis involves executing malware in a controlled environment to observe its behavior, identifying capabilities such as files created, network connections made, and system changes attempted [14]. While static analysis is useful for identifying known malware and vulnerabilities, dynamic analysis is more effective for detecting new or unknown malware [5]. There is a large body of research on machine learning approaches for malware detection, both with dynamic analysis [18,21] and static analysis [23,24]. Several classifiers, including support vector machines (SVM) [1], Naive Bayes [8], and k-Nearest Neighbor [19], have been employed in other approaches to static malware detection using manually derived attributes including API calls, intents, permissions, and commands. Researches that have applied this approach of using only characteristics such as permissions [2,11,17] have obtained favorable results. Other approaches have opted to make use of more low-level features such as operational codes. [16].
3 Materials and Methodologies
In this work, we propose a malware detection method applying a convolutional neural network. In the following section, we provide a detailed description of the dataset used in our research and the transformation process we carried out. The two phases of the transformation process are defined: how an Android application is disassembled to give a sequence of raw Dalvik bytecodes, and how this bytecode sequence is transformed and then processed by the convolutional network.

3.1 Experimental Dataset
Our research used a comprehensive dataset of Android APKs, named CICMalDroid. This dataset is a research initiative [12] that involves the dynamic analysis of Android samples using CopperDroid, a virtual machine introspection (VMI) based dynamic analysis system. The aim of this initiative is to automatically reconstruct low-level OS-specific and high-level Android-specific behaviors of the Android samples. In the course of this endeavor, a total of 17,341 samples were collected and analyzed. Of these samples, 13,077 were successfully executed, while the remaining samples failed due to various errors such as time-out, invalid APK files, and memory allocation failures. The successful execution of the majority of the collected samples demonstrates the efficacy of the approach adopted by CICMalDroid and provides valuable insights into the behavior of Android malware. The dataset contains all samples classified into the following categories: Adware, Banking malware, SMS malware, Riskware, and Benign. Due to the small number of samples in some of the categories, it has been decided to work initially on two main categories, benign and, in a unified way, malware. Thus, our dataset consists of 5192 benign samples and 7581 malware samples. We used 70% for training, 20% for testing, and 10% for the model validation process.
3.2 Android Bytecodes to Image Transformation
The source code of Android applications is not commonly available, so a common practice is to analyse the bytecode of the app. The bytecode is contained in a Dalvik executable file with a “dex” extension. The “dex” (Dalvik Executable) files of Android apps contain compiled code that is executed by the Dalvik virtual machine, which is the runtime environment used by Android. The “dex” file format is optimized for small size and efficient execution on mobile devices with limited resources. This file contains all the information about classes, methods, strings, etc. maintaining always the same structure. “dex” files are compressed together with other relevant files such as resources, a folder with compiled code, libraries, etc. Our method is based on the graphical representation of all the information in the application. To do this, the “dex” file is extracted from the application. We perform a dump of the binary file, carrying out a conversion of the entire byte stream to decimal. The values obtained are between 0 and 255. Each value obtained has been used to generate an RGB representation, the value being a specific RGB channel. There are a multitude of possible graphical representations. Studies have shown that CNNs perform worse on grayscale images, so our method proposes to generate graphical representations in other scales. Our methodology is shown in Fig. 1.
Fig. 1. Data transformation
The “dex” files are structured sequentially with the different components of the application, starting with the header, followed by the IDs, strings, etc.
Depending on the size of the application, the components will be placed at different heights in the graphical representations. That is why, in order to give more relevance to the strings (ASCII values between 31 and 127), they have been represented in an RGB channel different from the rest of the values. Our approach involves dividing the entire data stream into x subsequences, where x is determined by the formula $x = \lceil \sqrt{y} \rceil$ and y represents the length of the data stream. The function ceil(), short for “ceiling”, rounds a given number up to the nearest integer greater than or equal to it. Our goal is to generate a matrix: each subsequence is treated as a row, resulting in a matrix representation of the “dex” file of size x × x. To account for subsequences that are shorter than x, we use the zero padding technique [7] at the end of those subsequences. Each element of the matrix represents one byte of the “dex” file by means of a list of 3 values corresponding to the RGB channels. The last step is the generation of the image, in which each element of the matrix is transformed into a pixel whose colour is given by the RGB values. The image representation of two distinct samples, one classified as malware and the other benign, is shown in Fig. 2.
Fig. 2. Left: Benign Sample & Right: Malware Sample
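The transformation described above can be sketched in a few lines of Python. The assignment of the string bytes and of the remaining bytes to particular RGB channels is our assumption for illustration; the paper does not state which channel holds which byte class.

import math
import numpy as np
from PIL import Image

def dex_to_image(dex_path, out_path):
    # dump the binary "dex" file as a stream of byte values in 0..255
    data = np.frombuffer(open(dex_path, "rb").read(), dtype=np.uint8)
    x = math.ceil(math.sqrt(len(data)))        # number of subsequences: x = ceil(sqrt(y))
    padded = np.zeros(x * x, dtype=np.uint8)   # zero padding for the last, shorter subsequence
    padded[:len(data)] = data
    grid = padded.reshape(x, x)                # one subsequence per row -> x-by-x matrix
    rgb = np.zeros((x, x, 3), dtype=np.uint8)
    strings = (grid >= 31) & (grid <= 127)     # ASCII string bytes get their own channel
    rgb[..., 0] = np.where(strings, grid, 0)   # assumed: string bytes -> red channel
    rgb[..., 1] = np.where(~strings, grid, 0)  # assumed: all other bytes -> green channel
    Image.fromarray(rgb, mode="RGB").save(out_path)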
3.3 CNN Image Classification
Deep learning models have been used to address the problem of Android malware detection. The VGG16, ResNet50, and InceptionV3 models are among the most widely used models for image classification tasks and have been applied in this context. These models have complex architectures that allow them to learn abstract features from images and detect patterns in the input data. VGG16 has a deep architecture that enables it to extract complex features from images. It is composed of 16 convolution and pooling layers, followed by three fully connected layers. VGG16 can learn image features at different levels of abstraction, from low-level features such as edges and contours to high-level features such as the shape and texture of objects. ResNet50 addresses the degradation problem in deep neural networks. This problem refers to the decrease in performance of a neural network as its depth
increases. ResNet50 uses residual connections, which allow the network to skip layers and transmit information from earlier layers to later ones. This reduces performance degradation and allows the network to be deeper. InceptionV3 is widely used due to its computationally efficient architecture. It uses Inception modules, which combine convolution layers of different filter sizes in parallel. This allows the network to capture features at different scales and levels of abstraction efficiently, without a high computational cost. InceptionV3 uses regularization techniques such as data augmentation and dimensionality reduction to avoid overfitting and improve model generalization. It also uses depthwise separable convolution layers, which are more computationally efficient than traditional convolutional layers. In summary, VGG16, ResNet50, and InceptionV3 are among the best options for pattern detection in images due to their innovative and computationally efficient architectures, their ability to efficiently capture features at different scales and levels of abstraction, their generalization capability to detect patterns in different types of images, their use of regularization techniques to prevent overfitting, and their availability as pre-trained models.
4 Experiments and Results

In this section, we describe the evaluation metrics used to measure the performance of our models and how they were calculated, so that readers can judge the validity and reliability of our findings.

4.1 Evaluation Method
We use the following metrics to assess the performance of the proposed method: Precision, Recall, and F1 score. Precision is defined as the number of true positives divided by the sum of true positives and false positives. It measures the proportion of positive predictions that are actually correct. This metric is important when the cost of a false positive is high, as it ensures that the model is correctly identifying the positive cases. Recall, on the other hand, is defined as the number of true positives divided by the sum of true positives and false negatives. It measures the proportion of actual positive cases that are correctly identified by the model. This metric is important when the cost of a false negative is high, as it ensures that the model is correctly identifying all positive cases. The F1 score is the harmonic mean of precision and recall and provides a way to balance both metrics. It is calculated as shown in Eq. (1). This metric is useful when precision and recall are both important and need to be balanced.

$F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$   (1)
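For reference, the three metrics can be computed directly with scikit-learn; the labels below are toy values for illustration only.

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0]   # toy ground truth (1 = malware)
y_pred = [1, 0, 1, 0, 0]   # toy model predictions
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(f1_score(y_true, y_pred))         # harmonic mean of the two, Eq. (1)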
4.2 Experiments
In this study, we explored the use of transfer learning to fine-tune pre-trained neural networks for our specific task. We conducted three separate experiments, each utilizing a different pre-trained network and training process. First, we employed the VGG16 network and trained it over 15 epochs, with 77 iterations each, using the SGD optimizer and a learning rate of 0.001; this resulted in an accuracy of 98.05% during the training process. Second, we applied transfer learning to the ResNet50 network, training it for 15 epochs with 69 iterations each, using the Adam optimizer and a learning rate of 0.01; the accuracy achieved during this training process was 97.34%. Last, we fine-tuned the pre-trained InceptionV3 network using transfer learning. The training process was carried out for 10 epochs, with 69 iterations each, using the SGD optimizer and a learning rate of 0.001; the resulting accuracy was 97.48%. Overall, these experiments demonstrate the effectiveness of transfer learning and the importance of choosing an appropriate pre-trained network and training process for a specific task.
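A minimal Keras sketch of the VGG16 transfer-learning setup described above might look as follows. Only the optimizer, learning rate, and number of epochs come from the text; the input resolution, the classification head, and the data pipeline are assumptions.

import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # freeze the pre-trained convolutional backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(256, activation="relu"),   # assumed head size
    tf.keras.layers.Dense(1, activation="sigmoid"),  # benign vs. malware
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=15)   # 15 epochs, as reported above

Freezing the backbone and training only the new head is one common fine-tuning strategy; unfreezing the top convolutional blocks afterwards is an equally plausible reading of the setup.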
4.3 Results
Once we completed the training of the three models, we carried out the validation process. During this process, we fed the models with a new set of data and obtained the prediction results. We then used these results to extract the confusion matrix for each model. The confusion matrix helps to evaluate the accuracy and consistency of each model's predictions across the validation data set. The confusion matrices for the three models are shown in Fig. 3.
Fig. 3. Performance comparison: Classification confusion matrix
Additionally, to further assess the performance of the models as mentioned in the previous section, we used Recall, Precision, and F1 Score as metrics. By using these metrics, we can gain a deeper understanding of the model’s strengths
and weaknesses and identify which model performs best on the validation data set. Table 1 shows the values for each of the models utilized. Based on the F1 score, precision, and recall metrics obtained for the three models, VGG16 performed the best overall. It achieved the highest F1 score of 0.995101, indicating a good balance between precision and recall, and the highest recall score of 0.997195, indicating that it correctly classified a higher percentage of actual positives than the other two models. ResNet50 had the lowest recall score of 0.987377, indicating a higher false negative rate compared to the other models.

Table 1. Models validation results

Model        Recall     Precision  F1 score
VGG16        0.997195   0.993017   0.995101
ResNet50     0.987377   0.965706   0.976422
InceptionV3  0.990182   0.967123   0.978517
On the other hand, InceptionV3 had results quite similar to ResNet50, indicating that it also had a higher rate of false positives compared to VGG16. When comparing ResNet50 and InceptionV3, we observed a slight improvement in the false negative rate, as InceptionV3 displayed better recall results. While it is challenging to determine which is more crucial in our case, false positives or false negatives, it is generally preferable to prevent the use of an application due to a false positive rather than risk device infection due to a false negative. Thus, we highlight InceptionV3 over ResNet50 due to its superior recall, albeit by a small margin. Overall, it is important to consider both precision and recall when evaluating a model's performance; the F1 score takes both metrics into account, providing a more comprehensive evaluation.
5 Conclusions and Further Work

In this contribution, we present the performance of three specific Convolutional Neural Network (CNN) models, namely VGG16, ResNet50, and InceptionV3, as detection models for Android malware. Our evaluation, conducted on a dataset of approximately 13,000 applications, demonstrates that the VGG16 model performs best, with scores above 99% for the F1 score, precision, and recall metrics. Another notable strength of VGG16 is its recall score, which indicates a low false negative rate in malicious app detection. The effectiveness of these models further confirms the value of the graphical representation of Android bytecodes, emphasizing that the conversion process from DEX files to images is executed without any loss of information.
In conclusion, our research proved the viability of using Convolutional Neural Networks (CNNs), specifically VGG16, ResNet50, and InceptionV3, to detect malware on Android by classifying images containing transformed bytecode sequences from the DEX files of the applications. Through the use of transfer learning and fine-tuning, we were able to achieve high accuracy rates in detecting malware samples, demonstrating the potential for this methodology to be applied in real-world scenarios. Finally, we hope our findings contribute to the ongoing efforts to improve malware detection and prevention for the Android ecosystem. In future work, we plan to explore the explainability of these networks to better understand their decision-making processes. This will allow us to classify and understand new malware mutations and address the issue of code obfuscation. Furthermore, we aim to test the method's effectiveness in detecting the various subcategories of malware, which could represent an advancement in the ability to classify and categorize malware samples.
References

1. Arp, D., Spreitzenbarth, M., Hübner, M., Gascon, H., Rieck, K.: Drebin: effective and explainable detection of android malware in your pocket. In: Proceedings 2014 Network and Distributed System Security Symposium. Internet Society (2014). https://doi.org/10.14722/ndss.2014.23247
2. Aung, Z., Zaw, W.T.: Permission-based android malware detection. Int. J. Sci. Technol. Res. 2, 228–234 (2013)
3. Chanajitt, R., Viriyasitavat, W., Choo, K.K.R.: Forensic analysis and security assessment of android m-banking apps. Aust. J. Forensic Sci. 50(1), 3–19 (2018). https://doi.org/10.1080/00450618.2016.1182589
4. Chen, J., Wang, C., Zhao, Z., Chen, K., Du, R., Ahn, G.J.: Uncovering the face of android ransomware: characterization and real-time detection. IEEE Trans. Inf. Forensics Secur. (2018). https://doi.org/10.1109/TIFS.2017.2787905
5. Damodaran, A., Troia, F.D., Visaggio, C.A., Austin, T.H., Stamp, M.: A comparison of static, dynamic, and hybrid analysis for malware detection. J. Comput. Virol. Hacking Tech. 13(1), 1–12 (2017). https://doi.org/10.1007/s11416-015-0261-z
6. Erturk, E.: A case study in open source software security and privacy: android adware. In: World Congress on Internet Security (WorldCIS-2012) (2012)
7. Hashemi, M.: Enlarging smaller images before inputting into convolutional neural network: zero-padding vs. interpolation. J. Big Data 6(1), 98 (2019). https://doi.org/10.1186/s40537-019-0263-7
8. Hegedus, J., Miche, Y., Ilin, A., Lendasse, A.: Methodology for behavioral-based malware analysis and detection using random projections and k-nearest neighbors classifiers. In: 2011 Seventh International Conference on Computational Intelligence and Security, pp. 1016–1023 (2011). https://doi.org/10.1109/CIS.2011.227
9. Iadarola, G., Martinelli, F., Mercaldo, F., Santone, A.: Formal methods for android banking malware analysis and detection. In: 2019 Sixth International Conference on Internet of Things: Systems, Management and Security (IOTSMS), pp. 331–336 (2019). https://doi.org/10.1109/IOTSMS48152.2019.8939172
10. Jeong, Y.S., Lee, H.T., Cho, S.J., Han, S., Park, M.: A kernel-based monitoring approach for analyzing malicious behavior on android. In: Proceedings of the 29th Annual ACM Symposium on Applied Computing, pp. 1737–1738. Association for Computing Machinery, New York (2014). https://doi.org/10.1145/2554850.2559915
11. Khariwal, K., Singh, J., Arora, A.: IPDroid: android malware detection using intents and permissions. In: 2020 4th Conference on Smart Trends in System Security and Sustainability (WorldS4), pp. 197–202 (2020). https://doi.org/10.1109/WorldS450073.2020.9210414
12. Mahdavifar, S., Abdul Kadir, A.F., Fatemi, R., Alhadidi, D., Ghorbani, A.A.: Dynamic android malware category classification using semi-supervised deep learning. In: 2020 IEEE International Conference on Cyber Science and Technology Congress (CyberSciTech) (2020). https://doi.org/10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00094
13. Martinelli, F., Mercaldo, F., Nardone, V., Santone, A.: Twinkle twinkle little droiddream, how I wonder what you are? In: 2017 IEEE Workshop on Metrology for AeroSpace (MetroAeroSpace) (2017). https://doi.org/10.1109/MetroAeroSpace.2017.7999579
14. Or-Meir, O., Nissim, N., Elovici, Y., Rokach, L.: Dynamic malware analysis in the modern era – a state of the art survey. ACM (2019). https://doi.org/10.1145/3329786
15. Saad, M.H., Serageldin, A., Salama, G.I.: Android spyware disease and medication. In: 2015 Second International Conference on Information Security and Cyber Forensics (InfoSec), pp. 118–125 (2015). https://doi.org/10.1109/InfoSec.2015.7435516
16. Santos, I., Brezo, F., Nieves, J., Penya, Y.K., Sanz, B., Laorden, C., Bringas, P.G.: Idea: opcode-sequence-based malware detection. In: Massacci, F., Wallach, D., Zannone, N. (eds.) ESSoS 2010. LNCS, vol. 5965, pp. 35–43. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-11747-3_3
17. Sanz, B., Santos, I., Laorden, C., Ugarte-Pedrero, X., Bringas, P.G., Álvarez, G.: PUMA: permission usage to detect malware in android. In: International Joint Conference CISIS'12-ICEUTE'12-SOCO'12 Special Sessions, pp. 289–298. Springer, Berlin, Heidelberg (2013). https://doi.org/10.1007/978-3-642-33018-6_30
18. Shabtai, A., Kanonov, U., Elovici, Y., Glezer, C., Weiss, Y.: Andromaly: a behavioral malware detection framework for android devices. J. Intell. Inf. Syst. 38(1), 161–190 (2012). https://doi.org/10.1007/s10844-010-0148-x
19. Sharma, A., Dash, S.K.: Mining API calls and permissions for android malware detection. In: Gritzalis, D., Kiayias, A., Askoxylakis, I. (eds.) Cryptology and Network Security, pp. 191–205. Springer International Publishing, Cham (2014). https://doi.org/10.1007/978-3-319-12280-9_13
20. Statista: Global mobile OS market share 2023 (2023). https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009/
21. Su, X., Chuah, M., Tan, G.: Smartphone dual defense protection framework: detecting malicious applications in android markets. In: 2012 8th International Conference on Mobile Ad-hoc and Sensor Networks (MSN) (2012). https://doi.org/10.1109/MSN.2012.43
22. Vidyarthi, D., Kumar, C., Rakshit, S., Chansarkar, S.: Static malware analysis to identify ransomware properties. Int. J. Comput. Sci. Issues 16(3), 10–17 (2019). https://doi.org/10.5281/zenodo.3252963
23. Wu, D.J., Mao, C.H., Wei, T.E., Lee, H.M., Wu, K.P.: DroidMat: android malware detection through manifest and API calls tracing. In: 2012 Seventh Asia Joint Conference on Information Security, pp. 62–69 (2012)
24. Yerima, S.Y., Sezer, S., McWilliams, G., Muttik, I.: A new android malware detection approach using Bayesian classification. In: 2013 IEEE 27th International Conference on Advanced Information Networking and Applications (AINA), pp. 121–128 (2013). https://doi.org/10.1109/AINA.2013.88
Prediction of Water Usage for Advanced Metering Infrastructure Network with Intelligent Water Meters

Łukasz Saganowski and Tomasz Andrysiak(B)

Institute of Telecommunications and Computer Science, Bydgoszcz University of Science and Technology, Al. Prof. S. Kaliskiego 7, 85-796 Bydgoszcz, Poland
{luksag,andrys}@pbs.edu.pl
Abstract. In this article, we present a prediction algorithm applied to a real-world Advanced Metering Infrastructure consisting of intelligent water meters. In the first step of the algorithm, the time series are passed to outlier detection in order to remove possible disturbing values. The prediction process is carried out with the use of four types of machine learning and deep learning algorithms. The proposed solution is evaluated on real-world univariate time series taken from multi-family houses. The experimental results confirm that the presented solutions are both efficient and flexible.

Keywords: Intelligent water meters · univariate time series analysis · outlier detection · prediction · neural networks · critical waterworks infrastructure
1 Introduction

One of the components of Smart Cities (an idea that is constantly introduced and improved as part of the future city vision) are Intelligent Water Systems (IWSs). In this context, water network intelligence is to be assessed through Automatic Meter Reading (AMR) along with machine learning methods for analyzing and predicting the registered time series, possibly in real time [1]. Such a solution can be found in Advanced Metering Infrastructure (AMI), within which, at the lowest level, there exist intelligent meters – the latest generation of water consumption measurement systems providing two-way remote communication, along with a data collection system. Such solutions should ensure accurate measurement of water consumption, an integrated data transfer system, and an information environment adjusted to the amount of processed data. Moreover, they should provide an effective system of Automated Meter Management (AMM). The central part of the above-mentioned AMI is Meter Data Management (MDM) or a Meter Data Management System (MDMS) [2]. Prediction of univariate time series with the use of classic machine learning and deep learning algorithms is not an easy and obvious task. This comes from the fact that using a more sophisticated deep learning algorithm is not by itself the key to achieving good
predictions for univariate time series. Classical prediction algorithms like ETS (Exponential Smoothing) and ARIMA (Auto-Regressive Integrated Moving Average) may still achieve better results for, e.g., one-step predictions on selected types of time series [3]. But machine learning and deep learning algorithms may achieve better results in applications where the examined signal has a complex, sophisticated and time-varying structure, strong noise, missing observations (as in our case of an AMI water meter network), or relationships between time series [4]. That is why there is room for further investigation into predictions for such types of univariate time series. In our case we have to choose the algorithm carefully, because it is easy to obtain unsatisfactory prediction results. This paper is organized as follows: after the introduction, Sect. 2 presents in detail the forecasting of time series for water flow consumption and the calculation methodology of the neural networks. The article finishes with experimental results and conclusions.
2 Prediction Algorithm for Water Usage Univariate Time Series in AMI Network

The proposed prediction algorithm was applied to a real-world AMI Advanced Metering Infrastructure consisting of intelligent water meters. The major steps of the prediction algorithm are presented in Fig. 1. Every water meter contains an LTE/GPRS communication module for sending telemetric measurements of water flows.
Fig. 1. Major steps of prediction algorithm for AMI Advanced Metering Infrastructure with intelligent water meters.
Additionally, the water meters contain a micro water turbine that charges supercapacitors and accumulators. Measurements from the AMI network are sent to a remote server
where water flow measurements are aggregated into univariate time series (series whose values appear at constant time intervals). These time series are used by the prediction algorithm. In the first step of the algorithm, the time series are passed to outlier detection (see Sect. 2.1) in order to remove possible disturbing values. Such samples have a negative impact on prediction, especially for algorithms based on machine learning techniques; the learning process of neural networks can also be disturbed, which degrades the quality of the obtained model.

2.1 Outliers' Detection and Elimination Based on Isolation Forest Algorithm

In the proposed solution, the identification of outliers in the analyzed time series of water utilization is performed with the use of the Isolation Forest (IF) algorithm [5]. There are two stages of anomaly detection using the IF algorithm. The first of them, named the training step, builds isolation trees by recursively separating the training set until an instance is isolated or a tree height limit is reached. Note that the tree height limit is automatically derived from the subsample size and is roughly the average tree height. The reason the trees grow only to an average height is that we concentrate on data points with shorter-than-average path lengths, because such points are more vulnerable to being anomalies. The second stage, which we call the testing stage, is based on running the test instances through the isolation trees to acquire an anomaly score for each example. The detected anomalies depend on the expected path length for each test instance, where path lengths are determined by traversing each individual tree in the isolation forest. Searching for the strongest anomalies then simply amounts to sorting the data in descending order of anomaly score; the first cases are the biggest anomalies. Detecting and eliminating such outliers from the analyzed data set is the basis for a correct forecasting procedure (a minimal code sketch of this step is given after the MLP description below).

2.2 The Water Flows Time Series Forecasting Using Neural Networks

In the forecasting of future values of the analyzed time series, approaches based on different types of artificial neural networks, which acquire their ability to predict through learning processes, are used more and more often.

The Multilayer Perceptron Neural Network
The Multilayer Perceptron (MLP) is one of the most common neural networks and one that much research focuses on. It is comprised of a series of fully interconnected layers of nodes, where connections are present only between adjacent layers. The input of the first layer (the input layer) is composed of the various attribute values. The output of the nodes in the input layer, multiplied by the weights attached to the links, is passed to the nodes in the hidden layer. A hidden node then collects the incoming weighted output values of the previous layer; moreover, it also receives the weighted value of a bias node. The total of the weighted input values is passed through a nonlinear activation function. The only prerequisites are that the output values of the function are bounded to an interval and that the nonlinear function is differentiable. The output of a node in the hidden layer is fed into the nodes of the output layer, and to each node in the output layer a class label is attached [6].
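As mentioned at the end of Sect. 2.1, the outlier-removal step can be sketched with scikit-learn's IsolationForest; the contamination rate below is an assumed value, not one reported in this paper.

import numpy as np
from sklearn.ensemble import IsolationForest

def remove_outliers(series, contamination=0.01):
    values = np.asarray(series, dtype=float).reshape(-1, 1)
    # fit isolation trees and score every sample: -1 = outlier, +1 = inlier
    labels = IsolationForest(contamination=contamination,
                             random_state=0).fit_predict(values)
    return values[labels == 1].ravel()  # keep only the inliers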
The forecasting process of future values of the analyzed time series is realized in two stages. In the first one, we examine and test the proper operation of the constructed MLP network on the basis of learning and testing data sets. In the learning processes we usually use the back-propagation method to minimize the prediction error. The role of the testing tasks, on the other hand, is to verify whether the learning processes are running correctly. In the second stage, we feed the learned MLP network with values of the analyzed time series in order to predict its future values [7].

The One-Dimensional Convolutional Neural Network
One-dimensional convolutional neural networks (1D CNNs) [8] are a specific type of Artificial Neural Network (ANN) built from many deep layers that use convolutional and pooling strategies [9]. A 1D CNN typically uses neural layers of matching dimensionality to solve prediction problems whose inputs undergo a feature learning process. Creating a CNN starts with the configuration of external parameters called hyper-parameters [10]; adjusting the hyper-parameters is important to obtain a better model with high predictive power. It should be noted that compact 1D CNNs show better performance in those applications that have limited labeled data and large signal fluctuations. Feature extraction and classification are fused into a single process that can be optimized to maximize classification performance. This is the main advantage of 1D CNNs, which also have small computational complexity, since the only costly operation is a sequence of 1D convolutions, which are just linearly weighted sums of 1D arrays; such linear operations can also be performed efficiently during back-propagation [11].

The Long Short-Term Memory Network and Encoder-Decoder Deep Learning Algorithm
Among the numerous types of artificial neural networks, the most effective tools for forecasting future values of time series are currently becoming neural networks in which there is a feedback loop. The loop is subject to the same rules as all other inputs, i.e., weighting and backward error propagation. The state of individual neurons then depends not only on the input data but also on the previous state of the network, which theoretically enables the network to keep information within its structure between consecutive iterations, so in a way it works as a kind of memory. Unrolled in time, each neuron with a feedback loop becomes a kind of expansion of all its previous states, corrected by weighting factors. These factors multiply, and if the chain is long, the result of such an operation quickly approaches zero or infinity. This phenomenon, known as gradient decay or explosion, in practice means that for a longer sequence the network is unable to learn any valuable patterns [12]. The solution to this problem was published in [13]. It consisted in integrating blocks with non-volatile memory and additional switches controlling the data flow into the structure of each neuron. The new network was dubbed Long Short-Term Memory (LSTM), a network that is resistant to "forgetting" and is able to remember patterns. In the process of compiling the LSTM neural network model we use the mean square error as the loss function and Adaptive Moment Estimation (ADAM) as the optimization algorithm [14]. Then, we use one-dimensional time series for the supervised learning process of
the LSTM neural network model. In the process of adapting the model we use the LSTM neural network for prediction of future values of the analyzed time series. A detailed description of the methods and techniques used to estimate the proposed LSTM network, such as the selection of parameters, the network structure and the description of the learning processes, can be found in [15]. It also describes the subsequent stages of the required data transformation adopted for the proposed solution. The deep learning LSTM encoder-decoder model was initially developed for natural language processing problems. The novelty of this approach stems from the fact that the model architecture uses a fixed-size internal representation at the core of the algorithm. The architecture consists of a first model for reading an input sequence and encoding that sequence into a fixed-size vector, and a second model responsible for decoding the fixed-size vector and computing the predicted sequence [14, 15].

2.3 The Condition of Neural Network Model's Update

It is highly probable that the nature and character of water flow implies the possibility of significant data variability in the analyzed time series. The reasons for this phenomenon should be sought in possible seasonal changes in water consumption by residents or in the occurrence of unpredictable external factors (e.g. a pandemic). Therefore, the following statistical condition is formulated, the fulfillment of which triggers the procedure of recursive learning of the neural network on a new data set:

$x_i \notin (\mu - 3\sigma, \mu + 3\sigma), \quad i = 1, 2, \ldots, n$   (1)
where {x_1, x_2, …, x_n} is the time series limited by the n-element analysis window, μ is the mean estimated from the forecasts of the neural network in the analysis window, and σ is the standard deviation of the elements of the examined time series with respect to that mean.
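A minimal sketch of this update condition, assuming the analysis window and the corresponding network forecasts are available as arrays:

import numpy as np

def needs_retraining(window, forecasts):
    x = np.asarray(window, dtype=float)
    mu = float(np.mean(forecasts))                  # mean of the network forecasts
    sigma = float(np.sqrt(np.mean((x - mu) ** 2)))  # deviation of the series w.r.t. that mean
    # Eq. (1): any observation outside (mu - 3*sigma, mu + 3*sigma) triggers relearning
    return bool(np.any((x <= mu - 3 * sigma) | (x >= mu + 3 * sigma)))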
3 Experimental Results

Predictions of the WF water flow time series in the proposed algorithm were obtained for a real-world experimental AMI network of intelligent water meters (see Fig. 2). The network consists of 100 intelligent water meters localized in two distant cities in Poland – Bydgoszcz (46 meters) and Świdnica (54 meters). IP packets with water flow measurements are gathered by a remote server, which is responsible for the creation of univariate time series (series whose values appear at constant time intervals), online model creation in sliding windows of water flow samples, and 3-step-ahead predictions. We examined four types of machine learning and deep learning algorithms: MLP, 1D-CNN, LSTM and Encoder-Decoder LSTM. We chose these algorithms because of their potential abilities in predicting time series. Selecting and checking algorithms for univariate time series is an important step, because using deep learning algorithms is not by itself a key to achieving good results in comparison with, for example, classic prediction algorithms like ARIMA or ARFIMA [16]. Taking the Encoder-Decoder LSTM model as an example, we first performed mandatory steps like splitting the input time series into the desired data shape. Next, we defined the encoder with an
Fig. 2. Block scheme of the test bed where water usage univariate time series are aggregated from intelligent water meters localized in two AMI networks spread across two cities.
LSTM hidden layer (we checked configurations from 100 to 200 units). As the activation function we used the Rectified Linear Unit (ReLU). The sequence of vectors is then passed to the LSTM decoder (with an LSTM hidden layer consisting of 100 to 200 units). At the end we define the output model: each time step from the decoder is processed by a fully connected layer and an output layer. For prediction we used 10 input time steps and 3 output steps. Traffic for predictions is taken from one multi-family house. The WF water flow [l/min], representing water usage, is gathered at one-minute intervals (some water flow samples have 0 values in case of 0 [l/min] water usage or when there was no communication). The univariate time series used for predictions were divided into one-day vectors of values (example time series are presented in Figs. 3 and 5). We selected random days representing water usage in winter, spring and summer in order to evaluate the performance of the four machine learning algorithms. After the outlier detection process (see Sect. 2.1), we recalculate online models for the given algorithms and subsequently calculate 3-step predictions. Enlarged 3-step predictions for each algorithm are presented in Figs. 4(a–d) and 6(a–d). We can see that better predictions are usually achieved for the first two prediction steps and worse for the third; this comes from the fact that for more distant prediction steps the prediction errors accumulate. Additionally, we calculated prediction intervals (see Sect. 2.3), which can be used by the water supplier's maintenance staff to evaluate the state of the examined infrastructure or, for example, to detect water leaks. In the predicted signal we can notice a certain cyclicality due to the daily life cycle of people and the water intake correlated with this fact. At the same time, the water flow signal has a sophisticated and time-varying structure, strong noise, and missing observations. This causes additional problems in selecting a proper machine learning algorithm for prediction purposes in a smart water meter network. In order to evaluate the prediction accuracy of the examined algorithms, we used two measures: Root Mean Square Error (RMSE) and Scatter Index (SI [%]) [16]. Lower values of both measures represent better prediction accuracy. In the case of the Scatter Index SI [%] (defined as the RMSE divided by the mean of the observations, times 100% [17]), we can
Fig. 3. Example of the WF water flow [l/min] traffic feature representing one day (in winter) of water usage for a single multi-family house, together with 3-step predictions (green dots) for the Encoder-Decoder LSTM based algorithm.
Fig. 4. Enlarged view of the 3-step predictions calculated for the time series from Fig. 3 by different machine learning algorithms: a) LSTM, b) Encoder-Decoder LSTM, c) 1D-CNN, d) MLP.
state that SI values approximately up to 20 mean that the model is good enough for prediction (boundary values for model usage depend, of course, on the accuracy required for a given application). In Table 1 we can see the prediction accuracy achieved by the MLP, 1D-CNN, LSTM and Encoder-Decoder LSTM neural network models. Predictions were obtained for one multi-family house, and the time series represent a randomly selected day from the winter months, one day from the spring months and one day from the summer months. We can observe that on average the best results were achieved by the Encoder-Decoder LSTM, followed by the stacked LSTM (a model with multiple hidden LSTM layers and multiple memory cells in every layer), the 1D-
Fig. 5. Example of the WF water flow [l/min] traffic feature representing one day (in summer) of water usage for a single multi-family house, together with 3-step predictions (green dots) for the Encoder-Decoder LSTM based algorithm.
Fig. 6. Enlarged view of the 3-step predictions calculated for the time series from Fig. 5 by different machine learning algorithms: a) LSTM, b) Encoder-Decoder LSTM, c) 1D-CNN, d) MLP.
CNN model and, finally, the MLP (Multilayer Perceptron) network model. The lowest values of RMSE (0.359) and SI (8.30%) were achieved by the Encoder-Decoder LSTM network, followed by the stacked LSTM with an RMSE of 0.614 and an SI of 14.18%. Additionally, the small absolute values of the WF water flow [l/min] time series obtained from one multi-family house mean that it is more difficult to achieve an acceptable level of prediction error than in cases where the time series values are significantly bigger (e.g. 10 times or more). To sum up, machine learning and deep learning algorithms can be used for the prediction of time series that have an irregular temporal structure, a noisy character or missing values. The achieved results can be useful for the maintenance staff of a water delivery company for estimating the state of the infrastructure or for water leakage searching purposes.
Table 1. Prediction accuracy comparisons for the WF water flow [l/min] time series representing one random day in winter, spring or summer, calculated for the LSTM, Encoder-Decoder LSTM, 1D-CNN and MLP neural network models. Comparisons are based on 4 consecutive runs of the algorithm. The calculated parameters are RMSE and SI.

Prediction for WF water   LSTM    Enc-Dec     1D-CNN  MLP     LSTM    Enc-Dec      1D-CNN  MLP
flow [l/min] time series  RMSE    LSTM RMSE   RMSE    RMSE    SI [%]  LSTM SI [%]  SI [%]  SI [%]
One day in winter         0.614   0.359       0.814   0.811   14.18   8.30         18.80   18.72
One day in spring         1.05    1.343       1.126   1.424   15.75   20.15        16.90   21.37
One day in summer         0.773   0.393       1.08    1.219   21.09   10.74        29.46   33.24
The solution was evaluated on a real-world AMI Advanced Metering Infrastructure network consisting of intelligent water meters, and it can be extended for use at nodes where the water flows of many installations are summed.
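For illustration, here is a minimal Keras sketch of the encoder-decoder LSTM configuration reported above (10 input steps, 3 output steps, 100-200 hidden units, ReLU activations, MSE loss with the ADAM optimizer); everything beyond those reported settings is our assumption.

import tensorflow as tf

n_in, n_out, units = 10, 3, 100  # 100-200 units were checked in the experiments
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_in, 1)),
    tf.keras.layers.LSTM(units, activation="relu"),             # encoder
    tf.keras.layers.RepeatVector(n_out),                        # fixed-size code, repeated per output step
    tf.keras.layers.LSTM(units, activation="relu",
                         return_sequences=True),                # decoder
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1)),  # fully connected output per time step
])
model.compile(optimizer="adam", loss="mse")

The RepeatVector layer plays the role of the fixed-size internal representation described in Sect. 2.2: the encoder compresses the input window into one vector, which the decoder unrolls into the 3 predicted steps.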
4 Conclusion

In this article we proposed a solution for the prediction of WF water flows in an AMI Advanced Metering Infrastructure consisting of intelligent water meters. The algorithm was evaluated on real-world univariate time series taken from a multi-family house. Prediction of univariate time series by means of machine learning and deep learning algorithms still needs further investigation, because the performance of the achieved predictions depends on the character of the signals representing the univariate time series, and such algorithms are not always the best choice for every application. We examined four machine learning and deep learning algorithms: MLP, 1D-CNN, LSTM and Encoder-Decoder LSTM. The best average prediction results, measured by means of RMSE and the Scatter Index SI [%], were obtained for the Encoder-Decoder LSTM (RMSE 0.359, SI 8.30%) and the stacked LSTM (RMSE 0.614, SI 14.18%). The proposed solution can be adopted by water delivery companies for estimating the state of the infrastructure or for the water leakage searching process.
References

1. Billewicz, K.: Smart Metering. PWN (2020)
2. Blokdyk, G.: Advanced Metering Infrastructure (AMI). 5STARCooks (2022)
3. Makridakis, S., Spiliotis, E., Assimakopoulos, V.: Statistical and machine learning forecasting methods: concerns and ways forward. PLoS ONE 13(3), e0194889 (2018)
4. Makridakis, S., Spiliotis, E., Assimakopoulos, V.: The M4 competition: results, findings, conclusion and way forward. J. Forecast. 34(4), 802–808 (2018)
5. Fei, T.L., Kai, M.T., Zhi-Hua, Z.: Isolation forest. In: Eighth IEEE International Conference on Data Mining, pp. 413–422 (2008)
6. Makridakis, S., Spiliotis, E., Assimakopoulos, V.: Statistical and machine learning forecasting methods: concerns and ways forward. PLoS ONE 13, e0194889 (2018)
7. Shiblee, M., Kalra, P.K., Chandra, B.: Time series prediction with multilayer perceptron (MLP): a new generalized error based approach. In: Köppen, M., Kasabov, N., Coghill, G. (eds.) ICONIP 2008. LNCS, vol. 5507, pp. 37–44. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03040-6_5
8. Kiranyaz, S., Avci, O., Abdeljaber, O., Ince, T., Gabbouj, M.: 1D convolutional neural networks and applications: a survey. Mech. Syst. Signal Process. 151, 107398 (2019)
9. Yoo, Y.: Hyperparameter optimization of deep neural network using univariate dynamic encoding algorithm for searches. Knowl.-Based Syst. 178, 74–83 (2019)
10. Aszemi, N.M., Dominic, P.: Hyperparameter optimization in convolutional neural network using genetic algorithms. Int. J. Adv. Comput. Sci. 10, 269–278 (2019)
11. Harbola, S., Coors, V.: One dimensional convolutional neural network architectures for wind prediction. Energy Convers. Manage. 195, 70–75 (2019)
12. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
13. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
14. Brownlee, J.: Long Short-Term Memory Networks with Python: Develop Sequence Prediction Models with Deep Learning. Machine Learning Mastery (2018)
15. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1724–1734. Association for Computational Linguistics, Doha (2014)
16. Du, S., Li, T., Yang, Y., Horng, S.J.: Multivariate time series forecasting via attention-based encoder-decoder framework. Neurocomputing 388, 269–279 (2020)
17. Bryant, M.A., Hesser, T.J., Jensen, R.E.: Evaluation statistics computed for the wave information studies (WIS). Technical Report ERDC/CHL CHETN-I-91 (2016)
Phishing URL Detection with Prototypical Neural Network Disentangled by Triplet Sampling

Seok-Jun Bu and Sung-Bae Cho(B)

Department of Computer Science, Yonsei University, Seoul 03722, Republic of Korea
{sjbuhan,sbcho}@yonsei.ac.kr
Abstract. Phishing attacks continue to pose a significant threat to internet security, with phishing URLs being among the most prevalent attack vectors. Detecting these URLs is challenging, as attackers constantly evolve their tactics. Few-shot learning has emerged as a promising approach for learning from limited data, making it well suited to the task of phishing URL detection. In this paper, we propose a disentangled prototypical network (DPN), trained with triplet sampling, that learns disentangled URL prototypes to improve the accuracy of phishing detection with limited data. The key idea is to capture the underlying structure and characteristics of URLs, making the method highly effective in detecting phishing URLs. The method involves sampling triplets of anchor, positive, and negative URLs to train the network, which encourages the embedding space to be more separable between phishing and benign URLs. To evaluate the proposed method, we have collected and assessed a real-world dataset consisting of one million URLs, and additionally utilized two benchmark URL datasets. Our method outperforms the state-of-the-art models, achieving accuracies of 98.0% in a 2-way 50-shot task and 98.32% in a 2-way 5000-shot task. Moreover, the experiments highlight the advantages of using a disentangled representation of URLs, where t-SNE visualizations reveal distinct and well-separated URL prototypes.

Keywords: Phishing URL detection · Few-shot learning · Prototypical network · Triplet sampling
1 Introduction

Phishing attacks have become a major concern for internet security, as cybercriminals increasingly employ phishing URLs as a primary attack source [1]. These attacks often involve deceiving users into clicking on malicious links or providing sensitive information, leading to financial loss or unauthorized access to personal data [2]. The rapidly evolving tactics of attackers make the detection of phishing URLs an ongoing challenge [3, 4]. Traditional machine learning approaches often require a large amount of labeled data to achieve satisfactory performance. However, obtaining an extensive and diverse labeled dataset of phishing URLs is difficult and time-consuming, as attackers continuously adapt their methods and generate new phishing sites [5].
Few-shot learning has emerged as a promising approach for addressing the challenge of learning from limited data [6]. By leveraging prior knowledge gained from related tasks, few-shot learning algorithms can generalize to new tasks with a small number of labeled examples. This makes few-shot learning particularly suitable for the task of phishing URL detection, where data scarcity is a common issue. In recent years, several few-shot learning methods have been proposed [7, 8]. Among the most prominent, prototypical networks [9–11] have demonstrated promising performance in various few-shot learning tasks due to their ability to capture representative samples for each class. In this paper, we propose a novel method called the disentangled prototypical network (DPN) to detect phishing URLs, based on a triplet-trained prototypical network that learns disentangled representations. This method aims to capture the underlying structure and disentangled prototypes of URLs, making it highly effective for detecting such malicious links with limited samples. We compute triplets of anchor, positive, and negative URLs to train the network, which encourages greater separation in the embedding space between phishing and benign URLs. This approach allows the network to learn more robust and discriminative representations, leading to improved performance. To evaluate the effectiveness of the proposed method, we have collected and assessed a real-world dataset consisting of one million URLs, and additionally utilized two benchmark URL datasets. The method will be compared to the state-of-the-art models in a 2-way 50-shot task and a 2-way 5000-shot task. Furthermore, t-SNE visualizations of the disentangled prototypical network will reveal distinct and well-separated URL prototypes, confirming whether the disentangled prototypes of phishing and benign URLs are significantly more discernible compared to those generated by the conventional methods. We will investigate the benefits of using a disentangled prototypical network for phishing URL detection, offering an effective and efficient solution to address the ongoing threat of phishing attacks.

Table 1. Related works on phishing URL classification including zero-day phishing URL classification

Defense mechanism | URL representation | Method | Dataset | Acc | Recall
URL classification | Raw text: char.-/word-level embedding concatenation | URLNet: CNN [12] | VirusTotal | 0.9929 | –
URL classification | Raw text | Texception: CNN modification with Inception-style blocks [13] | MS anonymized URLs | – | 0.9972
URL classification | URL with screenshots | OCR with ResNet, homomorphic encryption [14] | MS URL screenshots | 0.9057 | –
Zero-day phishing URL classification: anomaly detection | Char.-level encoding | Convolutional autoencoder based anomaly detection [15] | PhishTank, PhishStorm, ISCX-URL-2016 | 0.9610 | 0.9421
Zero-day phishing URL classification: data augmentation | Selected URL features | URL augmentation with text GAN [4] | PhishTank | 0.701 (AUC) | –
Zero-day phishing URL classification: data augmentation | Raw text | Adversarial label learning [16] | PhishTank | 0.8990 | –
Zero-day phishing URL classification: detection rules with deep learning model | Address bar, HTTP request, Domain, Script, Char.-level encoding | CNN-LSTM with GA-optimized detection rules [17] | PhishTank, PhishStorm, ISCX-URL-2016 | 0.9668 | 0.9392
Zero-day phishing URL classification: detection rules with deep learning model | Address bar, HTTP request, Domain, Script, Char.-level encoding | Logic-integrated triplet network for zero-day phishing detection [18] | PhishTank, PhishStorm, ISCX-URL-2016 | 0.9740 | 0.9465
2 Related Works

In recent years, there has been significant progress in malicious URL detection, particularly in applying machine learning and deep learning techniques to improve the performance of detection systems. This section provides a review of the relevant literature, focusing on the challenges and advances made in phishing URL detection and few-shot learning methods. Table 1 summarizes the key studies that have investigated the modeling and representation of URLs, highlighting the different classification approaches used. Traditional feature-based methods focus on extracting specific features from the URL string, such as lexical properties, domain information, and expert-crafted features. These features are then used as inputs for machine learning models, such as support vector machines (SVMs). However, these methods suffer from a few limitations, including the inability to capture semantic meaning and sequential patterns in URL strings, the need for extensive manual feature engineering, and poor generalization to unseen data.
To address these limitations, deep learning-based methods have been proposed to automatically learn URL representations without relying on manual feature engineering. These methods employ various deep learning architectures, such as convolutional neural networks (CNNs) [12, 13], autoencoders [5, 15], and generative adversarial networks (GANs) [4]. The success of these methods can be attributed to their ability to capture complex patterns in URL strings, adapt to new types of attacks, and generalize well to unseen data. However, these deep learning-based methods face challenges in handling class imbalance with limited samples. To tackle these issues, researchers have proposed oversampling techniques based on generative deep learning methods and integrating deep learning with domain knowledge and first-order logic constraints [18]. Few-shot learning has emerged as a promising approach to address the limitations of traditional supervised learning, where a large amount of labeled data is required for training. In the context of phishing URL detection, few-shot learning can be particularly beneficial, given the evolving nature of phishing attacks and the scarcity of labeled data. The prototypical network [9] is a prominent few-shot learning method that learns a metric space in which classification can be conducted using the simple nearest-neighbor rule. In this method, a neural network is trained to learn a meaningful embedding space, where each class is represented by a single prototype. During the few-shot learning task, new classes can be incorporated with only a few labeled examples, and the model can generalize well to new instances of these classes. The proposed method aims to build upon the strengths of the prototypical network by disentangling the learned embedding space. By leveraging the strengths of both few-shot learning and disentangled representation learning, the method seeks to produce a more effective phishing URL detection system that can quickly adapt to new types of attacks with limited labeled data.
Fig. 1. Overview of the disentangled prototypical network (DPN) architecture.
3 The Proposed Method

3.1 Overview

The disentangled prototypical network is a method designed to address the challenges of phishing URL detection in a few-shot learning setting. The core idea of this method is to find representative samples (prototypes) of benign and phishing URLs and leverage them for classification. Unlike conventional prototypical networks, we enhance the discriminative power of the learned representations by enforcing a large separation between the prototypes through disentangled representation learning. As described in Fig. 1, the architecture of the proposed method is composed of three key components: (1) a feature extraction module that converts raw URL inputs into compact feature representations, (2) a triplet sampling and learning process that disentangles the URL embedding by imposing a margin between positive and negative examples, and (3) a disentangled prototypical loss that computes class prototypes and classifies URLs based on their proximity to these prototypes. The synergy of these components results in a method that can effectively learn from limited labeled data while ensuring accurate classification.

3.2 URL Prototypes for Few-Shot Phishing Classification

In this section, we introduce the concept of URL prototypes as a powerful tool to address the challenges of few-shot phishing URL detection. The disentangled prototypical network focuses on learning URL prototypes that represent each class effectively in the feature space. This is achieved by training the model to minimize the distance between the feature vector and the corresponding class prototype while maximizing the distance to the prototypes of other classes. This process results in more discriminative and better-separated class prototypes, which enhance the performance of the classifier in a few-shot setting. The utilization of URL prototypes is of great importance for phishing detection tasks, especially when dealing with limited samples. By using prototype-based approaches, we can effectively represent each class by a single representative point in the feature space. This enables us to perform classification even with very few labeled examples per class, which is particularly beneficial for addressing the challenges of few-shot phishing URL detection. We calculate the class prototype $P_c$ for class c as the average feature vector of all the URLs belonging to the class. Here, $f(x)$ represents the feature vector for a URL x, and $|C_c|$ denotes the number of URLs in class c:

$P_c = \frac{1}{|C_c|} \sum_{x \in C_c} f(x)$   (1)
To obtain the feature vectors for URLs, we employ a character-level convolutional network for URL encoding. This approach enables us to capture the local structure and patterns within URLs, which is essential for accurately distinguishing between benign and phishing URLs.
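A minimal sketch of such a character-level convolutional URL encoder is given below; the vocabulary size, maximum URL length, and filter settings are assumptions, as they are not specified at this point in the paper.

import tensorflow as tf

max_len, vocab, emb_dim = 200, 100, 32  # assumed encoding parameters
encoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(max_len,)),             # URL as a sequence of character ids
    tf.keras.layers.Embedding(vocab, emb_dim),
    tf.keras.layers.Conv1D(128, 5, activation="relu"),   # local character n-gram patterns
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64),                           # feature vector f(x)
])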
The prototypical loss function, $L_{proto}$, is used to train the model to learn URL prototypes for each class. The loss function encourages the model to minimize the distance d between the feature vector $f(x)$ and the corresponding class prototype $P_c$ while maximizing the distance to the prototypes of the other classes $P_{c'}$:

$L_{proto} = -\log \frac{\exp(-d(f(x), P_c))}{\sum_{c'} \exp(-d(f(x), P_{c'}))}$   (2)
3.3 Disentangling URL Prototypes Based on Triplet Learning

The disentangled prototypes of the DPN allow for more effective learning of discriminative feature representations, which can be beneficial in the context of few-shot phishing URL detection. By leveraging triplet learning, we can enforce a higher degree of separation between URL prototypes, resulting in a more robust classification model. One key aspect of the proposed method is the integration of a disentangled representation learning process into the prototypical network framework. As described in Algorithm 1, the method not only seeks to optimize the distance between URL prototypes within each class but also enforces a strict separation among prototypes belonging to different classes. The triplet sampling process plays a crucial role in the disentangling process, as it generates batches of triplets consisting of an anchor URL, a positive URL, and a negative URL for training the model. By training on these carefully selected triplets, we encourage the model to learn a feature space that maximizes the inter-class distance while minimizing the intra-class distance. Triplet learning is a powerful method for learning discriminative feature representations by comparing the relationships between sets of three instances (anchor a, positive p, and negative n):

$L_{triplet} = \max(0, D_{a,p} - D_{a,n} + \mathrm{margin})$   (3)

The triplet loss function, $L_{triplet}$, aims to enforce a margin between the distance of the anchor-positive pair $D_{a,p}$ and that of the anchor-negative pair $D_{a,n}$. The margin is a predefined constant that determines how far apart the positive and negative pairs should be in the feature space.
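Equation (3) translates directly into a few lines of TensorFlow; this is a generic sketch that assumes the Euclidean distance later defined in Eq. (4).

import tensorflow as tf

def triplet_loss(f_a, f_p, f_n, margin=1.0):
    d_ap = tf.norm(f_a - f_p, axis=-1)  # anchor-positive distance D_{a,p}
    d_an = tf.norm(f_a - f_n, axis=-1)  # anchor-negative distance D_{a,n}
    return tf.reduce_mean(tf.maximum(0.0, d_ap - d_an + margin))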
Algorithm 1: Training Disentangled Prototypical Networks
This presents the process of training the disentangled prototypical network using the triplet loss and prototypical network loss functions.
Input:
  X_train: input URL training dataset
  Y_train: label vector for training URLs
  X_val: input URL validation dataset
  Y_val: label vector for validation URLs
  num_epochs: number of training epochs
  batch_size: number of triplets to generate per batch
  num_triplets: total number of triplets to generate
  alpha, beta: weighting factors for the loss terms (Eq. (5))
  model: disentangled prototypical network model
Output:
  trained_model: trained disentangled prototypical network model
1:  function train_DPN(X_train, Y_train, X_val, Y_val, num_epochs, batch_size, num_triplets, model)
2:    triplet_generator = generate_triplets(X_train, Y_train, batch_size, num_triplets)
3:    for epoch = 1 to num_epochs do
4:      total_loss = 0
5:      total_triplet_loss = 0
6:      total_prototype_loss = 0
7:      for batch = 1 to num_triplets // batch_size do
8:        [anchor_URLs, positive_URLs, negative_URLs] = next(triplet_generator)
9:        // Compute the triplet loss for the batch
10:       triplet_loss = compute_triplet_loss(anchor_URLs, positive_URLs, negative_URLs, model)
11:       // Compute the prototypical network loss for the batch
12:       prototype_loss = compute_prototype_loss(X_train, Y_train, model)
13:       // Calculate the total loss as a weighted sum of the loss functions, Eq. (5)
14:       loss = alpha * prototype_loss + beta * triplet_loss
15:       total_loss += loss
16:       total_triplet_loss += triplet_loss
17:       total_prototype_loss += prototype_loss
18:       // Update the model parameters using the total loss
19:       model = update_model(model, loss)
20:     end for
21:     // Evaluate the model on the validation set
22:     val_accuracy = evaluate_model(model, X_val, Y_val)
23:   end for
24:   return model
25: end function
Equation (4) defines the distance metric used in the proposed DPN loss calculation. The distance d(x, y) between the feature vectors f(x) and f(y) for URLs x and y can be computed using various distance measures; in this implementation, we use the Euclidean distance:

$$d(x, y) = \lVert f(x) - f(y) \rVert_2 \qquad (4)$$
Training the DPN model involves minimizing the total loss function, L_total, a weighted sum of the prototypical network loss (L_proto) and the triplet loss (L_triplet). The weighting factors α and β determine the relative importance of the two loss components during training:

$$L_{total} = \alpha \cdot L_{proto} + \beta \cdot L_{triplet} \qquad (5)$$
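Over a batch, the combination of Eq. (5) reduces to a one-liner; the α and β values used in the experiments are not reported, so equal weighting below is only a placeholder:

```python
import numpy as np

def total_loss(proto_losses, triplet_losses, alpha=1.0, beta=1.0):
    """Eq. (5): weighted sum of the mean prototypical and triplet losses."""
    return alpha * np.mean(proto_losses) + beta * np.mean(triplet_losses)
```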
4 Experimental Results

4.1 Dataset Specification

As described in Table 2, we use a comprehensive dataset with both benign and phishing URLs from various authoritative sources to ensure that our model is trained and evaluated on real-world data, closely representing real-life phishing detection challenges. The dataset is composed of three main sources: (1) Collected: benign URLs are obtained from the DMOZ open directory project, while phishing URLs are gathered from PhishTank, providing a diverse set of malicious URLs for model training and evaluation. (2) ISCX-URL-2016: a well-known benchmark dataset containing benign, phishing, malware, and spam URLs. (3) PhishStorm: to augment our dataset, we incorporate the PhishStorm benchmark data, which offers a balanced mix of benign and phishing URLs.

To evaluate few-shot learning, we divide the data temporally. We use the data up to 2014 to pre-train the model on base classes of benign and phishing URLs. This pre-trained model is then fine-tuned on small datasets with limited examples extracted from the data after 2014. This setup is designed to evaluate adaptability to new and emerging phishing strategies with a limited number of samples. We split each data source into training, validation, and test sets at a ratio of 8:1:1. For the few-shot learning experiments, the training set is randomly sampled from the after-2014 data, representing different distributions and potentially new phishing strategies.

Table 2. Specification of the collected URL dataset and two benchmark datasets.

| Source | Description | Instances | Example |
|---|---|---|---|
| Collected | DMOZ open directory project (Benign) | 1,048,582 | http://geneba**.org/ftp/… |
| Collected | PhishTank (Phishing) | 15,000 | http://droopbxoxx.com/@@@. |
| ISCX-URL-2016 | Benign | 35,000 | http://metro.co.uk/2015/05… |
| ISCX-URL-2016 | Phishing | 9,000 | http://standardprincipal.pt/… |
| ISCX-URL-2016 | Malware | 11,000 | http://9779.info/%E5%88%… |
| ISCX-URL-2016 | Spam | 12,000 | http://adverse*s.co.uk/scr/cl… |
| PhishStorm | Benign | 47,682 | en.wikipedia.org/wiki/dead… |
| PhishStorm | Phishing | 47,859 | nobell.it/70ffb52d079109dc… |
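As a concrete illustration of the 8:1:1 split described above, a minimal scikit-learn sketch; the shuffling, seed, and whether stratification was used are not reported, so those choices here are assumptions:

```python
from sklearn.model_selection import train_test_split

def split_8_1_1(urls, labels, seed=0):
    """Split a URL source into 80% train, 10% validation, 10% test."""
    X_tr, X_tmp, y_tr, y_tmp = train_test_split(
        urls, labels, test_size=0.2, random_state=seed, stratify=labels)
    X_val, X_te, y_val, y_te = train_test_split(
        X_tmp, y_tmp, test_size=0.5, random_state=seed, stratify=y_tmp)
    return (X_tr, y_tr), (X_val, y_val), (X_te, y_te)
```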
4.2 Performance Evaluation and Comparison

Figure 2 compares the proposed method with a prototypical network, a notable existing few-shot learning model, by progressively decreasing the training sample size from 5,000 to 2 samples.
Fig. 2. Performance comparison between DPN and the prototypical network with varying training sample sizes.
Table 3. Accuracy and recall comparison of DPN and the state-of-the-art methods for three datasets.

| Method | PhishTank Accuracy | PhishTank Recall | ISCX-URL-2016 Accuracy | ISCX-URL-2016 Recall | PhishStorm Accuracy | PhishStorm Recall |
|---|---|---|---|---|---|---|
| CNN (w/o triplet, w/o proto.) | 0.9070 | 0.8374 | 0.9424 | 0.9015 | 0.9229 | 0.8785 |
| URLNet [12] | 0.9226 | 0.8785 | 0.9450 | 0.9390 | 0.9395 | 0.8864 |
| Texception [13] | 0.9319 | 0.9075 | 0.9765 | 0.9462 | 0.9710 | 0.9227 |
| Prototypical Network [9] | 0.9392 | 0.9082 | 0.9647 | 0.9448 | 0.9664 | 0.9135 |
| Convolutional Autoencoder [15] | 0.9540 | 0.9590 | 0.9732 | 0.9338 | 0.9690 | 0.9132 |
| CNN-LSTM with GA-optimized Ruleset [17] | 0.9668 | 0.9392 | 0.9692 | 0.9323 | 0.9542 | 0.9014 |
| Logic-integrated Triplet Network [18] | 0.9740 | 0.9465 | 0.9745 | 0.9510 | 0.9812 | 0.9464 |
| Ours | 0.9877 | 0.9641 | 0.9885 | 0.9690 | 0.9826 | 0.9484 |
Our method starts strong with an accuracy of 0.9832 using 5,000 samples, compared to the prototypical network's 0.9392. Notably, as the sample size is reduced to the few-shot regime (5-shot and 2-shot), our method exhibits exceptional resilience by maintaining an accuracy of 1.00, whereas the prototypical network's accuracy plummets to 0.5000 with 50 samples, highlighting the superiority of our method in extremely limited data scenarios. Table 3 shows the best existing methods' accuracy and recall for the three datasets, illustrating that our method achieves the best performance when utilizing all the samples. Our method attains the highest accuracy and recall on all datasets.
Fig. 3. t-SNE visualization of URL prototypes, demonstrating disentangled representations obtained by DPN.
This demonstrates the performance improvement due to the combinatorial data augmentation enabled by the unique triplet sampling method. In Fig. 3, we use t-SNE to visualize the calculated URL prototypes, qualitatively demonstrating that our embedding is disentangled by triplet learning. In contrast to prototypical networks, our method successfully produces disentangled prototypes, as the embedding minimizes intra-class distances while maximizing inter-class distances. The proposed method demonstrates superiority over existing few-shot learning methods in accuracy and recall. The high performance is achieved through a high-capacity feature extractor, a meta-learning scheme, and transfer learning. These components allow the model to effectively utilize small datasets and adapt quickly.

Table 4. Time cost comparison with other few-shot phishing detection models (n_phishing = 15,000, n_benign = 15,000, early stopping applied). All times in seconds.

| Model | Sampling | Training | Testing |
|---|---|---|---|
| Prototypical Network [9] | - | 2,770 | 3 |
| Disentangled Prototypical Network (Ours) | 283 | 5,005 | 3 |
However, as indicated in Table 4, our method involves additional time costs due to triplet sampling and combined triplet-prototypical loss computation. Compared to the Prototypical Network [9], which takes 2,770 s for training, our method requires 283 s for sampling and 5,005 s for training, while both have a testing time of 3 s. This added training time, a trade-off for performance gains, should be considered in scenarios where swift model training is crucial.
5 Concluding Remarks

In this paper, we proposed a novel disentangled prototypical network (DPN) method for few-shot phishing URL detection tasks. Our method effectively combined the strengths of prototypical networks and triplet learning to create a powerful and robust model capable of handling real-world phishing detection challenges. As a side effect, the triplet sampling process facilitated combinatorial data augmentation, significantly improving the model's performance across various datasets and sample sizes.
By incorporating the disentangled URL prototypes generated through triplet learning, our method successfully minimized intra-class distances while maximizing inter-class distances, leading to accurate and reliable phishing URL classification. The extensive experimental results demonstrated the superiority of the DPN method compared to the existing few-shot learning methods, highlighting its effectiveness and potential for broader applications in cybersecurity. As future research, we plan to investigate the feasibility and application of DPN to other cybersecurity problems, such as modeling constantly evolving malware. By adapting the DPN method to handle new and emerging threats, we can contribute to building more resilient cybersecurity solutions. Additionally, considering that our DPN method incorporates a triplet sampling process, which may impose a time cost, we aim to explore efficient techniques for alleviating this burden. This could include investigating alternative sampling methods or optimizing the triplet selection process, thereby improving the time efficiency of the model without compromising its efficacy.

Acknowledgements. This work was supported by the Yonsei Fellow Program funded by Lee Youn Jae, the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-02068, Artificial Intelligence Innovation Hub), and the Air Force Defense Research Sciences Program funded by the Air Force Office of Scientific Research.
References
1. Purwanto, R.W., Pal, A., Blair, A., Jha, S.: PhishSim: aiding phishing website detection with a feature-free tool. IEEE Trans. Inf. Forensics Secur. 17, 1497–1512 (2022)
2. da Silva, C.M.R., Fernandes, B.J.T., Feitosa, E.L., Garcia, V.C.: Piracema.io: a rules-based tree model for phishing prediction. Expert Syst. Appl. 191, 116239 (2022)
3. Huang, L., Jia, S., Balcetis, E., Zhu, Q.: Advert: an adaptive and data-driven attention enhancement mechanism for phishing prevention. IEEE Trans. Inf. Forensics Secur. 17, 2585–2597 (2022)
4. Anand, A., Gorde, K., Moniz, J.R.A., Park, N., Chakraborty, T., Chu, B.-T.: Phishing URL detection with oversampling based on text generative adversarial networks. In: IEEE International Conference on Big Data, pp. 1168–1177. IEEE (2018)
5. Shirazi, H., Muramudalige, S.R., Ray, I., Jayasumana, A.P., Wang, H.: Adversarial autoencoder data synthesis for enhancing machine learning-based phishing detection algorithms. IEEE Trans. Serv. Comput. 16, 2411–2422 (2023)
6. Liu, C., et al.: Learning a few-shot embedding model with contrastive learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 10, pp. 8635–8643 (2021)
7. Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H., Hospedales, T.M.: Learning to compare: relation network for few-shot learning. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1199–1208 (2018)
8. Jiang, W., Huang, K., Geng, J., Deng, X.: Multi-scale metric learning for few-shot learning. IEEE Trans. Circuits Syst. Video Technol. 31(3), 1091–1102 (2020)
9. Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 30 (2017)
10. Wang, P., Tang, Z., Wang, J.: A novel few-shot malware classification approach for unknown family recognition with multi-prototype modeling. Comput. Secur. 106, 102273 (2021)
11. Chai, Y., Du, L., Qiu, J., Yin, L., Tian, Z.: Dynamic prototype network based on sample adaptation for few-shot malware detection. IEEE Trans. Knowl. Data Eng. (2022)
12. Le, H., Pham, Q., Sahoo, D., Hoi, S.C.: URLNet: learning a URL representation with deep learning for malicious URL detection. arXiv preprint arXiv:1802.03162 (2018)
13. Tajaddodianfar, F., Stokes, J.W., Gururajan, A.: Texception: a character/word-level deep learning model for phishing URL detection. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2857–2861. IEEE (2020)
14. Chou, E.J., Gururajan, A., Laine, K., Goel, N.K., Bertiger, A., Stokes, J.W.: Privacy-preserving phishing web page classification via fully homomorphic encryption. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2792–2796. IEEE (2020)
15. Bu, S.-J., Cho, S.-B.: Deep character-level anomaly detection based on a convolutional autoencoder for zero-day phishing URL detection. Electronics 10(12), 1492 (2021)
16. Arachie, C., Huang, B.: Adversarial label learning. In: AAAI Conference on Artificial Intelligence 33(01), 3183–3190 (2019)
17. Park, K.-W., Bu, S.-J., Cho, S.-B.: Evolutionary optimization of neuro-symbolic integration for phishing URL detection. In: International Conference on Hybrid Artificial Intelligence Systems, pp. 88–100. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86271-8_8
18. Bu, S.-J., Cho, S.-B.: Integrating deep learning with first-order logic programmed constraints for zero-day phishing attack detection. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2685–2689. IEEE (2021)
Special Session 1: New Methods and Models to Study the Spread of Malware and Fake News
Finding and Removing Infected T-Trees in IoT Networks

Marcos Severt1, Roberto Casado-Vara2(B), Angel Martín del Rey3, Esteban Jove4, Héctor Quintián4, and Jose Luis Calvo-Rolle4

1 Universidad de Salamanca, Salamanca, Spain
marcos [email protected]
2 Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Matemáticas y Computación, Escuela Politécnica Superior, Universidad de Burgos, Av. Cantabria s/n, 09006 Burgos, Spain
[email protected]
3 Department of Applied Mathematics, Universidad de Salamanca, Salamanca, Spain
[email protected]
4 Department of Industrial Engineering, University of A Coruña, CTC, CITIC, Ferrol, A Coruña, Spain
{esteban.jove,hector.quintian,jose.rolle}@udc.es
Abstract. The Internet of Things (IoT) is filling cities and buildings. It provides continuous monitoring and control to the point where a large proportion of everyday processes can be automated. This is possible because of the large number of devices that are available. However, one of the main challenges is securing all these devices due to the proliferation of malware aimed at manipulating data or even creating botnets attacking other sensor networks. In this paper, regardless of the network topology or even the type of malware sample infecting the sensor network, we propose a new algorithm capable of removing infected nodes from sensor networks. Two simulations on two networks with different topologies were performed to validate the algorithm.
Keywords: Malware propagation · graph theory · optimal k-cuts

1 Introduction
Wireless networks with millions of heterogeneous devices have emerged with the advent of the Internet of Things. There is some evidence that in the next few years the number of devices per square kilometer will be in the order of millions [7]. This ubiquitous, large, diverse, and massive wireless connectivity is critical for big data analytics and automation in the smart world, which holds the promise of enhancing nearly every dimension of our lives, such as healthcare, Industry 5.0, public safety, agriculture, retail, etc. [8]. Despite the benefits of IoT adoption in our society, it comes at a high cost due to the new security challenges that are emerging. To mitigate these security challenges, IoT
device vendors are competing for leadership in the physical security and cybersecurity of IoT networks. Countermeasures such as anti-malware programs or intrusion defense mechanisms are constrained by the limited processing power of sensors and their small amount of stored energy, which creates significant economic challenges for focusing malware mitigation efforts in this area [9,10]. The focus on software countermeasures to reduce the spread of malware is primarily due to these factors. The literature concentrates mainly on software certification techniques [11,12], despite the significant problem of network traffic overhead and the centralization of these techniques. A second popular approach is to use game theory to select devices on which to install anti-malware software in order to stop the propagation of malware, or to regularly install security patches on the nodes chosen by game theory [13,14]. The limitation of this approach is that it has only been proven to work in mixed epidemic models, which assume that propagation is unlikely to be to neighboring nodes. Finally, the techniques proposed in the literature that have proven to be the most effective are special firewalls (wireless devices capable of executing countermeasures to infiltrate sensor networks) distributed throughout the premises, and the use of percolation theory to reduce the advance of malware that infects networks; the latter models are based on randomness, which is the main reason for using percolation theory [15,16]. The former models rely on the randomness of the placement of dedicated firewalls, whereas percolation models are limited by their assumption of sufficient proximity of wireless links, which ignores interference from the aggregate network. Both companies and researchers in the field of IoT network cybersecurity are spending considerable effort on detecting and stemming the spread of malware in IoT networks, as can be seen from the different types of solutions listed in this paragraph. This is because IoT networks are one of the main components of smart cities, enabling information collection, analysis, and, in some cases, automation. If this information were to be corrupted, the privacy of the data collected by IoT devices would become a major issue, as well as a major problem for the autonomous operation of smart cities [17].

Let G = (V, E) be a simple undirected graph with non-negative edge weights l(e), e ∈ E, and let $K = \{k_1, \ldots, k_p\}$ be a set of p positive integers such that $\sum_{i=1}^{p} k_i \le |V|$. A K-cut is a collection of disjoint node sets $P_1, \ldots, P_p$ such that $|P_i| = k_i$ for all $i \in \{1, \ldots, p\}$. The minimum K-cut problem is to find a K-cut such that $\sum_{i=1}^{p-1} \sum_{j=i+1}^{p} l(P_i, P_j)$ is minimized [5,6]. We have considered the problem of finding the minimum k-cut in simple unweighted graphs and adapted it to the research area of malware propagation. Let us assume that G is a simple, unweighted graph representing an IoT network. A malware sample is introduced into this IoT network and spreads throughout the network, infecting all nodes within its range. Through the use of mathematical epidemiology, individual-based models can predict the states of the sensors in this network (i.e., infected or uninfected) at any time step t. Thus, the initial hypothesis of this research is as follows: after identifying the infected nodes in the IoT network using mathematical epidemiology models, it is possible to cut off a subgraph of G in a way that reduces or removes the spread of the malware sample in the IoT network.
In this study, we have developed a novel algorithm that identifies infected nodes in an IoT network, groups them into a subgraph, and k-cuts them away from the IoT network, addressing a gap in the state of the art for mitigating malware propagation in IoT networks. The malware is then quarantined so that it cannot spread further through the IoT network and cause more damage, such as stealing or tampering with the data collected by the IoT devices. Applying this algorithm to an IoT network where the state of the nodes is known at each time step t, the network can be separated into an infected and a non-infected portion. This algorithm has no limitations with respect to network size, properties, or form. Moreover, our algorithm neither overloads the network nor requires central execution. This paper is organized as follows: we present the mathematical background and the proposed algorithm in Sect. 2. Section 3 presents the setup of our simulations and the results which validate the performance of the proposal. Finally, Sect. 4 concludes the conducted research and proposes future lines of work.
2 Minimum k-Cut Malware Infected-Tree

In this section, we provide the mathematical background on graph theory and an overview of k-cuts, as well as a detailed description of the proposed algorithm. We finish the section with the pseudo-code of the algorithm to enable feasible reproducibility of this research.

2.1 Preliminaries
Let G = (V, E) be a graph on n vertices and m edges representing an IoT network. We consider all graphs G in this paper to be unweighted, undirected multigraphs with no self-loops. A graph is simple when, for any pair of vertices in the graph, there exists at most one edge between them. For any v ∈ V, let deg_G(v) be the number of its neighbors in V. For any subset S ⊆ V, let $vol_G(S) = \sum_{v \in S} deg_G(v)$ be the total degree of the subset S, and let G[S] be the subgraph of G induced by S. A tree T is a partition tree of G if its nodes W_1, W_2, ..., W_j are disjoint subsets of V such that V = W_1 ∪ ... ∪ W_j. For each node W ∈ T, the graph G_T[W] = (V_T[W], E_T[W]) is created by contracting each subset W_i into a single vertex of the original graph G. T is called Gomory-Hu equivalent (GH-equivalent) if there exists a run of the Gomory-Hu algorithm that could form T under certain circumstances. Suppose there exists a partition tree T which is GH-equivalent. Let W be a vertex of T and let R ⊆ W be a subset. A refinement of T relative to R is carried out by a succession of Gomory-Hu iterations, each time choosing two distinct vertices s, t ∈ R that are located at the same node of T. Thus, once T has been refined relative to R, it is still GH-equivalent [1]. Given a graph G, the minimum k-cut is defined by the subsets S_1^*, ..., S_k^* that contain the elements of the k-cut. The minimum k-cut is nontrivial if none of the sets is a singleton (i.e., no |S_i^*| = 1).
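Since the simulations in Sect. 3 are built with the NetworkX library, the notation above maps directly onto it; a minimal illustration on a toy graph (the graph itself is hypothetical):

```python
import networkx as nx

G = nx.Graph()                                   # simple, unweighted, undirected
G.add_edges_from([(0, 1), (1, 2), (2, 0), (2, 3)])

S = {0, 1, 2}
vol_S = sum(G.degree(v) for v in S)              # vol_G(S): total degree of S
G_S = G.subgraph(S)                              # induced subgraph G[S]
```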
Lemma 1. Once the refinement is done on a GH-equivalent tree T given R, then for arbitrary nodes a, b ∈ R of T, the min-cut between a and b in T is also a min-cut in G for (a, b) [1].

A k-partial tree T is a GH-equivalent partition tree whose vertices u ∈ V with deg_G(u) ≤ k are each singletons of T. The next lemma claims that k-partial trees always exist and are computationally feasible for k sufficiently small.

Lemma 2. There exists an algorithm that, given an undirected graph on n vertices with unit edge capacities and a parameter k, computes a k-partial tree in time min{O(nk²), O(mk)} [2].

Hariharan claims in this lemma that in undirected graphs one can find k-partial trees, and in particular spanning trees, in polynomial time. Starting from this point, let T be a spanning tree of G. By deleting k − 1 edges of T and considering the connected components as the k-cut, the minimum k-cut can be obtained.

Definition 1. A tree T ⊆ G is an infected tree if the set of vertices V(T) is infected by malware.

This definition naturally extends to k-partial trees and spanning trees. Henceforth, all trees in this article will be infected trees.

2.2 Finding Infected k-Cuts on Infected T-Trees
We first assume, without loss of generality, that the tree T is "spider-like": it can be split into an edge-disjoint set of paths with common endpoint r. Let each edge-disjoint path from r be a branch. Otherwise, we could apply heavy-light decomposition on the tree, breaking it into a disjoint union of branches such that any leaf-to-root path overlaps the edges of only a few branches [3]. Let G be an unweighted graph mapping an IoT network and let T be a tree of G such that we can root T at a vertex r ∈ V(T), where T is a disjoint union of maximal branches. Under these circumstances, we propose the following theorem.

Theorem 1. Let G be an unweighted IoT graph and let T be a tree of G. There is an algorithm InfectedTreeCut(G, T, s) which ensures that if an infected k-cut of size ≤ s exists in G, then InfectedTreeCut returns an infected k-cut in G.

In the next paragraphs, we describe our proposed algorithm for this case. It is assumed that the minimum k-cut of G is at most equal to λ, because the algorithm could return any result otherwise. First, we randomly select a vertex v_i of the graph G. Then we check whether the selected vertex is infected by the malware or not (recall that we have assumed that the information on the infected nodes is available at each time step t); otherwise, we pick another random vertex. Once we have an infected vertex v_i of G, we construct the tree T(v_i) whose root is v_i and whose vertices are the neighboring vertices of v_i. Let v_1^*, ..., v_{k−1}^* be the infected vertices which are neighbors of vertex v_i. Thus, let E_T^* := E_T[S_1^*, ..., S_{k−1}^*] be the set of infected edges to delete. Notice that it is important to remove the self-loops to avoid double-counted edges. Without loss of generality, we assume that there is an infected k-cut S_1^*, ..., S_k^* such that E_T[S_1^*, ..., S_k^*] contains at most one
edge from each maximal branch. We now define the multigraph H starting with $G[\bigcup_{i \in [k-1]} V(T(v_i^*))]$ and contracting every vertex set V(T(v_i^*)) into a single vertex v_i^*. Notice that two of these graphs could share common vertices and edges under this construction of H. For the sake of simplicity, we will assume that H is connected. Since H is connected, it is possible to decompose H into its connected components C_i. One of these connected components will coincide with the branches of T(v_1^*), ..., T(v_{k−1}^*), as we assume that H is connected. The main idea is to build an infected spanning tree T_{C_i} in each connected component. To compute the infected spanning tree, we rely on Prim's algorithm and an optimized version of the Breadth-First Search (BFS) algorithm [4]; in our Python implementation this algorithm runs in O(|V(G)|). We have added a check so that, when searching for vertices that have not been visited yet, the algorithm also verifies whether the vertices are infected with malware. Therefore, this algorithm, tuned to fit our needs, creates the infected spanning tree of each of the connected components of H. Finally, we iterate over the k − 1 connected components C_i and, with this information, estimate the infected k-cut. To achieve this, we merge the infected spanning trees T_{C_i} of the connected components to create an infected spanning tree T_H of H. This is feasible because H is connected. Since the BFS algorithm is widely known, we provide only the Python code with the necessary modifications to create infected spanning trees.

Algorithm 1: Infected spanning tree algorithm

```python
# graph: adjacency list (each vertex mapped to its neighboring vertices)
# infected: infected vertices of G at the current time step t

def bfs(visited, graph, node, infected):
    """Build the infected spanning tree traversal by BFS restricted to
    infected vertices."""
    visited.append(node)      # list of visited nodes
    queue = [node]            # initialize the queue
    while queue:
        m = queue.pop(0)
        print(m, end=" ")
        for neighbour in graph[m]:
            # visit only infected vertices that have not been visited yet
            if neighbour not in visited and neighbour in infected:
                visited.append(neighbour)
                queue.append(neighbour)
```
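For illustration, the function above can be exercised on a toy adjacency list (both the graph and the infected set are hypothetical; in practice the infected set comes from the epidemiological model at time step t):

```python
graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}   # toy IoT adjacency list
infected = {0, 1, 3}                              # infected vertices at time t
visited = []
bfs(visited, graph, 0, infected)                  # prints: 0 1 3
# `visited` now holds the traversal order of the infected spanning tree
```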
We conclude this detailed presentation by providing the pseudo-code of the proposed algorithm.

Algorithm 2: Infected tree cut

```
Let G = (V, E) be an IoT graph
for i ∈ |V(G)| do
    Randomly pick a vertex v_i ∈ V(G)
    if v_i is infected then
        Contract each vertex of the set V(T(v_i)) into v_i
        Remove self-loops
    else
        Find another random vertex
    end
end
Let E_T* := E_T[V(T(v_1*)), ..., V(T(v_{k-1}*))] be the infected set of (k − 1) edges to delete
// Construction of the multigraph H
Let H = T_1(v_1*) ∪ ... ∪ T_{k-1}(v_{k-1}*) be the multigraph
// Construction of the infected spanning trees of the connected components
for each connected component C_i do
    T_{C_i} ← InfectedSpanningTree(C_i)
end
T_H ← T_{C_1} ∪ ... ∪ T_{C_{k-1}}
return T_H
```
When we apply the InfectedTreeCut algorithm to an IoT graph, the result is a subgraph TH that we cut from graph G. This allows us to identify the IoT nodes infected by the malware as it propagates at time step t.
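In NetworkX terms, this final quarantine step admits a very short sketch (this is our reading of the cut, not code from the paper): once the node set of T_H is known, deleting those nodes and their incident edges separates the infected portion from the rest of the network.

```python
import networkx as nx

def remove_infected(G, infected_nodes):
    """Quarantine sketch: delete the nodes of the infected k-cut T_H."""
    H = G.copy()
    H.remove_nodes_from(infected_nodes)   # also removes incident edges
    return H   # remaining network; may be disconnected (see Sect. 3.1)
```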
3 Results
We present two case studies that we have carried out in the laboratory to demonstrate the performance of the proposed algorithm. In these case studies, the two main types of architectures for IoT networks have been selected: cluster-tree network and mesh network [18,19]. These architectures have been randomly constructed with the Python NetworkX library (see Fig. 1). In these case studies, a SIR malware model based on stochastic and individual-based techniques was run [20]. At a given time step t, the nodes infected by the malware have been identified, and then the proposed algorithm has been run to determine whether it locates these nodes and isolates them from the IoT network. To illustrate how the algorithm works, the subgraph containing the infected k-cut to be removed from each network is colored red. The edges that have no contact with malware-infected nodes are colored green.
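The concrete generators and parameters behind Fig. 1 are not reported; illustrative NetworkX stand-ins for the two topologies could look as follows:

```python
import networkx as nx

cluster_tree = nx.balanced_tree(r=3, h=3)            # cluster-tree-like topology
mesh = nx.random_geometric_graph(25, radius=0.35)    # mesh-like topology
```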
3.1 Case Study 1: Cluster-Tree Network
In this case study we have run the malware propagation model and, at a certain time step t, the set of nodes {0, 1, 3, 7, 8, 17, 18} is infected by malware.
Fig. 1. IoT architectures for the case studies.
The numbering of the nodes corresponds to that shown in Fig. 2. We can notice that the entire network is connected to a central point in this case study, where the IoT network architecture is a cluster-tree network. In this case, the malware propagation starts from the central node and spreads through one of the branches. When running the algorithm to remove the infected nodes, the edges to be cut are detected and the infected k-cut is constructed. Once the infected subgraph is removed, it can be observed that the resulting IoT network is disconnected.
Fig. 2. Case study 1 performance evaluation of the proposed algorithm
3.2 Case Study 2: Mesh Network
In this case study we have run the malware propagation model and, at a certain time step t, the set of nodes {8, 9, 10, 11, 12} is infected by malware. The numbering of the nodes corresponds to that shown in Fig. 3. A mesh network architecture is used here. This type of IoT network, where all IoT devices can communicate by sharing information, is very common in medium-sized smart buildings. In this case, the malware is transmitted wirelessly, so it can spread to any node in the network. By applying the algorithm to create the infected k-cut, the infected nodes are detected and the infected subnetwork is removed. In this case, it can be seen that the resulting IoT network remains connected.
Fig. 3. Case study 2 performance evaluation of the proposed algorithm
4 Conclusion
In this work, we design a new algorithm to remove malware-infected nodes in IoT networks, regardless of topology, using an optimal infected k-cut. This significantly improves the security of IoT networks by isolating the infected nodes, while the remaining network continues to function. However, the resulting IoT network after the optimal infected k-cut is not necessarily connected. Using the most common topologies in IoT networks, the cluster-tree network and the mesh network, we have demonstrated the efficiency of the proposed algorithm in several simulations. Furthermore, in order to realistically infect nodes in the simulations, we propagated malware in both IoT networks. During the execution of the algorithm, we detected all the infected nodes and selected the optimal set of edges to perform the optimal infected k-cut. As a result, even though some of the nodes were no longer connected to the IoT network, the network continued to operate clean of malware. In future work, we will optimize our algorithm to run in polynomial time and to guarantee that the resulting IoT network remains connected after removing the infected nodes. Moreover, in future work, we will consider both individual and global malware models (such as [21,22]) to improve the performance of our algorithm regardless of the kind of malware found in IoT networks.
References
1. Abboud, A., Krauthgamer, R., Trabelsi, O.: Subcubic algorithms for Gomory-Hu tree in unweighted graphs. In: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pp. 1725–1737 (2021)
2. Hariharan, R., Kavitha, T., Panigrahi, D., Bhalgat, A.: An O(mn) Gomory-Hu tree construction algorithm for unweighted graphs. In: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pp. 605–614 (2007)
3. Gupta, A., Lee, E., Li, J.: Faster exact and approximate algorithms for k-Cut. In: 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pp. 113–123. IEEE (2018)
4. Bhaskar, R., Bansal, A.: Implementing prioritized-breadth-first-search for instagram hashtag recommendation. In: 2022 12th International Conference on Cloud Computing, Data Science and Engineering (Confluence), pp. 66–70. IEEE (2022)
5. Shinde, N., Narayanan, V., Saunderson, J.: Memory-efficient approximation algorithms for max-k-Cut and correlation clustering. Adv. Neural Inf. Process. Syst. 34, 8269–8281 (2021)
6. Guttmann-Beck, N., Hassin, R.: Approximation algorithms for minimum k-Cut. Algorithmica 27, 198–207 (2000)
7. Agrawal, S., Chopra, K.: Analysis of energy efficient narrowband internet of things (NB-IoT): LPWAN comparison, challenges, and opportunities. In: Wireless Communication with Artificial Intelligence, pp. 197–217. CRC Press (2023)
8. Maddikunta, P.K.R., et al.: Industry 5.0: a survey on enabling technologies and potential applications. J. Ind. Inf. Integr. 26, 100257 (2022). https://doi.org/10.1016/j.jii.2021.100257
9. Ntafloukas, K., McCrum, D.P., Pasquale, L.: A cyber-physical risk assessment approach for internet of things enabled transportation infrastructure. Appl. Sci. 12(18), 9241 (2022)
10. Li, S., Iqbal, M., Saxena, N.: Future industry internet of things with zero-trust security. Inf. Syst. Front. 1–14 (2022). https://doi.org/10.1007/s10796-021-10199-5
11. Yan, W., Fu, A., Mu, Y., Zhe, X., Yu, S., Kuang, B.: EAPA: efficient attestation resilient to physical attacks for IoT devices. In: Proceedings of the 2nd International ACM Workshop on Security and Privacy for the Internet-of-Things, pp. 2–7 (2019)
12. Namasudra, S., Sharma, P., Crespo, R.G., Shanmuganathan, V.: Blockchain-based medical certificate generation and verification for IoT-based healthcare systems. IEEE Consum. Electron. Mag. 12, 83–93 (2022)
13. Farooq, M.J., Zhu, Q.: Modeling, analysis, and mitigation of dynamic botnet formation in wireless IoT networks. IEEE Trans. Inf. Forensics Secur. 14(9), 2412–2426 (2019)
14. Huang, Y., Zhu, Q.: Game-theoretic frameworks for epidemic spreading and human decision-making: a review. Dyn. Games Appl. 12(1), 7–48 (2022). https://doi.org/10.1007/s13235-022-00428-0
15. ElSawy, H., Kishk, M.A., Alouini, M.S.: Spatial firewalls: quarantining malware epidemics in large-scale massive wireless networks. IEEE Commun. Mag. 58(9), 32–38 (2020)
16. Zhaikhan, A., Kishk, M.A., ElSawy, H., Alouini, M.S.: Safeguarding the IoT from malware epidemics: a percolation theory approach. IEEE Internet Things J. 8(7), 6039–6052 (2020)
17. Kumar, K.D., Sudhakara, M., Poluru, R.K.: Towards the integration of blockchain and IoT for security challenges in IoT: a review. Res. Anthology on Convergence of Blockchain, Internet of Things Secur. 193–209 (2023)
18. Alshohoumi, F., Sarrab, M., AlHamadani, A., Al-Abri, D.: Systematic review of existing IoT architectures security and privacy issues and concerns. Int. J. Adv. Comput. Sci. Appl. 10(7), 232–251 (2019)
19. Fotia, L., Delicato, F., Fortino, G.: Trust in edge-based internet of things architectures: state of the art and research challenges. ACM Comput. Surv. 55(9), 1–34 (2023)
20. del Rey, A.M., Vara, R.C., González, S.R.: A computational propagation model for malware based on the SIR classic model. Neurocomputing 484, 161–171 (2022)
21. Hernandez Guillen, J.D., Martin del Rey, A., Casado-Vara, R.: Propagation of the malware used in APTs based on dynamic Bayesian networks. Mathematics 9(23), 3097 (2021)
22. Guillen, J.H., Del Rey, A.M., Casado-Vara, R.: Security countermeasures of a SCIRAS model for advanced malware propagation. IEEE Access 7, 135472–135478 (2019)
Critical Analysis of Global Models for Malware Propagation on Wireless Sensor Networks

A. Martín del Rey1(B), E. Frutos Bernal2, R. Macías Maldonado3, and M. Maldonado Cordero4

1 Department of Applied Mathematics, IUFFyM, Universidad de Salamanca, 37008 Salamanca, Spain
[email protected]
2 Department of Statistics, Universidad de Salamanca, 37007 Salamanca, Spain
[email protected]
3 Faculty of Science, Universidad de Salamanca, 37008 Salamanca, Spain
[email protected]
4 Department of Mathematics, IUFFyM, Universidad de Salamanca, 37008 Salamanca, Spain
[email protected]
Abstract. Many models have appeared in the scientific literature with the aim of simulating the spread of malware on wireless sensor networks. Usually, these are global and unrealistic models that do not properly consider the characteristics of malware specimens and networks. In this paper a small critical analysis of this fact is carried out using the MDBCA model as an example. This study is illustrated with an improved model proposal. Keywords: Malware propagation · global models · wireless sensor networks · basic reproductive number · qualitative analysis
1 Introduction
Wireless sensor networks (WSN for short) play a fundamental role in the development and effective implementation of the technologies of the Internet of Things (IoT), Industry 4.0, Smart Cities, etc. Consequently, this type of network is used extensively in several (very) different domains. Depending on the particular application and the environment where the sensor nodes are deployed (military applications, monitoring of critical infrastructures, etc.), the security of the WSN may become a basic and unavoidable requirement [8]. There are different cyber threats to WSN, and malicious code (malware) is one of the most used tools [3,5]. As a consequence, it is of great interest to design efficient methods to detect malware on this type of network. In fact, this has been the main goal of the scientific community in the last years [2,4,7,10]. Apart from this line of research, the theoretical design, analysis and computational implementation of models to
simulate malware propagation on WSN is also a very important task. In this sense, several mathematical models have appeared in the scientific literature: some of them are individual-based models (see, for example, [1,11]), but the great majority are global models whose dynamics is described by differential equations on complex networks (see, for example, [6,9,12,14]). A detailed review of the literature reveals that global models exhibit some important drawbacks derived from a poor modeling of the characteristics of both the specimen of malware and the device network. The main goal of this work is to highlight this fact by analyzing a paradigmatic example (the model proposed by Zhang et al. [13]) and improving it by making reasonable and realistic assumptions. The rest of the work is organized as follows: in Sect. 2 the model by Zhang et al. is shown. A critical analysis of this model is introduced in Sect. 3. The improved model is described and analyzed in Sect. 4. Finally, a simple analysis of the control strategies derived from the study of the basic reproductive number is shown in Sect. 5.
2 The MDBCA Model

2.1 General Description of the MDBCA Model
The MDBCA model is a SEIRD model where the device population is classified into five compartments: susceptible S, latent L (those infected devices that do not have the ability to spread the malware), infectious I, recovered R, and damaged D. Consequently, the dynamics can be described as follows (see Fig. 1): when the specimen of malware reaches a susceptible device, the device becomes latent at rate λ or infectious at rate τ, where 0 < λ + τ ≤ 1. Latent devices can become infectious at rate 0 < σ ≤ 1, and infectious devices recover from the malware infection at rate 0 < ε ≤ 1. The malware can damage devices and render them useless: it is supposed that an infected (exposed or infectious) device becomes damaged at rate 0 ≤ μ ≤ 1. In a similar way, the (natural) consumption of energy can leave susceptible and recovered devices unusable at rate μ. Population dynamics is also considered, such that the total number of non-damaged or active devices remains constant over time: new susceptible devices appear at rate μ.
Fig. 1. Flow diagram describing the dynamics of the model.
In the MDBCA model it is assumed that the WSN is deployed in a two-dimensional L × L cellular lattice, divided into identical squares forming the cellular space C. Consequently, each of these square areas represents a cell of the CA: C_ab, 0 ≤ a, b ≤ L. The devices are distributed randomly or uniformly, so that in each cell C_ab there will be N_ab active nodes (susceptible, latent, infected and recovered).

Let $S_{ab}^t$, $L_{ab}^t$, $I_{ab}^t$, $R_{ab}^t$, and $D_{ab}^t$ be the fractions of susceptible, latent, infected, recovered and damaged devices in cell C_ab at time t. Consequently, $S_{ab}^t + L_{ab}^t + I_{ab}^t + R_{ab}^t + D_{ab}^t = 1$ for each cell C_ab ∈ C and every time t. The state of the cell C_ab can be described by

$$Q_{ab}^t = \left(S_{ab}^t, L_{ab}^t, I_{ab}^t, R_{ab}^t, D_{ab}^t\right) \in \mathcal{S} = [0,1] \times [0,1] \times [0,1] \times [0,1] \times [0,1], \qquad (1)$$

and its neighborhood is defined by $V_{ab} = \{C_{ij} \in C : |a-i| \le r,\ |b-j| \le r\}$, where r denotes the communication radius of the cells, assumed to be the same for all cells. Considering all these specifications, the state transition function can be described by $Q_{ab}^{t+1} = F\left(Q_{ab}^t, Q_{i_1 j_1}^t, \ldots, Q_{i_\alpha j_\alpha}^t\right)$, where $V_{ab} = \{C_{i_1 j_1}, \ldots, C_{i_\alpha j_\alpha}\}$. Specifically, for each cell one has:

$$S_{ab}^{t+1} = S_{ab}^t - (\lambda+\tau) S_{ab}^t I_{ab}^t - (\lambda+\tau) \sum_{C_{ij} \in V_{ab}} m \frac{N_{ij}}{N_{ab}} S_{ab}^t I_{ij}^t + \mu \left(1 - S_{ab}^t\right), \qquad (2)$$

$$L_{ab}^{t+1} = L_{ab}^t + \left( \lambda S_{ab}^t I_{ab}^t + \lambda \sum_{C_{ij} \in V_{ab}} m \frac{N_{ij}}{N_{ab}} S_{ab}^t I_{ij}^t \right) - (\sigma+\mu) L_{ab}^t, \qquad (3)$$

$$I_{ab}^{t+1} = I_{ab}^t + \left( \tau S_{ab}^t I_{ab}^t + \tau \sum_{C_{ij} \in V_{ab}} m \frac{N_{ij}}{N_{ab}} S_{ab}^t I_{ij}^t \right) + \sigma L_{ab}^t - \varepsilon I_{ab}^t - \mu I_{ab}^t, \qquad (4)$$

$$R_{ab}^{t+1} = R_{ab}^t + \varepsilon I_{ab}^t - \mu R_{ab}^t, \qquad (5)$$

$$D_{ab}^{t+1} = D_{ab}^t + \mu \left(S_{ab}^t + L_{ab}^t + I_{ab}^t + R_{ab}^t\right), \qquad (6)$$

where the coefficient 0 ≤ m ≤ 1 represents the communication connectivity between neighboring cells. Note that the incidence (that is, the new infected devices per unit of time) is defined by the equation

$$\lambda S_{ab}^t \left( I_{ab}^t + m \sum_{C_{ij} \in V_{ab}} \frac{N_{ij}}{N_{ab}} I_{ij}^t \right) + \tau S_{ab}^t \left( I_{ab}^t + m \sum_{C_{ij} \in V_{ab}} \frac{N_{ij}}{N_{ab}} I_{ij}^t \right), \qquad (7)$$

where the first term stands for the susceptible nodes that have become latent, while the second one quantifies the fraction of susceptible nodes that have become infected, all of them due to contact with infectious devices placed both in the cell itself and in neighboring cells. For the sake of simplicity, the authors of [13] suppose that the number of devices placed in each cell is the same, N_ab = N, and, taking the first terms of the Taylor expansion of the system (2)-(6), they obtain the following system of ordinary differential equations for each cell:
$$S'_{ab}(t) = -(\lambda+\tau) S_{ab}(t) I_{ab}(t) - (\lambda+\tau) \sum_{C_{ij} \in V_{ab}} m S_{ab}(t) I_{ij}(t) + \mu \left(1 - S_{ab}(t)\right), \qquad (8)$$

$$L'_{ab}(t) = \left( \lambda S_{ab}(t) I_{ab}(t) + \lambda \sum_{C_{ij} \in V_{ab}} m S_{ab}(t) I_{ij}(t) \right) - (\sigma+\mu) L_{ab}(t), \qquad (9)$$

$$I'_{ab}(t) = \left( \tau S_{ab}(t) I_{ab}(t) + \tau \sum_{C_{ij} \in V_{ab}} m S_{ab}(t) I_{ij}(t) \right) + \sigma L_{ab}(t) - \varepsilon I_{ab}(t) - \mu I_{ab}(t), \qquad (10)$$

$$R'_{ab}(t) = \varepsilon I_{ab}(t) - \mu R_{ab}(t). \qquad (11)$$
The dynamics of the system can be described by means of the following system of ordinary differential equations: S (t) = − (λ + τ ) S(t)I(t) + μ − μS(t) L (t) = λS(t)I(t) − (σ + μ) L(t) I (t) = τ S(t)I(t) − ( + μ) I(t) + σL(t)
(12) (13) (14)
R (t) = I(t) − μR(t) D (t) = μ (S(t) + L(t) + I(t) + R(t))
(15) (16)
where S(0) = S0 > 0, L(0) = L0 ≥ 0, I(0) > 0, R(0) = R0 ≥ 0, D(0) = D0 ≥ 0. Note that as S(t) + L(t) + I(t) + R(t) + D(t) = 1 for every t > 0 then the system (12)-(16) can be reduced to the following: S (t) = − (λ + τ ) S(t)I(t) + μ − μS(t)
(17)
L (t) = λS(t)I(t) − (σ + μ) L(t) I (t) = τ S(t)I(t) − ( + μ) I(t) + σL(t)
(18) (19)
R (t) = I(t) − μR(t)
(20)
whose feasible region is Ω = {(x1 , x2 , x3 , x4 ) ∈ R4 : 0 < x1 + x2 + x3 + x4 ≤ 1}. A simple calculus shows that the equilibrium points associated to the system (17)-(20) are two: the disease-free equilibrium point P0∗ = (1, 0, 0, 0), and the endemic equilibrium point Pe∗ = (Se∗ , L∗e , Ie∗ , Re∗ ), where (μ + σ)(μ + ) , λσ + τ (μ + σ) λμ σ(λ + τ ) − μ2 + μ(τ − σ) − (μ + σ) ∗ Le = , (λ + τ )(μ + σ)(λσ + τ (μ + σ)) σ(λ + τ ) − μ2 + μ(τ − σ) − (μ + σ) , Ie∗ = μ (λ + τ )(μ + σ)(μ + ) σ(λ + τ ) − μ2 + μ(τ − σ) − (μ + σ) . Re∗ = (λ + τ )(μ + σ)(μ + ) Se∗ =
(21) (22) (23) (24)
The following result holds:

Theorem 1. (i) The basic reproductive number associated to the model defined by (17)-(20) is:

$$R_0 = \frac{\lambda\sigma + \tau(\mu+\sigma)}{(\varepsilon+\mu)(\mu+\sigma)}. \qquad (25)$$

(ii) The disease-free equilibrium point is locally and asymptotically stable if R_0 < 1. (iii) The endemic equilibrium point is locally and asymptotically stable if R_0 > 1.
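Equation (25) is straightforward to evaluate numerically, e.g. when exploring parameter regions where R_0 crosses 1 (a plain sketch, with ε denoting the recovery rate):

```python
def r0_mdbca(lam, tau, sigma, eps, mu):
    """Basic reproductive number of Eq. (25) for the global MDBCA model."""
    return (lam * sigma + tau * (mu + sigma)) / ((eps + mu) * (mu + sigma))
```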
3 Critical Analysis of the MDBCA Model
In the model described in the last section, there are some wrong assumptions and considerations that do not take into account the characteristics of the propagation process. In what follows, we will show them with the aim of proposing an improved model:

(1) It is well known that a fraction of the WSN devices affected by malware (with latent or infected state) lose all their functionalities and go into the damaged state. In the MDBCA model it is assumed that the damaged compartment is additionally nourished by susceptible or recovered nodes which lose their functionality due to battery depletion. Furthermore, the fraction of nodes moving from any compartment to the damaged one is assumed to be the same, μ, which is a totally unrealistic assumption. It is not reasonable to assume that the fraction of nodes which lose their functionality at t due to the malware is exactly the same as the fraction of nodes whose battery runs out. Consequently, when designing the malware propagation model and determining its dynamics, it is necessary to clearly distinguish between processes and consequences due to the malicious code and those attributable to natural causes such as the consumption of energy. Then the model must involve two different epidemiological coefficients: the damage rate η, which only affects latent and infectious devices, and the (new) death rate, which affects all compartments (note that malware could infect a device without causing harm). In this sense, the death rate must be different depending on the compartment to which it refers: μ for susceptible and recovered devices, and μ̃ for latent and infectious devices, with μ < μ̃, since the power consumption of a compromised device is assumed to be higher.

(2) In the MDBCA model, the number of infected nodes due to contact with infected nodes located in neighboring cells is given by the term

$$m (\lambda+\tau) \sum_{C_{ij} \in V_{ab}} S_{ab}^t I_{ij}^t, \qquad (26)$$
where m is the communication connectivity between the cell C_ab and its neighborhood V_ab. It does not seem very realistic to assume that the coefficient m is constant. This shortcoming is due to the assumption that the WSN devices are homogeneously deployed. As a consequence, for each pair of neighboring cells, C_ab and C_ij, the model must consider a different communication connectivity coefficient m_{ab,ij}.

(3) The contagion rates (the latent rate λ and the infectious rate τ) are not defined properly. It is necessary to indicate explicitly how the associated contact rate depends on the population of devices placed in the cell (standard incidence, bilinear incidence or saturated incidence).

Furthermore, from a technical point of view, there are also some inconsistencies in the model described in [13], namely:

(4) The definition of the neighborhood of the cellular automaton given in [13] is incorrect, since the communication radius is a characteristic of the nodes and not of the cell in which they are located.

(5) The qualitative study done refers only to the disease-free equilibrium point, and the study of the endemic equilibrium point is omitted.

(6) A mathematical analysis of the design of optimal strategies for the control of the epidemic outbreak is not carried out.
4 The Novel (Improved) Proposal

4.1 Description and Analysis
The improved underlying model is based on the first improvement proposal stated in the last section. The dynamics of this model is governed by the following system of ordinary differential equations:

$$S'(t) = -(\lambda+\tau) S(t) I(t) + \Lambda - \mu S(t) \qquad (27)$$
$$L'(t) = \lambda S(t) I(t) - (\sigma+\eta+\tilde\mu) L(t) \qquad (28)$$
$$I'(t) = \tau S(t) I(t) - (\varepsilon+\eta+\tilde\mu) I(t) + \sigma L(t) \qquad (29)$$
$$R'(t) = \varepsilon I(t) - \mu R(t) \qquad (30)$$
$$D'(t) = \mu \left(S(t) + R(t)\right) + (\eta+\tilde\mu) \left(L(t) + I(t)\right) \qquad (31)$$

where Λ is the replacement rate, S(0) = S_0 > 0, L(0) = L_0 ≥ 0, I(0) = I_0 > 0, R(0) = R_0 ≥ 0, D(0) = D_0 ≥ 0. As it is supposed that the total number of active devices remains constant on the WSN, then S(t) + L(t) + I(t) + R(t) = 1, and consequently $\Lambda = \mu + (\eta + \tilde\mu - \mu)(L(t) + I(t))$ for every t > 0. Thus the system (27)-(31) can be rewritten as follows:

$$S'(t) = -(\lambda+\tau) S(t) I(t) - \mu S(t) + \mu + (\eta+\tilde\mu-\mu)\left(L(t) + I(t)\right) \qquad (32)$$
$$L'(t) = \lambda S(t) I(t) - (\sigma+\eta+\tilde\mu) L(t) \qquad (33)$$
$$I'(t) = \tau S(t) I(t) - (\varepsilon+\eta+\tilde\mu) I(t) + \sigma L(t) \qquad (34)$$
$$R'(t) = \varepsilon I(t) - \mu R(t) \qquad (35)$$
Fig. 2. Flow diagram describing the dynamics of the improved model.
This system has two steady states: the disease-free equilibrium point given by $P_0^* = (1, 0, 0, 0)$, and the endemic equilibrium point $P_e^* = (S_e^*, L_e^*, I_e^*, R_e^*)$, where:

$$S_e^* = \frac{(\eta+\tilde\mu+\sigma)(\eta+\tilde\mu+\varepsilon)}{\lambda\sigma + \tau(\eta+\tilde\mu+\sigma)}, \qquad (36)$$

$$L_e^* = \frac{\lambda}{\eta+\tilde\mu+\sigma}\, S_e^* I_e^*, \qquad (37)$$

$$I_e^* = -\frac{\mu \left( \eta^2 + \eta\sigma - \eta\tau - \lambda\sigma + \tilde\mu^2 + \tilde\mu(2\eta + \sigma - \tau + \varepsilon) - \sigma\tau + \varepsilon(\eta+\sigma) \right)}{\mu(\eta+\sigma)(\lambda+\tau) + \lambda\mu\tilde\mu + \tilde\mu\tau(\mu+\varepsilon) + \varepsilon\tau(\eta+\sigma) + \varepsilon\lambda(\mu+\sigma)}, \qquad (38)$$

$$R_e^* = \frac{\varepsilon}{\mu}\, I_e^*. \qquad (39)$$
Proposition 1. The basic reproductive number associated to the improved underlying model (32)-(35) is:

$$R_0 = \frac{\eta\tau + \lambda\sigma + \tilde\mu\tau + \sigma\tau}{(\eta+\tilde\mu+\sigma)(\eta+\tilde\mu+\varepsilon)}. \qquad (40)$$
Proof. Applying the next-generation method, the basic reproductive number of the considered system is the spectral radius of the following matrix:

$$F_0 \cdot V_0^{-1} = \begin{pmatrix} \dfrac{\lambda\sigma}{(\eta+\tilde\mu+\sigma)(\eta+\tilde\mu+\varepsilon)} & \dfrac{\lambda}{\eta+\tilde\mu+\varepsilon} \\[2ex] \dfrac{\sigma\tau}{(\eta+\tilde\mu+\sigma)(\eta+\tilde\mu+\varepsilon)} & \dfrac{\tau}{\eta+\tilde\mu+\varepsilon} \end{pmatrix},$$

where

$$F_0 = \begin{pmatrix} \dfrac{\partial F_L}{\partial L} & \dfrac{\partial F_L}{\partial I} \\[2ex] \dfrac{\partial F_I}{\partial L} & \dfrac{\partial F_I}{\partial I} \end{pmatrix}\Bigg|_{P_0^*}, \qquad (41)$$

$$V_0 = \begin{pmatrix} \dfrac{\partial V_L}{\partial L} & \dfrac{\partial V_L}{\partial I} \\[2ex] \dfrac{\partial V_I}{\partial L} & \dfrac{\partial V_I}{\partial I} \end{pmatrix}\Bigg|_{P_0^*}, \qquad (42)$$

and

$$F_L = \lambda S(t) I(t), \quad F_I = \tau S(t) I(t), \qquad (43)$$

$$V_L = (\sigma + \eta + \tilde\mu) L(t), \quad V_I = (\varepsilon + \eta + \tilde\mu) I(t) - \sigma L(t). \qquad (44)$$

A simple computation shows that the next generation matrix $F_0 \cdot V_0^{-1}$ has two eigenvalues: 0 and $\frac{\eta\tau + \lambda\sigma + \tilde\mu\tau + \sigma\tau}{(\eta+\tilde\mu+\sigma)(\eta+\tilde\mu+\varepsilon)}$, thus finishing. Finally, note that $R_0 = \frac{1}{S_e^*}$.
Moreover, the study and analysis of the stability of the steady states yields the following results:
Theorem 2. The disease-free equilibrium point P_0^* is locally and asymptotically stable if R_0 < 1.

Theorem 3. The endemic equilibrium point P_e^* is locally and asymptotically stable if R_0 > 1.

4.2 Numerical Simulations
Now, some illustrative simulations related to the qualitative study shown in the last section are introduced. In Fig. 3-(a) the global evolution of the different compartments (susceptible -green-, infectious -red-, latent -orange-, and recovered -blue-) is shown when S(0) = 0.99, I(0) = 0.01, and λ = 0.8, τ = 0.005, μ̃ = 0.25, μ = 0.2, η = 0.03, σ = 0.2, ε = 0.01. Note that in this case the basic reproductive number is greater than 1, R_0 ≈ 1.1667, and consequently the endemic equilibrium point is locally and asymptotically stable (the eigenvalues of J_e^* are: λ_1 ≈ −0.78329, λ_2 = −0.2, λ_3 ≈ −0.19906, λ_4 ≈ −0.029758). This steady state is theoretically given by P_e^* ≈ (0.85714, 0.082338, 0.057637, 0.0028818). On the other hand, in Fig. 3-(b) the disease-free equilibrium is reached, since R_0 ≈ 0.99602 < 1. The numerical values of the epidemiological coefficients are the same as in the previous example, with the exception of η = 0.06. The eigenvalues of the Jacobian matrix J_0^* have negative real part: λ_1 ≈ −0.82421, λ_2 = λ_3 = −0.2, λ_4 ≈ −0.00078863, and P_0^* is locally and asymptotically stable.
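These trajectories can be reproduced with a standard ODE solver; the sketch below integrates Eqs. (32)-(35) with the first set of coefficients (solver, horizon and initial state handling are arbitrary choices, not taken from the paper):

```python
from scipy.integrate import solve_ivp

lam, tau, mu_t, mu, eta, sig, eps = 0.8, 0.005, 0.25, 0.2, 0.03, 0.2, 0.01

def rhs(t, y):
    """Right-hand side of the improved model, Eqs. (32)-(35)."""
    S, L, I, R = y
    dS = -(lam + tau) * S * I - mu * S + mu + (eta + mu_t - mu) * (L + I)
    dL = lam * S * I - (sig + eta + mu_t) * L
    dI = tau * S * I - (eps + eta + mu_t) * I + sig * L
    dR = eps * I - mu * R
    return [dS, dL, dI, dR]

R0 = (eta*tau + lam*sig + mu_t*tau + sig*tau) / ((eta + mu_t + sig) * (eta + mu_t + eps))
# R0 evaluates to about 1.1667, matching the value reported for Fig. 3-(a)
sol = solve_ivp(rhs, (0, 200), [0.99, 0.0, 0.01, 0.0])
```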
Fig. 3. (a) Global evolution when R0 > 1. (b) Global evolution when R0 < 1
In Fig. 4 the evolution of the infected compartments (infectious I(t) -red-, latent L(t) -orange- and infected I(t) + L(t) -brown-) with different basic reproductive numbers and initial conditions is shown. Subfigure 4-(a) corresponds to R_0 ≈ 1.1667 > 1 and S_0 = 0.99; in Subfigure 4-(b) the evolution of the number of compromised devices is shown when R_0 ≈ 0.99602 and S_0 = 0.99. If R_0 ≈ 1.1667 and S_0 = 0.75, the evolution is given by Subfigure 4-(c). Finally, in Subfigure 4-(d) the case R_0 ≈ 0.99602 with S_0 = 0.75 is illustrated.
Fig. 4. Evolution of the latent and infectious compartments when: (a) R0 > 1, S0 = 0.99. (b) R0 < 1, S0 = 0.99. (c) R0 > 1, S0 = 0.75. (d) R0 < 1, S0 = 0.75.
5 Strategies for Control
As is known, the basic reproductive number R_0 plays a fundamental role when determining the future evolution of the epidemic process. In this sense, it is very important to get R_0 < 1 in order to control the malware propagation. Consequently, the determination of the most efficient containment strategies is based on the quantitative reduction of this threshold parameter. In our case, the explicit expression of the basic reproductive number is given by (40). Note that it depends on all the epidemiological coefficients involved in the model. If all these coefficients are fixed except for one, and the basic reproductive number is regarded as a function of this coefficient, a simple mathematical study of the monotonicity of such functions leads to the most basic containment strategies. That is:

(1) As $\frac{\partial R_0}{\partial \lambda} = \frac{\sigma}{(\eta+\tilde\mu+\sigma)(\eta+\tilde\mu+\varepsilon)}$, R_0 decreases as λ decreases.

(2) As $\frac{\partial R_0}{\partial \tau} = \frac{1}{\eta+\tilde\mu+\varepsilon}$, R_0 decreases as τ decreases.

(3) As $\frac{\partial R_0}{\partial \sigma} = \frac{\lambda(\eta+\tilde\mu)}{(\eta+\tilde\mu+\sigma)^2(\eta+\tilde\mu+\varepsilon)}$, R_0 decreases as σ decreases.

(4) As $\frac{\partial R_0}{\partial \varepsilon} = -\frac{\lambda\sigma+\tau(\eta+\tilde\mu+\sigma)}{(\eta+\tilde\mu+\sigma)(\eta+\tilde\mu+\varepsilon)^2}$, R_0 decreases as ε increases.

(5) As $\frac{\partial R_0}{\partial \eta} = -\frac{\eta^2\tau + 2\eta\lambda\sigma + 2\eta\sigma\tau + \lambda\sigma^2 + \tilde\mu^2\tau + 2\tilde\mu(\tau(\eta+\sigma)+\lambda\sigma) + \sigma^2\tau + \lambda\sigma\varepsilon}{(\eta+\tilde\mu+\sigma)^2(\eta+\tilde\mu+\varepsilon)^2}$, R_0 decreases as η increases.

(6) As $\frac{\partial R_0}{\partial \tilde\mu} = -\frac{\eta^2\tau + 2\eta\lambda\sigma + 2\eta\sigma\tau + \lambda\sigma^2 + \tilde\mu^2\tau + 2\tilde\mu(\tau(\eta+\sigma)+\lambda\sigma) + \sigma^2\tau + \lambda\sigma\varepsilon}{(\eta+\tilde\mu+\sigma)^2(\eta+\tilde\mu+\varepsilon)^2}$, R_0 decreases as μ̃ increases.
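The six derivatives can be checked symbolically; a short SymPy sketch of item (1), with the remaining items verified analogously:

```python
import sympy as sp

lam, tau, sig, eps, eta, mu_t = sp.symbols(
    'lambda tau sigma varepsilon eta mu_t', positive=True)

R0 = (eta*tau + lam*sig + mu_t*tau + sig*tau) / ((eta + mu_t + sig)*(eta + mu_t + eps))

claimed = sig / ((eta + mu_t + sig)*(eta + mu_t + eps))   # item (1)
assert sp.simplify(sp.diff(R0, lam) - claimed) == 0
```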
Consequently, the basic reproductive number R_0 decreases as the contagion rates (λ and τ) and the infective rate σ decrease, or as the recovery rate ε, the damage rate η and the death rate μ̃ increase.

Acknowledgements. This work has been supported by Fundación Memoria D. Samuel Solórzano Barruso (Universidad de Salamanca, Spain) under research grant FS/2-2022.
References
1. Batista, F.K., del Rey, A.M., Queiruga-Dios, A.: A new individual-based model to simulate malware propagation in wireless sensor networks. Mathematics 8(3), 410 (2020)
2. Bharati, S., Podder, P.: Machine and deep learning for IoT security and privacy: applications, challenges, and future directions. Secur. Commun. Netw. 2022, 1–41 (2022)
3. Conti, M.: Secure Wireless Sensor Networks. AIS, vol. 65. Springer, New York (2016). https://doi.org/10.1007/978-1-4939-3460-7
4. Cui, J.: Malware detection algorithm for wireless sensor networks in a smart city based on random forest. J. Test. Eval. 51(3), 1629–1640 (2023)
5. Faisal, M., Ali, I., Khan, M., Kim, J., Kim, S.: Cyber security and key management issues for internet of things: techniques, requirements, and challenges. Complexity 2020, 1–9 (2020)
6. Kumari, S., Upadhyay, R.: Exploring the behavior of malware propagation on mobile wireless sensor networks: stability and control analysis. Math. Comput. Simul. 190, 246–269 (2021)
7. Liu, J., Nogueira, M., Fernandes, J., Kantarci, B.: Adversarial machine learning: a multilayer review of the state-of-the-art and challenges for wireless and mobile systems. IEEE Commun. Surv. Tutor. 24(1), 123–159 (2022)
8. Lopez, J., Zhou, J.: Wireless Sensor Network Security. IOS Press (2008)
9. Nwokoye, C.H., Madhusudanan, V.: Epidemic models of malicious-code propagation and control in wireless sensor networks: an in-depth review. Wirel. Pers. Commun. 125, 1827–1856 (2022). https://doi.org/10.1007/s11277-022-09636-8
10. Wazid, M., Das, A.K., Rodrigues, J.J.P.C., Shetty, S., Park, Y.: IoMT malware detection approaches: analysis and research challenges. IEEE Access 7, 182459–182476 (2019)
11. Xu, B., Lu, M., Zhang, H., Pan, C.: A novel multi-agent model for robustness with component failure and malware propagation in wireless sensor networks. Sensors 21(14), 4873 (2021)
12. Yan, Q., Song, L., Zhang, C., Li, J., Feng, S.: Modeling and control of malware propagation in wireless IoT networks. Secur. Commun. Netw. 2021, 4133474 (2021)
13. Zhang, H., Shen, S., Cao, Q., Wu, X., Liu, S.: Modeling and analyzing malware diffusion in wireless sensor networks based on cellular automaton. Int. J. Distrib. Sens. Netw. 16(11), 1550147720972944 (2020)
14. Zhong, X., Peng, B., Deng, F., Liu, G.: Stochastic stabilization of malware propagation in wireless sensor network via aperiodically intermittent white noise. Complexity 2020, 2903635 (2020)
Benchmarking Classifiers for DDoS Attack Detection in Industrial IoT Networks

Marcos Severt1, Roberto Casado-Vara2(B), Ángel Martín del Rey4, Nuño Basurto3, Daniel Urda3, and Álvaro Herrero3

1 Universidad de Salamanca, Salamanca, Spain
marcos [email protected]
2 Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Matemáticas y Computación, Escuela Politécnica Superior, Universidad de Burgos, Av. Cantabria s/n, 09006 Burgos, Spain
[email protected]
3 Grupo de Inteligencia Computacional Aplicada (GICAP), Departamento de Digitalización, Escuela Politécnica Superior, Universidad de Burgos, Av. Cantabria s/n, 09006 Burgos, Spain
{nbasurto,durda,ahcosio}@ubu.es
4 Department of Applied Mathematics, Universidad de Salamanca, Salamanca, Spain
[email protected]
Abstract. In recent years, the proliferation of Internet of Things (IoT) devices has led to an increase in network DDoS attacks. This requires effective methods to classify traffic on IoT networks as benign or DDoS-vulnerable. The present research compares the performance of some of the existing classification models to determine which is best at detecting suspicious packets from an IoT network. We evaluate six different Machine Learning (ML) model families: linear, instance-based, SVM, probabilistic, tree-based, and boosted ones. The analyzed dataset contains labeled traffic packets, used to train and test the models on real-life data. Model performance is benchmarked in terms of the standard metrics: accuracy, precision, recall, TN Ratio, and F1-score. The outcomes of this study can be applied to improve IoT network security in Industry 4.0 environments as it provides valuable insights into the most effective ML algorithms to classify IoT network traffic. Keywords: Internet of Things · anomaly detection · machine learning · supervised learning · malware analysis
1 Introduction
The term Internet of Things (IoT) network was coined by Kevin Ashton in 1999. It refers to a paradigm in which the various devices that surround us in our daily lives are connected to each other and exchange useful information in an intelligent and autonomous way.
Nowadays, the number of IoT networks is constantly growing and they are widespread in both professional and home environments, being especially relevant in industrial environments. As a result, these IoT devices have become the target of many attacks and security problems [4] and they are exposed to various threats such as DDoS attacks [1]. Due to their low hardware performance, these devices have very limited computing and storage capacity, making them particularly vulnerable to DDoS attacks. Given the need to maintain the privacy, stability and persistence of the information handled by these devices, it is important to consider their ability to detect and address these types of threats [2,3]. Being able to effectively, efficiently and reliably classify the different network packets exchanged between IoT network devices is essential to determine and tackle potential threats such as DDoS attacks. Previous studies have proposed different Machine Learning (ML) techniques for this task [5–7], such as convolutional neural networks (CNNs) and other models based on Deep Learning techniques [8,9]. Other recent studies explore the possibility of using classical ML models such as decision trees, random forests and SVMs [10]. Additionally, some research has focused on feature selection and dimensionality reduction techniques to improve the accuracy of packet classification in IoT networks [11]. The detection of adversarial attacks using ML techniques is another aspect studied in some papers; due to the low resource capacity of IoT devices, these attacks are particularly complicated in IoT networks [12,13]. Another paradigm widely used today is based on hybrid models, such as a model combining neural networks and SVMs for the classification and detection of packets that are part of a DDoS attack [14]. The latter paradigm, based on hybrid models, is also widely used to detect problems and possible threats in the communication between different IoT devices on the network, as well as to detect whether a device is infected and is part of a botnet [15,16]. By comparing different classification model paradigms, this research proposes to solve the problem of classifying the network packets of an IoT device to determine whether it is the victim of a DDoS attack. The aim of this comparison is to determine which of the different approaches is the most suitable for this problem in an IoT network. Despite all these studies and approaches, network packet classification in IoT devices is still an open research area, and there are many challenges that need to be further explored. In this study, we aim to contribute to this area of research by conducting a comprehensive evaluation of different ML paradigms for packet classification in IoT networks. It is impossible to detect and classify DDoS attacks using traditional methods due to the number of devices that make up different IoT networks and the large number of packets they constantly receive. These methods are not compatible with the changing and heterogeneous nature of such an environment [17,18]. Therefore, there is a need for a paradigm shift that allows packet classification to be performed in a dynamic, effective and efficient manner, while adapting to the changing characteristics of the network. In terms of this paradigm, it is necessary to determine which of all the existing possibilities of classification
algorithms is the one best suited to the reality of IoT networks [19]. These models will be responsible for distinguishing between benign packets and DDoS packets. To address this problem of malware detection in IoT networks, six different machine learning approaches were proposed in order to find the one best suited to this classification problem. All these algorithms have been optimised using a hyperparameter optimisation process, and their performance has been measured using the usual metrics. It has been shown that the random forest is the ML-based algorithm with the best performance. In order to test which of the proposed paradigms and models is best suited to solve the problem of classifying packets from an IoT network as possible DDoS traffic, the following experiment was developed. First, a large amount of traffic was collected from an IoT network. Second, the dataset was prepared to be used for training and testing the models. The next step was to divide the dataset into a test set and a training set. With the prepared dataset, the different models were trained and the corresponding metrics were calculated. In this paper, where we have a very unbalanced dataset and a large amount of data, we have determined which of the paradigms is best suited to this classification problem. This paper is organized as follows: the analysed dataset, the different models used and the metrics to measure their effectiveness are described in Sect. 2. The results obtained after training and testing the different models, as well as a brief discussion of them, are presented in Sect. 3. Finally, Sect. 4 concludes the conducted research and proposes future lines of work.
2 Material and Methods
This section introduces the datasets used to train and test the different models, which are also described. Finally, the different metrics used to assess the efficiency, effectiveness and accuracy of the models are explained.

2.1 Description of the Dataset
Network traffic records of DDoS malware and benign anomalies were created using the Wireshark network capture program. A variety of IoT devices, including smart home devices, security cameras, sensors and many other types of IoT devices, were used to collect these records. For classification purposes, network packets were flagged as DDoS attacks or benign traffic, for a total of 485,000 records. The dataset consists of PCAP files containing network traffic captured from IoT devices. Each capture file contains network traffic data from a single IoT device. These collections contain data from different network layers, including the transport, network and application layers [20]. The labels are intended to provide researchers with a ground truth for training and testing their detection techniques, and are based on manual analysis of
the network traffic logs. This dataset is intended for the research and development of machine learning and other techniques for the recognition of malware and benign anomalies in the IoT. The labeled nature of the dataset makes it suitable for supervised learning methods. This dataset can be used to develop different types of detection techniques, such as intrusion detection systems for IoT devices, threat intelligence systems and other security applications. As the network traffic captures are from real IoT devices, this dataset is particularly useful for evaluating the effectiveness of these techniques in real-world scenarios. The dataset has the following fields:

– Connection time: the length of time that a particular network connection has been active. Knowledge of the duration of a connection can be useful in identifying potential attacks or suspicious behaviour. For example, an attacker may be attempting to exfiltrate data from the network if a connection lasts much longer than expected.
– Source and response bytes: by monitoring the number of bytes that are sent and received by a device in each packet, we can detect a number of different anomalies in the network traffic. For example, abruptly sending packets with a lot of data or receiving many empty packets may indicate that a device is being subjected to a DDoS attack.
– Missed bytes: the number of bytes that were expected to be received, but failed to be successfully delivered while connected. The number of missed bytes can indicate possible packet loss or tampering. This could be a sign of an ongoing attack or a data exfiltration attempt.
– Source and response packets: the total number of packets (or small units of data) that are sent or received from the source to the destination during the connection. Potential network attacks, such as DDoS attacks, or anomalies in network traffic that may indicate a security breach can be identified by analysing the number of packets sent and received.
– Source IP bytes and response IP bytes: the total number of Internet Protocol (IP) bytes sent and received from source to destination while connected. Potential attacks using IP-based protocols, such as IP spoofing or packet injection, can be detected by monitoring IP traffic.

A study of the statistical correlation between the different variables in the dataset is carried out to improve the performance of the different models, reduce computational costs and aid the interpretation of the results, as shown in Fig. 1. Correlated variables may provide redundant or irrelevant information, which can be removed when selecting features. This can simplify the model, reduce overfitting and improve generalisation performance. As we can see in Fig. 1, there are no variables with a relevant correlation between them, eliminating the possibility of irrelevant or redundant information.
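As an illustration, a correlation matrix like the one in Fig. 1 can be obtained in a few lines of pandas. This is only a sketch: the file name iot_traffic.csv and the column names are hypothetical stand-ins for the fields listed above, not the authors' actual identifiers.

```python
import pandas as pd

# Hypothetical column names mirroring the fields described above
cols = ["duration", "orig_bytes", "resp_bytes", "missed_bytes",
        "orig_pkts", "resp_pkts", "orig_ip_bytes", "resp_ip_bytes"]

df = pd.read_csv("iot_traffic.csv")      # assumed flattened PCAP features
corr = df[cols].corr(method="pearson")   # matrix of the kind shown in Fig. 1
print(corr.round(2))
```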
2.2 Model Description
We chose to use six different classification model approaches, with some variants within them, such as linear models, instance-based models, SVM models, probabilistic models, tree-based models and boosting models, in order to compare
and test which of these algorithms are best suited to the problem of classifying a network packet read from an IoT device [21].

Fig. 1. Statistical correlation between the different variables of the dataset.

To begin with, in terms of linear models, Logistic Regression is chosen. This type of model is one of the most common choices for solving classification problems. It works by estimating the parameters of a logistic function to predict the probability of belonging to a particular class. This logistic function is responsible for assigning, from the input features, a probability of belonging to each of the output classes. It works well for binary classification problems and is simple and easy to interpret. Additionally, K-Nearest Neighbours (KNN) is chosen as the instance-based model. KNN is a nonparametric model that makes predictions based on the k nearest neighbours in feature space. The model works by calculating the distances between each of the test points and each of the training points, and then selecting the k nearest points, referred to as the neighbours. Finally, it assigns to the test point the class that is most common among its neighbours. Unlike the previous one, this model works well for both multi-class and binary problems [22]. Furthermore, the SVC (Support Vector Classifier) is chosen as the support vector machine (SVM) model. This model works by finding the hyperplane that separates the data into the different classes with the maximum margin. Specifically, the SVC is a type of SVM (Support Vector Machine) used to classify binary data [23].
Moreover, Naive Bayes is the model of choice for probabilistic models. It is a model that assumes that the different variables are conditionally independent given the label. The model works by estimating the a priori probabilities of the labels and of the variables as a function of the labels. Finally, it uses Bayes’ theorem to calculate the a posteriori probabilities. Both binary and multi-class classification problems can be addressed with this model [24]. In addition, the Decision Tree Classifier, Random Forest Classifier and Extra Trees Classifier are chosen as tree-based models. The ensemble algorithms work by constructing a set of decision trees using random subsets of the input features and training samples; their predictions are then averaged to produce a final output. Both binary and multi-class classification problems can be addressed with these models [25]. Finally, the Ada Boost Classifier and the Gradient Boosting Classifier are chosen as the boosting models. These algorithms work by iteratively training weak classifiers on the residual errors of the previous classifiers; their predictions are then added to the overall prediction. The final output is obtained by combining the predictions of all the weak classifiers, weighted by their contribution [26].
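A minimal sketch of how these nine classifiers could be instantiated with scikit-learn is given below. Default hyperparameters are shown for brevity; the paper mentions a hyperparameter optimisation process whose settings are not reported, so these constructor arguments are assumptions.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier)

# One representative per paradigm/variant, keyed by the names used in Table 1
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbours (KNN)": KNeighborsClassifier(),
    "SVC": SVC(),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Ada Boost": AdaBoostClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(),
    "Extra Trees": ExtraTreesClassifier(),
}
```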
2.3 Model Evaluation
In order to properly evaluate each model for the classification of an IoT device network packet as benign or part of a DDoS attack, a number of metrics have been used; they are presented in this section. Before discussing the metrics, it is necessary to define a number of concepts that are the basis for their calculation:

– True Positives (TP): after classification, this corresponds to the total number of DDoS attack packets correctly classified as such.
– True Negatives (TN): after classification, this corresponds to the total number of benign packets correctly classified as such.
– False Positives (FP): after classification, this corresponds to the total number of benign packets wrongly classified as DDoS.
– False Negatives (FN): after classification, this corresponds to the total number of DDoS packets wrongly classified as benign.

The first of the metrics used to evaluate the models is called Precision. Precision is the proportion of predicted positives that are correctly identified and is a measure of model performance. Its formula is as follows:

Precision = TP / (TP + FP) (1)

The second metric used to evaluate the model is called Accuracy. Accuracy is a metric that evaluates the model by calculating the proportion of predictions that are correct out of the total number of predictions made. Its formula is as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (2)
The third metric used to evaluate the model is called Recall. Recall is a metric that evaluates the model by calculating the proportion of the actual positives that have been correctly identified. Its formula is as follows:

Recall = TP / (TP + FN) (3)
The fourth metric used is what we have referred to as the TN Ratio. This metric is defined exactly like Recall, but applied to the TN (that is, it is the specificity). Its formula is as follows:

TN Ratio = TN / (TN + FP) (4)

The last metric used is called F1 score. This is a metric that is calculated by taking the harmonic mean of the precision score and the recall score. Because it takes into account both false positives and false negatives, it is considered an extension of the precision metric. Its formula is as follows:

F1 = 2 × (Precision × Recall) / (Precision + Recall) (5)
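The five metrics can be computed directly from the entries of a confusion matrix. The following sketch (an illustration, not the authors' code) implements Eqs. (1)–(5) with scikit-learn:

```python
from sklearn.metrics import confusion_matrix

def ddos_metrics(y_true, y_pred):
    # Binary labels assumed: 1 = DDoS packet, 0 = benign packet
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    precision = tp / (tp + fp)                              # Eq. (1)
    accuracy = (tp + tn) / (tp + tn + fp + fn)              # Eq. (2)
    recall = tp / (tp + fn)                                 # Eq. (3)
    tn_ratio = tn / (tn + fp)                               # Eq. (4), specificity
    f1 = 2 * precision * recall / (precision + recall)     # Eq. (5)
    return dict(precision=precision, accuracy=accuracy,
                recall=recall, tn_ratio=tn_ratio, f1=f1)
```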
3 Results and Discussion
This section presents the metrics obtained after testing the different models described in the previous section, as well as a brief discussion of their results. In order to test the algorithms under the same conditions, identical training and test sets were created for all of them. For this purpose, the dataset was divided into 80% of network packets for training and 20% for testing. The process of dividing a dataset into training and test sets is a critical step in training and evaluating classification models. Defining the size and distribution of the sets is important to guarantee that they are representative of the dataset as a whole. Also, at no point during the training and fitting process should the test set be used; the purpose of this is to avoid any misinterpretation of the performance of the model. The results obtained for all the models described in Sect. 2.2 can be seen in Table 1. The table shows that all the algorithms appear to perform very well: they all have perfect recall and near-perfect accuracy and F1 scores. However, some algorithms have TN ratios as low as 0.00. At first sight, it might be thought that all of the models mentioned above are therefore suitable for the classification of packets in the network. However, this would be a mistake, as the dataset is very unbalanced. Unbalanced datasets, where the algorithms tend to be more accurate in the majority class and less accurate in the minority class, present several challenges for training and evaluating classification models. This is because the model has a bias towards predicting the majority class rather than the minority class. In this case, 98% of the packets will be
Table 1. Classification algorithms metrics.

Algorithm                  | Precision | Accuracy | Recall | TN Ratio | F1 score
Logistic Regression        | 0.97      | 0.97     | 1.0    | 0.00     | 0.98
K-Nearest Neighbours (KNN) | 1.00      | 1.00     | 1.0    | 0.99     | 1.00
SVC                        | 0.97      | 0.97     | 1.0    | 0.00     | 0.98
Naive Bayes                | 0.97      | 0.97     | 1.0    | 0.00     | 0.98
Decision Tree              | 1.00      | 1.00     | 1.0    | 0.99     | 1.00
Random Forest              | 1.00      | 1.00     | 1.0    | 0.99     | 1.00
Ada Boost                  | 1.00      | 1.00     | 1.0    | 0.99     | 1.00
Gradient Boosting          | 1.00      | 1.00     | 1.0    | 0.99     | 1.00
Extra Trees                | 1.00      | 1.00     | 1.0    | 0.99     | 1.00
classified as DDoS and 2% will be classified as benign. In addition, the selection of the appropriate evaluation metric can be affected by unbalanced datasets. Accuracy, which measures the proportion of correct predictions compared to all predictions, may not be the best metric on an unbalanced dataset. This is because the model may be very accurate in the majority class, but very inaccurate in the minority class. Therefore, it is important to choose a metric that takes into account the imbalance of the dataset, such as the TN Ratio defined above. Considering these metrics and the unbalanced nature of the dataset described above, we can see how the paradigms based on linear, SVM and probabilistic models are unable to correctly classify benign traffic as such, resulting in a TN ratio of 0. On the other hand, we can see how instance-based, tree-based and boosting models solve the classification problem with almost perfect metrics.
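The effect described here is easy to reproduce with a trivial baseline. The sketch below uses a synthetic 98%/2% label distribution (an assumption standing in for the real dataset) and a majority-class dummy classifier, which scores high accuracy while never identifying a single benign packet:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Synthetic stand-in for the 98% DDoS / 2% benign label distribution
rng = np.random.default_rng(0)
y = (rng.random(10_000) < 0.98).astype(int)   # 1 = DDoS, 0 = benign
X = np.zeros((y.size, 1))                     # features are irrelevant here

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)

accuracy = (y_pred == y).mean()
tn_ratio = ((y_pred == 0) & (y == 0)).sum() / max((y == 0).sum(), 1)
print(f"accuracy={accuracy:.2f}, TN ratio={tn_ratio:.2f}")  # ~0.98 and 0.00
```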
4 Conclusion
In this paper, for the problem of classifying packets in an IoT network as DDoS attacks, we propose different classification models and determine which of them are the most suitable. Six different paradigms were used to cover the range of models, from the classical ones to the most widely used ones nowadays. We have collected a number of metrics by training and testing the models on a large dataset. This has allowed us to determine which of these paradigms are suitable for this classification problem. Our work has several limitations. These provide opportunities for future research. It is worth noting that future work will focus on perfecting the dataset to enable efficient, effective and accurate classification of various common threats in IoT networks, such as C&C FileDownload, C&C Mirai, C&C HeartBeat and Okiru.
References

1. Aliero, M.S., Qureshi, K.N., Pasha, M.F., Jeon, G.: Smart home energy management systems in internet of things networks for green cities demands and services. Environ. Technol. Innov. 22, 101443 (2021)
2. Casado-Vara, R., Martín del Rey, A., Alonso, R.S., Trabelsi, S., Corchado, J.M.: A new stability criterion for IoT systems in smart buildings: temperature case study. Mathematics 8(9), 1412 (2020)
3. Karie, N.M., Sahri, N.M., Haskell-Dowland, P.: IoT threat detection advances, challenges and future directions. In: 2020 Workshop on Emerging Technologies for Security in IoT (ETSecIoT), pp. 22–29. IEEE (2020)
4. Jouhari, M., Amhoud, E.M., Saeed, N., Alouini, M.S.: A survey on scalable LoRaWAN for massive IoT: recent advances, potentials, and challenges. arXiv preprint arXiv:2202.11082 (2022)
5. Abbasi, M., Shahraki, A., Taherkordi, A.: Deep learning for network traffic monitoring and analysis (NTMA): a survey. Comput. Commun. 170, 19–41 (2021)
6. Idrissi, I., Azizi, M., Moussaoui, O.: IoT security with deep learning-based intrusion detection systems: a systematic literature review. In: 2020 Fourth International Conference on Intelligent Computing in Data Sciences (ICDS), pp. 1–10. IEEE (2020)
7. Amanullah, M.A., et al.: Deep learning and big data technologies for IoT security. Comput. Commun. 151, 495–517 (2020)
8. Krupski, J., Graniszewski, W., Iwanowski, M.: Data transformation schemes for CNN-based network traffic analysis: a survey. Electronics 10(16), 2042 (2021)
9. Chen, L., Kuang, X., Xu, A., Suo, S., Yang, Y.: A novel network intrusion detection system based on CNN. In: 2020 Eighth International Conference on Advanced Cloud and Big Data (CBD), pp. 243–247. IEEE (2020)
10. Ferrag, M.A., Maglaras, L., Ahmim, A., Derdour, M., Janicke, H.: RDTIDS: rules and decision tree-based intrusion detection system for internet-of-things networks. Future Internet 12(3), 44 (2020)
11. Pour, M.S., Bou-Harb, E., Varma, K., Neshenko, N., Pados, D.A., Choo, K.K.R.: Comprehending the IoT cyber threat landscape: a data dimensionality reduction technique to infer and characterize internet-scale IoT probing campaigns. Digit. Investig. 28, S40–S49 (2019)
12. Aloraini, F., Javed, A., Rana, O., Burnap, P.: Adversarial machine learning in IoT from an insider point of view. J. Inf. Secur. Appl. 70, 103341 (2022)
13. Zhang, J., et al.: AntiConcealer: reliable detection of adversary concealed behaviors in EdgeAI-assisted IoT. IEEE Internet Things J. 9(22), 22184–22193 (2021)
14. Elsaeidy, A.A., Jamalipour, A., Munasinghe, K.S.: A hybrid deep learning approach for replay and DDoS attack detection in a smart city. IEEE Access 9, 154864–154875 (2021)
15. Javeed, D., Gao, T., Khan, M.T., Ahmad, I.: A hybrid deep learning-driven SDN enabled mechanism for secure communication in internet of things (IoT). Sensors 21(14), 4884 (2021)
16. Abu Khurma, R., Almomani, I., Aljarah, I.: IoT botnet detection using Salp swarm and ant lion hybrid optimization model. Symmetry 13(8), 1377 (2021)
17. Munshi, A., Alqarni, N.A., Almalki, N.A.: DDoS attack on IoT devices. In: 2020 3rd International Conference on Computer Applications and Information Security (ICCAIS), pp. 1–5. IEEE (2020)
18. Yang, K., Zhang, J., Xu, Y., Chao, J.: DDoS attacks detection with autoencoder. In: NOMS 2020–2020 IEEE/IFIP Network Operations and Management Symposium, pp. 1–9. IEEE (2020)
19. Vishwakarma, R., Jain, A.K.: A survey of DDoS attacking techniques and defence mechanisms in the IoT network. Telecommun. Syst. 73(1), 3–25 (2020)
20. Azab, A., Khasawneh, M., Alrabaee, S., Choo, K.K.R., Sarsour, M.: Network traffic classification: techniques, datasets, and challenges. Digital Communications and Networks (2022)
21. Tarekegn, A.N., Giacobini, M., Michalak, K.: A review of methods for imbalanced multi-label classification. Pattern Recogn. 118, 107965 (2021)
22. Lavate, S.H., Srivastava, P.K.: An analytical review on classification of IoT traffic and channel allocation using machine learning technique. In: 2023 International Conference on Emerging Smart Computing and Informatics (ESCI), pp. 1–7. IEEE (2023)
23. Abdullah, D.M., Abdulazeez, A.M.: Machine learning applications based on SVM classification: a review. Qubahan Acad. J. 1(2), 81–90 (2021)
24. Wickramasinghe, I., Kalutarage, H.: Naive Bayes: applications, variations and vulnerabilities: a review of literature with code snippets for implementation. Soft. Comput. 25(3), 2277–2293 (2021)
25. Çetinkaya, Z., Horasan, F.: Decision trees in large data sets. Int. J. Eng. Res. Dev. 13(1), 140–151 (2021)
26. Tanha, J., Abdi, Y., Samadi, N., Razzaghi, N., Asadpour, M.: Boosting methods for multi-class imbalanced data classification: an experimental review. J. Big Data 7(1), 1–47 (2020). https://doi.org/10.1186/s40537-020-00349-y
A Q-Learning Based Method to Simulate the Propagation of APT Malware

Jose Diamantino Hernández Guillén1(B) and Ángel Martín del Rey2

1 Department of Mathematics, University of Extremadura, 06800 Badajoz, Spain
[email protected]
2 Department of Applied Mathematics, Universidad de Salamanca, IUFFyM, 37008 Salamanca, Spain
[email protected]

Abstract. Advanced persistent threats are cyberattacks characterized by their complexity, persistence and stealth. One of the basic tools employed in an APT campaign is specific specimens of advanced malware whose malicious payload consists of infecting some concrete devices. Consequently, this type of malware needs to have some type of knowledge of the network and devices. The main goal of this work is to introduce a novel model to obtain the most efficient path that a malware must follow to achieve its objective when no information about the devices and network is known. The proposed model is based on the Q-Learning methodology and it allows one to consider some security countermeasures like honeypots (the model is able to find a path that avoids these honeypots). Furthermore, in order to prevent APT malware from gathering information about the network, we propose using Moving Target Defense (MTD), which does not avoid malware propagation but causes the malware to learn incorrectly.

Keywords: Malware propagation · Q-Learning · Advanced Persistent Threats · Machine Learning · Moving Target Defense · Cybersecurity
1 Introduction
An Advanced Persistent Threat (APT for short) could be defined as a stealthy cyber threat whose main goal is to gain unauthorized access to a computer network and remain undetected for an extended period of time [12,22]. The number of this type of cyber attacks has been growing dramatically in recent years and the organizations affected are both public (usually government agencies and critical infrastructures) and private (for example, companies such as Google, Yahoo, Symantec, Northrop Grumman, Morgan Stanley and Dow Chemical, Facebook, etc.) [13,25]. Malicious code is one of the most important techniques used in this type of cyber attack. We can distinguish two types of malware depending on the impact on the global population of devices connected to the network: those specimens of malware that infect all devices within reach, and those specimens that have
specific targets to attack and infect (propagation is not done indiscriminately). The first type tries to infect as many devices as possible whereas the second one only wants to infect some specific devices. Malware used in APT campaigns belongs to the second group and it can affect the availability, integrity and confidentiality of some specific devices. The life cycle of a typical specimen of APT malware is constituted by the following six phases [6,25]:

1. Reconnaissance: The threat actor studies the environment of the target by analyzing and exploiting public open sources.
2. Establish Foothold: The malware enters the target network. Usually, a zero-day malware is created to achieve this goal.
3. Command and Control Communication: Advanced malware is capable of controlling the infected network devices according to the attacker specifications.
4. Lateral Movement/Stay Undetected: Malware infects in a stealthy way the specific devices to steal the necessary data, gain privileges and maintain control.
5. Data Discovery: Advanced malware uses several techniques to find data of interest.
6. Exfiltration/Impediment: Sensitive data is gathered, compressed and sent to external locations.

Early detection of APT malware is one of the great challenges the scientific community faces and, in this sense, several ML-based methods have been proposed in the literature (see, for example, [1,14,15]). Mathematical and computational models for advanced malware propagation have also appeared (see, for example, [8,9,17,18]). The main goal of this work is related to the second approach mentioned above, and it consists of the design of a reconnaissance malware (different from the APT malware itself) that uses ML techniques to obtain as much information about the device network as possible in order to discover the most efficient path for the advanced malware to reach the target devices. Specifically, the design of this support specimen of malware is based on the Q-Learning methodology and it takes into account some security countermeasures such as the presence of honeypots [5]. In this sense, and since some honeypots can be detected by malware (which has triggered the creation of new approaches to improve their concealment from malware [23]), the model introduced in this work considers that honeypots can be detected. Due to the stealthy behavior of APT malware, security countermeasures and detection methods often do not work properly. Consequently, it could be a good strategy to try to prevent the malware from learning (applying ML techniques) correctly. In order to achieve this, the Moving Target Defense (MTD) methodology is used, which deploys different mechanisms and strategies that change over time in order to neutralize the cyber attack [4,7] in different scenarios [2,3,26,27]. Specifically, we propose the implementation of a honeypot that changes its location at short intervals among the predecessors of the target device.
The rest of this paper is organized as follows: In Sect. 2 the fundamentals of Q-learning theory are introduced. A Q-Learning based proposal for malware propagation is shown in Sect. 3. In Sect. 4 a simple example of a security countermeasure against this type of malware is described. Finally, the conclusions are shown in Sect. 5.
2 Basic Background of Q-Learning
Machine learning can be considered as a subdiscipline of Artificial Intelligence whose methods can be classified into four types of algorithms [20]: supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning. Supervised learning algorithms are based on the use of labeled training data to learn and predict unlabeled data at a minimum cost (artificial neural networks, support vector machines, decision trees, etc.). Unsupervised learning is capable of finding hidden structures from unlabeled sample data sets (k-means clustering, hierarchical clustering, self-organization maps, etc.). Semi-supervised machine learning algorithms use both labeled (a small portion) and unlabeled data during the training process (semi-supervised support vector machines, semi-supervised hierarchical clustering, etc.). Finally, reinforcement learning methods are based on the use of some types of rewards or penalties in order for a certain agent to carry out the desired action. Q-learning is a model-free reinforcement learning algorithm which learns the value of the optimal policy independently of the action taken by the agent. The Q-learning algorithm is based on using past data in order to improve future actions of a dynamical system [19]. This methodology has several applications in different fields [11]: playing games, hybrid control for robot navigation, adaptive traffic signal control, etc. There are two main components of a Q-learning algorithm [16,24]: the agent and the system. The agent takes an action, ut, and it is sent to the system. Next, according to this action, the system gives a new state, xt, and a reward, rt, to the agent (see Fig. 1). For example, an APT malware takes the action of spreading from one device to another and the system provides the agent with a new state that could be “being in a new device”, and the reward: “right” if this new device is the target or “wrong” if the reached device is not the target.
Fig. 1. Scheme of the Q-learning algorithm.
The learning problem can be modeled as a Markov Decision Problem with the following notation:
– X is the state set: the set of all states observable by the agent. Moreover, Xt denotes the state of the system at the instant of time t.
– U is the action set: this is the set of all possible actions that an agent can take when it is endowed with a concrete state. Thus, Ut denotes the action of the system at the step of time t.
– R(x, u) is the reward that the system gives to the agent when the system has the state x and the agent performs the action u. Usually, the forbidden actions have big penalties: R(x, u) < 0, and in this way the model learns that these actions cannot be taken. In the model, we have a reward at every instant of time t, Rt(x, u). We denote this reward as Rt in a simplified way.
– p(Xt = x′, Rt = r | Xt−1 = x, Ut−1 = u) is the probability of changing from the state x to the state x′ and receiving the reward r according to the action u.
– The impact of future rewards on the present is measured by the discount factor γ. This parameter is a number between 0 and 1, γ ∈ [0, 1], such that if γ < 1 then current rewards are more important than future rewards.
– A policy, π(x, u), is a probability distribution that an agent uses to choose an action u when it is in a concrete state x. In a simplified way this is denoted as π. Then, the policy at time t is denoted as πt.

The accumulated reward is given by the sum of all discounted rewards from the step of time t:

Gt = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + . . . = Σ_{k=0}^{∞} γ^k R_{t+k+1}. (1)
Therefore, the objective is to look for the optimal policy, π∗, which is the best behavior of our agent. In other words, if we define the following function:

Q_π(x, u) = E_π(Gt | Xt = x, Ut = u), ∀x ∈ X, u ∈ U, (2)

the optimal action is:

Q∗(x, u) = max_π Q_π(x, u), ∀x ∈ X, u ∈ U. (3)
In order to resolve this problem we can use temporal difference learning:

Q_π^{(new)}(x_k, u_k) = (1 − μ) Q_π^{(old)}(x_k, u_k) + μ [R(x_k, u_k) + γ max_{u′} Q_π^{(old)}(x_{k+1}, u′)], (4)
where μ is the learning rate, which measures the importance of the previous Q-values when calculating the new Q-values. Usually, one starts with a high μ which is then decreased at each step of time. Moreover, ε-greedy is used in the model [21]. This technique controls the balance of exploration and exploitation during the propagation. If ε ∈ [0, 1] is low
(its values are close to 0), then the model prioritizes exploitation, and if ε is high (its values are close to 1) then the model prioritizes exploration. For example, if we consider ε = 0.2, we can apply this technique in the following way: we generate a random number 0 < rn < 1 and compare it with ε. If rn ≤ ε, then the action ut is randomly selected from the set of all possible actions. However, if rn > ε, the action taken is ut = arg max_u Q(xt, u). Usually, one starts with a high value of ε which is then decreased at each step. Finally, we can compute the optimal action at each step of time t in the following way:

arg max_u Q_π(xt, u). (5)
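For illustration, the ε-greedy selection and the update of Eq. (4) can be written in a few lines. This is a sketch with assumed names (Q as a NumPy array indexed by state and action), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

def epsilon_greedy(Q, state, n_actions, eps):
    # Explore with probability eps, otherwise exploit the current Q-values
    if rng.random() <= eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def td_update(Q, state, action, reward, next_state, mu, gamma):
    # Temporal-difference update of Eq. (4)
    Q[state, action] = (1 - mu) * Q[state, action] + mu * (
        reward + gamma * np.max(Q[next_state]))
```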
3 Malware Propagation
In this section we use the previous theory of Q-learning to simulate malware propagation. This means that the malware is intelligent and is capable of exploiting this theory to reach its purpose. In this case, the main objective of this malware is finding the best path from an initial infected node to a target node. If the malware reaches its goal, this information can be used in future attacks and the entity could be in danger. It is assumed that malware propagates on a device network that can be modeled by a directed random Erdős–Rényi network with edge probability p = 0.05 and 50 nodes [10]. Although malware usually moves in both directions (as in an undirected graph), we have selected a directed graph because it provides a more general framework: both directions can be considered between nodes to recover an undirected graph. Thus, and for the sake of stealth, the APT malware can move from one device to another in such a way that in each of these movements it is removed from the starting node. Consequently the specimen of malware can only be in one device at a time. Moreover, it is considered that there is only one target node and only one initial infected node. The specimen of malware is sent several times to find the target node such that in each attempt the malware learns whether the path traveled is a right or a wrong option to reach the target device. An illustrative example of this type of propagation is shown in Fig. 2. The state set X is determined by the nodes where the malware can be:

X = [malware in node 1, . . . , malware in node n]. (6)
When the malware is hosted in a node, it can only move to one of its neighbors; the remaining moves to other nodes are forbidden actions. In order to simplify the model, we have considered the number of actions that an agent can take to be the highest degree γ that one node can have. This is usually less than the number of nodes. Then, if a node v has degree β with neighbors {v1, ..., vβ}, we consider that the actions that the agent can take are:

A = [move to v1, . . . , move to vβ, stay in v, . . . , stay in v], (7)
Fig. 2. Stealthy malware propagation on a directed network.
where the length of A is γ. This means that the forbidden action “stay in v” appears γ − β times in A. Then, if the malware stays in the same place, it receives a negative reward. This way we can use smaller matrices and perform the operations faster. Thus, the Q-values are the coefficients of a matrix of order m × n, where m is the number of nodes and n is the highest degree. This matrix is initialized with zero coefficients. The rewards considered in this model are the following:

– If the infected node remains in its state at t, then Rt = −1 (forbidden actions).
– If malware moves to another node different from the target: Rt = −1.
– If malware reaches a honeypot: Rt = −1000.
– If malware reaches the target node: Rt = 100.
This way, we make the model look for the fastest path to reach the target device while avoiding the honeypots. It is also considered that γ = 1 and that μ decreases in each epoch as follows:

μt = 1 − 0.8 · (t − 1) / (N − 1), 1 ≤ t ≤ N, (8)

where N is the number of epochs. Then, 0.2 ≤ μt ≤ 1. If there are no honeypots, the best policy is the shortest path from the infected node to the target node. If honeypots are placed along the shortest path, this type of malware can detect them and will therefore look for another path to reach the target in the following attempt. In what follows an illustrative example is shown. N = 10,000 epochs are performed with 30 steps in each epoch. The exploration technique used is ε-greedy with ε = 0.2. Furthermore, the model uses equations (4) and (5) to learn the best policy. The model with these characteristics has been executed twice, without honeypots and with honeypots. In Figs. 3 and 4 the initial infected node is highlighted in red, the nodes infected during the epidemic process in orange, the honeypots in yellow and the target node in green. Figure 3 shows how the model reaches the target node in four steps (this is one of the shortest paths): 33-38-42-15-31. Figure 4 shows the propagation with some honeypots on the shortest paths: 33-7-47-14-28-31. We can see that the target node is reached in five steps in this case.
Fig. 3. Stealthy malware propagation without honeypots. The initial infected node is red. Nodes that are infected during propagation are colored orange. Finally, the node in green represents the target device.
4 Countermeasures
As is known, this type of malware (APT malware) is hardly detectable and, furthermore, it can detect honeypots, which makes it more dangerous. Then, to minimize its impact on the security of the device network, one of the most efficient strategies is to confuse it. In order to do this, we propose to use Moving Target Defense (MTD). In our case, the MTD consists of moving the position of a honeypot among the predecessors of the target node at each epoch. Then, there is only one honeypot in our device network. In this situation, the model does not learn which is the best way to reach the target. In fact, the malware tries to remain in the same node or to propagate across other paths that do not reach the target. This happens because the penalty in these situations is smaller than that of colliding with a honeypot. Figure 5 illustrates an example of the propagation of the malware with this MTD, taking into account similar parameters to those of the model in the previous section.
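Building on the previous sketch, the moving honeypot can be implemented by relocating it before each epoch. The helper below is illustrative and assumes the target node has at least one predecessor; it reuses G, target and rng from the simulation sketch above.

```python
# Moving-target defense: before each epoch the single honeypot is relocated
# uniformly at random among the predecessors of the target node.
predecessors = list(G.predecessors(target))

def relocate_honeypot():
    return {predecessors[int(rng.integers(len(predecessors)))]}

# Inside the training loop, once per epoch:
#     honeypots = relocate_honeypot()
```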
Fig. 4. Stealthy malware propagation with honeypots. The initial infected node is red. Nodes that are infected during propagation are colored orange. The honeypot is colored yellow. Finally, the node in green represents the target device.
In this example, the malware moves from node 33 (red) to node 41 (orange). Next, the malware remains in node 41 for the rest of the steps and does not reach the target.
Fig. 5. Example of propagation with mobile honeypots. The initial infected node is red. Nodes that are infected during propagation are colored orange. The honeypot is colored yellow. Finally, the node in green represents the target device.
5 Conclusions
The model without honeypots is capable of finding the shortest path from the infected node to the target node. However, when honeypots are placed in the device network, the model finds the shortest path that avoids them. Hence, this type of malware can be very dangerous, since this information can be used in the future to infect the same target with different types of APTs. In order to avoid this, Moving Target Defense is used. Specifically, we propose to create a dynamic honeypot that moves among the predecessors of the target node. The simulation results show that this kind of countermeasure is effective against this malware.
References

1. Anderson, H.S., Kharkar, A., Filar, B., Roth, P.: Evading machine learning malware detection. In: Proceedings of Black Hat Conference, vol. 2017, p. 6 (2017)
2. Azab, M., Eltoweissy, M.: Migrate: towards a lightweight moving-target defense against cloud side-channels. In: 2016 IEEE Security and Privacy Workshops (SPW), pp. 96–103 (2016). https://doi.org/10.1109/SPW.2016.28
3. Azab, M., Mokhtar, B., Abed, A.S., Eltoweissy, M.: Toward smart moving target defense for Linux container resiliency. In: 2016 IEEE 41st Conference on Local Computer Networks (LCN), pp. 619–622 (2016). https://doi.org/10.1109/LCN.2016.106
4. Feng, X., Zheng, Z., Cansever, D., Swami, A., Mohapatra, P.: A signaling game model for moving target defense. In: IEEE INFOCOM 2017 - IEEE Conference on Computer Communications, pp. 1–9 (2017). https://doi.org/10.1109/INFOCOM.2017.8057200
5. Franco, J., Aris, A., Canberk, B., Uluagac, A.: A survey of honeypots and honeynets for internet of things, industrial internet of things, and cyber-physical systems. CoRR abs/2108.02287 (2021). https://arxiv.org/abs/2108.02287
6. Ghafir, I., Prenosil, V.: Advanced persistent threat attack detection: an overview. Int. J. Adv. Comput. Netw. Secur. 4(4), 5054 (2014)
7. Hamada, A.O., Azab, M., Mokhtar, A.: Honeypot-like moving-target defense for secure IoT operation. In: 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), pp. 971–977 (2018). https://doi.org/10.1109/IEMCON.2018.8614925
8. Hernández Guillén, J., Martín del Rey, A., Casado-Vara, R.: Propagation of the malware used in APTs based on dynamic Bayesian networks. Mathematics 9, 3097 (2021). https://doi.org/10.3390/math9233097
9. Hernández Guillén, J.D., Martín del Rey, A., Casado-Vara, R.: Security countermeasures of a SCIRAS model for advanced malware propagation. IEEE Access 7, 135472–135478 (2019). https://doi.org/10.1109/ACCESS.2019.2942809
10. Jackson, M.O., et al.: Social and Economic Networks, vol. 3. Princeton University Press, Princeton (2008)
11. Jang, B., Kim, M., Harerimana, G., Kim, J.W.: Q-learning algorithms: a comprehensive classification and applications. IEEE Access 7, 133653–133667 (2019). https://doi.org/10.1109/ACCESS.2019.2941229
12. Khaleefa, E.J., Abdulah, D.A.: Concept and difficulties of advanced persistent threats (APT): survey. Int. J. Nonlinear Anal. Appl. 13(1), 4037–4052 (2022). https://doi.org/10.22075/ijnaa.2022.6230
13. Khalid, A., Zainal, A., Maarof, M.A., Ghaleb, F.A.: Advanced persistent threat detection: a survey. In: 2021 3rd International Cyber Resilience Conference (CRC), pp. 1–6 (2021). https://doi.org/10.1109/CRC50527.2021.9392626
14. M., G., Sethuraman, S.C.: A comprehensive survey on deep learning based malware detection techniques. Comput. Sci. Rev. 47, 100529 (2023). https://doi.org/10.1016/j.cosrev.2022.100529
15. Mohamed, N., Belaton, B.: SBI model for the detection of advanced persistent threat based on strange behavior of using credential dumping technique. IEEE Access 9, 42919–42932 (2021). https://doi.org/10.1109/ACCESS.2021.3066289
16. Nian, R., Liu, J., Huang, B.: A review on reinforcement learning: introduction and applications in industrial process control. Comput. Chem. Eng. 139, 106886 (2020). https://doi.org/10.1016/j.compchemeng.2020.106886
17. Peng, Z., Xiaojing, G., Surya, N., Jianying, Z.: Modeling social worm propagation for advanced persistent threats. Comput. Secur. 108, 102321 (2021). https://doi.org/10.1016/j.cose.2021.102321
18. Peng, Z., Xiaojing, G., Surya, N., Jianying, Z.: Modeling social worm propagation for advanced persistent threats. Comput. Secur. 108, 102321 (2021). https://doi.org/10.1016/j.cose.2021.102321
19. Recht, B.: A tour of reinforcement learning: the view from continuous control. Annu. Rev. Control Robot. Auton. Syst. 2, 253–279 (2019)
20. Sarker, I.H.: Machine learning: algorithms, real-world applications and research directions. SN Comput. Sci. 2(3), 160 (2021)
21. Singh, A.: Reinforcement learning based empirical comparison of UCB, epsilon-greedy, and Thompson sampling. Int. J. Aquatic Sci. 12(2), 2961–2969 (2021)
22. Tatam, M., Shanmugam, B., Azam, S., Kannoorpatti, K.: A review of threat modelling approaches for APT-style attacks. Heliyon 7(1), e05969 (2021). https://doi.org/10.1016/j.heliyon.2021.e05969
23. Tsikerdekis, M., Zeadally, S., Schlesener, A., Sklavos, N.: Approaches for preventing honeypot detection and compromise. In: 2018 Global Information Infrastructure and Networking Symposium (GIIS), pp. 1–6 (2018). https://doi.org/10.1109/GIIS.2018.8635603
24. Wang, X., et al.: Deep reinforcement learning: a survey. IEEE Trans. Neural Networks Learn. Syst. 1–15 (2022). https://doi.org/10.1109/TNNLS.2022.3207346
25. Yang, J., Zhang, Q., Jiang, X., Chen, S., Yang, F.: Poirot: causal correlation aided semantic analysis for advanced persistent threat detection. IEEE Trans. Dependable Secure Comput. 19(5), 3546–3563 (2022). https://doi.org/10.1109/TDSC.2021.3101649
26. Zeitz, K., Cantrell, M., Marchany, R., Tront, J.: Designing a micro-moving target IPv6 defense for the internet of things. In: 2017 IEEE/ACM Second International Conference on Internet-of-Things Design and Implementation (IoTDI), pp. 179–184 (2017)
27. Zhuang, R., Zhang, S., Deloach, S., Ou, X., Singhal, A.: Simulation-based approaches to studying effectiveness of moving-target network defense. In: National Symposium on Moving Target Research, vol. 246. Citeseer (2012)
On the Statistical Analysis of an Individual-Based SI Model for Malware Propagation on WSNs

E. Frutos-Bernal1(B), A. Martín del Rey2, and Miguel Rodríguez-Rosa1

1 Department of Statistics, Universidad de Salamanca, 37007 Salamanca, Spain
{efb,miguel rosa90}@usal.es
2 Department of Applied Mathematics, IUFFyM, Universidad de Salamanca, 37008 Salamanca, Spain
[email protected]
Abstract. Usually, statistical techniques are employed to analyze malware behavior, mainly through machine learning-based methods. However, it seems legitimate to wonder whether some statistical methods could be useful as a complementary tool for malicious code propagation models. This work explores this possibility with a first (and simple) application of survival analysis to the study of the simulations obtained from a compartmental and individual SI model whose dynamics is described by means of a cellular automaton. The results obtained are in line with what could reasonably be expected.

Keywords: Malware propagation · SI model · Survival analysis · Statistical analysis · Individual-based models

1 Introduction
The implementation and establishment of the Internet of Things, Industry 4.0 and Smart Cities has made wireless connectivity ubiquitous, extending it to millions of new devices of all kinds. This massive connectivity is essential for the management of large amounts of data and the design of intelligent process automation that are expected to have a positive impact on our society. Wireless sensor networks (WSN for short) play an important role in the development of all these new paradigms [1,2]. This scenario is not exempt from several security threats, many of which tend to exploit the limitations that many WSN devices present related to their computing resources and energy consumption [14]. Cybersecurity management in WSN networks should focus, among other things, on the design and implementation of malware control techniques. In this sense, having the most detailed knowledge possible of the propagation and infection processes of a malware specimen is essential to be able to adequately design such detection and mitigation techniques. Consequently, the development and analysis of mathematical models that allow us to simulate the spread of malware is crucial for reducing its impact.
Mathematical Epidemiology is the scientific discipline that has traditionally aimed at developing models that simulate the spread of biological agents. Currently, and due to the degree of digitization of our society, the propagation of malicious code has become a new object of study and analysis. The adjustment of the traditional models (devoted to the study of biological agents) to malware models has not been carried out properly, since the development of malware models has been based on the same epidemiological framework, so that both the coefficients and the types of incidence used in their development are defined in the same way as those used in the case of biological agents. Consequently, most of the models proposed to simulate the spread of malware lack sufficient realism to be efficient. The great majority of malware models proposed in the scientific literature are compartmental and global, that is, they are usually defined by means of differential equations to describe the dynamics of the system, so that the densities of susceptible, infectious, etc. devices are computed. These models are very interesting from a theoretical point of view since they capture in a qualitative way some characteristics of the dynamics (qualitative behavior of steady states) [11,13,16]. Apart from these theoretical studies, numerical analyses have also been carried out with the aim of designing efficient numerical schemes to solve the systems of ordinary differential equations that govern the dynamics of the propagation process [9,12]. Global models exhibit significant drawbacks, namely ignoring the detailed topological structure of the network and the heterogeneity of the nodes (particular characteristics of the devices, etc.). These shortcomings can be circumvented by considering the individual-based paradigm. Individual models take into consideration the particular features of each node of the network (computing resources, energy consumption, location, activity, communication characteristics, etc.) so that the (epidemiological) state of each device can be computed at every step of time. Unfortunately, very few models of this kind have been proposed [8,15]. Propagation models not only make it possible to predict the behavior of malware on a wireless sensor network, but they are also a crucial tool for the design of control and mitigation methodologies. In this sense the importance of individual models not only lies in the fact that they can predict the evolution of malware in the network with a greater level of detail than the global ones, but also in the fact that they allow data sets to be obtained with which to train AI and ML malware detection techniques. Consequently, apart from the qualitative study that can be carried out on individual-based models, it seems very interesting to perform statistical studies that allow the data obtained to be studied, determining hidden correlations and drawing other interesting conclusions. This is precisely the main objective of this work: to study the possibilities offered by statistical techniques (specifically survival analysis) in the analysis of data obtained from malware propagation model simulations. The rest of the paper is organized as follows: In Sect. 2 the notion of cellular automaton on a network is introduced. The description of the individual-based model and some illustrative simulations is shown in Sect. 3. In Sect. 4 a brief
analysis of the use of survival analysis is carried out. Finally, the conclusions and future work are presented in Sect. 5.
2 Cellular Automata on Networks
Let us consider a complex network represented by the undirected graph G = (V, E) where V = {v1, v2, . . . , vn}. A cellular automaton on G (G-CA for short) can be defined as a finite state machine given by a 4-tuple A_G = (C, S, N, F) where:

– C is the cellular space of the G-CA, where the i-th cell (denoted, for the sake of simplicity, as i ∈ C) stands for the node vi ∈ V.
– S is the finite set of states that can be assumed by each cell/node at each step of time. Thus the state of the i-th cell at step of time t is denoted by state[i, t] ∈ S, with 1 ≤ i ≤ n. Moreover,

C^t = (state[1, t], . . . , state[n, t]) ∈ S^n (1)

is called the configuration of the G-CA at step of time t. For t = 0, C^0 is the initial configuration of the cellular automaton.
– N stands for the function that defines the neighborhood of each cell. This function assigns to each node its adjacent nodes:

N : C → 2^C, i ↦ N_i = {i_1, i_2, . . . , i_{k_i}} (2)

The image N_i is called the neighborhood of node i and usually i ∈ N_i. Note that |N_i| = k_i is the degree of vertex v_i ∈ V, and if j ∈ N_i then (j, i) = (i, j) ∈ E.
– F = {F_1, . . . , F_n} is the family of local transition functions that governs the dynamics of the G-CA. In this sense, the state of node i at a particular step of time t + 1 is defined by the local rule F_i, whose variables are the states of the neighbor nodes at the previous step of time t, that is:

state[i, t + 1] = F_i(state[i_1, t], . . . , state[i_{k_i}, t]) ∈ S. (3)
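A synchronous update of such a G-CA can be sketched as follows. This is an illustration of the formalism with an assumed signature for the local rule; the node's own state is passed separately so that the decision of whether i belongs to its own neighborhood is left to the rule.

```python
import networkx as nx

def gca_step(G, state, local_rule):
    # One synchronous update of a cellular automaton on the network G:
    # every node applies its local rule to the states of its neighbors.
    return {i: local_rule(i, [state[j] for j in G.neighbors(i)], state[i])
            for i in G.nodes}
```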
3 3.1
(3)
The Individual-Based SI Model General Description
The mathematical model for malware propagation on WSNs that will be studied in the next section is a compartmental model (an SI model) where the device population is divided into two classes (compartments): susceptible devices (those not infected by malware) and infectious devices. Two factors play crucial roles in shaping the dynamics of the malware propagation process on a WSN: (1) the internal structure of the wireless sensor network (that is,
the specific contact topology), and (2) the heterogeneity of the devices. As a consequence, the individual-based paradigm is the most adequate to simulate malware diffusion. In this sense, the proposed model described below is based on a G-CA.

The topology of the WSN where malware spreads determines both the cellular space and the neighborhood of the CA. Specifically, the i-th device represents the node vᵢ ∈ V of the network, and there exists an edge (i, j) ∈ E if there exists a communication link between devices/nodes vᵢ and vⱼ, that is: j ∈ Nᵢ and i ∈ Nⱼ. As this is an SI model, S = {0, 1} such that

  state[i, t] = 0 if the i-th node is susceptible at t, and state[i, t] = 1 if it is infected at t.    (4)

The specification of the functions that determine the transitions between states defines the nature of the model. In this work we propose a deterministic approach where the local transition rules are as follows:

– If state[i, t] = 1, then state[i, t + 1] = 1.
– If state[i, t] = 0, then

  state[i, t + 1] = 1 if |Iᵢ(t)| / kᵢ > uᵢ(t), and state[i, t + 1] = 0 otherwise,    (5)

where Iᵢ(t) ⊂ Nᵢ is the set of infectious neighbors of the i-th node at time step t, and 0 < uᵢ(t) ≤ 1 is a threshold parameter which depends on the particular characteristics of the node and on the time step.

Note that individual-based models provide not only the particular (epidemiological) evolution of each node but also the global evolution of the system. In this sense, d(t) = Σᵢ₌₁ⁿ state[i, t] stands for the total number of infected sensor nodes at time t, whereas s(t) = n − d(t) represents the number of non-infected nodes.
3.2 Illustrative Simulations
In this subsection some illustrative simulations are shown. In all of them the number of devices is n = 100, and the simulations are computed during a period of 10 units of time: 0 ≤ t ≤ 10. Moreover, it is supposed that there is only one infectious device ("patient zero") at t = 0, and the threshold parameter considered is the same for all devices:

  uᵢ(t) = Ĩ(t) / k̄,  1 ≤ i ≤ n,    (6)

where Ĩ(t) = (1/n) Σ₁≤ᵢ≤ₙ |Iᵢ(t)| is the mean number of infectious neighbors at t, and k̄ is the average degree of the network G.
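For concreteness, the following minimal Python sketch (our own illustration, not code from the paper) implements the deterministic update rules (4)-(5) with the global threshold (6) on an Erdős–Rényi random graph, mirroring the first simulation setup described below; all parameter values are illustrative.

import networkx as nx

def simulate_si(G, patient_zero, steps=10):
    # Deterministic SI dynamics of Eqs. (4)-(5) with threshold (6)
    n = G.number_of_nodes()
    k_bar = 2 * G.number_of_edges() / n                # average degree of G
    state = {v: 0 for v in G}                          # 0 = susceptible, 1 = infectious
    state[patient_zero] = 1
    history = [sum(state.values())]                    # d(0)
    for _ in range(steps):
        inf_neigh = {v: sum(state[u] for u in G[v]) for v in G}
        u_t = sum(inf_neigh.values()) / n / k_bar      # u_i(t) = mean |I_i(t)| / k_bar
        new_state = dict(state)
        for v in G:
            if state[v] == 0 and G.degree(v) > 0:
                if inf_neigh[v] / G.degree(v) > u_t:   # local rule (5)
                    new_state[v] = 1
        state = new_state
        history.append(sum(state.values()))            # d(t)
    return history

G = nx.erdos_renyi_graph(100, 0.1, seed=1)             # ER network, edge probability 0.1
p0 = max(G, key=G.degree)                              # patient zero: highest degree centrality
print(simulate_si(G, p0))                              # d(t) for t = 0, ..., 10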
On the other hand, different initial conditions related to the topology of the device network are considered: different types of networks are used (random networks, scale-free networks and small-world networks), and the structural characteristics of the patient zero also vary from one simulation to another (that is, it is chosen taking into account the degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality). In Fig. 1 the global evolution of the number of susceptible and infectious devices on an ER random network G with edge probability 0.1 is shown. In this case the patient zero is the node with the highest degree centrality. Note that an endemic steady state is reached.
Fig. 1. Global evolution of the number of susceptible and infectious devices on a random complex network.
In Fig. 2 the individual evolution of the system is shown for a scale-free complex network generated by the Barabási–Albert algorithm, where a new vertex with 3 edges is added at each step, and where the patient zero is chosen as the node with the highest closeness centrality.
Fig. 2. Individual evolution of network devices on a scale-free complex network.
Finally, in Fig. 3 some epidemiological states of a small-world network G generated by the Watts–Strogatz algorithm with rewiring probability 0.1 are shown at different time steps. In this case the patient zero has been chosen as the node with the highest eigenvector centrality.
Fig. 3. Epidemiological states of network devices on a small-world complex network at different time steps.
4 Survival Analysis Applied to Malware Propagation

4.1 The Fundamentals
Survival analysis is the term used to describe the statistical analysis of data related to the determination of the period of time that elapses from an initial time to the occurrence of an event. Survival analysis was originally developed to measure the lifespans of individuals (see, for example, [10]) but it has also been applied to analyze the duration of any process (see, for example, [5,6]). As mentioned in the Introduction, this work deals with the role of survival analysis in the field of malware propagation on a wireless sensor network. Specifically, a statistical analysis of the time elapsed between the activation of a sensor device and its "contagion" is performed.
Let T be the random variable which measures the time it takes for a sensor node to become infected. Suppose this variable has a probability distribution F(t) whose probability density function is f(t). Thus:

  F(t) = P(T < t) = ∫₀ᵗ f(u) du.    (7)
The survival function, S(t), is defined as follows:

  S(t) = P(T ≥ t) = ∫ₜ^∞ f(u) du = 1 − F(t),    (8)
and therefore it represents the probability that a sensor node remains susceptible (uninfected) at least until time t. A non-parametric method named the Kaplan–Meier estimator is used to estimate the survival function S(t); it is defined as follows [3]:

  Ŝ(t) = ∏_{tᵢ ≤ t} [m(tᵢ) − d(tᵢ)] / m(tᵢ) = ∏_{tᵢ ≤ t} [1 − d(tᵢ)/m(tᵢ)],    (9)

where the tᵢ are the observed infection times, m(tᵢ) is the number of nodes still susceptible just before tᵢ, and d(tᵢ) is the number of nodes infected at tᵢ.
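As an illustration (our own sketch, not code from the paper), the Kaplan–Meier estimate (9) can be computed directly from a list of simulated infection times; all variable names here are hypothetical.

from collections import Counter

def kaplan_meier(infection_times, n_nodes):
    # Kaplan-Meier estimate of S(t) from observed infection times, Eq. (9).
    # Nodes never infected during the study are right-censored: they simply
    # never appear in the event list.
    events = Counter(infection_times)       # d(t_i): infections observed at t_i
    at_risk = n_nodes                       # m(t_i): susceptible just before t_i
    s_hat, curve = 1.0, {}
    for t_i in sorted(events):
        d_i = events[t_i]
        s_hat *= 1.0 - d_i / at_risk        # multiply the factor (1 - d/m)
        curve[t_i] = s_hat
        at_risk -= d_i
    return curve                            # maps t_i -> estimated S(t_i)

# e.g. 100 sensor nodes, infections observed at these time steps
print(kaplan_meier([1, 1, 2, 2, 2, 3, 5], n_nodes=100))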
1. Notice that τ < ε + μ is a necessary condition for the stability requirement R₀ < 1.

2.2 The Stochastic MDBCA Model
The deterministic approach to the malware diffusion problem presented in Subsect. 2.1 by means of the MDBCA model cannot capture the inherent randomness of the epidemiological dynamics, so stochastic techniques are needed. An essential term in (1) is the incidence and, specifically, the parameter λ, which has a random nature. Consequently, compartmental models governed by stochastic differential equations are needed. In these models the randomness is introduced by perturbing the parameter λ with a random noise ξₜ of intensity θ, giving the stochastic term λ + θξₜ. The deterministic MDBCA system thus turns into the stochastic MDBCA model driven by the stochastic differential equation (SDE), in the Itô sense,

  dSₜ = (μ − (λ + τ)SₜIₜ − μSₜ) dt − θSₜIₜ dWₜ
  dEₜ = (λSₜIₜ − (μ + σ)Eₜ) dt + θSₜIₜ dWₜ
  dIₜ = (τSₜIₜ + σEₜ − (ε + μ)Iₜ) dt
  dRₜ = (εIₜ − μRₜ) dt,    (7)
where Wt represents a scalar Wiener process. The SDE (7) has a unique equilibrium P0 = (1, 0, 0, 0). There are different notions of stability in the stochastic case. Here, using the concepts of mean-square stability and stability in probability and their relations we will study the stability of the stochastic equilibrium P0 .
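Although the paper analyzes (7) only theoretically, a sample path of the stochastic MDBCA model can be simulated with a standard Euler–Maruyama scheme; the sketch below and its parameter values are our own illustration, not part of the original work.

import numpy as np

def euler_maruyama_mdbca(lam, tau, mu, sig, eps, theta, x0, T=100.0, dt=1e-3, seed=0):
    # One Euler-Maruyama sample path of the stochastic MDBCA model (7)
    rng = np.random.default_rng(seed)
    S, E, I, R = x0
    path = [(0.0, S, E, I, R)]
    for k in range(int(T / dt)):
        dW = rng.normal(0.0, np.sqrt(dt))            # Wiener increment
        dS = (mu - (lam + tau) * S * I - mu * S) * dt - theta * S * I * dW
        dE = (lam * S * I - (mu + sig) * E) * dt + theta * S * I * dW
        dI = (tau * S * I + sig * E - (eps + mu) * I) * dt
        dR = (eps * I - mu * R) * dt
        S, E, I, R = S + dS, E + dE, I + dI, R + dR
        path.append(((k + 1) * dt, S, E, I, R))
    return np.array(path)

# start near the disease-free equilibrium P0 = (1, 0, 0, 0)
path = euler_maruyama_mdbca(lam=0.3, tau=0.05, mu=0.1, sig=0.2,
                            eps=0.3, theta=0.1, x0=(0.99, 0.0, 0.01, 0.0))
print(path[-1])   # final (t, S, E, I, R)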
3 Stability of the Stochastic MDBCA System

3.1 Mean-Square and Stochastic Stability of the Linearized System
If we linearize (7) around the equilibrium P₀ and define Xₜ = (Xₜ¹, Xₜ², Xₜ³, Xₜ⁴) = (Sₜ − 1, Eₜ, Iₜ, Rₜ), we obtain the linear SDE

  dXₜ = M Xₜ dt + N Xₜ dWₜ,    (8)

with

  M = \begin{pmatrix} -\mu & 0 & -\lambda-\tau & 0 \\ 0 & -\sigma-\mu & \lambda & 0 \\ 0 & \sigma & -\mu-\varepsilon+\tau & 0 \\ 0 & 0 & \varepsilon & -\mu \end{pmatrix},  N = \begin{pmatrix} 0 & 0 & -\theta & 0 \\ 0 & 0 & \theta & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.    (9)
Denoting by Zᵀ the transpose of (the vector or the matrix) Z, the second moment of the solution of the linear system (8)-(9) is given by the matrix P(t) = E[Xₜ Xₜᵀ] with entries pᵢⱼ(t) = E[Xₜⁱ Xₜʲ], i, j = 1, ..., 4. It is known, see [3], that P(t) satisfies the equation

  dP(t)/dt = M P(t) + P(t) Mᵀ + N P(t) Nᵀ.    (10)
Notice that P(t) is a symmetric matrix; then (10) reduces to a linear system of ten differential equations which can be written

  dY/dt = 𝓜 Y,    (11)

where the vector Y has components pᵢⱼ(t), i, j = 1, ..., 4, i ≤ j. For the sake of simplicity, we take Y = (p₁₁, p₁₄, p₄₄, p₂₄, p₃₄, p₁₂, p₁₃, p₂₂, p₂₃, p₃₃)ᵀ. With this arrangement, the system (10) reduces to the deterministic differential system (11) with

\mathcal{M} = \begin{pmatrix}
-2\mu & 0 & 0 & 0 & 0 & 0 & 2(-\lambda-\tau) & 0 & 0 & \theta^2 \\
0 & -2\mu & 0 & 0 & -\lambda-\tau & 0 & \varepsilon & 0 & 0 & 0 \\
0 & 0 & -2\mu & 0 & 2\varepsilon & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -2\mu-\sigma & \lambda & 0 & 0 & 0 & \varepsilon & 0 \\
0 & 0 & 0 & \sigma & -2\mu+\tau-\varepsilon & 0 & 0 & 0 & 0 & \varepsilon \\
0 & 0 & 0 & 0 & 0 & -2\mu-\sigma & \lambda & 0 & -\lambda-\tau & -\theta^2 \\
0 & 0 & 0 & 0 & 0 & \sigma & -2\mu+\tau-\varepsilon & 0 & 0 & -\lambda-\tau \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 2(-\mu-\sigma) & 2\lambda & \theta^2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \sigma & -2\mu-\sigma+\tau-\varepsilon & \lambda \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2\sigma & 2(-\mu+\tau-\varepsilon)
\end{pmatrix}
Since the asymptotic mean-square stability of the equilibrium of (8) is equivalent, see [3], to the asymptotic stability of the second moment P(t), MS-stability reduces to the stability of the equilibrium Y ≡ 0 of the linear equation (11).
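In practice this criterion is easy to check numerically: build 𝓜 for given parameter values and verify that all its eigenvalues have negative real part. The sketch below is our own illustration with arbitrary parameter values, using the reconstruction of 𝓜 given above.

import numpy as np

def ms_stable(lam, tau, mu, sig, eps, theta):
    # True if the equilibrium of (8) is asymptotically MS-stable, i.e. all
    # eigenvalues of the 10x10 matrix of (11) lie in the left half-plane.
    a, c = lam + tau, theta**2
    M = np.zeros((10, 10))
    M[0, [0, 6, 9]] = [-2*mu, -2*a, c]
    M[1, [1, 4, 6]] = [-2*mu, -a, eps]
    M[2, [2, 4]]    = [-2*mu, 2*eps]
    M[3, [3, 4, 8]] = [-2*mu - sig, lam, eps]
    M[4, [3, 4, 9]] = [sig, -2*mu + tau - eps, eps]
    M[5, [5, 6, 8, 9]] = [-2*mu - sig, lam, -a, -c]
    M[6, [5, 6, 9]] = [sig, -2*mu + tau - eps, -a]
    M[7, [7, 8, 9]] = [-2*(mu + sig), 2*lam, c]
    M[8, [7, 8, 9]] = [sig, -2*mu - sig + tau - eps, lam]
    M[9, [8, 9]]    = [2*sig, 2*(-mu + tau - eps)]
    return np.all(np.linalg.eigvals(M).real < 0)

print(ms_stable(lam=0.3, tau=0.05, mu=0.1, sig=0.2, eps=0.3, theta=0.1))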
To sum up, the mean-square stability analysis of (8) revolves around the conditions for the eigenvalues of 𝓜 to lie in the left half-plane. Since 𝓜 is a block triangular matrix of the form

  \begin{pmatrix} A & & & \\ 0 & B & & \\ 0 & 0 & B & \\ 0 & 0 & 0 & C \end{pmatrix}

where A, C are 3×3 matrices and B is a 2×2 matrix, the eigenvalues of 𝓜 are the combined eigenvalues of A, B, C. The eigenvalues of A are obviously negative. The eigenvalues of B are

  (1/2) ( −4μ − σ − ε + τ ± √( 4λσ + (σ + τ)² + ε² − 2ε(σ + τ) ) )

and both are negative when

  λσ < (2μ + σ)(2μ − τ + ε).    (12)
Notice that (12) implies τ < 2μ + ε. The eigenvalues of C are not easily manipulable, but its characteristic polynomial is given by P(x) = x³ + a₁x² + a₂x + a₃ with

  a₁ = 3(2μ + σ − τ + ε),
  a₂ = 2( −2λσ + 6μ² + 6μσ + σ² + τ² + ε² − 2τ(3μ + 2σ + ε) + ε(6μ + 4σ) ),
  a₃ = −4λμσ − 4λσ² − 2σ²θ² − 4λσ(μ − τ + ε) + 4σ(μ + σ)(μ − τ + ε) + 4ε(μ + σ)(μ − τ + ε) + 8μ(μ + σ)(μ − τ + ε) − 4τ(μ + σ)(μ − τ + ε).

Then the Routh–Hurwitz conditions for C can be written

  σ²θ² + (2μ + σ − τ + ε)( (4μ + σ − 3τ + 3ε)(4μ + 3σ − τ + ε) − 4λσ ) > 0,    (13)
  2μ + σ − τ + ε > 0,    (14)
  2(2μ + σ − τ + ε)( (μ + σ)(μ − τ + ε) − λσ ) − σ²θ² > 0.    (15)
Condition (15) can be written

  σ²θ²/2 − τ(λσ + μτ + στ) + (2μ + σ + ε)(λσ + μτ + στ) + τ(μ + σ)(μ + ε) < (μ + σ)(μ + ε)(2μ + σ + ε),

which is equivalent to

  R̃₀ := R₀ ( 1 − τ/(ε + 2μ + σ) ) + τ/(ε + 2μ + σ) + (1/2) σ²θ² / ( (μ + σ)(ε + μ)(ε + 2μ + σ) ) < 1,    (16)

where the definition in (6) has been used.
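Condition (16) is straightforward to evaluate numerically. In the sketch below (our own illustration), the expression used for R₀ is inferred from the algebraic equivalence of (15) and (16), since the paper's own definition (6) lies outside this excerpt.

def ms_reproductive_number(lam, tau, mu, sig, eps, theta):
    # Mean-square reproductive number R0_tilde of Eq. (16).
    # R0 below is inferred from the equivalence of (15) and (16); the
    # original definition (6) is not reproduced in this excerpt.
    r0 = lam * sig / ((mu + sig) * (eps + mu)) + tau / (eps + mu)
    denom = eps + 2 * mu + sig
    noise = 0.5 * sig**2 * theta**2 / ((mu + sig) * (eps + mu) * denom)
    return r0 * (1 - tau / denom) + tau / denom + noise

# MS-stability (Theorem 1) requires tau < eps + sig and this value < 1
print(ms_reproductive_number(lam=0.3, tau=0.05, mu=0.1, sig=0.2, eps=0.3, theta=0.1))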
Theorem 1. If τ < ε + σ, the equilibrium of the linear SDE (8) is asymptotically MS-stable if and only if condition (16) holds.

Proof. Since ε, λ, μ, σ, τ are positive constants, the eigenvalues of 𝓜 are negative if and only if (12)-(15) hold. Suppose that τ < ε + σ and (15) hold; then (14) holds, and using (14) in (15) gives

  λσ < (μ + σ)(μ − τ + ε),    (17)

which in turn implies condition (12). On the other hand, from (17),

  4λσ < 4(μ + σ)(μ − τ + ε) < 4(μ + σ)(μ − τ + ε) + 3(2μ + σ − τ + ε)² = (4μ + σ − 3τ + 3ε)(4μ + 3σ − τ + ε).    (18)
Using (14) and (18) we conclude that (13) holds.

The value defined in (16) can be considered a stochastic epidemiological threshold, named the mean-square reproductive number. Recall the condition

  lim_{c→0} P( lim_{t→∞} Xₜ(c) = 0 ) = 1

for the trivial solution of a stochastic system to be asymptotically stable in probability (or stochastically stable). For a linear SDE with constant coefficients it was proved, see [5], that MS-stability implies stochastic stability. Then, from Theorem 1, we conclude:

Corollary 2. If τ < ε + σ and condition (16) holds, then the equilibrium of the linear stochastic differential equation (8) is asymptotically stable in probability.

3.2 Stochastic Stability of the MDBCA Model
Finally, using the fact that the stochastic stability behaviour of an SDE can be derived from the analysis of its linear counterpart equation, we obtain:

Theorem 3. If τ < ε + σ and condition (16) holds, then the disease-free equilibrium P₀ = (1, 0, 0, 0) of the stochastic MDBCA model (7) is asymptotically stable in probability.

Proof. Under the assumptions, the linear system (8) is stochastically asymptotically stable, see Corollary 2. From here, see Theorems 7.1 and 7.2 in [7], the trivial solution of (7) is asymptotically stable in probability.

Remarks. Be aware that

1. Condition (16) depends on all parameters of the model. The threshold R̃₀ increases as the intensity of the noise θ increases.
2. When τ = 0 the studied equation represents the classic SEIR model with demography. Notice that in this case

  R̃₀ := R₀ + (1/2) σ²θ² / ( (μ + σ)(ε + μ)(ε + 2μ + σ) )

and

(a) since

  dR̃₀/dλ = σ / ( (μ + σ)(μ + ε) ),

then R̃₀ decreases as λ decreases;

(b) since

  dR̃₀/dσ = [ σ²(3θ²μ + 2λμ + θ²ε) + 2μσ(θ² + 2λ)(2μ + ε) + 2λμ(2μ + ε)² ] / [ 2(μ + σ)²(μ + ε)(2μ + σ + ε)² ],

then R̃₀ decreases as σ decreases;

(c) since

  dR̃₀/dε = − σ( θ²σ(3μ + σ) + 2λ(2μ + σ)² ) / [ 2(μ + σ)(μ + ε)²(2μ + σ + ε)² ] − σε( 2λε + 2θ²σ + 4λ(2μ + σ) ) / [ 2(μ + σ)(μ + ε)²(2μ + σ + ε)² ],

then R̃₀ decreases as ε increases;

(d) since

  dR̃₀/dμ = − σ( θ²σ(6μ² + 6μσ + σ²) + 2λ(2μ + σ)³ + 2λε³ ) / [ 2(μ + σ)²(μ + ε)²(2μ + σ + ε)² ] − ε²σ( θ²σ + 6λ(2μ + σ) ) / [ 2(μ + σ)²(μ + ε)²(2μ + σ + ε)² ] − 2σε( θ²σ(3μ + 2σ) + 3λ(2μ + σ)² ) / [ 2(μ + σ)²(μ + ε)²(2μ + σ + ε)² ],

R̃₀ decreases as μ increases.
4 Conclusions
The stability of the stochastic MDBCA problem has been studied by means of the MS-stability analysis of its linearized problem. Sufficient conditions for the stability in probability of the MDBCA model are given. These conditions have led to the formulation of a number R̃₀ that can be considered a stochastic epidemiological threshold. Finally, a basic analysis of the threshold monotonicity has been carried out. Further work that refines the obtained threshold is in progress.
References

1. Abdulkarem, M., Samsudin, K., Rokhani, F.Z., A. Rasid, M.F.: Wireless sensor network for structural health monitoring: a contemporary review of technologies, challenges, and future direction. Struct. Health Monit. 19(3), 693–735 (2020)
2. Acarali, D., Rajarajan, M., Komninos, N., Zarpelão, B.B.: Modelling the spread of botnet malware in IoT-based wireless sensor networks. Secur. Commun. Netw. 2019, 3745619 (2019)
3. Arnold, L.: Stochastic Differential Equations: Theory and Applications. Wiley, New York (1974)
4. Durišić, M.P., Tafa, Z., Dimić, G., Milutinović, V.: A survey of military applications of wireless sensor networks. In: Mediterranean Conference on Embedded Computing, pp. 196–199 (2012)
5. Gikhman, I.I.: Stability of solutions of stochastic differential equations. In: Limit Theorems Statist. Inference, pp. 14–45. Izdat. "Fan", Tashkent (1966). English transl.: Selected Transl. Statist. and Probability, vol. 12, pp. 125–154. Am. Math. Soc., Providence (1973)
6. Ko, J., Lu, C., Srivastava, M.B., Stankovic, J.A., Terzis, A., Welsh, M.: Wireless sensor networks for healthcare. Proc. IEEE 98(11), 1947–1960 (2010)
7. Khasminskii, R.: Stochastic Stability of Differential Equations, 2nd edn. Springer, Berlin (2012). https://doi.org/10.1007/978-3-642-23280-0
8. Kumari, S., Upadhyay, R.K.: Exploring the behavior of malware propagation on mobile wireless sensor networks: stability and control analysis. Math. Comput. Simul. 190, 246–269 (2021)
9. Ojha, R.P., Srivastava, P.K., Sanyal, G., Gupta, N.: Improved model for the stability analysis of wireless sensor network against malware attacks. Wirel. Pers. Commun. 116, 2525–2548 (2021)
10. Zhang, H., Shen, S., Cao, Q., Wu, X., Liu, S.: Modeling and analyzing malware diffusion in wireless sensor networks based on cellular automaton. Int. J. Distrib. Sens. Netw. 16(11), 1550147720972944 (2020)
General Track
Modelling and Simulation of Wind Energy Systems: Learning-by-Doing in a Master's Course

Lía García-Pérez¹(B) and Matilde Santos²

¹ Universidad Complutense, 28040 Madrid, Spain
[email protected]
² Institute of Knowledge Technology, Universidad Complutense, 28040 Madrid, Spain
[email protected]
Abstract. In the Master’s course in Energy, taught at the Faculty of Physics of the Complutense University of Madrid, the optional subject “Modeling and Simulation of Energy Systems. Projects” is part of the courses, among other subjects. This course has a practical nature, which facilitates the dynamisation of the class. The learning-by-doing approach is proving very useful for students to acquire new concepts and learn to use software tools they are not familiar with. This paper describes a series of actions that have been carried out with the aim of making classes more dynamic and facilitating learning. The range of activities has been very varied, from the presentation of articles, the discussion of practical cases, computer practices, team work, ... The results both in grades and in the degree of satisfaction of students and teachers are very good. Keywords: energy systems · learning-by-doing · modelling simulation · teaching activities · practical learning
1
·
Introduction
The objectives of the Master’s Degree in Energy of the Faculty of Physics of the Complutense University of Madrid are to provide students with the necessary theoretical and practical knowledge to be able to tackle the challenges that the development of energy sources will pose in the short and medium term. Master students are trained to be able to carry out their work in the field of energy sources. The subject “Modelling and Simulation of Energy Systems. Projects” is an optional subject in the second semester of the master degree. It is an eminently transversal subject whose objective is the acquisition of the necessary skills and knowledge to model and simulate energy processes in any of their aspects and modalities. This course has a practical nature, which facilitates the dynamisation of the class. This has allowed it the use of different teaching perspectives. One of them, the learning-by-doing approach, has been proved very useful for students c The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. Garc´ıa Bringas et al. (Eds.): CISIS 2023/ICEUTE 2023, LNNS 748, pp. 207–216, 2023. https://doi.org/10.1007/978-3-031-42519-6_20
to acquire new concepts and to learn how to use software tools they are not familiar with [3]. Many experts on education have pointed out that a variety of cognitive skills and higher-order thinking skills can be nurtured through their application to a practical context [1]. Learning-by-doing is an educational approach that makes use of engaging learning activities that are tailored to the needs and interests of the learners. In the same way, teachers who utilize the learning-by-doing approach motivate students to learn by stimulating their curiosity [1].

In this subject, this approach has been especially necessary since, being a cross-curricular subject, what is intended is not to transmit abstract concepts and theoretical content, but to have students acquire programming and simulation skills and, above all, to have them know how to apply this computational approach to address any power system. As emphasized from the beginning, the objective of this course is that they learn to obtain all the information possible by simulating an energy system, to analyze the data and, most importantly, to use this information as a basis for making decisions. In order to achieve this objective, which implies a more comprehensive training and focus rather than theoretical terms or content, a series of activities have been proposed.

This paper describes some of the activities that have been carried out with the aim of making classes more dynamic and facilitating learning. The range of activities covers the presentation of articles, the discussion of practical cases, quiz games, computer practices, team work, etc. The results, both in grades and in the level of satisfaction of students and teachers, are very high. They show that the students have successfully integrated the knowledge and skills acquired within the different activities. On the other hand, the students' evaluation of the different activities regarding their usefulness for learning allows us to redesign some of them or focus on specific ones. The new proposal shows that one of the activities offered for the first time has been the best valued. Besides, the marks of the students have slightly improved compared with previous years, suggesting that they were more motivated by this methodology.

The structure of the paper is as follows. After a brief state of the art in Sect. 2, Sect. 3 presents the main activities carried out along the academic term in the course. Section 4 shows the results of the students' evaluation of the activities. Conclusions and future work end the paper.
2 Brief State-of-the-Art: Learning-by-Doing
Numerous models and strategies have been presented for improving education over time. An effective approach to disseminating education is learning-by-doing [1]. The learning-by-doing approach is defined as learning that results from one's own actions, efforts, and experiences. It requires students to be actively involved in the learning process. It is not just the teacher who delivers the lectures; rather, students are expected to practice and participate by performing learning activities as part of their learning process, which depend on the nature of the course [2].
To mention some recent papers on this topic, [3] carries out a study to explore and produce knowledge about the pedagogical approach of learning-by-doing and making in the context of craft and technology education in Finland. The findings of this study support the argument that technology education has the potential to develop students' skills in many ways by providing pupils with opportunities to work in a practical way, accessing the domain of technological knowledge and working technologically. The paper by [4] summarizes the experience of a European project consortium that sought to stimulate students with different academic backgrounds to establish connections across disciplines and to raise their awareness about climate change. They developed and tested a methodology for the design of a structured ordinary practice for teaching urban decarbonisation to students in Higher Education. They offered students a combination of different approaches and working methods, deployed in four co-working modules. In [2] the CodeLab tool, based on the learning-by-doing approach, has been incorporated into a programming course for a bachelor's degree program. The learners are new to programming concepts and the tool is likely to help them acquire problem-solving skills.
3 Learning-by-Doing Activities in the Master Course
There are some experiences that prove the benefits of simulations and experiential learning. Specifically, students are able to apply knowledge from other classes to real-world situations, all while honing other skills (communication, presentation, team work), as well as their ability to analyze and synthesize information, skills that are critical to success in their decisions. This practical experience also gives students confidence as they prepare for and make decisions about their future career paths [5]. These are some of the goals pursued with the course "Modeling and Simulation of Energy Systems. Projects", where master's students apply the knowledge acquired in other subjects such as wind energy, solar energy, nuclear energy, etc.

The syllabus of the course consists of the following contents:

1. Systems and models: types of models, examples. Applications.
2. Obtaining models: modelling and identification. Bond graphs.
3. Model building: representation, linearisation, verification and validation.
4. Simulation. Introduction. Continuous and discrete simulation.
5. Phases of simulation.
6. Analysis of results and documentation of the simulation.
7. Simulation tools. Distributions.
8. Areas of application. Examples in the field of energy systems.
In each of the 8 sections of the course, in addition to the lectures, practical activities have been developed to facilitate the involvement of the students and to improve their learning.
These activities can be divided into the following types: (1) Kahoot! quiz games, (2) case studies, (3) practical exercises and (4) an energy system simulation project.

3.1 Kahoot! Quiz Games
Kahoot! is a digital game-based learning platform that allows teachers and students to interact through competitive knowledge games. It is a free web-based platform that allows users to create interactive quizzes and surveys. The benefits of using Kahoot! in education have been studied in recent years. It has been shown to increase student motivation and engagement [6,7] but also to improve performance over more traditional approaches [8]. Kahoot! has been used with two different objectives: (1) to encourage the practical application of theoretical knowledge in real cases and (2) to detect concepts that need to be reinforced. Specifically, on the topic of systems and models, a Kahoot! was carried out in which the students had to analyse for different models whether they are continuous or discrete, deterministic or stochastic, and dynamic or static. Figure 1 shows the results of the game in the Master’s class in the 2022– 23 academic year. It can be seen that the performance is similar for most of the students. Analysing the students’ answers to the different questions it was seen that the most frequent error among the students is the confusion between dynamic versus static systems.
Fig. 1. Results of the Kahoot! game in the 2022–23 course
3.2 Case Studies
The case study is a traditional tool in engineering education. It allows students to approach a real example, in some cases well known, with a comprehensive and occasionally complete overview. In the case of the subject "Modeling and Simulation of Energy Systems. Projects", case studies are used to approach two course topics: (1) system modelling and (2) the phases of simulation in business projects.
System Modelling. Students were asked to choose a scientific article on modelling in one of their areas of interest. Individually, each student studied the article and made a 5-minute presentation to the rest of the class. This activity worked on different soft skills, in addition to the more specific contents of modelling: searching for scientific articles using specific search engines such as Google Scholar, the analytical ability to extract the most important aspects of the article related to modelling, the ability to organise the information, and the oral presentation of the work in a short time.

Phases of Simulation Projects. In this case, the aim was for the students, in groups of 3 or 4, to analyse a practical case and make a proposal of how they would deal with a simulation project, according to what they had studied in the course. In all cases, they had to answer the following questions: (1) Could you use simulation to help the company in this case? (2) What data do you need to collect? What sources can you use? How are you going to collect this data? (3) What kind of model will you build? (4) How will you validate that model? (5) What results do you expect to get from the simulation? What aspects do you expect your simulation to contribute to? and (6) What documents will you produce? Each group has to present its proposal orally to the rest of the class and answer questions. The case studies proposed to the students are the following:

1. "Electrónica S.L." is a small manufacturer that produces electronic components used by other manufacturers. Recently, they became aware of problems in a department that produces three different parts. The demand for these three products has slowly changed over time. The department is almost fully automated and consists of four lines. Each of the first three lines produces a single product. When the "almost finished" parts (Parts A, B, and C) leave these three lines, they all enter the fourth line. The new product range has led to the fourth line becoming a bottleneck [9].
2. "RENT S.L." was started as a cheap but friendly alternative for rental cars at the airport. Because "RENT S.L." has been a low-cost provider, its rental counters are not typically located in the airport property area. Therefore, transport between the pick-up points at the airport and the rental counters is longer than that of the competition [9].
3. "UNaBirra" is a small craft brewery located in a town on the outskirts of Madrid. They are considering the installation of a solar energy plant with photovoltaic panels, hoping to obtain environmental benefits, reduce their carbon footprint and reduce their total dependence on the electricity company.

3.3 Practical Exercises
The practical exercises used in the course are designed to get students to apply the knowledge they have learnt to real problems. These “real world” situations provide a challenge that allows students to mobilise the knowledge they
have acquired and deepen their understanding of the more complex concepts of the subject. Therefore, it can be said that the practical exercises used in the course fall under the active learning methodology called challenge-based learning. Challenge-based learning actively engages the student in a relevant and challenging situation related to a real-world context. It involves the acquisition of knowledge, the definition of a problem and the implementation of a solution(s) [10,11]. The following practical exercises have been used in the course:

– Modelling of different real systems using the different techniques studied (differential equation modelling, state space modelling and transfer function modelling).
– Identification of: (1) a system using data, and (2) a first-order system using Matlab and Simulink.
– Simulation of different real problems using Matlab: simulation of accumulated blood toxicity, simulation of tumour growth, simulation of an ecological system, and others.
– Simulation of a discrete event system using the Arena software from the company Rockwell Automation.
– Modelling and simulation of a stochastic system. With real data (obtained in another subject or from a public internet database), students have to model wind speed using different probability distributions.
– Simulation of energy resources [12,13]. The following tasks are proposed to the students: (1) implement and simulate with Matlab/Simulink the wind resource, (2) implement and simulate with Matlab/Simulink the solar resource (irradiation), (3) analyse, implement and simulate with Matlab/Simulink a wind turbine model (or others obtained from the literature), (4) analyse, implement and simulate with Matlab/Simulink a solar cell model (or others obtained from the literature), (5) search on the Internet for the energy demand of a system (house, building, industry, company, institution, etc.) and simulate it.

3.4 Energy System Simulation Project
70% of the final mark of the course corresponds to a modelling and simulation project in the field of energy. Each student freely chooses a project in their area of interest where they have to carry out a simulation, putting into practice the knowledge acquired in the course. This project is presented in a kind of mini scientific congress in the classroom. Each student submits a written work that has the usual form of a communication in a scientific congress and makes a presentation to the rest of the students and teachers, with a maximum duration of 15 min and a 5-minute question time at the end. Students are provided in advance with evaluation rubrics for both the written work and the presentation. As an example, the titles of some of the papers presented by students from past courses are shown below: "Radioactive chains simulation tool in Matlab", "Model of the thermal behavior of a mobile prefabricated house", "Modeling
and simulation of a Radioisotope Thermoelectric Generator”, “Simulation of the behavior of a lithium battery system and photovoltaic panel system applied to an electric vehicle using Matlab”.
4 Results and Discussion
To assess the students’ perception of the usefulness of the activities, a survey was prepared that they filled out anonymously at the end of the course. In this survey they were asked about each of the activities carried out during the course, to know the degree to which they perceived them as useful in their learning process. All questions were rated from 1 to 5 using a Likert scale. For example: “To study the phases of a simulation project, we suggested that you study three case scenarios in teams (an electronics assembly company, an airport car rental service company, and a craft brewery). Indicate to what extent this team exercise helped you to improve your understanding of the phases of a simulation project (1 means that it did not help you at all and 5 means that it helped you a lot)”. Figure 2 shows the histogram of students’ evaluation of the usefulness of the activities described in this article (1 indicates in all cases that the activity was perceived as not useful at all and 5 as very useful).
Fig. 2. Student’s evaluation of the activities performed in the course
It can be seen that the activity valued as most useful by the students is the modeling and simulation project of an energy system, and the least useful is the case study. For both the case studies and the practical exercises, the students were asked to rate each of the individual activities of these types. The histograms of the detailed responses in both cases are shown in Figs. 3 and 4. The most valued practice for students is the practice of modeling real systems with Matlab, and the least valued is the practice of modeling systems
without a computer (Fig. 3). In the case of the case studies, the activity referring to the study of the simulation phases in three different real cases is perceived as slightly more useful than the study of a scientific modeling article (Fig. 4).
Fig. 3. Students' evaluation of the individual practical exercises
Fig. 4. Individual case studies evaluation
The last two questions asked the students to rank the activities from 1 to 9: in the first case by degree of usefulness (9 being the most useful activity and 1 the least), and in the second case by how motivating they found them (9 the most motivating and 1 the least). Figure 5 shows the mean values of the positions of each activity obtained from the rankings of all the students. The graph in Fig. 5 shows that the activity considered most useful and most motivating is the energy systems modeling and simulation project. The activity considered least useful is the case study of modeling in a scientific article. However, the least motivating activity was the non-computer modeling exercises.
Fig. 5. Activities ordered by usefulness and motivation (mean values)
5 Conclusions and Future Works
The study carried out in this work on learning by doing, describing activities for educational purposes carried out in a master's degree subject, has been very inspiring. It has served both students and teachers to reflect on the usefulness of some practices. In general, the activities that accompany the theoretical classes are welcomed by the students. Being an optional subject, there exists an initial interest from the students, in a higher education environment, a master's degree, and in this specific case, master's studies on energy.

The conclusions drawn from the analysis of the students' evaluation of the activities confirm that they feel attracted to real problems, and that putting knowledge into practice, even through simulation tools, helps them to consolidate concepts and to learn more deeply.

As future work, it is proposed to develop class debates so that students can gain some knowledge of the work that an engineer does in a company when addressing a problem in the field of energy, from choosing the right computational tools for analysis to evaluating the cost of deploying a particular solution. This would give them a very realistic vision of this currently booming sector.

Acknowledgments. This work has been partially supported by Spanish Ministry of Science and Innovation project MCI/AEI/FEDER number PID2021-123543OBC21.
References

1. Williams, M.: John Dewey in the 21st century. J. Inq. Action Educ. 9, 91–102 (2017)
2. Iftikhar, S., Guerrero-Roldán, A.E., Mor, E.: Practice promotes learning: analyzing students' acceptance of a learning-by-doing online programming learning tool. Appl. Sci. 12(24), 12613 (2022)
3. Niiranen, S.: Supporting the development of students' technological understanding in craft and technology education via the learning-by-doing approach. Int. J. Technol. Des. Educ. 31, 81–93 (2021)
4. Maccanti, M., et al.: Learning-by-doing methodology towards urban decarbonisation: an application in Valletta (Malta). Sustainability 15(7), 5807 (2023)
5. Bradberry, L.A., De Maio, J.: Learning by doing: the long-term impact of experiential learning programs on student success. J. Polit. Sci. Educ. 15(1), 94–111 (2019)
6. Mekler, E.D., Brühlmann, F., Tuch, A.N., Opwis, K.: Towards understanding the effects of individual gamification elements on intrinsic motivation and performance. Comput. Hum. Behav. 71, 525–534 (2017)
7. Minton, M., Brett, B.: Examining the use of Kahoot to support digital game-based formative assessments in UAE higher education. Stud. Technol. Enhanced Learn. 1(2), 445–462 (2021)
8. Ortiz-Martínez, E., Santos-Jaén, J.M., Palacios-Manzano, M.: Games in the classroom? Analysis of their effects on financial accounting marks in higher education. Int. J. Manag. Educ. 20(1), 100584 (2022). https://doi.org/10.1016/j.ijme.2021.100584
9. Main Yaque, P.: Simulación de Sucesos Discretos. Prácticas en casos reales con R. Competición de estudiantes de simulación (2016)
10. Hernández-de-Menéndez, M., Vallejo Guevara, A., Tudón Martínez, J.C., et al.: Active learning in engineering education. A review of fundamentals, best practices and experiences. Int. J. Interact. Des. Manuf. 13, 909–922 (2019). https://doi.org/10.1007/s12008-019-00557-8
11. Muñoz de la Peña, D., Domínguez, M., Gomez-Estern, F., Reinoso, Ó., Torres, F., Dormido, S.: State of the art of control education. Revista Iberoamericana de Automática e Informática Industrial 19(2), 117–131 (2022). https://doi.org/10.4995/riai.2022.16989
12. Ramos-Teodoro, J., Rodríguez, F.: Distributed energy production, control and management: a review of terminology and common approaches. Revista Iberoamericana de Automática e Informática Industrial 19(3), 233–253 (2022). https://doi.org/10.4995/riai.2022.16497
13. Mikati, M., Santos, M., Armenta, C.: Electric grid dependence on the configuration of a small-scale wind and solar power hybrid system. Renew. Energy 57, 587–593 (2013)
Personalised Recommendations and Profile Based Re-ranking Improve Distribution of Student Opportunities

Čeněk Žid(B), Pavel Kordík, and Stanislav Kuznetsov

Faculty of Information Technology, Czech Technical University in Prague, Prague, Czechia
[email protected]
Abstract. Modern technical universities help students get practical experience. They educate thousands of students, and it is hard for them to connect individual students with relevant industry experts and opportunities. This article aims to solve this problem by designing a matchmaking procedure powered by a recommendation system, an ontology, and knowledge graphs. We suggest improving recommendations and reducing the cold-start problem with a re-ranking module based on student educational profiles for students who opt in. Each student profile is represented as a knowledge graph derived from the successfully completed courses of the individual. The system was tested in an online experiment and demonstrated that recommendations based on student educational profiles and their interaction history significantly improve conversion rates over non-personalised offers.

Keywords: cooperation with industry · knowledge graphs · recommender system · student profiling · job recommendation · ontology

1 Introduction
Recommender systems (RS) help users explore large catalogues of items efficiently. Most RSs today combine collaborative filtering methods with content-based strategies and heuristics such as contextual bandit approaches [1]. We focus on recommendations of job opportunities for university students. There are many papers and much research on job recommendation [2,3], but there has been very little research on job recommendation for students. We aim to give students practical experience during their mostly theoretical studies. The job market is often large and confusing, especially for inexperienced students [4]. We aim to create a portal to help students navigate the labour market and recommend potentially relevant school, research and work opportunities. It could benefit both students and potential employers to match-make them using an RS [5]. We believe this work could help students find their first
employment and help them realise the future specialisation they want to pursue. This article describes how we created such a system and improved an RS to make the matchmaking process more efficient.

First, we explore current state-of-the-art methods concerning (student) job recommendation. We then introduce our approach, presenting some novel ideas, such as a re-ranking module utilizing a knowledge graph built from student profiles that improves the performance of a standard interaction-based RS. We focus on explainability for users. We evaluate the performance of the prototype in online A/B testing. Our system was deployed to production, serving over a thousand computer science students and sixty companies in the ecosystem.
2 State-of-the-Art

2.1 Job Recommendation
Many platforms try to connect people seeking jobs with the companies that offer those jobs (such as LinkedIn or job-board companies). The usefulness of an RS for job boards is quite evident: the RS can serve both sides, recommending available jobs to candidates and potential candidates to companies seeking employees. A job-board RS operates differently from other RSs. Unlike e-shops or multimedia platforms, it does not recommend relevant candidates to additional users. LinkedIn's RS researchers faced unique challenges in tuning their system [2]. Real-time personalized recommendations pose a challenge when scoring millions of candidate job documents without compromising data freshness and reasonable latency limits. Content-based models used in information retrieval systems can present challenges, as they primarily rely on explicit user context and interests from profiles, neglecting implicit interactions. Integrating different user-interaction signals into the relevance model can be challenging. The goal is to provide sufficient candidates without overwhelming individual job postings, as too many candidates can reduce the chances of job seekers and decrease user satisfaction [2].

User Job Profiles. The RS can use many different pieces of information to improve its capabilities. The user profile can consist of basic information about the user and of interaction records. Basic user information includes educational background, previous work experience, demographic information, etc. [6]. The interaction records usually consist of the user's behaviour history with the RS, such as detail views of job postings, searches, and browsing history.

2.2 Job Recommendation for Students
Due to the lack of work experience among students, the task of recommending suitable jobs becomes challenging. However, we can gather more personal information about students, such as their majors and courses. During their initial
job search, many students seek suggestions from their families, friends, mentors, or supervisors [7]. Additionally, students often rely on their universities as the primary source of career-related information. While student data is typically stored in school databases, it is often underutilized due to difficulties in generalizing the information contained within. The challenge lies in transforming the data into a non-confidential format that still maintains its utility for job recommendations [5]. Educational institutions are unable to provide sensitive student data to third-party companies. Even with the consent of individual students, the process of sharing sensitive data can be problematic, and it is preferable for such data to remain within the infrastructure of educational institutions.

Student Profiling. There has been a lot of work done on student profiling. Such studies are usually done to help students in their school environment, not in their job search [8]. The most common approach is classifying students into clusters created using basic clustering methods such as k-means, the apriori algorithm or decision trees. All the mentioned articles [4,6,9] on student profiling focus on the student's performance in the school environment and try to select extraordinary or problematic students. Overall, all the papers point out the advantage of including more information than just the student's academic performance, resulting in better student clustering.

Student Skill Mining. Our previous work [10] focuses specifically on bridging the gap between educational facilities and industry, which sets it apart from most related studies. In recent years, there has been a growing interest from companies in connecting with university research and students. In the paper [10], we introduce an ontology called a "skill tree" for each student, which maps their courses and creates a well-organized student profile that is easily understood by external partners. We estimate skills based on evaluations from university courses, employing data mining processes to construct the skill tree. This skill tree is then transformed into a standardized set of skills. Using this mapping, we create a student profile that is both easy to interpret and aligned with potential industrial opportunities. The article [10] distinguishes between two types of skills: objective skills and subjective skills. Objective skills are computed based on successfully completed courses, while subjective skills are entered by the students themselves in their profiles. The work takes into account the courses a student has completed and considers their grades. Two different grade mappings are proposed: a uniform mapping (A: 1, B: 0.8, C: 0.6, D: 0.4, E: 0.2) and a cumulative mapping that accounts for the varying difficulty of courses [10].

Recommender Systems and Knowledge Graphs. The survey of knowledge-graph-powered RSs [11] lists several approaches. The most relevant are two-
stage learning methods, where a knowledge graph-based ranking is produced and subsequently fed to an RS. Our approach is similar, but we first produce recommendations based on interaction history and then re-rank the results if a knowledge graph-based user profile is available.
3 Our Approach
We aim to propose a method that improves the recommendation of opportunities to students. Our baseline is a universally designed, fine-tuned RS that combines user-based techniques and collaborative filtering.

3.1 Re-ranking Module
We follow the steps of our previous work [10] (mentioned in Sect. 2.2). We implement the idea of acquiring skills in ontologies from the courses a student has completed. We adopt only the objective-skills part of the system. By doing that, we address the problem of outdated subjective skills, where students typically insert the data once and never update it. We believe that implementing active filters and search can partially replace custom preferences.

Data Processing. In our study, we utilize four separate tables that contain different types of data. The first table contains student data and consists of 13,810 rows. The second table contains information about university courses, with 1,122 rows. The third table contains classification data, with a total of
Fig. 1. Global recommendation schema with detail on the re-ranking module. On the left image, we observe the whole recommendation schema – the interaction-based module (on the left part) and the optional re-ranking module (on the right part). The right image displays the detail of the re-ranking module – the process of the embedding creation
271,847 rows. Lastly, the fourth table contains opportunities data, with 166 rows. For our analysis, we specifically focus on English course data, including descriptions and annotations. It is important to note that we only work with currently enrolled students. We process their current and past studies, for example merging bachelor and master studies with certain weights. Further details regarding data protection can be found in Sect. 5 of our paper.

As depicted in Fig. 1b, we create a space of opportunity embeddings in which we position the student profile embeddings. The student embeddings are derived from the courses the students have successfully completed, as explained in Sect. 3.1. Using a top@N recommendation approach, we select and recommend N opportunities to the student.

Embedding Creation. Student profiles, opportunities and university courses are each represented by an embedding. In Sect. 4, we experimented with multiple methods for embedding creation and ultimately chose a method based on tf-idf with a custom set of keywords. The embedding creation is mainly based on keyword extraction using the tf-idf method. We extract all the generated n-grams of lengths ranging from 1 to 3 words (more than 130,000 n-grams) along with their coefficients generated by tf-idf. To address vague explainability, we use a limited set of keywords that perform more intuitively during the explanation [5]:

– LinkedIn skills – we use the LinkedIn dataset¹ of 50,000 professional skills published in 2019, composed of soft and hard skills.
– India Skills dataset – the skills are taken from the Indian company It's Your Skill. We use only the free API capabilities to acquire about 5,700 skills.
– ACM classification skills – in the paper [5] they use 400 skill categories (all from the IT domain) from which they build their ontology trees. The 400 skills were inspired by the ACM digital library categories².

We merge all the datasets and match them with all the generated n-grams of the courses and opportunities, acquiring about 3,700 distinct skills. We review all the skills and remove those not useful for our task (Czech, assignment, etc.). We create a skill mapper, which selects skills to be mapped to other skills (e.g. recommender to recommender system), reducing the dataset to 1,913 specific skills. The opportunity embeddings are extended by the keywords that the company manually fills in (the tf-idf coefficient is replaced with the maximal value).

Student Profile Creation. We combine the embeddings of the courses in student profiles – instances of knowledge graphs. We assign a value to each course based on the student's grade (A: 5, B: 4, C: 3, D: 2, E: 1, F: 0) multiplied
¹ https://www.linkedin.com/business/learning/blog/top-skills-and-courses/the-skills-companies-need-most-in-2019-and-how-to-learn-them
² https://www.acm.org/publications/class-2012
by the tf-idf coefficient. The keyword weight for courses that do not have a grade is set to 5. We do not consider courses which the student failed. For students who have more than one study at the university (e.g. a graduate student who completed an undergraduate programme at the same faculty), we modify the weights of the courses from the past studies by a discount factor of 0.2. This emphasises the courses taken in recent semesters while preserving the information from the past. When we did not modify the coefficient for past studies, the obligatory bachelor courses tended to have a stronger effect than the specialised courses.
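As an illustration of this profile construction, the sketch below (our own, scikit-learn-based, with hypothetical variable names; the deployed pipeline is not published here) combines the grade weights, the curated skill vocabulary and the past-study discount described above.

from sklearn.feature_extraction.text import TfidfVectorizer

GRADE_WEIGHT = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1, "F": 0}

def student_profile(courses, transcript, skill_vocab, past_discount=0.2):
    # Profile embedding: grade-weighted sum of course tf-idf vectors.
    #   courses:     dict course_id -> course description text
    #   transcript:  list of (course_id, grade, is_past_study) tuples
    #   skill_vocab: curated skill keywords (the ~1,913 skills)
    vec = TfidfVectorizer(vocabulary=skill_vocab, ngram_range=(1, 3))
    ids = list(courses)
    tfidf = vec.fit_transform(courses[c] for c in ids)   # courses x skills
    row = {c: i for i, c in enumerate(ids)}
    profile = 0
    for course_id, grade, is_past in transcript:
        w = GRADE_WEIGHT.get(grade, 5)                   # ungraded courses -> 5
        if w == 0:                                       # failed courses ignored
            continue
        if is_past:
            w *= past_discount                           # discount past studies
        profile = profile + w * tfidf[row[course_id]]
    return profile                                       # sparse 1 x |vocab| vector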
3.2 Explainability
During the explanation process, the algorithm is reversed. We find the intersecting keywords (skills) of the student and the opportunity embedding. Then we look up the courses from which these skills were gained. Figure 2 shows an example of an explanation given to the student of why a particular opportunity was recommended, displaying the intersecting keywords and the courses from which the skills were acquired.

'Junior Data Scientist'
# Matching keywords
{'web', 'program analysis', 'recommender systems', 'programming',
 'classification', 'code', 'interpretation', 'big data', 'databases'}
# Matching courses with their keywords
'bi-emp': {'recommender systems', 'classification', 'business process', 'program analysis'},
'bi-aag': {'programming', 'classification', 'parsing'},
'bi-big': {'big data', 'databases'}
Fig. 2. Explainability of an opportunity recommendation. Matching keywords represent the intersecting keywords of the student and the opportunity. Below that, we display matched courses and the intersecting keywords.
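A minimal sketch of this reversed lookup (our own illustration; the course codes and keyword sets below are hypothetical):

def explain(student_skills, opportunity_skills, course_skills):
    # Reverse the recommendation: intersect skills, then trace them to courses
    matching = student_skills & opportunity_skills
    matched_courses = {
        course: skills & matching
        for course, skills in course_skills.items()
        if skills & matching            # keep only courses that contributed
    }
    return matching, matched_courses

matching, courses = explain(
    student_skills={"programming", "databases", "classification", "web"},
    opportunity_skills={"classification", "big data", "databases"},
    course_skills={"bi-emp": {"classification"}, "bi-big": {"big data", "databases"}},
)
print(matching)   # {'classification', 'databases'}
print(courses)    # {'bi-emp': {'classification'}, 'bi-big': {'databases'}}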
4 Experiments
This section compares the different approaches we tried when implementing the re-ranking module. We compare 4 different approaches to embedding creation:

– the proposed method – tf-idf with a custom set of keywords;
– tf-idf – pure tf-idf with 2000 features with a document frequency lower than 19%;
– custom keywords method – the custom set of keywords only, with term frequency;
– neural embeddings – 2048-dimensional Babbage embeddings from the OpenAI API.

The similarity was calculated between opportunities and university courses and put into a course-opportunity matrix, which was multiplied by the course-student matrix with weights representing grades in a course. This matrix multiplication resulted in the recommendation.
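In matrix form (our own sketch with made-up shapes and random data, not the authors' code), this scoring is a single product of the grade-weighted student-course matrix and the course-opportunity similarity matrix:

import numpy as np

n_students, n_courses, n_opps = 3, 4, 2
rng = np.random.default_rng(0)

W = rng.integers(0, 6, (n_students, n_courses))   # grade weights per course (0..5)
C = rng.random((n_courses, 64))                   # course embeddings
O = rng.random((n_opps, 64))                      # opportunity embeddings

# cosine similarity between every course and every opportunity
S = (C / np.linalg.norm(C, axis=1, keepdims=True)) @ \
    (O / np.linalg.norm(O, axis=1, keepdims=True)).T

scores = W @ S                                    # students x opportunities
top_n = np.argsort(-scores, axis=1)               # top@N recommendation per student
print(top_n)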
4.1 Comparison
In Table 1 and Figs. 3a and 3b, we compare two metrics. We measure the number of distinct opportunities recommended across all students, and the number of times the top opportunity is displayed in the top@N recommendations for all students. The proposed method and the basic tf-idf method perform best in terms of diversity among all keyword-based methods. The proposed method has the upper hand in more relatable explainability. Overall, neural embeddings are capable of the most diverse recommendations in terms of the number of distinct opportunities; in terms of top-opportunity recommendations, they are similar to the proposed method. The measured diversity tells us only about the capability of a certain method to separate students and opportunities; it does not give us information about the quality of the recommendations.
Fig. 3. (a) Dependence of the number of distinct recommended opportunities on N, where N is the number of top@N recommendations made for each student; (b) dependence of the number of times the top opportunity was recommended on N. In both cases, the top@N recommendation was made for 1806 students with a total of 166 opportunities.
Summary and Model Selection. If we focused only on the diversity capabilities of the proposed methods, it would seem appropriate to select the neural embedding method. However, diversity does not necessarily mean the best recommendation. The neural embedding method does not consider the quality of the individual descriptions: one of the best opportunities recommended using this method was an opportunity without a description, and it is very difficult to interpret this decision.

Explainability Results. We recommended the same opportunity to the same student using the proposed method and the basic tf-idf method. Fewer keywords were matched for the proposed method, but they were more complex, such as machine learning, web mining, data analysis, compared to some keywords of the basic tf-idf method such as direct, look, way, create, machine, learning.
Table 1. Comparison of all methods: the number of distinct opportunities recommended to students when recommending the top@N opportunities to each student, and the number of times the top opportunity is recommended to all students.

Method             | Distinct opportunities           | Top opportunity
                   | N=1  N=3  N=5  N=10  N=20        | N=1  N=3   N=5   N=10  N=20
Custom keywords    | 49   78   85   108   132         | 739  1464  1598  1682  1756
Basic tf-idf       | 71   106  124  153   160         | 661  1011  1234  1457  1731
Proposed method    | 62   105  121  142   152         | 457  725   791   1036  1270
Neural embeddings  | 104  143  158  165   166         | 279  537   577   960   1221
We try to reduce the number of non-skill keywords in tf-idf, but many remain. If we removed more, we would be nearing the custom keyword method. This results in the selection of the proposed solution. Job Recommendation Constraints. Table 1 compares the number of times a certain opportunity is recommended to students during a top@N recommendation. The best opportunity is recommended many times for each scenario. This is due to first-year students not having the option to select any optional courses. The work Recommendation under Capacity Constraints [12] aims at creating a general solution that considers multiple adjustable results on an item/user basis (item capacities and user propensities). In our work, we deal with a specific subproblem of the work of [12] – the opportunities are usually removed after being filled and students are usually satisfied with a single opportunity. We implement a solution that aims to distribute the opportunity recommendations by setting a maximum number of students to whom the opportunity can be recommended on the first page. We iterate over the students and distribute recommendations sequentially using a round-robin system, with a limit on each opportunity and a predefined coefficient for each round, as sketched below.
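The following is a minimal sketch of such a capped round-robin pass, assuming a fixed per-opportunity capacity and a fixed page length; the paper's per-round coefficient is simplified away, and the function and variable names are ours.

```python
from collections import defaultdict

def distribute(rankings, capacity, page_size=10):
    """Round-robin assignment of first-page recommendations.

    rankings: dict student -> list of opportunities, best first.
    capacity: max number of students an opportunity may be shown to
              on the first page (a fixed cap, simplifying the paper's
              per-round coefficient).
    Returns dict student -> first-page opportunity list.
    """
    shown = defaultdict(int)            # times each opportunity was used
    pages = {s: [] for s in rankings}
    for _ in range(page_size):          # one round per first-page slot
        for student, ranked in rankings.items():
            for opp in ranked:
                if opp not in pages[student] and shown[opp] < capacity:
                    pages[student].append(opp)
                    shown[opp] += 1
                    break
    return pages
```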
5 Analysis of Results
We evaluate our RS implementations through online A/B testing. The experiment begins with a widget on the Faculty's website, where we collect user interactions such as detail views, applications, and engagement with search and filters. Subsequently, we randomly assign users to three groups, each receiving different types of recommendations. We specifically test the significance of personalized recommendations based on student profiles for those who opt in. Personalized recommendations are generated offline and delivered to students via email. To control for the potential bias associated with subscription, all user groups are composed of subscribed students.
– Group A receives personalised recommendations based solely on their previous interactions and the relationships between opportunities (only the attributes of the items are used).
– Group B receives personalised recommendations based only on their profile (courses completed).
– Group C receives the most complex recommendations, combining the methods of groups A and B. First, both the interaction-based RS and the student profile-focused RS make recommendations. The recommendations are then rearranged on the basis of the combined position in both recommendation lists (the two positions are summed), as sketched below (Fig. 4).
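The re-ranking used for Group C can be illustrated in a few lines. This is a sketch under our own assumptions; in particular, how items missing from one list are scored is not specified in the paper.

```python
def combine(list_a, list_b):
    """Re-rank by summed positions in two recommendation lists.

    Items missing from one list are penalised with a position equal
    to that list's length (our assumption, not stated in the paper).
    """
    pos_a = {item: i for i, item in enumerate(list_a)}
    pos_b = {item: i for i, item in enumerate(list_b)}
    items = set(list_a) | set(list_b)
    key = lambda it: pos_a.get(it, len(list_a)) + pos_b.get(it, len(list_b))
    return sorted(items, key=key)

# Example: 'b' (positions 2 and 1) outranks 'a' (positions 1 and 3).
print(combine(["a", "b", "c"], ["b", "c", "a"]))  # ['b', 'a', 'c']
```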
Fig. 4. The improvement of the interaction-based RS in its cold-start.

Table 2. A/B testing statistics for groups A, B, C. We measured the following statistics: Engaged users – number of users who interacted with the mail, Clicks – number of clicks on the links, Clicks median – median of user clicks, Clicks/user – average number of clicks per user, Mean I – average index of the opportunities clicked, Median I – median index of the opportunities clicked.

Group | Engaged users | Clicks | Clicks median | Clicks/user | Mean I | Median I
A     | 7             | 18     | 3             | 2.6         | 3.9    | 2
B     | 11            | 37     | 3             | 3.4         | 3.5    | 3
C     | 10            | 33     | 3             | 3.3         | 4.4    | 4
We have 51 subscribers, evenly distributed into 3 groups. Each student receives the top 10 recommendations, with links provided for detail views, more recommendations, unsubscribing, and marking irrelevance. Although we save the user identification token during subscription, many students had no previously measured interactions. This is likely because the subscription request was sent via email, and most interactions occurred on desktop while students may have opened their emails on mobile. As a result, group A's recommendations rely heavily on best-sellers, and group C combines recommendations with best-sellers instead of personalized recommendations. Out of the 51 subscribers, 32 students opened the email, and 28 of them clicked on at least one link. We received no irrelevance reports or unsubscribes, which, together with the number of clicks and opened emails, indicates positive engagement.
In summary:
– Groups B and C captured the most clicks, with 37 clicks by 11 users in Group B and 33 clicks by 10 users in Group C (refer to Table 2).
– Group B has the best click-per-user ratio of 3.4 clicks/user, closely followed by Group C with 3.3 clicks/user.
– The best-seller recommendation in Group A has the best median click position of 2, indicating that users tend to click on higher-placed opportunities.
– Group C, which utilizes a combined recommendation approach, has the highest mean and median click positions of 4.4 and 4, respectively.
In conclusion, the best-seller recommendation performed well, as users tend to click on higher-placed opportunities. However, personalized recommendations based on the student profile outperformed the best-seller recommendation, suggesting their effectiveness for new users (when student data is available) and promoting the recommendation diversity displayed in Table 2. Overall, the combined recommendation approach in Group C demonstrated a positive impact in terms of the number of clicks. All tested recommendation settings proved capable of providing relevant recommendations. The better performance of Group B compared to Group A highlights the usefulness of utilizing knowledge graphs for recommendation purposes. Fine-tuning the sorting position in Group C and exploring more complex combinations based on user interactions are potential areas for future testing and hyperparameter adjustments. AI Fairness. Student data are anonymised and remain in the faculty infrastructure. Anonymous user-id interaction data are collected based on passive consent. We implement a subscription policy in which the user's consent is explicitly needed to process the study profile, and they can unsubscribe at any moment. Data were handled in accordance with the European GDPR.
6 Conclusion and Future Work
Our results show that the best performing algorithm (in terms of click-through rate) for student job recommendation is actually the combination of traditional collaborative filtering and re-ranking based on educational profiles. We have explained how this profile can be obtained from course descriptions. Among the tested methods, we selected the tf-idf model with custom keywords (n-grams) due to its satisfactory performance and interpretability. If implemented in different faculties or universities, we would currently opt for the basic tf-idf method, as it performed well and allowed for easier fine-tuning. We can transform the model to a limited set of skills, similar to what we have done in our faculty. Given the limited number of students in our experiment, we focused more on classic methods that were easier to fine-tune. Fine-tuning was largely done
heuristically due to the small sample size. In the future, we plan to experiment with a broader range of parameters in student profiling, such as weight adjustments for compulsory and optional courses or time-based weight adjustments. Additionally, we intend to explore the capabilities of neural embeddings as a state-of-the-art solution. However, acquiring a larger set of student profiles is necessary to properly tune the parameters.
References
1. Wang, L., Wang, C., Wang, K., He, X.: BiUCB: a contextual bandit algorithm for cold-start and diversified recommendation (2017). https://doi.org/10.1109/ICBK.2017.49
2. Kenthapadi, K., Le, B., Venkataraman, G.: Personalized job recommendation system at LinkedIn: practical challenges and lessons learned. RecSys 2017 (2017). https://doi.org/10.1145/3109859.3109921
3. Combining content-based and collaborative filtering for job recommendation system: a cost-sensitive statistical relational learning approach. Knowledge-Based Systems (2017). https://doi.org/10.1016/j.knosys.2017.08.017
4. Peterson, A.: On the prowl: how to hunt and score your first job. Educ. Horiz. 92(3), 13–15 (2014). https://doi.org/10.1177/0013175X1409200305
5. Kuznetsov, S., Kordík, P., Řehořek, T., Dvořák, J., Kroha, P.: Reducing cold start problems in educational recommender systems. In: 2016 International Joint Conference on Neural Networks (IJCNN) (2016). https://doi.org/10.1109/IJCNN.2016.7727600
6. Liu, R., Rong, W., Ouyang, Y., Xiong, Z.: A hierarchical similarity based job recommendation service framework for university students. Front. Comput. Sci. 11(5), 912–922 (2016). https://doi.org/10.1007/s11704-016-5570-y
7. Robert, R., Robert, S., Robert, T., Daniel, D.: School-to-work transition: mentor career support and student career planning, job search intentions, and self-defeating job search behavior. J. Vocat. Behav. 85(3), 422–432 (2014). https://doi.org/10.1016/j.jvb.2014.09.004
8. Cai, J., Morris, A., Hohensee, C., Stephen, H., Victoria, R., James, H.: Using data to understand and improve students' learning: empowering teachers and researchers through building and using a knowledge base. J. Res. Math. Educ. 49(4), 362 (2018). https://doi.org/10.5951/jresematheduc.49.4.0362
9. Zhou, Q., Liao, F., Chen, C., Ge, L.: Job recommendation algorithm for graduates based on personalized preference. CCF Trans. Pervasive Comput. Interact. 1, 11 (2019). https://doi.org/10.1007/s42486-019-00022-1
10. Kordík, P., Kuznetsov, S.: Mining skills from educational data for project recommendations. In: Herrero, Á., Baruque, B., Sedano, J., Quintián, H., Corchado, E. (eds.) International Joint Conference. CISIS 2015. Advances in Intelligent Systems and Computing, vol. 369. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19713-5_54
11. Guo, Q., et al.: A survey on knowledge graph-based recommender systems. IEEE Trans. Knowl. Data Eng. 34, 3549–3568 (2020). https://doi.org/10.48550/arXiv.2003.00911
12. Christakopoulou, K., Kawale, J., Banerjee, A.: Recommendation under capacity constraints (2017). https://arxiv.org/abs/1701.05228
AIM@VET: Tackling Equality on Employment Opportunities Through a Formal and Open Curriculum About AI Abraham Prieto1,3(B) , Sara Guerreiro2,3 , and Francisco Bellas2,3 1 GII, CITENI Research Center, University of A Coruña, A Coruña, Spain
[email protected]
2 GII, CITIC Research Center, University of A Coruña, A Coruña, Spain
{sara.guerreiro,francisco.bellas}@udc.es 3 Old Dominion University, Norfolk, VA, USA
Abstract. This paper presents the AIM@VET project, an educational initiative that aims to tackle equality on employment opportunities by means of the development of a reliable and accessible curriculum about Artificial Intelligence (AI) for pre-university students. The opportunities that AI will bring to future generations in terms of employment will be enormous, but it is very important to ensure that they are equal for all nations, independently of their economic level or social development. Policy makers are working hard on the development of education plans that include such AI training, but this process will take time. In the short term, it is necessary to address this issue and provide support to current generations of secondary school students. This is the main goal of the AIM@VET project that is detailed here. Keywords: AI curriculum · AI resource for classroom · AI for VET Education
1 Introduction

The latest digital education plans and guideline reports from global organizations such as UNESCO, the European Commission, or the Computer Science Teachers Association include the topic of Artificial Intelligence (AI) as a core element, mainly at pre-university levels [1–3]. The main reason behind this decision is the relevance that AI will have in the coming years in social and economic aspects, which will directly impact the next generations, who must be properly prepared. Specific AI curricula are now under development fostered by policy makers, but they will not be fully integrated at schools until the end of this decade [1]. In the short term, there is a necessity for training current upper secondary school students in AI fundamentals so they can be better prepared for tertiary education, independently of their future specialization. The authors of the current paper have addressed this issue during the last three years through the AI+ project [4]. The obtained results have been very successful, and all the developed teaching units are now available for other teachers
and policy makers that want to have tested materials and experiences for their education plans [5]. We realized that not only is general training on AI fundamentals required for access to tertiary education at universities, but a more specialized one is also needed for those secondary school students who choose a professional career. This type of training has different names depending on the country, but in Europe it is known as Vocational Education and Training (VET). It encompasses students from 16 to 20 years old, in general, who receive training for a specific job or profession. The commercial sector agrees on the critical importance of AI skills and training for current generations of students [7]. For example, [6] reports that by 2030 around 70% of companies will have integrated AI technologies into their business models, thus requiring an advanced workforce of staff well-trained in AI. Along the same lines, the EU's Digital Education Action Plan 2021–2027 [2] points out that, to support competitiveness, we need people with the latest advanced digital skills. With this background, in 2022 the project called AIM@VET (AI modules at VET education) was selected for funding with a duration of three years, ending in 2025 [8]. The goal of the project is to develop new teaching units about AI, organized in independent "learning modules" and adapted to VET. The project results will be ready to be directly applied in any VET school in a 3-year period, addressing the digital readiness of future generations of professionals all over the world.

1.1 Topic Relevance

In alignment with Sustainable Development Goal (SDG) 8, promoting sustainable economic growth and decent work for all, the AIM@VET project seeks to democratize AI training, fostering development opportunities for regions worldwide, especially in the context of the impending AI revolution. This initiative responds to the high youth unemployment rate in Southern European countries, a problem reflected globally. The project addresses the demand for more adequately trained graduates from vocational/professional schools, which, despite integrating practical training into their curricula, often fall short of companies' needs. Companies seek up-to-date vocational/professional training that equips students with current skills and competencies, readying them for immediate employment post-schooling. Aligned with SDG 4, advocating for equitable quality education and lifelong learning opportunities, AIM@VET also aims to ensure the AI curricula are formal, feasible, and open to all nations, regardless of economic standing. This mirrors the strategy used in AI+ [4], where all teaching units were provided in an open, freely accessible format. AIM@VET commits to the 'Leaving No One Behind' principle, emphasizing nondiscriminatory access to quality education.

1.2 Research Organization

The primary goal of the AIM@VET project is to create AI teaching units for Vocational Education and Training (VET), drawing upon learnings from the AI+ project. This initiative addresses two main AI training issues identified during the AI+ development.
Firstly, with AI education being a nascent teaching field at the pre-university level, fully trained teachers capable of confidently instructing AI will take time to emerge. Meanwhile, it is crucial not to exclude current generations from AI education. To address this, AIM@VET will center on supporting VET teachers in designing and implementing teaching units in the short term. Secondly, the project has recognized the importance of focusing learning resources on specific AI topics, given that students require adequate time to understand these new concepts. Broadly covering AI topics at a shallow depth is particularly detrimental in VET education, where students need targeted, market-relevant training. As such, AIM@VET will concentrate on three pivotal AI application areas: computer vision, robotics, and ambient intelligence. Formal training in these areas will unlock new market opportunities for VET students across sectors such as Industry 5.0, Smart Environments, and Autonomous Vehicles, among others.

The project partnership of AIM@VET is made up of six teams from three different European countries: Spain, Slovenia and Portugal. The development of the learning modules will be led by AI experts from universities and tested by VET teachers and students, organized through "work islands" (WI), as displayed in Fig. 1. Each island includes one university and one VET school from the same country, and it is focused on one key application area of AI. The Slovenian work island will focus on computer vision, the Spanish one on robotics, and the Portuguese one on ambient intelligence. The three work islands will develop and test a predefined number of teaching units during each academic course from 2023 to 2025 (WP in Fig. 1). At the end of each academic course, they will connect in training activities (TA) where the teaching units will be tested with mixed groups of students from the three VET schools.

The strategy features crucial collaboration between the different teams, or "work islands," involved in the development and implementation of the teaching units. Regular transnational meetings will facilitate coordination on functional and organizational aspects. This liaison ensures the homogeneous development and testing of teaching materials, enhancing the reliability and coherence of the learning modules, taking into account global perspectives rather than just individual students. Each team will also disseminate the project results through specific Multiplier Events and social media presence, ensuring wider reach to the educational community.
Fig. 1. Diagram representing the AIM@VET organization into 3 work islands (WI), each of them made up of one University and one VET school in Spain (ES), Slovenia (SL), and Portugal (PT).
2 Methodology

Addressing the practical orientation of Vocational Education and Training (VET), this project adopts a STEM (Science, Technology, Engineering, Mathematics) approach that emphasizes the integrated learning of technical and scientific concepts. The teaching units will employ project-based learning (PBL), a model where students actively engage in planning, executing, and evaluating projects with real-world relevance beyond the classroom [9]. PBL embodies the "learning by doing" principle, where students acquire theoretical concepts through solving practical cases. Students' primary tasks will involve programming simple AI systems. Prerequisites for this include secondary school-level mathematics and fundamental programming experience. For students lacking these skills, a preliminary course is proposed (as detailed in Sect. 6). AIM@VET's teaching approach is not from a user perspective, where students learn AI-based tools, but from an engineering one. The challenge lies in training students as AI developers without previous AI experience. To ease this, the project leverages existing standard software libraries in computer vision, robotics, and ambient intelligence for classroom application.
3 Case Studies

The learning modules will be rolled out in VET schools over approximately two and a half years. University teams will provide teaching units (TUs) to VET teachers, who will implement these with students aged 16 and above, dedicating at least 2 h per week. The University teams will offer technical support via on-demand meetings, with the TUs being refined based on feedback from VET teachers and students. Following the implementation period, feedback will be collected to integrate final modifications into the TUs. Three main case studies are under consideration.

Case Study 1: Computer Vision Learning Module
This module aims to develop teaching units (TUs) and resources in computer vision (CV), an important field of AI that extracts information from images for decision-making. Developed by the University of Ljubljana's Computer Vision lab [10] and tested by the Solski Center Velenje [14] in Slovenia, these TUs will help new generations understand CV methods and applications. The TUs will offer hands-on activities with Python, PyTorch, OpenCV, Orange, and related tools. They will be interlinked with examples of broader applicability. Three learning modules on CV will be created, each focusing on key concepts. CVLM1 will stress the importance of capturing and curating unbiased and well-distributed data, outlining the process of collecting, organizing, labelling, and maintaining image datasets to avoid biases. CVLM2 will focus on detection and segmentation, critical steps in most CV systems, using data from CVLM1. CVLM3 will cover tracking and recognition, demonstrating how temporal information from image sequences can be used in tracking scenarios, and how recognition can be applied to people, objects, gestures, and more.

Case Study 2: Robotics Learning Module
The second case study focuses on developing and testing teaching units (TUs) for
robotics, led by the University of A Coruña team [17] and Rodolfo Ucha VET school [12] in Spain. The goal is to show how AI empowers robots with adaptability for varied tasks and environments. The TUs will employ standard methods, tools, and software, including AI frameworks and 3D simulators, emphasizing open-source options. Three learning modules will be established: RLM1 will introduce students to autonomous robotics basics, including types of robots, sensors, and actuators. RLM2 will cover traditional robot control methods and the relationship between perception and control. RLM3 will introduce machine learning concepts and opportunities to develop an intelligent robot controller. These TUs aim to provide practical AI application in robotics, preparing students for careers requiring robotics and AI knowledge. The focus is on two key autonomous robotics areas: mobile robotics and robotic manipulators, relevant for their applicability in real companies and industry. Each learning module will feature specific challenges relating to these robot types.

Case Study 3: Ambient Intelligence Learning Module
The third case study focuses on the development and testing of teaching units (TUs) in Ambient Intelligence (AmI). These will be developed by the ISLAB group at the University of Minho [16] and implemented by the VET school at Caldas das Taipas [11], both in Portugal. The TUs will delve into crucial concepts related to sensing, actuation, control in intelligent environments, and the ethical implications involved in creating context-sensitive solutions. These TUs will comprise three learning modules (LM). AmILM1 will concentrate on sensing and introduce Intelligent Environments, including concepts such as the Internet of Everything, the Internet of Things, and People's Internet. AmILM2 will overview Ambient Intelligence, covering aspects from pervasive computing and intelligent interfaces to decision-making and learning within Ambient Intelligence architecture. AmILM3 will delve into applications of Ambient Intelligence like smart cities and assisted living environments, along with tackling safety and ethical issues like privacy and data protection. These modules will benefit sectors such as Industry 5.0 and Smart Environments by advancing knowledge relevant to the labor market. They also aim to enhance VET students' digital skills and encourage innovation.

3.1 Expected Results

As explained above, the primary outcome of this project will be a set of teaching units to introduce VET students to the fundamentals of AI with a hands-on perspective. The units will be organized in three learning modules, focused on three key areas of AI. Each learning module will have an estimated duration of 140 h in total, with an average number of 12 teaching units (10 to 12 teaching hours for each unit). The teaching units will contain the following resources:
1. Teacher guide: a digital document in .pdf format including the following sections: Introduction, Context, Learning objectives, Contents, Timeline, Necessary resources, Bibliography, Group organization, Unit material, Challenge/Project, Solutions to programs, Evaluation, and Complementary activities.
2. Learner guide: a digital document in .pdf, with the following contents: Introduction, Necessary resources, Bibliography, Challenge/Project, Complementary activities.
3. A set of Python programs with solutions to the challenges/exercises will also be provided for teachers.
4. A set of additional files required to support the TU, like simulation files, Jupyter notebooks, or other digital resources.
In addition to the development of the teaching units, this project will generate valuable feedback from both students and teachers, which will be used to improve the quality and effectiveness of the learning modules. The feedback obtained will be analysed and integrated into scientific articles to share it with the educational community, as well as into the development of new teaching methodologies and the enhancement of existing teaching units.

3.2 Evaluation Criteria

The assessment of students' progress in each teaching unit uses four methods: task-specific evaluation, general student evaluation, student questionnaire, and presentation and collaboration, adding up to a final score out of 100. To standardize the evaluation process across all work islands, teachers receive assessment templates tailored to each unit and method, promoting consistent data collection. In task-specific performance evaluation, teachers use a checklist to assess program functioning and, if necessary, evaluate the submitted programming code. Instead of focusing on code quality, the evaluation emphasizes aspects like information searching, time management, and solution design. This test contributes up to 40 points. The general student evaluation, assessed by the teacher using an individual rubric for every activity, can yield a maximum of 28 points. It covers aspects like time management, solution design, and teamwork. Students complete an individual worksheet or questionnaire, worth a total of 32 points. The teacher grades this worksheet, accounting for the level of difficulty. Optionally, the teacher and classmates can evaluate presentation and collaboration skills during group projects, focusing on teamwork, communication, and adaptability. Regarding project results evaluation, questionnaires gather feedback from teachers and students on clarity, difficulty, and completion of teaching units, plus students' motivation and learning goals attainment. The project's impact is measured via quantitative indicators (number of units delivered, teachers, researchers, professors, and students involved) and qualitative indicators (interest of the educational community, enhancement of teachers' digital skills, interest in Erasmus+ projects and international cooperation, and increased focus on student digital training). Also considered is the curriculum's success in sparking interest among VET schools and the attention from educational authorities towards implementing the modules into official programs. Feedback not covered in questionnaires is collected via interviews with teachers and students.
4 Challenges and Limitations

The AIM@VET project faces significant challenges, primarily creating practical, reliable AI teaching materials for VET students with no prior background in the discipline, which traditionally has been taught at university level, requiring pre-existing knowledge in math, programming, or logic. This challenge will be mitigated by utilizing available programming tools, open source materials, and the combined expertise of AI researchers, teachers, and VET educators. The project will also leverage the project coordinator's successful prior experience in this area. Another challenge is motivating VET teachers to include the AI modules in their short-term teaching programs. This necessitates that the modules are aligned with the current job market, which will be achieved through meetings with industrial sector representatives and policymakers during the project. The project's primary limitation is the extent of the achievable impact. Developing new curricula requires the integration of different perspectives and takes time to become robust. Although the project aims to create globally applicable learning modules, the potential for such broad impact is limited. Nonetheless, a detailed dissemination plan is in place, including attending educational conferences and exhibitions, and producing journal publications and media outputs, to expand the project's reach.
5 Ethical Considerations

The AIM@VET project recognizes the ethical implications of AI, which will be addressed in two ways. First, it will include ethical considerations in the teaching units, as AI is set to impact many societal aspects, posing ethical and legal questions. These issues are currently being formalized, and it is crucial for students to engage with them. Hence, all teaching units will incorporate activities related to ethical analyses. Second, ethical considerations will be applied to the project's execution. Since students develop, rather than use, AI solutions, there is no direct interaction with AI. However, testing and surveys may involve underage or vulnerable students, prompting an official review by the partners' ethics committees to ensure safety. All students were informed about the project's nature, objectives, and implications, and their consent was obtained, ensuring their voluntary participation. Regarding future tests, data confidentiality will be maintained per general data protection regulations. All data, excluding photos and videos, will be collected anonymously. Consent will be obtained for using photographs and videos for demonstration, teaching, and research purposes. Those not consenting will have their identities protected in these media files. After the project, anonymous data will be retained for 10 years. Unedited photos and videos will be deleted after two years, leaving only the edited versions, in which participants cannot be identified, retained for 10 years.
6 Implementation Plan

The AIM@VET project comprises three main activities: (1) development of Teaching Units, (2) training activities, and (3) transnational organizational meetings. Three such meetings are planned, the kick-off meeting having already occurred in January 2023, with two more scheduled for September 2023 and 2024 (Fig. 2). The project plan schedules six-week periods for the development of teaching units at universities, followed by another six-week span for testing in VET schools. This enables testing of the first two units by June 2023, while three units are created in the same timeframe. This pattern will continue until 2025, resulting in 12 teaching units tested and developed per AI area. Three training activities are planned, the first in June 2023 in Slovenia, and two more at the end of each academic year. In these activities, developed teaching units will be tested by students from three different countries. Students will work in groups, with one acting as an 'expert' on a topic, enhancing understanding through teaching others. This approach is designed to foster student collaboration and reinforce topic comprehension.
Fig. 2. Example of TU development plan for the first 6 months of the project
7 Project Team Description

The project involves AI experts from the University of Ljubljana (UL), Slovenia, University of A Coruña (UDC), Spain, and University of Minho (UM), Portugal, and educators from Vocational Education and Training (VET) centers in the same countries: Solski Center Velenje (SCV), Slovenia, Centro Integrado de Formación Profesional Rodolfo Ucha
Piñeiro (RUP), Spain, and Escola Secundária de Caldas das Taipas (ESCT), Portugal. The team's structure allows for convenient local testing of teaching units (TUs). In Spain, the UDC team, part of the Integrated Group for Engineering Research, focuses on autonomous robotics. The collaborating VET center, RUP, specializes in new information technologies, including microcomputing, application development, programming, and cybersecurity. The Slovenian group consists of UL's Computer Vision laboratory team, experts in 2D and 3D visual data processing, machine learning in computer vision, and computer-human interactions. SCV, one of Slovenia's largest VET centers, offers expertise in various areas, including the development of new curricula and educational software, and research in education. In Portugal, the UM team belongs to the ISLAB, a research unit focusing on Ambient Intelligence, Conflict Resolution, Behavioral Analysis, and the use of AI methods in these areas. The partnering VET center, ESCT, offers specialties in Multimedia, Automatic Electronics, and Computer Science, forming the last work group for the Ambient Intelligence module.
8 Conclusions

This paper describes the AIM@VET project, focused on the development of a reliable and accessible curriculum about Artificial Intelligence (AI) for VET education. The project started in 2023 and the first application results have been obtained, which confirm the validity of the approach from an educational perspective. Combining the expertise of the three university groups with the experience of the VET teachers, the modules designed will be feasible to apply in real education in the short term, creating highly valuable material for all teachers.

Acknowledgments. This work was partially funded by the Erasmus+ Programme of the European Union through grant number 2022-1-ES01-KA220-VET-000089813. "CITIC" is funded by Xunta de Galicia and the European Union (European Regional Development Fund, Galicia 2014–2020 Program), by grant ED431G 2019/01. The "Programa de ayudas a la etapa predoctoral" from Xunta de Galicia (Consellería de Cultura, Educación y Universidad) supported this work through Sara Guerreiro's grant.
References
1. K-12 AI curricula: a mapping of government-endorsed AI curricula. https://unesdoc.unesco.org/ark:/48223/pf0000380602.locale=en. Accessed 01 Mar 2023
2. Digital Education Action Plan 2021–2027 (2021). https://education.ec.europa.eu/focus-topics/digital-education/action-plan. Accessed 01 Mar 2023
3. Artificial Intelligence for K12 initiative. https://ai4k12.org. Accessed 01 Mar 2023
4. Bellas, F., Guerreiro-Santalla, S., Naya, M., Duro, R.J.: AI curriculum for European high schools: an embedded intelligence approach. Int. J. Artif. Intell. Educ. (2022). https://doi.org/10.1007/s40593-022-00315-0
5. Official results' page of the AI+ project, Erasmus+ program. https://erasmus-plus.ec.europa.eu/projects/search/details/2019-1-ES01-KA201-065742. Accessed 01 Mar 2023
6. Bughin, J., Seong, J., Manyika, J., Chui, M., Joshi, R.: Notes from the AI frontier: modelling the impact of AI on the world economy. McKinsey Global Institute, Discussion Paper (2018). https://www.mckinsey.com/featured-insights/artificial-intelligence/notes-from-the-ai-frontier-modeling-the-impact-of-ai-on-the-world-economy. Accessed 01 Mar 2023
7. Lane, M., Saint-Martin, A.: The impact of artificial intelligence on the labour market: what do we know so far? In: OECD Social, Employment and Migration Working Papers, No. 256. OECD Publishing, Paris (2021). https://doi.org/10.1787/7c895724-en
8. Artificial Intelligence learning modules to adapt VET to the digital transformation of the labour market. https://aim4vet.udc.es. Accessed 01 Mar 2023
9. Kokotsaki, D., Menzies, V., Wiggins, A.: Project-based learning: a review of the literature. Improv. Sch. 19(3), 267–277 (2016)
10. Computer Vision Laboratory, University of Ljubljana. https://www.fri.uni-lj.si/en/laboratory/lrv. Accessed 01 Mar 2023
11. Escola Secundaria Caldas das Taipas. https://esct.pt. Accessed 01 Mar 2023
12. Rodolfo Ucha Piñeiro VET school. https://www.cifprodolfoucha.es. Accessed 01 Mar 2023
13. Miao, F., Holmes, W., Ronghuai, H., Hui, Z.: AI and education: guidance for policy-makers (2021). https://unesdoc.unesco.org/ark:/48223/pf0000376709. Accessed 01 Mar 2023
14. Solski Center Velenje. https://www.scv.si. Accessed 01 Mar 2023
15. Holmes, W., et al.: Ethics of AI in education: towards a community-wide framework. Int. J. Artif. Intell. Educ. 1–23 (2021). https://doi.org/10.1007/s40593-021-00239-1
16. Synthetic Intelligence Group, University of Minho. https://algoritmi.uminho.pt/research_teams_labs/synthetic-intelligence-group-islab/. Accessed 01 Mar 2023
17. Integrated Group for Engineering Research, University of A Coruña. https://gii.udc.es/?id=3. Accessed 01 Mar 2023
System Identification and Emulation of a Physical Level Control Plant Using a Low Cost Embedded System
Daniel Méndez-Busto, Antonio Díaz-Longueira, Álvaro Michelena, Míriam Timiraos, Francisco Zayas-Gato(B), Esteban Jove, Elena Arce, and Héctor Quintián
Department of Industrial Engineering, University of A Coruña, CTC, CITIC, Ferrol, A Coruña, Spain
{daniel.mendezb,a.diazl,alvaro.michelena,miriam.timiraos.diaz,f.zayas.gato,esteban.jove,elena.arce,hector.quintian}@udc.es
Abstract. Educational methods have changed significantly in recent years. In this sense, the COVID-19 pandemic has led to an increase in remote teaching. Specifically, laboratory practices with real systems are essential in the field of engineering. The lack of physical plants and the implementation of Blended Learning experiences force the development of emulated laboratory plants. This work proposes a new approach with the identification and implementation of a specific section of an enhanced level control plant located at the Polytechnic Engineering School of Ferrol, using a low-cost embedded system. In this case, the solution is equipped with an improved design of a synoptic visualization of the plant developed with a Node-RED application. This emulation allows students to attend practice lessons of control subjects remotely by means of flexible hardware and software tools. Keywords: Blended Learning · Virtualization · Control Engineering Education · System Identification

1 Introduction
In recent years, educators and education systems have made huge efforts to adapt and innovate [16]. In this context, face-to-face teaching is transitioning to mixed teaching, which combines face-to-face and online teaching (Blended Learning) [9]. In this sense, COVID-19 has advanced this change and has forced institutions to explore solutions for Blended Learning (BL) teaching [14]. Within engineering fields, theoretical and practical lessons should be combined for optimal training [12]. More specifically, universities are usually equipped with laboratories to carry out practices of control engineering subjects. However, laboratory equipment is often expensive and not enough to meet student demand. Hence, students need to organize themselves into working groups
to carry out their practices. In addition, laboratories without remote connection are not always available or conducive to BL teaching. One solution is to propose emulated systems. An emulated system is a virtual environment whose operation is similar to a real plant [3]. Usually, they run on a controller, which can be a computer or an embedded system. Emulated systems can be enhanced with the integration of a visualization synoptic to better understand their operation. In addition, they are a very interesting tool for proposing BL experiences [10], as an inexpensive emulated plant can be provided per student. On the other hand, for the responses of the real plant and the emulated plant to be similar, it is necessary to conduct a detailed study of the real plant by applying specific identification methods [7,8,13]. This work proposes the identification of a specific section of an enhanced level control plant located at the Polytechnic Engineering School of Ferrol (EPEF) of the University of A Coruña (UDC) for the subsequent deployment of an emulated plant. Students use this plant to develop control engineering techniques [4,5], system identification methods, implementation of neural networks for error detection [11] and so on. This paper is structured as follows: after the current introduction, the next section describes the methodology and the case study. Then, Sect. 3 presents the experiments and results. Finally, the conclusions of this work are listed in Sect. 4.
2 Materials and Methods

2.1 Background
As previously mentioned, this paper proposes the emulation of a physical laboratory plant from the EPEF where students take practical control engineering lessons. The main goal of the practices carried out with the level plant is to control the water level in a tank using control engineering techniques. In this case, the physical plant has many similarities with the one proposed in previous work [7], but includes some different elements that lead to a significant variation in its behavior. Figure 1 shows the enhanced level plant of the laboratory. This physical plant has a lower tank (Fig. 1(1)) that is used as a feed tank. Additionally, a pump (Fig. 1(2)) controlled through a Variable Frequency Drive (VFD) (Fig. 1(3)) moves water from the lower reservoir to the upper reservoir. The level of the upper tank (Fig. 1(4)) is measured by a pressure sensor (Fig. 1(5)). On the other hand, the emptying of the upper tank can be done by means of a manual valve or a solenoid valve (Fig. 1(6)). Signal exchange between the plant and the control system is carried out through a Data Acquisition Card (DAQ). The control system sends the command to set the drive speed and the opening of the solenoid valve. It should be mentioned that only a limited number of plants are available. Additionally, it should be considered that some students follow their master's degree online and are unable to physically attend the laboratory. Therefore, in order to propose a BL approach, a replacement for these real plants is required.
Fig. 1. Physical level control plant
2.2 Identification of the Real Plant
Identification Method. The identification method used to identify the selected physical plant is a closed-loop, offline and black-box method. The first step is to select the excitation signal to obtain the response of the real plant to the stimulus of the selected signal. The Pseudo-Random Binary Sequence (PRBS) signal is a periodic and deterministic signal with white-noise-like properties. It fluctuates between -1 and 1 and can be multiplied by a constant so that the incidence on the real plant is greater. To obtain the PRBS signal in Matlab, the idinput instruction is used. This instruction generates a PRBS signal with a number of elements set by the user. The minimum number of elements that must be set to obtain an optimal PRBS signal is given by the following equation:

Ts < Tm × N   (1)

where:
– Ts: system rise time
– Tm: sample time
– N: number of bits used to generate the signal

To obtain the rise time of the system, a step input of maximum amplitude (100%) is introduced. After generating the PRBS signal with the calculated data, it is necessary to close the loop on the real plant using a PID controller.
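For illustration, a rough Python equivalent of this sizing and generation step is given below. The paper uses Matlab's idinput; the generator here is a simple random-sign approximation of a PRBS, not a reimplementation of idinput, and the amplitude value is an assumption.

```python
import numpy as np

def prbs(n_bits, amplitude=1.0, seed=0):
    """Pseudo-random binary sequence of 2**(n_bits - 1) samples
    switching between -amplitude and +amplitude (a Bernoulli
    approximation of a maximum-length PRBS)."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=2 ** (n_bits - 1))
    return amplitude * (2.0 * bits - 1.0)

# Sizing per Eqs. (1), (5) and (6): Ts = 10 s, Tm = 1 s -> N = 10 bits,
# n = 2**(N - 1) = 512 samples.
signal = prbs(n_bits=10, amplitude=10.0)  # amplitude is an assumption
print(len(signal))  # 512
```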
The discretized PID controller is obtained analytically by solving the standard PID differential equation (Eq. 2) [15]:

u(t) = Kp [ e(t) + (1/Ti) ∫ e(t) dt + Td · de(t)/dt ]   (2)

where:
– u(t): control signal
– Kp: proportional constant
– e(t): error
– Ti: integration time
– Td: derivative time
On the other hand, Eq. 3 represents the difference equation of the PID algorithm, obtained with a trapezoidal approximation of the integral term:

u(k) = Kp [ e(k) + (T/Ti) · Σ_{i=1}^{k} (e(i−1) + e(i))/2 + (Td/T) · (e(k) − e(k−1)) ]   (3)

where:
– u(k): control signal
– Kp: proportional constant
– e(k): error
– e(k−1): error at instant k−1
– Ti: integration time
– Td: derivative time
– T: sample time

A direct implementation of this recursion is sketched below.
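The following minimal Python sketch implements Eq. 3 as a positional PID; the class and variable names are ours, and it is not the code later deployed on the Raspberry Pi.

```python
class DiscretePID:
    """Positional PID from Eq. (3): trapezoidal integral and backward
    difference derivative. T is the sample time in seconds."""

    def __init__(self, kp, ti, td, t=1.0):
        self.kp, self.ti, self.td, self.t = kp, ti, td, t
        self.integral = 0.0   # running trapezoidal sum of the error
        self.e_prev = 0.0

    def step(self, e):
        self.integral += (self.e_prev + e) / 2.0
        u = self.kp * (e
                       + (self.t / self.ti) * self.integral
                       + (self.td / self.t) * (e - self.e_prev))
        self.e_prev = e
        return u
```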
The real plant with the PID controller is subjected to the action of the generated PRBS signal. Then, the control signal plus the PRBS signal and the system response are stored in a Matlab matrix (Fig. 2).
Fig. 2. PRBS signal introduced to the system
Getting Parameters. The transfer function's parameters of the real system are obtained using the Matlab toolbox ident. This toolbox calculates the discrete parameters by analyzing the response of the real plant when the PRBS signal is applied to the input. It allows the use of parametric models. These models use differential equations and transfer functions to describe the operation of systems. General linear models are parametric. In these models a polynomial model is used. In this
application, experiments are carried out with the ARX, ARMAX and Output-Error models. The results obtained with the ARX model are the most similar to the real plant. This model incorporates a stimulus signal to find the zeros and poles of the system [1]. The structure of the ARX model is illustrated in Fig. 3.
Fig. 3. ARX structure
The poles of the transfer function (TF) lie in the A component and the zeros in the B component. The degree of the numerator does not exceed the degree of the denominator, so that the TF is proper. Therefore, the discrete TF is defined by Eq. 4:

TF(z) = B/A = (B1·z^(−n) + B2·z^(−(n−1)) + ... + B) / (A1·z^(−m) + A2·z^(−(m−1)) + ... + A),  m > n   (4)

2.3 Node-RED
Node-RED is a graphical programming tool with a browser-based flow editor for block or node interconnection. In contrast to other graphical programming tools such as LabView, Node-RED is open source, with support for running on low-power devices with Linux-based operating systems. This makes it particularly interesting in an educational context. Although considered a general-purpose language, it stands out for its simplicity regarding networking, APIs and online services. Moreover, it also allows the creation of graphical interfaces for rich and advanced visualization [2].

2.4 Embedded System
An embedded system is a computer system made up of a combination of hardware and software designed to perform a specific function [6]. Embedded systems are small, low-power computers that work as part of a larger device or system. In this work, the Single Board Computer (SBC) Raspberry Pi 4 Model B is proposed. This embedded system is a low-cost single-board computer including a Broadcom processor, RAM, GPU, USB ports, HDMI, Ethernet, 40 GPIO pins and so on. The integration of all the software components to deploy the emulated plant is done by means of this SBC.
3 Experiments and Results

3.1 Identification of the Real Plant
The minimum number of elements that must be configured to obtain an optimal PRBS signal is given by Eq. 1. The results obtained by selecting a 1 s sampling time are shown below:

N = Ts / Tm = 10 / 1 = 10   (5)

n = 2^(N−1) = 2^9 = 512   (6)
A PRBS signal is generated with the number of samples obtained. This procedure is repeated 7 times with 10% incremental steps to generate PRBS signals within the operating range of the real plant. This operating range is set between 30 and 80%. The transfer functions are calculated using the matrix obtained with the PRBS signal, the ident toolbox and Eq. 4. In this case, transfer functions of degree 4 (m = 4, n = 3), degree 3 (m = 3, n = 2) and degree 2 (m = 2, n = 1) are obtained according to Eq. 4. Next, the transfer functions obtained and the real plant are subjected to the action of the same PID controller. In this sense, it should be noted that the TF of degree 2 (m = 2, n = 1) presents the most similarity with the real plant. On the other hand, the TF is different at each operating point due to the system's nonlinearity. This difference is noticeable when the TF is written in a different format than the one used for the difference equation (Column 3, Table 1). Table 1 summarizes the transfer functions for each operating point.

3.2 Verification
Once the transfer functions have been obtained for the different operating points, it must be checked whether their responses are similar to the responses of the real plant. For this purpose, a Matlab script is developed in which the real system and the emulated system are stabilized using a PID controller. After stabilization, the system is subjected to changes in the set point and the responses of both systems are plotted. This analysis is performed for each operating point.

Table 1. Transfer Functions obtained for each operating point

SP (%) | TF for Difference Equation (z)                        | TF (z)
30     | (0.0438z^−1 + 0.0239) / (z^−2 − 0.9510z^−1 + 0.0018)  | (13.28z^2 + 24.33z) / (z^2 − 528.33z + 555.55)
40     | (0.0444z^−1 + 0.0261) / (z^−2 − 0.9601z^−1 + 0.0029)  | (9z^2 + 15.31z) / (z^2 − 331.07z + 344.82)
50     | (0.0445z^−1 + 0.0248) / (z^−2 − 0.9708z^−1 + 0.0019)  | (13.05z^2 + 23.42z) / (z^2 − 510.95z + 526.32)
60     | (0.045z^−1 + 0.0229) / (z^−2 − 0.9765z^−1 + 0.0035)   | (6.54z^2 + 12.86z) / (z^2 − 279z + 285.71)
70     | (0.0455z^−1 + 0.0226) / (z^−2 − 0.9772z^−1 + 0.0018)  | (12.56z^2 + 25.23z) / (z^2 − 542.89z + 555.56)
80     | (0.0452z^−1 + 0.0215) / (z^−2 − 0.9815z^−1 + 0.0038)  | (5.66z^2 + 11.89z) / (z^2 − 258.29z + 263.16)
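To make the difference-equation reading of such discrete TFs concrete, the sketch below simulates y(k) from b/a coefficients given in powers of z^−1. The plant in the example is an illustrative stable first-order model, not one of the identified TFs from Table 1.

```python
import numpy as np

def simulate(b, a, u):
    """Simulate y for input u given a discrete TF in ascending powers
    of z^-1: Y(z)/U(z) = (b0 + b1 z^-1 + ...) / (a0 + a1 z^-1 + ...).
    Implements a0*y[k] = sum_i b[i]*u[k-i] - sum_{j>0} a[j]*y[k-j]."""
    y = np.zeros(len(u))
    for k in range(len(u)):
        acc = sum(b[i] * u[k - i] for i in range(len(b)) if k - i >= 0)
        acc -= sum(a[j] * y[k - j] for j in range(1, len(a)) if k - j >= 0)
        y[k] = acc / a[0]
    return y

# Illustrative stable plant: y[k] = 0.9*y[k-1] + 0.1*u[k]
step = np.ones(50)
print(simulate([0.1], [1.0, -0.9], step)[-1])  # approaches DC gain 1.0
```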
Figure 4 shows the verification results for an operating point of 60%, where the red line represents the real process value, the blue line the emulated process value, and the black line the operating or set point.

Fig. 4. Verification results for a 60% operating point
Additionally, the percentage of similarity obtained is shown in Table 2. It evaluates the similarity between the real and emulated plant in response to a stimulus signal, more specifically a PRBS signal in this case. It can be obtained using Eq. 7, as implemented by the Matlab ident toolbox [1]:

S(%) = (1 − norm(Yr − Ye) / norm(Yr − mean(Yr))) × 100   (7)

where:
– S(%): percentage of similarity
– Yr: real system response
– Ye: emulated system response
– norm: Euclidean norm
– mean: average or mean value

A direct computation of this metric is sketched below.
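Equation 7 is straightforward to reproduce; a sketch with hypothetical data follows. It mirrors the normalized fit used by Matlab's identification tooling.

```python
import numpy as np

def similarity(yr, ye):
    """Percentage fit of Eq. (7): 100 means a perfect match, and the
    score degrades as the emulated response ye deviates from the real
    response yr."""
    yr, ye = np.asarray(yr, float), np.asarray(ye, float)
    return (1.0 - np.linalg.norm(yr - ye)
            / np.linalg.norm(yr - yr.mean())) * 100.0

print(similarity([1, 2, 3, 4], [1.1, 2.0, 2.9, 4.2]))  # about 89
```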
Table 2. Similarity results for each operating point Set Point (%) Similarity (%)
3.3
30
41.69
40
67.70
50
82.21
60
87.26
70
87.89
80
90.60
Implementation
Python File. A Python file is created including the transfer functions obtained in the previous section. In addition, the implementation of the PID controller is carried out following the Eq. 3. This file is hosted on a Raspberry Pi and is called cyclically every second. First, the configuration of the TF is executed depending on the setpoint selected by the user. Then, the input of the selected TF is connected to the output of the PID controller. Finally, the PID controller’s parameters (Kp, Ti, Td ) are configured by the user. Node-RED. In this application, a variable initialization node and a cyclic node that runs every second manage the app execution. More specifically, user commands and Python file execution are scheduled. Additionally, the graphical nodes allow the user to enter the configuration values and represent the outputs of the emulated plant.
Fig. 5. Synoptic visualization of the emulated plant
246
D. M´endez-Busto et al.
Finally, the visualization developed with Node-RED’s UI runs on a web server, enabling remote connection to the emulated plant. Figure 5 shows the synoptic of the emulated plant.
4
Conclusions
This work proposes a low-cost embedded system-based emulation environment for Control Engineering practical lessons. The methodology has achieved highly satisfactory results since the simulated plant’s response is consistent with the response of the physical level control plant. Therefore, students can benefit from using virtual plants sharing identical dynamic responses and then compare the results obtained. On the other hand, the emulator’s synoptic includes some interesting features such as remote access to the emulated plant with similar layout as the physical plant. This makes a significant contribution to BL methodology. Future research could suggest several genuine systems that might be incorporated to the application covered in this paper. More specifically, temperature control plants or aero pendulums from the EPEF’s lab could be proposed. Also, the experience could be enhanced by providing analog inputs and outputs to the SBC enabling external control of the emulated plant. In addition, a plant emulation deployed into a PC or a smartphone could be an interesting alternative approach to a SBC. Finally, a comparative study of both physical and virtual interaction with the plant, and its impact on the skills that students can acquire, can be considered in future work.
References 1. Matlab nonlinear model identification documentation website. https://www. mathworks.com/help/ident/nonlinear-model-identification.html. Accessed 10 May 2023 2. Node-red website documentation. https://nodered.org/about/. Accessed 10 May 2023 3. Ayani, M., Ganeb¨ ack, M., Ng, A.H.: Digital twin: applying emulation for machine reconditioning. Procedia CIRP 72, 243–248 (2018) 4. Fernandez-Serantes, L., Casteleiro-Roca, J., Calvo-Rolle, J.: Hybrid intelligent system for a half-bridge converter control and soft switching ensurement. Revista Iberoamericana de Autom´ atica e Inform´ atica industrial (2022) 5. Jove, E., et al.: Hybrid intelligent model to predict the remifentanil infusion rate in patients under general anesthesia. Logic J. IGPL 29(2), 193–206 (2020). https:// doi.org/10.1093/jigpal/jzaa046 6. Malinowski, A., Yu, H.: Comparison of embedded system design for industrial applications. IEEE Trans. Ind. Inf. 7(2), 244–254 (2011) 7. M´endez-Busto, D., et al.: Low-cost hardware platform implementation for system identification and emulation of a real-level control plant. In: INTED2023 Proceedings, pp. 2585–2593. IATED (2023)
Physical Level Plant Identification and Emulation
247
8. Porras, S., Jove, E., Baruque, B., Calvo-Rolle, J.L.: A comparative analysis of intelligent techniques to predict energy generated by a small wind turbine from atmospheric variables. Logic J. IGPL (2022). https://doi.org/10.1093/jigpal/jzac031 9. Porter, W.W., Graham, C.R., Spring, K.A., Welch, K.R.: Blended learning in higher education: institutional adoption and implementation. Comput. Educ. 75, 185–195 (2014). https://doi.org/10.1016/J.COMPEDU.2014.02.011 10. Reeves, S.M., Crippen, K.J.: Virtual laboratories in undergraduate science and engineering courses: a systematic review, 2009–2019. J. Sci. Educ. Technol. 30, 16–30 (2021) 11. Simi´c, S., Bankovi´c, Z., Villar, J.R., Simi´c, D., Simi´c, S.D.: A hybrid fuzzy clustering approach for diagnosing primary headache disorder. Logic J. IGPL 29(2), 220–235 (2020). https://doi.org/10.1093/jigpal/jzaa048 12. Singh, G., Mantri, A., Sharma, O., Kaur, R.: Virtual reality learning environment for enhancing electronics engineering laboratory experience. Comput. Appl. Eng. Educ. 29(1), 229–243 (2021) 13. Zayas-Gato, F., et al.: Intelligent model for active power prediction of a small wind turbine. Logic J. IGPL (2022). https://doi.org/10.1093/jigpal/jzac040 14. Zayas-Gato, F., et al.: 3D virtual laboratory for control engineering using blended learning methodology. In: Garc´ıa Bringas, P., et al. International Joint Conference 15th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2022) 13th International Conference on EUropean Transnational Education (ICEUTE 2022). CISIS ICEUTE 2022 2022. Lecture Notes in Networks and Systems. vol 532. Springer, Cham (2023). https://doi.org/10.1007/ 978-3-031-18409-3 25 15. Zayas-Gato, F., Quinti´ an, H., Jove, E., Casteleiro-Roca, J.L., Calvo-Rolle, J.L.: Dise˜ no de controladores PID. Universidade da Coru˜ na, Servizo de Publicaci´ ons (2020) 16. Zhao, Yong, Watterston, Jim: The changes we need: education post COVID-19. J. Educ. Change 22(1), 3–12 (2021). https://doi.org/10.1007/s10833-021-09417-3
A Simulation Platform for Testing Negotiation Strategies and Artificial Intelligence in Higher Education Courses

Adrián Heras1, Juan M. Alberola2(B), Victor Sánchez-Anguix3, Vicente Julián2,4, and Vicent Botti2,4

1 Universitat Politècnica de València, Camino de Vera, s/n, 46022 Valencia, Spain [email protected]
2 Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València, Camino de Vera, s/n, 46022 Valencia, Spain {jalberola,vjulian,vbotti}@upv.es
3 Instituto Tecnológico de Informática, Grupo de Sistemas de Optimización Aplicada, Ciudad Politécnica de la Innovación, Edificio 8g, Universitat Politècnica de València, Camino de Vera s/n, 46022 Valencia, Spain [email protected]
4 ValgrAI (Valencian Graduate School and Research Network of Artificial Intelligence), Universitat Politècnica de València, Camino de Vera, s/n, 46022 Valencia, Spain
Abstract. Teaching Artificial Intelligence in higher education develops critical thinking, problem-solving, and computational skills. Negotiation is a crucial aspect of multi-agent systems, enabling agents to achieve their goals through communication and collaboration. In this area, simulation platforms provide a flexible and safe way to experiment, leading to improvements in negotiation and decision-making in a wide range of scenarios. Board games, which include interaction and negotiation between players, provide a low-cost and low-risk way to experiment with different negotiation strategies. In this context, we present a novel simulation platform based on the rules of Catan, a popular board game that entails both strategic thinking and negotiation. This platform is oriented to teach negotiation in artificial intelligence and multi-agent systems. The platform allows students to develop and program intelligent agents to play autonomously, improving their technical skills and applying their understanding of Artificial Intelligence concepts.

Keywords: Simulation platform · negotiation · Catan · board game · teaching

1 Introduction
Artificial Intelligence is rapidly becoming a key technology in various fields of society. With the increasing demand for Artificial Intelligence professionals across industries, it is essential that European higher education institutions provide their students with the necessary skills and knowledge to succeed in this
competitive job market [13]. Teaching Artificial Intelligence in higher education is not only about preparing students for future careers but also about developing their critical thinking, problem-solving, and computational skills [9]. Artificial Intelligence provides unique opportunities for enhancing teaching and learning, as well as advancing research in diverse fields such as psychology, neuroscience, and education [8]. When analyzing courses and subjects related to Artificial Intelligence topics, such as those based on the Association for Computing Machinery (ACM) curriculum recommendations, it can be observed that some fields of Artificial Intelligence are widely and commonly taught, such as Machine Learning, adversarial search, or multi-agent systems [2]. In this context, platforms that offer simulations of Artificial Intelligence-powered systems are particularly useful for teaching Artificial Intelligence concepts and programming skills [3].

Specifically, negotiation is a crucial aspect of multi-agent systems, as it enables agents to achieve their individual and collective goals through communication and collaboration [1,10,11]. Negotiation allows agents to reach agreements on how to allocate resources, coordinate actions, and resolve conflicts in a way that benefits all parties involved. Effective negotiation strategies can lead to more efficient resource allocation, reduced conflict, and improved overall system performance. Overall, simulation platforms offer a flexible and safe way to experiment with negotiation strategies and Artificial Intelligence techniques in multi-agent systems [4,6,7]. In this context, board games that include interaction and negotiation between players provide challenging scenarios for teaching, promoting engagement and the acquisition of learning objectives.

In this paper, we present a novel simulation platform for testing negotiation strategies in the area of artificial intelligence and multi-agent systems. This platform provides a unique opportunity for students to develop and program intelligent agents that can play the game autonomously. This not only improves their technical skills but also enables them to apply their understanding of multi-agent and other Artificial Intelligence concepts, such as machine learning and decision-making, to real-world scenarios. By creating a controlled environment that mirrors real-world negotiation settings, board games provide a low-cost and low-risk way to experiment with different negotiation strategies and observe how they affect the outcome of the negotiation. Additionally, board games offer complexity and interactivity, and they promote engagement in students' learning [5,12].

The rest of the paper is organized as follows. Section 2 reviews several board games and compares them according to their negotiation and social interaction components. Section 3 describes the simulation platform and its internal components. Section 4 presents some experiments that validate the simulation platform. Finally, in Sect. 5 we draw some concluding remarks and future work lines.
2 Board Games
In this section, we provide a description of different board games that include some kind of interaction and negotiation between players and could be used to
develop a simulation platform for negotiation. All of these games have different levels of social interaction and strategic complexity, making them potential options for testing negotiation and decision-making strategies.

Citadels is a district construction game with hidden role mechanics, where players take on hidden roles and build their own city while trying to destroy their opponents'. The game involves deduction, as players must use their role to their advantage and figure out their opponents' roles.

Diplomacy is one of the most representative military strategy games in the genre, along with Risk. The objective is to control at least half of Europe's supply centers. It is a game where communicating with other players is key to obtaining territories that no one else would obtain. It lacks an element of luck, making it entirely dependent on convincing other players of actions that one intends to take and then betraying them to obtain the territories that one really intended to capture. Due to the high need for communication between players, negotiation is crucial to gaining points without going to war with another player or without the need to betray them.

Munchkin is a humorous card game with a fantasy theme, where players compete to reach level 10 by defeating monsters and collecting loot. The game involves a lot of verbal communication and competitiveness, as players can both help and hinder each other's progress.

Clue is a classic deduction game where players solve a murder mystery by gathering clues and eliminating suspects. The game involves a lot of verbal communication and deduction, as players must use the information they gather to piece together the solution.

Carcassonne is a strategy game where players lay tiles and build a landscape, trying to score the most points by completing features such as roads, cities, and monasteries. There is a component of strategy and luck in seeing which player closes the largest city or the largest meadow, since these grant extra points. The game involves both competitiveness and teamwork, as players must balance their own goals with the need to cooperate in order to complete features.

Battleship is a classic game of naval strategy where players try to sink their opponent's ships by calling out coordinates on a grid. The game involves a lot of competitiveness and deduction, as players must use the information they gather to strategically place their ships and make educated guesses about their opponent's placement.

Mill is a strategy game where each player tries to place three of their pieces in a row, next to each other, by moving one of their pieces vertically or horizontally on their turn. This game requires strategy at the beginning to decide where to place the initial pieces so as not to produce a block or lose a piece before starting.

Werewolf is a social deduction game where players are assigned secret roles as either werewolves or villagers. The game requires players to use deduction and persuasion skills to convince others of their innocence or guilt.

Catan, also known as Settlers of Catan, is a popular board game that involves players taking on the role of settlers building and expanding their settlements on a newly discovered island. The game mechanics involve resource management,
trading, and negotiation with other players. Players must negotiate trades and alliances with each other to secure the resources they need, while also trying to hinder their opponents' progress (Table 1).

Table 1. Board games comparison.

| Game        | Theme                           | Type                   | Players | Difficulty | Duration   | Social Interaction                |
|-------------|---------------------------------|------------------------|---------|------------|------------|-----------------------------------|
| Citadels    | City building                   | Strategy, hidden roles | 2–8     | Medium     | 30–60 min  | Negotiation, Deduction            |
| Diplomacy   | Military and political strategy | Strategy, negotiation  | 2–7     | High       | 4–10 hours | Negotiation, Verbal Communication |
| Munchkin    | Adventure and fantasy           | Card, role-playing     | 3–6     | Low        | 60–120 min | Verbal Communication, Competitive |
| Clue        | Mystery and detectives          | Deduction, board game  | 3–6     | Low        | 30–60 min  | Verbal Communication, Deduction   |
| Carcassonne | Landscape building              | Strategy, board game   | 2–5     | Low        | 30–45 min  | Competitive, Teamwork             |
| Battleship  | Naval strategy                  | Board game             | 2       | Low        | 20–30 min  | Competitive                       |
| Mill        | Strategy, abstract              | Board game             | 2       | Low        | 10–20 min  | Planning, Strategic placement     |
| Werewolf    | Village deception               | Deduction, party game  | 7–20    | Low        | 30–60 min  | Deception, Social deduction       |
| Catan       | Trade and building              | Strategy, board game   | 3–4     | Medium     | 60–120 min | Negotiation, Teamwork             |
In terms of creating a simulation platform, Catan stands out as a particularly strong option for comparing negotiation strategies due to its emphasis on trade and building (Table 1). Unlike other games in the table, such as Clue or Battleship, which have limited player interaction and lack negotiation components, Catan encourages players to negotiate trades and alliances in order to acquire the resources they need to build their settlements and cities. This results in a dynamic and complex negotiation environment that is more reflective of real-world negotiations. Catan also has a broader set of rules and mechanics that can be better adapted to a simulation platform than simpler games such as Munchkin. Additionally, Catan provides a greater variety of strategies than games such as Citadels, which can make the simulation model more interesting and challenging for players. What is more, Catan's medium difficulty level and relatively short playing time make it a more accessible option for undergraduate students than Diplomacy, which requires a high level of strategic thinking and can take hours to play and understand. Overall, Catan has strategic depth, replayability, and social interaction, making it a popular choice for both casual and serious gamers. Catan's emphasis on negotiation and resource management makes it a strong option for use as a simulation platform in which to incorporate negotiation strategies.
3 Simulation Platform
The general architecture of the simulation platform can be observed in Fig. 1. The platform consists of two main components: the simulation environment and the visualizer. This division decouples the game execution from the visualization, so the visualizer can be disabled for large-scale testing. To visualize a previously executed game, a JSON file with the game information can be loaded into the visualizer. This file contains a trace of everything that happened in the game, which can also be used as historical information to implement negotiation strategies.
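As an illustration, such a trace could also be inspected programmatically before being loaded into the visualizer. The schema below (a list of rounds containing turns and actions) and the file name are assumptions for the sake of the example; the actual JSON format of the platform is not documented here.

```python
import json

# Load a previously exported game trace (the file name is hypothetical).
with open("game_trace.json", encoding="utf-8") as f:
    trace = json.load(f)

# Walk the assumed rounds -> turns -> actions hierarchy and print a summary.
for rnd in trace.get("rounds", []):
    for turn in rnd.get("turns", []):
        for action in turn.get("actions", []):
            print(rnd.get("number"), turn.get("player"), action.get("type"))
```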
Fig. 1. Architecture of the simulation platform
Our simulation platform has been developed to allow intelligent agents to be incorporated without recompiling the code, so they can be loaded as independent scripts. Python was therefore chosen as the programming language: it requires no compilation, which allows agents to be launched as independent scripts, and its many Artificial Intelligence libraries allow other technologies to be integrated for agent development. The visualizer component has been implemented in HTML and JavaScript.
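A minimal sketch of how agents could be loaded as independent scripts without recompilation follows; the convention that each script exposes a top-level `Agent` class is an assumption of this sketch, not the platform's documented interface.

```python
import importlib.util
from pathlib import Path

def load_agent(script_path: str):
    """Import an agent script by file path and instantiate its Agent class."""
    path = Path(script_path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)          # executes the agent script
    return module.Agent()                    # assumed entry-point class

# Four player scripts (file names are hypothetical):
# agents = [load_agent(p) for p in ("bot1.py", "bot2.py", "bot3.py", "bot4.py")]
```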
3.1 Visualizer
The visualizer's design is shown in Fig. 2. It is designed to provide information without the need to consult a user guide. The interface prioritizes the trade log, since it allows us to see what happened in the turn, and it displays all the important data of the game. The visualizer has advance and rewind buttons, which can be used, respectively, to advance or rewind the phases, turns, or rounds. It also has a
Fig. 2. Visualizer
play/pause button for which the playback speed can be selected. The visualizer also has a dropdown menu that contains two buttons, allowing the user to choose a specific round to observe and to load a JSON file with a game, thereby enabling them to observe everything that happened in that game.
3.2 Simulation Environment
The simulation logic is described in the diagram shown in Fig. 3. As can be seen, the execution of a game consists of a sequence of rounds in which turns are passed among the four players. Each turn is divided into four phases: (i) the start phase, where the dice are rolled and materials are received; (ii) the commerce phase, where trades with other players are proposed; (iii) the construction phase, where materials can be spent to build buildings; and (iv) the end phase, where victory points are counted and special condition cards are awarded if the conditions are met. If a player has 10 victory points, the game ends.
Fig. 3. Execution diagram
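As a compact sketch, this execution loop might look as follows in Python; the names (run_game, run_phase, victory_points) are illustrative, not the platform's actual API.

```python
PHASES = ("start", "commerce", "construction", "end")
VICTORY_POINTS_TO_WIN = 10

def run_game(players, game_manager):
    """Run rounds of four turns until a player reaches 10 victory points."""
    while True:
        for player in players:                       # one round = one turn each
            for phase in PHASES:                     # the four phases of a turn
                game_manager.run_phase(player, phase)
            if game_manager.victory_points(player) >= VICTORY_POINTS_TO_WIN:
                return player                        # winner ends the game
```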
To carry out the aforementioned execution diagram, the simulation environment consists of a series of components. The two main ones are the GameDirector and the GameManager. On the one hand, the GameDirector is responsible for directing the game, passing the turn to each player and controlling the order of rounds, as well as checking whether someone has won the game. On the other hand, the GameManager is responsible for executing the actions determined by the GameDirector, checking that they comply with the game rules. Specifically, in each phase of the game, the GameDirector executes the corresponding function and the GameManager then calls the trigger of the intelligent agent whose turn it is. This trigger returns the action that the intelligent agent wants to perform, and the GameManager checks whether it is legal.

Apart from the GameManager, there are other components responsible for executing the core of the game: TurnManager, CommerceManager, BotManager, and TraceManager. The TurnManager is responsible for keeping track of the rounds, turns, and phases of the game. The CommerceManager is responsible for making all trades with the bank; as previously mentioned, the GameManager handles player-to-player trades, so the CommerceManager is delegated to when intelligent agents want to exchange their materials with the bank. The BotManager manages all the data of each intelligent agent: when the GameManager needs to check whether the intelligent agents have enough materials to perform an action, it communicates with the BotManager. In addition, the BotManager keeps track of victory points, the victory point cards each player has, development cards in hand, the number of knights, whether they hold the special cards for largest army and longest road, and whether they have played a development card this turn. Finally, the TraceManager is responsible for exporting a trace of the game to a JSON object that a viewer can use to display the game.
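Continuing the sketch above, the trigger-and-validation interplay between the GameDirector and the GameManager could look like this; the is_legal method, the rules object, and the state object are likewise assumptions made for illustration.

```python
class GameManager:
    """Fires the agent trigger for the current phase and validates the result."""

    def __init__(self, rules, state):
        self.rules = rules
        self.state = state

    def run_phase(self, agent, phase):
        trigger = getattr(agent, f"on_{phase}_phase", None)
        if trigger is None:
            return None                      # agent implements no behaviour here
        action = trigger(self.state)         # the trigger returns the chosen action
        if action is not None and not self.rules.is_legal(action, self.state):
            return None                      # illegal actions are simply discarded
        return action
```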
3.3 Negotiation
Negotiation is an important part of the game, as players must trade resources to build their settlements, cities, and roads. During the commerce phase, players can offer resource exchanges to other players, such as trading a quantity of wheat for a number of bricks. Figure 4 shows the diagram of the commerce phase implemented in our simulator. As can be seen, a maximum number of trades is set per turn to prevent negotiation from dragging on. From there, trade is limited by development cards and by the amount of resources each player has, which will limit their ability to trade through a port.

The on_commerce_phase trigger is associated with the trading behavior of agents. It behaves differently depending on the action it returns:

– A bank trade, indicating the materials being offered and those being demanded. The trade will always be 4:1, 3:1, or 2:1, depending on whether the player has no port, has a port, or has a port specialized in the offered material, respectively.
– An offer with the quantity and materials to be traded with other players.
– A development card, indicating the card to be used.
– None, if no action is desired.
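A hedged sketch of an agent implementing this trigger is given below. The trigger name on_commerce_phase comes from the text; the action dataclasses, the state dictionary, and its keys are hypothetical stand-ins for the platform's real objects.

```python
from dataclasses import dataclass

@dataclass
class BankTrade:            # trade with the bank at a 4:1, 3:1 or 2:1 ratio
    give: dict
    receive: dict

@dataclass
class TradeOffer:           # offer an exchange to the other players
    give: dict
    ask: dict

@dataclass
class PlayDevelopmentCard:  # play a development card
    card: str

class SimpleTrader:
    def on_commerce_phase(self, state):
        hand = state["hand"]                      # e.g. {"wheat": 5, "wool": 2}
        if hand.get("wheat", 0) >= 5:
            return BankTrade(give={"wheat": 4}, receive={"brick": 1})
        if "monopoly" in state["dev_cards"]:
            return PlayDevelopmentCard("monopoly")
        if hand.get("wool", 0) >= 2:
            return TradeOffer(give={"wool": 1}, ask={"ore": 1})
        return None                               # no action desired this turn
```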
Fig. 4. Commerce phase diagram

If any action is taken, it is checked whether the maximum number of possible trades has been exceeded, in which case the commerce phase ends.

The use of this simulation platform in the classroom enables experimentation with different negotiation strategies and the development of programming skills in Artificial Intelligence techniques. We believe that the platform offers an interactive and engaging learning experience, where students can learn by doing and apply their knowledge to solve real-world problems.
If any action is taken, it is checked whether the maximum number of possible trades has been exceeded to end the commerce phase. The use of this simulation platform in the classroom would enable the experimentation of different negotiation strategies and the development of programming skills in Artificial Intelligence techniques. We believe that the platform offers an interactive and engaging learning experience, where students can learn by doing and apply their knowledge to solve real-world problems.
4
Evaluation
In this section, we present some tests to validate the usability and behavior of the simulation platform. For this purpose, an intelligent agent with a simple negotiation logic has been implemented and faced against other baseline agents that serve as a basis for students to improve in their courses. A total of 120 games have been played, distributed in 4 groups of 30 each, where the position of the intelligent agent has been varied between players 1 and 4, in order to avoid possible biases associated with the player’s position in each round of the game. The negotiation logic of the intelligent agent for the commerce phase is described as follows. The agent plays a Monopoly card if it has traded three or more materials of the same type with a player in the previous trade. It never trades with the bank, whether it has ports or not. If it has at least one settlement built and already has enough materials to build a city, it closes the trade phase. If it has at least one settlement and does not have enough materials, it asks for the missing ones to complete a city in exchange for the same amount of materials but for those that are not necessary for a city. If it does not have
256
A. Heras et al. Table 2. Results of the intelligent agent playing against baseline agents. Player 1 Player 2 Player 3 Player 4 Intell. Base. Intell. Base. Intell. Base. Intell. Base. % victories (avg)
0.57
0.14
0.53
0.16
0.6
0.13
0.6
0.13
points (avg)
8.07
4.77
8
4.96
8.13
4.42
8.47
4.63
# largest army (avg) 15
3.67
15
3.33
12
2.67
19
2.67
# longest road (avg) 7
1
4
0.67
7
1
4
0.33
Table 2 shows the results obtained from the experiments. In all cases, the intelligent agent wins the majority of the games, with a wide margin of victories and points over the baseline agents. It should be noted that in all cases the intelligent agent obtains an average of 8 or more points out of 10. The counts of the special cards for the largest army and longest road are also shown; the intelligent agent clearly outperforms the baseline agents in these parameters as well. In view of the results, it is expected that the implementation of agents with more complex artificial intelligence techniques will offer even better results. In any case, these tests demonstrate that the simulation platform works correctly and that a simple intelligent logic, of the kind that both undergraduate and graduate students can design, is capable of beating the implemented baselines, a valuable property for a platform intended for students.
5 Conclusions
In this paper, we presented a novel simulation platform for testing negotiation strategies when teaching Artificial Intelligence in higher education studies. As we stated, simulation environments provide a controlled and repeatable setting to test different negotiation strategies and observe how agents interact with each other. Specifically, in our simulator, these strategies are carried out in a context inspired by the popular board game of Catan. Compared to other popular board games, Catan provides opportunities for cooperation, which makes it a viable option to be used as a simulation platform for integrating negotiation strategies. In our experiments, we developed a simple but intelligent agent and compared its performance against baseline agents. The results showed that the intelligent agent consistently outperformed the baseline agents in all cases, winning the majority of the games with a significant margin of victories and points. The intelligent agent also performed better in obtaining the special cards for the largest army and longest road. These results suggest that the implementation of agents with more complex artificial intelligence techniques would likely yield even better results. Nonetheless, the simulation platform has demonstrated its effectiveness in showcasing the benefits of incorporating negotiation strategies in a social and cooperative board game like Catan. This makes it a valuable
tool for students and researchers interested in exploring negotiation strategies in a simulated environment, especially when teaching Artificial Intelligence. In future work, we plan to develop more complex strategies and use this simulation platform to compare them. In addition, we plan to incorporate this platform into university teaching in subjects related to Artificial Intelligence and negotiation in multi-agent systems.

Acknowledgements. This work is supported by DIGITAL-2022 CLOUD-AI-02, funded by the European Commission, and grant PID2021-123673OB-C31, funded by MCIN/AEI/10.13039/501100011033 and by "ERDF A way of making Europe".
References

1. Aydoğan, R., Jonker, C.M.: A survey of decision support mechanisms for negotiation. In: Hadfi, R., Aydoğan, R., Ito, T., Arisaka, R. (eds.) Recent Advances in Agent-Based Negotiation: Applications and Competition Challenges. IJCAI 2022. Studies in Computational Intelligence, vol. 1092. Springer, Singapore (2023). https://doi.org/10.1007/978-981-99-0561-4_3
2. Clear, A., Parrish, A.S., Impagliazzo, J., Zhang, M.: Computing curricula 2020: introduction and community engagement. In: Proceedings of the 50th ACM Technical Symposium on Computer Science Education, pp. 653–654 (2019)
3. Dai, C.P., Ke, F.: Educational applications of artificial intelligence in simulation-based learning: a systematic mapping review. Comput. Educ. Artif. Intell. 3, 100087 (2022)
4. Fabregues, A., Sierra, C.: DipGame: a challenging negotiation testbed. Eng. Appl. Artif. Intell. 24(7), 1137–1146 (2011)
5. Fachada, N.: ColorShapeLinks: a board game AI competition for educators and students. Comput. Educ. Artif. Intell. 2, 100014 (2021)
6. Hindriks, K., Jonker, C.M., Kraus, S., Lin, R., Tykhonov, D.: GENIUS: negotiation environment for heterogeneous agents. In: Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems, vol. 2, pp. 1397–1398 (2009)
7. Nakamura, N., et al.: Constructing a human-like agent for the werewolf game using a psychological model based multiple perspectives. In: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–8. IEEE (2016)
8. Ouyang, F., Jiao, P.: Artificial intelligence in education: the three paradigms. Comput. Educ. Artif. Intell. 2, 100020 (2021)
9. Popenici, S.A.D., Kerr, S.: Exploring the impact of artificial intelligence on teaching and learning in higher education. Res. Pract. Technol. Enhanced Learn. 12(1), 1–13 (2017). https://doi.org/10.1186/s41039-017-0062-8
10. Sanchez-Anguix, V., Julian, V., Botti, V., García-Fornes, A.: Tasks for agent-based negotiation teams: analysis, review, and challenges. Eng. Appl. Artif. Intell. 26(10), 2480–2494 (2013)
11. Sanchez-Anguix, V., Tunalı, O., Aydoğan, R., Julian, V.: Can social agents efficiently perform in automated negotiation? Appl. Sci. 11(13), 6022 (2021)
12. Sardone, N.B., Devlin-Scherer, R.: Let the (board) games begin: creative ways to enhance teaching and learning. Clearing House J. Educ. Strat. Issues Ideas 89(6), 215–222 (2016)
13. Tegmark, M.: Life 3.0: Being Human in the Age of Artificial Intelligence. Alfred A. Knopf, New York (2017)
Special Session 1: Using Machine Learning Techniques in Educational and Healthcare Settings: A Path Towards Precision Intervention
Eye-Tracking Technology Applied to the Teaching of University Students in Health Sciences

María Consuelo Sáiz-Manzanares1(B), Irene González-Díez1, and Carmen Varela Vázquez2
Burgos, Spain [email protected] 2 Facultad de Psicología, Barcelona University, Gran Via de Les Corts Catalanes, 585, 08007 Barcelona, Spain [email protected]
Abstract. Eye-tracking technology, together with supervised and unsupervised Machine Learning techniques, facilitates the monitoring of students during the performance of learning tasks. Specifically, the combined use of both has been shown to be effective for the analysis of the learning process in Health Science students. The objectives of this work were: 1) to know the effect of performance (correct vs. incorrect) in a conceptual comprehension test on some eye-tracking parameters; 2) to find out whether the effect of execution was a predictor of some eye-tracking parameters; 3) to know which of the eye-tracking metrics would have greater relevance for the classification of students with respect to the effect of performance; and 4) to study the possible clustering without a prior classification variable. The effect of performance was found to predict 21% of the variance in the metrics studied. In addition, the metrics with the highest classification effect were Average Duration Fixation, Fixation Number, and Average Fixation Pupil Diameter. Also, two clusters were found, with Fixation Number discriminating between them. The study of eye-tracking metrics, together with the application of Machine Learning techniques, facilitates the precision analysis of students' learning patterns and guides a personalised intervention by the teacher.

Keywords: eye-tracking · Machine Learning · personalised learning
1 Use of Eye-Tracking Technology Applied to University Teaching

1.1 Application of Eye-Tracking Technology to Teaching in the Health Sciences

Eye-tracking technology makes it possible to record the visual tracking of users as they perform a task or action. Traditionally, this type of methodology has been applied in the field of marketing [1]. However, it is currently being implemented in other environments, such as education [2, 3]. Specifically, in recent years this technology has been used very successfully to monitor learning in Health Science students [4, 5]. Precisely,
eye-tracking technology in this field is used for the analysis of student learning patterns in activities framed within the teaching methodology of evidence-based learning [6]. This methodology is implemented through simulation dolls or virtual workplaces that include avatar figures that regulate the learning process [7]. The aim of using this technology is to find attentional parameters that facilitate the prediction of successful learning patterns or behaviours [8–10]. In addition, eye-tracking technology facilitates the analysis of the cognitive processes linked to the performance of a learning task and enhances knowledge about the cognitive load that the execution of the activity entails for each learner [8]. However, the use of eye-tracking technology entails challenges, which are described below.

1.2 Application of Machine Learning Techniques for the Analysis of Eye-Tracking Parameters

Eye-tracking technology records many metrics on the visual tracking of the learner during the execution of a task. Among the possible metrics, static and dynamic metrics can be distinguished. Among the former are fixations (the positioning point of the pupil of the eye within a stimulus) and saccades (the passage from one part of a stimulus to another); within both, parameters of frequency, number, dimension, speed, etc. can be differentiated. Also among the static metrics are the pupil diameter and the number of visits to parts of a stimulus. The dynamic metrics, in turn, refer to the chain of fixation points in the Cartesian coordinate space. To find this chain, analysis techniques such as string-edit methods or k-means clustering must be used [2]. Likewise, dynamic metrics allow the use of data visualisation techniques such as heat maps, gaze point, or scan path. The first shows the frequency of eye positioning within the parts of a stimulus along the interaction chain during execution, and the last shows the eye positioning points along the execution path followed by each learner [2, 8].

All these analyses include the use of Machine Learning techniques. These can be supervised, among which we differentiate between prediction and classification techniques, and unsupervised, among which clustering techniques stand out. A more complete description can be found in Sáiz-Manzanares et al. [2, 3, 6, 8]. In short, the use of eye-tracking technology and Machine Learning techniques makes it easier to know the cognitive load and the way each learner learns [8–10]. A summary of the application procedure of eye-tracking technology in the field of behavioural learning analysis can be found in Fig. 1.
Fig. 1. Eye tracking technology and Machine Learning techniques in personalised learning.
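As a concrete illustration of the string-edit approach mentioned above, scanpaths can be encoded as strings of area-of-interest (AOI) labels, one letter per fixation, and compared with an edit distance. The AOI strings below are hypothetical, not data from this study.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two AOI-coded scanpaths."""
    prev = list(range(len(b) + 1))             # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Two hypothetical scanpaths over AOIs A-D:
print(edit_distance("AABCD", "ABCCD"))  # -> 2
```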
In line with the state of the art described above, the aim of this study was to analyse the learning patterns of students of the Bachelor's Degree in Occupational Therapy
during the learning process in virtual simulation laboratories for the resolution of clinical cases. The research questions (RQ) were:

RQ1: Will there be significant differences among participants in the TTID and TRD metrics depending on the results obtained in the conceptual comprehension test on the contents seen in the virtual simulation laboratory (correct vs. not correct)?
RQ2: Will correct vs. not correct performance in the conceptual comprehension test be a predictor of the eye-tracking monitoring parameters?
RQ3: Which of the eye-tracking monitoring parameters will have greater relevance in the classification of students' learning patterns?
RQ4: Will different groupings be found for the eye-tracking monitoring parameters without a prior assignment variable?

The method used in this study, the results found with respect to the research questions posed, and the conclusions are presented below.
2 Materials and Methods

2.1 Samples

We worked with 20 Health Science students from the Degree in Occupational Therapy (19 women and 1 man), age range 20.1–21.2 years. The sample was selected by convenience sampling.

2.2 Instruments

a) The eye-tracking equipment used was Tobii Pro Lab version 1.194.41215 with a 15.6-inch monitor at a resolution of 1920 × 1080; a sampling rate of 64 Hz was applied. In this study, the static metrics (TTID, TRD, ADF, FN, AFPD, APSV, MinSA, MaxSA) and dynamic metrics (Heat Map and Gaze Point) were analysed. The statistical software SPSS v.28 [11] and the data mining software Orange v.3.34 [12] were used to study the recorded data.
b) The stimulus applied was a virtual simulation laboratory in which it was explained how to solve a clinical case of a child (chronological age 0–3 years) with pervasive developmental impairment in order to carry out a therapeutic intervention. The duration of the laboratory depended on each user, who decided to move on to the next scene according to their degree of understanding of the information seen in the previous scene. The virtual lab was developed in the framework of the European project eEarly-Care-T (http://eearlycaret.eu/) and is available in the project's Virtual Learning Environment.
c) At the end of the virtual lab, students had to solve a conceptual comprehension test. To do so, they had to rank, from highest to lowest, the degree of affectation of the supposed patient in the different areas of evolutionary development worked on in the video (psychomotor, cognitive, communication and language, and personal autonomy and socialisation).
2.3 Procedure

Prior to the study, a favourable report was obtained from the Bioethics Committee of the University of Burgos (No. IO 04/2022). Participation in the research was voluntary, and all participants signed a written informed consent form beforehand. Interested students carried out the virtual laboratory during the practical classes. The eye-tracking recording was performed by a specialist in this technology in a room free of stimuli and with light, sound and temperature control. The first step was the calibration of each participant (adjustment of the visual focus to the cardinal points of intersection on a screen), with values between 87% and 97% considered valid. Next, the student visualised in the virtual laboratory the steps to carry out the intervention in a case of a child with a general developmental impairment. In this laboratory, an avatar simulating an experienced therapist regulated the student's performance in a simulation framework. At the end, the student took a comprehension test on the most important concepts seen in the simulation laboratory and received feedback on the answer given.

2.4 Data Analysis

Given the characteristics of the sample, non-parametric statistics were applied to test RQ1. The Mann-Whitney U test (the non-parametric counterpart of the usual Student's t-test) is based on

U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1, \qquad U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2
where U is the minimum of U_1 and U_2, and R_1 and R_2 are the rank sums of the two groups.

Likewise, to test RQ2, the supervised Machine Learning technique of prediction was applied (linear regression is a mathematical model used to approximate the dependence relationship between a dependent variable and several independent variables):

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \varepsilon

where the predictor variable was the correct vs. not correct performance in the conceptual comprehension test and the dependent variables were the parameters ADF, FN, AFPD, APSV, MinSA and MaxSA.

With regard to RQ3, the supervised Machine Learning technique of classification was applied (k-nearest neighbours, k-NN, classifies objects based on training with close examples in the item space), using the density function f(x \mid C_j) of the predictor variables x for each class C_j.

Also, to contrast RQ4, the unsupervised Machine Learning technique of clustering, k-means, was applied (an unsupervised classification algorithm that groups objects into k groups based on their characteristics):

\arg\min_{S} \sum_{i=1}^{k} \sum_{x_j \in S_i} \left\| x_j - \mu_i \right\|^2
The statistical analysis software SPSS v.28 [11] and the data mining program Orange v.3.34 [12] were used to perform the analyses.
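As a hedged sketch, the four analyses could also be reproduced in Python with scipy and scikit-learn; the CSV file and column names below are hypothetical, not this study's actual data export.

```python
import pandas as pd
from scipy.stats import mannwhitneyu
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

df = pd.read_csv("eye_tracking_metrics.csv")          # hypothetical export
metrics = ["ADF", "FN", "AFPD", "APSV", "MinSA", "MaxSA"]

# RQ1: Mann-Whitney U on a duration metric by test outcome (1 = correct).
u, p = mannwhitneyu(df[df.correct == 1]["TTID"], df[df.correct == 0]["TTID"])

# RQ2: multi-output linear regression of the metrics on test outcome;
# score() returns the uniform average R^2 over the six metrics.
reg = LinearRegression().fit(df[["correct"]], df[metrics])
r2 = reg.score(df[["correct"]], df[metrics])

# RQ3: k-NN classification of the test outcome from the metrics.
knn = KNeighborsClassifier(n_neighbors=3).fit(df[metrics], df["correct"])

# RQ4: k-means clustering of the metrics without a prior assignment variable.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(df[metrics])
```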
3 Results

Regarding RQ1, no significant differences were found between participants who solved the conceptual comprehension test correctly and those who did not in the performance duration parameters (TTID: U = 32, p = 0.67; TRD: U = 26, p = 0.35). The distribution of performance on the participants' metrics can be seen in Fig. 2; those who did not give correct answers as the first option in the comprehension test are indicated with a circle.
Fig. 2. Relationship between the eye-tracking parameters and knowledge test solution.
Next, to contrast RQ2, a linear regression analysis was applied to determine the predictive value of the correct versus incorrect execution variable in the comprehension test of the content seen in the virtual laboratory on the static parameters (ADF, FN, AFPD, APSV, MinSA and MaxSA). An R² = 0.21 was obtained, which indicates that the result in conceptual understanding predicts 21% of the results obtained in the aforementioned measurement parameters (see Table 1). Significant differences were also found in ADF. Tolerance (T) values were greater than 0.1 and Variance Inflation Value (VIV) values did not exceed 10 in any case; it is therefore assumed that there is no collinearity between the static parameters analysed. Subsequently, it was studied which of the parameters listed in Table 1 would have the highest classification value, for which the nearest neighbour algorithm was applied. The most significant classification parameters were found to be ADF, FN and AFPD. The distribution of the students with respect to these parameters can be seen in Fig. 3.
Table 1. Linear regression analysis.

| Eye-tracking parameter | Metric meaning | SC Beta | t | p | T | VIV |
|---|---|---|---|---|---|---|
| ADF | Long fixations indicate that the learner needs more time to interpret a piece of information. The average duration is between 200–260 ms | 0.60 | 2.24 | 0.04* | 0.66 | 1.51 |
| FN | Reference overall performance measures | 0.50 | 1.82 | 0.09 | 0.63 | 1.58 |
| AFPD | Provides information on the interest or cognitive load of a stimulus | −0.11 | −0.49 | 0.70 | 0.65 | 1.54 |
| APSV | Refers to the speed of moving from one stimulus to another | 0.82 | 2.10 | 0.06 | 0.31 | 3.18 |
| MinSA | Novice trainees tend to have shorter saccades | −0.14 | −0.58 | 0.07 | 0.76 | 1.32 |
| MaxSA | May indicate reduced cognitive effort | −0.72 | 0.08 | 0.07 | 0.34 | 2.94 |

Note. SC = standardised coefficients; T = Tolerance; VIV = Variance Inflation Value. * p < 0.05
Fig. 3. Graphical representation of the results in the nearest neighbour algorithm.
To test RQ4, the k-means algorithm was applied. Two clusters were found (see Table 2); the elbow method was previously applied to find the recommended optimal number of clusters (figure available as supplementary information at http://bit.ly/3nq69Km). Also, in the ANOVA between the final cluster centres, significant differences were found in the FN and MaxSA parameters (see Table 2): cluster 1 members showed more fixations and a higher saccade amplitude. Figure 4 shows the distribution of the participants according to a hierarchical clustering for the variables in which significant differences were found (FN and MaxSA). Figure 5 also shows the hierarchical clustering on the FN and MaxSA parameters with the assignment of the participants to each cluster.
Table 2. Final cluster centres and ANOVA.

| Eye-tracking parameter | Cluster 1 (n = 5) | Cluster 2 (n = 15) | df | F | p |
|---|---|---|---|---|---|
| ADF | 318 | 350 | 18 | 0.16 | 0.70 |
| FN | 1313 | 743 | 18 | 45.01 | < 0.001* |
| AFPD | 2.79 | 2.86 | 18 | 0.007 | 0.93 |
| APSV | 163.33 | 143.81 | 18 | 1.90 | 0.19 |
| MinSA | 30.23 | 30.07 | 18 | 0.15 | 0.71 |
| MaxSA | 1206.17 | 810.85 | 18 | 5.46 | 0.03* |

Note. df = degrees of freedom. * p < 0.05.
Fig. 4. Visualisation of the hierarchical distribution of participants with respect to the FN and MaxSA parameters.
Regarding the analysis of the dynamic metrics, an analysis of Euclidean distances between the participants' execution paths was applied (see Fig. 6). As an example, the trajectory of a student assigned to cluster 1 and of a student assigned to cluster 2 is also presented: in cluster 1, the execution trajectory is denser in the centre of the screen, while in cluster 2 the execution contains more positions both in the centre of the screen and at the edges (see Fig. 7). This can be better appreciated in the heat map image (see Fig. 8).
Fig. 5. Distribution of participants in the hierarchical cluster on FN and MaxSA parameters.
Fig. 6. Representation of the Euclidean distance matrix with respect to the participants’ execution trajectory.
Fig. 7. Gaze point of a cluster 1 participant vs. a cluster 2 participant.
Fig. 8. Examples of gaze point of a cluster 1 vs. cluster 2 participant.
4 Conclusions

In summary, it seems that the correct execution of a comprehension task predicts 21% of the variance in the ADF, FN, AFPD, APSV, MinSA and MaxSA metrics, the most affected parameter being ADF. On the other hand, the parameters FN, ADF and APSV appear to be referential for the classification of the results obtained by the students in the knowledge test [8–10]. Likewise, in a cluster analysis without a prior assignment variable, the FN and MaxSA parameters have a differentiating weight; specifically, in cluster 1, a greater number of fixations and a greater saccade amplitude were observed. Therefore, and interpreting these results with the caution required by the type of sampling and the number of participants, it can be pointed out as a line for future research that there are differences in performance among participants that are not fully explained by success vs. non-success in the knowledge tests. Variables related to the number of fixations and to the speed and amplitude of the saccades have been found to be directly related to the cognitive load perceived by each trainee [8]. These data translate into the development of execution patterns in the trainees that are based on different attentional levels and on the use of metacognitive skills, specifically orientation.

These results open up an important scenario for research in the field of the Psychology of Instruction or Learning related to the study of the way of learning by applying eye-tracking technology and Machine Learning techniques [2, 3, 6, 8], where the ultimate goal will be the enhancement of precision learning in educational environments [8–10], similar in procedure to the development of precision medicine. This objective is a challenge and an opportunity, especially in Higher Education.

Acknowledgements. This study has been funded by two research projects: the European eEarlyCareT project No. 2021-1-ES01-KA220-SCH-000032661 and the SmartLearnUni R&D&I project No. PID2020-117111RB-I00.
Abbreviations and Explanation

| Abbreviation | Parameter | Explanation |
|---|---|---|
| TTID | Total Time of Interest Duration | Longer duration usually indicates a greater information processing effort |
| TRD | Total Recording Duration | Longer duration usually indicates a greater information processing effort |
| ADF | Average Duration Fixation | A greater number of fixations on a stimulus may indicate that the learner has less knowledge of the task or difficulty in discriminating relevant information |
| FN | Fixation Number | A greater number of fixations on a stimulus may indicate that the learner has less knowledge of the task or difficulty in discriminating relevant information |
| AFPD | Average Fixation Pupil Diameter | It may provide information about the level of attention or interest in the information provided by the stimulus |
| APSV | Average Peak Saccades Velocity | The greater the amplitude of the saccade, the lower the cognitive effort; however, it can also refer to problems in understanding information |
| MinSA | Minimum Saccade Amplitude | Novice trainees tend to have shorter saccades |
| MaxSA | Maximum Saccade Amplitude | May indicate reduced cognitive effort |
References

1. Duchowski, A.T., Peysakhovich, V., Krejtz, K.: Using pose estimation to map gaze to detected fiducial markers. Procedia Comput. Sci. 176, 3771–3779 (2020). https://doi.org/10.1016/j.procs.2020.09.010
2. Sáiz-Manzanares, M.C., Rodríguez Diez, J.J., Marticorena Sánchez, R., Zaparaín Yáñez, M.J., Cerezo Menéndez, R.: Lifelong learning from sustainable education: an analysis with eye tracking and data mining techniques. Sustainability 12(5), 1970 (2020). https://doi.org/10.3390/su12051970
3. Sáiz-Manzanares, M.C., et al.: Eye-tracking technology and data-mining techniques used for a behavioral analysis of adults engaged in learning processes. J. Vis. Exp. 172(172), 1–16 (2021). https://doi.org/10.3791/62103
4. Mills, B.W., Carter, O.B., Rudd, C.J., Claxton, L.A., Ross, N.P., Strobel, N.A.: Effects of low- versus high-fidelity simulations on the cognitive burden and performance of entry-level paramedicine students: a mixed-methods comparison trial using eye-tracking, continuous heart rate, difficulty rating scales, video observation and interviews. Simul. Healthc. 11(1), 10–18 (2016). https://doi.org/10.1097/SIH.0000000000000119
5. Tsai, P.Y., Yang, T.T., She, H.C., et al.: Leveraging college students' scientific evidence-based reasoning performance with eye-tracking-supported metacognition. J. Sci. Educ. Technol. 28, 613–627 (2019). https://doi.org/10.1007/s10956-019-09791-x
6. Sáiz-Manzanares, M.C., Marticorena-Sánchez, R., Rodríguez-Díez, J.J., Rodríguez-Arribas, S., Díez-Pastor, J.F., Ji, Y.P.: Improve teaching with modalities and collaborative groups in an LMS: an analysis of monitoring using visualisation techniques. J. Comput. High. Educ. 33(3), 747–778 (2021). https://doi.org/10.1007/s12528-021-09289-9
7. Siassi, B., Ebrahimi, M., Noori, S., Sheng, S., Ghosh, D., Seri, I.: Virtual neonatal echocardiographic training system (VNETS): an echocardiographic simulator for training basic transthoracic echocardiography skills in neonates and infants. IEEE J. Transl. Eng. Health Med. 6, 1–7 (2018). https://doi.org/10.1109/JTEHM.2018.2878724
8. Sáiz-Manzanares, M.C., Marticorena Sánchez, R., Martín-Antón, L.J., González-Díez, I., Carbonero Martín, M.Á.: Using eye tracking technology to analyse cognitive load in multichannel activities in university students. Int. J. Hum. Comput. Interact. 1–19 (2023). https://doi.org/10.1080/10447318.2023.2188532
9. Merchie, E., Heirweg, S., Van Keer, H.: Mind maps: processed as intuitively as thought? Investigating late elementary students' eye-tracked visual behavior patterns in-depth. Front. Psychol. 13, 821768 (2022). https://doi.org/10.3389/fpsyg.2022.821768
10. Nückles, M.: Investigating visual perception in teaching and learning with advanced eye-tracking methodologies: rewards and challenges of an innovative research paradigm. Educ. Psychol. Rev. 33(1), 149–167 (2020). https://doi.org/10.1007/s10648-020-09567-5
11. IBM Corporation: Statistical Package for the Social Sciences (SPSS) (Version 28) [Software], IBM (2022). https://www.ibm.com/es-es/products/spss-statistics. Accessed 26 Mar 2023
12. Demsar, J., et al.: Orange: data mining toolbox in Python. J. Mach. Learn. Res. 14, 2349–2353 (2013)
En_Línea. An Online Treatment to Change Lifestyle for People with Overweight and Obesity. A Pilot Study

Carmen Varela1,2(B), Irene González-Diez1, María Consuelo Sáiz-Manzanares1, and Carmina Saldaña2

1 University of Burgos, Faculty of Health Sciences, Paseo de Comendadores s/n, 09001 Burgos, Spain [email protected]
2 University of Barcelona, Faculty of Psychology, Gran Via de les Corts Catalanes, 585, 08007 Barcelona, Spain [email protected]
Abstract. The high prevalence rates of obesity and its comorbidity with serious diseases such as diabetes or hypertension and psychological problems such as depression or anxiety highlight the need for new treatments for this condition. This is a pilot study whose main objective is to analyze the correct functioning of the program with a small sample. The sample of this pilot study is composed of 27 participants. The En_Línea program to change lifestyle in people with overweight and obesity is based on the LEARN program. The results showed significant differences between pre-intervention weight and the mid-treatment assessment at week 8 (n = 9, z = −2.521, p = 0.012, r = 0.84). Significant differences were also observed between pre- and post-intervention weight (n = 6, z = −2.201, p = 0.028, r = 0.9) and at the 3-month follow-up (n = 5, z = −2.023, p = 0.043, r = 0.9). These results were supported by high effect sizes. In general, the En_Línea program seems to be a tool with great potential, but it presents some limitations, mainly the low adherence rates and the length of the program. This pilot study was helpful to identify and address these limitations before its application with a larger sample.

Keywords: Overweight · obesity · treatment · online · lifestyle
1 Introduction

The high prevalence rates of obesity [1, 2] and its comorbidity with serious diseases such as diabetes or hypertension [3] and psychological problems such as depression or anxiety [4] highlight the need for new treatments for this condition. Traditionally, behavioral programs have been the most effective [5]. However, their results are usually short-term and come with low adherence rates [6]. Online interventions have appeared as a new option, although further investigation is needed to prove their effectiveness [7]. However, some advantages have been observed for these kinds of interventions
compared to the traditional ones. For example, they are faster, less expensive, have greater reach, and include reminders [8].

This study is part of the En_Línea project, an online treatment to change lifestyle for people with overweight and obesity. A protocol to apply this intervention has been developed; the objective is to prove the effectiveness of this new proposal compared to a traditional intervention and a control group [9]. This is a pilot study whose main objective is to analyze the correct functioning of the program with a small sample. The specific objectives are to test the website and the mobile application, check their proper functioning, and confirm that these tools are easy to access and use. To achieve these goals, the variables analyzed will be: 1) differences between pre- and post-intervention weight; 2) the maintenance of the weight loss during the follow-ups; 3) adherence rates, where at least 80% of the web activities and 80% of daily records should be completed; and 4) differences in quality of life between pre- and post-intervention.
2 Method

2.1 Participants

The sample of this pilot study was initially composed of 28 participants. The inclusion criteria were: presence of overweight type II or obesity type I (Body Mass Index, BMI, between 27 and 34.9 kg/m²); age between 18 and 65 years; and completion of all questionnaires in the assessment phase. One participant did not meet the inclusion criteria, so 27 participants were included in the final sample and accessed the En_Línea program; 74.1% were female. The mean age of the sample was 35.9 (SD = 12.6) years and the mean BMI was 32.4 (SD = 3.7) kg/m². The following exclusion criteria were also considered: presence of a serious disease, presence of a serious psychological problem, presence of another eating disorder, use of drugs, pregnancy, and participation in another weight control program.

2.2 Intervention

This pilot study presents only an intervention group, the En_Línea program to change lifestyle in people with overweight and obesity. This treatment is based on the LEARN program [10]. The main treatment areas are lifestyle, physical activity, attitudes, relationships, and nutrition. The En_Línea program is administered using a web and mobile application to record food, exercise, and weight. The treatment consists of 17 weekly sessions; each week the participant receives personalized feedback from a specialized therapist. The main objective is the acquisition of long-term health habits. Each participant has a personal username and password known only to themselves; no personal data are required during the sessions.

2.3 Procedure

Participants were recruited through posters, the University website, email, and social networks. Interested people sent an email to the project coordinator, and a first meeting was scheduled to explain the project and its objectives, confirm the inclusion criteria, and sign the informed consent. After a brief interview to explain the application and the website, anthropometric measures were taken by the therapist (weight, height, and waist, arm, and thigh circumferences). To complete the questionnaires, participants received an email with a link to a web platform. After completing the assessment phase, the included participants received another email with instructions to begin the program; from this moment, communication took place in the space on the website authorized for this purpose. Face-to-face meetings take place at the beginning, in the middle, and at the end of the intervention. Moreover, follow-up meetings are scheduled at 1, 3, 6 and 12 months.

2.4 Instruments

During the assessment phase, the Spanish versions of the following questionnaires were applied, in addition to an ad hoc sociodemographic questionnaire:

– ETONA, a structured interview to assess food habits [11].
– Temperament and Character Inventory Revised (TCI-R), a 240-item instrument to assess temperament and character features [12, 13].
– Bulimic Investigatory Test Edinburgh (BITE), a 33-item tool to assess symptoms of bulimia nervosa [14, 15].
– States of Change for Weight Management (S-Weight) [16], a 5-item questionnaire to assess the five change periods of weight loss.
– Process of Change for Weight Management (P-Weight) [16], a 33-item questionnaire to assess attitudes and behavior about weight control.
– Night Eating Questionnaire (NEQ) [17, 18], a self-reported scale of 14 items to assess symptoms of night eating syndrome.
– SF-36 Health Survey (SF-36) [19, 20], a 36-item scale to assess health and wellbeing.
– Coping Strategies Inventory (CSI) [21, 22], a 40-item tool to assess eight kinds of coping strategies.
– The Dutch Eating Behavior Questionnaire (DEBQ) [23, 24], a 33-item questionnaire to assess emotional, external and restrictive eating behavior.
– Depression, Anxiety, Stress Scale (DASS-21) [25, 26], a 21-item questionnaire to assess depression, anxiety and stress during the last week.

2.5 Statistical Analysis

IBM SPSS Statistics 25 was used to conduct the statistical analysis. Sociodemographic data were presented for the entire sample, using means and standard deviations for quantitative variables; frequencies and percentages were used for qualitative variables. To compare the pre- and post-intervention weight differences and the follow-ups, the Wilcoxon signed-rank test was carried out. The point-biserial correlation r was calculated to check the magnitude of the difference [27].
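As a hedged illustration, the Wilcoxon signed-rank test and its effect size r could be computed with scipy as follows; the weight values are made up for the example, not the study's data.

```python
import numpy as np
from scipy import stats

# Hypothetical pre- and post-intervention weights (kg) for six completers.
pre  = np.array([96.0, 88.5, 102.3, 79.4, 91.2, 85.0])
post = np.array([91.5, 84.0,  97.8, 76.0, 87.9, 82.1])

res = stats.wilcoxon(pre, post)            # Wilcoxon signed-rank test
z = abs(stats.norm.ppf(res.pvalue / 2))    # z approximated from the two-sided p
r = z / np.sqrt(len(pre))                  # effect size r = |z| / sqrt(n)
print(f"W = {res.statistic:.1f}, p = {res.pvalue:.3f}, r = {r:.2f}")
```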
Table 1. Sociodemographic data for the entire sample.

| Variables | Total (n = 27) |
|---|---|
| Age (M, SD) | 35.9 (12.6) |
| BMI (M, SD) | 32.4 (3.7) |
| Gender (n, %): Women | 20 (74.1) |
| Gender (n, %): Men | 7 (25.9) |
| Education (n, %): Secondary education | 17 (63.0) |
| Education (n, %): Superior education | 10 (37.0) |
| Income (n, %): < 1 MW | 11 (40.7) |
| Income (n, %): 1–2 MW | 11 (40.7) |
| Income (n, %): 3–4 MW | 2 (7.4) |
| Income (n, %): ≥ 5 MW | 3 (11.1) |

M = Mean; MW = Minimum Wage; SD = Standard Deviation
3 Results

The Wilcoxon test showed significant differences between pre-intervention weight and the mid-treatment assessment at week 8 (n = 9, z = −2.521, p = 0.012, r = 0.84). Significant differences were also observed between pre- and post-intervention weight (n = 6, z = −2.201, p = 0.028, r = 0.9) and at the 3-month follow-up (n = 5, z = −2.023, p = 0.043, r = 0.9). These results were supported by high effect sizes. Although the differences at the 1-, 6- and 12-month follow-ups compared to baseline were not significant, a weight loss was observed. Figure 1 shows the weight loss of the participants who completed the treatment.

During the first weeks, 78.6% of the participants dropped out. However, the participants who completed the program showed high adherence rates, completing more than 80% of the website activities. Finally, quality of life improved for the six participants who completed the program (n = 6, z = −2.214, p = 0.027).
4 Discussion

The main objective of this pilot study was to test the proper functioning of the website and mobile application designed for the En_Línea intervention program for people with overweight and obesity.

In general, the participants who completed the program were satisfied with the website, describing it as simple and accessible. However, the contents were described as dense.
[Figure: line chart of body weight (60–110 kg) over the treatment and follow-ups for Participants 1–6.]
Fig. 1. Weight evolution during En_Línea treatment and follow-ups
Participants agreed that the program was helpful and were satisfied with the achievement of their objectives. However, the program was too long and required too much time; this could be one of the reasons for the dropout rates. For that reason, the content of the program will be thoroughly reviewed to shorten it while maintaining the achievement of its main objectives. In addition, the content will be delivered using more multimedia resources such as videos, graphics, or images. This intervention was based on the LEARN program, which has shown high rates of effectiveness. However, this kind of program presents two relevant limitations: the lack of long-term results and the adherence rates [5, 6, 10]. The adherence results of this study are in line with previous studies, with the highest dropout rates occurring during the first weeks of the treatment. According to participants' statements, the main reasons were low motivation and job incompatibility. For those reasons, the program will be modified considering the limitations identified during this pilot study before conducting further studies. The online format is an aspect to consider for adherence. It has been observed that intensive contact with a health professional is a variable of good prognosis in lifestyle change programs for people with overweight and obesity [28]. However, the participants
were satisfied with this format, considering that a health professional provided weekly feedback. Regarding long-term results, in this study a maintenance of weight loss was observed for the participants who completed the program. Also, an improvement in quality of life was observed between pre- and post-intervention. Considering these results, the most significant improvements should target the beginning of the program in order to increase adherence rates. The inclusion of motivational interviewing could be considered, given the positive results it has shown for lifestyle change [29]. Other methods to consider in future investigations could be the use of machine learning algorithms or artificial intelligence to better understand the behavior of the participants. In conclusion, the En_Línea program seems to be a tool with great potential, but some changes should be considered to ensure greater effectiveness in future investigations.
References

1. Di Cesare, M., et al.: Trends in adult body-mass index in 200 countries from 1975 to 2014: a pooled analysis of 1698 population-based measurement studies with 19.2 million participants. The Lancet 387, 1277–1396 (2016)
2. World Health Organization: Obesity and overweight. Available at https://www.who.int/en/news-room/fact-sheets/detail/obesity-and-overweight (2019)
3. James, P.: Obesity: the worldwide epidemic. Clin. Dermatol. 22, 276–280 (2004)
4. Rajan, T.M., Menon, V.: Psychiatric disorders and obesity: a review of association studies. J. Postgrad. Med. 63, 182–190 (2017)
5. Wadden, T.A., Butryn, M.L., Hong, P.S., Tsai, A.G.: Behavioral treatment of obesity in patients encountered in primary care settings: a systematic review. J. Am. Med. Assoc. 312, 1779–1791 (2014)
6. Booth, H.P., Prevost, T.A., Wright, A.J., Guildford, M.C.: Effectiveness of behavioural weight loss interventions delivered in a primary care setting: a systematic review and meta-analysis. Fam. Pract. 31, 643–653 (2014)
7. Oosterveen, E., Tzelepis, F., Ashton, L., Hutchesson, M.J.: A systematic review of eHealth behavioral interventions targeting smoking, nutrition, alcohol, physical activity and/or obesity for young adults. Prev. Med. 99, 197–206 (2017)
8. Varela, C., Ruiz, J., Andrés, A., Roy, R., Fusté, A., Saldaña, C.: Advantages and disadvantages of using the website SurveyMonkey in a real study: psychopathological profile in people with normal-weight, overweight and obesity in a community sample. E-Methodology 3, 77–89 (2016)
9. Varela, C., Saldaña, C.: En_Línea. An online treatment to change lifestyle in overweight and obesity. BMC Public Health 19, 1552 (2019)
10. Brownell, K.D.: The LEARN Program for Weight Management. American Health Publishing Company, Dallas (2000)
11. Saldaña, C.: Entrevista para la evaluación de comportamiento alimentario y actividad física en niños y adolescentes, versión padres. Proyecto E-TONA, Barcelona (2010)
12. Cloninger, C.: The Temperament and Character Inventory-Revised. Center for Psychobiology of Personality, Washington University, St. Louis (1999)
13. Gutierrez-Zotes, J.A., et al.: Inventario del Temperamento y el Carácter-Revisado (TCI-R). Baremación y datos normativos en una muestra de población general. Actas Españolas de Psiquiatría 32, 8–15 (2004)
14. Henderson, M., Freeman, C.P.: A self-rating scale for bulimia: the BITE. Br. J. Psychiatry 150, 18–24 (1987)
15. Vaz, F.J., Peñas, E.M.: Differential study of the complete and subclinical presentations of bulimia nervosa. Actas Esp. Psiquiatr. 27, 359–365 (1999)
16. Andrés, A., Saldaña, C., Gómez-Benito, J.: The transtheoretical model in weight management: validation of the processes of change questionnaire. Obesity Facts (2011)
17. Allison, K.C., et al.: The Night Eating Questionnaire (NEQ): psychometric properties of a measure of severity of the night eating syndrome. Eat. Behav. 9, 62–72 (2008)
18. Moizé, V., Gluck, M.E., Torres, F., Andreu, A., Vidal, J., Allison, K.: Transcultural adaptation of the Night Eating Questionnaire (NEQ) for its use in the Spanish population. Eat. Behav. 13, 260–263 (2012)
19. Alonso, J., Prieto, L., Antó, J.M.: La versión española del SF-36 Health Survey (Cuestionario de Salud SF-36): un instrumento para la medida de los resultados clínicos. Med. Clin. 104, 771–776 (1995)
20. Ware, J.E., Snow, K.K., Kosinski, M., Gandek, B.: SF-36 Health Survey: Manual and Interpretation Guide. Nimrod Press, Boston (1993)
21. Cano-García, F., Rodríguez-Franco, L.: Adaptación española del Inventario de Estrategias de Afrontamiento. Actas Esp. Psiquiatr. 35(1), 29–38 (2007)
22. Tobin, D., Holroyd, K., Reynolds, R., Wigal, J.: The hierarchical factor structure of the Coping Strategies Inventory. Cogn. Ther. Res. 13, 343–361 (1989)
23. Cebolla, A., Barrada, J.R., van Strien, T., Oliver, E., Baños, R.: Validation of the Dutch Eating Behavior Questionnaire (DEBQ) in a sample of Spanish women. Appetite 73, 58–64 (2014)
24. Van Strien, T., Frijters, J.E., Bergers, G.P., Defares, P.: The Dutch Eating Behavior Questionnaire (DEBQ) for assessment of restrained, emotional and external eating behavior. Int. J. Eat. Disord. 5, 295–315 (1986)
25. Bados, A., Solanas, A., Andrés, R.: Psychometric properties of the Spanish version of Depression, Anxiety and Stress Scales (DASS). Psicothema 17, 679–683 (2005)
26. Lovibond, S.H., Lovibond, P.F.: Manual for the Depression Anxiety Stress Scales (DASS). Psychology Foundation Monograph, New South Wales (1993)
27. Ledesma, R., Macbeth, G., Cortada de Kohan, N.: Tamaño del efecto: revisión teórica y aplicaciones con el sistema estadístico ViSta. Revista Latinoamericana de Psicología 40, 425–439 (2008)
28. Sherrington, A., Newham, J.J., Bell, R., Adamson, A., Mccoll, E., Araujo-Soares, V.: Systematic review and meta-analysis of internet-delivered interventions providing personalized feedback for weight loss in overweight and obese adults. Obes. Rev. 17, 541–551 (2016)
29. Gálvez Espinoza, P., Gómez San Carlos, N., Nicoletti Rojas, D., Cerda Rioseco, R.: Is the individual motivational interviewing effective in overweight and obesity treatment? A systematic review. Atencion Primaria 51, 548–56 (2018)
Use of Eye-Tracking Methodology for Learning in College Students: Systematic Review of Underlying Cognitive Processes

Irene González-Diez1(B), Carmen Varela1,2, and María Consuelo Sáiz-Manzanares1

1 University of Burgos, Faculty of Health Sciences, Paseo de Comendadores s/n, 09001 Burgos, Spain
[email protected]
2 University of Barcelona, Faculty of Psychology, Gran Via de les Corts Catalanes 585, 08007 Barcelona, Spain
Abstract. Learning processes and outcomes have been one of the main focuses of interest in educational research. The main objective of this systematic review is to provide a comprehensive and integrative view of the application of eye tracking to identify the cognitive processes involved in college students' learning. The bibliographic sources for this review were the Web of Science, PsycINFO, and Scopus databases. After a screening process carried out by two independent reviewers, 17 articles were included according to prespecified inclusion criteria. In general, deeper learning and comprehension of concepts were observed when visual or pictorial stimuli were presented compared to text cues. This systematic review is a rapid update on the processes involved in college students' learning, providing a comprehensive view of the current situation with which to design innovative educational programs.

Keywords: Eye tracking · college students · learning · cognitive processes
1 Introduction

Learning processes and outcomes have been one of the main focuses of interest in educational research [1]. For that reason, researchers in the area have developed studies to improve the techniques used to assess learning processes. The main objective was to evolve from the traditional interview based on the think-aloud protocol [1, 2] to more innovative proposals in which assessment is more accurate. To achieve this objective, new technologies have been applied in the learning field. Concretely, eye-movement applications have shown positive results in this area [1, 3]. Moreover, the use of this kind of application is in line with current learning methodology, where students use virtual classrooms and multimedia materials alongside the traditional face-to-face environment [4].
Eye tracking focuses mainly on attention processes: this technique identifies where people are looking and for how long. The resulting patterns of eye movements can provide information about more complex processes [4], for example the depth of information processing [5] and the difficulties encountered during this process [6]. The eye-tracking metrics to assess are mainly based on fixations and saccades. Fixation refers to the stability of the eye at one point, and saccades are the rapid eye movements between fixations, showing the change in the focus of attention [4, 7]. Therefore, the study of cognitive processes using new technologies like eye tracking could promote the design of innovative learning strategies, more focused on students' actual interests. Eye-tracking results also provide information about participants' strengths and difficulties during the learning process. Research in this area is constantly updated. For that reason, a systematic review providing a comprehensive view of the achievements of the last decade could be useful to map the progress made in this area. Moreover, the gaps in this field could be easily identified to orient future lines of investigation. The main objective of this systematic review is to provide a comprehensive and integrative view of the application of eye tracking to identify the cognitive processes involved in college students' learning.
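To make these metrics concrete, the following minimal sketch derives fixation onsets and durations from raw gaze samples using a dispersion-threshold (I-DT) procedure; the threshold values, units, and function name are illustrative assumptions, not taken from the reviewed studies.

    import numpy as np

    def idt_fixations(x, y, t, max_disp=1.0, min_dur=0.1):
        # A window of samples counts as a fixation while its horizontal plus
        # vertical dispersion stays below max_disp (e.g. degrees of visual
        # angle) and it lasts at least min_dur seconds.
        fixations, i = [], 0
        while i < len(t):
            j = i
            while j < len(t) and ((x[i:j + 1].max() - x[i:j + 1].min()) +
                                  (y[i:j + 1].max() - y[i:j + 1].min())) <= max_disp:
                j += 1
            if j - i > 1 and t[j - 1] - t[i] >= min_dur:
                fixations.append((t[i], t[j - 1] - t[i]))  # (onset, duration)
                i = j
            else:
                i += 1
        return fixations  # fixation count = len(...); mean duration from the durations

Saccade-based metrics, such as saccade distance, can then be derived from the gaps between consecutive fixations.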
2 Method

The bibliographic sources for this review were the Web of Science, PsycINFO, and Scopus databases. The time span was set from 2013 to 2023, covering the last 10 years. In the first stage, three sets of keywords were organized: "eye move*", "eye track", "eye tracking", "gaze move"; learn, learning; and college, "college student*", university*. The keywords within each set were combined using the Boolean operator 'OR', and the three sets were then combined using the Boolean operator 'AND'. The results were 430 articles: 226 from Web of Science, 58 from PsycINFO, and 146 from Scopus. In the second stage, the results were exported to the Rayyan meta-analysis software. A duplicate search was run, eliminating a total of 164 articles that appeared in more than one of the selected databases. Thirdly, two independent reviewers manually and systematically screened the article titles and abstracts and confirmed that the selected articles met the following inclusion criteria: (1) the participants were exclusively college students, excluding animals and specific populations such as students with dyslexia or autism; (2) eye-tracking devices were used; and (3) the study addressed underlying cognitive processes of learning. Disagreements were resolved by discussion. Articles that were not fully available in open access and those in languages other than English and Spanish (several were found in Chinese) were also removed. Finally, 17 papers were identified as the research sample pool of this review.
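Combined, the final search string would take roughly the following form (an illustrative reconstruction; exact syntax and field codes vary by database):

    ("eye move*" OR "eye track" OR "eye tracking" OR "gaze move")
    AND (learn OR learning)
    AND (college OR "college student*" OR university*)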
3 Results

After conducting the search, 430 articles were identified and 164 duplicates were eliminated. The 266 remaining articles were screened by two independent reviewers, who removed 186 at the title and abstract phase for not meeting the prespecified inclusion criteria. The same independent reviewers assessed the remaining 80 articles by full-text reading, including 17 in this systematic review. Figure 1 presents the process followed.
The main characteristics of the included studies are summarized in Table 1. The included articles were published from 2014 to 2022 and carried out in Germany (n = 5), China (n = 4), the United States of America (n = 2), Taiwan (n = 2), Colombia (n = 1), Turkey (n = 1), South Korea (n = 1), and Italy (n = 1). The total number of participants was 1403 college students; in 12 of the included studies, women represented more than 50% of the sample. 66.6% of the included studies (n = 12) focused on attention and its effects during the learning process. In general, deeper learning and comprehension of concepts were observed when visual or pictorial stimuli were presented compared to text cues. Table 1 details the associations between the different underlying cognitive processes and the learning process, assessed using eye-tracking methods.
[Figure: PRISMA flow chart. Identification: studies identified via databases and registers (3 databases, n = 430); duplicate records removed before screening (n = 164). Screening: studies screened by title and abstract (n = 266), of which 186 were excluded; studies assessed by full text (n = 80), of which 63 were excluded. Included: studies included in the review (n = 17).]
Fig. 1. PRISMA flow chart of the selection process for the included articles, following the PRISMA 2020 design from Page et al. (2021)
Table 1. Summary of included articles

Ariasi & Mason (2014), Italy. Objective: outline the link between online processes and offline outcomes in these learners. Sample: 63 (60.3% women). Age: M = 23.8, SD = 2.8. Cognitive process: working memory. Metrics: fixation time (first pass, look-back and look-from). Conclusions: learning from the non-refutation text was predicted only by working memory capacity, while learning from the refutation text indicated that the quality, not the quantity, of processing was related to it.

Bacca-Acosta et al. (2022), Colombia. Objective: analyze the role of scaffolds in the attention process in a virtual reality learning environment. Sample: 41 (63.4% women). Age: range 18–26. Cognitive process: attention. Metrics: fixation frequency, fixation duration, revisits. Conclusions: students focused their attention more on scaffolds in the form of dynamic texts and on-site indicators in virtual reality environments; scaffolds placed close to the objects students interact with are more effective for increasing their learning performance.

Cheng & Yang (2014), Taiwan. Objective: explore the cognitive activities during spatial problem solving and probe the relationship between spatial ability and science concept learning. Sample: 20 (45% women). Age: range 21–24. Cognitive process: problem-solving strategies. Metrics: total fixation duration, average fixation duration, areas of interest, number of regressions. Conclusions: rotation angles and levels of plane invisibility exerted significant effects on the online processes and performance of spatial problem solving; accuracy performance was correlated with eye movement patterns; concept performance was not correlated with rotation test performance but was associated with spatial memory and problem-solving strategies.

Cheng et al. (2014), Taiwan. Objective: analyze the association of eye movements with the retrieval process and the accuracy of the response. Sample: 63 (38.1% women). Age: range 18–22. Cognitive process: retrieval process. Metrics: fixation duration, saccade distance, re-reading time. Conclusions: mean fixation time was the best predictor of response accuracy in the text presentation, with lecture time the second-best predictor; mean saccade distance had negative power to predict responses on physics concepts in the picture presentation.

Fan et al. (2022), China. Objective: analyze the association between the individual's reading pattern and reading comprehension. Sample: 80 (52.5% women). Age: not reported. Cognitive process: comprehension. Metrics: fixation duration, time between fixations, angle between fixations. Conclusions: the individual reading cognitive pattern captured by the eye tracker can predict the level of reading comprehension through advanced deep learning models.

Koc-Januchta et al. (2017), Germany. Objective: examine the differences between visualizers and verbalizers in the way they gaze at pictures and texts while learning. Sample: 32 (68% women). Age: M = 24.63, SD = 2.31. Cognitive process: attention. Metrics: areas of interest (AOI): entry time, dwell time (durations from all fixations and saccades that hit the AOI, in ms) and transitions. Conclusions: visualizers achieved better results on the comprehension test; verbalizers tended to enter non-informative, irrelevant areas of pictures sooner than visualizers; visualizers spent significantly more time inspecting pictures, while verbalizers spent more time inspecting texts.

Kokoç, Ilgaz & Altun (2020), Turkey. Objective: explore the effect of video lecture types and learners' sustained attention levels on learning performance in an e-learning environment. Sample: 201 (43.7% women). Age: M = 20.92, range 19–27. Cognitive process: sustained attention. Metrics: fixation count, fixation duration. Conclusions: the main effects of learners' sustained attention levels and video lecture types on learning performance were significant; picture-in-picture video lectures led to higher learning performance scores regardless of the level of sustained attention.

Kühl et al. (2018), Germany. Objective: explore the underlying cognitive processes of text information and spatial abilities in learning. Sample: 198 (75.8% women). Age: M = 22.26, SD = 2.83. Cognitive process: effortful cognitive processing. Metrics: fixation duration, time between fixations, angle between fixations. Conclusions: the beneficial effect of animations over static pictures was mediated by a pupillometry measure assumed to reflect effortful cognitive processing; spatial abilities acted as a compensator, with the advantage of animations particularly evident for learners with low spatial abilities; the informational advantage of animations over static pictures cannot easily be compensated by text information, but can be by learners' spatial abilities.

Li et al. (2019), China. Objective: analyze gestures by pedagogical agents and their association with attentional processes. Sample: 123 (85.4% women). Age: M = 20.5, SD = 2.3. Cognitive process: attention. Metrics: fixation time, fixation count, average fixation, first fixation duration, revisits. Conclusions: students in the specific-pointing group paid more attention to task-related elements than the other groups (as indicated by fixation time and count on the target AOI) and performed better on retention and transfer tests administered immediately after the lesson and after a 1-week delay.

Moon & Ryu (2020), South Korea. Objective: analyze the association of social and cognitive cues with learning comprehension. Sample: 64 (70.3% women). Age: M = 22.55, SD = 2.43. Cognitive processes: attention, comprehension. Metrics: fixation time, fixation counts, dwell time on areas of interest. Conclusions: students' different visual-attention patterns appeared in pictorial information processing; social cues caused visual distractions and lowered learning comprehension, whereas cognitive cues as visual cues helped learners integrate pictorial information via visuospatial clues.

Pardi et al. (2022), Germany. Objective: introduce a new methodological approach to capture and analyze the processing and use of text, image, and video content during free web-search-based learning. Sample: 108 (85.2% women). Age: M = 22.81, SD = 2.83. Cognitive process: attention. Metrics: fixation time. Conclusions: participants directed their attention significantly longer to text than to video or image resources.

Peng et al. (2021), China. Objective: explore the influence of the visual aesthetics of positive emotional interface design on students' cognitive processes, emotional valences, learning outcomes, and subjective experience. Sample: 83 (51.8% women). Age: M = 22.48, SD = 1.54. Cognitive process: attention. Metrics: areas of interest (AOI): fixation duration and fixation count. Conclusions: the positive emotional design group invested higher cognitive effort, paid more attention to the relevant knowledge content module, and achieved better learning performance.

Ponce & Mayer (2014), USA. Objective: examine how study aids (highlighting and graphic organizers) affect cognitive processing during learning. Sample: 130 (83% women). Age: M = 19.45, SD = 1.26. Cognitive process: attention. Metrics: fixation time, fixations in the regions, areas of interest (AOIs), saccades between AOIs (up-down and left-right). Conclusions: highlighting primed the cognitive process of selecting, whereas graphic organizers primed the cognitive processes of selecting, organizing, and integrating.

Wang et al. (2020), USA. Objective: analyze the association between attention to the instructor video and the learning experience. Sample: 60 (65% women). Age: M = 18.36, SD = 0.66. Cognitive process: attention distribution. Metrics: percentage of fixations, fixation length. Conclusions: learners distributed a high level of overt visual attention to the instructor, and increased attention to the instructor positively predicted learners' satisfaction with the videos.

Yang et al. (2021), China. Objective: examine whether achievement motivation moderates the effects of instructional design on students' attention allocation and learning performance. Sample: 63 (90.5% women). Age: M = 20.68, SD = 1.64. Cognitive process: attention allocation. Metrics: percentage of dwell time, first fixation time. Conclusions: after controlling for prior knowledge, students with high achievement motivation benefitted more from the pre-questions than students with low achievement motivation; among the former there was longer fixation duration on the learning materials and better transfer in the pre-questions condition than in the no-questions condition, differences that were not apparent among students with low achievement motivation.

Zander et al. (2017), Germany. Objective: analyze the association between personalized learning and the allocation of attention. Sample: 37 (43.2% women). Age: M = 25.03, SD = 3.47. Cognitive process: attention allocation. Metrics: fixation rate, fixation duration, average fixation, fixation transitions. Conclusions: results comparable to the few existing previous studies, indicating an inverted personalization effect for potentially aversive learning material, specifically revealed in decreased average fixation duration and number of fixations on the images in the personalized compared to the formal version; these results can be seen as indicators of an inverted effect of personalization at the level of visual attention.

Zander et al. (2015), Germany. Objective: analyze the association between personalized learning and the allocation of attention. Sample: 37 (43.2% women). Age: M = 25.03, SD = 3.47. Cognitive process: attention. Metrics: fixation rate, fixation duration, average fixation, fixation transitions. Conclusions: eye-tracking data revealed a higher reading depth for the main picture areas of interest in the personalized condition; additionally, participants found the personalized version more appealing and inviting.

Note. M = Mean, SD = Standard Deviation
4 Discussion

As can be seen, most of the articles analyzed in this review focus on the attentional process. It is the first cognitive process involved in learning, regardless of the teaching modality (e-learning, online, video). This may be because the most used metrics in eye-tracking research are fixations and saccades [4, 7], as can also be seen in Table 1. Fixations are used as the principal metric in most of the studies. The most used fixation parameters are duration and count, followed by the average. Other metrics, such as Areas of Interest (AOI), are used in combination with fixations to analyze the attentional process. This is especially interesting when variables such as cognitive and social cues [17], differences between pictorial and text stimuli [13], or interface designs [19] are included in the study. When the research objectives involve more complex cognitive processes (task solving, cognitive image processing, retrieval, or comprehension) [4–6], it is common to find other metrics such as fixations in Areas of Interest (AOI), frequency of fixations and saccades, duration of fixations and saccades, or time between fixations.

A total of 1403 subjects participated in the studies (M = 82.53, SD = 53.76), of whom 904 were women (M = 53.18, SD = 40.63). Sample sizes range from 20 subjects [10] to 201 [14], although 70% of the studies found in the systematic review have fewer than 100 subjects. Regarding gender, in most of the selected studies there are more women than men, and in those where women are not the majority they are close to half, except for one study in which only 38.1% were women [11]. From a gender perspective this is interesting, since such balance is not common in online environments or in areas such as spatial vision, usually more associated with the male gender. In future studies, it may be interesting to investigate possible gender differences in the metrics used in eye-tracking methodology, and whether such differences persist into higher education.

Regarding the origin of the subjects that make up the samples, we can observe a greater interest in Asia (China [12, 16, 19, 22], Taiwan [10, 11], South Korea [17] and Turkey [14]), followed by Europe (Germany [13, 15, 18, 23, 24] and Italy [8]), North America (USA [20, 21]) and South America (Colombia [9]). The country in which the most experiments have been carried out is Germany [13, 15, 18, 23, 24], with 5 of the 17 articles included in this review. These data reflect the high internationalization of interest in the use of eye tracking and its application in higher education.

Most of the studies have exploratory objectives: analyzing associations or exploring cognitive processes. Only one replication was found in the review [23], which points to another interesting future line for consolidating the results of eye-tracking research. No meta-analyses were found either, which would be very valuable to strengthen and consolidate the results being obtained internationally. In future studies it would be interesting both to confirm the results obtained and to investigate these processes with objectives that improve the understanding of these cognitive processes and of the different variables that promote student learning in synchronous and asynchronous teaching-learning environments.
To conclude, it is also interesting to mention the annual progress of the use of eye-tracking methodology for learning in college students, which will define future
lines of research. The results shown in Table 1 reflect a consolidation of this methodology from 2020 onward, influenced by the health crisis caused by COVID-19 [7] and continuing the trend marked by students' increasing interest in new technologies [1]. In recent years there has been greater interest in eye-tracking methodology applied to virtual platforms and e-learning [4, 9, 14, 22]. Future studies should follow this line of applied research, which allows the knowledge and improvements obtained through research to be transferred to pedagogical practices in online and offline university environments.

Acknowledgements. This study has been funded by two research projects: European eEarlyCareT project No. 2021-1-ES01-KA220-SCH-000032661 and SmartLearnUni R&D&I project No. PID2020-117111RB-I00.
References

1. Lai, M., et al.: A review of using eye-tracking technology in exploring learning from 2000 to 2012. Educ. Res. Rev. 10, 90–115 (2013)
2. Mintzes, J.J., Wandersee, J.H., Novak, J.D.: Assessing Science Understanding. Academic Press, San Diego (1999)
3. Skaramagkas, V., et al.: Review of eye tracking metrics involved in emotional and cognitive processes. IEEE Rev. Biomed. Eng. 16, 260–277 (2023)
4. Alemdag, E., Cagiltay, K.: A systematic review of eye tracking research on multimedia learning. Comput. Educ. 125, 413–428 (2018)
5. Rayner, K.: Eye movements in reading and processing information: 20 years of research. Psychol. Bull. 124(3), 372–422 (1998)
6. Jacob, R.J., Karn, K.S.: Eye-tracking in human-computer interaction and usability research: ready to deliver the promises. In: Hyona, J.R., Radach, H.D. (eds.) The Mind's Eyes: Cognitive and Applied Aspects of Eye Movements. Elsevier Science, Oxford (2003)
7. Sáiz-Manzanares, M.C., Marticorena-Sánchez, R., Rodríguez-Arribas, S., Escolar-Llamazares, M.C., Alonso-Martínez, L.: Estudio de los Procesos Cognitivos y Metacognitivos: Utilización de la Tecnología Eye Tracking Ventajas e Inconvenientes. Investigación y Práctica en Contextos Clínicos y de la Salud. Dykinson, Madrid (2022)
8. Ariasi, N., Mason, L.: From covert processes to overt outcomes of refutation-text reading: the interplay of science text structure and working memory capacity through eye fixations. Int. J. Sci. Math. Educ. 12, 493–523 (2014)
9. Bacca-Acosta, J., Tejada, J., Fabregat, R., Kinshuk, J.G.: Scaffolding in immersive virtual reality environments for learning English: an eye tracking study. Educ. Technol. Res. Dev. 70(1), 339–362 (2021). https://doi.org/10.1007/s11423-021-10068-7
10. Chen, Y., Yang, F.: Probing the relationship between process of spatial problems solving and science learning: an eye tracking approach. Int. J. Sci. Math. Educ. 12, 579–603 (2014)
11. Chen, S., She, H., Chuang, M., Wu, J., Tsai, J., Jung, T.: Eye movements predict students' computer-based assessment performance of physics concepts in different presentation modalities. Comput. Educ. 74, 61–72 (2014)
12. Fan, K., et al.: Predicting the reader's English level from reading fixation patterns using the Siamese convolutional neural network. IEEE Trans. Neural Syst. Rehabil. Eng. 30, 1071–1080 (2022)
13. Koc-Januchta, M., Höffler, T., Thoma, G., Prechtl, H., Leutner, D.: Visualizers versus verbalizers: effects of cognitive style on learning with texts and pictures, an eye-tracking study. Comput. Hum. Behav. 68, 170–179 (2017)
14. Kokoç, M., Ilgaz, H., Altun, A.: Effects of sustained attention and video lecture types on learning performances. Educ. Technol. Res. Dev. 68(6), 3015–3039 (2020). https://doi.org/10.1007/s11423-020-09829-7
15. Kühl, T., Stebner, F., Navratil, S., Fehringer, B., Münzer, S.: Text information and spatial abilities in learning with different visualizations formats. J. Educ. Psychol. 110(4), 561 (2018)
16. Li, W., Wang, F., Mayer, R.E., Liu, H.: Getting the point: which kinds of gestures by pedagogical agents improve multimedia learning? J. Educ. Psychol. 111(8), 1382 (2019)
17. Moon, J., Ryu, J.: The effects of social and cognitive cues on learning comprehension, eye-gaze pattern, and cognitive load in video instruction. J. Comput. High. Educ. 33, 39–63 (2021)
18. Pardi, G., Hienert, D., Kammerer, Y.: Examining the use of text and video resources during web-search based learning, a new methodological approach. New Rev. Hypermedia Multimedia 28(1–2), 39–67 (2022)
19. Peng, X., Xu, Q., Chen, Y., Zhou, C., Ge, Y., Li, N.: An eye tracking study: positive emotional interface design facilitates learning outcomes in multimedia learning? Int. J. Educ. Technol. High. Educ. 18(1), 1–18 (2021)
20. Ponce, H., Mayer, R.: Qualitatively different cognitive processing during online reading primed by different study activities. Comput. Hum. Behav. 30, 121–130 (2014)
21. Wang, J., Antonenko, P., Dawson, K.: Does visual attention to the instructor in online video affect learning and learner perceptions? An eye-tracking analysis. Computers & Education 146, 103779 (2022)
22. Yang, J., Zhang, Y., Pi, Z., Xie, Y.: Students' achievement motivation moderates the effects of interpolated pre-questions on attention and learning from video lectures. Learn. Individ. Differ. 91, 102055 (2021)
23. Zander, S., Wetzel, S., Kühl, T., Bertel, S.: Underlying processes of an inverted personalization effect in multimedia learning: an eye-tracking study. Front. Psychol. 8, 2202 (2017)
24. Zander, S., Reichelt, M., Wetzel, S., Kämmerer, F., Bertel, S.: Does personalisation promote learners' attention? An eye-tracking study. Frontline Learning Research 3, 1–13 (2015)
Using Machine Learning Techniques in eEarlyCare Precision Diagnosis and Intervention in 0–6 Years Old

María Consuelo Sáiz-Manzanares(B)

Burgos University, Facultad de Ciencias de la Salud, Paseo de Comendadores s/n, 09001 Burgos, Spain
[email protected]
Abstract. The use of Machine Learning techniques and technological resources that facilitate diagnosis and intervention at early ages (0–6 years) will support the development of both processes from the point of view of accuracy. This paper analyses the most useful Machine Learning techniques to be applied to diagnosis and therapeutic intervention in the field of early care. It also describes the development of a web application, eEarlyCare, which supports the recording of observations of different early-development problems and the interpretation of the results through Learning Analytics techniques. In addition, an evaluation of the usability of this web application is carried out as part of a training project aimed at updating the technological and data-analysis skills of early-intervention professionals. The results support the use of this type of computer application, in which learning analytics and results-visualisation techniques are included. The proposals for improvement focus on the use of technological resources similar to intelligent voice assistants that regulate the work of the therapy professional. Further studies will address these proposals for improvement.

Keywords: Machine Learning · Artificial Intelligence · early care professionals training
1 Machine Learning Techniques Applied to Improve Diagnosis and Intervention at an Early Age

1.1 Types of Techniques and Implications for Diagnosis and Intervention

Within Machine Learning, we can differentiate between supervised and unsupervised techniques. Among the former are supervised classification techniques, which include Support Vector Machine (SVM), Discriminant Analysis, Naïve Bayes, and Nearest Neighbour (k-nn) algorithms. Likewise, regression techniques facilitate the prediction of the weight of different variables over others. Among the prediction techniques are the algorithms of SVM,
Ensemble Methods, Decision Trees, and Neural Networks. On the other hand, unsupervised learning techniques can be distinguished, specifically clustering techniques such as the k-means and k-means++ algorithms [1]. All these analysis techniques facilitate both diagnostic accuracy and therapeutic intervention. A description of the meaning of these techniques and algorithms and their potential usefulness in the field of early intervention is presented in Table 1.

Table 1. Supervised and unsupervised learning techniques in the application to diagnosis and therapeutic intervention.

Support Vector Machine (SVM). Meaning: it is based on Vapnik's statistical learning theory [2]; it searches for the best separation boundary between two classes, and the instances of each separation class are the support vectors. Application to diagnosis: it can help in making differential diagnoses based on a number of pre-existing characteristics. Application to therapeutic intervention: it can help to choose the most appropriate intervention for each patient within each problem area.

Discriminant analysis. Meaning: its purpose is to describe whether there are significant differences between x groups of objects for which discriminating variables are observed; the means of the classifier variables in the x groups are compared and described. Application to diagnosis: this technique helps to detect the variables that carry the most weight and can therefore support a differential diagnosis. Application to therapeutic intervention: it can help to identify the most appropriate therapeutic intervention for a given problem.

Nearest Neighbor (k-nn). Meaning: this method is used to estimate a density function of a series of x variables for each class; it can be used both for prediction and for classification. Application to diagnosis: it can be used to classify determinants into diagnostic groups. Application to therapeutic intervention: it can be used to guide a type of intervention depending on the type and degree of impairment of a diagnosis.

Linear Regression. Meaning: a mathematical model used to approximate the relationship between two or more variables, one of which is considered independent and the others dependent; it provides the prediction percentage of the dependent variable and indicates whether the independent variables are redundant (Tolerance and Variance Inflation Factor values). Application to diagnosis: it can quantify the predictive weight of one variable over others and help to make an accurate differential diagnosis. Application to therapeutic intervention: it can help predict the most appropriate therapeutic intervention for each patient.

Decision trees. Meaning: the algorithm provides hierarchical knowledge of the process of assignment to a class; the criterion for the selection of attributes and values is the optimisation of a function [3]; they are used for the construction of meta-classifiers. Application to diagnosis: their use makes it easier to identify the most relevant variables in order to make a precise diagnosis. Application to therapeutic intervention: their use can help to determine, based on the classifiers found, the most accurate type of therapeutic intervention.

Neural networks. Meaning: they operate in a way that mimics human processing; the input information is processed through neural networks arranged in different layers and finally into output information (Multi-Layer Perceptron Neural Networks, Radial Basis Function Neural Networks). Application to diagnosis: they can help in the prediction of a type of diagnosis and are therefore of value in making a differential diagnosis. Application to therapeutic intervention: they can help predict the most effective type of treatment for a given condition.

Clustering, k-means (unsupervised). Meaning: this algorithm facilitates clustering without a given assignment variable. Application to diagnosis: it can identify grouping characteristics without a prior classification variable, which makes it easier to identify common characteristics among patients and, in turn, types within a given diagnosis. Application to therapeutic intervention: its use can assist in the assignment to a particular treatment type based on the groupings found.
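As a minimal illustration of how two of these techniques could be applied to early-care records, the following sketch trains a k-nn classifier and a k-means clustering with scikit-learn; the features, labels, and data are hypothetical, not drawn from eEarlyCare.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.cluster import KMeans

    # Hypothetical data: rows = children, columns = functional-development
    # scores (e.g. motor, language, socialisation); labels = prior diagnosis.
    rng = np.random.default_rng(0)
    X = rng.random((40, 3))
    y = rng.integers(0, 2, size=40)   # two illustrative diagnostic groups

    knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)  # supervised: k-nn
    print(knn.predict(X[:5]))         # predicted diagnostic group for 5 children

    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)  # unsupervised
    print(km.labels_[:5])             # development profiles found without labels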
1.2 Examples of the Application of Machine Learning Techniques in the Field of Psychology

Machine Learning techniques are being applied more and more frequently in the field of psychology, specifically in precision diagnosis. Research in the last two years has focused on mental illness. For example, neural network algorithms have been used to perform differential diagnosis between compulsive disorder, bipolar disorder, and schizophrenia [4]. Classification algorithms are
also being used to make cognitive diagnoses in the field of learning in order to implement personalised learning procedures [5]. Multilevel classifiers are likewise being used to refine diagnosis in psychiatry using psychological, behavioural, social, and cultural data from patients [6]. Machine Learning techniques are also being used to predict some behaviours related to the diagnosis of schizophrenia or bipolar disorder [7]. In addition, Machine Learning is being used in this environment to determine precision therapeutic treatments in psychiatry [8]. An example of the process of collection, processing, and application of Machine Learning techniques can be seen in Fig. 1.
Fig. 1. Data processing and Machine Learning techniques application in health environments.
1.3 Use of the Precision Diagnosis and Treatment Model in the Field of Early Childhood Care

The work approach described in the previous points has been applied in the field of early care, aimed at children with a developmental age of 0–6 years. Specifically, a web application was designed to collect data related to diagnostic characteristics, chronological age, and developmental age. Three research studies have been carried out:

a) Study 1 [9]. This paper discusses how to record observational data on functional development in children aged 0–6 years in a web application or a desktop application. It also provides simple learning analytics on functional development. It corresponds to a first step of recording, cleaning, and processing the data.
b) Study 2 [10]. This study refers to the use of a web application, eEarlyCare, for recording data on the functional development of children aged 0–6 years. It also reports on how the database can be extracted and imported into data mining programs such as Orange [12]. Specifically, k-nn algorithms, cluster analysis, distance analysis, and dendrograms were applied in this study. The use of supervised and unsupervised Machine Learning techniques facilitated the understanding of the relationship between prior diagnosis and functional development and the prediction of the most relevant functional development variables in the overall performance outcomes.
c) Study 3 [11]. In this research work, the eEarlyCare web application is used, and the database is extracted and imported into programmes that apply Machine Learning algorithms. The objective in this case was to determine the most appropriate therapeutic intervention programme for each user. Supervised learning algorithms (Multiple Linear Regression) and unsupervised learning algorithms (k-means clustering, dendrograms, and hierarchical clustering) were used; a sketch of this unsupervised step follows.
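A minimal sketch of the unsupervised pipeline used in Studies 2 and 3 (distance analysis, hierarchical clustering, and a dendrogram), written with SciPy under the assumption of Euclidean distances and Ward linkage; the data, the five skill areas, and the three groups are hypothetical.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
    from scipy.spatial.distance import pdist

    # Hypothetical matrix: 12 users x 5 functional skill areas (scores 0-1).
    X = np.random.default_rng(1).random((12, 5))

    D = pdist(X, metric="euclidean")                 # distance analysis
    Z = linkage(D, method="ward")                    # hierarchical clustering
    groups = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 groups
    print(groups)

    dendrogram(Z)                                    # visualise the hierarchy
    plt.show()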
In accordance with the aforementioned state of the art, the aim of this study was to analyse the usability of the eEarlyCare web application by early care professionals undergoing a training programme to update their skills in the use of intelligent resources and machine learning techniques applied to diagnosis and intervention with children with functional developmental impairments in the 0–6 age group.
2 Method

2.1 Sample

We worked with a sample of 16 early care professionals (5 special education teachers, 4 physiotherapists, 5 speech therapists, 2 occupational therapists). The age intervals were 45–49 (31.25%), 30–34 (18.75%), 35–39 (18.75%), 40–44 (18.75%), and over 50 (12.5%). Regarding gender, 93.75% were women and 6.25% men. These professionals work in a centre for people with motor impairments and in an early care unit, both attached to the ASPACE Salamanca centre.

2.2 Instruments

a) The eEarlyCare web application was used. This application was developed by Doctors Sáiz-Manzanares, Marticorena-Sánchez, and Arnaiz-González with funding from three proof-of-concept projects [9–11]. Registration rights were ceded to the University of Burgos. The eEarlyCare application is presented in a bilingual format (Spanish and English) and includes two access roles: the manager or centre director role and the therapist role. Within the therapist role, the web application allows the insertion of the results of observations related to the development of functional skills. The information can be collected in three measurements, one every three months. The application also facilitates the comparison of development profiles of the same user across the three measurements, or between users. A description of these functionalities can be found in Fig. 2.
b) Survey to Assess the Usability of the eEarlyCare web Application (SAUEA). The survey consists of 12 closed questions on usability aspects and three open questions to determine the most relevant aspects, aspects to be included, and aspects to be removed. It is an adaptation of the "User Experience Questionnaire" by Laugwitz et al. [13], covering opinions on the features of the platform and materials.

2.3 Procedure

Prior to the study, two positive reports were obtained from the Bioethics Committee of the University of Burgos for 1) the use of the eEarlyCare web application (No. IR 09/2020) and 2) the development of the training plan for early care professionals within the European project eEarlyCare-T (No. IO 04/2022). The families were then informed about the use of this resource and their informed consent was obtained. The professionals tested the application over a period of six months. Finally, the professionals completed a survey to analyse the usability of the application. The steps of the procedure are shown in Fig. 3.
Fig. 2. Operation of the eEarlyCare web application.
[Figure: three-step flow. Step 1: Bioethics Committee report on the use of the application; Bioethics Committee report on the implementation of the training activity. Step 2: informing family members and obtaining consent; implementation of the eEarlyCare web application. Step 3: analysis of the usability of the eEarlyCare web application.]
Fig. 3. Stages of the procedure.
2.4 Data Analysis

A descriptive analysis of the answers given and a reliability analysis using Cronbach's Alpha (α) were carried out. These analyses were performed with the SPSS v.28 statistical package [14]. A qualitative analysis was also carried out on the responses to the open-ended questions of the SAUEA. For this purpose, text mining analyses were applied: a frequency analysis with a word cloud and a question-answer analysis using a Sankey diagram. The qualitative analysis software Atlas.ti v.23 [15] was used to perform these analyses.
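For reference, Cronbach's α can be reproduced outside SPSS in a few lines; this minimal sketch applies the standard formula α = k/(k − 1) · (1 − Σs²_item / s²_total) to hypothetical Likert data of the same shape as this study (16 respondents, 12 items).

    import numpy as np

    def cronbach_alpha(items):
        # items: respondents x items matrix of Likert scores.
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1).sum()   # sum of the item variances
        total_var = items.sum(axis=1).var(ddof=1)     # variance of the total score
        return k / (k - 1) * (1 - item_vars / total_var)

    # Hypothetical data: 16 respondents x 12 items on a 1-5 scale.
    rng = np.random.default_rng(2)
    scores = rng.integers(1, 6, size=(16, 12))
    print(round(cronbach_alpha(scores), 2))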
3 Results

The descriptive data of the usability analysis can be seen in Table 2. The reliability of the overall scale was α = 0.82, and the indices for the individual items ranged from 0.77 to 0.84 (see Table 2). It can therefore be confirmed that the eEarlyCare web application has a high degree of usability. The highest-rated items (mean above 4) were: 4 (user-friendly interface), 5 (usability), 6 (navigability), 7 (intuitive use), 8 (customisation of the intervention programme), 10 (the application does not require continuous help to use), 11 (effective recording of user assessment data), and 12 (interpretation of records - Learning Analytics module).
Table 2. Descriptive data and reliability analysis of the SAUEA.

Item (scale): Mean; SD; Min; Max; Cronbach's α if item deleted
1. The use of the eEarlyCare web application facilitates the recording of functional skills assessment results in young children (1 = never, 5 = always): 3.9; 0.6; 3; 5; 0.81
2. The use of the eEarlyCare web application facilitates the interpretation of the results of the assessment of the functional abilities of the patients (1 = never, 5 = always): 3.8; 0.7; 2; 5; 0.81
3. Usability of the eEarlyCare web application (1 = strongly disagree, 5 = strongly agree): 3.8; 0.4; 3; 5; 0.83
4. The use of the eEarlyCare web application appears to me to be (1 = unpleasant, 5 = agreeable): 4.4; 0.6; 3; 5; 0.81
5. The eEarlyCare web application software seems to me to be (1 = unpleasant, 5 = agreeable): 4.4; 0.6; 3; 5; 0.82
6. Navigating the eEarlyCare web application seems to me to be (1 = unpleasant, 5 = agreeable): 4.2; 0.6; 3; 5; 0.82
7. The eEarlyCare web application appears to me to be (1 = not understandable, 5 = intuitive): 4.2; 0.6; 3; 5; 0.79
8. I find the personalised intervention programme in the web application to be (1 = very bad, 5 = very good): 4.4; 0.4; 4; 5; 0.81
9. I have found the eEarlyCare web application easy to use (1 = always, 5 = never): 2.0; 1.3; 2; 5; 0.84
10. I needed help using the eEarlyCare web application (1 = never, 5 = always): 4.0; 0.6; 3; 5; 0.77
11. The use of the eEarlyCare web application facilitates the recording of the results of the assessment of functional skills in young children (1 = never, 5 = always): 4.2; 0.9; 2; 5; 0.78
12. The use of the eEarlyCare web application facilitates the interpretation of the results and the assessment of functional skills (1 = never, 5 = always): 4.8; 0.3; 4; 5; 0.81
Note. SD = Standard Deviation; Min = Minimum; Max = Maximum
The responses to the open-ended questions were also analysed. A word cloud was made for each of the questions (see Fig. 4). In summary, the most relevant aspects of the eEarlyCare web application were its ease of use and the fact that it provides both a personalised profile of the functional development of each user and specific guidelines for therapeutic intervention. It also includes a Learning Analytics module that allows the comparison of the development of the same user over time and the comparison of functional development between users in the same programme or therapeutic intervention centre. Regarding the aspects to be included, the validators suggested that the personalised therapeutic intervention guidelines, currently offered as an on-screen image, should be exportable. They also suggested broadening the contents of the assessment and intervention programme in areas such as socialisation and language, supporting work with people with severe functional developmental impairments, and developing a specific assessment and intervention module for the first year of development (0–12 months). The analysis of the questions and answers can be found in the Sankey diagram in Fig. 5.
Fig. 4. Word cloud on the answers to the open-ended questions.
Fig. 5. Sankey diagram on the answers given to the open-ended questions.
4 Conclusions The use of tools that facilitate both diagnostic interpretation and the implementation of personalized therapeutic intervention programs within what would be understood as a precision diagnosis and intervention [5, 8] carried out using Machine Learning and
Artificial Intelligence techniques [1] is a necessity in the field of early care (children with functional ages of 0–6 years) [9–11]. To achieve this goal, computer resources are needed that facilitate both the collection of data and records and the inclusion of Machine Learning and Deep Learning algorithms [2–4] that allow the interpretation of these records in real time, together with data visualization techniques [12] that help early care professionals interpret the results. For this, research projects that develop web applications including Machine Learning techniques are necessary. Once these applications are implemented, usability studies are needed to analyse their strengths and weaknesses [13]. This analysis will lead to improvement proposals that keep the process within a circle of continuous improvement. In this paper we have addressed Machine Learning algorithms that can be applied to the field of early care. We have also presented an example of the development of a web application, eEarlyCare, which facilitates both the registration and the interpretation of data using Learning Analytics techniques, providing a personalized development profile for each user as well as a specific therapeutic intervention program [9–11]. We have also tested the usability of this web application and analysed the improvement proposals made by the validators from a quantitative and qualitative point of view. The conclusions point to a very good assessment of the web application and the functionalities it offers, and to lines of continuity for the project, including a resource to help the therapist based on intelligent voice assistants that regulate the process of assessment and therapeutic intervention, as well as the extension of the web application to other areas of early development. However, the work we present has the limitation of the small sample of therapists with which the effectiveness of the web application and its usability were tested. Future studies will attempt to address larger samples of professionals. In summary, the use of computer applications that include Machine Learning and Learning Analytics techniques to facilitate diagnosis and therapeutic intervention by early care professionals is still in its infancy and requires further research and knowledge transfer studies, although the results augur a promising future.

Acknowledgements. This study has been funded by the European research project eEarlyCare-T, No. 2021-1-ES01-KA220-SCH-000032661. The eEarlyCare web application was funded by three proof-of-concept projects boosting the valorisation and commercialisation of research results, PLAN TCUE 2018–2020 (VI, VII and VIII editions), co-financed by the European Regional Development Fund (ERDF) and the Junta de Castilla y León. A special mention to the management of the ASPACE Salamanca centre for their involvement in and support of research in the field of AI applications aimed at improving diagnosis and therapeutic intervention, as well as to the users and their families for their collaboration in these applied research projects.
References

1. García, S., Luengo, J., Herrera, F.: Data Preprocessing in Data Mining. Intelligent Systems Reference Library, vol. 72. Springer, New York (2015). https://doi.org/10.1007/978-3-319-10247-4
2. Vapnik, V.: The Nature of Statistical Learning Theory. Springer Science & Business Media, New York (2013). https://doi.org/10.1007/978-1-4757-2440-0
3. Breiman, L.: Classification and Regression Trees. Taylor & Francis, Boca Raton, FL (2017)
4. Kamra, V., Kumar, P., Mohammadian, M.: An intelligent disease prediction system for psychological diseases by implementing hybrid Hopfield recurrent neural network approach. Intell. Syst. Appl. 18(200208), 1–9 (2023). https://doi.org/10.1016/j.iswa.2023.200208
5. Zhang, S., Liu, J., Ying, Z.: Statistical applications to cognitive diagnostic testing. Ann. Rev. Statist. Appl. 10(1), 651–675 (2023). https://doi.org/10.1146/annurev-statistics-033021-111803
6. Gómez-Carrillo, A., Paquin, V., Dumas, G., Kirmayer, L.J.: Restoring the missing person to personalized medicine and precision psychiatry. Front. Neurosci. 17(1041433) (2023). https://doi.org/10.3389/fnins.2023.1041433
7. Montazeri, M., Montazeri, M., Bahaadinbeigy, K., Montazeri, M., Afraz, A.: Application of machine learning methods in predicting schizophrenia and bipolar disorders: a systematic review. Health Sci. Rep. 6(1), 13 (2022). https://doi.org/10.1002/hsr2.962
8. Lee, C.T., Palacios, J., Richards, D., et al.: The Precision in Psychiatry (PIP) study: testing an internet-based methodology for accelerating research in treatment prediction and personalisation. BMC Psychiatry 23, 25 (2023). https://doi.org/10.1186/s12888-022-04462-5
9. Sáiz-Manzanares, M.C., Marticorena, R., Arnaiz, Á., Díez-Pastor, J.F., García-Osorio, C.I.: Measuring the functional abilities of children aged 3–6 years old with observational methods and computer tools. J. Vis. Exper. e60247, 1–17 (2020). https://doi.org/10.3791/60247
10. Sáiz-Manzanares, M.C., Marticorena, R., Arnaiz, Á.: Evaluation of functional abilities in 0–6 year olds: an analysis with the eEarlyCare computer application. Int. J. Environ. Res. Public Health 17(9), 3315, 1–17 (2020). https://doi.org/10.3390/ijerph17093315
11. Sáiz-Manzanares, M.C., Marticorena, R., Arnaiz-Gonzalez, Á.: Improvements for therapeutic intervention from the use of web applications and machine learning techniques in different affectations in children aged 0–6 years. Int. J. Environ. Res. Public Health 19, 6558 (2022). https://doi.org/10.3390/ijerph19116558
12. Demsar, J., et al.: Orange: data mining toolbox in Python. J. Mach. Learn. Res. 14, 2349–2353 (2013)
13. Laugwitz, B., Schrepp, M., Held, T.: Construction and evaluation of a user experience questionnaire. In: Holzinger, A. (ed.) USAB 2008, LNCS 5298, pp. 63–76 (2008). Retrieved from https://www.ueq-online.org/. Accessed 18 Apr 2023
14. IBM Corporation: Statistical Package for the Social Sciences (SPSS) (Version 28) [Software]. IBM (2022). https://www.ibm.com/es-es/products/spss-statistics. Accessed 26 Mar 2023
15. ATLAS.ti: Qualitative analysis software (Version 23). ATLAS.ti (2023). https://atlasti.com/. Accessed 19 Apr 2023
A Machine-Learning Based Approach to Validating Learning Materials

Frederick Ako-Nai1(B), Enrique de la Cal Marin1, and Qing Tan2

1 University of Oviedo, 33005 Oviedo, Spain
[email protected]
2 Athabasca University, Athabasca, AB T9S 3A3, Canada
[email protected]
Abstract. In this paper, we propose a Machine Learning-based approach to validate suggested learning materials. Learning material validation is an essential part of the learning process, ensuring that learners have access to relevant and accurate information. However, the process of manual validation can be time-consuming and may not be scalable. Traditional learning contents are often only updated or changed in the yearly course revisions. This presents some challenges, especially for courses on emerging subjects that cater to diversified learners, including the ability to provide adaptive and updated learning contents to the learners and the opportunity to continually incorporate feedback. We present a solution and framework that utilizes machine learning algorithms to validate learning materials in an open learning content creation platform. Our approach involves pre-processing the data using Natural Language Processing techniques, creating vectors using TF-IDF, and training a Machine Learning model to classify the subject of the learning material. We then calculate the similarity with existing materials for the given course to make sure there is no existing material with the same content and that the new material will add new value. Using an augmented TF-IDF score, we check whether the suggested learning material satisfies the key phrases for the course. We evaluate our approach by comparing the Machine Learning-based approach to manual validation. Not only does the machine-learning based approach reduce the time and effort needed for validation, but it also achieves high accuracy in detecting duplicates and similarity matches.

Keywords: Open learning · Machine Learning · TF-IDF · Learning materials
1 Introduction

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
P. García Bringas et al. (Eds.): CISIS 2023/ICEUTE 2023, LNNS 748, pp. 306–315, 2023. https://doi.org/10.1007/978-3-031-42519-6_29

The process of validating learning materials is critical to ensure that learners have access to relevant and accurate information. Today's digital age presents a vast repository of learning materials including online courses, podcasts, and videos. However, some of these materials may be inaccurate or outdated and, as such, it is essential to validate suggested learning materials to ensure their quality and accuracy. In many cases, educators and content creation experts manually review learning materials to ensure that they are valid and appropriate for learners. However, manual validation can be a time-consuming
and resource-intensive process, especially when dealing with large volumes of learning materials. Additionally, manual validation may not be scalable, particularly when there is a need to validate learning materials in real time or at a large scale. This research builds on the validation algorithm proposed by [1–3] in their ongoing research to help create an open and collaborative platform for creating and sharing learning contents and to simplify the validation and the time it takes to add new materials to a course or module in a Learning Management System (LMS). Therefore, we propose a machine learning-based approach to validate suggested learning materials for learners. In the traditional setting, adding new learning materials or updating the course content will usually follow the Software Development Lifecycle (SDLC) and Change Management practices of the institution. These processes, though not to be side-stepped, often prolong the time it takes to make small but needed changes. To add a new material to a course, for example, the process will follow steps similar to those shown in Fig. 1.
Fig. 1. Traditional process flow example of adding new learning materials.
The steps will be:

Step 1: User suggests learning material.
Step 2: Material is (peer) reviewed for:
  i. Topic or course relation
  ii. Content validity
Step 3: Test/review material.
Step 4: Create change ticket for implementation.
Step 5: Present at Change Approval Board (CAB).
Step 6: Implement to add to LMS if approved.

Such a process of manual validation could take days, if not weeks, to add new content or update existing content. A machine learning-based approach would simplify the validation process and shorten the time it takes to add or introduce new learning contents.
When deciding whether a material is valid or appropriate for the course, instructors and/or content creators need to compare the suggested material to datasets which can contain thousands of learning materials. Done manually, this can be very time-consuming, require many resources, and become very expensive, as one must manually cycle through, analyze, and annotate the data [4–6]. Once initial annotations and labelling are done, this can be automated with the help of machine learning algorithms to classify and validate the material's appropriateness for use in the course.
2 Related Works

There has been a lot of research and many proposals in the field of content classification. These works include experiments with machine learning algorithms to classify and group contents from social media sites and detect problem topics of discussion [5], using machine learning to detect fake news [7], and using machine learning to classify negative contents from websites so these can be blocked instead of relying on a manual, labour-intensive approach [6]. Another study used machine learning to classify web page contents into freelance, remote work, and other full-time jobs [8]. Khan et al. [9] reviewed different machine learning algorithms for text-document classification, including a hybrid SVM-KNN algorithm. The SVM-KNN classifier had a higher accuracy than the individual SVM and KNN models used independently. In another hybrid example, [10] proposed a fusion model for classifying and evaluating the quality of user-generated contents using (machine) meta learning. This research used the correlation of two or more different machine learning classifiers to arrive at a better quality score for the content. Mazari and Djeffal [11] used machine learning and deep learning models such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) to analyze sentiments from comments collected from social media sites for a deeper understanding of opinions. The data from social media sites is first cleaned and some annotations are added; it is then used on the machine learning and deep learning models to analyze sentiments. The following works studied and proposed improvements for TF-IDF text classification in the areas of word meanings and their position within the document, and thus proposed enhancements to measuring TF-IDF scores. In traditional TF-IDF, the semantics of a text or term and its context within the whole document are sometimes missed. [12, 13] proposed a weighted hybrid model to calculate text similarity which takes into consideration the potential semantics of the text and the influence of words on text semantics to produce different weights for words and phrases. [14] proposed an improved TF-IDF algorithm that gives higher weights to words with stronger characteristics within a document than to words dispersed in all documents. [15] proposed representing documents as the TF-IDF-weighted sum of their embedding vectors for identifying authors. Clusters are created from dissimilar documents and cosine similarity among the clusters is used to link documents for a given author. [16] introduced a technique of boosting TF-IDF scores to better represent the term distribution among classes by incorporating the entropy of class details with the term frequency of the words.
3 Validation of Suggested Materials

Suggested materials need to be validated to ensure they are fit for the purposes stated by the course and will add new content or knowledge. We use a three-stage approach (Subject Classification, Course Similarity Analyzer, and Augmented Term Frequency Tagging) to validate suggested materials, as shown in Fig. 2.
Fig. 2. Suggested material validation process.
3.1 Content Subject Classification

The first stage in our proposal uses Machine Learning algorithms to automatically classify or label the suggested learning materials. The processing steps are:

1. Data collection: Our dataset is generated with course information from Coursera (https://www.coursera.org/).
2. Data preparation: The data is cleaned by removing unnecessary elements such as punctuation, symbols and emojis, emails, URLs, and stop words.
3. Feature extraction: Once the data is cleaned, features are extracted for use in our model. Bag of Words (BoW) is used to extract and create a bag or multiset of words from the documents, and Term Frequency – Inverse Document Frequency (TF-IDF) determines the level of importance of a given term in a document [6] and creates a vector.
4. Model training: A machine learning model is trained on the dataset of learning materials to classify the materials based on the extracted features.
5. Classification: The trained model is then used to classify or label learning materials.

The dataset used for this test contained a little over 1000 records for Computer Science courses with the categories (features) listed in Table 1 below. We applied the Synthetic Minority Oversampling Technique (SMOTE) to create synthesized records from the existing examples of the less represented subcategories to balance the dataset. The number of records before and after oversampling is shown in Table 1. Table 2 shows the accuracy of tests obtained from four Machine Learning algorithms; a sketch of this pipeline is given after Table 2. For this proposal, we chose the Random Forest algorithm for use in the ML classification stage.
Table 1. Dataset categories/features.

Feature/Label                     Count Before Oversampling   Count After Oversampling
Algorithms                        133                         463
Software Development              472                         472
Design and Product                132                         474
Computer Security and Networks    142                         475
Mobile and Web Development        121                         505
Table 2. Accuracy (mean) test results using 5-fold cross validation.

Algorithm                 Precision Score   Recall Score   F1 Score   Time (secs)
K-Nearest Neighbour       90%               88%            88%        3.170
Random Forest             97%               95%            95%        62.819
Support Vector Machine    78%               51%            48%        205.450
Naïve Bayes               85%               80%            81%        1.335
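The pipeline of Sect. 3.1 maps naturally onto standard Python tooling. The following is a minimal, hedged sketch rather than the authors' implementation: the dataset file name and column names are assumptions, while TfidfVectorizer, SMOTE and RandomForestClassifier mirror the steps and algorithms named above.

```python
# Hedged sketch of the Section 3.1 pipeline: TF-IDF features, SMOTE
# oversampling and a Random Forest classifier scored with 5-fold CV.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_validate
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # pipeline variant that accepts samplers

courses = pd.read_csv("coursera_cs_courses.csv")        # hypothetical file name
X, y = courses["description"], courses["subcategory"]   # assumed column names

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("smote", SMOTE(random_state=0)),               # balances minority subjects
    ("clf", RandomForestClassifier(random_state=0)),
])

scores = cross_validate(
    pipe, X, y,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring=["precision_macro", "recall_macro", "f1_macro"],
)
for metric, values in scores.items():
    print(metric, values.mean())
```

Because the sampler sits inside the pipeline, SMOTE is applied only to each training fold, so the cross-validated scores are not inflated by synthetic test samples.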
3.2 Course Content Similarity

To ensure that no repeated materials are added to the course, the suggested materials need to be checked for similarity with the existing learning materials. To do this, we employ the Jaccard Similarity Coefficient to check the similarity or distance between the classified content and the course objectives and outcomes. Let A be the set of existing learning materials for a course and B a suggested learning material, with the universal set being all possible learning materials for the specified course. Assumption: for the set of existing learning materials for a given course, A, there exists new learning material, B, whose similarity with the existing learning material A is between x, a minimum threshold (e.g., 25%), and y, a maximum threshold (e.g., 75%):
(∀A)(∃B): x ≤ |B ∩ A| / |B ∪ A| ≤ y (1)
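A minimal sketch of the similarity gate in Eq. (1): the whitespace tokenization and the aggregation of all existing course materials into one token set are simplifying assumptions, and the 25%/75% thresholds are the examples quoted above.

```python
# Hedged sketch: accept a suggested material only when its Jaccard similarity
# to the course's existing materials lies between the two thresholds.
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B| for two token sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def is_acceptable(new_text: str, existing_texts: list[str],
                  x: float = 0.25, y: float = 0.75) -> bool:
    new_tokens = set(new_text.lower().split())
    course_tokens = set()
    for text in existing_texts:
        course_tokens |= set(text.lower().split())
    sim = jaccard(new_tokens, course_tokens)
    return x <= sim <= y  # related to the course, but not a near-duplicate
```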
... positively (lift > 1) or negatively (lift < 1) influences the consequent:

Lift(X ⇒ Y) = Support(X ⇒ Y) / (Support(X) · Support(Y)) (3)
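To make the metrics concrete, the following toy computation evaluates support, confidence and the lift of Eq. (3) over a handful of invented transactions:

```python
# Toy computation of support, confidence and lift; the data is invented.
transactions = [{"Q15", "5"}, {"Q15", "4"}, {"Q15", "5"}, {"Q12", "5"}]

def support(itemset: set) -> float:
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent: set, consequent: set) -> float:
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent: set, consequent: set) -> float:
    return (support(antecedent | consequent)
            / (support(antecedent) * support(consequent)))

print(support({"Q15", "5"}))       # 0.5
print(confidence({"Q15"}, {"5"}))  # 0.666...
print(lift({"Q15"}, {"5"}))        # 0.888... (< 1: negative influence)
```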
The Apriori algorithm proposed in [2] is the most well-known and cited algorithm in the literature for finding AR and is the basis for most existing algorithms. Its main improvement over previous algorithms lies in the way candidate sets are generated, as it enforces the property of frequent sets: any subset of a frequent set must also be a frequent set. This property ensures that many of the frequent sets required by other algorithms are not constructed unnecessarily. The algorithm is based on prior ("a priori") knowledge of frequent sets and uses a breadth-first search strategy. Note that this algorithm obtains AR by discretizing continuous values in the database prior to processing. Conceptually, the Apriori algorithm has the following steps for generating frequent itemsets (a code sketch is given after this list):

– Generation of all itemsets with one item; these itemsets are used to generate those with two items, and so on.
– Calculation of the support of the itemsets, in order to obtain the resulting set by eliminating those subsets that do not exceed the minimum support.

The algorithm addresses the problem by reducing the number of sets considered: the user defines a minimum support and Apriori generates all sets whose support is greater than or equal to that threshold. Any rules that do not satisfy the restrictions imposed by the user, such as minimum confidence, are discarded, and the rules that do satisfy them are retained.
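As a hedged illustration of the algorithm just described, the following sketch mines frequent itemsets and rules with the Apriori implementation of the mlxtend library; the toy transactions are invented and do not come from the study.

```python
# Hedged sketch: frequent itemsets and association rules with mlxtend's
# Apriori. Toy transactions stand in for the real survey itemset.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["Q15", "Theory", "Score=5"],
    ["Q15", "Practical", "Score=5"],
    ["Q12", "Theory", "Score=4"],
    ["Q15", "Theory", "Score=5"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

frequent = apriori(onehot, min_support=0.25, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```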
3 Methodology
This section details the student satisfaction surveys (Sect. 3.1), the main items analyzed (Sect. 3.2), and the analysis process (Sect. 3.3).
3.1 Student Satisfaction Surveys
The procedure for obtaining completed questionnaires from students regarding the teaching activity of their faculty can be carried out using a self-managed or an online system. Each teacher can freely choose the system they consider most appropriate. However, the online system is only available to faculty included in the main activity of the subject, meaning that it is not available to lecturers who only teach practical classes. The items of the Student Satisfaction Survey Questionnaire on the teaching activity at the University of Seville (US) are organized into 18 questions, as shown in Table 1. It can be observed that the questions are organized into four categories: educational planning and organization (Q1, Q2, Q5, Q6, Q7, and Q8), student support (Q3, Q4, Q9, Q10, Q11, Q12, Q13, Q14, and Q15), evaluation (Q17), and general satisfaction (Q18).

Table 1. The items of the questionnaire of the student satisfaction survey with the teaching activity used at the US.

Q1: It has given me the orientation to learn about the teaching project of the subject
Q2: Its teaching is in accordance with the planning foreseen in the teaching project
Q3: I adequately attended tutorials
Q4: The tutoring schedule is adequate
Q5: The bibliography and other recommended teaching materials are proving to be useful for me to follow the course
Q6: The teaching is well organized
Q7: The means you use to teach are adequate for my learning
Q8: The bibliography and other recommended teaching materials are available to students
Q9: Explains clearly
Q10: Is interested in the degree of comprehension of their explanations
Q11: Provides examples to put into practice the contents of the course
Q12: Resolves the doubts that arise
Q13: Promotes a work and participation climate
Q14: Motivates students to take an interest in the subject matter
Q15: Treats students with respect
Q16: The teaching is helping me to achieve the objectives of the course
Q17: The evaluation criteria and systems are adequate to assess my learning
Q18: In general, I am satisfied with the teaching performance of this professor
3.2 Main Items Analyzed
In this section, the main elements assessed from the student satisfaction surveys are presented. The various components, along with their descriptions and potential values, are as follows:

– Question: refers to the question evaluated in the survey presented. The values have been shortened to Qn, where n is the question number. There are 18 questions, listed in Table 1.
– Subject: indicates the subject in which the survey was conducted, such as Algorithm Theory, Data Analysis, or Web Development. Subjects have been abbreviated and may be represented by one of the following: ADDA (Analysis and Design of Data and Algorithms), DT (Design and Testing), DSA (Data Structures and Algorithms), MSIT (Management of Services and Information Technologies), ISEIS (Introduction to Software Engineering and Information Systems), OS (Operating Systems), or PF (Programming Fundamentals).
– Degree: represents the academic program in which the survey was conducted, all of which fall under the computer science discipline. The values may be one of the following: SE (Software Engineering), IT (Information Technology), CE (Computer Engineering), or ITM (Information Technologies and Mathematics).
– Course: refers to the period of years during which the survey was collected, spanning from 2017-18 to 2021-22.
– ClassType: distinguishes between two types of classes: theoretical, in which the teacher provides a theoretical explanation of the subject; and practical, in which the class focuses mainly on student work, in addition to teacher explanation.
– Covid: indicates whether surveys were conducted before 2020 (Precovid), during 2020 (Covid), or after 2020 (Postcovid) [6].
– QType: differentiates between two types of questions: direct, which are related to some aspect of the teacher or the teaching approach, and indirect, which are generally based on some aspect more related to the subject than to the teacher. Indirect questions include Q5, Q6, Q7, Q8, and Q17.
– Score: refers to the target of the analysis, which is the value of the student's answer in the range [1, 5] or Do not know/Do not answer (Dk/Da).

3.3 Analysis Process
To analyze the survey results, individual surveys are collected for each participating teacher. The surveys include a table in which each row represents a question and the columns display the different scores. The values in the table represent the number of students who evaluated the question with a particular score. Additional information is included in the metadata, as detailed in Sect. 3.2. Once the surveys have been collected, the tables are transformed to build an itemset. For each score of each question evaluated by a student, a transaction is built that includes all the metadata from the survey, as well as the specific question and the score obtained. Finally, the Apriori algorithm described in Sect. 2 is applied to the itemset. The algorithm uses minimum support and confidence to filter out irrelevant features, which must be established based on the quantity and quality of the target rules obtained. The output of the algorithm includes the rules and associated metrics. To eliminate redundant information, any rule with a subset of its antecedents having the same or higher confidence is filtered out.
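A hedged sketch of this preprocessing and post-filtering is given below; the survey record layout (a metadata dictionary plus per-question score counts) is an assumed encoding, not the authors' actual data format.

```python
# Hedged sketch: expand each (question, score) cell of a survey table into
# one transaction per answering student, then drop redundant rules.
def survey_to_transactions(survey: dict) -> list[list[str]]:
    """survey = {"metadata": {...}, "counts": {(question, score): n}}."""
    meta = [f"{k}={v}" for k, v in survey["metadata"].items()]
    transactions = []
    for (question, score), n_students in survey["counts"].items():
        for _ in range(n_students):
            transactions.append(meta + [question, f"Score={score}"])
    return transactions

def drop_redundant(rules: list[dict]) -> list[dict]:
    """Remove any rule for which a subset of its antecedents reaches the
    same or higher confidence for the same consequent. Each rule is a dict
    with frozenset 'antecedents'/'consequents' and a float 'confidence'."""
    kept = []
    for r in rules:
        redundant = any(
            o["consequents"] == r["consequents"]
            and o["antecedents"] < r["antecedents"]  # proper subset
            and o["confidence"] >= r["confidence"]
            for o in rules
        )
        if not redundant:
            kept.append(r)
    return kept
```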
4 Results and Discussion
In this section, we present an analysis of 1673 surveys collected between 2017 and 2021, supported by the use of AR, whose items are described in Sect. 3.2. Our analysis focuses on rules with support greater than 1%, confidence greater than 50%, lift greater than 1, and the score item as the consequent of the rule. We divide this section into five subsections that describe the analysis, focusing on the different items from the survey except for Covid and QType, as these items usually appear only in combination with other items.

4.1 Overview
This section presents an overview of the main rules obtained in the study, summarized in Fig. 1.
Fig. 1. Summary of the main rules obtained.
As observed, the antecedent items associated with a consequent score of 5 with the highest support and confidence are Q15, direct questions, precovid surveys, and the DGIITIM degree. For a score of 4, Q14, surveys collected during the postcovid period, and the DGIITI degree are evaluated with the greatest confidence and support.
4.2 Question
Table 2 displays the principal rules that contain the question item in their antecedent, sorted by confidence. To enhance visual clarity, the rules have been condensed, highlighting the most illustrative ones. Some questions are not included in the table, as the rules evaluated in our surveys did not produce significant findings regarding support and confidence.

Table 2. Principal rules related to questions. The rules are summarized and sorted by confidence.

Antecedents               Score   Support   Confidence
{Q15,ADDA,2019-20}        5       0.016     0.963
{Q14,DIT}                 4       0.012     0.909
{Q15}                     5       0.054     0.882
{Q12,ADDA,Postcovid}      5       0.013     0.759
{Q12}                     5       0.041     0.673
The top rule obtained contains the Q15 item, reaching a maximum overall confidence of 88.5%. The rule with the best confidence establishes that, during the 2019-20 course in the ADDA subject, the Q15 score was maximal with even higher confidence. This appears to indicate that this group of students was particularly satisfied with the treatment they received from their teacher, with an increase in confidence of 7.8%. Q14 exhibits a higher degree of variability; as a consequence, there is no rule with this item alone in the antecedent. However, certain combinations with other items yield useful information. All the rules discovered achieved a score of 4, with confidence being particularly high in the DIT degree at 90.9%. Q12 achieved a maximum confidence of 67.3% overall. This question achieved its highest confidence in the ADDA subject in the postcovid period, at a 75.9% confidence level. This suggests that the scores in other subjects are more varied and do not provide adequate support or confidence to draw any conclusions.
4.3 Subject
The rules in Table 3 show the antecedents related to the subject that have the strongest correlation with scores of 4 or 5. The first two rules ({DSA, 2021-22} and {DSA, Postcovid} antecedents) suggest that students who took the DSA course during the 2021-22 academic year or after the COVID pandemic rated the subject quality higher. These results may indicate that changes in teaching methods or adjustments made in response to the pandemic may have positively impacted subject quality. However, since the support values are relatively low, we should be cautious when making strong conclusions based on these rules alone.
The next two rules ({ADDA, Practical, Direct} and {ADDA, Direct} antecedents) suggest that students who took the ADDA course and had direct interactions with the professor rated the quality of the subject higher. These findings align with common educational practices that emphasize teacher-student engagement, which has been shown to positively impact learning outcomes. The second rule, with a higher support value, suggests that direct interactions with the professor may be more important than other factors, such as practical sessions, in determining subject quality for ADDA students.

Table 3. Principal rules related to the subject. The rules are summarized and sorted by confidence.

Antecedent                     Score   Support   Confidence
{DSA, 2021-22}                 4       0.034     0.559
{DSA, Postcovid}               4       0.034     0.559
{ADDA, Practical, Direct}      5       0.069     0.542
{ADDA, Direct}                 5       0.250     0.520

4.4 Degree
Table 4 shows the principal ARs related to the degree of the students enrolled in the courses. The rule {ITM, 2019-20, Direct} → 5 indicates that when a student enrolled in ITM in the 2019-20 academic year gives a high score, there is a high probability that they also responded positively to direct questions that evaluate the professor. Similarly, the rule {ITM, Theory} → 5 indicates that when a student enrolled in ITM takes a theoretical subject, there is a high probability that they also give a high score.

Table 4. Principal rules related to the degree. The rules are summarized and sorted by confidence.

Antecedent                  Score   Support   Confidence
{ITM, 2019-20, Direct}      5       0.044     0.881
{ITM, Theory}               5       0.056     0.797
{ITM}                       5       0.097     0.706
{SE, 2021-22}               4       0.034     0.559
{CE, Direct}                5       0.045     0.524
These results suggest that professors should pay special attention to their performance when teaching theoretical subjects to ITM students, as this seems
to be a key factor in obtaining high scores. Additionally, professors may want to encourage students to answer direct questions in the survey, as these questions are highly associated with high scores, indicating that they provide valuable feedback for professors.

4.5 Course
Table 5 shows the main ARs related to the courses enrolled. Following the same nomenclature as in the previous sections, we can see that {2017-18, Theory, Direct} → 5 and {2017-18, Direct} → 5 indicate that the direct answers for the 2017-18 course get a 5 with a confidence of 52.4% and 51.8%, respectively. We can see a similar rule ({2019-20, Theory, Direct} → 5) for the 2019-20 course with similar confidence (51.4%). On the other hand, the rule {2021-22, Practice, Direct} → 5 indicates, with a confidence of 50.5%, that the direct questions about the teacher asked of the practice groups of the 2021-22 course get a 5. Furthermore, we find {2021-22, Direct}, which obtained a score of 5 with a confidence of 50.1%.

Table 5. Principal rules related to the course. The rules are summarized and sorted by confidence.

Antecedent                      Score   Support   Confidence
{2017-18, Theory, Direct}       5       0.045     0.524
{2017-18, Direct}               5       0.094     0.518
{2019-20, Theory, Direct}       5       0.096     0.514
{2021-22, Practice, Direct}     5       0.088     0.505
{2021-22, Direct}               5       0.127     0.501

5 Conclusions
This work presents a simple methodology for analyzing teaching quality using association rule mining on student satisfaction surveys. The method is intuitive due to its "if-then" structure, which provides useful information about the strong and weak points of teaching. Additionally, the method is generalizable to almost any survey with minimal changes. In our study, most of the rules obtained good confidence and support scores, indicating that students are generally satisfied with the treatment received from teachers. However, some aspects still require reinforcement, since rules related to important teaching skills do not appear with sufficient confidence. This methodology can serve as a starting point for a self-improvement process that clearly identifies strengths and weaknesses. As future work, we intend to enhance our study by including surveys from a greater number of years and a wider range of subjects. This will help us to
enrich our analysis and yield more comprehensive findings. Additionally, we are interested in exploring additional types of academic data to uncover patterns related to student dropouts within the university context.

Acknowledgements. The authors would like to thank the Spanish Ministry of Science and Innovation for the support under the projects PID2020-117954RB-C22 and TED2021-131311B-C21, and the Junta de Andalucía for the project PYC20 RE 078 USE.
References

1. Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 22, pp. 207–216 (1993)
2. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings of the International Conference on Very Large Databases, vol. 20, pp. 478–499 (1994)
3. Castro Morera, M., Navarro Asencio, E., Blanco Blanco, A.: The quality of teaching as perceived by students and university teachers: analysis of the dimensionality of a teacher evaluation questionnaire. Educación XX1 23(2), 41–65 (2020)
4. García-Concepción, M.A., Jiménez-Ramírez, A., Martínez-Ballesteros, M., Gasca, R.M., Parody, L., Soria-Morillo, L.M.: Extensiones para el ciclo de mejora continua en la enseñanza e investigación de ingeniería informática. Revista de Enseñanza Universitaria 38, 4–26 (2011)
5. Isla-Díaz, R., Hess-Medler, S., Marrero-Hernández, H.: The role of students in the DOCENTIA process after ten years of evaluation: evaluating the teacher or the subject? That is the question. Revista Española de Pedagogía 79(280), 393–411 (2021)
6. Kumar, A., Sarkar, M., Davis, E., Morphet, J., Maloney, S., Ilic, D., Palermo, C.: Impact of the COVID-19 pandemic on teaching and learning in health professional education: a mixed methods study protocol. BMC Med. Educ. 21, 439 (2021)
7. García Martín, A., Montero Cases, T., García León, J., Vázquez Arenas, G.: Validez de las encuestas de satisfacción de los estudiantes para evaluar la calidad docente: el caso de la UPCT (Cartagena). REDU: Revista de Docencia Universitaria (2020)
8. Martínez Ballesteros, M., Troncoso, A., Martínez-Álvarez, F., Riquelme, J.C.: Improving a multi-objective evolutionary algorithm to discover quantitative association rules. Knowl. Inf. Syst. 49, 481–509 (2016)
9. Mas Torelló, O.: Las competencias del docente universitario: la percepción del alumno, de los expertos y del propio protagonista. Revista de Docencia Universitaria, REDU (2012)
10. Rubio-Escudero, C., Martínez-Álvarez, F., Atencia-Gil, E., Troncoso, A.: Implementation of an internal quality assurance system at Pablo de Olavide University of Seville: improving computer science students' skills. In: Proceedings of the International Conference on European Transnational Education, vol. 10, pp. 340–348 (2020)
11. Thijssen, M.W.P., Rege, M., Solheim, O.J.: Teacher relationship skills and student learning. Econ. Educ. Rev. 89, 102251 (2022)
Robustness Analysis of a Methodology to Detect Biases, Inconsistencies and Discrepancies in the Evaluation Process

Jose Divasón(B), Francisco Javier Martínez-de-Pisón, Ana Romero, and Eduardo Sáenz-de-Cabezón

University of La Rioja, Logroño, Spain
{jose.divason,fjmartin,ana.romero,esaenz-d}@unirioja.es

Abstract. This paper analyzes the robustness and stability of a published methodology to improve the evaluation of complex projects in university courses. For this purpose, different types of experiments are performed on a dataset (e.g. elimination of features, input perturbations) of a subject in Computer Systems at the University of La Rioja (Spain); then, the methodology is reapplied, analyzing whether the final conclusions remain similar. The results show that the conclusions obtained, despite the variations introduced, are consistent.
Keywords: sensitivity analysis · GAparsimony · SHAP · machine learning

1 Introduction
Many of the courses belonging to Science and Engineering degrees in universities include the development of projects as part of the evaluation. The evaluation of this type of project has certain difficulties, since projects usually involve a technical component (and therefore, a more objective one) and a creative part (more subjective). Therefore, discrepancies (unexpected differences among the evaluations of different teachers for the same project), biases (prejudice for or against a person or group) and inconsistencies (unexpected differences in the evaluation of similar quality projects by the same teacher) may arise, even if there is a well-defined rubric and teachers coordinate with each other. In a recent paper [1], a new AI-based methodology has been proposed to detect evidence of biases, discrepancies and inconsistencies in the evaluation of complex student projects that involve both technical and creative components. It is based on the identification of the most representative features that influence the grade by performing feature selection (FS) and hyperparameter optimization (HO) techniques, where the output feature to predict is precisely the final grade. To measure this influence, interpretable models (those with built-in feature importance) or explainable AI techniques (if black-box models are involved) are employed. The methodology consists of seven steps that are summarized below:

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
P. García Bringas et al. (Eds.): CISIS 2023/ICEUTE 2023, LNNS 748, pp. 329–339, 2023. https://doi.org/10.1007/978-3-031-42519-6_31
1. Identification of features: teachers should identify the variables that could influence the grade and group them into three types: technical, style and context-based features (such as the number of lines of code of a computer program, the aspect of a graphical user interface, and student gender, respectively).
2. Data extraction: a dataset should be generated from previous years' deliveries. To do so, teachers should quantify each feature. In the case of style features (which are therefore subjective), several experienced teachers should jointly perform the task to agree on a score for each instance.
3. Data preprocessing: the usual ML tasks to increase the quality of the estimates.
4. Basic ML training and optimization: the goal is to train baseline models on the database to estimate the final grade of the project from the features identified in the first step. This step makes it possible to detect whether the set of identified features must be revisited, and to assess the quality of the predictions and the complexity of the models.
5. Selection of one or several ML models, performing FS and HO tasks: depending on the available hardware and the results of the previous step, a model is chosen to be optimized by advanced techniques, like genetic algorithms, and a FS is performed.
6. Obtaining the influence of the selected features on the model's output by means of some explainability technique such as LIME, ELI5 or SHAP: the best model is analyzed to determine the influence of each of the selected features.
7. Analysis of the results in order to:
(a) Generate a rubric or refine an existing one: based on the selected features and their influence, a rubric can be generated semi-automatically from the evaluation carried out in previous years.
(b) Detect possible discrepancies, inconsistencies and biases: the analysis of the selected features and their influence helps to determine biases (for example, if gender appears as a selected feature), inconsistencies (for example, if a feature appears but has an influence opposite to what is expected) and discrepancies (for example, if a teacher appears as having an influence on the grade).

As particular instances of steps 5 and 6, the paper [1] proposes the use of GAparsimony [4], a technique based on genetic algorithms for FS and HO that seeks a parsimonious optimized model with a reduced number of features, while SHAP [2] is suggested as the preferable explainability tool for black-box models. The methodology was tested with a Computer Systems course at the University of La Rioja; see [1, Section 4] for details. That course belongs to both the degree in Mathematics and the degree in Computer Science. Specifically, the project in that course consists of creating a website (in groups of two or three people) that must meet certain requirements. Data were collected over five years and the first three steps of the methodology were applied, forming a dataset of 322 projects (the rows) and 38 features (the columns). The complete list of identified features and their descriptions is presented at https://bit.ly/3iOFTn7.
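As a hedged illustration of step 6, the following sketch explains a trained black-box regressor with the SHAP library. The synthetic data stands in for the real 322 × 38 project dataset, and the MLP configuration merely echoes the 21-neuron model reported later in this section.

```python
# Hedged sketch of step 6: SHAP analysis of a black-box grade predictor.
# Synthetic data; the real study uses 322 projects and GAparsimony-tuned MLPs.
import numpy as np
import shap
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((322, 14))                          # stand-in feature matrix
y = 2 * X[:, 0] - X[:, 1] + 0.1 * rng.random(322)  # stand-in grades

model = MLPRegressor(hidden_layer_sizes=(21,), max_iter=2000).fit(X, y)

explainer = shap.Explainer(model.predict, X)       # model-agnostic explainer
shap_values = explainer(X[:100])                   # explain a subsample

shap.plots.beeswarm(shap_values)                   # summary like Fig. 1
```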
Following the rest of the methodology, the best model obtained was a multilayer perceptron (MLP) that required 21 neurons and involved 14 features. It reached a root-mean-square error (RMSE) of 0.114. The default GAparsimony configuration was 40 particles with a rerank set to 0.01 (see [1] for explanations and further details). The SHAP analysis is summarized in Fig. 1. In that figure, the features are ordered according to their importance and each point is a SHAP value for a feature and an instance. Points with a negative impact on the grade are those placed on the left-hand side of the x-axis, while those placed on the right-hand side have a positive impact. Colors represent the value of the feature, from low (blue values) to high (red values).

Fig. 1. Final SHAP summary following the methodology.

The methodology allowed us to detect several facts; for instance, there is no evidence of bias (the feature gender was not selected). In addition, one teacher had a strong (and negative) influence on the grade (Teacher1). Also, SHAP analysis detected an inconsistency in the correction with respect to the number of members of the group: the feature GroupMembers was selected, which is expected, since 3-person projects should require more effort than projects developed in pairs for the same grade. However, SHAP analysis revealed that 3-member groups have a minor positive influence on the grade, but 2-member groups were penalized (blue dots placed at the left). The methodology was shown to work; however, further studies were pending to ensure its robustness and stability. The following research questions arise:

– RQ1: Do the initial configuration values of GAparsimony affect the conclusions?
– RQ2: If a feature is missing in step 1, does the methodology still show similar conclusions?
– RQ3: If there are small perturbations in the data, are similar conclusions obtained?

RQ1 would help to find out whether different GAparsimony configurations modify the conclusions. RQ2 would allow one to find out whether the methodology is robust, analyzing the outcomes if teachers forget to identify some feature in step 1. RQ3 would analyze the outcomes when small modifications occur in the dataset, which could be caused by the teachers during the creation of the dataset (due to small errors, doubts about some parts, discrepancies, and so on).
2 Methods
To answer RQ1, RQ2 and RQ3 we conduct three types of experiments. In all of them, we start from the same original dataset presented in [1].

– Experiment 1: with the original dataset, we modify the initial GAparsimony configuration, namely the number of individuals, within the values {5, 10, 15, 20, 25, 30, 35}. We also independently modify the rerank parameter within the values {0.001, 0.005, 0.05, 0.1}. To increase the robustness of the experiments, for each modification the GAparsimony optimization process is repeated 10 times.
– Experiment 2: one of the 38 features from the original dataset is removed, and then the GAparsimony method and SHAP analysis are performed with the other 37 features. This process is carried out for each of the 38 features (dropping one of them in each case) and repeated 10 times per feature.
– Experiment 3: minor data perturbations are performed on the style features (which are error-prone, since they are subjectively evaluated by teachers). Concretely, we randomly modify 10% of such features.

We stress that each experiment consists of several tests and each test consists of 10 models. To measure the effect of the modifications on the results of the FS phase in each experiment, we compare the set of features obtained after performing the GAparsimony methodology with the ground truth. We consider the ground truth to be the set of features that are selected following the methodology with the original dataset and the default configuration. That is, the ground truth G is the set of 14 features presented in Fig. 1. We use the Jaccard index to measure the average pairwise similarity of selected feature subsets Si and Sj:

J(Si, Sj) = |Si ∩ Sj| / |Si ∪ Sj|

Let us denote as Ĵ(Ti) the index that will measure the stability of each test Ti. We define it as the mean of the Jaccard indexes between G and the feature sets of the 10 models of the test:

Ĵ(Ti) = (1/10) Σ_{j=1}^{10} J(G, Sj)

In addition, we also calculate the Jaccard index between G and the best model obtained in each test. The Jaccard index allows us to compare the stability of the selected features, i.e., whether the features that are most important in the evaluation hold up to small changes in the default settings, the identified features, or the evaluation of the features. To complete the study, we also check whether the SHAP values are similar for each shared feature between G and each Sj. To do so, we use the Kullback–Leibler (KL) divergence. The KL divergence quantifies how much one distribution P differs from another distribution Q, where P usually represents the true distribution of data or observations and Q represents a theory, model, or approximation of P. The closer the value of KL is to zero, the more similar the distributions are; if it is equal to zero, both distributions are equal. Given the feature to be compared between two SHAP analyses (e.g., comparing Teacher1 in G and a model M of a test T),
we will construct P(G, x) as 322 2D samples (xi, yi), where xi is the value of the feature x for the i-th instance and yi is the corresponding SHAP value in the ground truth. Similarly, we construct 322 2D samples for M (denoted as P(M, x)) and calculate KL(P(G, x), P(M, x)) following a method to compute it from multidimensional samples [3].

Fig. 2. SHAP summaries of different models of Experiment 1.

Fig. 3. SHAP summaries of different models of Experiment 2.
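A minimal sketch of a nearest-neighbour estimate of KL(P‖Q) from samples, in the spirit of the estimator of [3]; this is an illustration under the stated assumptions, not the exact implementation used in the paper.

```python
# Hedged sketch: 1-nearest-neighbour KL divergence estimate between two
# sample sets of shape (n, d) and (m, d).
import numpy as np
from scipy.spatial import cKDTree

def kl_estimate(p_samples: np.ndarray, q_samples: np.ndarray) -> float:
    n, d = p_samples.shape
    m = len(q_samples)
    # distance to the nearest other point within P (k=2 skips the point itself)
    r = cKDTree(p_samples).query(p_samples, k=2)[0][:, 1]
    # distance to the nearest point in Q
    s = cKDTree(q_samples).query(p_samples, k=1)[0]
    r = np.maximum(r, 1e-12)  # avoid log(0) for duplicated points
    s = np.maximum(s, 1e-12)
    return d * np.mean(np.log(s / r)) + np.log(m / (n - 1))
```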
For example, if we compare the SHAP of the feature Teacher1 in Fig. 1 and in Fig. 2a (obtained from one of the experiments), both are visually very similar (the distances between points and the SHAP values are similar, and the color distribution is very similar). The KL value in this case is 0.005. However, if we compare, for example, Teacher1 to GeneralAppearance, the KL value is 13.1186, since they are very different distributions (in fact, we are comparing different features!). To compare the KL values between the ground truth and another model, we define our own index:

IKL(G, Mj) = ( Σ_{x ∈ G ∩ F(Mj)} K̃L(P(G, x), P(Mj, x)) ) / |G ∩ F(Mj)|

where P(Mj, x) is the set of 322 tuples (feature value, SHAP value) of the feature x in the model Mj, as explained before, F(Mj) is the set of features selected in the model Mj, and K̃L(A, B) is a function that returns 1 if KL(A, B) < 0.1 and 0 otherwise. That is, what we essentially do is count the KL values less than 0.1 for the features shared between both models and divide by the number of shared features. In this way, we obtain an index between 0 and 1, so that the closer the number is to 1, the greater the number of features whose SHAP analysis is similar in both models. As each individual test Ti is repeated 10 times, it is straightforward to extend the index IKL(G, Mj) to cover all 10 models, and we denote this as ÎKL(G, Ti).
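Both stability indexes can be sketched directly from their definitions. The kl_estimate function from the previous sketch is assumed, and the data structures (one feature set per model, one (322, 2) SHAP sample array per feature) are illustrative.

```python
# Hedged sketch of the Jaccard-based and KL-based stability indexes.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

def j_hat(ground_truth: set, model_feature_sets: list[set]) -> float:
    """Mean Jaccard index of a test's 10 models against the ground truth G."""
    return (sum(jaccard(ground_truth, s) for s in model_feature_sets)
            / len(model_feature_sets))

def i_kl(g_shap: dict, m_shap: dict, threshold: float = 0.1) -> float:
    """Fraction of shared features whose KL estimate is below the threshold."""
    shared = set(g_shap) & set(m_shap)
    if not shared:
        return 0.0
    hits = sum(kl_estimate(g_shap[x], m_shap[x]) < threshold for x in shared)
    return hits / len(shared)
```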
3 Results and Discussion
Tables 1, 2, 3 and 4 present the results of the experiments. Each row in the tables is a test of an experiment, and each test is repeated 10 times. RMSE and nfs represent, respectively, the mean RMSE and mean number of features of the 10 models of the corresponding test. In addition, best RMSE and nfs best are the RMSE and number of features of the best model among the 10 repetitions. Figures 2, 3 and 4 show the SHAP summaries of different models for each experiment. In view of the results of Tables 1, 2, 3 and 4, several conclusions can be drawn.

Table 1. Results for Experiment 1 and different numbers of particles.

npart   RMSE     nfs    best RMSE   nfs best   Ĵ        Ĵ(G, best)   ÎKL
5       0.1267   24.7   0.1220      23         0.4222   0.4231       0.8319
10      0.1269   19.7   0.1223      22         0.4761   0.5652       0.8796
15      0.1241   20.5   0.1211      20         0.5000   0.5455       0.8158
20      0.1235   20     0.1205      15         0.5242   0.6111       0.7719
25      0.1253   22.8   0.1212      22         0.4379   0.6364       0.8649
30      0.1227   18.5   0.1190      18         0.5229   0.6000       0.7615
35      0.1217   17.2   0.1191      13         0.6464   0.9286       0.8534
Table 2. Results for Experiment 1 and different rerank values.

rerank   RMSE     nfs    best RMSE   nfs best   Ĵ        Ĵ(G, best)   ÎKL
0.001    0.1179   17.2   0.1162      14         0.7219   1.0000       0.7641
0.005    0.1196   17.1   0.1170      16         0.6974   0.8750       0.6905
0.05     0.1226   21.8   0.1189      15         0.8      0.7000       0.8640
0.1      0.1185   21.2   0.1157      22         0.5803   0.4400       0.7969
Concerning Experiment 1 and the number of particles, the Jaccard index is in general around 0.50, i.e., half of the features coincide even if one applies GAparsimony with very few particles. Moreover, if the number of particles is greater than 30, the results improve quite a lot, especially if we look at the best of the 10 models. Note that with few particles (5, 10) the average nfs is higher than if more particles are used (30, 35), probably because GAparsimony would need more iterations with so few particles. The indices for comparing the SHAP values (ÎKL) are rather high, always higher than 0.75, and this indicates that most of the coincident selected features have a SHAP distribution that can be considered equivalent to the ground truth. Further inspection of the results shows that, within the selected features, the most important ones appear in the vast majority of the feature sets and their SHAP distributions are very similar. As expected, the RMSE value decreases (i.e., improves) the higher the
number of particles is.

Fig. 4. SHAP summaries of different models of Experiment 3.

Table 3. Results for Experiment 2.
deleted feature              RMSE     nfs    best RMSE   nfs best   Ĵ        Ĵ(G, best)   ÎKL
HTMLfiles                    0.1218   19.4   0.1192      15         0.5736   0.8125       0.7311
CSSfiles                     0.1212   19.2   0.1168      15         0.6150   0.9333       0.7317
HTML2CSS                     0.1221   19.7   0.1169      15         0.5986   0.9333       0.7869
Forms                        0.1207   16.9   0.1175      15         0.6845   0.9333       0.7295
Tables                       0.1214   18.9   0.1179      16         0.6387   0.8750       0.7419
Ul                           0.1236   21.1   0.1192      22         0.4817   0.4583       0.8611
Ol                           0.1202   17.7   0.1179      17         0.6337   0.8235       0.7623
Videos                       0.1215   18.3   0.1175      18         0.6303   0.7778       0.7236
AccessibilityNotSatisfied    0.1244   18.5   0.1216      17         0.5195   0.6667       0.8762
AccessibilityErrors          0.1201   16.8   0.1185      15         0.6171   0.6111       0.8362
likelyAccProblems            0.1205   18     0.1166      14         0.7127   1            0.5938
PotentialAccErrors           0.1225   20.5   0.1197      18         0.5831   0.6316       0.7983
getTotalBytesHTML            0.1213   18     0.1172      17         0.6269   0.8235       0.7833
BytesCSS                     0.1221   19     0.1200      17         0.5834   0.6316       0.7750
CSSModifiers                 0.1211   19.7   0.1177      19         0.5878   0.5          0.7642
CSSTags                      0.1211   19.1   0.1182      17         0.6377   0.8235       0.8268
HTMLTags                     0.1234   20.7   0.1184      17         0.5118   0.6667       0.9204
Languages                    0.1218   17.5   0.1181      17         0.6004   0.7222       0.7949
IMGLabel                     0.1227   19.8   0.1191      15         0.5260   0.7500       0.8727
ImageFiles                   0.1245   21.5   0.1208      20         0.4368   0.5714       0.9519
JavaScript                   0.1208   18.9   0.1167      14         0.6674   1            0.6406
Responsive                   0.1205   17.5   0.1171      18         0.6155   0.6842       0.7563
GroupMembers                 0.1259   20.5   0.1229      18         0.4768   0.5500       0.9159
ValidatorUses                0.1202   17.9   0.1176      19         0.6475   0.7368       0.7398
Gender                       0.1217   18.9   0.1191      15         0.5857   0.8125       0.8500
Degree                       0.1203   17.9   0.1181      20         0.6289   0.6190       0.7686
Teacher4                     0.1224   17.3   0.1192      19         0.5464   0.6          0.8558
Teacher5                     0.1206   19.5   0.1179      18         0.5987   0.7778       0.7724
Teacher1                     0.1227   19     0.1195      18         0.5777   0.6316       0.8879
Teacher6                     0.1222   16.7   0.1184      14         0.6028   0.7500       0.7168
Teacher3                     0.1213   18.2   0.1192      19         0.6243   0.7368       0.7851
Teacher2                     0.1253   21.6   0.1222      21         0.4893   0.5455       0.8750
GeneralAppearance            0.1295   19.5   0.1262      22         0.4902   0.4583       0.8558
Functionality                0.1236   18.3   0.1218      19         0.5541   0.6          0.8636
Contents                     0.1260   18.1   0.1238      20         0.4513   0.6500       0.8947
Positioning                  0.1221   18.7   0.1182      20         0.5746   0.4783       0.7542
Contrasts                    0.1233   15.4   0.1180      17         0.6658   0.7647       0.7411
Legibility                   0.1224   17.5   0.1199      15         0.5991   0.8666       0.7328
In addition, when studying different reranks, we see that, except for the 0.1 case, the other reranks obtain a fairly similar best model. The case 0.1 does not give good solutions, but this is due to the fact that the
maximum number of iterations is not enough. If one increases the iterations, the results between different reranks become comparable and the solutions are similar. Figure 2 presents two SHAP summaries, one of a model obtained with the rerank set to 0.001 and the other with the number of particles fixed at 35. As shown, both are extremely similar to the ground truth (i.e., Fig. 1) and thus one could draw the same conclusions as those presented in [1, Section 4].

Regarding Experiment 2, it is important to note that in all cases the RMSE gets worse, which is consistent because the dataset contains less information (one feature is dropped in each test). If one takes a careful look at Table 3, most of the values Ĵ(G, best) and ÎKL are very high, which indicates a high robustness and stability of the methodology against feature identification oversights by teachers. To analyze Table 3 more deeply, we can separate the features into three groups: those that belong to the ground truth and have high SHAP values (shown at the top of Fig. 1), those that belong to the ground truth but have low SHAP values (shown at the bottom of Fig. 1), and those that do not belong to the ground truth. In the first case (for example, if one removes GeneralAppearance, Teacher1 or Contents), the models choose other extra features to compensate for the loss. The most important features are maintained in the SHAP summary with a very similar SHAP distribution, and the conclusions obtained would be similar to those of the methodology. See, for instance, Fig. 3b, where GeneralAppearance has been eliminated: new features are chosen but the main important ones are still there with the same SHAP plot; for example, Teacher1 still has an important negative influence on the grade and there is an inconsistency in the evaluation with respect to the number of members. Even so, in these cases there is some uncertainty and conclusions should be taken with caution because the model is not as good in its predictions: in fact, there is evidence that some important feature is missing, because the RMSE has worsened a lot (from 0.114 in the ground truth to a mean of 0.1295), which would suggest that teachers revisit step 1 of the methodology (the identification of features) to obtain a better model and further conclusions. If one removes some of the features that were not in the ground truth (for instance, HTMLfiles or JavaScript), then the same conclusions are reached and the models obtained are very similar. In fact, in these cases, models are obtained with an RMSE much closer to that of the ground truth, and the set of features selected for each model is very similar. If one eliminates a feature that was present in the ground truth but was one of the least important (such as IMGLabel and Contrasts), then some minor differences arise but the conclusions on the main features involved do not change: they are still selected and their SHAP distribution is very similar. See, for instance, Fig. 3a, which presents the SHAP summary of a model when Contrasts (one of the selected features with lower importance) is deleted from the dataset. Such a plot can be compared to the ground truth (Fig. 1), showing that the SHAP distributions of the features remain almost identical.

Table 4. Results for Experiment 3.

RMSE     nfs    best RMSE   nfs best   Ĵ        Ĵ(G, best)   ÎKL
0.1326   16.8   0.1264      19         0.6028   0.7127       0.8922

With respect to the latter experiment, the RMSE worsens notably when modifying the subjective variables (those style-related variables that teachers evaluate, such as GeneralAppearance and Contrasts). However, the mean number of selected features remains quite similar with respect to the ground truth. Figure 4 shows two SHAP summaries of this experiment. It can be seen that the SHAP plots have changed compared to Fig. 1 (e.g., for GeneralAppearance the SHAP values are quite different in scale), but the distribution and colors remain similar (although the KL divergence is not smaller than 0.1). In other words, the conclusions that could be drawn would be similar; compare, for instance, Teacher1, GeneralAppearance and PotentialAccErrors in Figs. 1, 4a and 4b.

These results allow us to answer the research questions formulated in the introduction. On RQ1, the initial values of GAparsimony generally do not affect the final conclusions, except in the case of rerank 0.1, which is too large for the problem, at least if the number of iterations is not increased. It is worth recalling that other forms of FS and HO are also valid: the methodology does not impose the technology to be used. Regarding RQ2, if teachers do not identify an important feature in step 1, more features are selected to compensate and the RMSE rises. Thus, a high RMSE may indicate to teachers that something is wrong in that step. In other cases, if the eliminated feature is not important, the conclusions obtained will be similar. As for RQ3, small perturbations can change the SHAP of the selected variables, especially when the perturbations are performed on the most important variables. However, the visual analysis of SHAP allows one to obtain quite similar results.
4 Conclusions
This work presents some experiments to test whether small changes in the data and input parameters of a methodology to detect biases, inconsistencies and discrepancies modify the conclusions obtained. Concretely, Experiment 1 has shown that very similar results are obtained if the GAparsimony hyperparameters are modified. If variables are eliminated, Experiment 2 has shown that the predictive capacity of the model worsens, but the selected variables remain stable with similar SHAP values. Experiment 3 has shown that modifications in the subjective variables produce changes in the model, but visually the SHAP values allow one to maintain the original conclusions. In view of the results obtained in the experiments, we can conclude that, in general terms and using the original dataset of the article [1], the methodology is robust and stable.

Acknowledgments. This work is supported by grant PID2020-116641GB-I00 funded by MCIN/AEI/10.13039/501100011033.
References

1. Divasón, J., Martínez-de-Pisón, F.J., Romero, A., Sáenz-de-Cabezón, E.: Artificial intelligence models for assessing the evaluation process of complex student projects. IEEE Trans. Learn. Technol., 1–18 (2023)
2. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS 2017, pp. 4768–4777. Curran Associates Inc. (2017)
3. Pérez-Cruz, F.: Kullback-Leibler divergence estimation of continuous distributions. In: 2008 IEEE ISIT, pp. 1666–1670 (2008)
4. Urraca, R., Sodupe-Ortega, E., Antonanzas, J., Antonanzas-Torres, F., Martínez-de-Pisón, F.J.: Evaluation of a novel GA-based methodology for model structure selection: the GA-PARSIMONY. Neurocomputing 271, 9–17 (2018)
Evaluation of the Skills’ Transfer Through Digital Teaching Methodologies
Javier Díez-González(B), Paula Verde, Rubén Ferrero-Guillén, Rubén Álvarez, Nerea Juan-González, and Alberto Martínez-Gutiérrez
Department of Mechanical, Computer and Aerospace Engineering, Universidad de León, 24071 León, Spain
[email protected]
Abstract. Information and Communication Technologies (ICT) have great potential to transform education and university training. One of these technologies is Virtual Reality (VR), which allows students to experience and explore virtual environments that are similar to their physical world. However, the transfer of skills that students can achieve with these technologies has not been evaluated. For this reason, the authors have fairly compared three different methodologies (i.e., VR, a computer application, and real-environment training) in the same case study. For this purpose, a training platform has been developed based on a Digital Twin (DT) that models the behavior of an industrial vehicle. Furthermore, applications that virtualize the environment simulated by the DT have been implemented on this training platform. In this way, it has been possible to evaluate the times and the number of failures in driving an industrial vehicle before and after training with each of the methodologies. The data obtained show that the VR-based methodology increases the transfer of skills with respect to computer applications, highlighting the benefits of an immersive experience for the user while learning. In addition, the results validate digital methodologies as effective training methods that open the way to novel learning techniques, reducing the gap between academia and the business world while virtually developing the skills of the students and preparing them for a rapid adaptation to their jobs after university.
Keywords: Automatic Ground Vehicle · Digital Training · Digital Twin · Virtual Reality · Skills Transfer
1
Introduction
This work was partially supported by the Spanish Research Agency (AEI) under grant number PID2019-108277GB-C21/AEI/10.13039/501100011033 and the University of León.
The education of students in the European Higher Education Area (EHEA) is essential for the development of new skills and knowledge for future professionals [1]. The implementation of Information and Communication Technologies
(ICT) in different sectors has accelerated the adoption of new digital methodologies at all educational levels [15]. In this manner, the gap between training and professional development can be reduced, facilitating the effective integration of students into their professional work. ICT has established new teaching and learning processes adapted to the needs of enterprises. This allows students to acquire knowledge more efficiently and at a pace suited to their needs [13]. The use of digital training methodologies such as educational platforms, virtual classrooms, and collaborative tools enables teachers to easily adapt the course contents to the interests of students, increasing their motivation through learning. In this sense, the skills required for handling real industrial equipment are a necessary competence for the professional development of students [14]. This competence is mostly acquired through case studies in real environments in order to learn in a realistic way, adapting to the necessities of the job position [11]. In these cases, information technology resources have limited applicability. This is due to the scarce development of the existing simulation applications, as well as the complexity of developing and adapting these applications to particular jobs. For this reason, most of the learning practices in the EHEA are carried out in person in laboratories, where real experiments are proposed to learn the abilities required to facilitate the integration of the student into the professional world. In addition, the use of these traditional on-site methodologies also allows teachers to carry out more comprehensive monitoring of the progress of their students. In this way, teachers identify individual student problems in order to provide the necessary help to improve their performance [2]. It also makes it easier for teachers to maintain personalized communication with students, which can significantly improve the quality of education [4]. However, well-designed digital teaching methodologies can attain learning performance similar to traditional on-site experimentation, although the differences in the skills acquired by the students have not yet been sufficiently measured in a common experimentation framework. In this sense, in order to quantify the effectiveness of digital learning, the authors have digitized a laboratory practice [10] based on the handling of an industrial vehicle. For this purpose, the laboratory used in the practices has been virtualized so that students can drive this vehicle through two types of digital applications: one with Virtual Reality (VR) [9] and the other on smart mobile devices. The modeling of the environment has been included in a Digital Twin (DT) [7], which computes the position and orientation of an industrial mobile vehicle based on the movement of a joystick. The position and orientation data of the vehicle are transferred by means of a novel digital platform to a graphics engine installed in the students' applications. With the development of the VR and smartphone applications, the objective is to evaluate the transfer of skills [5] achieved by the students depending on the methodology used (i.e., VR, 2D computer application, or practice with real equipment in the laboratory). For this purpose, a case study is created
where students complete a driving challenge with the industrial vehicle. In this way, we seek to establish which of the three methods confers the greatest performance, allowing students to improve their driving skills. The evaluation of driving skills attained through different learning methodologies in the EHEA also represents a novelty in the scientific literature. This study has been carried out in the process planning subject of the Mechanical Engineering degree of the University of León, in which topics related to internal logistics are addressed in order to optimally handle the plant material necessities. Therefore, reviewing the experimentation described, the main contributions of this article can be summarized as follows:
– The development of a software application for the management of industrial vehicles through the use of a DT.
– The development of an application with VR glasses for skills training.
– The implementation and evaluation of these methodologies in a practice of the process planning subject.
The remainder of this paper is organized as follows: Sect. 2 presents the technical implementation of the digital applications. Section 3 explains the methodology followed for the evaluation of the learning methodologies. Section 4 shows the results obtained and, finally, the conclusions are presented in Sect. 5.
2
Technological Implementation
Recreating laboratory practices in virtual environments requires the modeling of objects as well as their interaction with the environment. To do so, it is necessary to model the objects in order to predict their behavior depending on the environment or other actions. In this sense, DTs model both the environment and the objects, allowing them to interact in a realistic way. In our case, the industrial vehicle has been mathematically modeled to be included in the DT. For this purpose, a framework based on the Robot Operating System (ROS) [3] has been used to simulate the elements of the industrial vehicle such as engines and other safety sensors [8]. In addition, the laboratory where the practices are carried out has been modeled in this environment so that it is as faithful to reality as possible. The DT is connected to a server through which the students have access to joystick control and management. The joystick positions as a function of time are translated into linear and angular velocities, which are sent to the DT. With these data, the DT moves the industrial vehicle in the virtual laboratory recreated in the ROS ecosystem. The evaluation of the position and orientation (i.e., pose) is performed by the virtual sensors in the same way as in the real vehicle. The vehicle pose information is sent to the server, where it is published so that both the computer and VR applications can represent the vehicle in space. To do this, a script has been programmed to read the data from the server in both the computer and VR applications, which have been developed with the Unity
graphics engine. In addition, with this graphics engine, the laboratory and the industrial vehicle have been modeled with greater realism so that the students have a more immersive experience. Once the environment and the objects with their physics were programmed, they were compiled into applications for various types of devices. For the interaction of the students with the virtual environment, two types of interfaces have been used: VR glasses and mobile devices. The VR goggles used are the Oculus Quest 2. For smartphones, the application has been compiled for devices with the Android and iOS operating systems, which are the ones most commonly used by students. Figure 1 shows the dependencies between the different applications to represent how they work.
Fig. 1. Diagrams of the elements used for technological implementation.
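To make the joystick-to-velocity translation described above concrete, the following is a minimal sketch of a ROS node that maps joystick axis positions to the linear and angular velocities consumed by the DT. The topic names, axis indices and velocity limits are assumptions for illustration, not the authors' actual configuration:

#!/usr/bin/env python
import rospy
from sensor_msgs.msg import Joy
from geometry_msgs.msg import Twist

MAX_LINEAR = 1.0   # m/s, assumed vehicle limit
MAX_ANGULAR = 1.5  # rad/s, assumed vehicle limit

class JoyToTwist:
    def __init__(self):
        # Velocity commands consumed by the digital twin (assumed topic)
        self.pub = rospy.Publisher("/agv/cmd_vel", Twist, queue_size=10)
        rospy.Subscriber("/joy", Joy, self.on_joy)

    def on_joy(self, msg):
        cmd = Twist()
        cmd.linear.x = MAX_LINEAR * msg.axes[1]    # forward/backward stick
        cmd.angular.z = MAX_ANGULAR * msg.axes[0]  # left/right stick
        self.pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("joy_to_twist")
    JoyToTwist()
    rospy.spin()

In such a setup, the DT integrates the received velocities to update the vehicle pose, which is then published back to the server for the Unity applications.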
3
Methodology
The objective of this work is to evaluate the transfer of skills acquired through digital learning with the students of the process planning subject in the Mechanical Engineering Degree of the University of León. For this reason, a methodology is proposed where the three learning methods (i.e., with real equipment, with a computer application, and with VR glasses) are compared in a fair way. To measure the transfer of skills, a practice has been designed where students must drive an industrial vehicle along a preset route from an initial position to a
final goal. In order to evaluate the skills of the students in an objective way, the time spent and the number of failures while addressing the challenge have been measured. The same scenario has been modeled for training in the virtual world in order to make the results comparable. In this way, the behavior of the environment and objects in the virtual world is as close to reality as possible. The study population comprises the students of the process planning course, who are initially evaluated by performing the driving challenge, given the diversity of their initial skills. This initial evaluation is performed in a real laboratory environment with real industrial equipment. After the initial assessment, the population is randomly divided into three groups: one control group (i.e., training with real equipment) and two experimental groups (i.e., training with virtual reality and training with a computer application), as sketched below. The entire population has been trained for five attempts in order to improve their initial skills, regardless of the training method followed. Once the training period was over, we proceeded to the final evaluation in the real environment, where the number of failures and the time taken to run the circuit were measured.
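The experimental design just described can be summarized in code. The following is a minimal sketch, assuming a pandas DataFrame with one row per subject; all column names are illustrative, and the relative time improvement is assumed to be (t_pre - t_post) / t_pre:

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

def assign_groups(subject_ids):
    """Randomly assign subjects to the control and experimental groups."""
    groups = ["real_equipment", "virtual_reality", "computer_app"]
    shuffled = rng.permutation(subject_ids)
    return pd.Series({s: groups[i % 3] for i, s in enumerate(shuffled)},
                     name="group")

def summarize(df):
    """Mean failures/times before and after training, per group, plus the
    assumed relative time improvement (t_pre - t_post) / t_pre."""
    cols = ["failures_pre", "failures_post", "time_pre", "time_post"]
    out = df.groupby("group")[cols].mean()
    rel = (df["time_pre"] - df["time_post"]) / df["time_pre"]
    out["rel_time_improvement"] = rel.groupby(df["group"]).mean()
    return out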
4
Results
Real testing was performed in the E3 Laboratory of the School of Engineering at the University of León, Spain. For this purpose, 54 experimental subjects experienced the three proposed training methodologies (i.e., training with real equipment, training through a computer application, and training through VR goggles), although one of them was the principal methodology with which they carried out the training phase to improve their AGV driving ability. The described experimentation required the definition of the initial and final skills of each experimental subject to fairly compare the results of the experimental groups, since they can have different initial skills when addressing the proposed driving challenge. Thus, we provide in Figs. 2 and 3 a comparison of the mean failures committed while addressing the driving challenge before and after the experimentation, and a comparison of the mean total time employed to complete the challenge proposed to the students in these two situations. As can be seen in Fig. 2, both the VR-trained subjects and those who conducted real-equipment experimentation reduce the number of failures committed between the initial and the final evaluation. However, the subjects trained through a computer application projected on a screen do not attain a reduction of failures after training. This may be due to the necessity of experiencing immersive training to enable the transfer of the skills acquired from training to reality, as proposed in [6]. However, all the training methodologies attain a reduction in the total time employed by their subjects to complete the driving challenge. This is especially remarkable for the subjects of digital training, since they do not use real equipment
Fig. 2. Comparison of the mean number of failures committed by each training methodology before and after training.
Fig. 3. Comparison of the mean time employed in seconds to complete the driving challenge divided by training methodology before and after training.
from the initial to the final evaluation. Therefore, we can validate these digital methodologies as achieving an effective transfer of skills from the digital to the real world for the handling of an industrial AGV. The validation of these digital methods produces numerous benefits, such as enabling delocalized training or reaching independence from the real equipment for operators' training, which reduces the costs associated with the training process.
However, the performance improvement in digital training methodologies presents a higher variance than in training with real equipment. We present these significant differences in the two metrics analyzed for measuring the skills acquired by the experimental subjects through training (i.e., the difference in the errors committed between the initial and the final evaluation, and the relative time improvement to complete the driving challenge). This is presented in Figs. 4 and 5.
Fig. 4. Representation of the distribution of the difference of errors committed by each training methodology between the initial and the final evaluation.
Fig. 5. Relative Time Improvement distribution in the three experimental methodologies.
Figures 4 and 5 show that digital training does not have the same efficiency on human learning as training with real equipment and that the learning depends on the characteristics of the trainee. It can be deduced that some subjects present higher effectiveness when learning through digital methodologies, which opens a new research direction to analyze which kinds of people are more suitable for digital learning; we will address this in our future research. It is also remarkable that the best performances of the VR-trained group exceed the top marks of the real training, both in the reduction of failures and in the relative time improvement, which further increases the interest in analyzing the profiles suitable for digital learning in future studies. Another important fact to discuss is that almost all the experimental subjects attain a reduction of times in the final evaluation, but a considerable number of trainees through digital learning increase the number of failures committed in the final evaluation. We noticed this during the experiments and asked the experimental subjects about it. Many of them told us that they took more risks during training, since they did not have the same risk perception while driving a virtual robot as when driving the real one, and they tended to further explore the limits of the vehicle in the virtual world. This follows the same pattern as the study of Shi et al. [12], which proposed virtual reality to train construction workers. Thus, the increase in the number of failures committed by the digitally trained subjects could also be motivated by their teaching methodology, which will likewise be studied in our future work. Therefore, the results of the proposed experiment not only validate the effectiveness of digital training methodologies for learning the handling of industrial AGVs or other industrial equipment, but also raise interesting questions for future research. One area of interest concerns the identification of suitable individuals for digital training, as their characteristics and traits may influence the outcomes of such training programs. In addition, an essential aspect to explore further is the disparity in risk perception between driving a virtual robot and driving a real one. To further explore these aspects and analyze the results obtained, it would be valuable to employ explainable artificial intelligence (XAI) techniques. By using XAI, we can gain insight into the behavioral patterns of the participants and understand the underlying factors that influence their decision-making processes. These future efforts promise to improve our understanding of the complex dynamics of learning through digital training and to contribute to the advancement of both academia and industry in preparing people for the challenges of the business world in a delocalized way.
5
Conclusions
The transfer of skills in the EHEA is carried out through face-to-face practices in controlled environments. However, these real practices consume considerable human and economic resources, making them difficult to implement in laboratories. In this scenario, ICT is a tool to facilitate access to practical training. This requires the development of applications compatible with multiple interfaces (e.g., virtual reality, smartphone) which simulate the behavior of all the
elements of the real environment. Nevertheless, even with the same application and simulation engine, the user's experience and learning capacity can be modified by the interfaces used. For this reason, a challenge has been developed to evaluate the skills digitally acquired by the students of a mechanical engineering degree course. In this way, the transfer of skills is compared with the experience of the students using two digital technologies: Virtual Reality and a computer application. To obtain the data, a uniform educational challenge has been established so that the results are comparable between the control group and the two experimental groups. The data obtained through this methodology reveal that both virtual reality and the computer application produce an improvement of the students' skills through training methods conducted exclusively in the virtual world. Although the transfer of skills with digital methods is lower than with practices in real environments, digital methods provide other advantages, such as the delocalization of the students or a decrease in the necessary resources, further validating the proposed methodology.
References
1. Ashcroft, L.: Developing competencies, critical analysis and personal transferable skills in future information professionals. Library Review (2004)
2. Cho, M.K., Kim, M.K.: Investigating elementary students' problem solving and teacher scaffolding in solving an ill-structured problem. Int. J. Educ. Math. Sci. Technol. 8(4), 274–289 (2020)
3. Estefo, P., Simmonds, J., Robbes, R., Fabry, J.: The robot operating system: package reuse and community dynamics. J. Syst. Softw. 151, 226–242 (2019)
4. Ferrero-Guillén, R., Díez-González, J., Verde, P., Álvarez, R., Perez, H.: Table organization optimization in schools for preserving the social distance during the COVID-19 pandemic. Appl. Sci. 10(23), 8392 (2020)
5. Kyaw, B.M., Posadzki, P., Paddock, S., Car, J., Campbell, J., Tudor Car, L.: Effectiveness of digital education on communication skills among medical students: systematic review and meta-analysis by the digital health education collaboration. J. Med. Internet Res. 21(8), e12967 (2019)
6. Makransky, G., Petersen, G.B.: The cognitive affective model of immersive learning (CAMIL): a theoretical research-based model of learning in immersive virtual reality. Educational Psychology Review (2021)
7. Martínez-Gutiérrez, A., Díez-González, J., Ferrero-Guillén, R., Verde, P., Álvarez, R., Perez, H.: Digital twin for automatic transportation in industry 4.0. Sensors 21(10), 3344 (2021)
8. Martínez-Gutiérrez, A., Díez-González, J., Verde, P., Ferrero-Guillén, R., Álvarez, R., Perez, H., Vizán, A.: Digital twin for the integration of the automatic transport and manufacturing processes. IOP Conf. Ser. Mater. Sci. Eng. 1193(1), 012107 (2021)
9. Mustafa Kamal, N.N., Mohd Adnan, A.H., Yusof, A.A., Ahmad, M.K., Mohd Kamal, M.A.: Immersive interactive educational experiences–adopting education 5.0, industry 4.0 learning technologies for malaysian universities. In: Proceedings of the International Invention, Innovative & Creative (InIIC) Conference, Series, pp. 190–196 (2019)
10. Rocca, R., Rosa, P., Sassanelli, C., Fumagalli, L., Terzi, S.: Integrating virtual reality and digital twin in circular economy practices: a laboratory application case. Sustainability 12(6), 2286 (2020)
11. Romm, I., Gordon-Messer, S., Kosinski-Collins, M.: Educating young educators: a pedagogical internship for undergraduate teaching assistants. CBE-Life Sci. Educ. 9(2), 80–86 (2010)
12. Shi, Y., Du, J., Ahn, C.R., Ragan, E.: Impact assessment of reinforced learning methods on construction workers' fall risk behavior using virtual reality. Autom. Constr. 104, 197–214 (2019)
13. Siddiq, F., Scherer, R.: Is there a gender gap? A meta-analysis of the gender differences in students' ICT literacy. Educ. Res. Rev. 27, 205–217 (2019)
14. Tomczyk, Ł.: Declared and real level of digital skills of future teaching staff. Educ. Sci. 11(10), 619 (2021)
15. Wang, D., Zhou, T., Wang, M.: Information and communication technology (ICT), digital divide and urbanization: evidence from Chinese cities. Technol. Soc. 64, 101516 (2021)
Educational Innovation Project in the Field of Informatics
Jose Manuel Lopez-Guede1,2(B), Javier del Valle2, Ekaitz Zulueta1,2, Unai Fernandez-Gamiz3, Josean Ramos-Hernanz1,4, Julian Estevez1,5, and Manuel Graña1,6
1 Computational Intelligence Group, University of the Basque Country (UPV/EHU), San Sebastian, Spain
2 Department of Automatic Control and System Engineering, Faculty of Engineering of Vitoria-Gasteiz, University of the Basque Country (UPV/EHU), C/Nieves Cano 12, 01006 Vitoria-Gasteiz, Spain
[email protected]
3 Department of Nuclear Engineering and Fluid Mechanics, Faculty of Engineering of Vitoria-Gasteiz, University of the Basque Country (UPV/EHU), C/Nieves Cano 12, 01006 Vitoria-Gasteiz, Spain
4 Department of Electrical Engineering, Faculty of Engineering of Vitoria-Gasteiz, University of the Basque Country (UPV/EHU), C/Nieves Cano 12, 01006 Vitoria-Gasteiz, Spain
5 Department of Mechanical Engineering, Faculty of Engineering of Gipuzkoa, University of the Basque Country (UPV/EHU), C/Europa Plaza 1, 20018 Donostia, Spain
6 Department of Computer Science and Artificial Intelligence, Faculty of Informatics, University of the Basque Country (UPV/EHU), C/Paseo Manuel de Lardizabal 1, 20018 Donostia, Spain
Abstract. Active Learning is a well-known and very promising approach to boost learning processes driven by the students. In this paper, a novel Educational Innovation Project is described in order to solve several problems found in two specific subjects related to Computer Architecture. More specifically, the project is based on Research-Based Learning, a less-known paradigm that encourages students to address the gaps between the theoretical information received in classical lessons and real practice by looking for information in rigorous scientific sources. The project is described, including its objectives, Sustainable Development Goals, transversal competences, improvements to be implemented, teaching-learning methodology and temporal scheduling.
Keywords: Education Innovation Project · Research Based Learning · Computer Architecture
1
Introduction
Active Learning is a well-known and broad paradigm which includes several methods based on the main role of the students in their own learning process [1,2]. One
of the most used of these methods is Cooperative Learning, a paradigm where the learning activities are planned looking for positive interdependence between the participants [3,4]. Despite its good results, it would be convenient to train the students to develop their work investigating original research resources, so an educational innovation project in the field of Computer Architecture (at the University of the Basque Country) has been proposed, and introducing that proposal is the main aim of this paper. The remainder of the paper is organized as follows. Section 2 introduces the justification of the project, giving its objectives, sustainable development goals, transversal competences to be developed and the improvements in the teaching-learning processes to be reached. The scope of the intervention is given in Sect. 3, while Sect. 4 gives the project development proposal. Finally, Sect. 5 explains our main conclusions.
2
Justification of the Project and Description of the Objectives
In the study plan of the degrees, the Computer Architecture subject is terminal in the sense that it is the last subject that deals with issues related to hardware, when students at the start of the degree are actually more oriented towards software. For years it has been detected that this is a problem and makes learning difficult for students. This educational innovation project aims at learning by doing within the context of the COVID-19 pandemic, specifically by making CO2 level meters intended for the center's classrooms, generating air currents only when necessary and thus causing only the strictly necessary discomfort in the classrooms, since the climate in Vitoria-Gasteiz (North of Spain) is not warm. In this way, the students will see that the contents of the subject are directly applicable to real life, as sketched below. Subsection 2.1 indicates the sustainable development goals incorporated into the proposal, Subsect. 2.2 enumerates the transversal competences which would be developed, and Subsect. 2.3 compiles the list of improvements that would be introduced in the teaching-learning process if the project is carried out.
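As a purely illustrative companion to the learning-by-doing goal, the following is a minimal, hypothetical sketch of the intended meter behavior: read CO2 readings from a serial-connected sensor and signal when ventilation is needed. The port, the one-value-per-line protocol and the 800 ppm threshold are assumptions, not the project's actual design:

import serial  # pyserial

VENTILATION_THRESHOLD_PPM = 800  # assumed comfort/air-quality threshold

def monitor(port="/dev/ttyUSB0", baud=9600):
    """Continuously read ppm values from the sensor and report status."""
    with serial.Serial(port, baud, timeout=2) as sensor:
        while True:
            line = sensor.readline().decode("ascii", errors="ignore").strip()
            if not line.isdigit():
                continue  # skip incomplete or malformed readings
            ppm = int(line)
            if ppm > VENTILATION_THRESHOLD_PPM:
                print("CO2 at %d ppm: ventilation needed" % ppm)
            else:
                print("CO2 at %d ppm: air quality OK" % ppm)

if __name__ == "__main__":
    monitor()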
2.1 SDGs Incorporated into the Project
The Sustainable Development Goals (SDGs) that are incorporated into the project are the following:
– 3. Health and well-being: since it has a direct use in the current pandemic situation,
– 4. Quality education: since it is intended that students learn by doing, working with active methodologies,
– 10. Reduction of inequalities: since it is intended to implement devices with a much lower price than the commercial ones on the market, facilitating their acquisition by people and institutions with fewer available resources,
– 12. Responsible production and consumption: since the construction of the devices can be carried out using components from previous designs that are no longer being used, and once the meters are no longer of interest, the discrete components used can be reused in later designs. In this way, recycling is facilitated and responsible consumption becomes a reality.
In addition, the environmental footprints will be calculated using the OpenLCA software and the Ecoinvent 3 database, using different calculation methods (ReCiPe, Aware, CML-Baseline, Cumulative Energy Demand and IPCC).
2.2 Transversal Competences Developed by the Project
The transversal competences developed by the project will be as follows:
– Autonomy and Self-regulation: since the work will be carried out in groups, where self-management will be encouraged,
– Social Commitment: since the work to be carried out, even though it is eminently technical, has a clear social vocation in trying to improve people's health conditions,
– Ethics and Professional Responsibility: since the students use their own technical knowledge (which will be worked on and improved through this activity) to generate measurement instruments accessible to the general public without obtaining excessive benefits (actually none of an economic nature),
– Innovation and Entrepreneurship: since the working groups that are formed must investigate the latest technical advances and work proactively to carry out their respective implementations,
– Critical Thinking: since the students will have to analyze different possibilities, weighing the pros and cons of each one of them,
– Teamwork: since the work to be implemented is developed through cooperative learning techniques that are implemented through work groups,
– Oral and written communication: since they are the vehicles through which the results achieved will be communicated to the rest of the class and to the teaching staff.
2.3 Improvements in the Teaching-Learning Processes
Finally, we give a list of the improvements that will be introduced in the teaching-learning processes:
– A component clearly focused on the SDGs will be introduced for the first time in the course,
– A series of transversal competences will be worked on,
– The students will be able to learn part of the specific competences of the subject by doing something real and useful, both for the classroom and for the world outside of it, minutes after having carried out the implementation. This distinguishes this action from the rest of the activities of the subject, since those are expected to be useful, in the best of cases, at the earliest within the degree, being in any case a future event.
3
Scope of Intervention
In this section the authors briefly indicate the subjects and their data on which action would be taken if this project proposal were executed, as shown in Tables 1 and 2.
Table 1. Subject to be re-designed in the 2nd course
University: University of the Basque Country
Degrees: Bachelor's Degree in Computer Management and Information Systems Engineering
Faculty: Faculty of Vitoria-Gasteiz
Department: Department of Systems Engineering and Control
Subject: Computer Architecture
Course year: 2nd
Re-designed ECTS: 1 ECTS
Table 2. Subject to be re-designed in the 3rd course
University: University of the Basque Country
Degrees: Bachelor's Double Degree in Computer Engineering and Business Management and Administration
Faculty: Faculty of Vitoria-Gasteiz
Department: Department of Systems Engineering and Control
Subject: Computer Architecture
Course year: 3rd
Re-designed ECTS: 1 ECTS
4
Project Development Proposal
This section is devoted to explaining the development of the proposed project. More specifically, Subsect. 4.1 develops the idea of using Research-Based Learning as the core of the project, Subsect. 4.2 enumerates the tasks which compose the project and, finally, Subsect. 4.3 gives the chronological organization for developing these tasks.
4.1 Teaching-Learning Methodology
The group of teachers participating in this project already has some experience using active methodologies, as evidenced by the coordination of and participation in several educational innovation projects, as well as the implementation of these and other related activities. For the development of this project, we have decided to use the active Research-Based Learning (RBL) methodology. The reasons are as follows:
– As far as the project team knows, this methodology has not been used so far in the two degrees in which it will be implemented,
– It is a methodology that allows merging theory and practice,
– It is an interesting first approach to the world of research for 2nd and 3rd year students, since it is increasingly common for them to take master's and doctorate studies. Even in companies it is a profile that is increasingly valued,
– Despite its relative novelty compared to other methodologies, a wide range of positive experiences has already been reported in several fields of knowledge. Regarding the field of Engineering, there are also recent successful experiences. Specifically, and since this project is developed from a technical point of view in the field of computing/electronics/control, there are bibliographic references that report its successful application in the fields of Computer Engineering [5] in 2019, Electronic Engineering [6] in 2020 and Control Engineering [7] in 2020. These are just some examples, since there are other successful cases of application,
– It is also a methodology that has been used in experiences focused on Sustainable Production Engineering [8] in 2015, so we appreciate that it can be a facilitating methodology for the inclusion of SDGs in the project.
4.2 Description of the Actions to Be Developed
Regarding the tasks to be carried out in working groups, they can be summarized as follows:
– Task 1: Definitively determine scope and evaluation. In this task, the SDGs and transversal competences to be addressed are definitively determined (which a priori are those specified previously), as well as the way of carrying out the evaluation. For this, all the teachers of the team will be available. It entails the modification of the Teaching Guide.
– Task 2: Carry out a workshop on the search for scientific information. In this task, the aim is to train students in the search for proven scientific information.
– Task 3: Carry out a workshop on fluid mechanics. Since the project is about making a meter of CO2 (a fluid), there are specialized teachers in Fluid Mechanics, so that students are trained in this issue.
– Task 4: Carry out a workshop on sustainability and energy footprint. In this task, the teacher specialized in SDGs and circular economy is involved in training students in the field of SDG 12 (Responsible production and consumption) and environmental footprints.
– Task 5: Search for scientific information. In this task, the groups must search for scientific information specific to the task to be carried out, with the training previously received. For this task they will have the teachers of the subject.
– Task 6: Determine alternatives and choose the best one. In this task, students must reason and, with critical thinking, order and classify the options or alternatives detected after searching for information, to end up choosing the most convenient one. For this, they will have the teachers of the subject and the specialist in SDGs and circular economy, taking into account the calculation of the footprint.
– Task 7: Implement the chosen alternative. In this task, the students will carry out the implementation of the chosen alternative, taking into account that it is necessary to collect the required material, even purchasing components if they cannot be obtained through reuse. For this, apart from the teachers of the subject, they will have the help of teachers specialized in Electrical Engineering. There is also the laboratory manager for student access to the laboratory and material.
– Task 8: Communicate the result of the work carried out. The different groups will present the work carried out to the rest of the class through an oral presentation and a demonstration of the developed product, as well as to the teachers of the subject through a written report, all of which serves for evaluation. It entails the delivery of a follow-up report of the intervention.
– Task 9: Evaluate the learning result. With the work already done by the students, the learning outcome of the intervention will be measured. For this, there will be external teaching staff unrelated to the previous tasks.
– Task 10: Publish results in a congress. If the whole process turns out as expected, the results of the experience are expected to be published by the teaching members interested in it. It entails the delivery of the final report.
4.3
Work Planning
The general temporal conditions that we have planned are the following:
– It is expected that the resolution of the call will be known by the end of May of the academic year,
– The modification of the teaching guide will be made with a deadline of July of the academic year,
Table 3. Hours per task for students
Task 1: (no student hours)
Task 2: 1 h
Task 3: 1 h
Task 4: 1 h
Task 5: 5 h
Task 6: 2 h
Task 7: 12 h
Task 8: 3 h
Task 9: (no student hours)
Task 10: (no student hours)
TOTAL: 25 h
– The intervention will be carried out in the first four-month period, so the field work will begin in September of the academic year and end in January of the next academic year,
– The monitoring report will be delivered by February of the next academic year,
– The analysis of results will be delivered by December of the next academic year.
The development time of the tasks (time that each student will dedicate to each task) is shown in Table 3. As can be seen, the total time to be dedicated by the students is equivalent to 1 ECTS (25 real hours, taking into account classroom and personal work outside the classroom). Finally, the time schedule for each of the tasks is shown in Fig. 1 as a Gantt diagram, indicating the month within each of the two years involved.
Fig. 1. Gantt chart of the project
5
Conclusions
The paper started by giving an introduction to active learning and its advantages, since the main aim of this work is to describe a new project based on it. Such a project has been justified by stating its objectives, the sustainable development
goals to be incorporated, the transversal competences to be developed and the improvements in the teaching-learning processes to be incorporated. The project development proposal for two subjects of different bachelor's degrees has also been exposed, detailing the teaching-learning methodology to be used and the actions to be developed by means of a detailed work plan. In short, this paper describes the road-map of an educational innovation project which bases its novelty on the research-based learning approach and hopes to obtain promising results after its implementation. Acknowledgments. This work and the described project were supported by the 3a Convocatoria de Proyectos de Innovación I3KD - I3KD Laborategia, project i3kd22-23, research funds from the University of the Basque Country. The authors have also received support from Fundacion Vitoria-Gasteiz Araba Mobility Lab, as well as from FEDER funds for the MICIN project PID2020-116346GB-I00, and from the Basque Government for the Grupo de Inteligencia Computacional, with grant code IT1689-22, and Elkartek projects KK-2022/00051 and KK-2021/00070.
References
1. Bonwell, C., Eison, J.: Active learning: creating excitement in the classroom. ASHE-ERIC higher education report, no. 1 (1991)
2. Felder, R.M., Brent, R.: Cooperative learning in technical courses: procedures, pitfalls, and payoffs (1994)
3. Felder, R., Brent, R.: Effective strategies for cooperative learning. J. Cooper. Collabor. Coll. Teach. 10(2), 69–75 (2001)
4. Felder, R.M., Brent, R.: Active learning: an introduction. ASQ High. Educ. Brief 2(4), 1–5 (2009)
5. Noguez, J., Neri, L.: Research-based learning: a case study for engineering students. Int. J. Interact. Design Manufact. (IJIDeM) 13, 12 (2019)
6. Setiawan, A.W.: Development of research-based learning in introduction to biomedical engineering course for undergraduate electrical engineering students. In: 2020 10th Electrical Power, Electronics, Communications, Controls and Informatics Seminar (EECCIS), pp. 273–277 (2020)
7. Hernández, A.M.O., Castellanos, T.E., Vega, M.A., Landa, F.A.U., Vasquez, L.V., Guerrero, L.E.V.: Research-based learning to attract students to control engineering. In: 2020 IEEE Global Engineering Education Conference (EDUCON), pp. 1451–1457 (2020)
8. Blume, S., Madanchi, N., Bohme, S., Posselt, G., Thiede, S., Herrmann, C.: Die Lernfabrik - research-based learning for sustainable production engineering. Procedia CIRP 32, 126–131 (2015). 5th Conference on Learning Factories. https://www.sciencedirect.com/science/article/pii/S221282711500195X
Explainable Artificial Intelligence for Education: A Real Case of a University Subject Switched to Python
Laura Melgar-García1, Ángela Troncoso-García1, David Gutiérrez-Avilés2, José Francisco Torres1, and Alicia Troncoso1(B)
1 Data Science & Big Data Lab, Pablo de Olavide University, 41013 Seville, Spain
{lmelgar,artrogar,jftormal,atrolor}@upo.es
2 Department of Computer Science, University of Seville, Seville, Spain
[email protected]
Abstract. Explainable artificial intelligence aims to describe an artificial intelligence model and its predictions. In this research work, this technique is applied to a subject of a Computer Science degree where the programming language was changed from Octave to Python. Experiments are performed to analyze explainability using the SHapley Additive exPlanations algorithm for an XGBoost regressor model (for numerical grade prediction) and an XGBoost classifier model (for class grade prediction). After the validation and training process, several conclusions are drawn that support the idea of changing the programming language to a more popular one such as Python. For example, regarding the classification problems, the most important feature for the insufficient class in the Octave courses is the practical exam.
Keywords: Explainable artificial intelligence · university education · programming language
1
Introduction
These days, technology and artificial intelligence techniques are in the spotlight of the education industry because of their potential to transform traditional forms of education [22]. ChatGPT is a language model based on the Generative Pretrained Transformer, from which it takes its name, that answers users' questions through a conversation using prompts [9]. A UNESCO guide [19] states that there is major concern about this technique in higher education with reference to academic integrity. Guidance and recommendations on the ethics of artificial intelligence will also be provided shortly by UNESCO [19]. In this context, it is essential to understand the new directions of the education system. Explainable artificial intelligence in education is increasing in
popularity [8]. One of its main objectives is to bring the meaning of these techniques closer to whoever may be interested, so that it is easy to understand why an artificial intelligence model proposes one prediction or another [6]. In this research work, an explainable artificial intelligence technique is used to better understand the influence of a change in the practical teaching of the Artificial Intelligence subject at a Spanish university. Specifically, the programming language of this subject of the Computer Engineering and Information Systems academic degree was changed from Octave to Python. Therefore, an analysis is carried out on which features most influence the performance of students when working with Octave and Python. It is important to consider that this study is conducted on a limited number of academic courses and students. This research contributes to the literature on machine learning in education [15]. The rest of the article is structured as follows: Sect. 2 presents the current state of the art of artificial intelligence in education and specifically in explainable artificial intelligence. Section 3 describes the methodology of this work. Section 4 discusses the results obtained and Sect. 5 summarizes all the ideas behind the study developed.
2
Related Works
In 2022, the authors of [16] stated that the increasing number of students dropping out of Computer Science university degrees is a worldwide phenomenon. Several approaches are being developed with the main goal of preserving university education. For example, the European Higher Education Area recommended some activities to enhance students' motivation, such as active learning methods, ongoing assessments, mentoring or small groups [10]. The latter activity was implemented in a Computer Science degree at a technical university in Spain in [11]. The authors concluded that students rated almost all motivational indicators with a three on a one-to-four scale, which was a large increase. UNESCO identified artificial intelligence as a "potential" contribution to innovating teaching and learning practices by publishing a guide for the education community [18]. In [21], the authors reviewed the current state of artificial intelligence approaches in education, concluding that much research remains to be conducted in this area. In addition, they found that educators are unclear about how to use the existing techniques to gain a pedagogical advantage in higher education. In this context, [13] confirmed that there were few real implementation studies. An emerging technique in artificial intelligence is explainable artificial intelligence, or XAI. The principal goal of XAI is to achieve accurate and explainable or understandable predictions. Explanation in education is very important for feedback to both students and teachers, as described in [8]. Students' success in secondary education was studied in [5] with the LIME interpretable algorithm for several classification models such as logistic regression, k-nearest neighbors or XGBoost. Using LIME, they were able to understand what caused the forecasts to be the way they were. A review of XAI techniques for student performance prediction models was presented in [2]. In [4], career counseling for effective decision making using explainable and interpretable machine learning models was studied.
3
Methodology
This section introduces the two main concepts of this research work: explainable artificial intelligence and the XGBoost model, in Sects. 3.1 and 3.2. The motivation for this study is described in Sect. 3.3. Finally, an overview of the methodology followed is given in Sect. 3.4.
3.1 Explainable Artificial Intelligence
Explainable artificial intelligence, also known as XAI, is a current trend in artificial intelligence (AI). It is used to provide explanations and describe an AI model as well as its results [6]. In addition, using XAI helps to understand and ensure that the system works as expected. SHapley Additive exPlanations, or SHAP, is a well-known feature relevance approach to explain the outputs of a machine learning model [14]. It is a game-theory-based strategy that generates local and global explanations. SHAP explains a specific model's output by quantifying the contribution of each input feature to a prediction [1].
3.2 Models
Decision trees are a popular supervised predictive model for classification and regression problems. The machine learning models used in this research work are based on decision tree ensembles, i.e., combinations of them, usually known as GBDT or Gradient Boosting Decision Trees. Specifically, the chosen algorithm is called XGBoost, an abbreviation of eXtreme Gradient Boosting, which is an implementation of parallel boosted tree algorithms [7]. Each tree of the XGBoost algorithm is trained on a subset of the data. The final prediction is a combination of the individual predictions of each tree [3]. XGBoost can be applied to regression and classification problems depending on the target outcome variable. It has several important hyperparameters that need to be fine-tuned to obtain the best training model. Some of them are the learning rate, the maximum depth of the decision trees and the number of estimators (number of trees in the ensemble) [3].
3.3 Motivation: Change of Programming Language
The motivation for conducting this study is to draw conclusions about the behavior of students in a subject of the Computer Engineering and Information Systems degree. In particular, the practical classes of this subject were switched from the Octave programming language to the Python programming language. The programming language of the subject was changed due to the fact that Python is a very popular programming language these days, as shown by the TIOBE Index [17]. This index is an indicator of the popularity of programming languages. In April 2023, the first-ranked programming language according
to TIOBE is Python, maintaining the same position as a year earlier. However, Octave is not even among the top 100 programming languages in the TIOBE Index. Octave is almost fully compatible with Matlab, apart from specific technical toolboxes [20]; Matlab ranks as the 14th programming language in April 2023 and the 20th in April 2022. This change is motivated not only by the current popularity of Python, but also by its versatility, its open-source nature and its simplicity of coding and interpretation.
3.4 Overview of the Study
This research is divided into four independent experiments. Individual conclusions are drawn from each experiment to discuss a common final idea about the change of the programming language in the studied subject. The experiments developed are as follows:
– E1: Academic years 2018-2019, 2019-2020 and 2020-2021. The subject was taught in the Octave programming language. XGBoost for regression is applied, considering the target value as the final numerical grade achieved.
– E2: Academic years 2021-2022 and 2022-2023. The course was taught in the Python programming language. XGBoost for regression is applied, considering the target value as the final numerical grade achieved.
– E3: Academic years 2018-2019, 2019-2020 and 2020-2021. The subject was taught in the Octave programming language. XGBoost for classification is applied, considering the target value as the final categorical grade achieved, i.e., insufficient (numerical grade lower than 5/10), sufficient (numerical grade between 5/10 and 6.9/10), notable (numerical grade between 7/10 and 8.9/10) and outstanding (numerical grade higher than 9/10).
– E4: Academic years 2021-2022 and 2022-2023. The subject was taught in the Python programming language. XGBoost for classification is applied, considering the target value as the final categorical grade achieved. The categorical grades are the same as those defined in E3.
Depending on the chosen experiment, the selected output target is either composed of numerical grades and the XGBoost is a regression problem (E1 and E2) or composed of class grades and the XGBoost is a classification problem (E3 and E4). Then, the entire dataset is split randomly into 70% and 30% for training and test data, respectively. The training phase of the corresponding XGBoost model is performed through a cross-validation process in which the parameters are adjusted considering a number of folds on the training data. Once the best parameters are selected, the evaluation metrics of the trained model on the unseen test data are computed. Finally, the explainability phase of the methodology is initiated. In particular, the SHAP algorithm is applied to the trained model on the training dataset. Therefore, each instance has its corresponding explainability features that must be studied to understand and comprehend both the trained model and the predicted instances. The main steps of the methodology followed in this research work can be found in Algorithm 1, and a sketch of a possible implementation is given after it.
Algorithm 1: Overview of the methodology
(70% data_training, 30% data_test) ← Split dataset
trained_model ← GridSearch_CrossValidation(XGBoost, data_training)
evaluation_metrics ← trained_model(data_test)
Explainability features of each instance ← SHAP(trained_model, data_training)
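The following is a minimal sketch of one possible Python implementation of Algorithm 1 for the regression case (E1/E2), using the xgboost, scikit-learn and shap packages; the parameter grid and seed are illustrative, and the classification case (E3/E4) would swap in xgb.XGBClassifier, a stratified split and accuracy scoring:

import shap
import xgboost as xgb
from sklearn.metrics import explained_variance_score, mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split

def run_regression_experiment(X, y, seed=0):
    # Reproducible 70/30 split of the dataset
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)
    # Four-fold grid-search cross validation over the main hyperparameters
    grid = {"learning_rate": [0.05, 0.1, 0.3],
            "max_depth": [2, 4, 6],
            "n_estimators": [50, 100, 200]}
    search = GridSearchCV(xgb.XGBRegressor(random_state=seed), grid,
                          cv=4, scoring="neg_mean_absolute_error")
    search.fit(X_tr, y_tr)
    model = search.best_estimator_
    # Evaluation metrics on the unseen test data
    pred = model.predict(X_te)
    print("MAE:", mean_absolute_error(y_te, pred))
    print("Explained variance:", explained_variance_score(y_te, pred))
    # Explainability phase: SHAP values for the training set
    shap_values = shap.TreeExplainer(model).shap_values(X_tr)
    shap.summary_plot(shap_values, X_tr)  # summary plots as in Fig. 1
    return model, shap_values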
4
Results
This section describes the results obtained. In particular, the dataset used is presented in Sect. 4.1 and the discussion of the results is in Sect. 4.2.
4.1 Dataset Description
The subject of study of this research is Artificial Intelligence. It is a compulsory subject in the last year of the Computer Engineering and Information Systems academic degree at Pablo de Olavide University in Spain. Prior to the final year of the program, students have three full academic years in which they learn different programming languages. During the first year, they focus on learning the fundamentals of programming through C and Java. Afterwards, they study Matlab and R, among other programming languages. The Artificial Intelligence subject focuses on the main concepts of a basic artificial intelligence and machine learning model. Specifically, the course begins by introducing the cost function and the gradient descent algorithm. Then, the algorithm is fitted to different models during the development of the subject, such as linear regression, logistic regression, neural networks or clustering. The course is evaluated with four different exams: practical midterm, theoretical midterm, practical final and theoretical final. Students must obtain a minimum grade of three out of ten in the average grade between the two practical tests and also a minimum of three out of ten in the average grade between the two theoretical tests. The final grade is then computed as the average of the four exams. The dataset used consists of five academic years: 2018-2019, 2019-2020, 2020-2021, 2021-2022 and 2022-2023. During the first three years, practical assignments were taught in the Octave programming language. During the last two academic years, the programming language used in the course has been Python. The motivation for this change of programming language is explained in Sect. 3.3 and the description of the four experiments carried out is in Sect. 3.4. Therefore, the dataset contains five features, i.e., the four exams and the academic year. In this research work, the dataset is preprocessed so that it does not include students who did not attend the subject exams. Table 1 describes, with mean and standard deviation metrics, the grades achieved by the students for the academic courses taught in Octave, Python and both.
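The grading rule just described, together with the categorical classes used in E3 and E4, can be made precise with a short sketch; how a grade below the minimum threshold is recorded is an assumption, since the text does not specify it:

def final_grade(pm, tm, pf, tf):
    """Average of the four exams (0-10 scale), subject to a minimum of
    3/10 in both the practical and the theoretical averages."""
    if (pm + pf) / 2 < 3 or (tm + tf) / 2 < 3:
        return None  # assumption: minimum-grade requirement not met
    return (pm + tm + pf + tf) / 4

def categorical_grade(grade):
    """Map a numerical grade to the classes used in experiments E3 and E4."""
    if grade < 5:
        return "insufficient"
    if grade < 7:
        return "sufficient"
    if grade < 9:
        return "notable"
    return "outstanding"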
Table 1. Summary of average and standard deviation (in parentheses) grades
Exam                  Octave        Python        Both
Practical midterm     7.68 (±1.76)  5.65 (±2.44)  6.82 (±2.29)
Theoretical midterm   6.17 (±1.77)  5.18 (±2.74)  5.75 (±2.28)
Practical final       5.76 (±2.92)  5.68 (±2.41)  5.73 (±2.71)
Theoretical final     4.63 (±2.45)  4.07 (±2.86)  4.39 (±2.64)
Final grade           5.90 (±1.59)  5.17 (±1.79)  5.59 (±1.71)
4.2 Discussion of the Results
This section discusses the results obtained after applying SHAP modeling to the XGBoost regressor and classifier algorithms. First, the complete dataset is divided randomly, ensuring reproducibility and considering stratification, into approximately 70% for the training set and the remainder for the test set. Then, grid search cross-validation is performed on the training set to fit the models described in Sect. 3.2. Specifically, four cross-validation folds are used. The learning rate, number of estimators and maximum depth of the models are tuned. Experiments E1 and E2: Target Is Numerical. The developed XGBoost regression model is mainly aimed at predicting the final numerical grade of the students considering the dataset characteristics. This experiment is subdivided into two different ones, i.e., the coding program is Octave or the coding program is Python. These experiments correspond to E1 and E2 defined in Sect. 3.4, respectively. Regarding E1, the mean absolute error [12] metrics obtained after the validation process are 0.05 for the whole training set (including the validation set) and 0.26 for the test set. The explained variance scores achieved also demonstrate the good performance of the model, with 0.98 for the training set and 0.94 for the test set, considering that the best possible score is 1.0. Regarding E2, the mean absolute error is 0.03 for the training set and 0.35 for the test set. The explained variance scores are 0.98 for the training set and 0.93 for the test set. Figures 1a and e show the bar charts of the influence of each feature with respect to the mean of its absolute SHAP value for the training sets of E1 and E2, respectively. The features are ordered by their importance in the model predictions. The theoretical final test is the most important feature in both experiments. However, the influence of the theoretical midterm exam is much greater in the academic years when the subject was taught in Python. One hypothesis in this regard may be that students found it more difficult to study Octave than Python, which made them focus more on the practical final exam of Octave. The same inferences can be drawn by looking at Fig. 1c, which represents the summary plot of the impact on the model output considering the SHAP value for the training set of E1. For example, a low theoretical final grade has a negative impact on the output of the model, i.e., lower final grades are obtained.
Figures 1b, d and f are the SHAP feature importance representations for the test sets of the mentioned experiments. The test set behaviors are similar to those of the training sets. Therefore, the prediction models are validated not only with the evaluation metrics detailed above, but also with these figures.
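Graphs like those in Fig. 1 can be generated directly from the fitted model with the shap library. The sketch below reuses the `model`, `X_train` and `X_test` names assumed above; the choice of TreeExplainer is the usual one for gradient-boosted trees, although the paper does not state which explainer was used.

```python
import shap

# TreeExplainer handles tree ensembles such as XGBoost models.
explainer = shap.TreeExplainer(model)

# Mean-|SHAP| bar chart (Figs. 1a/b style) and beeswarm summary plot
# (Figs. 1c/d style), produced for both the training and test sets.
for Xs in (X_train, X_test):
    shap_values = explainer.shap_values(Xs)
    shap.summary_plot(shap_values, Xs, plot_type="bar")
    shap.summary_plot(shap_values, Xs)
```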
Fig. 1. SHAP graphs for E1 and E2 experiments: (a) bar chart for E1 training; (b) bar chart for E1 test; (c) summary plot for E1 training; (d) summary plot for E1 test; (e) bar chart for E2 training; (f) bar chart for E2 test
Experiments E3 and E4: Target Is Letter. Experiments E3 and E4 correspond to the XGBoost classifier model. In these cases, the main goal is to classify the final grades of the students into four categories: insufficient, sufficient, notable and outstanding. E3 refers to the period when the subject was taught in Octave and E4 to when it was taught in Python. The accuracy achieved for E3 is 0.90 for the training set and 0.71 for the test set. For E4, the accuracy is 0.91 and 0.80 for the training and test sets, respectively.

Summary bar charts of the importance of each feature, measured by the mean absolute SHAP value for the training sets, are shown in Figs. 2a and b for E3 and E4. In general terms, the same conclusions as in the regression problems can be drawn from these figures, i.e., both practical exams (midterm and final) are more important features when the course was taught in Octave than when it was taught in Python. Therefore, it can be argued that students understood Python more easily than Octave. Figures 2c and d are the representations for the test sets. Their behaviors are similar to those of the respective training sets, thus providing another means of model validation.
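For E3 and E4 the numerical target is replaced by the four letter-grade classes. A minimal sketch of the corresponding classifier follows, again under assumed names and an assumed grid; `df` and `features` are as in the regression sketch above.

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Letter grades (insufficient..outstanding) encoded as integers 0..3.
le = LabelEncoder()
y_cls = le.fit_transform(df["letter_grade"])  # assumed column name

X_train, X_test, y_train, y_test = train_test_split(
    df[features], y_cls, test_size=0.3, random_state=0, stratify=y_cls)

grid = GridSearchCV(
    XGBClassifier(),
    param_grid={"learning_rate": [0.05, 0.1, 0.3],
                "n_estimators": [100, 200, 500],
                "max_depth": [2, 4, 6]},
    cv=4)
grid.fit(X_train, y_train)

clf = grid.best_estimator_
print("train acc:", accuracy_score(y_train, clf.predict(X_train)))
print("test acc:", accuracy_score(y_test, clf.predict(X_test)))
```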
Fig. 2. SHAP graphs for E3 and E4 experiments: (a) bar chart for E3 training; (b) bar chart for E3 test; (c) summary plot for E4 training; (d) summary plot for E4 test
Tables 2a and b present the feature importances according to the SHAP values for the entire dataset of E3 and E4, respectively. The feature with the highest influence on the insufficient class in experiment E3 is the practical midterm exam, while in E4 it is the theoretical final exam. This means that the exam that caused students to fail the most was the practical midterm for the course taught in Octave. For the sufficient class, the most influential feature is the theoretical final exam for E3 and the theoretical midterm for E4. For the notable class, it is the theoretical final for E3 and the theoretical midterm for E4. The outstanding grade is only achieved in the academic years with Octave, and it is only influenced by the theoretical final exam, as shown in Figs. 2a and b and Table 2a.

Table 2. Importance features according to SHAP values for the whole dataset

(a) E3 experiment

Exam                  Insufficient   Sufficient   Notable   Outstanding
Practical midterm     0.051          0.041        0.0       0.0
Theoretical midterm   0.006          0.014        0.008     0.0
Practical final       0.008          0.044        0.029     0.0
Theoretical final     0.047          0.067        0.097     0.011

(b) E4 experiment

Exam                  Insufficient   Sufficient   Notable
Practical midterm     0.042          0.019        0.017
Theoretical midterm   0.026          0.055        0.049
Practical final       0.0            0.015        0.015
Theoretical final     0.081          0.042        0.016
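The per-class importances reported in Table 2 correspond to the mean absolute SHAP value of each feature within each class. A way this could be computed for the fitted classifier is sketched below, continuing the assumed names from the previous sketches; the normalization across shap versions is our own handling, not the authors' code.

```python
import numpy as np
import pandas as pd
import shap

explainer = shap.TreeExplainer(clf)
X_all = df[features]               # SHAP over the whole dataset, as in Table 2
sv = explainer.shap_values(X_all)

# Depending on the shap version, multiclass output is either a list of
# per-class (samples, features) arrays or one (samples, features, classes)
# array; normalize to the latter.
sv = np.stack(sv, axis=-1) if isinstance(sv, list) else np.asarray(sv)

# Mean |SHAP| per feature and class: rows = exams, columns = classes
# (column order follows the label encoder, not Table 2).
importances = np.abs(sv).mean(axis=0)
print(pd.DataFrame(importances, index=features, columns=le.classes_).round(3))
```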
5 Conclusions
Adapting the contents of academic courses over the years is essential to increase students' motivation and to give them a realistic view of the industry outside the educational system. In line with this, the programming language of the Artificial Intelligence course, belonging to the last year of an academic degree at a Spanish university, has been modified. Prior to the modification, the practical assignments and tests were taught in Octave; they were then switched to Python, one of the most popular programming languages today. Among other reasons, Python is easy to interpret, easy to code and open source.

This research work has analyzed, through explainable artificial intelligence with XGBoost and SHAP, the success of students in this subject. The three academic years in which Octave was taught were compared with the two years in which Python was taught. The analysis has shown that theoretical exams have been more important features than practical exams in the courses with Python, leading to the interpretation that students understood Python more easily and therefore focused more on the theoretical tasks than on the practical ones. In addition, the feature with the highest importance for the insufficient class in the courses with Octave has been the practical midterm exam, whereas for the Python courses it has been the theoretical final exam.

The educators of the subject propose to keep its contents as up to date as possible with the reality of the industry sector. In addition, we expect to conduct a more detailed study taking into account not only the students' performance but also their responses to the subject's surveys over the years.

Acknowledgments. The authors would like to thank the Spanish Ministry of Science and Innovation for the support under the projects PID2020-117954RB-C21 and TED2021-131311B-C22 and the European Regional Development Fund and Junta de Andalucía for projects PY20-00870 and UPO-138516.
References

1. Saranya, A., Subhashini, R.: A systematic review of explainable artificial intelligence models and applications: recent developments and future trends. Decis. Anal. J. 7, 100230 (2023)
2. Alamri, R., Alharbi, B.: Explainable student performance prediction models: a systematic review. IEEE Access 9, 33132–33143 (2021)
3. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), pp. 785–794. Association for Computing Machinery, New York, NY, USA (2016)
4. Guleria, P., Sood, M.: Explainable AI and machine learning: performance evaluation and explainability of classifiers on educational data mining inspired career counseling. Educ. Inf. Technol. 28, 1081–1116 (2022)
5. Hasib, K.M., Rahman, F., Hasnat, R., Alam, M.G.R.: A machine learning and explainable AI approach for predicting secondary school student performance. In: 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC), pp. 0399–0405 (2022)
6. IBM: Explainable AI (XAI). https://www.ibm.com/watson/explainable-ai. Accessed 30 Apr 2023
7. Jas, K., Dodagoudar, G.: Explainable machine learning model for liquefaction potential assessment of soils using XGBoost-SHAP. Soil Dyn. Earthq. Eng. 165, 107662 (2023)
8. Khosravi, H., et al.: Explainable artificial intelligence in education. Comput. Educ. Artif. Intell. 3, 100074 (2022)
9. Lo, C.K.: What is the impact of ChatGPT on education? A rapid review of the literature. Educ. Sci. 13(4), 410 (2023)
10. López, M.A.R.: European higher education area-driven educational innovation. Procedia Soc. Behav. Sci. 237, 1505–1512 (2017)
11. López-Fernández, D., Tovar, E., Raya, L., Marzal, F., Garcia, J.J.: Motivation of computer science students at universities organized around small groups. In: IEEE Global Engineering Education Conference (EDUCON), pp. 1120–1127 (2019)
12. Melgar-García, L., Gutiérrez-Avilés, D., Rubio-Escudero, C., Troncoso, A.: Identifying novelties and anomalies for incremental learning in streaming time series forecasting. Eng. Appl. Artif. Intell. 123, 106326 (2023)
13. Misiejuk, K., Wasson, B.: State of the field report on learning analytics. In: Center for the Science of Learning and Technology, pp. 1–76 (2017)
14. SHAP. https://shap.readthedocs.io/en/latest/index.html. Accessed 30 Apr 2023
15. Swamy, V., Du, S., Marras, M., Käser, T.: Trusting the explainers: teacher validation of explainable artificial intelligence for course design (2023)
16. Takacs, R., Kárász, J.T., Takács, S., Horváth, Z., Attila, O.: Successful steps in higher education to stop computer science students from attrition. Interchange 53, 1–16 (2022). https://doi.org/10.1007/s10780-022-09476-2
17. TIOBE: Index for April 2023. https://www.tiobe.com/tiobe-index/. Accessed 30 Apr 2023
18. UNESCO: Digital learning and transformation of education: artificial intelligence in education. https://www.unesco.org/en/digital-education/artificial-intelligence. Accessed 30 Apr 2023
19. UNESCO Education 2030: ChatGPT and artificial intelligence in higher education. https://unesdoc.unesco.org/ark:/48223/pf0000385146. Accessed 30 Apr 2023
20. Octave: About GNU Octave. https://octave.org/about. Accessed 30 Apr 2023
21. Zawacki-Richter, O., Marín, V.I., Bond, M., Gouverneur, F.: Systematic review of research on artificial intelligence applications in higher education – where are the educators? Int. J. Educ. Technol. High. Educ. 16, 39 (2019). https://doi.org/10.1186/s41239-019-0171-0
22. Zhang, K., Aslan, A.B.: AI technologies for education: recent research & future directions. Comput. Educ. Artif. Intell. 2, 100025 (2021)
Author Index

A
Ako-Nai, Frederick 306
Alberola, Juan M. 248
Álvarez, Rubén 340
Álvarez-Aparicio, Claudia 69
Andrysiak, Tomasz 122
Arce, Elena 238

B
Basurto, Nuño 167
Bellas, Francisco 228
Bernal, E. Frutos 157
Botti, Vicent 248
Bringas, Pablo García 111
Bu, Seok-Jun 132

C
Caballero-Gil, Pino 101
Calvo-Rolle, José Luis 49, 147
Campazas-Vega, Adrián 59, 69
Carranza-García, Manuel 319
Casado-Vara, Roberto 147, 167
Cho, Sung-Bae 132
Choraś, Michał 79, 91
Cordero, M. Maldonado 157
Crespo-Martínez, Ignacio Samuel 59, 69

D
de la Cal Marin, Enrique 306
de la Puerta, José Gaviria 111
del Rey, A. Martín 157
del Rey, Angel Martín 147, 167
Del Río, Francisco Borja Garnelo 23
del Valle, Javier 350
Díaz-Longueira, Antonio 49, 238
Díez-González, Javier 340
Divasón, Jose 329

E
Escanez-Exposito, Daniel 101
Estevez, Julian 350

F
Fernández-Becerra, Laura 3
Fernandez-Gamiz, Unai 350
Fernández-Llamas, Camino 69
Ferrero-Guillén, Rubén 340
Frutos-Bernal, E. 187
Fúster-Sabater, A. 14

G
García-Pérez, Lía 207
González-Díez, Irene 261, 272, 279
González-Santamarta, Miguel A. 3
Graña, Manuel 350
Guerreiro, Sara 228
Guerrero-Higueras, Ángel Manuel 3, 59, 69
Gutiérrez-Avilés, David 358

H
Heras, Adrián 248
Hernández Guillén, Jose Diamantino 177
Herrero, Álvaro 167

J
Jeffrey, Nicholas 37
Jiménez-Navarro, Manuel J. 319
Jove, Esteban 147, 238
Juan-González, Nerea 340
Julián, Vicente 248

K
Klikowski, Jakub 91
Komorniczak, Joanna 79, 91
Kordík, Pavel 217
Kozik, Rafał 79, 91
Ksieniewicz, Paweł 79
Kuznetsov, Stanislav 217

L
Lera, Francisco J. Rodríguez 3, 23
Llamas, Camino Fernández 23
Llamazares-Elías, Samir 197
Lopez-Guede, Jose Manuel 350
Luna-Romera, José María 319

M
Maldonado, R. Macías 157
Martín del Rey, A. 187
Martín del Rey, Ángel 177
Martínez-Ballesteros, María 319
Martínez-de-Pisón, Francisco Javier 329
Martínez-Gutiérrez, Alberto 340
Melgar-García, Laura 358
Méndez-Busto, Daniel 238
Michelena, Álvaro 49, 238
Miranda-Garcia, Alberto 111

N
Navarro-Cáceres, Juan José 59

O
Olivera, Vicente Matellán 3, 23

P
Pastor-López, Iker 111
Pawlicka, Aleksandra 79
Pawlicki, Marek 79
Pazo-Robles, M. E. 14
Prieto, Abraham 228

Q
Quintián, Héctor 49, 147, 238

R
Ramos-Hernanz, Josean 350
Rodríguez-Rosa, Miguel 187
Romero, Ana 329
Romero, Óscar Fontenla 49

S
Sáenz-de-Cabezón, Eduardo 329
Saganowski, Łukasz 122
Sáiz-Manzanares, María Consuelo 261, 272, 279, 294
Saldaña, Carmina 272
Sánchez-Anguix, Victor 248
Santos, Matilde 207
Severt, Marcos 147, 167
Sobrín-Hidalgo, David 3

T
Tan, Qing 37, 306
Timiraos, Míriam 49, 238
Tocino, Angel 197
Torres, José Francisco 358
Troncoso, Alicia 358
Troncoso-García, Ángela 358

U
Urda, Daniel 167
Urquijo, Borja Sanz 111

V
Varela Vázquez, Carmen 261
Varela, Carmen 272, 279
Vega-Márquez, Belén 319
Verde, Paula 340
Villar, José R. 37

W
Wojciechowski, Szymon 91

Z
Zayas-Gato, Francisco 238
Žid, Čeněk 217
Zulueta, Ekaitz 350