Smart Innovation, Systems and Technologies 309
Ireneusz Czarnowski Robert J. Howlett Lakhmi C. Jain Editors
Intelligent Decision Technologies Proceedings of the 14th KES-IDT 2022 Conference
Smart Innovation, Systems and Technologies Volume 309
Series Editors Robert J. Howlett, Bournemouth University and KES International, Shoreham-by-Sea, UK Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK
The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas is particularly sought. The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles. Indexed by SCOPUS, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST), SCImago, DBLP. All books published in the series are submitted for consideration in Web of Science.
Ireneusz Czarnowski · Robert J. Howlett · Lakhmi C. Jain Editors
Intelligent Decision Technologies Proceedings of the 14th KES-IDT 2022 Conference
Editors Ireneusz Czarnowski, Gdynia Maritime University, Gdynia, Poland; Lakhmi C. Jain, KES International, Selby, UK
Robert J. Howlett ‘Aurel Vlaicu’ University of Arad Arad, Romania Bournemouth University and KES International Research Shoreham-by-Sea, UK
ISSN 2190-3018 ISSN 2190-3026 (electronic) Smart Innovation, Systems and Technologies ISBN 978-981-19-3443-8 ISBN 978-981-19-3444-5 (eBook) https://doi.org/10.1007/978-981-19-3444-5 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
KES-IDT 2022 Conference Organization
Honorary Chairs Lakhmi C. Jain, KES International, UK Gloria Phillips-Wren, Loyola University, USA
General Chair Ireneusz Czarnowski, Gdynia Maritime University, Poland
Executive Chair Robert J. Howlett, KES International and Bournemouth University, UK
Program Chairs Jose L. Salmeron, University Pablo de Olavide, Seville, Spain Antonio J. Tallón-Ballesteros, University of Seville, Spain
Publicity Chairs Izabela Wierzbowska, Gdynia Maritime University, Poland Alfonso Mateos Caballero, Universidad Politécnica de Madrid, Spain
Special Sessions Recent Advances in Data Analysis and Machine Learning: Paradigms and Practical Applications Margarita Favorskaya, Reshetnev Siberian State University of Science and Technology, Russian Federation Nikita Andriyanov, Financial University under the Government of the Russian Federation
Advances in Intelligent Data Processing and Its Applications Margarita Favorskaya, Reshetnev Siberian State University of Science and Technology, Russian Federation M. Sergeev, Saint Petersburg State University of Aerospace Instrumentation, Russian Federation Lakhmi C. Jain, KES International, UK
Blockchains and Intelligent Decision Making Xiangpei Hu, Dalian University of Technology, Dalian, China Zhaojun Han, Dalian University of Technology, Dalian, China
Intelligent Processing and Analysis of Multidimensional Signals and Images Roumen Kountchev, Technical University of Sofia, Bulgaria Lakhmi C. Jain, KES International, UK
Decision Making Theory for Economics Takao Ohya, Kokushikan University, Japan Lakhmi C. Jain, KES International, UK
Signal Processing and Pattern Recognition Techniques for Modern Decision-Making Systems Paolo Crippa, Università Politecnica delle Marche, Italy Claudio Turchetti, Università Politecnica delle Marche, Italy
Artificial Intelligence Innovation in Daily Life Hadi Saleh, Vladimir State University, Vladimir, Russian Federation
Decision Techniques for Data Mining, Transportation and Project Management Ireneusz Czarnowski, Gdynia Maritime University, Poland Piotr Jedrzejowicz, Gdynia Maritime University, Poland Dariusz Barbucha, Gdynia Maritime University, Poland
Fake News and Information Literacy: Current State, Open Issues and Challenges in Automatic and AI-based Detection, Generation and Analytical Support Systems Fiammetta Marulli, University of Campania, in Caserta, Italy Lelio Campanile, University of Campania, in Caserta, Italy Laura Verde, University of Campania, in Caserta, Italy
International Program Committee and Reviewers Jair Minoro Abe, Paulista University, Brazil Miltos Alamaniotis, University of Texas at San Antonio, USA Ana Maria de Almeida, Iscte Instituto Universitário de Lisboa, Portugal Piotr Artiemjew, University of Warmia and Mazury in Olsztyn, Poland Ahmad Taher Azar, Prince Sultan University, Saudi Arabia Valentina Emilia Balas, Aurel Vlaicu University of Arad, Romania Andrzej Banaszek, West Pomeranian University of Technology, Szczecin, Poland Dariusz Barbucha, Gdynia Maritime University, Poland Alina Barbulescu, Transilvania University of Brasov, Romania
Monica Bianchini, University of Siena, Italy Janos Botzheim, Eötvös Loránd University, Hungary Adriana Burlea-Schiopoiu, University of Craiova, Romania Vladimir V. Buryachenko, Reshetnev Siberian State University of Science and Technology, Russian Federation Alfonso Mateos Caballero, Universidad Politécnica de Madrid, Spain Lelio Campanile, University of Campania, in Caserta, Italy Frantisek Capkovic, Slovak Academy of Sciences, Bratislava, Slovakia Giovanna Castellano, University of Bari Aldo Moro, Italy Katarzyna Cheba, West Pomeranian University of Technology, Szczecin, Poland Shyi-Ming Chen, National Taiwan University of Science and Technology, Taiwan Yen-Wei Chen, Ritsumeikan University, Japan Tan Choo Jun, Wawasan Open University, Malaysia Pei Chun Lin, Feng Chia University, China Jerry Chun-Wei Lin, Western Norway University of Applied Sciences, Norway Marco Cococcioni, University of Pisa, Italy Angela Consoli, Department of Defence, Australia Paolo Crippa, Università Politecnica delle Marche, Ancona, Italy Gloria Cerasela Crisan, Vasile Alecsandri University of Bacau, Romania Ireneusz Czarnowski, Gdynia Maritime University, Poland Vladimir Dimitrieski, University of Novi Sad, Serbia Dinu Dragan, University of Novi Sad, Serbia Agnieszka Duraj, Lodz University of Technology, Poland Margarita N. Favorskaya, Reshetnev Siberian State University of Science and Technology, Russian Federation Raquel Florez-Lopez, University Pablo Olavide of Seville, Spain Mayssa Frikha, University of Sfax, Tunisia Keisuke Fukui, Hiroshima University, Japan Mauro Gaggero, National Research Council of Italy, Italy Mauro Gaspari, University of Bologna, Italy Christos Grecos, Arkansas State University, USA Zhaojun Han, Dalian University of Technology, Dalian, China Ralf-Christian Härting, Hochschule Aalen University, Germany Ioannis Hatzilygeroudis, University of Patras, Greece Dawn E. Holmes, University of California, USA Katsuhiro Honda, Osaka Prefecture University, Japan Tzung-Pei Hong, National University of Kaohsiung, Taiwan Mirjana Ivanovic, University of Novi Sad, Serbia Yuji Iwahori, Chubu University, Japan Lakhmi Jain, KES International and Liverpool Hope University, UK Joanna Jedrzejowicz, University of Gdansk, Poland Piotr Jedrzejowicz, Gdynia Maritime University, Poland Dragan Jevtic, University of Zagreb, Croatia Nikos Karacapilidis, University of Patras, Greece Radosław Katarzyniak, Wrocław University of Science and Technology, Poland
Petia Koprinkova-Hristova, Bulgarian Academy of Sciences, Sofia Roumen Kountchev, Technical University of Sofia, Bulgaria Aleksandar Kovaˇcevi´c, University of Novi Sad, Serbia Boris Kovalerchuk, Central Washington University, USA Marek Kretowski, Bialystok University of Technology, Poland Vladimir Kurbalija, University of Novi Sad, Serbia Kazuhiro Kuwabara, Ritsumeikan University, Japan Giorgio Leonardi, Università del Piemonte Orientale, Italy Ivan Lukovi´c, University of Belgrade, Serbia Fiammetta Marulli, University of Campania, in Caserta, Italy Lyudmila Mihaylova, The University of Sheffield, UK Polina Mihova, New Bulgarian University, Sofia, Bulgaria Yasser Mohammad, Assiut University, Egypt Mikhail Moshkov, KAUST, Saudi Arabia Tetsuya Murai, Chitose Institute of Science and Technology, Japan Vesa A. Niskanen, Vytautas Magnus University, Kaunas Agnieszka Nowak-Brzezi´nska, University of Silesia, Poland Takao Ohya, Kokushikan University, Japan Eugenio Oliveira, University of Porto, Portugal Mrutyunjaya Panda, Utkal University, Bhubaneswar, India Petra Perner, FutureLab Artificial Intelligence IBaI-2 Radeberg, Germany Gloria Phillips-Wren, Sellinger School of Business and Management Loyola University Maryland, USA Gian Piero Zarri, Sorbonne University, LaLIC/STIH Laboratory Anitha S Pillai, Hindustan Institute of Technology and Science, Chennai, India Camelia Pintea, TU Cluj-Napoca, Romania Bhanu Prasad, Florida A&M University, Tallahassee, Florida, USA Radu-Emil Precup, Politehnica University of Timisoara, Romania Jim Prentzas, Democritus University of Thrace, Greece Małgorzata Przybyła-Kasperek, University of Silesia in Katowice, Poland Marcos G. Quiles, Federal University of Sao Paulo (UNIFESP), Brazil Hana Rabbouch, University of Tunis, Tunisia Ewa Ratajczak-Ropel, Gdynia Maritime University, Poland Alvaro Rocha, ISEG, University of Lisbon, Portugal Foued Saadaoui, King Abdulaziz University, Jeddah, Saudi Arabia Anatoliy Sachenko, West Ukrainian National University, Ukraine Hadi Saleh, Vladimir State University. Vladimir, Russian Federation Jose L. Salmeron, Computational Intelligence Lab, University Pablo de Olavide, Spain Mika Sato-Ilic, University of Tsukuba, Japan Milos Savic, University of Novi Sad, Serbia Md Shohel Sayeed, Multimedia University, Malaysia Mikhail Sergeev, Saint Petersburg State University of Aerospace Instrumentation, Russian Federation Marek Sikora, Silesian University of Technology, Gliwice, Poland
Aleksander Skakovski, Gdynia Maritime University, Poland Margarita Stankova, New Bulgarian University, Bulgaria Catalin Stoean, University of Craiova, Romania Masakazu Takahashi, Yamaguchi University, Japan Shing Chiang Tan, Multimedia University, Malaysia Dilhan Thilakarathne, ING Bank, Netherlands Shusaku Tsumoto, Shimane University, Japan Claudio Turchetti, Università Politecnica delle Marche, Italy Eiji Uchino, Yamaguchi University, Japan Marco Vannucci, Scuola Superiore Sant’Anna, Italy Laura Verde, University of Campania, in Caserta, Italy Fen Wang, Central Washington University, USA Hu Xiangpei, Dalian University of Technology, China Yoshiyuki Yabuuchi, Shimonoseki City University, Japan Mariko Yamamura, Radiation Effects Research Foundation, Japan Dmitry Zaitsev, Odessa State Environmental University, Ukraine Cecilia Zanni-Merk, Normandie Université, INSA Rouen, LITIS, France Lindu Zhao, Southeast University, China Beata Zielosko, University of Silesia in Katowice, Poland Alfred Zimmermann, Reutlingen University Faculty of Informatics Doctoral Program Services Computin, Germany Aleksandr G. Zotin, Reshetnev Siberian State University of Science and Technology, Russian Federation Sergey V. Zykov, National Research University Higher School of Economics and National Research University MEPhI, Russian Federation
Preface
This volume contains the proceedings of the 14th International KES Conference on Intelligent Decision Technologies (KES-IDT 2022). The conference was held in Rhodes, in a hybrid mode, on June 20–22, 2022, due to the COVID-19 pandemic restrictions.

KES-IDT is an annual international conference organized by KES International and belongs to a sub-series of the KES Conference series. KES-IDT provides opportunities for the presentation of new research results and discussion under the common title "Intelligent Decision Technologies". The conference has an interdisciplinary character, giving researchers from different scientific and application areas the opportunity to show how intelligent methods and tools can support decision-making processes. Every year, the KES-IDT conference attracts a number of researchers and practitioners from all over the world.

This year, the submitted papers were allocated to the main track and nine special sessions. Each submitted paper was reviewed by 2–3 members of the International Program Committee and International Reviewer Board. A total of 50 papers were accepted for presentation during the conference and inclusion in the KES-IDT 2022 proceedings.

We are very satisfied with the quality of the papers and would like to thank the authors for choosing KES-IDT as the forum for the presentation of their work. We also gratefully acknowledge the hard work of the KES-IDT International Program Committee members and the additional reviewers for taking the time to review the submitted papers and select the best among them for presentation at the conference and inclusion in the proceedings.
We hope that KES-IDT 2022 significantly contributes to the fulfillment of academic excellence and leads to even greater successes of KES-IDT events in the future.

Gdynia, Poland
Arad, Romania/Shoreham-by-Sea, UK
Selby, UK
June 2022
Ireneusz Czarnowski Robert J. Howlett Lakhmi C. Jain
Contents
Part I Main Track

1 Latent Chained Comments to Retweet Extraction on Twitter . . . 3
Ryusei Takagi and Yasunobu Sumikawa

2 Recommendation Versus Regression Neural Collaborative Filtering . . . 15
Jesús Bobadilla, Santiago Alonso, Abraham Gutiérrez, and Álvaro González

3 Heuristic Approach to Improve the Efficiency of Maximum Weight Matching Algorithm Using Clustering . . . 25
Liu He and Ryosuke Saga

4 Digital Cultural Heritage Twins: New Tools for a Complete Fruition of the Cultural Heritage Entities . . . 37
Gian Piero Zarri

5 Arabic Speech Processing: State of the Art and Future Outlook . . . 49
Naim Terbeh, Rim Teyeb, and Mounir Zrigui

6 Inter-rater Agreement Based Risk Assessment Scheme for ICT Corporates . . . 63
Roberto Cassata, Gabriele Gianini, Marco Anisetti, Valerio Bellandi, Ernesto Damiani, and Alessandro Cavaciuti

7 Sequence Classification via LCS . . . 77
Riccardo Dondi

8 Assured Multi-agent Reinforcement Learning with Robust Agent-Interaction Adaptability . . . 87
Joshua Riley, Radu Calinescu, Colin Paterson, Daniel Kudenko, and Alec Banks

9 Development of a Telegram Bot to Determine the Level of Technological Readiness . . . 99
Darya I. Suntsova, Viktor A. Pavlov, Zinaida V. Makarenko, Petr P. Bakholdin, Alexander S. Politsinsky, Artem S. Kremlev, and Alexey A. Margun
10 Design of Multiprocessor Architecture for Watermarking and Tracing Images Using QR Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 Jalel Baaouni, Hedi Choura, Faten Chaabane, Tarek Frikha, and Mouna Baklouti 11 NIR-SWIR Spectroscopy and Imaging Techniques in Biomedical Applications—Experimental Results . . . . . . . . . . . . . . . 123 Yaniv Cohen, Ben Zion Dekel, Zafar Yuldashev, and Nathan Blaunstein 12 Four-Valued Interpretation for Paraconsistent Annotated Evidential Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 Yotaro Nakayama, Seiki Akama, Jair Minoro Abe, and Tetsuya Murai 13 On Modeling the Insurance Claims Data Using a New Heavy-Tailed Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 Abdelaziz Alsubie Part II
Recent Advances in Data Analysis and Machine Learning: Paradigms and Practical Applications
14 Estimation and Prediction of the Technical Condition of an Object Based on Machine Learning Algorithms Under Conditions of Class Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 Victor R. Krasheninnikov, Yuliya E. Kuvayskova, and Vladimir N. Klyachkin 15 Automated System for the Personalization of Retinal Laser Treatment in Diabetic Retinopathy Based on the Intelligent Analysis of OCT Data and Fundus Images . . . . . . . . . . . . . . . . . . . . . . . 171 Nataly Ilyasova, Nikita Demin, Aleksandr Shirokanev, and Nikita Andriyanov 16 Development and Research of Intellectual Algorithms in Taxi Service Data Processing Based on Machine Learning and Modified K-means Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183 Nikita Andriyanov, Vitaly Dementiev, and Alexandr Tashlinskiy 17 Using Machine Learning Methods to Solve Problems of Monitoring the State of Steel Structure Elements . . . . . . . . . . . . . . 193 Maria Gaponova, Vitalii Dementev, Marat Suetin, and Aleksandr Tashlinskii
Part III Advances in Intelligent Data Processing and Its Applications 18 Accurate Extraction of Human Gait Patterns Using Motion Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 Margarita N. Favorskaya and Konstantin A. Gusev 19 Vision-Based Walking Style Recognition in the Wild . . . . . . . . . . . . . . 215 Margarita N. Favorskaya and Vladimir V. Buryachenko 20 Specifics of Matrix Masking in Digital Radar Images Transmitted Through Radar Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . 227 Vadim Nenashev, Anton Sentsov, and Alexander Sergeev 21 Matrix Mining for Digital Transformation . . . . . . . . . . . . . . . . . . . . . . . 237 Nikolaj Balonin, Yury Balonin, Anton Vostrikov, Alexander Sergeev, and Mikhail Sergeev 22 Using Chimera Grids to Describe Boundaries of Complex Shape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249 Alena V. Favorskaya and Nikolay Khokhlov 23 Ultrasonic Study of Sea Ice Ridges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259 Alena V. Favorskaya and Maksim V. Muratov 24 Technique of Central Nervous System’s Cells Visualization Based on Microscopic Images Processing . . . . . . . . . . . . . . . . . . . . . . . . 269 Alexey Medievsky, Aleksandr Zotin, Konstantin Simonov, and Alexey Kruglyakov 25 Multi-kernel Analysis Method for Intelligent Data Processing with Application to Prediction Making . . . . . . . . . . . . . . . . . . . . . . . . . . 279 Miltiadis Alamaniotis Part IV Blockchains and Intelligent Decision Making 26 Potentials of Blockchain Technologies in Supply Chain Management—Empirical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291 Ralf-Christian Härting, Nathalie Hoppe, and Sandra Trieu 27 Synchronization of Mobile Grading and Pre-cooling Services . . . . . . 303 Xuping Wang, Yue Wang, Na lin, and Ya Li 28 The Challenge of Willingness to Blockchain Traceability Adoption: An Empirical Investigation of the Main Drivers in China . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 Xueying Zhai and Xiangpei Hu
Part V Intelligent Processing and Analysis of Multidimensional Signals and Images
29 Decorrelation of a Sequence of Color Images Through Hierarchical Adaptive Color KLT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 Roumen Kountchev and Roumiana Kountcheva 30 Moving Objects Detection in Video by Various Background Modelling Algorithms and Score Fusion . . . . . . . . . . . . . . . . . . . . . . . . . 347 Ivo Draganov and Rumen Mironov Part VI
Decision Making Theory for Economics
31 Equilibrium Search Technique Using Genetic Algorithm and Replicator Dynamics and Its Application to Food Supply Chain Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363 Natsuki Morita, Hayato Dan, Katsumi Homma, Hitoshi Yanami, Shota Suginouchi, Mizuho Sato, Hajime Mizuyama, and Masatoshi Ogawa 32 Utilization of Big Data in the Japanese Retail Industry and Consumer Purchasing Decision-Making . . . . . . . . . . . . . . . . . . . . . 375 Shunei Norikumo 33 Calculations by Several Methods for D-AHP Including Hierarchical Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385 Takao Ohya 34 A Model of Evaluations with Its Rewards for Evaluation Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395 Takafumi Mizuno Part VII
Signal Processing and Pattern Recognition Techniques for Modern Decision-Making Systems
35 Heart Rate Detection Using a Non-obtrusive Ballistocardiography Signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405 Sebastian Rätzer, Maksym Gaiduk, and Ralf Seepold 36 Wireless Communication Attack Using SDR and Low-Cost Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417 Batoul Achaal, Mohamad Rida Mortada, Ali Mansour, and Abbass Nasser 37 Wearable Acceleration-Based Human Activity Recognition Using AM-FM Signal Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429 Giorgio Biagetti, Paolo Crippa, Laura Falaschetti, Michele Alessandrini, and Claudio Turchetti
38 GpLMS: Generalized Parallel Least Mean Square Algorithm for Partial Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441 Ghattas Akkad, Viet-Dung Nguyen, and Ali Mansour 39 Siamese Network for Salivary Glands Segmentation . . . . . . . . . . . . . . 449 Gabin Fodop, Aurélien Olivier, Clément Hoffmann, Ali Mansour, Sandrine Jousse-Joulin, Luc Bressollette, and Benoit Clement 40 A Lightweight and Accurate RNN in Wearable Embedded Systems for Human Activity Recognition . . . . . . . . . . . . . . . . . . . . . . . . 459 Laura Falaschetti, Giorgio Biagetti, Paolo Crippa, Michele Alessandrini, Di Filippo Giacomo, and Claudio Turchetti 41 Direction-of-Arrival Based Technique for Estimation of Primary User Beam Width . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469 Zeinab Kteish, Jad Abou Chaaya, Abbass Nasser, Koffi-Clément Yao, and Ali Mansour Part VIII Artificial Intelligence Innovation in Daily Life 42 Fuzzy Decision-Making in Crisis Situation: Crowd Flow Control in Closed Public Spaces in COVID’19 . . . . . . . . . . . . . . . . . . . 483 Wejden Abdallah, Oumayma Abdallah, Dalel Kanzari, and Kurosh Madani 43 Perception of 3D Scene Based on Depth Estimation and Point-Cloud Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495 Shadi Saleh, Shanmugapriyan Manoharan, Julkar Nine, and Wolfram Hardt 44 Machine Learning-Based Crime Prediction . . . . . . . . . . . . . . . . . . . . . . 509 Hadi Saleh, Anastasia Sakunova, Albo Jwaid Furqan Abbas, and Mohammed Shakir Mahmood Part IX Decision Techniques for Data Mining, Transportation and Project Management 45 An Intelligent Decision Support System Inspired by Newton’s Laws of Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523 Nafe Moradkhani, Frederick Benaben, Benoit Montreuil, Matthieu Lauras, Clara Le Duff, and Julien Jeany 46 Using the AHP Method to Select an Electronic Documentation Management System for Polish Municipalities . . . . . . . . . . . . . . . . . . . 535 Oskar S˛ek and Ireneusz Czarnowski
Part X Fake News and Information Literacy: Current State, Open Issues and Challenges in Automatic and AI-based Detection, Generation and Analytical Support Systems
47 Timed Automata Networks for SCADA Attacks Real-Time Mitigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549 Francesco Mercaldo, Fabio Martinelli, and Antonella Santone 48 On the Evaluation of BDD Requirements with Text-based Metrics: The ETCS-L3 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561 Lelio Campanile, Maria Stella de Biase, Stefano Marrone, Mariapia Raimondo, and Laura Verde 49 Break the Fake: A Technical Report on Browsing Behavior During the Pandemic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573 Lelio Campanile, Mario Cesarano, Gianfranco Palmiero, and Carlo Sanghez 50 A Federated Consensus-Based Model for Enhancing Fake News and Misleading Information Debunking . . . . . . . . . . . . . . . . . . . 587 Fiammetta Marulli, Laura Verde, Stefano Marrore, and Lelio Campanile Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
About the Editors
Ireneusz Czarnowski is a Professor at Gdynia Maritime University. He holds B.Sc. and M.Sc. degrees in Electronics and Communication Systems from the same university. He gained his doctoral degree in computer science in 2004 at the Faculty of Computer Science and Management of Poznan University of Technology. In 2012, he earned a postdoctoral degree in computer science, in technical sciences, at Wroclaw University of Science and Technology. His research interests include artificial intelligence, machine learning, evolutionary computation, multi-agent systems, data mining, and data science. He is an Associate Editor of the Journal of Knowledge-Based and Intelligent Engineering Systems, published by IOS Press, and a reviewer for several scientific journals.

Robert J. Howlett is the Academic Chair of KES International, a non-profit organization that facilitates knowledge transfer and the dissemination of research results in areas including intelligent systems, sustainability, and knowledge transfer. He is a Visiting Professor at 'Aurel Vlaicu' University of Arad in Romania. His technical expertise is in the use of intelligent systems to solve industrial problems. He has been successful in applying artificial intelligence, machine learning, and related technologies to sustainability and renewable energy systems; condition monitoring, diagnostic tools and systems; and automotive electronics and engine management systems. His current research work is focused on the use of smart microgrids to achieve reduced energy costs and lower carbon emissions in areas such as housing and protected horticulture.

Dr. Lakhmi C. Jain, Fellow (Engineers Australia), is with Liverpool Hope University and the University of Arad. He was formerly with the University of Technology Sydney, the University of Canberra, and Bournemouth University, UK. He serves KES International, providing the professional community with opportunities for publication, knowledge exchange, cooperation, and teaming. Involving around 5000
researchers drawn from universities and companies worldwide, KES facilitates international cooperation and generates synergy in teaching and research. KES regularly provides networking opportunities for the professional community through one of the largest conferences of its kind in its area.
Part I
Main Track
Chapter 1
Latent Chained Comments to Retweet Extraction on Twitter Ryusei Takagi and Yasunobu Sumikawa
Abstract Twitter, one of the major social networking services, has a retweet function that displays tweets posted by other users on the retweeter's timeline. As this retweet function spreads information, past studies have analyzed posts that are spread by many people. However, some Twitter users use the retweet function not only to spread information but also to discuss their opinions. In these discussions, they tend to retweet first and then post multiple tweets of their own opinions. We call these multiple tweets representing the opinions latent comments. In this study, we propose algorithms that collect tweets that discuss the user's previous retweet and are continuously posted. We have created a novel dataset including latent comments and evaluated our algorithm on this dataset. We have confirmed that the algorithm performs well on the dataset.
R. Takagi · Y. Sumikawa (B)
Department of Computer Science, Takushoku University, 815-1 Tatemachi, Hachioji-shi, Tokyo, Japan
e-mail: [email protected]
R. Takagi
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_1

1.1 Introduction

Twitter provides the retweet function that allows Twitter users to share information posted by other users by showing the content on their own timeline. As this function is effective in delivering information to many users, it is also used for corporate advertising. Indeed, the number of times a tweet has been retweeted is one of the important indicators of popularity [4, 8]. However, as text is the basic way of communicating on Twitter, Twitter users sometimes retweet to start a discussion on the retweeted content. In this study, we define latent comments as tweets satisfying the following two conditions: (1) they are classified as one of three types of referring tweet (rule-, direct-, or indirect-based) related to the retweeted content, and (2) they are continuously chained on the user's timeline. Figure 1.1 shows an example of latent comments. The tweet1 shown in Fig. 1.1a is an original tweet, whereas the tweet2 in Fig. 1.1b asserts the user's opinion on the original tweet. As Fig. 1.1b shows, the user makes this assertion by posting an original tweet instead of a quote tweet. Several Twitter users post tweets representing their opinions as original tweets, as in this example. Therefore, it is necessary to design algorithms for detecting such opinion tweets.

Fig. 1.1 Example of a latent comment

In this study, we propose three algorithms for automatically collecting latent comments. The proposed algorithms are designed for the three types of latent comment: (1) rule-based latent comments contain commonly used notations such as ">RT" regardless of the topic, (2) direct-based latent comments contain the same words as the retweeted text, and (3) indirect-based latent comments do not contain the same words as the retweeted text. To check the effectiveness of our algorithm, we collected 3293 tweets posted by news agencies' official accounts and manually collected 100 test data including latent comments of Twitter users who retweeted the tweets. Using these tweet data, we found that the proposed algorithm achieved a high accuracy of 85.9% as the F1 score.

Definitions: We define a tweet set as a latent comment if the content of the tweets is related to the previous retweet and the tweets are continuously posted without including non-latent comments between them. Figure 1.2 shows the concept of latent comments. The solid and dotted lines indicate the connected tweets as latent comments and non-latent comments, respectively. There are two retweeted news tweets, about an Olympic game and a trading event. Tweet1 follows the first retweet about the Olympic game and contains "Opening ceremony"; thus, we regard this tweet as a latent comment. As Tweet2 includes the keyword "Olympic" and follows Tweet1 and Retweet1, this tweet is also a latent comment for Retweet1. The text of the last tweet, Tweet3, is "I will get up early tomorrow"; it is unclear whether this tweet points to the Olympic game. In addition, there is a retweet between Retweet1 and Tweet3. We therefore do not consider Tweet3 as a latent comment for Retweet1.
1 https://mobile.twitter.com/takeda828/status/1458052542269558786
2 https://mobile.twitter.com/takeda828/status/1458053054398222341
Fig. 1.2 Examples of chained latent comments (Retweet1 "RT Olympic" followed by Tweet1 "Opening ceremony" and Tweet2 "Olympic champion"; Retweet2 "RT Trading" followed by Tweet3 "I will get up early tomorrow")
1.2 Related Work

1.2.1 Topic Detection and Tracking

Identifying latent comments amounts to extracting tweet chains; thus, this study belongs to the topic detection and tracking (TDT) area. Looking at previous TDT studies, Radinsky and Davidovich proposed an algorithm that extracted news chains representing causal relationships from the descriptions of past events in order to predict possible future events [10]. Compared with our study, that work assumes long descriptions, whereas our study deals with short ones. For TDT studies on Twitter, there are methods to detect and track hot events from online news streams [9], detect global and local hot events using local community detection mechanisms [11], and report real-life events to users in human-readable form [7]. These past studies use not only tweet texts but also context information such as trends and geography. By contrast, using such context information is difficult in our setting; therefore, we use tweet text only.
1.2.2 Using Explicit Popularity on SNS

Twitter is sometimes used as a medium that reflects current trends or influences. Studies have been conducted over the past decade to predict which tweets will receive the most likes and retweets. Kupavskii et al. [5] proposed a model that predicts the number of retweets within a given timeframe. Some recent studies proposed methods to discover influencers on Twitter in addition to the popularity of each tweet, and the number of retweets is used as one of the indicators of influencers [2, 12, 13]. In the aforementioned studies, retweets themselves are considered an indicator of influence. By contrast, this study uses retweets as starting points for discussions and detects their tweet chains.
1.3 Data Collection

1.3.1 Tweet Collection

We collected all tweets with the official Twitter API. Twitter provides three kinds of tweets: tweets, retweets, and quote tweets. If a Twitter user posts an original text, it is defined as a tweet. If a tweet is re-posted by a user using Twitter's official retweet function, it is called a retweet. If a tweet is re-posted with new content, it is called a quote tweet; thus, a quote tweet can be seen as a retweet with a comment. In this study, we treat all quote tweets as original tweets, as they include additional information/text.

We collected tweets from official accounts of news agencies to create our dataset. The reason for this selection is twofold. First, these accounts post a large number of tweets. Second, the number of people viewing those tweets is also large. Both properties are necessary to perform a detailed analysis, including what kind of tweets are latent referring tweets, as discussed in Sect. 1.5. We collected news tweets only from Japanese news agencies' accounts and created the dataset manually so that we could properly analyze whether Twitter users' tweets are references to news. The news agencies we collected were the Yomiuri Shimbun, Asahi Shimbun, NHK, Mainichi Shimbun, and Yahoo News.3 For these accounts, we used user_timeline,4 an official Twitter API, to retrieve tweets from the news agencies' accounts.
1.3.2 Latent Comment Candidate Collection

In this study, we used two official Twitter APIs, retweeters5 and user_timeline, to collect the Twitter users who retweeted each news agency tweet and their latent comments. The detailed collection procedure is as follows. First, to collect the tweets that retweeted the news agencies' tweets, we used retweeters to obtain the user IDs of the users who retweeted them. For these user IDs, we retrieved past tweets using the user_timeline API and collected 10 tweets each before and after the retweet of the news tweet. Table 1.1 shows the statistics of the collected tweets.
3 The names of the news agency accounts we collected are Yomiuri_Online, asahi, nhk_news, mainichi, YahooNewsTopics.
4 https://developer.twitter.com/en/docs/twitter-api/v1/tweets/timelines/api-reference/getstatuses-user_timeline
5 https://developer.twitter.com/en/docs/twitter-api/v1/tweets/post-and-engage/api-reference/getstatuses-retweeters-ids
Table 1.1 Statistics of collected tweet dataset

Number of news tweets: 3293
Number of news articles: 3293
Number of retweet IDs: 2,267,322
Period of timestamps: 24 Oct. 2021 to 2 Mar. 2022
Number of news categories: 51
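To make the collection procedure concrete, the following is a minimal sketch of how the two endpoints could be combined. It assumes the tweepy library with an authenticated client for the v1.1 API; the helper name collect_candidates and the window logic are ours, and exact method names may differ between tweepy versions.

```python
import tweepy

# Assumption: an authenticated tweepy.API object and the news-agency screen names.
NEWS_ACCOUNTS = ["Yomiuri_Online", "asahi", "nhk_news", "mainichi", "YahooNewsTopics"]

def collect_candidates(api: tweepy.API, news_tweet_id: int, window: int = 10):
    """For one news tweet, return {user_id: tweets around that user's retweet of it}."""
    candidates = {}
    for user_id in api.get_retweeter_ids(news_tweet_id):      # 'retweeters' endpoint
        timeline = api.user_timeline(user_id=user_id, count=200, tweet_mode="extended")
        # locate the retweet of the news tweet in this user's timeline
        idx = next((i for i, t in enumerate(timeline)
                    if getattr(t, "retweeted_status", None) is not None
                    and t.retweeted_status.id == news_tweet_id), None)
        if idx is None:
            continue
        # keep 10 tweets posted before and after the retweet, as described above
        candidates[user_id] = timeline[max(0, idx - window): idx + window + 1]
    return candidates
```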
1.4 Algorithm

Our algorithm determines whether a given tweet is a latent comment on a given retweet. If the algorithm decides that the tweet is a latent comment, it returns true; otherwise, it returns false. If there are other tweets between the two given IDs, the algorithm traces them one by one to determine whether each tweet is a comment on the given retweet. For this determination, we propose three types of latent comments: rule-, direct-, and indirect-based comments. The details of identifying chained comment tweets, the three types of tweets, and the algorithms for detecting them are given in the remainder of this section.
1.4.1 Chained Comment Tweet Identification

This algorithm determines whether a given tweet is a comment on another given retweet. As a single tweet can only contain 280 characters, Twitter users sometimes post their opinions in multiple tweets. To track such tweets, we retrieve the other tweets between a given tweet ID and retweet ID. As the purpose of this retrieval is to reveal posts that divide one opinion into multiple posts, it is assumed that there are no other retweets between them. We apply the proposed algorithm recursively to the obtained tweets to analyze whether they are latent comments or not. If we find other retweets in this range, the recursive analysis is terminated and false is returned. We also terminate the recursive analysis if false is obtained by applying the proposed algorithms described in the following sections. If the recursive search reaches the retweet ID and determines that the tweet is a latent comment, the algorithm returns true. Once this recursive search returns false, our algorithm considers the given tweet as a non-latent comment.
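As a rough illustration of the chained identification above, the sketch below walks over the tweets lying between the retweet and the candidate tweet, aborting as soon as another retweet appears. The helper is_latent_single stands for the per-tweet checks of Sects. 1.4.2-1.4.4 and is an assumed name, not code from the paper.

```python
def is_chained_latent_comment(tweets_between, is_latent_single):
    """tweets_between: tweets posted between the retweet (exclusive) and the
    candidate tweet (inclusive), in posting order."""
    for tweet in tweets_between:
        # another retweet inside the chain breaks it: stop and return False
        if getattr(tweet, "retweeted_status", None) is not None:
            return False
        # every tweet in the chain must itself be judged a latent comment
        if not is_latent_single(tweet):
            return False
    return True
```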
1.4.2 Rules in Latent Comments

On Twitter, there are widely used keywords, such as ">RT", for expressing one's own opinion after a retweet. We consider rule-based latent comments as tweets including
these keywords in their texts. The algorithm to find this type of tweet is based on the presence of such keywords in the text.
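A direct transcription of this rule check could look as follows; the marker list is an abridged subset of Table 1.2 and the function name is ours.

```python
RULE_MARKERS = [">RT", "→RT", "(RT)", "RT>", "RT:"]  # subset of the Table 1.2 keywords

def is_rule_based_comment(text: str) -> bool:
    # a tweet is a rule-based latent comment if its text contains any marker
    return any(marker in text for marker in RULE_MARKERS)
```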
1.4.3 Direct-Based Latent Comments

We call tweets direct-based latent comments when the tweets explicitly use words contained in the retweeted content. To collect these tweets, we use the Jaccard coefficient, which calculates the similarity between two texts based on the number of words they share. If the score is over a given threshold, we consider the tweets as latent comments. In this study, we set 0.2 as the threshold. The formal definition is as follows:

$$\mathrm{Jaccard}(A, B) = \frac{|T_A \cap T_B|}{|T_A \cup T_B|} \qquad (1.1)$$

where $|\cdot|$ is the size of the set and $T_A$ and $T_B$ are the tokens included in tweets A and B, respectively. The higher the score of the measurement, the more correlated they are.
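Equation (1.1) translates directly into code. This is a minimal sketch assuming the tweets have already been tokenized (e.g. by a Japanese morphological analyzer); the 0.2 threshold is the one stated above.

```python
def jaccard(tokens_a, tokens_b):
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def is_direct_based_comment(tweet_tokens, retweet_tokens, threshold=0.2):
    # direct-based latent comment: enough shared vocabulary with the retweeted text
    return jaccard(tweet_tokens, retweet_tokens) >= threshold
```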
1.4.4 Indirect-Based Latent Comments

We call tweets indirect-based latent comments if the tweets use indirect referring words to refer to the retweets instead of using words contained in the retweeted content. To collect these tweets, we cannot use direct keywords as a filter; thus, we perform latent semantic analysis with models such as LDA [1], LSA [3], and Doc2Vec [6]. We first create feature vectors with the latent semantic analysis models equipped with the TF-IDF model. TF-IDF indicates the importance of a word to a document in the dataset. This score is the product of the term frequency and the inverse document frequency. Term frequency refers to how frequently each term (word) occurs in each document, whereas the inverse document frequency represents how rarely each term occurs across all documents. The formal definition is as follows:

$$\mathrm{TFIDF}(w, d, D) = tf_{w,d} \cdot \frac{|D|}{|\{d \in D \mid w \in d\}|} \qquad (1.2)$$

where $tf_{w,d}$ is the number of times a word $w$ occurs in a document $d$ and $|\cdot|$ is the size of the enclosed set. The second factor of this equation gives the number of all documents divided by the number of documents including $w$. We then compute the cosine similarity between the feature vectors of the retweets and the news agencies' tweets. We regard a tweet as a latent comment if its cosine similarity score is over 0.5 in this study.
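The following sketch illustrates the indirect check with scikit-learn, using TF-IDF features, a truncated-SVD latent space as the LSA model, and the 0.5 cosine threshold mentioned above. The paper also mentions LDA and Doc2Vec as alternatives; the number of components and the helper names are our assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def build_lsa(corpus, n_components=100):
    """corpus: list of tweet texts (already tokenized and space-joined)."""
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(corpus)
    lsa = TruncatedSVD(n_components=n_components).fit(tfidf)
    return vectorizer, lsa

def is_indirect_based_comment(vectorizer, lsa, tweet_text, retweet_text, threshold=0.5):
    # project both texts into the latent space and compare them by cosine similarity
    vecs = lsa.transform(vectorizer.transform([tweet_text, retweet_text]))
    return cosine_similarity(vecs[:1], vecs[1:])[0, 0] >= threshold
```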
1.4.5 Overview of the Latent Referring Tweet Identification Algorithm

At the beginning, we input two tweet IDs: an original tweet ID and a retweet ID. We then collect the other tweets between the two given tweet IDs, as described in Sect. 1.4.1. Thereafter, we apply morphological analysis, lemmatization, stop word removal, and other fundamental natural language processing techniques to create tokens for each tweet. We then sequentially apply the procedures described in Sects. 1.4.2, 1.4.3, and 1.4.4 to analyze whether each tweet is a comment on the given retweet. After applying the three processes, we return the logical OR of all the results obtained.
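Combining the three checks as described, the per-tweet decision is simply the logical OR of the three detectors. The helper names match the sketches above and are our own.

```python
def is_latent_single(tweet_text, tweet_tokens, retweet_text, retweet_tokens,
                     vectorizer, lsa):
    # a tweet is judged a latent comment if any of the three detectors fires
    return (is_rule_based_comment(tweet_text)
            or is_direct_based_comment(tweet_tokens, retweet_tokens)
            or is_indirect_based_comment(vectorizer, lsa, tweet_text, retweet_text))
```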
1.5 Experimental Evaluations

1.5.1 Experimental Setting

Dataset. We created a dataset including latent comments for this study because there was no ground-truth dataset for the latent comment extraction task. We first collected 3293 news tweets, as described in Sect. 1.3.1, from Oct. 24, 2021 to Mar. 2, 2022. We then randomly selected 1000 news tweets and collected the tweets posted after retweeting these news tweets. We manually checked whether the tweets are latent comments for the retweets. If the tweets were determined to be latent comments for the retweets, we combined them as pairs in our dataset. This was performed by two workers, including a Ph.D. holder working on machine learning research. Then, the collected pairs were verified on agreement of the two workers, and the verified pairs were kept for the experiment. As a result, the dataset contained 50 latent comments. We also added 50 non-latent comments to the dataset. After removing the test data, we trained LDA on the collected tweet data to use the model as an indirect-based latent comment identifier. Apart from the ground-truth data above, the authors extracted rules that could be used in the rule-based algorithm from 5000 randomly selected tweets. Table 1.2 lists all the keywords obtained from this manual process. These keywords are used in this study.
Table 1.2 Rules used in latent comments

>RT   →RT   (RT)   ↓ (continue)   (RT   RT>   RT   RT:   ) (RT reference)

We add English words to the Japanese keywords to make them easier to understand, though we used only Japanese in the experimental evaluation
Evaluation Criteria. We evaluated the proposed algorithm as a binary classification problem to determine whether the result is a latent comment or not. Therefore, we evaluated the algorithm with micro-average precision (P), recall (R), and F1-score (F1), which are widely used to evaluate classification. To better understand the accuracy of the proposed algorithms, we also evaluated the prediction results of each algorithm as a multi-class classification problem. As the number of test data is different for each label, we used the macro-average precision (maP), recall (maR), and F1-score (maF).
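For reference, the micro- and macro-averaged scores used here can be computed with scikit-learn; this is a generic illustration rather than code from the paper.

```python
from sklearn.metrics import precision_recall_fscore_support

def report(y_true, y_pred):
    """y_true / y_pred: gold and predicted labels, e.g. 'rule', 'direct', 'indirect', 'none'."""
    micro = precision_recall_fscore_support(y_true, y_pred, average="micro")
    macro = precision_recall_fscore_support(y_true, y_pred, average="macro")
    return {"P/R/F1 (micro)": micro[:3], "maP/maR/maF (macro)": macro[:3]}
```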
1.5.2 Results

Q. How accurate is the proposed algorithm in detecting whether a comment is latent or not?
A. The proposed algorithm achieved 86% as P, R, and F1 scores.

Table 1.3 shows the P, R, and F1 scores of four cases: each of the three algorithms proposed in this study applied individually, and all results combined by the OR operator (represented as All). We can see that using all three algorithms and Direct reference are the best among the four cases. As all the P, R, and F1 scores reach approximately 86%, we can conclude that our algorithm works well.

Q. How accurate is each algorithm for identifying each latent comment type?
A. The algorithm for finding each latent comment type achieved approximately 26% to 45% as maF scores.
A. The algorithm combining the three algorithms achieved 75% as a maF score.
Table 1.4 shows ma P, ma R, and ma F scores for identifying each type of the latent comment by each of the three algorithms and All proposed. This result also indicates
Table 1.3 Micro-average precision, recall, and F-scores of latent comment or not classification by each algorithm

                     P (%)   R (%)   F1 (%)
Rule                 52.0    52.0    52.0
Direct reference     86.0    86.0    85.9
Indirect reference   81.0    81.0    81.0
All                  86.0    86.0    85.9
Table 1.4 Macro-average precision, recall, and F-scores of multiclass classification by each algorithm

                     maP (%)   maR (%)   maF (%)
Rule                 37.7      49.5      41.7
Direct reference     45.0      46.3      45.4
Indirect reference   43.1      27.7      26.0
All                  95.2      73.6      75.3

Fig. 1.3 Confusion matrix for binary classification on identifying latent comment or not
Fig. 1.4 Confusion matrix for prediction results of each algorithm
that using all three algorithms is the best among the four cases. In particular, our algorithm achieved 75% as maF score. To analyze the features of the proposed algorithm in detail, Fig. 1.3 shows the results of our algorithm in correctly predicting whether a tweet is a latent comment or not. The results show that our algorithm correctly predicted all non-latent comments as non-latent; however, it mispredicted some latent comments as non-latent. To analyze what kind of data was mispredicted, the confusion matrix summarizing the prediction results of each algorithm is shown in Fig. 1.4. We can see that all rule-based latent comments were correctly predicted. By contrast, many indirect-based latent comments were mispredicted as non-latent comments. Our dataset includes only 10% indirect-based latent commenting tweets; thus, this misprediction had a small impact on the overall evaluation results. However, to improve accuracy, more sophisticated algorithms could be used to properly discover indirect-based latent comments instead of the LSA used in this study.
1.5.3 Limitations

Although we focused on Japanese news tweets in this study, many Twitter users post latent comments that are not limited to news tweets. To analyze the exact overall trend of Twitter users, we would need to collect not only news tweets but all tweets and their retweets. If the limits of the Twitter API could be overcome, or Twitter provided the data, the above bias could be removed with the proposed algorithm. However, we currently cannot solve this data collection issue; we therefore leave it as future work.
1.6 Conclusions

In this study, we have proposed algorithms to detect latent comments on retweets. These latent comments are tweets in which Twitter users discuss their opinions after retweeting others' tweets. The algorithms are designed to detect three types of latent comments: rule-, direct-, and indirect-based latent comments. To evaluate the effectiveness of our algorithm, we created a test dataset and tested the algorithms as classification problems. The results confirmed that our algorithm obtained about 86% micro-averaged and 75% macro-averaged F1 scores.

There are two possible directions for future research. The first is to estimate authentic popularity measures on Twitter. There have been studies using Twitter popularity; however, most of them simply use the number of retweets. As the definition proposed in this study shows, Twitter users sometimes retweet to start discussions or express their opinions. We will combine the algorithm proposed in this study with sentiment analysis to identify whether the latent comments for each retweet are positive or negative. The second is to apply our algorithm to tweets in other languages, as this paper focused only on Japanese tweets.
References

1. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
2. De Salve, A., Mori, P., Guidi, B., Ricci, L., Pietro, R.D.: Predicting influential users in online social network groups. ACM Trans. Knowl. Discov. Data 15(3) (2021)
3. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. J. Amer. Soc. Inform. Sci. 41(6), 391–407 (1990)
4. Hong, L., Dan, O., Davison, B.D.: Predicting popular messages in twitter. In: Proceedings of the 20th International Conference Companion on World Wide Web (WWW'11), pp. 57–58. Association for Computing Machinery, New York, NY, USA (2011)
5. Kupavskii, A., Ostroumova, L., Umnov, A., Usachev, S., Serdyukov, P., Gusev, G., Kustarev, A.: Prediction of retweet cascade size over time. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM'12), pp. 2335–2338. Association for Computing Machinery, New York, NY, USA (2012)
6. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: ICML'14, Bejing, China, vol. 32, pp. 1188–1196, 22–24 June 2014
7. Lei, Z., Wu, L.D., Zhang, Y., Liu, Y.C.: A system for detecting and tracking internet news event. In: PCM'05, pp. 754–764. Springer, Berlin (2005)
8. Naveed, N., Gottron, T., Kunegis, J., Alhadi, A.C.: Bad news travel fast: a content-based analysis of interestingness on twitter. In: Proceedings of the 3rd International Web Science Conference (WebSci'11). Association for Computing Machinery, New York, NY, USA (2011)
9. Qi, Y., Zhou, L., Si, H., Wan, J., Jin, T.: An approach to news event detection and tracking based on stream of online news. 2, 193–196 (2017)
10. Radinsky, K., Davidovich, S.: Learning to predict from textual data. J. Artif. Int. Res. 45(1), 641–684 (2012)
11. Tan, Z., Zhang, P., Tan, J., Guo, L.: A multi-layer event detection algorithm for detecting global and local hot events in social networks. Procedia Computer Sci. 29, 2080–2089 (2014)
12. Zhang, Z., Zhao, W., Yang, J., Paris, C., Nepal, S.: Learning influence probabilities and modelling influence diffusion in twitter. In: Companion Proceedings of the 2019 World Wide Web Conference (WWW'19), pp. 1087–1094. Association for Computing Machinery, New York, NY, USA (2019)
13. Zheng, C., Zhang, Q., Young, S., Wang, W.: On-demand influencer discovery on social media. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM'20), pp. 2337–2340. Association for Computing Machinery, New York, NY, USA (2020)
Chapter 2
Recommendation Versus Regression Neural Collaborative Filtering Jesús Bobadilla , Santiago Alonso , Abraham Gutiérrez , and Álvaro González
Abstract Neural Collaborative Filtering recommendations are traditionally based on regression architectures (returning continuous predictions, e.g. 2.8 stars), such as DeepMF and NCF. However, there are advantages in the use of collaborative filtering classification models. This work tested both neural approaches using a set of representative open datasets, baselines, and quality measures. The results show the superiority of the regular regression model compared to the regular classification model (returning discrete predictions, e.g. 1–5 stars) and the binary classification model (returning binary predictions: recommended, non-recommended). Results also show a similar performance when comparing our proposed recommendation neural approach with the state-of-the-art neural regression baseline. The key issue is the additional information the recommendation approach provides compared to the regression model: while the regression baseline only returns the recommendation values, the proposed recommendation model returns (value, probability) pairs. The extra probability information can be used in the recommender systems area for different objectives: recommendation explanation, visualization of results, quality improvements, mitigating attack risks, etc.
J. Bobadilla · S. Alonso (B) · A. Gutiérrez · Á. González
ETSI Sistemas Informáticos, Universidad Politécnica de Madrid, Madrid, Spain
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_2

2.1 Introduction

Recommender systems (RS) are the key field of artificial intelligence (AI) in the AI personalization area. RS are usually classified according to the filtering approach: demographic [1], context-aware [10], content-based [6], social [15], collaborative filtering [13], and ensembles [5]. The most accurate RS implement ensembles combining several of the mentioned filtering strategies. Collaborative filtering (CF) is the most adequate approach to obtain accurate predictions and recommendations from RS. The first approaches to CF were based on the K-Nearest Neighbors (KNN) algorithm [17], where a variety of similarity measures were proposed to address the
sparsity problem of CF. Soon, machine learning models proved to be more accurate than the KNN algorithm. In particular, the Matrix Factorization model provides accurate prediction results in its regular formulation: Probabilistic Matrix Factorization (PMF) [17]. The Non-negative Matrix Factorization (NMF) [7] model reaches quality results similar to the PMF one, and it allows some semantic interpretation of its hidden factors. The evolution of CF RS research now points to deep learning models, as they improve on matrix factorization models [14]. When Deep Learning (DL) architectures are applied to the CF field, it is possible to benefit from their flexibility and from existing models in other areas such as the generative one and NLP. DL architectural flexibility allows us to tackle CF in a shallow or deep way [9], integrate Natural Language Processing (NLP) capabilities [11], and even make use of generative learning [3, 8].

Deep Matrix Factorization (DeepMF) [16] was the first remarkable DL model for CF. The Neural Collaborative Filtering architecture (NCF) [9] collects all embedding outputs in a 'Concatenate' layer and then processes them using an MLP followed by a regression output layer (linear or ReLU activation function, since there are no negative ratings). As we can see, NCF is a regression model. Since recommending is, in essence, classifying, the relevant question here is: would it not be more reasonable to use a neural classification model to make recommendations, rather than a regression one? To answer this question, we must dive into the CF recommendation alternatives. There are three main approaches to tackle a CF recommendation (Fig. 2.1): (1) binary, (2) based on categorical ratings, and (3) based on categorical items.

Binary classification neural architectures can be easily obtained from the NCF model (Fig. 2.1a) by replacing the regression output layer with a classification output layer. This classification output layer is implemented using a unique output neuron with a "sigmoid" activation function and a "binary cross-entropy" model loss. In this model, labels are recommendation values obtained from the (user, item) ratings by applying a recommendation threshold. Categorical item-based classification models (Fig. 2.1b) directly return the set of recommended items to each user [4]; the output layer of this neural architecture has as many neurons as there are items in the RS. The output neurons are implemented using the "softmax" activation function, whereas the loss of the model is the "categorical cross-entropy" one. Since the output of this architecture has a probabilistic distribution, we can make N recommendations by taking the N output highest probability values. Despite the high dimensionality of the input and output layers of categorical item-based classification, the results are accurate [4]; this is similar to the NLP "seq2seq" architectures, where the output is a dense classification layer containing as many neurons as dictionary tokens in the corpus (thousands, or tens of thousands). Finally, categorical rating-based classification models [2] (Fig. 2.1c) are based on NCF, replacing the regression output layer with a classification layer; this classification layer has as many neurons as the cardinality of the rating set. The output neurons are implemented using the "softmax" activation function, whereas the model loss is "categorical cross-entropy". The above neural models can be formalized as follows.
Fig. 2.1 Main approaches to CF recommendations: (a) binary classification model; (b) items-based categorical classification model; (c) ratings-based categorical classification model
The above neural models can be formalized as follows. Let U be the set of users, I the set of items, and O the set of observed ratings, made of tuples ⟨u, i, r⟩ with u ∈ U, i ∈ I and r ∈ {1, ..., R}, where R is the largest rating in the dataset (e.g., five stars). Overall, we want to predict the rating that an active user u would give to an item i: ŷ_{u,i} = f(u, i | Θ), where Θ are the model parameters, ŷ_{u,i} is the obtained prediction value and f maps the model parameters to the predicted score. To calculate Θ, an objective function needs to be optimized. Our starting point is the matrix factorization machine learning model, where ŷ_{u,i} = f(u, i | p_u, q_i) = p_u^T q_i = Σ_{k=1}^{K} p_{u,k} q_{i,k}; K is the dimension of the latent space, p_u is the latent vector for user u and q_i is the latent vector for item i. NCF modifies the above equation in the following way: ŷ_{u,i} = f(P^T v_u^U, Q^T v_i^I | P, Q, Θ_f), where P is the latent factor matrix for users (M × K), Q is the latent factor matrix for items (N × K), and Θ_f are the model parameters. Since f is formulated as an MLP, it can be rewritten as f(P^T v_u^U, Q^T v_i^I) = Φ_out(Φ_X(... Φ_2(Φ_1(P^T v_u^U, Q^T v_i^I)) ...)), where Φ_out is the mapping function to the output layer and Φ_x is the mapping function to the x-th layer. The NCF loss function is L_sqr = Σ_{(u,i)∈O} w_{u,i} (ŷ_{u,i} − y_{u,i})², where y_{u,i} are the training ratings (targets) and w_{u,i} are the neural network weights (its parameters). The DeepMF model can be seen as a simplified NCF model where Φ_out = p_u^T q_i = Σ_{k=1}^{K} p_{u,k} q_{i,k}, as in the matrix factorization model. Classification approaches provide an output layer Φ_out that returns R probabilistic values: Φ_out = {Φ_1, Φ_2, ..., Φ_R}, where Σ_{r=1}^{R} Φ_r = 1. Classification uses the cross-entropy loss: L = −Σ_{(u,i)∈O} [y_{u,i} log(ŷ_{u,i}) + (1 − y_{u,i}) log(1 − ŷ_{u,i})]. Please note that in binary classification R = 2 (recommended, non-recommended), whereas RS datasets usually have R = 5 (from 1 to 5 stars).
Regular classification models make the following prediction: ŷ_{u,i} = argmax_r(Φ_r), whereas our proposed model sets ŷ_{u,i} = Σ_{r=1}^{R} r·Φ_r. From an academic point of view, regression models should not be the best choice to address classification goals, since their training aims to obtain correct predictions (not recommendations). When predictions are used to make binary classifications (based on a single threshold), the gap between prediction values and recommendations can be considered acceptable, whereas this gap increases when predictions are used to make categorical recommendations (e.g., a one-to-ten-star classification). When predictions are used to make categorical recommendations, a quantification process is applied; e.g., we can establish that prediction values between 3.75 and 4.75 will be classified as '4 stars'. This is a different approach from running a learning process in a CF classification model. The above considerations lead us to an important advantage of recommendation models: they simultaneously provide both recommendation values and their reliabilities. Figure 2.1b, c show recommendation models whose output layers provide probability distributions over their output categories. Both models are implemented using 'softmax' activation functions and 'categorical cross-entropy' as the learning loss function. From Fig. 2.1b we can select, as recommendations, the N categories (items) with the N highest probabilities. From Fig. 2.1c, for each input ⟨user, item⟩ pair, we choose the output value with the highest probability (its 'argmax'). We can also make use of the output probabilities to obtain accurate predictions from the recommendation results. As an example, in an RS where ratings range from 1 to 5, if the recommendation of a ⟨user, item⟩ pair returns ⟨0, 0, 0, 0.5, 0.5⟩, we can provide the prediction 1·0 + 2·0 + 3·0 + 4·0.5 + 5·0.5 = 4.5 (50% four stars and 50% five stars). Detailed research on the use of such reliabilities has been done in [2]. CF research has been centered on improving accuracy; nevertheless, an RS should accomplish additional objectives to be friendly and useful to its clients. Among these additional objectives, we can highlight novelty, diversity, reliability, and recommendation explanation. This paper provides an innovative approach to address the reliability objective [12, 18]. RS users often receive only an ordered list of recommended items; this limited information does not contribute to the customer's confidence in the system. The lack of additional information is usually compensated by providing the number of users that have voted for each recommended item: typically, we choose an item rated with four stars instead of an item rated with five stars if the first one has been voted by 1000 users and the second one by only 4 users. In this case, the number of users who voted for an item acts as a reliability measure. Using CF machine learning methods, we should provide a ⟨prediction, reliability⟩ pair for each recommended item. The RS can then 'translate' this numerical information; e.g., ⟨4, 0.9⟩: "It is almost certain that you will like this film"; ⟨5, 0.75⟩: "It is likely that you will really like this film". These textual recommendations are richer, more understandable, and easier to accept than an ordered list of recommended items. Since the proposed deep learning model provides ⟨prediction, reliability⟩ pairs without loss of recommendation accuracy, it is useful for commercial RS.
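To make the difference between the regression head and the categorical heads concrete, the following sketch builds a minimal NCF-style classification model in Keras and applies the expected-value prediction rule ŷ = Σ_r r·Φ_r described above. It is only an illustrative reconstruction: the embedding size, MLP widths and variable names are our own assumptions, not the exact architecture used by the authors.

```python
# Minimal NCF-style categorical classification model (illustrative sketch).
# Sizes and layer widths are assumptions, not the authors' exact configuration.
import numpy as np
from tensorflow.keras import layers, Model

num_users, num_items, K, R = 1000, 1700, 32, 5      # R = number of rating classes

user_in = layers.Input(shape=(1,), name="user_id")
item_in = layers.Input(shape=(1,), name="item_id")
u = layers.Flatten()(layers.Embedding(num_users, K)(user_in))
v = layers.Flatten()(layers.Embedding(num_items, K)(item_in))
x = layers.Concatenate()([u, v])                    # NCF 'Concatenate' layer
x = layers.Dense(64, activation="relu")(x)          # MLP part
x = layers.Dense(32, activation="relu")(x)
out = layers.Dense(R, activation="softmax")(x)      # one output neuron per rating class

model = Model([user_in, item_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

def expected_rating(probs):
    """Proposed prediction rule: y_hat = sum_r r * phi_r over the softmax output."""
    ratings = np.arange(1, probs.shape[1] + 1)
    return probs @ ratings
```

With sparse categorical cross-entropy, the training labels would simply be the observed ratings shifted to the range 0, ..., R − 1.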
This paper tests the current prediction (regression) NCF and recommendation (classification) NCF models, and also provides a proposed categorical recommendation model. The quality of the prediction and recommendation results is tested for both model categories. The aim is to find the best model choice, or the best combination of models, for each RS goal: prediction and recommendation. Experiments are performed using several representative open RS datasets. Section 2.2 introduces the experiment design, the selected quality measures and the chosen datasets; it also shows the results obtained and provides their explanations. Section 2.3 highlights the main conclusions of the paper and the suggested future works. Finally, a reference section lists current research in the area.
2.2 Experiments and Results

Four NCF-based models are tested in this paper: (a) regression, (b) binary classification, (c) categorical classification, and (d) the proposed categorical classification. Table 2.1 shows the key features of each model: the activation function of the neurons in the output layer, the loss function used for learning, the output results, and the ordering criterion. Both categorical classification models provide an output layer containing as many neurons as rating values in the RS (e.g., one to five stars). We define p as the output probability function of the classification model over the set of ratings in the RS (e.g., {1, 2, 3, 4, 5} in a one-to-five-star RS), and we define R as the number of possible ratings (e.g., R = 5 in a one-to-five-star RS). For each ⟨user, item⟩ pair in the testing set, both categorical classifications return the rating having the highest output probability: argmax(p(j)), j ∈ {1, 2, ..., R}. When a fixed number of recommendations N is required, all individual output recommendations are ordered and the N highest ones in the list are chosen. The column labeled 'Order' in Table 2.1 shows the ordering function designed for the categorical classification and the proposed categorical classification models. As an example, suppose we have the following outputs in an RS where R = 5: ⟨0, 0, 0, 0.8, 0.2⟩, ⟨0, 0, 0, 0.2, 0.8⟩, ⟨0, 0, 0.2, 0.5, 0.3⟩. The categorical classification model will return the recommendation values 4, 5, 4, which are ordered [4, 4, 5]: (1st, 3rd, 2nd) or (3rd, 1st, 2nd). The proposed categorical classification model will also return the recommendation values 4, 5, 4, but their relevance scores are [4·0.8 + 5·0.2, 4·0.2 + 5·0.8, 3·0.2 + 4·0.5 + 5·0.3] = [4.2, 4.8, 4.1], so they are ordered [4.1, 4.2, 4.8]: (3rd, 1st, 2nd). If we choose N = 2, the categorical classification model returns the recommendations (2nd, 1st) or (2nd, 3rd), while the proposed model returns (2nd, 1st). In short, the proposed categorical classification provides an improved 'relevance' value for each NCF recommendation. Instead of the classical recommendation value, our proposed classification-based model returns the recommendation pair ⟨value, probability⟩. This opens the door to using the probability values for diverse objectives such as improving recommendations, explaining recommendations, or preventing attacks. The models shown in Table 2.1 have been tested using four popular open-access datasets.
Table 2.1 Tested models and their key features

Model                                 Activation   Loss function               Output                              Order
Regression                            Linear       MSD                         Prediction                          –
Binary classification                 Sigmoid      Binary cross-entropy        Recommended / Non-recommended       –
Categorical classification            Softmax      Categorical cross-entropy   argmax(p(j)), j ∈ {1, 2, ..., R}    argmax(p(j)), j ∈ {1, 2, ..., R}
Proposed categorical classification   Softmax      Categorical cross-entropy   argmax(p(j)), j ∈ {1, 2, ..., R}    Σ_{j=1}^{R} j·p(j)

Table 2.2 Main parameter values of the tested datasets

Dataset          #users   #items   #ratings   Scores   Sparsity
Movielens 100K   943      1682     99,831     1–5      93.71
MovieLens 1M     6040     3706     911,031    1–5      95.94
MyAnimeList*     19,179   2,692    548,967    1–10     98.94
Netflix*         23,012   1750     535,421    1–5      98.68

Table 2.3 Main parameter values of the performed experiments

Dataset / quality measure          Prediction (MAE)   Threshold recommendation (F1)               Rating recommendation (Accuracy)
Movielens 100K and 1M, Netflix*    –                  Threshold = {3, 5}, N = {2, 4, 6, 8, 10}    Ratings = {1, 2, 3, 4, 5}, N = {5}
MyAnimeList*                       –                  Threshold = {7, 9}, N = {2, 4, 6, 8, 10}    Ratings = {1, 2, ..., 10}, N = {5}
Table 2.2 contains the values of the main parameters of the chosen CF datasets: Movielens 100K, Movielens 1M, and representative subsets of the MyAnimeList and Netflix datasets. We have run the four models of Table 2.1 on each of the four datasets of Table 2.2, testing the three quality measures shown in Table 2.3: Mean Absolute Error (MAE) for prediction, F1 for recommendation, and accuracy of individual rating predictions. Please note that the MyAnimeList dataset ratings range from 1 to 10, whereas the ratings of the remaining datasets range from 1 to 5. Table 2.3 shows the main parameter values of the recommendation experiments; the number of recommendations N ranges from 2 to 10, in steps of 2, for all the experiments. We have used the values 3 and 5 as recommendation thresholds θ for the datasets whose ratings range from 1 to 5, while the testing thresholds have been 7 and 9 for MyAnimeList.
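For the threshold-based experiments, the ratings have to be binarized before the binary model can be trained and evaluated. A minimal illustration follows; the rule "rating at or above the threshold counts as recommended" is our assumption of the usual convention, since the paper does not spell out the preprocessing.

```python
# Binarizing ratings with a recommendation threshold (illustrative).
import numpy as np

ratings = np.array([1, 3, 4, 5, 2, 5])          # toy ratings on a 1-5 scale
theta = 5                                        # one of the tested thresholds
labels = (ratings >= theta).astype(int)          # 1 = recommended, 0 = non-recommended
print(labels)                                    # -> [0 0 0 1 0 1]
```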
Table 2.4 Mean Absolute Error comparison

Model / dataset              ML 100K   ML 1M   Netflix*   MyAnimeList*
Regression                   0.74      0.69    0.74       0.91
Categorical classification   1.01      0.99    1.05       1.18

The lower the value, the better the result
RS usually provide recommendation results, for example: "We recommend Avatar, Titanic, and Riddick". Traditionally, for each user, the CF process makes a prediction for each of his/her non-consumed items (e.g., non-watched movies) and then selects the N highest predictions, where N is the fixed number of recommendations (N = 3 in our 'films' example). Making predictions is not an objective in itself, but only the way to obtain recommendations. However, prediction quality is generally used to test model quality; the underlying assumption is that the higher the prediction quality, the better the RS. This can be true for CF regression models, but it does not make sense for CF recommendation models, where results are categorical instead of continuous: in recommendation models, each prediction is a natural number and therefore each prediction error is also a natural number. The continuous versus discrete nature of regression and classification, respectively, leads to larger prediction errors in the recommendation case. Table 2.4 shows the MAE obtained by both the regression and the categorical classification baselines. As expected, the regression model returns better prediction results than the classification one. As explained, recommendation tests must be performed to make a fair comparison between the CF regression and classification models. Precision and recall are representative measures of recommendation quality; we have chosen the F1 measure, which is a balanced combination of precision and recall. Figure 2.2 shows the F1 results for each of the tested datasets. A pair of representative thresholds θ is used for each dataset, and their results are shown in separate graphs. Considering all the graphs of Fig. 2.2, we can state that high thresholds put model performance to the test, so we can make better choices by examining the graphs for threshold 5 and threshold 9. First, the results show that the regular categorical classification does not provide the expected recommendation quality. This can be explained by the nature of the classification process: a classification model can provide us with tens of four- or five-star recommendations (e.g., ⟨Avatar, 5⟩, ⟨Riddick, 5⟩, ..., ⟨Titanic, 5⟩, ⟨Dune, 4⟩). When the number of recommendations is low (e.g., N = 2), which two films do we select from the large group of movies recommended with five stars? We need a 'relevance' value for each five-star recommended item, so that the recommended items are the N ones with the highest relevance (extracted from the set of items with the highest recommendation values). This is the concept behind the 'proposed categorical classification' detailed in Table 2.1. As explained, the neural classification model provides the necessary relevance (probability) values in its categorical cross-entropy probabilistic output layer.
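The relevance-based selection can be reproduced in a few lines of code. The snippet below reuses the three example output distributions given in Sect. 2.2; the variable names are ours.

```python
# Relevance-based top-N selection from the categorical output layer (illustrative).
import numpy as np

probs = np.array([[0.0, 0.0, 0.0, 0.8, 0.2],    # example outputs for three items (R = 5)
                  [0.0, 0.0, 0.0, 0.2, 0.8],
                  [0.0, 0.0, 0.2, 0.5, 0.3]])

ratings = np.arange(1, probs.shape[1] + 1)      # [1, 2, 3, 4, 5]
relevance = probs @ ratings                     # [4.2, 4.8, 4.1]

N = 2
top_n = np.argsort(relevance)[::-1][:N]         # indices of the N most relevant items
print(top_n)                                    # -> [1 0], i.e. the 2nd item, then the 1st
```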
Fig. 2.2 F1 recommendation results. Tested datasets: a MovieLens 100K, b MovieLens 1M, c Netflix*, d MyAnimeList. The higher the values, the better the result.
The binary classification model has a slightly lower performance than the regression and proposed models when the 1–5 rating datasets are tested (e.g., Fig. 2.2c, threshold 5). When testing the MyAnimeList dataset (1–10 ratings) (Fig. 2.2d, threshold 9), the broader range of ratings leads to a performance decrease in the binary classification model, showing the weakness of the binary approach when wide rating ranges are used. Finally, the proposed categorical classification model provides a recommendation quality similar to that of the best tested baseline: the regression model. This can be seen in all the graphs of Fig. 2.2, particularly when the recommendation process is pushed to the limit: the highest thresholds and the MyAnimeList dataset. The similarity in the quality of the results of the regression and proposed classification models is very significant, since it shows that we are correctly using the probability distribution of the proposed model. We interpret each output probability value of the model as the relevance of the recommendation and combine them to obtain a 'relevance' value for each recommendation rating; the resulting ⟨value, relevance⟩ pairs are then ordered by relevance. In short, the proposed method attains the quality of classical regression NCF by exploiting the output probabilities of the classification architectures.
2.3 Conclusions

Collaborative filtering-based recommender systems currently use state-of-the-art neural models to make recommendations. Deep matrix factorization and neural collaborative filtering are the reference approaches, of which neural collaborative filtering
provides the best results. Neural collaborative filtering is a regression model, so obtaining recommendations from it requires a continuous-to-categorical translation based on a fixed threshold. This paper proposes to avoid this translation stage by using a neural recommendation model: it is reasonable to use a recommendation model to perform a recommendation task. Experiments have been performed using several baselines and various quality measures. The results in this paper show the superiority of the state-of-the-art neural regression model over both the regular classification model and the binary classification model. The results also show similar quality when the proposed classification model is compared with the regression baseline. This paper encourages the use of the proposed model, as it provides additional information in each recommendation result compared to the baseline. Future work will focus on the fields where the extra probability information promises to be most useful: recommendation explanation can be improved by providing the user with each recommendation probability; recommendation quality can be improved by adequately ordering recommendations based on the individual probability values; two-dimensional plots can be drawn using each ⟨recommendation value, probability⟩ pair; and malicious attacks could be handled by removing high recommendations with associated low probabilities. Acknowledgements This work was partially supported by Ministerio de Ciencia e Innovación of Spain under the project PID2019-106493RB-I00 (DL-CEMG) and the Comunidad de Madrid under Convenio Plurianual with the Universidad Politécnica de Madrid in the actuation line of Programa de Excelencia para el Profesorado Universitario.
References 1. Bobadilla, J., González-Prieto, Á., Ortega, F., Lara-Cabrera, R.: Deep learning feature selection to unhide demographic recommender systems factors. Neural Comput. Appl. 33(12), 7291– 7308 (2021) 2. Bobadilla, J., Gutiérrez, A., Alonso, S., González-Prieto, Á.: Neural collaborative filtering classification model to obtain prediction reliabilities. Int. J. Interact. Multimedia Artif. Intell. (2021) 3. Bobadilla, J., Lara-Cabrera, R., González-Prieto, Á., Ortega, F.: Deepfair: deep learning for improving fairness in recommender systems (2020). arXiv preprint arXiv:2006.05255 4. Bobadilla, J., Ortega, F., Gutiérrez, A., Alonso, S.: Classification-based deep neural network architecture for collaborative filtering recommender systems. Int. J. Interact. Multimedia Artif. Intell. 6(1) (2020) 5. Çano, E., Morisio, M.: Hybrid recommender systems: a systematic literature review. Intell. Data Anal. 21(6), 1487–1524 (2017) 6. Deldjoo, Y., Schedl, M., Cremonesi, P., Pasi, G.: Recommender systems leveraging multimedia content. ACM Comput. Surv. (CSUR) 53(5), 1–38 (2020) 7. Févotte, C., Idier, J.: Algorithms for nonnegative matrix factorization with the β-divergence. Neural Comput. 23(9), 2421–2456 (2011) 8. Gao, M., Zhang, J., Yu, J., Li, J., Wen, J., Xiong, Q.: Recommender systems based on generative adversarial networks: a problem-driven perspective. Inform. Sci. 546, 1166–1185 (2021)
9. He, X., Liao, L., Zhang, H., Nie, L., Hu, X., Chua, T.S.: Neural collaborative filtering. In: Proceedings of the 26th International Conference on World Wide Web, pp. 173–182 (2017) 10. Kulkarni, S., Rodd, S.F.: Context aware recommendation systems: a review of the state of the art techniques. Computer Sci. Rev. 37, 100255 (2020) 11. Narang, S., Taneja, N.: Deep content-collaborative recommender system (DCCRS). In: 2018 International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), pp. 110–116. IEEE (2018) 12. Ortega, F., Lara-Cabrera, R., González-Prieto, Á., Bobadilla, J.: Providing reliability in recommender systems through Bernoulli matrix factorization. Inform. Sci. 553, 110–128 (2021) 13. Ortega, F., Zhu, B., Bobadilla, J., Hernando, A.: Cf4j: collaborative filtering for java. Knowl. Based Syst. 152, 94–99 (2018) 14. Rendle, S., Krichene, W., Zhang, L., Anderson, J.: Neural collaborative filtering vs. matrix factorization revisited. In: Fourteenth ACM Conference on Recommender Systems, pp. 240– 248 (2020) 15. Shokeen, J., Rana, C.: A study on features of social recommender systems. Artif. Intell. Rev. 53(2), 965–988 (2020) 16. Xue, H.J., Dai, X., Zhang, J., Huang, S., Chen, J.: Deep matrix factorization models for recommender systems. In: IJCAI, vol. 17, pp. 3203–3209. Melbourne, Australia (2017) 17. Zhu, B., Hurtado, R., Bobadilla, J., Ortega, F.: An efficient recommender system method based on the numerical relevances and the non-numerical structures of the ratings. IEEE Access 6, 49935–49954 (2018) 18. Zhu, B., Ortega, F., Bobadilla, J., Gutiérrez, A.: Assigning reliability values to recommendations using matrix factorization. J. Comput. Sci. 26, 165–177 (2018)
Chapter 3
Heuristic Approach to Improve the Efficiency of Maximum Weight Matching Algorithm Using Clustering Liu He and Ryosuke Saga
Abstract Graph matching is one of the most well-studied problems in combinatorial optimization. The history of the maximum weight matching problem, one of the most popular problems in graph matching, is intertwined with the development of modern graph theory. To solve this problem, Edmonds proposed the blossom algorithm. Since then, an increasing number of approximation matching algorithms that run faster than the blossom algorithm have been proposed; however, the total weight of the matchings they produce must also be considered. This study aims to improve the efficiency of the maximum weight matching algorithm by using clustering, where improving efficiency means achieving optimal matching while improving the running time. In the experiments, we use various real-world datasets to ensure the reliability of the results.
3.1 Introduction

Graph matching is a basic problem that is widely used in the computing field. The applications of graph matching range from transportation networks [1] to social networks [2] and economics [3]. Early applications aimed to minimize transportation costs [1] and to optimally allocate positions [2, 4, 5]. With the rapid development of the Internet, graph matching technology has become widespread, and many examples have emerged, such as the pairing of players in chess tournaments [6] and kidney exchange between patients and donors [7]. One of the most well-known graph matching problems is the maximum weight matching problem, which entails finding a matching that maximizes the sum of edge weights in a given weighted graph. In this study, we introduce several notations and terms for the discussion. The input is a graph G = (V, E), where n = |V| and m = |E| are the number of vertices and edges, respectively, and w is the edge weight function.
Matching set M is a set of vertex-disjoint edges, and W_M is the total sum of the edge weights in matching set M. The maximum weight matching problem can be described as the following program, where x_e represents the incidence vector of the matching:

max  Σ_{e ∈ E} w_e·x_e                         (3.1)
s.t. Σ_{e ∈ N(v)} x_e ≤ 1,  v ∈ V              (3.2)
     x_e ∈ {0, 1},  e ∈ E                      (3.3)
Edmonds proposed the blossom algorithm to solve the maximum weight matching problem [8]. The blossom algorithm mimics the structure of the Hungarian algorithm, but the search for augmenting paths is complicated by the presence of odd-length alternating cycles and the fact that matched edges must be searched in both directions. Edmonds' solution is to contract blossoms as they are encountered. A blossom is defined inductively as an odd-length cycle alternating between matched and unmatched edges, whose components are either single vertices or blossoms in their own right. However, the blossom algorithm requires a large amount of time to solve the maximum weight matching problem. To solve the problem rapidly, approximate matching algorithms have been studied; however, the result of an approximation algorithm does not, in general, reach the optimal result obtained by the maximum weight matching algorithm. This study proposes a method that combines a clustering method with the maximum weight matching algorithm to improve the efficiency of maximum weight matching on general graphs. First, clustering is used to divide the nodes of graph G into various clusters. Second, the maximum weight matching algorithm is executed within each cluster. According to the experimental results, improved efficiency can be achieved by using a combination of the clustering method and the maximum weight matching algorithm.
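As a point of reference, an exact maximum weight matching can be obtained with an off-the-shelf blossom-based implementation. The sketch below uses networkx on a toy graph; the graph and the weights are placeholder values, and this is the baseline, not the proposed clustering-based method.

```python
# Exact maximum weight matching with networkx's blossom-based implementation (illustrative).
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([(1, 2, 6.0), (2, 3, 1.0), (3, 4, 5.0), (1, 4, 2.0)])

M = nx.max_weight_matching(G, weight="weight")
total_weight = sum(G[u][v]["weight"] for u, v in M)
print(M, total_weight)      # e.g. {(1, 2), (4, 3)} and 11.0
```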
3.2 Related Work

3.2.1 Greedy Matching Algorithm

The greedy strategy is an effective way of solving optimization problems: at each step, the locally best option is selected without considering the problem as a whole. A matching algorithm based on this strategy, called the greedy matching algorithm, has been proposed [9]; its process is as follows (a code sketch is given after the list):
1. Randomly select node i.
2. Add the heaviest edge e of node i to matching set M.
3. Remove the edges connected to the vertices of edge e from edge set E.
4. Repeat the previous steps until E = ∅.
5. Return matching set M.
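A direct translation of these five steps might look as follows; the data structures are our own choice and no attention is paid to efficiency (a practical implementation would use a priority queue).

```python
# Greedy matching following the steps above (illustrative sketch).
import random
import networkx as nx

def greedy_matching(G, seed=None):
    rng = random.Random(seed)
    M = set()
    H = G.copy()
    while H.number_of_edges() > 0:
        candidates = [v for v in H.nodes if H.degree(v) > 0]
        i = rng.choice(candidates)                                     # step 1: random node i
        u, v, _ = max(H.edges(i, data="weight"), key=lambda e: e[2])   # step 2: heaviest edge of i
        M.add((u, v))
        H.remove_nodes_from([u, v])                                    # step 3: drop incident edges
    return M                                                           # steps 4-5
```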
3.2.2 Path Growing Algorithm

Drake et al. proposed an approximate matching algorithm based on the greedy method [10]. This algorithm finds a set of vertex-disjoint paths in the graph, where each edge added to a path is the heaviest edge of the current vertex. The performance ratio of this algorithm is 1/2, and the running time is O(|E|). However, the performance ratio of the algorithm is inferior to the optimal result.
3.2.3 Improved Linear Time Approximation Algorithm for Weighted Matchings

Drake et al. also proposed a linear-time approximation matching algorithm [11] with a performance ratio of 2/3 − ε. They proved that this performance ratio holds for the linear-time approximation algorithm solving the weighted matching problem. The algorithm obtains a matching whose total weight is close to (2/3)·w(M_opt), and its execution time is O(m·ε⁻¹). However, the speed of the algorithm is insufficient when it is applied to practical problems.
3.2.4 Approximating Maximum Weight Matching in Near-Linear Time

Duan et al. presented a near-linear-time algorithm for computing a (1 − ε)-approximate maximum weight matching [12]. Specifically, given an arbitrary real-weighted graph and ε > 0, this algorithm computes such a matching in O(m·ε⁻²·log³ n) time. The previous best approximate maximum weight matching algorithm with comparable running time could only guarantee a (2/3 − ε)-approximate solution. In addition, that work presented a fast algorithm running in O(m·log n·log ε⁻¹) time that computes a (3/4 − ε)-approximate maximum weight matching. The algorithm is close to optimal inasmuch as its running time can only be improved by logarithmic factors and in the dependence on ε. Still, the result is an approximate matching rather than an optimal one.
3.2.5 Data Reduction for Maximum Matching on Real-World Graphs

Koana et al. presented new near-linear-time data reduction rules for the unweighted and positive-integer-weighted cases [13]. Their analysis uses the kernelization methodology of parameterized complexity analysis. The results show that using (linear-time) data reduction rules for computing maximum matchings is practical: these rules provide significant speedups over state-of-the-art implementations for computing matchings in real-world graphs. However, finding the right parameter is challenging.
3.2.6 Computational Comparison of Several Greedy Algorithms for the Minimum Cost Perfect Matching Problem on Large Graphs

The purpose of the study in [14] is to provide a computational comparison of several algorithms for the minimum cost perfect matching problem in complete graphs. Common heuristics for the capacitated arc routing problem (CARP) involve the optimal matching of the odd-degree nodes of a graph. The algorithms used in the comparison include the CPLEX solution of an exact formulation, the LEDA matching algorithm, the blossom algorithm, and six constructive heuristics. The results show that two of the constructive heuristics consistently exhibit the best behavior compared with the other four. Blossom is, however, the best option when optimal matchings are needed.
3.3 Proposed Method

The purpose of this study is to improve the efficiency of the maximum weight matching algorithm on general graphs. Several factors need to be considered when assessing efficiency, such as the running time, the matching rate, and the total weight of the matched pairs. One of the matching algorithms with the shortest running time is the greedy matching algorithm. The algorithm currently known to be best in terms of matching rate and total weight of the matched pairs is the maximum weight matching algorithm introduced above; none of the existing approximate matching algorithms can reach its results. On these bases, we propose to combine a clustering method with the maximum weight matching algorithm, in order to achieve optimal matching while improving the running time. The efficiency of the proposed method is evaluated using the three factors defined below.
• Running time:    T_M = T_m + T_c                  (3.4)
• Matching rate:   R_M = |V_M| / |V|                (3.5)
• Sum of weights:  W_M = Σ_{e ∈ M} w_e              (3.6)
In the three equations above (3.4)–(3.6), T_m, T_c, and T_M denote the running time of the maximum weight matching algorithm, of the clustering, and of the whole experiment, respectively. V_M is the set of nodes in matching set M, R_M is the matching rate, and W_M is the total weight of matching set M.
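Given a matching M of a graph G, the matching rate (3.5) and the sum of weights (3.6) can be computed as in the sketch below; the running time (3.4) is simply the sum of the separately measured clustering and matching times (e.g., with a standard timer around each step).

```python
# Evaluation measures (3.5) and (3.6) for a matching M of a weighted graph G (illustrative).
def matching_rate(G, M):
    matched_nodes = {v for edge in M for v in edge}      # V_M: nodes covered by M
    return len(matched_nodes) / G.number_of_nodes()      # R_M = |V_M| / |V|

def matching_weight(G, M):
    return sum(G[u][v]["weight"] for u, v in M)          # W_M = sum of edge weights in M
```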
3.3.1 Combination of Spectral Clustering and Maximum Weight Matching Algorithm

Spectral clustering [15] is one of the best-known clustering methods. It builds a graph from the data and decomposes the connected parts of the graph to perform clustering. Although general clustering methods are based on the distance to the cluster center, spectral clustering mainly focuses on connectivity; hence, the total weight inside each generated cluster becomes larger than it is with the other methods. In accordance with (3.7), the graph is divided into k clusters (A_1, A_2, A_3, ..., A_k), and the maximum weight matching algorithm is executed in each cluster. Graph segmentation methods include normalized cut (Ncut) [16], minimum cut [17], average cut [18], and RatioCut [19]. In Ncut, the size of a subset A is measured by the weight vol(A) of the edges of the graph incident to it. Given that this study is about maximum weight matching, Ncut (3.7) is used as the graph division method:

min Ncut(A_1, A_2, A_3, ..., A_k) = (1/2) Σ_{i=1}^{k} W(A_i, Ā_i) / vol(A_i)        (3.7)
subject to  A_1 ∪ A_2 ∪ ... ∪ A_{k−1} ∪ A_k = V,   A_i ∩ A_j = ∅ (i ≠ j),
with  W(A_i, Ā_i) = Σ_{m ∈ A_i, n ∈ Ā_i} w_mn,  where Ā_i = V \ A_i.
The specific process of combining spectral clustering and the maximum weight matching algorithm is as follows (a minimal code sketch is given after the list):

1. Build a graph from the data.
2. Express the graph with the adjacency matrix W, and compute the degree matrix D as in (3.8):
   D = diag(D_1, ..., D_n),  D_i = Σ_{j=1}^{n} w_ij          (3.8)
   The Laplacian L is computed as in (3.9):
   L = D − W                                                  (3.9)
3. Normalize the Laplacian as in (3.10) and calculate the k eigenvectors in ascending order of the eigenvalues of L′:
   L′ = D^(−1/2) L D^(−1/2)                                   (3.10)
4. Arrange the eigenvectors to create the matrix X = [x_1, x_2, ..., x_k].
5. Use k-means to cluster the row vectors of matrix X into k clusters.
6. Using the k-means results, run the maximum weight matching algorithm in each cluster.
7. Return the matching set M of each cluster.
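A compact sketch of this pipeline, with scikit-learn's spectral clustering applied to the weighted adjacency matrix and networkx's blossom implementation run inside each cluster, is given below. The number of clusters k and the input graph are placeholders, and the scikit-learn internals stand in for the explicit steps 2–5 above.

```python
# Spectral clustering followed by per-cluster maximum weight matching (illustrative sketch).
import networkx as nx
from sklearn.cluster import SpectralClustering

def spectral_then_match(G, k):
    nodes = list(G.nodes)
    W = nx.to_numpy_array(G, nodelist=nodes, weight="weight")    # weighted adjacency matrix
    labels = SpectralClustering(n_clusters=k, affinity="precomputed",
                                assign_labels="kmeans").fit_predict(W)
    M = set()
    for c in range(k):
        cluster_nodes = [n for n, lab in zip(nodes, labels) if lab == c]
        M |= nx.max_weight_matching(G.subgraph(cluster_nodes), weight="weight")
    return M
```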
3.3.2 Combination of Louvain and Maximum Weight Matching Algorithm

The Louvain method [20] is an improvement of the Newman method [21], which was developed from the Girvan–Newman method [22]. The algorithm is fast and can stop as soon as the best result is obtained, so it is used as another clustering method in this experiment. The objective function of the Louvain method is not the modularity Q itself, but the change of modularity ΔQ (3.11), i.e., the change of Q obtained by moving node i from its original cluster into a neighboring cluster C:

ΔQ = [ (Σ_in + k_{i,in}) / 2m − ( (Σ_tot + k_i) / 2m )² ] − [ Σ_in / 2m − ( Σ_tot / 2m )² − ( k_i / 2m )² ]        (3.11)

In (3.11), Σ_in is the sum of the weights of the edges inside C, Σ_tot is the sum of the weights of the edges incident to the nodes in C, k_i is the sum of the weights of the edges incident to node i, k_{i,in} is the sum of the weights of the edges from node i to the nodes in C, and m is the sum of the weights of all the edges in the graph. The specific process of combining the Louvain method with the maximum weight matching algorithm is as follows (a code sketch follows the list):
1. Treat each node as a separate and independent cluster.
2. Join the neighbor cluster of each cluster with the highest ΔQ to the same cluster, and update the graph.
3. Repeat Step 2 until ΔQ ≤ 0.
4. Use the result of Louvain to execute the maximum weight matching algorithm in each cluster.
5. Return matching set M from each cluster.
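The same scheme, with Louvain communities in place of spectral clusters, can be sketched as follows using the Louvain implementation shipped with recent networkx versions; this is an illustrative reconstruction, not the authors' code.

```python
# Louvain clustering followed by per-cluster maximum weight matching (illustrative sketch).
import networkx as nx
from networkx.algorithms.community import louvain_communities

def louvain_then_match(G, seed=0):
    communities = louvain_communities(G, weight="weight", seed=seed)    # steps 1-3
    M = set()
    for community in communities:                                       # step 4
        M |= nx.max_weight_matching(G.subgraph(community), weight="weight")
    return M                                                            # step 5
```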
3.4 Experiment

3.4.1 Dataset and Experiment

In the experiments of this study, we examine whether the proposed method can improve the matching efficiency of maximum weight matching on general graphs. Multiple datasets are used in the experiments, as shown in Table 3.1: the Facebook (friends lists) dataset [23], the Facebook Government dataset [24], the Facebook Athletes dataset [24], the Facebook Company dataset [24], the Facebook Official Pages dataset [25], the Facebook Artists dataset [24], and the LastFM dataset [26]. In these experiments, the edge weights of each dataset are set at random from 0 to 10. Each experiment is performed 10 times to ensure accuracy. Apart from the running time (Eq. 3.4), the matching rate (Eq. 3.5) and the sum of matching weights (Eq. 3.6), accuracy with respect to the result of the maximum weight matching algorithm of the existing research is used as an evaluation target. In the equation below, M_e is the matching set obtained from the proposed method, and M_opt is the result obtained from the maximum weight matching algorithm:

Accuracy = |M_e ∩ M_opt| / |M_opt|                    (3.12)

Table 3.1 Datasets and k of the clustering in the experiments

Dataset                     Nodes     Edges      k_L    k_C
Facebook (friends lists)    4039      88,234     14     [9, 18]
Facebook government         7057      89,455     30     [25, 34]
Facebook athletes           13,866    86,858     130    [125, 134]
Facebook company            14,113    52,310     69     [64, 73]
Facebook official pages     22,470    171,002    /      /
Facebook artists            50,515    819,306    /      /
LastFM                      7,624     27,806     41     [36, 45]
In addition, given that an optimal k_L based on ΔQ is obtained automatically by the Louvain method, the number of clusters k_C for spectral clustering is set to the range [k_L − 5, k_L + 5], based on the result of running the Louvain method 10 times on each dataset in advance. The symbol '/' in Table 3.1 indicates that no spectral clustering experiment was run: according to the results on Facebook Company, the running time of spectral clustering already exceeds that of running only the maximum weight matching algorithm, so the datasets larger than Facebook Company were not tested with spectral clustering.
3.4.2 Experiment Results

The results for each dataset are shown in Tables 3.2, 3.3, 3.4 and 3.5. The method that combines the Louvain method and the maximum weight matching algorithm is displayed as 'LouvainB', and the method that combines spectral clustering and the maximum weight matching algorithm is displayed as 'SpectralB'. 'Blossom' is the maximum weight matching algorithm, and 'Greedy' is the greedy matching algorithm. According to the experimental results, the greedy matching algorithm is the fastest in terms of running time, but its matching rate and sum of matching weights are the lowest. The results of LouvainB are close to those of the blossom algorithm in terms of matching rate and sum of matching weights, while its running time is about 1/10 of that of the blossom algorithm. The average accuracy of each experimental result is shown in Table 3.5; the accuracy of Greedy with respect to the blossom result is poor, which is consistent with the other Greedy results. Overall, comparing the two methods that use clustering, the results of Louvain are superior to those of spectral clustering. The proposed combination of the Louvain method and the maximum weight matching algorithm is considered to be the most efficient.
Table 3.2 Average matching rate (%) of the experiment

Dataset                     Blossom   LouvainB   Greedy   SpectralB
Facebook (friends lists)    95.2      95.1       55.8     83.2
Facebook government         92.4      90.8       56.3     64.5
Facebook athletes           82.1      79.4       53.3     58.6
Facebook company            80.1      77.5       54.4     75.1
Facebook official pages     83.8      82.6       52.7     /
Facebook artists            87.8      86.8       55.6     /
LastFM                      75.8      74.5       59.6     66.9

Table 3.3 Average running time (s) of the experiment

Dataset                     Blossom    LouvainB   Greedy    SpectralB
Facebook (friends lists)    179.238    13.123     2.830     26.532
Facebook government         223.104    27.608     12.859    62.709
Facebook athletes           427.133    39.876     31.912    153.910
Facebook company            329.924    22.507     33.098    365.506
Facebook official pages     1095.159   79.346     72.797    /
Facebook artists            8163.95    1135.31    392.67    /
LastFM                      75.491     10.225     9.006     30.657

Table 3.4 Average matching weights of the experiment

Dataset                     Blossom    LouvainB   Greedy    SpectralB
Facebook (friends lists)    17936.0    17889.5    10791.0   13495.5
Facebook government         26178.0    25521.1    18915.0   15967.9
Facebook athletes           47351.0    45636.7    34557.5   28819.4
Facebook company            44506.0    43338.8    34894.0   38249.7
Facebook official pages     78675.0    77374.3    56852.0   /
Facebook artists            97613.0    95211.1    13019.2   /
LastFM                      22591.0    22269.4    17556.8   19785.1

Table 3.5 Average accuracy of the experiment

Dataset                     LouvainB   Greedy    SpectralB
Facebook (friends lists)    82.450     37.286    54.972
Facebook government         81.035     39.209    54.598
Facebook athletes           83.583     43.257    59.915
Facebook company            88.510     43.548    86.585
Facebook official pages     88.874     41.944    /
Facebook artists            79.380     40.350    /
LastFM                      93.370     45.632    60.605
3.5 Conclusion

In this study, we conducted experiments on various datasets and demonstrated the effectiveness of combining a clustering method with the maximum weight matching algorithm. By combining the clustering method and the maximum weight matching algorithm, we came close to achieving an optimal matching while reducing the running time. However, this study still requires many improvements before being applied to actual problems. For example, the result of combining spectral clustering and the maximum weight matching algorithm may have lower matching weights than the result of the greedy matching algorithm, and when the dataset size increases, the time to run the clustering also increases greatly. The method that combines the Louvain method and the maximum weight matching algorithm is the most effective. However, in the experiments we only used offline datasets; in most real-world settings, matching can be performed both online and offline. In future research, we will attempt to use online algorithms for the maximum weight matching problem.
References 1. Hitchcock, F.L.: The distribution of a product from several sources to numerous localities. J. Math. Phys. 20(1–4), 224–230 (1941) 2. Thorndike, R.L.: The problem of classification of personnel. Psychometrika 15(3), 215–235 (1950) 3. Chan, P., Huang, X., Liu, Z., Zhang, C., Zhang, S.: Assignment and pricing in roommate market. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 30 (2016) 4. Irving, R.W.: An efficient algorithm for the ‘stable roommates’ problem. J. Algor. 6(4), 577–595 (1985) 5. Zhao, B., Xu, P., Shi, Y., Tong, Y., Zhou, Z., Zeng, Y.: Preference-aware task assignment in on-demand taxi dispatching: an online stable matching approach. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 2245–2252 (2019) 6. Kujansuu, E., Lindberg, T., M’akinen, E.: The stable roommates problem and chess tournament pairings. Divulgaciones Matemáticas 7(1), 19–28 (1999) 7. Roth, A.E., Sönmez, T., Ünver, M.U.: Pairwise kidney exchange. J. Econ. Theory 125(2), 151–188 (2005) 8. Edmonds, J.: Paths, trees, and flowers. Can. J. Math. 17, 449–467 (1965) 9. Preis, R.: Linear time 1/2-approximation algorithm for maximum weighted matching in general graphs. STACS 99, 259–269 (1999) 10. Drake, D.E., Hougardy, S.: A simple approximation algorithm for the weighted matching problem. Inform. Process. Lett. 85(4), 211–213 (2003) 11. Drake, D.E., Hougardy, S.: Improved linear time approximation algorithms for weighted matchings. In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pp. 14–23 (2003) 12. Duan, R., Pettie, S.: Approximating maximum weight matching in near-linear time. In: 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pp. 673–682 (2010) 13. Koana, T., Korenwein, V., Nichterlein, A., Zschoche, P.: Data reduction for maximum matching on real-world graphs: theory and experiments (2018). arXiv preprint 14. Wøhlk, S., Laporte, G.: Computational comparison of several greedy algorithms for the minimum cost perfect matching problem on large graphs. Comput. Oper. Res. 87, 107–113 (2017)
15. von Luxburg, U.: A tutorial on spectral clustering. Statist. Comput. 17(4), 395–416 (2007) 16. Shi, J., Malik, J.: Normalized cuts and image segmentation. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 22, no. 8, pp. 888–905 (2000) 17. Wu, Z., Leahy, R.: An optimal graph theoretic approach to data clustering: theory and its application to image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 15(11), 1101–1113 (1993) 18. Sarkar, S., Soundararajan, P.: Supervised learning of large perceptual organization: graph spectral partitioning and learning automata. IEEE Trans. Pattern Anal. Mach. Intell. 22(5), 504–525 (2000) 19. Hagen, L., Kahng, A.B.: New spectral methods for ratio cut partitioning and clustering. IEEE Trans. Computer Aided Design Integr. Circ. Syst. 11(9), 1074–1085 (1992) 20. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Statist. Mech. Theory Experim. 2008, 10 (2008) 21. Clauset, A., Newman, M.E., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70, 6 (2004) 22. Newman, M.E., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69, 2 (2004) 23. Leskovec, J., Mcauley, J.: Learning to discover social circles in ego networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems. vol. 25. Curran Associates, Inc. (2012). https://proceedings.neurips.cc/paper/2012/ file/7a614fd06c325499f1680b9896beedeb-Paper.pdf 24. Rozemberczki, B., Davies, R., R.S., Sutton, C.: Gemsec: Graph embedding with self clustering. In: Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 65–72 (2019) 25. Rozemberczki, B.C.A., Sarkar, R.: Multi-scale attributed node embedding (2019). arXiv preprint 26. Rozemberczki, B., Sarkar, R.: Characteristic functions on graphs: birds of a feather, from statistical descriptors to parametric models. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 1325–1334 (2020)
Chapter 4
Digital Cultural Heritage Twins: New Tools for a Complete Fruition of the Cultural Heritage Entities Gian Piero Zarri
Abstract This paper suggests that, for a complete fruition of any possible kind of artwork, we must associate with every Cultural Heritage (CH) item a formal description including two related components, a physical one and a symbolic one. The first deals with the description of the standard features/properties (dimensions, support, style, location, etc.) currently used to characterize a CH item. The second, a sort of Digital Cultural Heritage Twin of the physical one, is devoted to the description of the immaterial, symbolic and conceptual meanings carried by the CH item: in the case of iconographic narratives, for example, the behaviors of the represented personages, their reciprocal interactions, the implicit messages associated with the artwork, the connections with other works, etc. From a technical, concrete point of view, the creation of these Digital CH Twins implies the use of complex Knowledge Representation techniques able to take into account the above immaterial meanings, as illustrated in this paper.
4.1 Introduction

This paper concerns the use of advanced digital techniques, mainly Artificial Intelligence-inspired, to allow generalized access to all the important features, both physical and symbolic/conceptual, of any kind of Cultural Heritage (CH) entity, and to make possible their global fruition. For the sake of simplicity, we group under the term physical all those measurable/easily quantifiable/descriptive features and properties that are currently used to characterize and identify a Cultural Heritage item, i.e., dimensions, support, execution technique, style, name of the artist, information about the owner, places and collections to which the item belongs or belonged in the past, etc. The term symbolic/conceptual refers, on the contrary, to the possibility of modeling and describing in computer-usable form the immaterial properties
of these items, that is, all the possible conceptual meanings carried by those entities and the emotional messages they transmit to the user/observer in a more or less explicit manner. We will then consider the CH items from two strictly related perspectives, i.e., as physical, tangible artifacts, but also as bundles of symbolic, intangible features denoting, among others, emotions, evocations, associations, cultural suggestions, historical recollections, backgrounds, etc. This advanced conception of the digitalization practices can be summed up by saying that an exhaustive representation of a CH item should consist of a twofold digital entity, making reference to what we can call the Digital Cultural Heritage Twin approach. The first component of this digital entity consists of the usual description of the physical characteristics of the original cultural item. The second component concerns its associated Digital CH Twin, which reproduces, in computer-usable form, the symbolic/conceptual properties of the original item evoked above. The concept of Digital CH Twin derives from that of Digital Twin, a relatively recent notion that is now spreading very quickly in several applied research contexts; see, e.g., [1–3]. At the origin, digital twins consisted simply in the creation of high-fidelity virtual images of real-world entities, utilized then, e.g., to predict failures and/or mitigate related unexpected events. The notion of Digital Twin now tends to generalize in order to include also the description in digital form of all the immaterial/symbolic aspects of any possible real-world entity. To make the concept of this dual approach more concrete, we will supply in this paper some examples of formal representation of particularly important symbolic/conceptual features proper to two iconographic narratives1 masterpieces, Diego Velazquez's "The Surrender of Breda" and Leonardo da Vinci's "Mona Lisa" ("La Gioconda"). These formal examples must be interpreted as fragments of the specific Digital CH Twins to be linked to the usual physical characteristics descriptions in order to provide a complete understanding/fruition of the two masterpieces. We can immediately note that the creation of these twins, dealing with the formal description of immaterial entities like events, behaviors, situations, attitudes, purposes, etc., necessarily implies the utilization of very advanced Knowledge Representation techniques. This is why we propose, for the concrete set-up of our Digital CH Twins, to make use of NKRL, the "Narrative Knowledge Representation Language" [5], a tool largely used for dealing with abstract entities/situations/properties like those mentioned above; see, e.g., [6]. In the following, Sect. 4.2 presents a (very reduced) state of the art about the formalization of complex CH entities, Sect. 4.3 describes the essential features of NKRL, Sect. 4.4 illustrates the use of this tool to implement some fragments of Digital CH Twins, and Sect. 4.5 is a short Conclusion.
1 Iconographic narratives [4] represent a particularly important category of Cultural Heritage entities that includes paintings, drawings, frescoes, mosaics, sculptures, murals, etc., i.e., in short, all those CH items that transmit a particular message to the user/observer by making use of visual/illustrative modalities. Iconographic narratives can correspond to dynamic stories like The Raft of the Medusa (Gericault) or the Surrender of Breda (Velazquez), or describe more static situations like The Garden at Sainte-Adresse (Monet), a still life such as The Vase with Fifteen Sunflowers (Van Gogh), or a portrait like Da Vinci's Mona Lisa.
4.2 Formalizing Cultural Heritage Entities, Short State of the Art

The "digitization" activities in the Cultural Heritage (CH) domain have a long tradition of work that goes back to the sixties. They were limited, initially, to some annotation operations using keywords extracted from thesauri such as the Art & Architecture Thesaurus (AAT), ICONCLASS or the Union of Artist Names (ULAN).2 The use of (relatively simple) data models very popular in the eighties and nineties, such as the original Dublin Core proposal, or of more sophisticated models like the VRA (Visual Resources Association) Core 4 Schema, then contributed to improving the efficacy/generality of these annotation procedures.3 Moreover, the conversion of these models into RDF-compatible tools under the influence of the Semantic Web activities (see the DCMI (Dublin Core Metadata Initiative) Abstract Model4 or the RDF(S) version of the CIDOC CRM tool [7]) has further increased their interoperability potential. The introduction of a Semantic Web/RDF approach in the CH domain has indubitably produced important beneficial effects without, however, being able to propose concrete solutions for representing those immaterial features at the core of our Digital CH Twins. In this context, a canonical example is represented by the RDF-based description of Claude Monet's "Garden at Sainte-Adresse" painting included in the Image Annotation on the Semantic Web W3C Incubator Group Report [8]. This description supplies, in fact, an impressive number of details about the physical (dimensions etc.) characteristics of the painting. However, with respect to the proper content/deep meaning of the painting, the representation in digital form is reduced to three xml-like statements relating the presence in the picture of three persons (three relatives of Claude Monet). No information is given about the important immaterial properties that characterize the painting, e.g., the mode of sitting of the personages facing the sea, their mutual relationships, their attitude towards the peaceful and bright landscape, etc. The Incubator Group Report goes back to 2007, but, apparently, little real progress with respect to an (at least partial) description of the symbolic/conceptual properties of CH items has been made in recent years. We can see in this context, e.g., the flat statements in the style of "the subject of the painting is a woman" included in the formalization of the Mona Lisa painting carried out in 2013 in a Europeana context [9]. Another example is the formal description of a sculpture by Giambologna [10] denoting the kidnapping of a Sabine woman by a Roman warrior, on display at the Capodimonte museum in Naples. This description is restricted to purely documentary-oriented statements telling us that this sculpture is part of the Collezione Farnese, without any attempt to model the frightened woman,
http://www.getty.edu/research/tools/vocabularies/aat/index.html, last accessed 2022/3/14; http:// www.iconclass.nl/home. Last accessed 2022/3/14; http://www.getty.edu/research/tools/vocabular ies/ulan/index.html, last accessed 2022/3/14. 3 http://dublincore.org/. Last accessed 2022/3/14; https://www.loc.gov/standards/vracore/schemas. html. Last accessed 2022/3/14. 4 https://www.dublincore.org/specifications/dublin-core/abstract-model/. Last accessed 2022/3/14.
the kidnapper, the man who tries to prevent the kidnapping, etc. Additional examples can be found, e.g., in Sect. 1.1 of [11].
4.3 Main Features of the NKRL Language

From an ontological point of view, the most remarkable characteristic of NKRL is the addition of an ontology of elementary events to the usual ontology of concepts. The latter, called HClass (hierarchy of classes), presents some interesting, original aspects; see, e.g., [5: 123–137]. However, its architecture is relatively traditional, and the concepts are represented according to the usual Semantic Web binary model. A purely binary approach faces, however, major difficulties when the entities to be represented are not simple notions/concepts that can be defined a priori and inserted in a graph-shaped static ontology, but denote dynamic situations. These are characterized, in fact, by the presence of complex spatio-temporal information and of mutual relationships among their elements (including, e.g., intentions and behaviors). In specifying the ontology of elementary events of NKRL, an augmented n-ary approach has therefore been chosen. n-ary means using a formal representation where a given predicate can be associated with multiple arguments; using a simple example, representing the n-ary purchase relation implies associating with a purchase-like predicate several arguments such as a seller, a buyer, a good, a price, a timestamp, etc. Augmented means that, following the seminal "What's in a Link" paper by William Woods [12], within the n-ary representations the logico-semantic links between the predicate and its arguments are explicitly represented making use of the notion of functional role [13]. The NKRL representation of a simple narrative like "Peter gives an art book to Mary" will include, then, the indication that Peter plays the role of "subject/agent" of the action of giving, the book is the "object" of the action and Mary the "beneficiary"; see below. In the NKRL ontology of elementary events, the nodes are represented by augmented n-ary knowledge patterns called templates; this ontology is known as HTemp, the hierarchy of templates. Templates formally denote general classes of elementary states/situations/actions/episodes, etc. (designated collectively as elementary events). Examples can be "be present in a place", "experience a given situation", "have a specific attitude", "send/receive messages", etc. Templates' instances, called predicative occurrences, then formally describe the meaning of specific elementary events pertaining to one of these classes. The general representation of a template is given by Eq. 4.1:

L_i  P_j (R_1, a_1) (R_2, a_2) ... (R_n, a_n)
(4.1)
In Eq. 4.1, L_i is the symbolic label identifying a specific template, P_j is a conceptual predicate, and R_k is a generic functional role, used to identify the specific logico-semantic function performed by its filler a_k, a generic
predicate argument, with respect to P_j. When an NKRL template denoted as Move:TransferMaterialThingsToSomeone is instantiated to provide the representation of the elementary event above, "Peter gives an art book to Mary", the predicate P_j (MOVE) introduces its three arguments a_k, PETER_, MARY_ and ART_BOOK_1 (individuals, i.e., instances of HClass entities) via, respectively, the functional roles (R_k) SUBJ(ect), BEN(e)F(iciary) and OBJ(ect). The full n-ary construction is then reified via the use of the label L_i, allowing the association of multiple templates/predicative occurrences within wider conceptual structures.
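To make the augmented n-ary format of Eq. 4.1 more tangible, the fragment below shows one possible encoding of the "Peter gives an art book to Mary" occurrence as a plain data structure: a label, a predicate and a set of functional role/filler pairs. It is purely illustrative, with a hypothetical identifier, and does not correspond to NKRL's actual internal representation.

```python
# Illustrative encoding of an NKRL-style predicative occurrence (cf. Eq. 4.1):
# a label L_i, a predicate P_j and functional role / filler pairs (R_k, a_k).
occurrence = {
    "label": "example.c1",                                   # L_i (hypothetical identifier)
    "template": "Move:TransferMaterialThingsToSomeone",
    "predicate": "MOVE",                                     # P_j
    "roles": {                                               # (R_k, a_k) pairs
        "SUBJ": "PETER_",
        "OBJ": "ART_BOOK_1",
        "BENF": "MARY_",
    },
}
```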
4.4 Introducing Important Digital CH Twin Features

This Section supplies some examples of formalization in NKRL terms of important conceptual features to be used in a Digital CH Twins context. The examples are derived from two recent experiments, described in detail in [11, 14]. The first concerns the formal description of the central scene of Diego Velazquez's painting "The Surrender of Breda", which represents Ambrosio Spinola, Commander in Chief of the Spanish Army, receiving, on June 5, 1625, the keys to the city from Justinus van Nassau, governor of Breda. The main interest of this scene resides in the benevolent attitude of the winner, Spinola, towards the loser, van Nassau, a not so common behavior at that time. The second concerns Leonardo da Vinci's Mona Lisa ("La Gioconda") painting, which does not require any particular presentation/comment.
4.4.1 Templates and Templates' Instances (Predicative Occurrences)

Table 4.1 shows the creation of an instance (a predicative occurrence) of the Receive:TangibleThing template. This occurrence is identified by the label breda.c6, and corresponds to the elementary event "Ambrosio Spinola receives the keys to the city from Justinus van Nassau in the context of particular celebrations", see [14]. The template of Table 4.1 corresponds to the syntax of Eq. 4.1. To avoid the ambiguities of natural language and any possible combinatorial explosion problem, see [5: 56–61], both the predicate and the functional roles of Eq. 4.1 are primitives. Predicates Pj belong then to the set {BEHAVE, EXIST, EXPERIENCE, MOVE, OWN, PRODUCE, RECEIVE}, and the roles Rk to the set {SUBJ(ect), OBJ(ect), SOURCE, BEN(e)F(iciary), MODAL(ity), TOPIC, CONTEXT}. The HTemp hierarchy is structured into seven branches, each of which includes only the templates created around one of the seven predicates Pj. HTemp presently (March 2022) includes more than 150 templates, see [5: 137–177].
Table 4.1 Deriving a predicative occurrence from a template

name: Receive:TangibleThing
father: Receive:
position: 7.1
natural language description: "Receive Some Tangible Thing from Someone"

RECEIVE:   SUBJ:      var1: [var2]
           OBJ:       var3
           SOURCE:    var4: [var5]
           [BENF:     var6: [var7]]
           [MODAL:    var8]
           [TOPIC:    var9]
           [CONTEXT:  var10]
           { [modulators], ≠abs }

var1: human_being_or_social_body
var3: artefact_; economic/financial_entity
var4: human_being_or_social_body
var6: human_being_or_social_body
var8: activity_; process_; symbolic_label; temporal_sequence
var9: sortal_concept
var10: situation_; symbolic_label
var2, var5, var7: location_

breda.c6:  RECEIVE:  SUBJ:     AMBROSIO_SPINOLA: (BREDA_)
                     OBJ:      (SPECIF key_to_the_city BREDA_)
                     SOURCE:   JUSTINUS_VAN_NASSAU: (BREDA_)
                     CONTEXT:  CELEBRATION_1
                     date-1:   05/06/1625
                     date-2:

Receive:TangibleThing (7.1)

Ambrosio Spinola receives the keys to the city from van Nassau in the context of particular celebrations.
In a template, the arguments of the predicate (ak in Eq. 4.1) are actually represented by variables (vari) with associated constraints, expressed as HClass concepts or combinations of concepts. When creating a predicative occurrence, the constraints linked to the variables are used to specify the legal sets of HClass terms, concepts or individuals, that can be substituted for these variables within the occurrence—note that, in this paper, concepts are in lower case characters and individuals in upper case. In Table 4.1, e.g., the individuals AMBROSIO_SPINOLA and JUSTINUS_VAN_NASSAU must really be instances of individual_person, a specific term of human_being_or_social_body.
4.4.2 Structured Arguments (Expansions)

Structured arguments/expansions [5: 68–70] concern the possibility of building up the predicate arguments ak of Eq. 4.1 as a set of recursive lists introduced by the four AECS operators. These are the disjunctive operator ALTERN(ative) = A, the distributive operator ENUM(eration) = E, the collective operator COORD(ination) = C and the attributive operator SPECIF(ication) = S. An example of expansion is (SPECIF key_to_the_city BREDA_) in Table 4.1—denoting, using the operator SPECIF, that the keys to the city we are speaking of are more exactly specified via the addition of the property BREDA_.

We can look now at that part of the Mona Lisa encoding—see, e.g., occurrence gio2.c4 in Table 4.2—which concerns the problem of the identification of the hidden painting, i.e., the portrait that lies beneath the Mona Lisa on the same poplar panel, see [15]. In NKRL, the instances of the Own:Property and Own:CompoundProperty templates are used to introduce the properties of inanimate entities; the templates of the type Behave:HumanProperty represent the properties of human (animate) entities, see, e.g., [6]. The difference between the Own:Property and Own:CompoundProperty instances concerns the way of representing the property, i.e., the filler of the TOPIC role. This is simply a single HClass term in the Own:Property instances, while it is a structured argument in the instances of Own:CompoundProperty. In the gio2.c4 occurrence, PAINTING_2 is the hidden painting. Its properties, i.e., the TOPIC's filler, are denoted by a COORD list including two arguments, both implemented as SPECIF lists. The first states that the property painted_on of PAINTING_2 is characterized by two features: (i) PAINTING_2 has been realized on the same POPLAR_PLANK_1 that represents the support of the Mona Lisa portrait (PAINTING_1), and (ii) it is located under PAINTING_1—this feature is expressed using an additional SPECIF list. The second argument tells us that PAINTING_2 is known as the HIDDEN_PAINTING.

Table 4.2 Representing the properties of the hidden painting

gio2.c4:   OWN:   SUBJ:    PAINTING_2
                  OBJ:     property_
                  TOPIC:   (COORD (SPECIF painted_on POPLAR_PLANK_1 (SPECIF under_ PAINTING_1)) (SPECIF labelled_as HIDDEN_PAINTING))
                  { obs }
                  date-1:  today_
                  date-2:

Own:CompoundProperty (5.42)

PAINTING_2 has been painted under the plank of PAINTING_1, and is known as the HIDDEN_PAINTING.
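Again as an informal, hypothetical sketch (not the actual NKRL implementation), the recursive AECS lists can be mimicked by nested tuples whose first element is the operator; the snippet below stores and re-serializes the structured TOPIC filler of gio2.c4.

```python
# Nested-tuple rendering of the structured TOPIC filler of gio2.c4:
# (COORD (SPECIF painted_on POPLAR_PLANK_1 (SPECIF under_ PAINTING_1))
#        (SPECIF labelled_as HIDDEN_PAINTING))
topic_filler = (
    "COORD",
    ("SPECIF", "painted_on", "POPLAR_PLANK_1", ("SPECIF", "under_", "PAINTING_1")),
    ("SPECIF", "labelled_as", "HIDDEN_PAINTING"),
)

def render(expansion):
    """Serialize a nested AECS-style tuple back into its textual form."""
    if isinstance(expansion, tuple):
        operator, *args = expansion
        return "(" + operator + " " + " ".join(render(a) for a in args) + ")"
    return expansion

print(render(topic_filler))
# -> (COORD (SPECIF painted_on POPLAR_PLANK_1 (SPECIF under_ PAINTING_1)) (SPECIF labelled_as HIDDEN_PAINTING))
```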
4.4.3 Determiners and Attributes

The semantic predicate Pj, the seven functional roles Rk and the (simple or structured) arguments ak of the predicate, see Eq. 4.1, are the three basic building blocks strictly necessary to give rise, together, to a meaningful representation of templates or predicative occurrences. A valid interpretation of templates/occurrences will only arise, then, after the (mandatory) assembling of the three blocks through an Li label. When the addition of further information-carrying elements is required to better specify the
meaning expressed by this basic core, these additional elements (basically locations, modalities and temporal information) are dealt with simply as determiners/attributes [5: 70–86]. In Table 4.1, for example, the variables var2, var5 and var7 associated with the template denote determiners/attributes of the location type that are then represented, in the corresponding predicative occurrences, by specific terms of the HClass concept location_ or by individuals derived from these terms—BREDA_ in the case of Table 4.1. Modulators (modalities) represent an important category of determiners/attributes that apply to a full, well-formed template or predicative occurrence to specify/particularize its full meaning. They are classed into three categories: temporal (begin, end, obs(erve)), deontic (oblig(ation), fac(ulty), interd(iction), perm(ission)) and modal (abs(olute), against, for, main, ment(al), wish, etc.) modulators, see [5: 71–75]. An example of use of the temporal modulator obs(erve) is reproduced in Table 4.2: we can see, as of today, that PAINTING_2 is…; today_ is a legal HClass concept. Another category of determiners/attributes concerns the two operators date-1 and date-2. They can only be associated with full predicative occurrences (see, e.g., the two occurrences in Tables 4.1 and 4.2) and are used to materialize the temporal interval (or a specific date) originally linked to the corresponding elementary event, see, e.g., [16]. To highlight the importance of the determiners/attributes in NKRL, we reproduce in Table 4.3 another fragment of the global formalization of the Mona Lisa painting, the occurrence gio2.c6, which represents a typical example of NKRL modeling of the notion of negation. This consists in adding to a predicative occurrence a specific modal modulator, negv (negated event), to point out that the corresponding elementary event did not take place. In our example, WOMAN_53, the woman represented in the hidden painting, has not been recognized (see Table 4.3) as WOMAN_52, i.e., as Mona Lisa—WOMAN_53 is probably Isabella d'Aragona, the wife of the Duke of Milan Ludovico il Moro, one of Leonardo's employers, see [15]. The association of the temporal modulator obs(erve) with today_ has the usual "as of today" meaning.

Table 4.3 An example of "negated event"

gio2.c6:   BEHAVE:  SUBJ:    (SPECIF WOMAN_53 (SPECIF identified_with WOMAN_52))
                    { obs, negv }
                    date-1:  today_
                    date-2:

Behave:HumanProperty (1.1)

We can remark (modulator obs) that the elementary event represented by gio2.c6 is a "negated event" (modulator negv), i.e., WOMAN_53 is not WOMAN_52.
4 Digital Cultural Heritage Twins: New Tools …
45
4.4.4 Second Order Conceptual Structures

In the context of digitally representing real and effective Digital CH Twins, several predicative occurrences corresponding to multiple elementary events must, of course, be associated. An easy way to do this concerns the possibility of making use of a co-reference mechanism that allows us to logically associate two or more predicative occurrences where the same individual(s) appear(s). Returning then to the predicative occurrence breda.c6 of Table 4.1, we can note that, thanks to the presence of the filler CELEBRATION_1 in the CONTEXT role, the ceremony of handing over the keys occurs in the context of more general festivities. Occurrence breda.c8 tells us now that CELEBRATION_1 figures also as SUBJ of an Own:CompoundProperty predicative occurrence where the filler of the TOPIC role—i.e., the property associated with the SUBJ, see Sect. 4.4.2 above—is an AECS structure corresponding to (SPECIF surrender_ BREDA_). These festivities are then associated with the surrender of the city. The most general and interesting way of logically associating single predicative occurrences is, however, to make use of second order structures created through the reification of single occurrences. These structures reflect, at the digital level, surface connectivity phenomena like causality, goal or indirect speech. "Reification" is intended here as the possibility of creating new objects out of already existing entities and of saying something about them without making reference to the original entities. In NKRL, reification is implemented using the symbolic labels (the Li terms of Eq. 4.1) of the predicative occurrences according to two different conceptual mechanisms. The first concerns the possibility of referring to an elementary/complex event as an argument of another (elementary) event—a complex event corresponds to a coherent set of elementary events. The (surface) connectivity phenomenon involved here is indirect speech. An informal example can be that of an elementary event X describing someone who speaks about Y, where Y is itself an elementary/complex event. In NKRL, this mechanism is called completive construction, see [5: 87–91]. It is illustrated, e.g., by the association of the occurrences breda.c5/breda.c6 in Table 4.4, where breda.c6 is introduced by breda.c5 as the filler of its CONTEXT role. The representation of breda.c6 is reproduced in Table 4.1; the # prefix in breda.c5 denotes that the associated term is not an HClass item but an occurrence label. breda.c5 and breda.c6 represent, together, a coherent entity formalizing the most important element of the scene: while receiving the keys (breda.c6), Ambrosio Spinola prevents (PRODUCE activity_blockage) an attempt by Justinus to genuflect in front of him (breda.c5). The second mechanism allows us to associate together, through several types of connectivity operators, elementary/complex events that can still be regarded as fully independent entities. This mechanism—binding occurrences, see [5: 91–98]—is realized under the form of lists formed of a binding operator Bni and its Li arguments, see Eq. 4.2:
Table 4.4 An example of completive construction

breda.c5:  PRODUCE:  SUBJ:     AMBROSIO_SPINOLA: (BREDA_)
                     OBJ:      activity_blockage
                     MODAL:    (SPECIF moral_suasion hand_gesture)
                     TOPIC:    (SPECIF GENUFLECTING_1 JUSTINUS_VAN_NASSAU)
                     CONTEXT:  #breda.c6
                     date-1:   05/06/1625
                     date-2:

Produce:CreateCondition/Result (6.4)

(Within the breda.c6 framework) Spinola stops van Nassau from genuflecting.

breda.c6:  RECEIVE:  SUBJ:  AMBROSIO_SPINOLA …

Ambrosio Spinola receives the keys…
(Lbk (Bni L1 L2 … Ln))    (4.2)
Lbk is the symbolic label identifying the global binding structure. The Bni operators are: ALTERN(ative), COORD(ination), ENUM(eration), CAUSE, REFER(ence) (the weak causality operator), GOAL, MOTIV(ation) (the weak intentionality operator) and COND(ition). These binding structures are particularly important in an NKRL context given that, among others, the top-level formal structure introducing the NKRL representation of a complex narrative obligatorily has the form of a binding occurrence—see, e.g., [11, 14]. An example of binding occurrence is given in Table 4.5. The CAUSE of what is related in gio2.c6 is the publication, see gio2.c7, of a paper listing all the incoherencies associated with the identification of the hidden painting woman with Mona Lisa. These are described by the occurrences included in a binding occurrence, #gio2.c8, introduced in gio2.c7 in a completive construction mode.
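Continuing the same hypothetical sketch, reification through labels makes both mechanisms easy to picture: a completive construction stores an occurrence label as a role filler, and a binding occurrence of the form of Eq. 4.2 is a labelled list of occurrence labels, anticipating the gio2.c5 example of Table 4.5 below.

```python
# Reification sketch: occurrences are stored by label, and second-order
# structures (binding occurrences, completive constructions) refer to labels.
occurrences = {
    "gio2.c6": {"predicate": "BEHAVE", "modulators": ("obs", "negv")},
    "gio2.c7": {"predicate": "MOVE", "roles": {"OBJ": "#gio2.c8"}},  # completive construction
}

# Binding occurrence (Lb_k (Bn_i L_1 L_2 ... L_n)) -> here (CAUSE gio2.c6 gio2.c7)
binding = ("gio2.c5", ("CAUSE", "gio2.c6", "gio2.c7"))

label, (operator, *arguments) = binding
print(f"{label}: {operator} links {arguments}")
# Every argument can be dereferenced back to a full predicative occurrence:
for arg in arguments:
    print(arg, "->", occurrences[arg])
```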
Table 4.5 An example of binding occurrence

gio2.c5: (CAUSE gio2.c6 gio2.c7)

The elementary event modeled by gio2.c6 has been caused by what is collectively described in the completive construction involving occurrences gio2.c7 and gio2.c8.

gio2.c6:  BEHAVE:  SUBJ: (SPECIF WOMAN_53 … { obs, negv }

The elementary event represented by gio2.c6 (see Table 4.3) is a "negated event"…

gio2.c7:  MOVE:  SUBJ:    LILLIAN_FELDMANN_SCHWARTZ
                 OBJ:     #gio2.c8
                 MODAL:   (SPECIF SCIENTIFIC_PAPER_2 (SPECIF published_on THE_VISUAL_COMPUTER_JOURNAL))
                 date-1:  1/1/1988, 31/1/1988
                 date-2:

Move:StructuredInformation (4.42)

Lillian Feldmann Schwartz has spread what is described in gio2.c8 via a paper in "The Visual Computer" journal.

4.5 Conclusion

In this paper, we have proposed the use of advanced digital techniques to allow generalized access to all the important features, both physical and symbolic/conceptual, of any kind of Cultural Heritage (CH) entity, and to make possible their global fruition. We suggest, then, conceiving the digitalized image of a CH entity as the association of two terms, the first describing the physical (in the broadest meaning of this word) properties of this entity, and the second corresponding to a Digital Cultural Heritage Twin supplying the formal description of all the immaterial/symbolic aspects of the same entity. NKRL, the Narrative Knowledge Representation Language, has been chosen for the implementation of the Digital CH Twin components because of its well-known capacity for dealing with important immaterial/symbolic aspects of many different realities. To demonstrate the viability of this approach, we have illustrated in some depth the use of NKRL to represent in digital form complex expressive fragments of possible Digital CH Twins proper to two well-known Renaissance masterpieces.
References

1. Grieves, M.: Origins of the digital twin concept (Working Paper). Florida Institute of Technology, Melbourne (FL) (2016). https://www.researchgate.net/publication/307509727. Accessed 14 Mar 2022
2. Fuller, A., Fan, Z., Day, C.: Digital twin: enabling technology, challenges and open research. arXiv (2019). https://arxiv.org/abs/1911.01276v1. Accessed 14 Mar 2022
3. Semeraro, C., Lezoche, M., Panetto, H., Dassisti, M.: Digital Twin paradigm: a systematic literature review. Comput. Ind. 130, Article 103469 (2021)
4. Speidel, K.: What narrative is—reconsidering definitions based on experiments with pictorial narrative. Front. Narrative Stud. 4(s1), s76–s104 (2017)
5. Zarri, G.P.: Representation and Management of Narrative Information—Theoretical Principles and Implementation. Springer, London (2009)
6. Zarri, G.P.: Sentiments analysis at conceptual level making use of the narrative knowledge representation language. Neural Netw. 58(1), 82–97 (2014)
7. Le Boeuf, P., Doerr, M., Ore, C.E., Stead, S.: Definition of the CIDOC conceptual reference model (version 6.2.6). ICOM/CIDOC Documentation Standard Group, Heraklion (2019)
8. Troncy, R., van Ossenbruggen, J., Pan, J.Z., Stamou, G. (eds.): Image Annotation on the Semantic Web, W3C Incubator Group Report 14 August 2007. W3C (2007). https://www.w3.org/2005/Incubator/mmsem/XGR-image-annotation/. Accessed 14 Mar 2022
9. Isaac, A. (ed.): Europeana Data Model Primer (14/07/2013). Europeana (2013). http://travesia.mcu.es/portalnb/jspui/bitstream/10421/5981/1/EDM_primer.pdf. Accessed 14 Mar 2022
10. Lodi, G., Asprino, L., Nuzzolese, A.G., Presutti, V., Gangemi, A., Reforgiato Recupero, D., Veninata, C., Orsini, A.: Semantic Web for cultural heritage valorisation. In: Hai-Jew, S. (ed.) Data Analytics in Digital Humanities, pp. 3–37. Springer International, Cham (2017)
11. Amelio, A., Zarri, G.P.: Conceptual encoding and advanced management of Leonardo da Vinci's Mona Lisa: preliminary results. Information MDPI 10(10), 321 (2019)
12. Woods, W.A.: What's in a Link: Foundations for Semantic Networks. In: Bobrow, D.G., Collins, A.M. (eds.) Representation and Understanding: Studies in Cognitive Science, pp. 35–82. Academic Press, New York (1975)
13. Zarri, G.P.: Functional and semantic roles in a high-level knowledge representation language. Artif. Intell. Rev. 51(4), 537–575 (2019)
14. Zarri, G.P.: Use of a knowledge patterns-based tool for dealing with the "narrative meaning" of complex iconographic cultural heritage items. In: Proceedings of the 1st International Workshop on Visual Pattern Extraction and Recognition for Cultural Heritage Understanding (VIPERC 2019)—CEUR Vol-2320, pp. 25–38. CEUR-WS.org, Aachen (2019)
15. Amelio, A.: Exploring Leonardo Da Vinci's Mona Lisa by visual computing: a review. In: Proceedings of the 1st International Workshop on Visual Pattern Extraction and Recognition for Cultural Heritage Understanding (VIPERC 2019)—CEUR Vol-2320, pp. 74–85. CEUR-WS.org, Aachen (2019)
16. Zarri, G.P.: Modeling and exploiting the temporal information associated with complex "narrative" documents. Int. J. Knowl. Eng. Data Mining 6(2), 135–167 (2019)
Chapter 5
Arabic Speech Processing: State of the Art and Future Outlook

Naim Terbeh, Rim Teyeb, and Mounir Zrigui
Abstract The aim of this article is to study the Arabic speech processing applications which have introduced voice communication as a solution for specific situations (non-native speakers, speakers with voice disabilities, learners of Arabic vocabulary, speech recognition or speech synthesis). We present the principal applications of Arabic spoken language processing, emphasizing the main challenges that prevent obtaining better results. The paper details the approaches followed and the techniques applied in the automatic processing of spoken Arabic, so that it can serve as a reference study for researchers and developers who deal with this topic.
5.1
Speech Processing
The term 'speech processing' refers to any system that processes acoustic signals (audio, speech or voice). These systems have progressed considerably, covering different application areas (economic, scientific, educational and others). They form one of the principal branches of Natural Language Processing (NLP) and a sub-field of Artificial Intelligence (AI). They focus on modeling natural languages to build applications like speech recognition and synthesis, pronunciation evaluation, speech classification (singing or normal, healthy or pathological, dialectal or standard, etc.), speech diagnostics (voice anomaly detection, problematic phoneme
identification, dialect identification, etc.), and human-machine interaction (question answering, dialogue systems, etc.). Speech processing is a highly interdisciplinary field with connections to computer science, signal processing, phonology, speech therapy, linguistics, cognitive science, psychology, mathematics and others. Some of the earliest AI applications were in speech processing (speech recognition and synthesis). In particular, the last decade has witnessed an incredible increase in the performance and public use of these systems. Speech processing researchers are proud of developing language-independent models and tools that can be applied to all languages and to different dialects; e.g., speech recognition systems can be built for different languages and dialects using the same methodologies, mechanisms and models. However, some languages actually get more attention (like English, French and Chinese) than others (like Hindi, Swahili and Thai) [1–3]. Arabic, which is the primary language of the Arab world and the religious language of millions of non-Arab Muslims, is somewhere in the middle of this continuum. Though Arabic speech processing has many challenges, it has seen many successes and developments. In the rest of the paper, we discuss the main challenges of Arabic speech as a necessary background, and we survey a number of its application areas. We end with a critical discussion of the future of Arabic speech.
5.2
Arabic Speech and Its Challenges
The Arabic spoken language poses a number of modeling challenges for speech processing applications: morphological richness, ambiguity, dialectal variations, noise, voice pathologies, non-native speakers, and resource poverty.
5.2.1
Morphological Richness
In the Arabic language, words have numerous forms (hence various pronunciations) resulting from a rich inflectional and derivational system depending on several features like gender, number, person, aspect, mood, case, and the number of attachable clitics. As a result, it is common to find single Arabic words that translate into entire English sentences (five words, and even more):
• ﻭ ﺳﻴﺪﺭﺳﻮﻧﻬﺎ wa + sa + ya-drus-uuna + ha will be translated into "And they will study it".
• ﺃﺗﺘﺬﻛﺮﻭﻧﻨﺎ a + ta + ta-dhakaru-na + na' will be translated into "Do you remember us?".
This challenge leads to a higher number of unique Arabic vocabulary types compared to other languages like English or French, which is challenging for automatic learning models designed for applications of speech processing [4–7].
5.2.2
Ambiguity
The Arabic script uses optional diacritical marks (vowelization) to represent short vowels and other phonological information that is important to distinguish words from each other. Nevertheless, these marks are almost never used outside of religious texts and children's literature, which generates a high degree of ambiguity. This ambiguity in Arabic vocabulary due to non-vowelization can be illustrated by the various possible pronunciations of the word "ﻛﺘﺒﺖ", which can be pronounced:
• ﻛﺘﺒﺖ katabtu "I wrote";
• ﻛﺘﺒﺖ katabat "she wrote";
• ﻛﺘﺒﺖ kutibat "it was written";
• ﻛﺘﺒﺖ ka + tibat "such as Tibat" or "like Tibat".
An input acoustic signal (voice command, speech to be recognized, speech to be classified or analyzed, etc.) which presents a strong ambiguity will be processed by the automaton as a defective or incomprehensible speech sequence. The system finds considerable difficulty in deciphering such an entry. This difficulty is due either to the diversity of meanings that can be conveyed by the utterance, or to a total failure to understand the pronunciation. Computational linguistics researchers are thus invited to introduce new approaches to resolve the ambiguities carried by Arabic speech (see [8–10]).
5.2.3
Diversity of Arabic Spoken Language
The Arabic language is the 4th most spoken language in the world, with about 445 million speakers, and is also ranked 8th in the number of pages circulating on the Internet. For historical and ideological reasons, this language presents a hierarchy of varieties:
• Classical Arabic: This is the Arabic of the Holy Quran. It has been spoken since the seventh century. The classical Arabic language is presented as a reference, branched afterwards into different dialects. Classical Arabic belongs to the religious domain as it is practiced by members of a clergy.
• Standard Arabic: This is the language of literature and the press, commonly spoken on radio and television broadcasts, in conferences and in official speeches in all Arab countries. It is taught at school as a first language.
• Intermediate Arabic: This is a simplified variant of Classical Arabic and an elevated form of dialectal Arabic. This variant is booming today. It is used more and more commonly in education and in the media. It allows us to get close to the illiterate and to the mother tongue of the people.
• Dialect Arabic: This is another variation of Classical Arabic. It is mainly oral, and it is the language of everyday conversation. Each Arab country has its own dialect [4, 11, 12]:
  – Arabic dialects spoken in the Arabian Peninsula: Gulf dialect, Saudi Arabian dialect, Yemeni dialect;
  – Maghrebian dialects: Algerian, Moroccan, Tunisian, Mauritanian, Libyan;
  – Middle Eastern dialects: Egyptian, Sudanese, Syro-Lebanese-Palestinian;
  – The Maltese language is also considered an Arabic dialect.
These different registers make automatically processing the Arabic language, in its different varieties, a difficult and complicated task. This is why the developed systems are designed to handle only one of the variants.
5.2.4
Orthographic Inconsistency
Standard Arabic and dialectal Arabic are both written with a high degree of spelling inconsistency, especially on social media: A third of all words in MSA comments online have spelling errors, and dialectal Arabic has no official spelling standards although there are efforts to develop such standards computationally, such as the work on CODA or Conventional Orthography for Dialectal Arabic. Furthermore, Arabic can be written in other scripts, most notably a Romanization called Arabizi which attempts to capture the phonology of the words (see [13–16]).
5.2.5
Noisy Speech
To achieve high-quality recordings, it is necessary to filter the recordings downloaded from radio or television channels (Al Arabia, Al Jazeera, Al Wataniya, etc.) and the noise introduced in the recording studio. This filtering consists in removing unwanted sequences such as silence periods and sequences of words in languages other than Arabic, because sometimes the host utters a word or a sentence, spontaneously or intentionally, in another language. In addition, the musical sequences are not part of the speech to be processed. In our case, we talk about verification more than filtering, so we eliminate noise like "uh", "em", "hem" and sounds caused by the telephone. We use Audacity to cut sequences of the types "em", "uh", etc. Then we filter the entire vocal corpus with successive applications of a low-pass filter until recordings of acceptable quality are achieved [17, 18].
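A minimal sketch of the kind of low-pass filtering mentioned above, based on SciPy's Butterworth filter; the cutoff frequency, filter order and synthetic test signal are illustrative assumptions, not the exact settings used for the corpus.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass(signal, sample_rate, cutoff_hz=4000.0, order=5):
    """Apply a zero-phase Butterworth low-pass filter to a 1-D audio signal."""
    nyquist = 0.5 * sample_rate
    b, a = butter(order, cutoff_hz / nyquist, btype="low")
    return filtfilt(b, a, signal)

# Toy usage on a synthetic signal (a 300 Hz tone plus broadband noise).
sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
noisy = np.sin(2 * np.pi * 300 * t) + 0.3 * np.random.randn(sr)
clean = lowpass(noisy, sr)
```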
5.2.6
Pathological Speech
The human voice can be affected by many dysfunctions such as pronunciation defects due to damage to the vocal cords. In this way, pathological speech is speech that is partially or totally defective. In other words, it is a sequence of speech which contains deformations at the level of its fundamental components, which are the phonemes. Pathological speech generally causes comprehension problems for the recipients. This problem increases proportionally with the rate of false pronunciations contained in the produced speech and deepens for human-machine communication [19, 20]. The deformations that can affect human speech are often of the following kinds:
• Vocal cord lesions: These prevent the vocal cords from reaching the frequency or intensity that corresponds to the subglottic air pressure.
• Poor articulation: One or more phonemes are not produced from their proper articulatory points.
• Uncontrolled prosodic parameters (too high/too low speech speed, poor distribution of acoustic energy, etc.).
5.2.7
Non-native Speaker
Learning foreign languages generates a diversity of accents by non-native speakers. Often the accent of such a learner presents several distortions in the speech produced since they are hampered by the articulatory context of their mother tongue [21, 22].
5.2.8
Resource Poverty
Data are the bottleneck of NLP: This is true for rule-based approaches that need lexicons and carefully created rules and for machine learning approaches that need corpora and annotated corpora. Although Arabic un-annotated text corpora are quite plentiful, Arabic morphological analyzers and lexicons as well as annotated and parallel data in the non-news genre and in dialects are limited. None of the above-mentioned issues are unique to Arabic; e.g., Turkish and Finnish are morphologically rich, Hebrew is orthographically ambiguous, and many languages have dialectal variants. However, the combination and degree of these phenomena in Arabic create a particularly challenging situation for NLP research and development. For more information on Arabic computational processing challenges, please see [23, 24].
5.3
Arabic Speech and Its Applications
We organize this section on processing Arabic speech into five principal parts: First, we describe the phenomenon of speech recognition. Second, we discuss some advanced user-targeting applications of speech synthesis. The third part details the process of speech analysis. Assisting non-native learners of Arabic vocabulary is the subject of the fourth part. The fifth part itemizes the specific treatment of pathological speech.
5.3.1
Speech Recognition
The purpose of speech recognition (or speech-to-text) is to decode the information carried by an acoustic signal from the data provided by the analyzer. We fundamentally differentiate two types of recognition: firstly, the mission of speaker recognition is to recognize the person producing the speech; secondly, the objective of speech recognition is to recognize the content of the speech produced, independently of the speaker [18, 25]. The purpose for which recognizers are called is the basis of this classification. Indeed, we distinguish what follows:
• For speaker recognition, we differentiate speaker identification from speaker verification, depending on whether the objective is to determine which speaker, out of a pre-established finite list of speakers, produced the analyzed signal, or to check whether the analyzed voice indeed corresponds to the claimed person.
• Depending on the sentence to be pronounced, we distinguish text-dependent speaker recognition, dictated-text recognition and text-independent recognition: the sentence to be pronounced is, respectively, fixed at the design phase of the system, introduced at the test phase, or not fixed at all.
• We separate single-speaker, multi-speaker and speaker-independent speech recognition, depending on whether the system has been trained to recognize the voice of one person or the voices of a finite group of people, or whether it is able to recognize anyone's voice.
• There is also a distinction between recognizers of isolated words, recognizers of connected words and recognizers of continuous speech, depending on whether the voice base to be recognized contains isolated words separated from each other by periods of silence, sequences of connected predefined words, or any continuous speech.
The human-machine communication environment is an interactive space that provides communication between the user and the machine in different ways (through a keyboard, gestures, voice commands, etc.). Nowadays, voice interaction practically covers a major part of this human-machine communication [26, 27].
However, several factors can prevent these interactive systems from achieving effective, reliable and trustworthy communication. Noisy speech and short utterances are major factors resulting in unrecognized voice commands [28], hence the degradation and non-credibility of communication outcomes.

Speech and speaker recognition in noisy conditions
In noisy environments, speech recognition becomes a significantly more difficult process. In such cases, researchers resort to training their systems on various types of noise (simulated noisy speech), so that the noise will be easily detectable if it appears in real cases. Other approaches adopt similarity techniques between pre-generated speech models and those generated in real communication (see [29–31]).

Speech and speaker recognition with short utterances
When the inputs used for recognition are very short, it is preferable to adopt methods based on the extraction and dimensionality reduction of feature vectors (Mel Frequency Cepstral Coefficients, MFCC) using the Discrete Karhunen–Loève Transform (DKLT). Other researchers have utilized a nonlinear AM-FM model as an improved parameterization of speech signals. The adopted methods guarantee better performance compared to plain MFCC features (see [32, 33]).
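As an illustration of the MFCC features mentioned above, the snippet below extracts them with the librosa library; the file name and parameter values are placeholders, and the final time-averaging step is only a crude stand-in for the dimensionality-reduction techniques (such as the DKLT) cited above.

```python
import librosa

# Load an utterance (the path is a placeholder) and compute 13 MFCCs per frame.
signal, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
print(mfcc.shape)  # (13, number_of_frames)

# Crude fixed-size summary per utterance: average the coefficients over time.
# (A DKLT/PCA-style projection would be applied here in the cited approaches.)
utterance_vector = mfcc.mean(axis=1)
```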
5.3.2
Speech Synthesis
Unlike analyzers and recognizers, synthesizers produce artificial speech [34–36]. There are basically two types of synthesizers:
• Unlike analyzers, synthesizers working from a digital representation permit producing speech based on digital audio-signal characteristics.
• Contrary to recognizers, speech synthesizers working from a symbolic representation can pronounce any sentence without prior listening to a human speaker. In this second category, we also distinguish synthesizers as a function of their operation mode:
  – Text-based synthesizers receive orthographic text as input and must provide the corresponding audio representation;
  – Concept-based synthesizers are often integrated into man-machine dialogue systems. They receive the text to be spoken and its linguistic structure, as produced by the dialogue system.
Typically, synthesizers do not produce defective speech unless the input text is abnormal or resources are insufficient.
5.3.3
Speech Analysis
Speech signal analyzers seek to extract the characteristics of the signal as it is produced and not as it is understood, since the latter is the mission of recognizers [37]. The analyzers are used either as basic modules in speech coding, recognition or synthesis systems, or for specialized applications like medical diagnostic aid for the detection of voice pathologies, automatic identification of problematic phonemes, automatic identification of dialects, language learning, etc.
5.3.4
Learning of Arabic Vocabulary
Arabic vocabulary learning systems are generally based on CALL and dialogue systems. CALL systems (computer-assisted language learning systems) utilize NLP, which enables technologies that assist language learners. There have been several efforts in Arabic CALL exploring a range of resources and techniques. Examples include the use of Arabic grammar and linguistic analysis rules to help learners identify and correct a variety of errors [38, 39], besides multi-agent tutoring systems that simulate the instructor, the student and the learning strategy; they also include a log book to monitor progress, as well as a learning interface. Another approach focuses on enriching the reading experience with concordances, text-to-speech, morpho-syntactic analysis, and auto-generated quiz questions [40]. Dialog systems can support smooth and natural conversations with users, which has attracted considerable interest from both research and industry in the past few years. This technology is changing how companies engage with their customers, among many other applications. Commercial dialog systems by big multinational companies, such as Amazon Alexa, Google Home and Apple Siri, support many languages; only Apple Siri supports Arabic, with limited performance. There are some strong recent competitors in the Arab world, particularly Arabot and Mawdoo3's Salma.
5.3.5
Pathological Speech Processing
The term ‘pathological speech processing’ refers to speech classification (healthy or pathological), identification of problematic phonemes (presenting the degradation of the produced speech) and correction of pathological speech [41–43].
5.3.5.1
Speech Classification
Depending on the parameters that characterize the human voice, the produced speech can follow several possible classifications. However, since the objective of our work is to study pathological speech, we deliberately limit ourselves to the classification as healthy (normal) or pathological speech. We quote for each class the traits that characterize it.

Healthy speech
Healthy speech is produced by a speaker who masters and respects all the pronunciation rules of the language spoken. Healthy speech contains no anomalies in its constitution. It is therefore speech that is not affected by any distortion. Generally, healthy speech is characterized by:
• Good articulation: Each phoneme is covered by its own articulatory zone.
• Acceptable frequency and intensity: There is a compromise between the number and amplitude of the vibrations of the vocal cords.
• Good distribution of sound energy: The speed of speech production facilitates understanding for the receiving party (man or machine).

Pathological speech
The human voice can be affected by many dysfunctions such as pronunciation defects due to damage to the vocal cords. In this way, pathological speech is speech that is partially or totally defective. In other words, it is a sequence of speech which contains deformations at the level of its fundamental components, which are the phonemes. Pathological speech generally causes comprehension problems for the recipients. This problem increases proportionally with the rate of false pronunciations contained in the produced speech and deepens for human-machine communication. The deformations that can affect human speech are often of the following kinds:
• Vocal cord lesions: These prevent the vocal cords from reaching the frequency or intensity that corresponds to the subglottic air pressure.
• Poor articulation: One or more phonemes are not produced from their proper articulatory points.
• Uncontrolled prosodic parameters (too high/too low speech speed, poor distribution of acoustic energy, etc.).
5.3.5.2
Identification of Problematic Phonemes
Once the vocal pathology is detected (the speech in the input is classified as pathological), it is necessary to identify the problematic phonemes in order to be able to help the concerned speaker to improve their speech. This task is based essentially on the rate of occurrences of the phonemes constituting the vocabulary
in the speech classified as pathological and in a reference model. At the end of this phase, a list of the phonemes that cause pronunciation disorders is generated. This list will be used in the correction phase [44–46]. Generally, a comparison between a reference phonetic model and one specific to the speaker is sufficient to identify the phonemes that cause pronunciation problems.
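The comparison between a reference phonetic model and a speaker-specific one can be sketched, under strong simplifying assumptions, as a comparison of phoneme relative-frequency dictionaries; the divergence threshold and the toy transliterated input below are hypothetical.

```python
from collections import Counter

def phoneme_rates(phoneme_sequence):
    """Relative frequency of each phoneme in a transcribed utterance."""
    counts = Counter(phoneme_sequence)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

def problematic_phonemes(reference_seq, speaker_seq, threshold=0.05):
    """Flag phonemes whose occurrence rate diverges from the reference model."""
    ref, spk = phoneme_rates(reference_seq), phoneme_rates(speaker_seq)
    flagged = []
    for phoneme in set(ref) | set(spk):
        if abs(ref.get(phoneme, 0.0) - spk.get(phoneme, 0.0)) > threshold:
            flagged.append(phoneme)
    return flagged

# Toy example with transliterated symbols (purely illustrative).
reference = list("salAmunEalaykum")
speaker   = list("salAmunEalaykun")   # /m/ systematically replaced by /n/
print(problematic_phonemes(reference, speaker))   # flags 'm' and 'n'
```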
5.3.5.3
Pathological Speech Correction
During this step, the correction algorithms act on the list of pronunciations containing problematic phonemes. The system must designate, through a dictionary of pronunciations, the one which corresponds to each bad pronunciation. Nevertheless, a situation can arise where the system generates more than one possible fix; in that case more advanced (semantic/pragmatic) processing must be applied to select one [47, 48]. The correction process generally consists of replacing problematic phonemes by the corresponding correct ones and testing whether the obtained pronunciation belongs to a lexicon of Arabic vocabulary.
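A minimal sketch of this substitution-and-lexicon-check idea; the confusion map, the lexicon and the (transliterated) pronunciation encoding are all illustrative assumptions rather than the authors' actual resources.

```python
# Hypothetical data: which phonemes the speaker tends to substitute, and a
# small lexicon of valid pronunciations (transliterated phoneme strings).
confusion_map = {"s": ["S", "s"], "t": ["T", "t"]}   # problematic -> candidate replacements
lexicon = {"Tariyq", "sariyE", "Sadiyq"}

def candidate_corrections(pronunciation, confusion_map):
    """Generate pronunciations obtained by replacing problematic phonemes."""
    candidates = {pronunciation}
    for wrong, replacements in confusion_map.items():
        new = set()
        for cand in candidates:
            for rep in replacements:
                new.add(cand.replace(wrong, rep))
        candidates |= new
    return candidates

def correct(pronunciation, confusion_map, lexicon):
    """Keep only the candidates that belong to the lexicon of valid words."""
    valid = candidate_corrections(pronunciation, confusion_map) & lexicon
    # More than one valid fix would require the semantic/pragmatic step mentioned above.
    return sorted(valid)

print(correct("tariyq", confusion_map, lexicon))   # -> ['Tariyq'] under these assumptions
```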
5.3.6
Limits and Solutions
To recapitulate our study of Arabic speech processing applications, we summarize in Table 5.1 the limits of the adopted approaches and we propose some solutions for improving the performance of the systems and increasing the credibility of the results.

Table 5.1 Limits of previous speech processing works and proposed solutions

Speech recognition
  Limits:
  • Lack of robustness against noise
  • Difficulty recognizing short utterances
  • Limited resources
  • Difficulty recognizing dialect speech
  • Speech containing words in other languages
  Solutions:
  • Train the systems on different types of noise
  • Adopt mathematical models to train systems on big data
  • Train the systems on different dialects
  • Interpret utterances in other languages as noise

Speech synthesis
  Limits:
  • Several interpretations for non-vowelized text
  • Same-context projection of text onto speech
  • Generated speech would often sound unnatural
  Solutions:
  • Introduction of emotional effects in produced speech
  • Train the systems on varied emotional speech related to context
  • Use the context of the input text to avoid ambiguity between different interpretations

Language learning
  Limits:
  • Learners often bothered by phonemes similar to those of their native language
  • Learners with learning disabilities
  • No specific content per level
  • Lack of interactive resources for learning
  Solutions:
  • Specific learning for phonemes similar to ones in the native language
  • Identify the level of each learner
  • Learning base in different modalities (text, audio, video)

Speech correction
  Limits:
  • Out-of-context correction
  • Correction with neglect of vocal pathology
  • Utterances in other languages considered as anomalies
  • Voice perturbations detected as voice pathologies
  Solutions:
  • Adopt approaches based on linguistic resources
  • Specific correction process for each type of pathology
  • Contextual correction allows choosing the most adequate suggestion

5.4

Future Outlook

Arabic NLP has many challenges, but it has also seen many successes and developments over the last 50 years. We are optimistic about its continuously positive, albeit (sometimes) slow, development trajectory. For the next decade or two, we expect a large growth in the Arabic NLP market. This is consistent with the global rising demands and expectations for language technologies and the increase in NLP research and development in the Arab world. The growing number of researchers and developers working on NLP in the Arab world makes it a very fertile ground ready for major breakthroughs. To support this vision, we believe it is time to have an association for Arabic language technologists that brings together talent and resources and sets standards for the Arabic NLP community. Such an organization can support NLP education in the Arab world, serve as a hub for resources, and advocate for educators and researchers in changing old-fashioned university policies as regards journal-focused evaluation and encouraging collaborations within the Arab world. This can be done by connecting academic, industry and governmental stakeholders. We also recommend more open-source tools and public data that are made available to create a basic development framework that lowers the threshold for joining the community, thus attracting more talent that will form the base of the next generation of Arabic NLP researchers, developers and entrepreneurs.
References 1. Srivastava, V., Singh, M.: Challenges and considerations with Code-Mixed NLP for Multilingual Societies. arXiv preprint arXiv:2106.07823 (2021) 2. Ruder, S., Constant, N., Botha, J., Siddhant, A., Firat, O., Fu, J., Johnson, M.: XTREME-R: towards more challenging and nuanced multilingual evaluation. arXiv preprint arXiv:2104. 07412 (2021)
3. Li, X., Gong, H.: Demystify optimization challenges in multilingual transformers. arXiv preprint arXiv:2104.07639 (2021) 4. Darwish, K., Habash, N., Abbas, M., Al-Khalifa, H., Al-Natsheh, H.T., Bouamor, H., Mubarak, H.: A panoramic survey of natural language processing in the Arab world. Commun. ACM 64(4), 72–81 (2021) 5. Dressler, W.U., Mattiello, E., Ritt-Benmimoun, V.: Typological impact of morphological richness and priority of pragmatics over semantics in Italian, Arabic, German, and English diminutives. 6. Elfaik, H.: Combining context-aware embeddings and an attentional deep learning model for Arabic affect analysis on twitter. IEEE Access 9, 111214–111230 (2021) 7. Kawar, K.: Morphology and syntax in Arabic-speaking adolescents who are deaf and hard of hearing. J. Speech Lang. Hear. Res. 1–16 (2021) 8. Abd, D.H., Khan, W., Thamer, K.A., Hussain, A.J.: Arabic light stemmer based on ISRI stemmer. In: International Conference on Intelligent Computing, pp. 32–45 (2021) 9. Arian, A., Rahimi Khoigani, M.: Investigating quranic ambiguity translation strategies in Persian and Chinese: lexical and grammatical ambiguity in focus. Linguist. Res. Holy Quran 10(1), 61–78 (2021) 10. Ezzini, S., Abualhaija, S., Arora, C., Sabetzadeh, M., Briand, L.C.: MAANA: an automated tool for DoMAin-specific HANdling of ambiguity. In: IEEE/ACM 43rd International Conference on Software Engineering, pp. 188–189 (2021) 11. Habash, N.: 13 Arabic dialect processing. In: Similar Languages, Varieties, and Dialects: A Computational Perspective, 279 (2021) 12. Ullah, A., Kui, Z., Ullah, S., Pinglu, C., Khan, S.: Sustainable utilization of financial and institutional resources in reducing income inequality and poverty. Sustainability 13(3), 1038 (2021) 13. Guellil, I., Saâdane, H., Azouaou, F., Gueni, B., Nouvel, D.: Arabic natural language processing: an overview. J. King Saud University-Comput. Inf. Sci. 33(5), 497–507 (2021) 14. Guellil, I., Adeel, A., Azouaou, F., Benali, F., Hachani, A.E., Dashtipour, K., Hussain, A.: A semi-supervised approach for sentiment analysis of arab (ic/izi) messages: application to the Algerian dialect. SN Comput. Sci. 2(2), 1–18 (2021) 15. Talafha, B., Abuammar, A., Al-Ayyoub, M.: ATAR: Attention-based LSTM for Arabizi transliteration. Int. J. Electr. Comput. Eng. (IJECE) 11(3), 2327–2334 (2021) 16. Eryani, F., Habash, N.: Automatic romanization of arabic bibliographic records. In: 6th Arabic Natural Language Processing Workshop, pp. 213–218 (2021) 17. Ouisaadane, A., Safi, S.: A comparative study for Arabic speech recognition system in noisy environments. Int. J. Speech Technol. 1–10 (2021) 18. Al-Anzi, F.S., AbuZeina, D.: Synopsis on Arabic speech recognition. Ain Shams Eng. J. (2021) 19. Mittal, V., Sharma, R.K.: Deep Learning Approach for Voice Pathology Detection and Classification. Int. J. Healthcare Inf. Syst. Inform. (IJHISI) 16(4), 1–30 (2021) 20. Harder, B.: Speech language pathology, occupational therapy, and physical therapy student perspectives of an interprofessional education simulation (2021) 21. Yusof, N., Baharudin, H., Hamzah, M.I., Malek, N.I.A.: Fuzzy Delphi method application in the development of I-Aqran module for Arabic vocabulary consolidation. Ijaz Arabi J. Arabic Learn. 4(2) (2021) 22. Ali, Z., Saleh, M., Al-Maadeed, S., Abou Elsaud, S., Khalifa, B., AlJa’am, J.M., Massaro, D.: Understand my world: an interactive app for children learning Arabic vocabulary. In: IEEE Global Engineering Education Conference, pp. 1143–1148 (2021) 23. 
Farghaly, A., Shaalan, K.: Arabic natural language processing: challenges and solutions. ACM Trans. Asian Lang. Inf. Process. 8(4), 1–22 (2009) 24. Habash, N.Y.: Introduction to Arabic natural language processing, vol. 3. Morgan & Claypool Publishers (2010) 25. Alsayadi, H.A., Abdelhamid, A.A., Hegazy, I., Fayed, Z.T.: Arabic speech recognition using end-to-end deep learning. IET Signal Process. (2021)
26. Zhang, J., Wang, B., Zhang, C., Xiao, Y., Wang, M.Y.: An EEG/EMG/EOG-based multimodal human-machine interface to real-time control of a soft robot hand. Front. Neurorobot. 13, 7 (2019) 27. Friedrich, M., Peinecke, N., Geister, D.: Human machine interface aspects of the ground control station for unmanned air transport. In: Automated Low-Altitude Air Delivery, pp. 289–301 (2022) 28. Vacher, M., Lecouteux, B., Portet, F.: Recognition of voice commands by multisource ASR and noise cancellation in a smart home environment. In: 20th European Signal Processing Conference (EUSIPCO), pp. 1663–1667 (2012) 29. McLaughlin, N., Ming, J., Crookes, D.: Speaker recognition in noisy conditions with limited training data. In: 19th European Signal Processing Conference, pp. 1294–1298 (2011) 30. Biagetti, G., Crippa, P., Falaschetti, L., Orcioni, S., Turchetti, C.: Speaker identification in noisy conditions using short sequences of speech frames. In: International Conference on Intelligent Decision Technologies, pp. 43–52 (2017) 31. Ming, J., Hazen, T.J., Glass, J.R., Reynolds, D.A.: Robust speaker recognition in noisy conditions. IEEE Trans. Audio Speech Lang. Process. 15(5), 1711–1723 (2007) 32. Biagetti, G., Crippa, P., Curzi, A., Orcioni, S., Turchetti, C.: Speaker identification with short sequences of speech frames. ICPRAM (2), pp. 178–185 2015 33. Deshpande, M.S., Holambe, R.S.: Speaker identification based on robust AM-FM features. In: 2nd International Conference on Emerging Trends in Engineering & Technology, pp. 880– 884 (2009) 34. Ali, A.H., Magdy, M., Alfawzy, M., Ghaly, M., Abbas, H.: Arabic speech synthesis using deep neural networks. In: International Conference on Communications, Signal Processing, and their Applications (ICCSPA), pp. 1–6. IEEE (2021) 35. Mutawa, A.M.: Machine learning for Arabic text to speech synthesis: a Tacotron approach (2021) 36. Bettayeb, N., Guerti, M.: Speech synthesis system for the holy quran recitation. Int. Arab J. Inf. Technol. 18(1), 8–15 (2021) 37. El-Dakhs, D.A.S., Ahmed, M.M.: A variational pragmatic analysis of the speech act of complaint focusing on Alexandrian and Najdi Arabic. J. Pragmat. 181, 120–138 (2021) 38. Shaalan, K., Talhami, H.: Error analysis and handling in Arabic icall systems. In: Artificial Intelligence and Applications (2006). Citeseer, pp. 109–114 39. Shaalan, K.F.: An intelligent computer assisted language learning system for Arabic learners. Comput. Assist. Lang. Learn. 18(1–2), 81–109 (2005) 40. Meftouh, K., Harrat, S., Jamoussi, S., Abbas, M., Smaili, K.: Machine translation experiments on padic: a parallel Arabic dialect corpus. In: Pacific Asia Conference on Language, Information and Computation (2015) 41. Terbeh, N., Zrigui, M.: Vers la correction automatique de la Parole Arabe. Citala 2014 (2014) 42. Maraoui, M., Terbeh, N., Zrigui, M.: Arabic discourse analysis based on acoustic, prosodic and phonetic modeling: elocution evaluation, speech classification and pathological speech correction. Int. J. Speech Technol. 1071–1090 (2018) 43. Terbeh, N., Zrigui, M.: Vocal pathologies detection and mispronounced phonemes identification: case of Arabic continuous speech. In: 10th International Conference on Language Resources and Evaluation (LREC’16), pp. 2108–2113 (2016) 44. Terbeh, N., Zrigui, M.: Identification of pronunciation defects in spoken Arabic language. In: International Conference of the Pacific Association for Computational Linguistics, pp. 355– 365 (2017) 45. 
Terbeh, N., Zrigui, M.: A novel approach to identify factor posing pronunciation disorders. In: International Conference on Computational Collective Intelligence, pp. 153–162 (2016) 46. Terbeh, N., Trigui, A., Maraoui, M., Zrigui, M.: Arabic speech analysis to identify factors posing pronunciation disorders and to assist learners with vocal disabilities. In: 2016 International Conference on Engineering & MIS (ICEMIS), pp. 1–8 (2016)
47. Terbeh, N., Trigui, A., Maraoui, M., Zrigui, M.: Correction of pathological speeches and assistance to learners with vocal disabilities. Multimedia Tools Appl. 77(14), 17779–17802 (2018) 48. Terbeh, N., Labidi, M., Zrigui, M.: Automatic speech correction: A step to speech recognition for people with disabilities. In: Fourth International Conference on Information and Communication Technology and Accessibility (ICTA), pp. 1–6 (2013)
Chapter 6
Inter-rater Agreement Based Risk Assessment Scheme for ICT Corporates

Roberto Cassata, Gabriele Gianini, Marco Anisetti, Valerio Bellandi, Ernesto Damiani, and Alessandro Cavaciuti
Abstract An ISO 9001 audit can be seen as an independent risk assessment on the business, where each "Nonconformity" or "Opportunity For Improvement" is considered as a potential risk. Nevertheless, their actual impact on the business remains difficult to determine; as a consequence, the urgency of a mitigation plan at corporate level can sometimes be underestimated. This paper proposes a semi-quantitative risk assessment methodology for the ISO 9001 findings, relying on a selected panel of experts. The experts' responses are analyzed and validated using a specific statistical test for inter-rater reliability. The proposed methodology has been applied to real findings coming from ISO 9001 internal audits, involving 10 subject matter experts from 7 different countries.
6.1 Introduction

Risk assessment is a widely discussed research topic, and a number of solutions and standards have been proposed. Similar to the Delphi Technique [8], our risk assessment methodology estimates the likelihood of an event by asking a panel of experts; but
instead of running the assessment until participants reach consensus, we measure concordance among raters with a statistic named the Kappa inter-rater agreement, where Kappa is a score expressing the consensus level. In this paper we specifically target large ICT corporations, where the adoption of a risk-based approach translates into: (i) identification of risks and opportunities, (ii) a plan of actions to address them, (iii) implementation in a quality management system and (iv) evaluation of effectiveness. An ISO 9001 audit can be seen as the first step of the risk assessment, also called "Risk Identification" in the ISO 31000:2018 Risk Management standard nomenclature [4]. The outcome of the audit is an identified set of nonconformities and opportunities for improvement that have associated risks with potential impacts on the business. For each finding, the actions taken to prevent or mitigate the associated risk and to evaluate the effectiveness of the mitigation are generally local to the audited organization and not propagated to other organizations belonging to the same corporation. This paper proposes a methodology to perform a Risk Assessment using a systematic and structured approach involving a panel of experts who provide their judgments on the findings identified during an audit. The judgments of the experts are evaluated using a specific approach for inter-rater reliability assessment, computing the level of inter-rater agreement as measured by a set of metrics inspired by Cohen's Kappa [2]: each one of these metrics—whose exact formulation depends on the nature of the response categories used in the assessment (nominal, ordinal, numeric)—measures how much the raters agree with each other, and discounts the effect of agreements occurring by chance, computed via probabilistic methods. For eliciting the expert judgment, we designed a survey focused on identifying (i) risk categories, (ii) risk probability, (iii) risk impact and (iv) risk profile cost.
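For two raters and nominal categories, the chance-corrected agreement reduces to the classical Cohen's Kappa, which the short sketch below computes from scratch (scikit-learn's cohen_kappa_score offers an equivalent implementation); the multi-rater metrics actually used in the methodology generalize this idea.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's Kappa for two raters over the same items (nominal categories)."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Probability that the two raters agree purely by chance.
    expected = sum(freq_a[c] / n * freq_b[c] / n for c in set(freq_a) | set(freq_b))
    # (Kappa is undefined when expected == 1, i.e. when both raters show no variability.)
    return (observed - expected) / (1 - expected)

# Toy example: two experts rating five findings as Low/Medium/High impact.
expert_1 = ["L", "M", "H", "M", "L"]
expert_2 = ["L", "M", "M", "M", "L"]
print(round(cohen_kappa(expert_1, expert_2), 3))   # 0.667
```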
6.2 The Methodology Figure 6.1 shows the risk assessment approach as defined by ISO 31000 (left portion) and its relation to our approach (right portion). In our methodology, the Risk Identification is executed during the audit itself, Risk Analysis is done by the panel of selected experts using our structured survey, and the Risk Evaluation is summarized in our Business Risk Scorecard. Findings Selection. Our starting point is the finding discovery procedure: according to Fig. 6.1, the auditor is in charge of reviewing all the ISO 9001 findings opened in a given time frame and selecting the relevant ones. During this phase, finding details are collected in order to provide sufficient information to the experts. We consider the following ISO 9001 finding details as most relevant: (i) the finding title, (ii) the finding description, (iii) the requirement and (iv) the potential business impact. Panel of Experts Selection Criteria. The auditor is responsible for the selection of a panel of experts that will analyze the risks associated with the identified findings. The selected experts must be independent and in a position that allows them to
Fig. 6.1 ISO 31000 risk assessment process compared to the proposed methodology. The ISO 31000 risk assessment (5.4) comprises risk identification (5.4.2), risk analysis (5.4.3) and risk evaluation (5.4.4), followed by risk treatment (5.5). In our methodology, the auditor performs the findings selection, the selected findings are detailed in a findings survey, the survey is filled in by the pool of experts, the responses are used to compute the inter-rater agreement, and the results are used in the risk scorecards.
make unbiased judgments. They must have agreed to treat all the ISO 9001 finding information as confidential, since it contains sensitive information. The auditors should select the experts based on (i) their experience, (ii) their technical expertise, (iii) their business background, (iv) their knowledge of company processes and (v) their roles in the corporation. Survey Structure. The panel of experts evaluates the findings using a survey we specifically designed for this purpose. The survey is focused on typical ICT risks and is articulated in four main assessments: (i) Risk Categories, (ii) Risk Cost Profile, (iii) Risk Probability and (iv) Risk Impact. Experts are asked to select the applicable Risk Categories (multiple choice), the Risk Profile Cost (single choice), the Risk Probability (single choice) and the Risk Impact (single choice). In the following we detail each of the above assessments. Risk Categories assessment. The identification of a list of risks associated with an ISO 9001 finding depends on the nature of the business and on the target technology. Table 6.1 shows our risk catalogue, made of a set of five main risk categories and a subset of impacted areas. The expert is requested to select, for each of the five categories, the relevant impacted areas. Profile Cost assessment. Risk treatment is a decision-making process whereby risks are treated by selecting and implementing measures to address the specific risks identified in the risk assessment and subsequent risk analysis. Nowadays, project stakeholders involved in development activities typically do their risk analysis considering two aspects: risk probability and risk impact. Table 6.2 shows a typical risk matrix considering probability and impact.
Table 6.1 The proposed risk catalogue
Strategic risk (risks associated with decisions taken by directors that could have an impact on the organization's business objectives). Impacted areas: approval of product/service delivery even if the quality goals are not met; decisions impacting credit and financial aspects; decisions impacting infrastructure availability; decisions impacting resources and headcount; other strategic risks.
Operational risk—Legal (the risk that a counter-party to a transaction will not be liable to meet its obligations under law). Impacted areas: intellectual property (IP); brand protection and reputation; legal lawsuit; other legal risks.
Operational risk (risks related to inadequate processes or resources that could result in ineffective or inefficient product/service delivery and a reduction of the organization's margins). Impacted areas: inefficient process (the process generates waste of resources/time/money/etc.); ineffective process (the process is unable to produce the desired results); KPIs that do not capture relevant indicators and/or cannot be used for business improvement; other operational risks.
Compliance risk (risk arising from failure to comply with processes, laws and regulations). Impacted areas: ability to sell the product/service in specific countries; health and safety compliance; ESD compliance; product/service limitations (e.g. blind color GUI); compliance with international standards; other product/service compliance risks.
Technical risk (risks related to product/service technical aspects that could result in customer dissatisfaction). Impacted areas: testability; performance; reliability; availability; scalability; security; maintainability; other technical risks.
This catalogue is used during the Risk Categories assessment. It is defined based on our expertise in risk assessment evaluation
To obtain a more accurate analysis of corporate business risks it is important to consider another factor: the cost profile. For example, a risk could have a negligible initial cost but an exponential cost profile over time, which could result in unsustainable costs. The cost profile represents the economic impact of a delay in implementing a risk control, where a risk control is a "measure that maintains and/or modifies risk" (ISO 31000:2018, 3.8) [4]. In other words, a finding's cost profile is the impact on corporate project costs if no control or mitigation is implemented.
Table 6.2 Probability/impact matrix
Risk impact \ Risk probability | Rare | Possible | Probable
LOW                            |  1   |    2     |    3
MEDIUM                         |  2   |    4     |    6
HIGH                           |  3   |    6     |    9
Risk Probability/Impact assessment. Risk is also analyzed in relation to its potential impact using the risk matrix in Table 6.2. More formally, the risk score is defined as score = f_r(Probability, Impact), where f_r is the function that maps probability and impact to the score using the risk matrix. Any risk generates a cost for the corporation, so we ask the experts to estimate the impact in terms of corporate monetary cost, defined in our case as LOW (≤ $10,000), MEDIUM (between $10,000 and $100,000) and HIGH (≥ $100,000). We describe the risk probability over the next two years in qualitative terms: Rare (Probability ≤ 10%), Possible (10% < Probability < 50%) and Probable (Probability ≥ 50%). Table 6.3 shows the survey structure as it is presented to the panel of experts in relation to a given finding.
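As an illustration of how the score function f_r can be applied in practice, the following Python fragment (a sketch for this paper's matrix, not code from the original study) encodes Table 6.2 as a simple lookup.

```python
# Illustrative sketch of the risk score function f_r from Table 6.2.
RISK_MATRIX = {
    ("LOW", "Rare"): 1, ("LOW", "Possible"): 2, ("LOW", "Probable"): 3,
    ("MEDIUM", "Rare"): 2, ("MEDIUM", "Possible"): 4, ("MEDIUM", "Probable"): 6,
    ("HIGH", "Rare"): 3, ("HIGH", "Possible"): 6, ("HIGH", "Probable"): 9,
}

def risk_score(probability: str, impact: str) -> int:
    """Map a (probability, impact) pair to the risk score f_r of Table 6.2."""
    return RISK_MATRIX[(impact, probability)]

# Example: a finding judged "Possible" with "MEDIUM" impact scores 4.
assert risk_score("Possible", "MEDIUM") == 4
```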
6.2.1 Determination of the Inter-rater Agreement In order to quantify the level of inter-rater agreement for a given setting, it is customary to design a metric inspired by the Cohen Kappa [2]: one computes the observed value of the agreement π, according to some chosen metric, and then compares the outcome to its expected value for the case of a random choice of the options by the experts. The rate of improvement with respect to the performance of the random-choice process is then adopted as the value of the Kappa metric, which takes the following form

$$\kappa \equiv \frac{\pi - \pi_e}{1 - \pi_e} \qquad (6.1)$$

where π is the observed agreement rate and π_e the chance-expected agreement rate. The maximum value achievable by the Kappa metric is 1. Most often, the Kappa issued by a study is used for benchmarking with respect to an ordinal scale of qualitative agreement expressions, such as the scale devised by [7]: ≤ 0 → "Poor agreement", 0.00–0.20 → "Slight agreement", 0.21–0.40 → "Fair agreement", 0.41–0.60 → "Moderate agreement", 0.61–0.80 → "Substantial agreement", 0.81–0.99 → "Almost perfect agreement", 1.00 → "Perfect agreement".
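As a small illustration of Eq. (6.1) and of the benchmarking scale above, the following Python fragment (an illustrative sketch, not the authors' implementation) computes a Kappa value from an observed and a chance-expected agreement rate and maps it onto the Landis and Koch labels.

```python
def kappa(observed: float, expected: float) -> float:
    """Kappa as in Eq. (6.1): improvement of the observed agreement over chance."""
    return (observed - expected) / (1.0 - expected)

def landis_koch_label(k: float) -> str:
    """Qualitative benchmark of a Kappa value following Landis and Koch [7]."""
    if k <= 0.0:
        return "Poor agreement"
    thresholds = [(0.20, "Slight agreement"), (0.40, "Fair agreement"),
                  (0.60, "Moderate agreement"), (0.80, "Substantial agreement"),
                  (0.99, "Almost perfect agreement")]
    for upper, label in thresholds:
        if k <= upper:
            return label
    return "Perfect agreement"

# Example: observed agreement 0.7 with a chance-expected rate of 1/3 gives kappa = 0.55.
print(landis_koch_label(kappa(0.7, 1 / 3)))  # "Moderate agreement"
```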
Table 6.3 Survey structure example (finding described in Sect. 6.3). Engineering quality assessment finding. DESCRIPTION: Confidentiality level of the design documents is inconsistent and inappropriate; design document information can be shared only when necessary and with people who need to know. EVIDENCE: Configuration Management System: J-AX; all documents are stored in the Joker directory. REQUIREMENTS: ISO 7.5.3.1 Documented information required by the quality management system shall be controlled to ensure: b) it is adequately protected (e.g. from loss of confidentiality, improper use, or loss of integrity). IMPACT: Inappropriate breaches of confidentiality can lead to legal action and/or competitors' advantages. Question
Answer options
1. Strategic Risk
Approval of product/service delivery even if the quality goals are not met Decision impacting credit, financial aspects Decision impacting infrastructures availability Decision impacting resources and headcount Other: …
2. Operational Risk—LEGAL
Intellectual property (IP) Brand Protection and Reputation Legal Lawsuit Other: …
3. Operational Risk
Inefficient process (process generates waste of resources/time/money/etc.) Ineffective process (process is unable to produce desired results) KPIs are not capturing relevant indicators and/or can't be used for business improvement Other: …
4. Compliance Risks
Ability to sell product/service in specific countries Health and safety compliance ESD Compliance Product/Service limitations (e.g. blind color GUI) Compliance with international standard Other: …
5. Technical Risk
Testability Performance Reliability Availability Scalability Security Maintainability Other: …
6. Risk Assessment—Probability
RARE POSSIBLE PROBABLE
7. Risk Assessment—Impact
LOW MEDIUM HIGH
8. Risk Assessment—Profile Cost
Constant Logarithmic Linear Exponential
6.2.2 Kappa Coefficient Formulation The original definition of κ by Cohen has spawned a number of variants that fit diverse settings. A comprehensive review can be found in [3]. Here we consider the assessment of each object (each finding) by a number n of raters. The setups most relevant to the present work are the following: 1. Ordinal, mutually exclusive response categories, with a multiple-level scale (k > 2), assessed by n > 2 raters, which applies to the Probability, Impact and Cost Profile assessments; 2. Nominal, multiple-level scale, non-mutually-exclusive response categories, assessed by n > 2 raters, which applies to the Strategic Risk, Operational Risk—Legal, Operational Risk, and Compliance Risk assessments. Rating with nominal response categories, with single choice over a multiple-level scale (k > 2), assessed by n > 2 raters. Let the index i = 1, 2, ..., N represent the objects, let the index j = 1, 2, ..., k represent the categories, and let the index h = 1, 2, ..., n represent the raters. Let r^i_j be the number of raters that have assigned object i to category j. Then, to quantify the agreement over the fact that category j is assigned to object i, one can count the number of pairs r^i_j(r^i_j − 1)/2. Since we are assuming here that each rater assigns the object to exactly one of the k categories, a natural way of quantifying the agreement in a purely nominal setting consists of counting how many rater pairs agree over a category and comparing it to the maximum agreement achievable, i.e. computing the ratio π^i_j ≡ r^i_j(r^i_j − 1)/(n(n − 1)). The agreement over object i is the sum over the categories, and the overall agreement is the average over the number N of objects that were rated by at least one pair

$$\pi = \frac{1}{N}\sum_{i=1}^{N}\pi^{i} \quad\text{with}\quad \pi^{i} = \sum_{j=1}^{k}\frac{r^{i}_{j}\,(r^{i}_{j}-1)}{n(n-1)}$$
Most agreement coefficients share the same definition of the observed agreement: they differ in the expression of the chance-expected agreement. The simplest choice for the chance-expected rate is the one by Brennan and Prediger [1], i.e. π_B = 1/k: when the rating of an object is a random process, the object is assigned to any of the k categories with equal probability 1/k. Plugging π and π_B into equation (6.1) one obtains the Brennan and Prediger Kappa. Rating with ordinal response categories, with single choice over a multiple-level scale (k > 2), assessed by n > 2 raters. One can treat the ordinal setting as a nominal setting enriched with extra structure, which weights differently the disagreements between categories located near and far in the ranking: one can, for instance, stipulate that ratings where an object is assigned to categories closer in the ordering represent a less serious disagreement than ratings where the object is assigned to categories that are located farther apart in the ordering. This can be formalized by introducing a weight matrix w_{jℓ} such that the matrix element w_{jℓ} = 1 when j = ℓ and such that the matrix
element w_{jℓ} is non-zero when the categories j and ℓ are meant to be considered a partial agreement. This leads to the definition of a weighted count r̃^i_j, which accounts also for the "cross-talk" among categories, and to a corresponding weighted count of the pairwise agreements r^i_j(r̃^i_j − 1). Overall, the average over objects is

$$\pi \equiv \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{k}\frac{r^{i}_{j}\,(\tilde{r}^{i}_{j}-1)}{n(n-1)} \quad\text{with}\quad \tilde{r}^{i}_{j} \equiv \sum_{\ell=1}^{k} w_{j\ell}\, r^{i}_{\ell} \qquad (6.2)$$

Possible choices for the weights are linear w_{jℓ} = 1 − |j − ℓ|/(k − 1), quadratic w_{jℓ} = 1 − ((j − ℓ)/(k − 1))², square root w_{jℓ} = 1 − (|j − ℓ|/(k − 1))^{1/2}, or powers of a fixed number, such as w_{jℓ} = 1/3^{|j−ℓ|}. Brennan-Prediger agreement coefficient. The Brennan-Prediger agreement coefficient [1] is defined by Eq. (6.2) for the observed agreement rate π and by the following definition of the chance-expected agreement (the index B denotes the Brennan-Prediger definition)

$$\pi_{B} \equiv \frac{1}{k^{2}}\sum_{j,\ell} w_{j,\ell}$$

The Kappa index for multi-choice nominal categories in the case of n raters. Consider now the multi-choice case, i.e. assume that each expert can assign to an object up to k distinct and non-mutually-exclusive properties (each property being expressed by a response category). Consider a pair of experts, indexed by g and h: with reference to an object i, we denote by A^i_g the set of options for which rater g has expressed a positive opinion, we denote this set cardinality by a^i_g ≡ card(A^i_g), and call it the response cardinality for rater g; similarly, we denote the response cardinality of rater h by a^i_h ≡ card(A^i_h). For quantifying the agreement over object i we count the number x^i_{gh} of response categories in which both experts say True, i.e. the Positive Agreements

$$x^{i}_{gh} \equiv card(A^{i}_{g} \cap A^{i}_{h})$$

In the case of a single-choice constraint one has x^i_{gh} ∈ {0, 1}, whereas in the multi-choice case x^i_{gh} ∈ {0, 1, ..., min(a^i_g, a^i_h)}. The quantity x^i_{gh} has to be compared to a reference value. We use min(a^i_g, a^i_h) (indeed, given a^i_h and a^i_g, the maximum achievable number of agreements is the minimum of the two numbers) and define the rate of agreement as follows:

$$\pi^{i}_{gh} \equiv \frac{x^{i}_{gh}}{\min(a^{i}_{g}, a^{i}_{h})} \qquad (6.3)$$

Notice that when min(a^i_h, a^i_g) = 0, i.e. when at least one of the raters in the pair does not make any assessment for the object, the quantity π^i_{gh} is undefined.
The most straightforward way to generalize the rate of agreement from 2 to n raters (i.e. ν = n(n − 1)/2 unordered rater pairs) consists of taking the average of the pairwise agreements over the ν pairs

$$\pi^{i}_{r.a.} \equiv \frac{1}{\nu}\sum_{g=1}^{n-1}\sum_{h=g+1}^{n}\pi^{i}_{gh} = \frac{1}{\nu}\sum_{g=1}^{n-1}\sum_{h=g+1}^{n}\frac{x^{i}_{gh}}{\min(a^{i}_{g}, a^{i}_{h})} \qquad (6.4)$$

We call this the rates-average definition (it is denoted by the index r.a.). Now we consider that the variable x^i_{gh} is described by a Hypergeometric distribution [6], so that its expected value is

$$\langle x^{i}_{gh} \rangle = \frac{a^{i}_{g}\, a^{i}_{h}}{k} = \frac{\max(a^{i}_{g}, a^{i}_{h})\,\min(a^{i}_{g}, a^{i}_{h})}{k}$$

It follows that for the expected value of the agreement rate we have

$$\langle \pi^{i}_{r.a.} \rangle = \frac{1}{\nu}\sum_{g=1}^{n-1}\sum_{h=g+1}^{n}\frac{\langle x^{i}_{gh}\rangle}{\min(a^{i}_{g}, a^{i}_{h})} = \frac{1}{\nu k}\sum_{g=1}^{n-1}\sum_{h=g+1}^{n}\max(a^{i}_{g}, a^{i}_{h})$$

Plugging the last expression and Eq. (6.4) into expression (6.1) defines κ_{r.a.}. Notice that this expression refers to a single object i: if needed, one can average over the objects which have received at least a pair of ratings. However, in this work we focus on the ratings of individual objects. This is the formulation of κ_{r.a.} that we used in this work for the non-mutually exclusive nominal response options, which corresponded to the Risk Categories.
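To make this formulation concrete, the following Python sketch computes κ_{r.a.} for a single object from the category sets selected by each rater, following Eqs. (6.3), (6.4) and the hypergeometric chance term above. It is an illustration consistent with these definitions rather than the authors' own code, and it simply skips raters with empty selections, for which the pairwise rate is undefined.

```python
from itertools import combinations

def kappa_rates_average(selections, k):
    """
    selections: list of sets, one per rater, each containing the response
                categories (out of k) selected for a single object.
    Returns kappa_r.a. for that object, following Eqs. (6.3)-(6.4).
    """
    sel = [s for s in selections if len(s) > 0]   # drop undefined (empty) ratings
    pairs = list(combinations(sel, 2))
    if not pairs:
        raise ValueError("at least two non-empty ratings are required")
    nu = len(pairs)

    # Observed rates-average agreement (Eq. 6.4).
    observed = sum(len(a & b) / min(len(a), len(b)) for a, b in pairs) / nu
    # Chance-expected agreement from the hypergeometric expectation of x_gh.
    expected = sum(max(len(a), len(b)) / k for a, b in pairs) / nu

    return (observed - expected) / (1.0 - expected)

# Example: three raters assessing one finding against the k = 4 impacted areas
# of the Operational Risk—Legal category.
raters = [{"IP", "Legal lawsuit"}, {"IP"}, {"IP", "Brand protection"}]
print(round(kappa_rates_average(raters, k=4), 3))
```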
6.2.3 Business Risk Scorecard The business risk scorecard was conceived as a structured and concise report focused on providing the risk evaluation, ensuring a consistent view with the relevant information necessary for top management to easily understand: (i) the risk categories exposure, (ii) the results of the impact and probability assessment and (iii) the results of the cost profile assessment. We extended the traditional risk analysis score, defined as a function of probability (likelihood of an event) and impact (its consequences), to include the cost profile. We call this new score function the Business Risk Severity; it is more formally defined as: Business Risk Severity = f_s(f_r(Probability, Impact), CostProfile), where f_r is the risk score function as defined in Sect. 6.2 and f_s is the severity mapping function that takes the risk score and the cost profile and maps them to a level of severity using the Business Risk Severity matrix defined in Table 6.4. The inter-rater agreement is computed and reported in the business risk scorecard as an index of the reliability of the experts' agreement.
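The composition of the two functions can be sketched as follows. Note that the concrete severity thresholds and the way the cost profile is aggregated with the risk score are placeholders assumed for illustration only, since Table 6.4 encodes the severity levels as colours rather than as explicit values.

```python
# Hypothetical sketch of Business Risk Severity = f_s(f_r(probability, impact), cost_profile).
# The risk score f_r comes from Table 6.2; the weights and thresholds below are
# NOT the mapping of Table 6.4 and are placeholders for illustration only.
COST_PROFILE_WEIGHT = {"Constant": 0, "Logarithmic": 1, "Linear": 2, "Exponential": 3}

def business_risk_severity(risk_score: int, cost_profile: str) -> str:
    """Combine the risk score and the cost profile into a severity level (assumed rule)."""
    combined = risk_score + COST_PROFILE_WEIGHT[cost_profile]   # assumed aggregation
    if combined >= 9:
        return "High"
    if combined >= 5:
        return "Moderate"
    return "Minor"

print(business_risk_severity(risk_score=6, cost_profile="Exponential"))  # "High"
```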
Table 6.4 Business risk severity matrix. Rows correspond to the risk score f_r (values 9, 6, 4, 3, 2, 1) and columns to the cost profile (Constant, Logarithmic, Linear, Exponential); each cell gives the resulting Business Risk Severity, depicted in red (High), yellow (Moderate) and green (Minor).
Fig. 6.2 Experts’ votes aggregated by findings and risk categories
6.3 Case Study Having selected the findings, the auditor identifies the panel of experts with the right competences to evaluate the selected findings according to the criteria in Sect. 6.2. Before asking the panel of experts to fill in our survey, we first presented the ISO 9001 findings (e.g. via a brainstorming session) and responded to any clarification request. We then submitted the survey to the panel of experts (the survey structure is presented in Sect. 6.2). Figure 6.2 shows the survey results in terms of experts' votes relative to all the findings of our case study, aggregated by categories. Note that the plots of the Strategic, Operational, Compliance and Technical categories are affected by multiple-choice votes in terms of the absolute numbers presented. Inter-rater Agreement. The collected survey data are used to evaluate the inter-rater agreement following the approach in Sect. 6.2.1. Tables 6.5 and 6.6 show the results of the computation of the different kappas. Business Risk Scorecard. For each ISO 9001 finding a Business Risk Scorecard is generated as described in Sect. 6.2.3. These risk scorecards summarize all the important
Table 6.5 The value of the ratio-average κ_{ra} for the nominal variables (rows: Strategic risk, Operational risk—Legal, Operational risk, Compliance risk, Technical risk; columns: findings 01–10). The columns refer to the findings, the rows to the different risk categories. The six shades of colour give the range in which each value falls, according to Landis and Koch [7]. A cell is white when fewer than two raters expressed a judgment on that case.
Table 6.6 The value of the Brennan-Prediger κ_B for the ordinal variables (rows: Cost profile, Risk probability, Risk impact; columns: findings 01–10), with weight given by w_{jℓ} = 1/3^{|j−ℓ|}. See the caption of Table 6.5 for the colour code conventions.
elements considered by our approach and help in prioritizing risk mitigation actions.
6.4 Discussion The ISO 9001 finding owner, who is responsible for implementing the correction and the preventive action, is not always able to provide a reliable evaluation of the risk impact at corporate level. The reliability of, and the investment in, consistent preventive actions (Risk Treatment) depend on knowledge of the different corporate entities and on the ability to understand the complexity of the risk. Moreover, each risk can have multiple ramifications impacting several aspects of the business, such as finance, infrastructure, brand reputation, security, health and safety. In order to improve risk assessment accuracy, we propose an innovative approach that involves a pool of experts coming from different areas of the business and implements a methodology based on four new key ideas: 1. a defined Risk Catalogue (see Table 6.1) with the list of typical ICT risks, 2. the new concept of Profile Cost (see Sect. 6.2), 3. the introduction of the inter-rater Kappa (see Sect. 6.2.1) to measure the agreement among the panel of experts and 4. the adoption of the new Business Risk Scorecard (see Sect. 6.2.3) to provide a structured and concise report on the risk evaluation that supports top management in the decision-making process. An advantage of the proposed methodology is the low interaction among the experts, which could be an important factor that naturally reduces bias; further investigations will be conducted on this aspect.
6.5 Conclusions Uncertainty is a key concept in risk conceptualization and risk assessment; several methodologies to conduct risk assessment are described in ISO 31010 [5]. We believe that the approach presented in this research is a valid methodology that could potentially be generalized into a broadly applicable risk assessment technique. The proposed methodology has been applied to real findings coming from several different ISO 9001 internal audits, involving 10 subject matter experts from 7 different European countries.
References 1. Brennan, R.L., Prediger, D.J.: Coefficient kappa: some uses, misuses, and alternatives. Educ. Psychol. Measure. 41(3), 687–699 (1981) 2. Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Measure. 20(1), 37–46 (1960) 3. Gwet, K.L.: Handbook of Inter-rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters. Advanced Analytics, LLC (2014) 4. ISO Central Secretary: ISO 31000:2018 risk management—guidelines. Standard, International Organization for Standardization, Geneva, CH (2018) 5. ISO Central Secretary: ISO 31010:2019 risk management—risk assessment techniques. Standard, International Organization for Standardization, Geneva, CH (2019) 6. Kupper, L.L., Hafner, K.B.: On assessing interrater agreement for multiple attribute responses. Biometrics, 957–967 (1989) 7. Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics, 159–174 (1977) 8. Linstone, H.A., Turoff, M., et al.: The Delphi Method. Addison-Wesley, Reading (1975)
Chapter 7
Sequence Classification via LCS Riccardo Dondi
Abstract Computing a longest common subsequence of two given sequences is a fundamental problem in several fields of computer science. While there exist dynamic programming algorithms that can solve the problem in polynomial time, their running time is considered too high for some practical applications. In this contribution we propose a method for comparing two sequences that is based on (1) a combinatorial algorithm that computes a window constrained longest common subsequence and (2) a genetic algorithm. We present experiments on synthetic datasets that show that the method is able to return solutions close to the length of a longest common subsequence. Moreover, the method is faster than the dynamic programming algorithm for the longest common subsequence problem.
7.1 Introduction Longest Common Subsequence (LCS) is a prominent problem in computer science, with several applications ranging from sequence analysis in computational biology [3, 7, 17] to gesture recognition [9] and dataset analysis [16]. In this latter application, LCS has been effectively considered for anomaly detection in order to identify an unusual or anomalous process by comparing an observed time series with the usual functional behaviour of a machine. Given two sequences, the LCS problem looks for a common subsequence (that is, a sequence that can be obtained from both sequences after the deletion of some symbols) of maximum length. While the longest common subsequence approach is widely applied for the comparison of two sequences and is known to be solvable in polynomial time [11], one drawback is the computational complexity of the algorithms used to solve it, which is quadratic in the length of the input sequences, assuming they have the same length. An extension of the problem considers more than two input sequences. When the number of input sequences is not a constant, the LCS problem is NP-hard and also hard to approximate [15]. Genetic programming algorithms have been
successfully applied to solve this extension of the LCS problem [4, 10, 14]. Notice that, unlike these contributions, the goal of our paper is to focus on the efficiency of a method for comparing two input sequences. In applications like anomaly detection in time-series analysis [12], an observed time series (called pattern in the following) is compared to a sequence representing the usual behaviour of a machine (called text in the following). The pattern and the text are usually highly similar when the pattern represents a normal behaviour, while a larger dissimilarity is related to an anomalous behaviour. LCS can then be used to distinguish between these two cases [12]. Since the polynomial-time algorithms for the LCS problem are computationally too expensive [1], a possible approach is to design a method that, by approximating LCS, is able to classify usual and anomalous sequences/behaviours and that can be computed faster than LCS. The approximability of the LCS problem in near linear time has been analyzed by recent contributions [2, 5, 6]. In this work we propose a novel approach for comparing two sequences which is based on a variant of the LCS problem where each position of a pattern is constrained to possibly match a position of the text contained in a restricted window. The idea is that, if this variant of LCS is large, the pattern represents a normal behaviour; otherwise the pattern can be classified as anomalous. A related variant of the LCS problem has been considered in the literature, by defining a constraint on the distance between two matched positions of a sequence included in a common subsequence [13], but in that case the positions of the pattern are not constrained to be mapped in a window of the text. We present in Sect. 7.3 our method, which combines a dynamic programming algorithm and a genetic algorithm. First, we present a dynamic programming algorithm that computes in O(nw) time a longest common subsequence between a pattern and a text such that the restricted window constraint is satisfied (n is the length of the text or the pattern, and w is the length of the constrained window). Then we present a genetic algorithm, where we consider a limited number of generations (that is, 2, 5 and 10) and a population of small size (that is, 5), in order to maintain the efficiency of our method. We present in Sect. 7.4 an experimental evaluation of our method on synthetic datasets consisting of sequences of length 1000 and 10,000, varying the distance between the text and the pattern; we also include two datasets of randomly generated sequences. The experimental results show that our method computes near optimal solutions. Moreover, it is faster than the dynamic programming algorithm for LCS, in particular for the length-10,000 datasets.
7.2 Definitions In this section we introduce the concepts that will be useful in the rest of the paper. Let P be a sequence, over an alphabet Σ, |P| denotes its length and P[i], with 1 ≤ i ≤ |P|, denotes the symbol in position i of P. Given two positions i, j in P, with 1 ≤ i ≤ j ≤ |P|, P[i, j] denotes the substring of P between position i and
position j. A sequence P′ is a subsequence of P if it can be obtained by deleting some symbols (possibly none) of P. S is a common subsequence of two sequences P and T if it is a subsequence of both P and T. A longest common subsequence of P and T is a common subsequence of P and T having maximum length. Given two sequences P, T and a common subsequence S of P and T, a position of P (or T) that belongs to S is called a match; otherwise it is an unmatched position. The Longest Common Subsequence (LCS) problem, given two sequences P and T, asks for a longest common subsequence S of P and T. In what follows, we assume that |P| ≤ |T| = n (otherwise we can simply exchange P and T). Given a position i, 1 ≤ i ≤ |P|, of P, the position c(i), 1 ≤ c(i) ≤ |T|, of T corresponding to i is defined as c(i) = (|T| · i)/|P|. A window W_i associated with position i of P, with 1 ≤ i ≤ |P|, is defined as an interval of positions in T that includes c(i), the w positions that precede c(i) (or down to the first position of T) and the w positions following c(i) (or up to the last position of T). The value of w is chosen large enough so that each position of T belongs to some W_i; in particular, if j belongs to W_i and j − 1 is not in W_i, then j − 1 is in W_{i−1}. The window W_i is formally defined as W_i = [l(i), r(i)], where l(i) and r(i) are:

l(i) = max(c(i) − w, 1),   r(i) = min(c(i) + w, |T|).
A common subsequence of P and T where each matched position i of P is mapped to a position of T in W_i is called a window constrained common subsequence of P and T. Now we can formally define the window constrained variant of LCS we are interested in: Problem 1 Window Constrained Longest Common Subsequence (WCLCS). Instance: two sequences P and T, a value w ≥ 1. Solution: a window constrained common subsequence S of P and T of maximum length.
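A small Python sketch of the window definition is given below; the chapter does not specify how c(i) is rounded to an integer position, so the rounding used here is an assumption.

```python
def window(i: int, len_p: int, len_t: int, w: int):
    """Return the window W_i = [l(i), r(i)] of positions of T associated with
    position i of P (1-based). Rounding of c(i) is an assumption of this sketch."""
    c = round(len_t * i / len_p)          # position of T corresponding to i
    l = max(c - w, 1)                     # l(i) = max(c(i) - w, 1)
    r = min(c + w, len_t)                 # r(i) = min(c(i) + w, |T|)
    return l, r

# Example: |P| = 50, |T| = 100, w = 7 -> position 10 of P maps to c(10) = 20,
# so W_10 covers positions 13..27 of T.
print(window(10, 50, 100, 7))  # (13, 27)
```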
7.3 Method Description In this section we describe our method, which consists of two phases: 1. (Phase 1) We compute a solution of WCLCS in O(nw) time (in our method we consider w = log2 n); 2. (Phase 2) We apply a genetic algorithm that, starting from the solution computed in Phase 1 and other common subsequences of P and T obtained by solving WCLCS on substrings of P and T , computes (possibly longer) common subsequences of P and T . We start by presenting a polynomial-time algorithm for WCLCS, then we present our genetic algorithm.
7.3.1 Phase 1: A Polynomial-Time Algorithm for WCLCS We define a dynamic programming algorithm that solves the WCLCS problem in polynomial time. For ease of exposition, we assume that the two sequences P and T are both extended with a position 0 that cannot be matched by any solution of WCLCS. Define the function WCL[i, j], where 0 ≤ i ≤ n and j in W_i, as the length of a window constrained longest common subsequence between P[0, i] and T[0, j]. WCL[i, j], with j in W_i and i, j ≥ 1, is computed with the following recurrence (we recall that r(i) represents the rightmost position of W_i):

$$WCL[i,j] = \max \begin{cases} WCL[i-1,\,j] & \text{if } j \in W_{i-1}\\ WCL[i,\,j-1] & \text{if } j-1 \in W_{i}\\ WCL[i-1,\,j-1] & \text{if } j-1 \in W_{i-1}\\ WCL[i-1,\,j-1]+1 & \text{if } P[i]=T[j] \text{ and } j-1 \in W_{i-1}\\ WCL[i-1,\,r(i-1)]+1 & \text{if } P[i]=T[j] \text{ and } r(i-1) < j-1 \end{cases} \qquad (7.1)$$
In the base case, when i = 0 or j = 0, W C L[i, j], with j in Wi , W C L[i, j] = 0. Lemma 1 W C L[i, j] = k, for i, j ≥ 0, j in Wi and k ≥ 0, if and only if there exists a solution of WCLCS on instance P[0, i] and T [0, j] of length k. Furthermore, W C L[i, j] can be computed in O(nw) time. Proof We start by proving that W C L[i, j] = k for i, j ≥ 0, j in Wi and k ≥ 0, if and only if there exists a solution of WCLCS on instance P[0, i] and T [0, j] of length k. We prove this property by induction on i + j. First, consider the base case, when i = 0 or j = 0. Then, since position 0 of P and T cannot be matched, the length of a window constrained common subsequence is 0 and, by definition of W C L, it holds that W C L[i, j] = 0. Assume that W C L[i, j] = k , for i, j ≥ 0 and i + j ≤ h, if and only if there exists a solution of WCLCS on instance P[0, i] and T [0, j] of length k . We prove that W C L[i, j] = k, for i, j ≥ 0 if and only if there exists a solution of WCLCS on instance P[0, i] and T [0, j] of length k, when i + j = h + 1. Consider a solution s of WCLCS on instance P[0, i] and T [0, j] of length k. If s contains P[i] = T [ j] = a, for some symbol a, then s = s a, where s is a window constrained LCS having length k − 1 of (1) P[0, i − 1] and T [0, j − 1] if j − 1 in Wi−1 , or of (2) P[0, i − 1] and T [0, r (i − 1)] if j − 1 is not in Wi−1 . In the first case, by induction hypothesis, since i − 1 + j − 1 ≤ h, W C L[i − 1, j − 1] = k − 1 and by the fourth case of Recurrence 7.1 it holds that W C L[i, j] = k. In the latter case, by induction hypothesis, since i − 1 + r (i − 1) ≤ h, W C L[i − 1, r (i − 1)] = k − 1 and by the fifth case of Recurrence 7.1 it holds that W C L[i, j] = k. Consider the case that s does not contain at least one of P[i] and T [ j]. If s is a window constrained LCS of P[0, i − 1] and T [0, j], by the first case of Recurrence
7.1 and by induction hypothesis, since i − 1 + j ≤ h, it holds that W C L[i − 1, j] = k. If s is a window constrained LCS of P[0, i] and T [0, j − 1], by the second case of Recurrence 7.1 and by induction hypothesis, since i + j − 1 ≤ h, it holds that W C L[i, j − 1] = k. If s is a window constrained LCS of P[0, i − 1] and T [0, j − 1], by the third case of Recurrence 7.1 and by induction hypothesis, since i − 1 + j − 1 ≤ h, it holds that W C L[i − 1, j − 1] = k. In all cases, W C L[i, j] = k. Assume that W C L[i, j] = k. Then one of the five cases of Recurrence 7.1 must hold. If the fourth case (or the fifth case, respectively) holds, then P[i] = T [ j] = a, for some symbol a, and W C L[i − 1, j − 1] = k − 1 (or W C L[i − 1, r (i − 1)] = k − 1, respectively). By induction hypothesis, since i − 1 + j − 1 ≤ h (since i − 1 + r (i − 1) ≤ h, respectively), there exists a window constrained LCS s of P[0, i − 1] and T [0, j − 1] (of P[0, i − 1] and T [0, r (i − 1)], respectively) having length k − 1. Then defining s = s a, we obtain a window constrained LCS s of P[0, i] and T [0, j] having length k. If W C L[i − 1, j] = k, since i − 1 + j ≤ h, by induction hypothesis there exists a window constrained LCS s of P[0, i − 1] and T [0, j] having length k. If W C L[i, j − 1] = k, since i + j − 1 ≤ h, by induction hypothesis there exists a window constrained LCS s of P[0, i] and T [0, j − 1] having length k. If W C L[i − 1, j − 1] = k, since i − 1 + j − 1 ≤ h, by induction hypothesis there exists a window constrained LCS s of P[0, i − 1] and T [0, j − 1] having length k. Next, we consider the time complexity to compute W C L[i, j]. W C L[i, j] is defined over O(nw) values, as 0 ≤ i ≤ n and j belongs to Wi , where |Wi | ≤ 2w + 1. Given two values i and j, where 1 ≤ i ≤ n and j belongs to Wi , W C L[i, j] is computed with Recurrence 7.1 in constant time by considering the values W C L[i − 1, j], W C L[i − 1, j], W C L[i − 1, j − 1], W C L[i − 1, R(i − 1)]. Since the base case can be computed in constant time, the overall time complexity to compute W C L[i, j], 0 ≤ i ≤ n and 0 ≤ j ≤ w, is O(nw). An optimal solution of WCLCS on input P, T is equal to the value of W C L[|P|, |T |] (see Lemma 1) and can be computed in O(nw) time.
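Recurrence 7.1 translates directly into a short dynamic program. The following Python fragment is an illustrative sketch of Phase 1 that returns only the length of a window constrained LCS; the rounding of c(i) and the handling of the boundary row are assumptions not fixed by the chapter.

```python
from typing import Optional

def wclcs_length(P: str, T: str, w: int) -> int:
    """Length of a window-constrained LCS of P and T (|P| <= |T|), following
    the dynamic programme of Sect. 7.3.1 (Recurrence 7.1). Illustrative sketch."""
    m, n = len(P), len(T)

    def bounds(i: int):
        c = round(n * i / m)                   # c(i), position of T corresponding to i
        return max(c - w, 1), min(c + w, n)    # l(i), r(i)

    WCL = {}                                   # WCL[(i, j)] for 1 <= i <= m, j in W_i

    def get(i: int, j: int) -> Optional[int]:
        if i == 0 or j == 0:
            return 0                           # base case: row/column 0 is never matched
        return WCL.get((i, j))                 # None when j is outside W_i

    for i in range(1, m + 1):
        lo, hi = bounds(i)
        hi_prev = bounds(i - 1)[1] if i > 1 else 0   # r(i-1); row 0 acts as base row
        for j in range(lo, hi + 1):
            diag = get(i - 1, j - 1)
            candidates = [v for v in (get(i - 1, j), get(i, j - 1), diag)
                          if v is not None] or [0]
            if P[i - 1] == T[j - 1]:           # P[i] = T[j] with 1-based positions
                if diag is not None:           # fourth case: j-1 in W_{i-1}
                    candidates.append(diag + 1)
                elif j - 1 > hi_prev:          # fifth case: j-1 beyond r(i-1)
                    candidates.append(get(i - 1, hi_prev) + 1)
            WCL[(i, j)] = max(candidates)

    return WCL[(m, n)]                         # r(|P|) = |T|, so (m, n) is always defined

# Example: a near-aligned pattern and text.
print(wclcs_length("abc", "axbxc", w=2))       # 3
```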
7.3.2 Phase 2: A Genetic Algorithm Next, we describe our genetic algorithm to improve a solution returned by the exact algorithm for WCLCS. The genetic algorithm considers a set (called population) of z common subsequences (called chromosomes) of P and T . The population is initialized as follows: • A chromosome representing the solution returned by Phase 1 with w = log2 n • A set of z − 1, with z > 1, solutions of WCLCS on instance P[0, i], T [0, j], for i and j chosen randomly in [1, |P|] and [1, |T |], respectively, with w = log2 n. The chromosomes evolve by applying three operations: mutation, crossover, and selection. Each chromosome is evaluated via a fitness function, which is defined as
the length of the common subsequence of P and T represented by the chromosome. Next, we describe the evolution operators we apply. • Mutation. We randomly choose a subsequence/chromosome S of the population and we apply a mutation on the chromosome as follows: – We define two substrings P′ and T′ of P and T by randomly selecting two positions a and b of S (and the corresponding pairs of matching positions a_P, b_P, a_T, b_T in P and T, respectively). Then, P′ = P[a_P, b_P] and T′ = T[a_T, b_T]. – We compute a solution S′ of WCLCS on input P′ and T′, with w = log_2 l, where l = min(|P′|, |T′|), by applying the dynamic programming algorithm of Sect. 7.3.1. – A subsequence S* is obtained by replacing S[a, b] in S with S′. • Crossover. We consider two chromosomes/common subsequences C_A, C_B; we then randomly select two positions, i of P and j of T, that are matched in C_A, and we compute two common subsequences of P and T as follows: – C′_A contains the matching positions of C_A until i and j, and the matching positions of C_B after i and j. – C′_B contains the matching positions of C_B until i and j, and the matching positions of C_A after i and j. • Elitist selection. The z longest chromosomes are transferred to the next generation, in order to avoid a decrease in the length of the computed solutions. The procedure described above is iterated for g generations. Method Parameters. In order to obtain a fast heuristic, we consider a window of size w = log_2 n in Phase 1. The size of the population is limited to 5. We consider three variants of our method, in which we vary the number of generations: we denote by 2-g (5-g and 10-g, respectively) the variants obtained by applying 2 (5 and 10, respectively) generations. We consider a small number of generations in order to maintain the efficiency of the method. In each generation, we apply 2 mutations and 2 crossovers.
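A minimal sketch of the crossover and elitist selection operators is given below, assuming that a chromosome is stored as the increasing list of matched (i, j) position pairs of a common subsequence; the mutation operator, which re-runs the WCLCS dynamic programming algorithm on substrings, is omitted, and all names are illustrative.

```python
import random

def crossover(chrom_a, chrom_b):
    """Combine two chromosomes around a randomly chosen match (i, j) of chrom_a."""
    i, j = random.choice(chrom_a)
    child_a = [(p, q) for p, q in chrom_a if p <= i and q <= j] + \
              [(p, q) for p, q in chrom_b if p > i and q > j]
    child_b = [(p, q) for p, q in chrom_b if p <= i and q <= j] + \
              [(p, q) for p, q in chrom_a if p > i and q > j]
    return child_a, child_b

def elitist_selection(population, z):
    """Keep the z longest chromosomes, so solution quality never decreases."""
    return sorted(population, key=len, reverse=True)[:z]

def generation(population, z):
    """One generation with 2 crossovers, as in the method parameters above."""
    offspring = []
    for _ in range(2):
        a, b = random.sample(population, 2)
        offspring.extend(crossover(a, b))
    return elitist_selection(population + offspring, z)
```

Because every prefix pair of one parent precedes every retained suffix pair of the other parent in both coordinates, the children produced by this crossover are themselves valid common subsequences.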
7.4 Experimental Evaluation The experimental evaluation of the three variants of our method is performed on eight synthetic datasets of sequences, four datasets contain sequences of length 1000 and four datasets contain sequences of length 10,000. Synthetic Datasets. We generate synthetic datasets of sequences having length 1000 and 10,000, over an alphabet of size 26 (ASCII lower case). Six of the datasets are built as follows. We start from a set of 100 random sequences, each one representing a text. Starting from a text, we create a corresponding pattern by applying some
modifications: subsequences of length 10, 20, 30 in the text are cut and moved to other positions of the sequence. For each length considered (1000 and 10,000), three such datasets are defined, depending on the number of modifications that have been applied: 1000-15 and 10,000-15 (15% modifications of the text length), 1000-30 and 10,000-30 (30% modifications of the text length), 1000-50 and 10,000-50 (50% modifications of the text length). Moreover, we consider two sets of random generated sequences, denoted by 1000-Random and 10,000-Random. Evaluation. We present the experimental results for our method on the synthetic datasets and we compare it with the dynamic programming algorithm for the LCS problem [11]. The experiments were run on Linux Mint version 20.1 with processor 3.1 GHz Intel Core i7 and 16GB of RAM. Due to the stochasticity of the method, we executed 50 independent runs and we present the average values (for running time and solution length) over the 50 runs. All algorithms were implemented in Python. We start by considering the quality of the solutions returned by our method. As shown in Table 7.1, the solutions returned by our method are always suboptimal, except for one dataset (1000-15), for which Phase 1 returned optimal solutions. As expected, the quality of the returned solutions increases with the number of generations, however the improvement is moderate. In particular, when the number of generations increases from 5 to 10 the improvement is very limited. In Table 7.1, we further consider the quality of our method by presenting the approximation factors achieved. Notice that the approximation factor is the ratio between the length of the solutions returned by a variant of our method and the length of an LCS. First, it has to be observed that the approximation factor is always close to 1. Except for the 1000-Random and 10,000-Random datasets, the variants of our method have an approximation factor of at least of 0.931 for the 1000 length datasets and of 0.983 for the 10,000 length datasets. Moreover, even for the two Random datasets the approximation factor is acceptable, being within 0.876 and 0.909, respectively, for the 1000 length dataset and for the 10,000 length dataset,
Table 7.1 Solutions returned by variants of our method and an optimal solution (LCS): average solution length and average approximation factor per dataset

Dataset        | Length: 2-g / 5-g / 10-g / LCS              | Approximation factor: 2-g / 5-g / 10-g
1000-15        | 850.150 / 850.150 / 850.150 / 850.150       | 1.000 / 1.000 / 1.000
1000-30        | 711.639 / 713.028 / 713.874 / 747.940       | 0.951 / 0.953 / 0.954
1000-50        | 647.112 / 648.328 / 648.418 / 695.440       | 0.931 / 0.932 / 0.932
1000-Random    | 279.754 / 280.376 / 280.382 / 319.250       | 0.876 / 0.878 / 0.878
10,000-15      | 8617.509 / 8621.632 / 8622.271 / 8828.240   | 0.976 / 0.977 / 0.977
10,000-30      | 7747.993 / 7751.468 / 7751.470 / 7879.330   | 0.983 / 0.984 / 0.984
10,000-50      | 6803.049 / 6804.500 / 6806.092 / 6911.100   | 0.984 / 0.985 / 0.985
10,000-Random  | 2947.076 / 2950.314 / 2950.395 / 3240.620   | 0.909 / 0.910 / 0.910
respectively. The approximation factor shows a moderate improvement as the number of generations increases; in particular, increasing from 5 to 10 generations leads essentially to the same approximation factor. We also present in Table 7.2 the standard deviation, the minimum and the maximum of the approximation ratio achieved by the variants of our method. As can be seen, the standard deviation is always small and the difference between the minimum and maximum approximation ratio is limited (at most 0.139). These values are also similar for the three different variants of the method considered. We also analyze the improvement produced by Phase 2 (the genetic algorithm) with respect to the first phase. We report in Table 7.3 the relative reduction Δ achieved by the genetic algorithm, where Δ is defined as follows. Denote by E_1 (E_2, respectively) the difference between the length of a longest common subsequence and the length of a solution returned by Phase 1 (Phase 2, respectively). Then Δ = (E_1 − E_2)/E_1 if E_1 > 0, else Δ = 0. As shown in Table 7.3, even the 2-g variant is able to improve the solutions significantly: the error of Phase 1 is reduced by the 2-g variant by at least 19.445% and at most 36.167%. The 5-g and the 10-g variants have slightly better performances, with moderate differences between these latter two variants.
Table 7.2 Standard deviation, minimum and maximum of the approximation ratio achieved by the method variants

Dataset        | 2-g: st. dev. / Min / Max  | 5-g: st. dev. / Min / Max  | 10-g: st. dev. / Min / Max
1000-30        | 0.0185 / 0.903 / 0.995     | 0.0185 / 0.903 / 0.995     | 0.0184 / 0.902 / 0.996
1000-50        | 0.0224 / 0.849 / 0.988     | 0.0224 / 0.849 / 0.988     | 0.0225 / 0.849 / 0.989
1000-Random    | 0.0236 / 0.810 / 0.923     | 0.0234 / 0.814 / 0.924     | 0.0235 / 0.812 / 0.926
10,000-15      | 0.005 / 0.964 / 0.987      | 0.005 / 0.966 / 0.987      | 0.005 / 0.966 / 0.987
10,000-30      | 0.005 / 0.968 / 0.991      | 0.005 / 0.969 / 0.993      | 0.005 / 0.969 / 0.992
10,000-50      | 0.004 / 0.971 / 0.992      | 0.004 / 0.971 / 0.992      | 0.004 / 0.971 / 0.992
10,000-Random  | 0.007 / 0.893 / 0.927      | 0.0007 / 0.895 / 0.928     | 0.007 / 0.894 / 0.928
Notice that the 1000-15 dataset is not considered, as Phase 1 computes an optimal solution for all the sequences in this dataset.

Table 7.3 Improvement of Phase 2 with respect to Phase 1

Dataset        | WCLCS solution | 2-g (%) | 5-g (%) | 10-g (%)
1000-30        | 691.42         | 35.73   | 38.231  | 39.727
1000-50        | 628.41         | 27.900  | 29.715  | 29.849
1000-Random    | 270.22         | 19.445  | 20.713  | 20.727
10,000-15      | 8517.51        | 32.182  | 33.509  | 33.714
10,000-30      | 7673.58        | 36.167  | 37.855  | 37.857
10,000-50      | 6742.75        | 35.818  | 36.680  | 37.625
10,000-Random  | 2868.69        | 21.076  | 21.946  | 21.986

Columns 3, 4 and 5 present the average values of Δ for the three variants of our method. The 1000-15 dataset is not considered, as Phase 1 computes an optimal solution for all the sequences in this dataset.
Table 7.4 Running time (in seconds) of variants of our method and the dynamic programming algorithm for LCS (denoted by OPT)

Dataset        | 2-g   | 5-g   | 10-g   | OPT
1000-15        | 0.327 | 0.463 | 0.802  | 0.337
1000-30        | 0.353 | 0.463 | 0.801  | 0.428
1000-50        | 0.388 | 0.453 | 0.798  | 0.450
1000-Random    | 0.363 | 0.377 | 0.767  | 0.400
10,000-15      | 5.335 | 5.886 | 11.164 | 26.688
10,000-30      | 5.390 | 6.500 | 11.761 | 26.744
10,000-50      | 5.516 | 6.498 | 11.670 | 26.583
10,000-Random  | 5.454 | 5.901 | 11.279 | 27.113
The running time of the method variants (reported in Table 7.4) highly depends on the number of generations. In particular, the 2-g variant is always the fastest, also when compared with the dynamic programming algorithm for the LCS problem (denoted by OPT), although for the 1000 length datasets the difference is small. For 1000 length sequences, the 5-g method has performance similar to OPT, while the 10-g method is significantly slower than all the other methods considered. The results on the 10,000 length sequences show that the running time of all variants of our method outperforms OPT; the slowest variant (10-g) is more than two times faster and the fastest variant (2-g) is almost five times faster. Considering the different variants of our method, there is a considerable difference between the variants with at most 5 generations and the 10-g variant. Conversely, the difference in running time between the 2-g variant and the 5-g variant is moderate, ranging from 0.447 to 1.210 s for the 10,000 datasets.
7.5 Conclusion We have designed a fast heuristic for comparing two sequences by combining (1) a dynamic programming algorithm for a window constrained LCS and (2) a genetic algorithm. The experiments on synthetic datasets show that the proposed algorithm computes near optimal solutions, while significantly reducing the running time with respect to the dynamic programming algorithm for the LCS problem, in particular for the datasets consisting of sequences of length 10,000. As future work, it would be interesting to apply other evolutionary strategies in the second phase of our method in order to possibly improve the running time. Furthermore, it would be interesting to apply the method to real datasets and to compare it with existing evolutionary approaches to LCS [4, 8] on benchmark datasets. Finally, it would be interesting to perform statistical tests to gain more insights into the method's performance.
References 1. Abboud, A., Backurs, A., Williams, V.V.: Tight hardness results for LCS and other sequence similarity measures. In: Guruswami, V. (ed.) IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17–20 Oct 2015, pp. 59–78. IEEE Computer Society (2015) 2. Abboud, A., Rubinstein, A.: Fast and deterministic constant factor approximation algorithms for LCS imply new circuit lower bounds. In: Karlin, A.R. (ed.) 9th Innovations in Theoretical Computer Science Conference, ITCS 2018, 11–14 Jan 2018, Cambridge, MA, USA. LIPIcs, vol. 94, pp. 35:1–35:14. Schloss Dagstuhl—Leibniz-Zentrum für Informatik (2018) 3. Blin, G., Bonizzoni, P., Dondi, R., Sikora, F.: On the parameterized complexity of the repetition free longest common subsequence problem. Inf. Process. Lett. 112(7), 272–276 (2012) 4. Blum, C., Djukanovic, M., Santini, A., Jiang, H., Li, C., Manyà, F., Raidl, G.R.: Solving longest common subsequence problems via a transformation to the maximum clique problem. Comput. Oper. Res. 125, 105089 (2021) 5. Boroujeni, M., Seddighin, M., Seddighin, S.: Improved algorithms for edit distance and LCS: beyond worst case. In: Chawla, S. (ed.) Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, 5–8 Jan 2020, pp. 1601–1620. SIAM (2020) 6. Bringmann, K., Das, D.: A linear-time n0.4 -approximation for longest common subsequence. In: Bansal, N., Merelli, E., Worrell, J. (eds.) 48th International Colloquium on Automata, Languages, and Programming, ICALP 2021, 12–16 July 2021, Glasgow, Scotland (Virtual Conference). LIPIcs, vol. 198, pp. 39:1–39:20. Schloss Dagstuhl—Leibniz-Zentrum für Informatik (2021) 7. Castelli, M., Dondi, R., Mauri, G., Zoppis, I.: Comparing incomplete sequences via longest common subsequence. Theor. Comput. Sci. 796, 272–285 (2019) 8. Djukanovic, M., Berger, C., Raidl, G.R., Blum, C.: An a* search algorithm for the constrained longest common subsequence problem. Inf. Process. Lett. 166, 106041 (2021) 9. Frolova, D., Stern, H., Berman, S.: Most probable longest common subsequence for recognition of gesture character input. IEEE Trans. Cybern. 43(3), 871–880 (2013) 10. Hinkemeyer, B., Julstrom, B.A.: A genetic algorithm for the longest common subsequence problem. In: Cattolico, M. (ed.) Genetic and Evolutionary Computation Conference, GECCO 2006, Proceedings, Seattle, Washington, USA, 8–12 July 2006, pp. 609–610. ACM (2006) 11. Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Commun. ACM 18(6), 341–343 (1975) 12. Hsiao, K., Xu, K.S., Calder, J., III, A.O.H.: Multicriteria similarity-based anomaly detection using Pareto depth analysis. IEEE Trans. Neural Netw. Learn. Syst. 27(6), 1307–1321 (2016) 13. Iliopoulos, C.S., Rahman, M.S.: Algorithms for computing variants of the longest common subsequence problem. Theor. Comput. Sci. 395(2–3), 255–267 (2008) 14. Jansen, T., Weyland, D.: Analysis of evolutionary algorithms for the longest common subsequence problem. Algorithmica 57(1), 170–186 (2010) 15. Jiang, T., Li, M.: On the approximation of shortest common supersequences and longest common subsequences. SIAM J. Comput. 24(5), 1122–1139 (1995) 16. Khan, R., Ali, I., Altowaijri, S.M., Zakarya, M., Rahman, A.U., Ahmedy, I., Khan, A., Gani, A.: LCSS-based algorithm for computing multivariate data set similarity: a case study of real-time WSN data. Sensors 19(1), 166 (2019) 17. Pearson, W.R., Lipman, D.J.: Improved tools for biological sequence comparison. Proc. Natl. 
Acad. Sci. U. S. A. 85(8), 2444–2448 (1988)
Chapter 8
Assured Multi-agent Reinforcement Learning with Robust Agent-Interaction Adaptability Joshua Riley, Radu Calinescu, Colin Paterson, Daniel Kudenko, and Alec Banks Abstract Multi-agent reinforcement learning facilitates agents learning to solve complex decision-making problems requiring collaboration. However, reinforcement learning methods are underpinned by stochastic mechanisms, making them unsuitable for safety-critical domains. To solve this issue, approaches such as assured multi-agent reinforcement learning, which utilises quantitative verification to produce formal guarantees of safety requirements during the agents learning process, have been developed. However, this approach relies on accurate knowledge about the environment to be effectively used which can be detrimental if this knowledge is inaccurate. Therefore, we developed an extension to assured multi-agent reinforcement learning called agent interaction driven adaptability, an automated process to securing reliable safety constraints, allowing inaccurate and missing knowledge to be used without detriment. Our preliminary results showcase the ability of agent interaction driven adaptability to allow safe multi-agent reinforcement learning to be utilised in safety-critical scenarios.
8.1 Introduction An emerging technology with significant underexplored potential is multi-agent systems (MAS) [1], where a collection of agents work collaboratively towards shared goals. This potential has been displayed in domains including search and rescue operations [2] and medical applications [3], amongst many others. This potential is limited by the difficulties that arise from developing MAS, given the complex relationships that the agents share, which compounds task assignment and planning
challenges [4]. This issue is exacerbated when these systems are applied to scenarios where humans will not be present to provide assistance, such as safety-critical scenarios. In order to address these challenges, multi-agent reinforcement learning (MARL) is often applied to these systems. Reinforcement learning (RL) [5] is a widely used form of machine learning which makes use of aspects of behavioural psychology to promote desirable behaviour and discourage undesirable behaviour through the use of numerical rewards and punishments, solving objectives efficiently over time. RL can be extended to MAS and allows agents to learn efficient ways to meet objectives while using their collaborative nature to share tasks and roles [6]. However, both RL and MARL are limited by the inherent stochasticity that underpins their learning processes [7]. Safety-critical scenarios, which encompass a large proportion of the potential uses of MAS, require agents to be conservative in their actions and choices due to the hazards present within the environment, meaning the stochastic nature of MARL is not suitable for use. Safe RL, which attempts to negate this issue, is known as one of the most significant challenges of RL, and safe MARL has produced multiple directions of study [8]. One method of producing safe RL and MARL that shows great promise is the use of the constrained criterion [7]. By removing the ability for agents to take actions and move to states which contain any potential risk, the stochastic nature of RL and MARL is negated, and these techniques can be used safely. However, the main issues with the constrained criterion are the risks of over-constraining the environment and unnecessarily removing effective ways of completing the required objectives, or removing the ability to complete the objectives entirely. Two approaches that negate the problems with the constrained approach are Assured RL (ARL) [9] and the approach which we delivered previously, called Assured MARL (AMARL) [10, 11], which are focused on safe RL and safe MARL, respectively. AMARL produces assurances that both safety and functional requirements will be met through the use of quantitative verification (QV). However, AMARL is currently limited by the significant preliminary knowledge required for its use and by its inability to adjust to changing environments. The work we present in this paper extends our AMARL approach, enabling more robustness regarding inaccurate knowledge, minimal environmental knowledge, and adapting environments. This is achieved using a two-stage extension called Agent Interaction Driven Adaptability (AIDA), which addresses these limitations by using an automated process to allow adaptability at runtime. The rest of this paper is structured as follows. Section 8.2 defines concepts and terminology used to present our approach. Section 8.3 compares our AIDA extension to related work. Section 8.4 introduces our AIDA extension. Section 8.5 presents a problem domain used for evaluating AIDA and the evaluation results. Finally, Sect. 8.6 concludes our findings and suggests potential future work.
8.2 Background Single-Agent Reinforcement Learning (RL). RL is a machine learning technique that allows an agent to learn deterministic policies with the use of Markov Decision Processes (MDP). MDPS offer a mathematical framework for representing sequential decision-making problems that contain stochastic behaviour [5], these frameworks capture environmental states, actions, rewards, and costs. Policies π : S → A map each state within an environment with an available action, which the agent will take when a state in an MDP is visited [5]. Numerical rewards and punishments are used to guide the agent towards desirable policies. By exploring the MDPs state-space to find higher rewards and exploiting the knowledge of the rewards it has obtained by taking specific actions, the agent will gradually learn increasingly efficient behaviours, eventually solving the MDP. A deterministic policy (π ) can be found once all stateaction pairs are mapped by selecting each state’s most functionally beneficial action. Multi-agent Reinforcement Learning (MARL). Using similar methods as singleagent RL, MARL is framed as a multi-agent MDP (MA-MDP), an extension of a standard MDP to capture multiple agents, otherwise known as a Markov Game. Multiple agents interacting and having differing environmental impacts make the environment non-static, breaking the Markovian principle, making it necessary to use an MA-MDP [5]. Agents working in a shared environment will likely be required to learn how to work collaboratively to achieve shared goals or learn to out-manoeuvre adversarial agents to achieve their best individual performance. Due to this added layer of difficulty, deep MARL using neural networks has become standardised [12]. Abstract Markov Decision Process (AMDP). AMDPS capture a high-level representation of a more detailed MDP. States and actions are aggregated based on how similar they are regarding the information that needs to be captured. AMDPs are commonly used in safety engineering due to the large state spaces that many practical problems contain [13]. AMDPs allow us to drastically simplify these processes by aggregating both states and actions and abstracting them into a single state with corresponding abstracted actions. In this way only the relevant information is captured and allows high-level planning and analysis. Quantitative Verification (QV). QV uses mathematical algorithms to give us the ability to determine if quantitative properties will hold for a state transition model; it does this by examining the entire state-space of the model. In order to express these properties for an MDP, we make use of probabilistic tree logic (PCTL) [14]. PCTL is a temporal logic that facilitates the evaluation of probabilistic bounds on reachability in an MDP. For example, the probability of reaching state s f , where f is the end state, before time T . These models can be efficiently built within probabilistic model checking tools such as PRISM [15], allowing transitions to states to be described and reward structures that allow information to be captured. For example, this can be used to evaluate
bounds δ on cumulative rewards associated with actions taken within the problem space, e.g. battery usage > δ when the mission is complete.
AMARL. Our AMARL approach [11] is a multi-step approach for safely constraining MARL agents, using QV to provide formal proofs that safety and functional requirements will hold. AMARL consists of four stages that require partially complete knowledge of the environment. The first stage involves the domain expert analysing the environment, identifying states, actions, risks, and rewards and determining safety and functional requirements. The second stage consists of constructing an AMDP based on the information found in stage one, which is required to facilitate efficient QV. The third stage requires QV to be run over the AMDP created in stage two, directed by the safety and functional requirements described in PCTL. This process will then synthesise safe abstract policies that meet safety and functional requirements. The fourth and final stage consists of the agents' behaviour being strictly constrained by the chosen safe abstract policy, allowing safe MARL to be used.
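As a purely illustrative sketch of the fourth stage, constraining behaviour with a synthesised safe abstract policy can be thought of as action masking at the abstract level. The state names, action names and dictionary representation below are hypothetical and are not taken from the AMARL implementation.

```python
# Minimal sketch: filtering a learner's candidate actions with a safe abstract policy.
# The synthesised policy maps each abstract state to the abstract actions it permits.
safe_abstract_policy = {
    "room_A": {"move_to_room_B"},
    "room_B": {"move_to_room_A", "move_to_room_E"},
}

def allowed_actions(abstract_state, candidate_actions, abstraction):
    """Keep only the low-level actions whose abstraction the safe policy permits."""
    permitted = safe_abstract_policy.get(abstract_state, set())
    return [a for a in candidate_actions if abstraction(a) in permitted]

# Example: in room_B the low-level planner proposes three moves; one is disallowed.
abstraction = {"go_A": "move_to_room_A", "go_E": "move_to_room_E", "go_F": "move_to_room_F"}.get
print(allowed_actions("room_B", ["go_A", "go_E", "go_F"], abstraction))  # ['go_A', 'go_E']
```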
8.3 Related Work
While safe MARL is relatively new, many of the techniques employed are drawn from safe RL. Our approach falls within the umbrella of techniques that constrain MDPs and MA-MDPs. Such techniques have been identified as one of the most promising for producing efficient safety-critical RL agents [7], and this trend has continued into safe MARL. Constrained RL has been applied in various ways with different degrees of safety [16, 17]. We categorise these techniques as conservative-safe learning, soft-safe learning, and threshold-safe learning. Conservative-safe learning uses a learning process that constrains agents' actions. In this way risky actions are removed from the agents' available behaviours, regardless of the actions' perceived functionality [17, 18]. However, the conservative approach suffers from being overly constrictive, possibly removing valuable actions or making the problem unsolvable. Soft-safe learning allows agents to enter risky states, with the potential risk factored into the calculation of whether to take an action rather than strictly prohibiting it [19, 20]. These soft constraints allow the agents more freedom in their environment. However, the probability of risk can quickly become unreasonable for a safety-critical scenario, offering little comfort. Threshold-safe learning can be viewed as a compromise between conservative-safe learning and soft-safe learning. Agents are permitted to take risky actions, but under stringent limitations that are set by a risk threshold [21–23]. Threshold-safe learning allows agents to make use of risky actions that are functionally appropriate to meeting objectives, but only if they do not bring the level of cumulative risk above a specified amount. These approaches can be expensive in terms of time, and like all current safety-based learning approaches, cannot guarantee optimality.
Constrained MDP approaches from safe RL have been applied to MARL in safety-critical scenarios [24], and several techniques have arisen using this concept, though this class of research is in its infancy. Our AMARL approach is one such technique and involves constraints synthesised via QV [10, 11], building on previous work in safe RL [23]. Our work, using threshold-safe learning, was the first to apply such methods to safe MARL.
8.4 Agent Interaction Driven Adaptability AIDA, as seen in Fig. 8.1, extends our initial AMARL approach allowing agents to identify inconsistencies in the AMDP at run-time. This extension addresses one of the limiting factors of AMARL, the need for the AMDP to be accurate for safety to be assured. With AIDA the probabilities of mission failure involved with potential risks can be corrected without any interaction by the domain expert. This allows QV, policy synthesis and agent constraint to be applied autonomously. This allows AIDA to identify inconsistencies in risk and rewards, and to safely identify new states and transitions that were previously unknown. AIDA consists of two stages. Firstly, prior to run time, a domain expert defines the environment, requirements and constraints. The second stage is an automated process where the agents compare the low-level domain problem with an abstracted world model to identify inconsistencies which trigger an update of the abstracted model. The new AMDP will be generated based on this updated world model, allowing QV, policy synthesis, and agent constraint.
Fig. 8.1 Agent interaction driven adaptability framework
Stage One. This stage comprises three tasks undertaken by a domain expert prior to run-time.
Allow perception and tracking of environmental changes defines the mechanisms by which the agents perceive the environment and hence identify environmental changes. For the domain example in Fig. 8.2, agents localise and plan using data from laser sensors to obtain 'identity tags' and to check rooms' access for surveillance cameras. This information can then be fed into process one of the automated stage, which will update the abstracted world model.
Inputs and mission requirements defines the inputs for process two of the automated stage. This consists of information about the robotic system as well as mission and safety requirements. Using PrismModelGen.py, the domain expert supplies the system size, the initial states, and the PCTL-formatted mission requirements. This is only required once and facilitates the AMDP generation in the automated stage.
Define relationship between constraints and agent behaviour consists of defining those constraints which impact agent behaviour; for example, an agent may not be allowed to move from room r_F to room r_i through hallway h_8. The domain expert supplies agents with the ability to constrain themselves so as to allow the automated stage to work correctly. Using the agent's location data and its identification of objects, the agent would know which room it is in and apply behavioural constraints stopping it from entering hallway h_8. This facilitates the agent constraint in process four of the automated stage.
Stage Two. The automated process begins when an agent finds an inconsistency between its abstracted world view and the actual low-level environment. The agent then updates its view of the environment and launches the first process of this stage.
Agent Abstracted World View Construction defines the agent's view of the environment, or world view, as an abstracted representation. The abstracted world view is supplied as a two-dimensional array such that for each state transition we define: whether a transition exists between states; whether the state transition will result in objectives being reached; and the risk associated with the state transition. This world view can be updated by the agents within the system as inconsistencies are found.
Prism Model Construction uses the abstract world view and the PCTL requirements from stage one to create an AMDP in the PRISM language format, including states, transitions, risks, and rewards.
Policy Synthesis and Selection makes use of QV, applied to the AMDP with the PCTL requirements, to synthesise policies which satisfy the constraints. From these policies, the policy which maximises the reward obtained while not exceeding the risk requirement is selected.
Agent Constraint makes use of the selected abstract safe policy, which is passed back to the agents. Then, using the environmental constraint relationship supplied by the domain expert, each agent constrains its behaviour based on this updated safe policy. This allows learning to continue in a changing environment with the assurance that safety requirements will be met.
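The abstracted world view described above can be pictured with the following minimal sketch. The class name, field layout and default worst-case risk are our own illustrative shorthand for the two-dimensional per-transition structure this section describes, not the actual AIDA implementation.

```python
import numpy as np

class AbstractedWorldView:
    """Per state pair (s, s'): does a transition exist, does it reach an objective, and its risk."""

    def __init__(self, n_states, worst_case_risk=0.1):
        self.exists = np.zeros((n_states, n_states), dtype=bool)
        self.reward = np.zeros((n_states, n_states), dtype=bool)
        self.risk = np.full((n_states, n_states), worst_case_risk)

    def observe(self, s, s_next, reaches_objective, measured_risk=None):
        """Record an observation; return True if it was inconsistent with the current model."""
        inconsistent = False
        if not self.exists[s, s_next]:
            self.exists[s, s_next] = True
            inconsistent = True
        if reaches_objective and not self.reward[s, s_next]:
            self.reward[s, s_next] = True
            inconsistent = True
        if measured_risk is not None and measured_risk != self.risk[s, s_next]:
            self.risk[s, s_next] = measured_risk
            inconsistent = True
        return inconsistent

# When observe(...) returns True, the updated world view feeds the AMDP generation
# (e.g. via a generator script such as PrismModelGen.py), QV is re-run, and a new
# safe abstract policy is synthesised and passed back to the agents.
```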
8.5 Evaluation
Domain example. To evaluate our approach for deep MARL, we created a navigation domain, shown in Fig. 8.2, within the Unity game engine [25]. Nine rooms, labelled r_A to r_I, each contain a flag that the agents should collect. Rooms are connected by hallways, h_i, in which security cameras are placed. The coloured areas of hallways show the coverage of the cameras. If an agent enters one of these coloured areas, then there is a probability that the mission fails. Each agent is equipped with a device to disrupt the security cameras, but the probability of avoiding detection is not certain and is dependent on the area in which the camera is located. The probability of the disruption device failing is 0.05 for locations h_2, h_5, h_7 and h_8, and 0.1 for all other camera locations. The two goals are, by their nature, conflicting: the more objectives an agent finds, the less likely it is that all agents will return safely. Viewing this problem in such a way, we set functional and safety constraints that we expect from the system within the environment, allowing us to state the functional and safety parameters for the problem domain, these being C1: the probability of agents remaining undetected during the operation should not fall below 0.7; and C2: the agents should collect at least six flags. These parameters were chosen to demonstrate the approach within this domain and the features that it holds. While this domain is framed as an infiltration mission, AIDA is applicable to a wide range of scenarios, including search and rescue and maintenance tasks.
Experimental Setup. We implemented our navigation domain example in the Unity game engine and implemented our approach using the ML-Agents plugin [25]. Three homogeneous agents were constructed using deep RL with Proximal Policy Optimization and intrinsic curiosity to collect flags while limiting their exposure to risk.
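Purely as an illustration, requirements of this kind are typically supplied to QV as PCTL properties over the AMDP. One hypothetical encoding of C1 and C2, assuming an atomic proposition detected, an end-of-mission state end, and a reward structure flags counting collected flags, might read:

\[ C_1: \; \mathrm{P}_{\geq 0.7}\,[\, \lnot \mathit{detected} \;\mathrm{U}\; \mathit{end} \,], \qquad C_2: \; \mathrm{R}^{\mathit{flags}}_{\geq 6}\,[\, \mathrm{F}\; \mathit{end} \,] \]

The exact propositions and reward structures used by the authors are not given in this chapter, so these formulas should be read only as the general shape of such properties.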
Fig. 8.2 Domain example including rooms, hallways, and cameras
Initially, the agents were given minimal information about their environment, and AIDA was used to explore the world and recognise states, transitions, reward locations, and risks. The agents were given preliminary knowledge of two states r_A and r_B, one transition h_1, the existence of objectives within the two known rooms, and the knowledge that the worst-case probability of capture from a surveillance camera is 0.1. The agents are expected to identify new states, transitions, and the location of objectives, while initially applying the worst-case probability to identified risks. These probabilities can then be updated as the agents become familiar with these areas of risk.
Results. As the agents explored the environment they very quickly identified transitions h_2, h_3 and h_4, as well as states r_C, r_D, r_E and r_F. They also identified the risks associated with the transitions, and the goals in all of these states apart from r_C, due to a lack of exploration in that area caused by the consistent constraints. The agents identified that the initial risk probabilities were correct for both h_3 and h_4, and left these unchanged. After roughly 100 learning episodes, constrained exploration led to the agents discovering transitions h_9 and h_10, and the states and rewards of r_G and r_H. Rooms r_A, r_B, r_E, r_G and r_H were routinely visited and allowed through constraints due to the number of objectives to collect with only one necessary risk in h_3. Shortly after this behaviour was established, the agents explored into state r_F and identified that the risk associated with h_7 was inaccurate, lowering it to the correct value of 0.05. This exploration was allowed because the worst-case probability of this transition was still within the allowed safety requirements. With this lowering, the addition of r_F to the normally accepted constraints allowed the agents to meet the mission requirement of six objectives. This is where the Prism generation was instructed to stop, with the final abstracted constraining policy allowing limited travel to states r_A, r_B, r_D, r_E, r_G and r_H, facilitating seven objectives to be met without exceeding the safety limit of 0.3.
Fig. 8.3 Preliminary results showing cumulative risk and reward
The results that accompany this description of the agents' progression can be seen in Fig. 8.3. Here we display the smoothed average of the cumulative risk and reward to help identify the development of patterns, with the true cumulative risk and reward shown in faint colours. These true reward and risk lines show two things: the actual risk and reward of each episode, not based on an average, and how intrinsic curiosity, while beneficial in overcoming sparse rewards by pushing agents towards unexpected outcomes, makes agent behaviour wildly unpredictable. In almost 7000 learning episodes, the safety requirement was met without exception, showing how AIDA can accompany even the most unpredictable of agents. Due to this intrinsic reward, the mission requirement of collecting six objectives was quickly met, and meeting it became common. The true values of risk and reward were included in these results to show that risk levels remained at the acceptable level, and because the per-episode values reveal trends in behaviour that are lost in the moving average. Most beneficially, from episode 5000 onwards, the true reward shows seven objectives being met more and more commonly, implying that the maximum number of objectives under the current constraints would be met with more learning time. Lastly, it also allows us to see that the risk value begins to hold increasingly predictably at the mission requirement, giving assurance and predictability to a stochastic process. One final interesting occurrence in the results can be seen at roughly episodes 1500 and 2000, where there are significant drops in risk levels and functionality. This can be attributed to the automated process altering the agents' constraints and requiring time to learn new behaviours.
8.6 Conclusion
AIDA allows robust adaptability with limited initial information, and also in the case of inaccurate information, facilitating agents exploring and constraining their behaviours with automated QV. We show the two-stage process working within a multi-agent infiltration domain with many factors initially unknown to our agents. The preliminary results that we have obtained support this statement and deliver, to our knowledge, the first application of automated functional and safety proofs to MARL. Our future work includes more extensive testing of AIDA in multiple settings and with increased system sizes to showcase the broad applicability of our approach. In particular, we aim to showcase our approach in a second navigation-based domain, taking the form of irradiated areas, with non-deep learning algorithms to show the scope of potential use.
References 1. Dorri, A., Kanhere, S.S., Jurdak, R.: Multi-agent systems: a survey. IEEE Access 6, 28573– 28593 (2018) 2. Frasheri, M., Cürüklü, B., Esktröm, M., Papadopoulos, A.V.: Adaptive autonomy in a search and rescue scenario. In: 2018 IEEE 12th International Conference on Self-adaptive and Selforganizing Systems (SASO), pp. 150–155. IEEE (2018) 3. Hurtado, C., Ramirez, M.R., Alanis, A., Vazquez, S.O., Ramirez, B., Manrique, E.: Towards a multi-agent system for an informative healthcare mobile application. In: KES International Symposium on Agent and Multi-agent Systems: Technologies and Applications, pp. 215–219. Springer (2018) 4. Abbas, H.A., Shaheen, S.I., Amin, M.H.: Organization of multi-agent systems: an overview. arXiv preprint arXiv:1506.09032 (2015) 5. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (2018) 6. Bu¸soniu, L., Babuška, R., De Schutter, B.: Multi-agent reinforcement learning: an overview. Innov. Multi-agent Syst. Appl. 1, 183–221 (2010) 7. Garcıa, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. J. Mach. Learn. Res. 16(1), 1437–1480 (2015) 8. Zhang, K., Yang, Z., Ba¸sar, T.: Multi-agent reinforcement learning: a selective overview of theories and algorithms. Handbook of Reinforcement Learning and Control, pp. 321–384 (2021) 9. Mason, G., Calinescu, R., Kudenko, D., Banks, A.: Assurance in reinforcement learning using quantitative verification. In: Advances in Hybridization of Intelligent Methods, pp. 71–96. Springer (2018) 10. Riley, J., Calinescu, R., Paterson, C., Kudenko, D., Banks, A.: Reinforcement learning with quantitative verification for assured multi-agent policies. In: 13th International Conference on Agents and Artificial Intelligence, York (2021) 11. Riley, J., Calinescu, R., Paterson, C., Kudenko, D., Banks, A.: Utilising assured multi-agent reinforcement learning within safety-critical scenarios. Procedia Comput. Sci. 192, 1061–1070 (2021). Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 25th International Conference KES2021 12. Hernandez-Leal, P., Kartal, B., Taylor, M.E.: Is multiagent deep reinforcement learning the answer or the question? A brief survey. Learning 21, 22 (2018) 13. Faria, J.M.: Machine learning safety: an overview. In: Proceedings of the 26th Safety-Critical Systems Symposium, York, UK, pp. 6–8 (2018) 14. Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Formal Aspects Comput. 6(5), 512–535 (1994) 15. Kwiatkowska, M., Norman, G., Parker, D.: Prism 4.0: verification of probabilistic real-time systems. In: International Conference on Computer Aided Verification, pp. 585–591. Springer (2011) 16. Brunke, L., Greeff, M., Hall, A.W., Yuan, Z., Zhou, S., Panerati, J., Schoellig, A.P.: Safe learning in robotics: from learning-based control to safe reinforcement learning. arXiv preprint arXiv:2108.06266 (2021) 17. Hasanbeig, M., Abate, A., Kroening, D.: Cautious reinforcement learning with logical constraints. arXiv preprint arXiv:2002.12156 (2020) 18. Huh, S., Yang, I.: Safe reinforcement learning for probabilistic reachability and safety specifications: a Lyapunov-based approach. arXiv preprint arXiv:2002.10126 (2020) 19. Wachi, A., Sui, Y.: Safe reinforcement learning in constrained Markov decision processes. In: International Conference on Machine Learning, pp. 9797–9806. PMLR (2020) 20. 
Cheng, R., Orosz, G., Murray, R.M., Burdick, J.W.: End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 3387–3395 (2019) 21. Srinivasan, K., Eysenbach, B., Ha, S., Tan, J., Finn, C.: Learning to be safe: deep RL with a safety critic. arXiv preprint arXiv:2010.14603 (2020)
22. Jansen, N., Könighofer, B., Junges, S., Serban, A., Bloem, R.: Safe reinforcement learning using probabilistic shields (2020) 23. Mason, G.R., Calinescu, R.C., Kudenko, D., Banks, A.: Assured reinforcement learning with formally verified abstract policies. In: 9th International Conference on Agents and Artificial Intelligence (ICAART), York (2017) 24. Ge, Y., Zhu, F., Huang, W., Zhao, P., Liu, Q.: Multi-agent cooperation Q-learning algorithm based on constrained Markov game. Comput. Sci. Inf. Syst. 17(2), 647–664 (2020) 25. Juliani, A., Berges, V.-P., Teng, E., Cohen, A., Harper, J., Elion, C., Goy, C., Gao, Y., Henry, H., Mattar, M., et al.: Unity: a general platform for intelligent agents. arXiv preprint arXiv:1809.02627 (2018)
Chapter 9
Development of a Telegram Bot to Determine the Level of Technological Readiness
Darya I. Suntsova, Viktor A. Pavlov, Zinaida V. Makarenko, Petr P. Bakholdin, Alexander S. Politsinsky, Artem S. Kremlev, and Alexey A. Margun
Abstract The article proposes a Telegram bot to automate the process of evaluating the technology readiness level (TRL). A brief overview of the existing solutions in this area is given, and a flowchart reflecting the algorithm of the bot and images demonstrating its interface are presented. The bot has been successfully tested for performance: with equal input data, the conclusion about the TRL level obtained at the output of the algorithm coincides with the result of an expert assessment in accordance with GOST R 58048-2017. It is proposed to use the bot as a source of an additional opinion about the level of technological readiness of a project.
9.1 Introduction
The vector of development of modern science and technology is innovation: new cyber-physical systems and big data processing systems are created, new materials and design methods are used, and machine learning and artificial intelligence become more and more advanced [1]. Of course, the successful implementation of an innovative project becomes a competitive advantage for any organization, but the development process is always fraught with risks, since it is impossible to predict with absolute accuracy what the results will be. To assess and minimize risks, various programs, methods and business strategies are created [2–5], and the best international experience is adopted. Thus, in accordance with Russian legislation, the TRL (Technology Readiness Level) assessment methodology proposed by NASA was adapted. The national standard [6] is used to evaluate the achieved results of a project and to plan development stages, necessary resources and milestones. Timely
assessment of the level of readiness during the process of working on a project helps to reduce the cost of work and the project duration. In an effort to make the process of assessing the TRL simple and convenient, researchers from different countries are developing automated tools for checking technological readiness—calculators. Tools based on Microsoft Office Excel and on the freely distributed OpenOffice software package are known. These include the AFRL TRL calculator developed by the US Air Force Research Laboratory and the NASA TRL calculator [7], the interface of which is shown in Fig. 9.1. On the basis of [7], more advanced technology readiness level calculators [8, 9] were later created. These tools solve the problem of automating the calculation of numerical values of technology readiness indicators and allow you to save the results of the survey for the specified project. However, they have an unfriendly, unintuitive interface and require the installation of additional software. In the calculator created by IMATEC as a website [9], TRL estimation occurs in stages. First of all, the user uploads or creates his project on the site, marking the current stages and documents. After that, it becomes possible to assess the TRL of the project by passing a survey in which, without a positive answer, it is impossible to move from a lower stage to a higher one. This eliminates the assignment of a deliberately false, inflated level of technology readiness. The interface of the calculator is shown in Fig. 9.2. Despite its simplicity, the proposed solution is not available to a wide audience due to the language barrier: the site exists only in Portuguese, and without knowledge of the language many instructions are not obvious.
Fig. 9.1 Start window view of the TRL calculator Ver BI.1 beta
Fig. 9.2 Online calculator interface, IMATEC
In the version of the calculator written in the C# programming language and using Microsoft SQL Server databases [10], a higher accuracy of TRL determination was achieved, however, this tool is adapted for use in defense industry programs and is not universal. Calculators are being developed based on client–server technologies [11] in order to conduct a comprehensive assessment of the technological readiness of the project as a whole, including each technology included in it. The low level of awareness of existing software solutions for automating the assessment of the TRL level, as well as the disadvantages of these calculators, is the reason that in practice the determination of the level of technological readiness of a project is usually carried out by an expert method. So, GOST R 58048-2017 “Technology transfer. Guidelines for assessing the level of technology readiness” [12] offers a questionnaire containing a total of 274 technology readiness indicators. The expert is required to give a percentage assessment of how the state of the technology under consideration corresponds to each indicator related to the type of this technology and the selected area of analysis. Thus, despite the indicators clearly defined in the methodological document, the decision on the compliance or noncompliance of the technology with a specific level of readiness remains subjective, since it is based on poorly formalized knowledge of specific people. The administrative difficulties related to the formation of a working group of specialists for the examination [13], the large labor and time costs for manually assessing the TRL level, and the shortcomings of existing software solutions in this area make it urgent to develop an alternative tool for automating the technology readiness check. In this paper, we propose a TRL calculator based on the Telegram messenger. Such an implementation of the algorithm, unlike existing solutions, allows users to evaluate the technology at any time, when necessary, without being tied to a workplace with special software. None of the currently existing TRL calculators assumes the possibility of full-fledged work from a mobile device, and this problem is solved by the proposed development. Finally, compared to the practiced expert approach, the time spent on determining the TRL level using the Telegram bot will be significantly reduced.
9.2 Bot Algorithm The algorithm of the bot is developed in accordance with standards [14, 15], which establish the general provisions for the development of a product concept and technologies in terms of project management creating a product at the initial stage of its life cycle. Unlike the main analogues, the application logic is built on the “question - answer” system, and not on the method of comparing estimates. The method of
Fig. 9.3 The main scheme of the TRL-bot algorithm
comparing estimates consists in the user entering data on the readiness of project stages or passing an additional survey. The "question–answer" system involves the user answering a series of questions, the answers to which form the basis for a conclusion about the level of readiness of the technology. Block diagrams of the algorithm are shown in Figs. 9.3, 9.4 and 9.5. According to Table A.1 from [14], there are nine levels of technology readiness. The TRL bot evaluates the level at which the project is currently located and gives a hint about what documents or tests are missing to move to the next level. If necessary, the user can get a link to the previously specified GOST by clicking on the "Link to GOST" button in the main menu. In case the user needs information about the scale of technological readiness, he just needs to click on the "TRL levels" button. To get help on using the bot, you need to click on the "Help/Instructions" button in the main menu. To activate the process of assessing TRL on the "question–answer" principle, you must click on the "Assess the Level of TRL" button. If necessary, it is possible to go to the main menu (Fig. 9.6) by pressing the "Return to the Main Menu" button. An example of using the bot is shown in Fig. 9.7. The questions of the algorithm are aimed at establishing the fact of testing (or lack thereof), the conditions for testing, and the presence and composition of the accompanying documentation. In the presented example, first of all, the user is asked about testing on certified equipment and the availability of acts and protocols for them. If the answer is negative, the fact of conducting preliminary tests of the system and the availability of draft design documentation are specified. If at least one of these conditions is not met, the user is asked whether the layouts of the main systems have been developed, and whether
Fig. 9.4 Branch No. 2 of the TRL-bot algorithm
the calculations and tests of the concept’s performance have been completed. If the answer is positive, the bot concludes that the technology has reached TRL 4. Thus, by answering three questions, the user gets an idea of the project’s readiness. The key indicators based on which the bot makes a decision comply with the recommendations of [14] (Table 9.1). The source code can be found on GitHub [16]. More examples of the work of the bot are presented in Figs. 9.8, 9.9, 9.10 and 9.11.
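To make the "question–answer" logic concrete, the sketch below shows a simplified decision tree in Python built from the three questions described above. The question texts, node names and the verdicts attached to the "yes" branches of the first two questions are illustrative assumptions on our part; the real bot (source code in [16]) runs this logic over the Telegram Bot API and covers all nine TRL levels.

```python
# Simplified, hypothetical question tree for the TRL 4 branch described in the text.
QUESTION_TREE = {
    "q_certified": ("Has the system been tested on certified equipment, with acts and protocols?",
                    {"yes": "higher-level branch (not shown)", "no": "q_preliminary"}),
    "q_preliminary": ("Have preliminary system tests been run and draft design documentation prepared?",
                      {"yes": "higher-level branch (not shown)", "no": "q_layouts"}),
    "q_layouts": ("Have layouts of the main systems been developed, with calculations and concept performance tests?",
                  {"yes": "TRL 4", "no": "lower-level branch (not shown)"}),
}

def assess(answers, start="q_certified"):
    node = start
    while node in QUESTION_TREE:
        _question_text, branches = QUESTION_TREE[node]
        node = branches[answers[node]]
    return node  # a TRL verdict string

# Reproduces the example from the text: two negative answers and one positive answer give TRL 4.
print(assess({"q_certified": "no", "q_preliminary": "no", "q_layouts": "yes"}))
```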
9.3 Conclusion The proposed Telegram bot solves the problem of automating the definition of TRL. An objective assessment of the level of technological readiness of innovative projects is strategically important both for research organizations involved in their preparation, and for project customers, investors and beneficiaries. The bot is a simple, convenient and affordable way to test the technology being developed. The verification is carried out in accordance with the standards approved in Russia and, at the same time, it is free from subjectivity, one way or another inherent in any technology assessment by the expert method. The speed of decision-making during the execution
Fig. 9.5 Branch No. 3 of the TRL-bot algorithm
Fig. 9.6 Initial menu
of the algorithm, the absence of the need for specific resources or specially trained specialists are also undeniable advantages of the Telegram bot. At this stage, it is recommended to consider the proposed algorithm as a source of a second opinion on the level of technological readiness of the project. In the future, it is planned to improve the algorithm in such a way as to claim a complete replacement for peer review.
Fig. 9.7 Example of evaluating the TRL in the bot interface
Table 9.1 Compliance of indicators for making a decision by the bot with [14], using the example of TRL 4

TRL | Key metrics for bot decision | Description and approximate scope of work according to [14]
4 | Layouts of the main systems were developed; preliminary calculations are done; the performance of the concept was tested | Operability is demonstrated on detailed layouts; three-dimensional models with high scale and modeling accuracy were applied; the design of the device based on the new technology is described in detail
Fig. 9.8 Example of the work of the bot including the reaction on Incorrect Input (part 1)
Fig. 9.9 Example of the work of the bot (part 2)
Fig. 9.10 “TRL levels” command execution result
Fig. 9.11 “Link to GOST” command execution result
Acknowledgements This research is financed by University ITMO in the context of the project “Artificial intelligence methods for cyber-physical systems” No. 620164.
References
1. Decree of the President of the Russian Federation of December 1, 2016 No. 642—On the Strategy for the Scientific and Technological Development of the Russian Federation
2. Burkov, E., Paderno, P., Sattorov, F., Tolkacheva, A.: Methodological support of the working group in solving the problem of predicting the results of a classification examination. Sci. Tech. J. Inform. Technol. Mech. Opt. 3(21), 426–432 (2021). https://doi.org/10.17586/2226-1494-2021-21-3-426-432
3. Komarov, A., Pikhtar, A., Grinevsky, I., Komarov, K., Golitsyn, L.: A conceptual model for assessing the technological readiness of a scientific and technological project and its potential at the early stages of development. Econ. Sci. 7(2), 111–134 (2021). https://doi.org/10.22394/2410-132X-2021-7-2-111-134
4. Sartori, A., Sushkov, P., Mantsevich, N.: Principles of lean research and development management based on the methodology of innovation project readiness levels. Econ. Sci. 1–2(6), 22–34 (2019). https://doi.org/10.22394/2410-132X-2020-6-1-2-22-34
5. Sartori, A., Gareev, A., Ilyina, N., Mantsevich, N.: Application of the readiness levels approach for various subject areas in lean R&D. Econ. Sci. 1–2(6), 118–134 (2020). https://doi.org/10.22394/2410-132X-2020-6-1-2-118-134
6. GOST R 58048-2017—Technology transfer. Guidelines for assessing the technology readiness level
7. DAU Tools, https://www.dau.edu/cop/stm/lists/tools/allitems.aspx. Last accessed 27 Jan 2022
8. Dmitrenko, I.: A variant of the calculator of readiness levels of high-precision technologies. Actual Problems Human. Nat. Sci. 12–1, 103–107 (2016)
9. Dmitrenko, I.: A variant of the calculator of readiness levels of high-precision technologies based on an open software product. Actual Problems Human. Nat. Sci. 1–1, 64–66 (2017)
10. Xavier Jr, A., Veloso, A., Souza, J., Kaled Cás, P., Cappelletti, C.: AEB online calculator for assessing technology maturity: IMATEC. J. Aerospace Technol. Manage. 12 (2020)
11. Altunok, T., Cakmak, T.: A technology readiness levels (TRLs) calculator software for systems engineering and technology management tool. Adv. Eng. Softw. 41, 769–778 (2010)
12. Zhebel, V., et al.: A software tool for a comprehensive assessment of the technological readiness of innovative scientific and technological projects. Econ. Sci. 1(4) (2018)
13. Kravets, A., Drobotov, A.: The use of simulation modeling to assess the quality of business plans for innovative projects. Sci. Tech. J. Inform. Technol. Mech. Opt. 2(72) (2011)
14. GOST R 56861-2016—Development of the concept of the product and technologies
15. GOST R 56862-2016—Life cycle management system
16. GitHub chatbot page. https://github.com/trlitmo/Trl-bot-v3. Last accessed 8 Mar 2022
Chapter 10
Design of Multiprocessor Architecture for Watermarking and Tracing Images Using QR Code
Jalel Baaouni, Hedi Choura, Faten Chaabane, Tarek Frikha, and Mouna Baklouti
Abstract Several manipulations, such as copying, editing and illegally diffusing multimedia tracks through the internet and Peer-to-Peer networks, do not represent any challenge even to simple customers, but represent a dangerous phenomenon for the software industry. In this paper, the proposed technique consists in embedding a fingerprint, a QR code, using image watermarking and tracing approaches which are both applied on a multiprocessor architecture. The proposal has two novelties: using the QR code as a tracing code provides many advantages, such as supporting a large amount of information in a compact format and resilience to damage, and it addresses the problems of computational cost and time of existing tracing codes during the accusation step. The whole scheme is adapted to be supported by an embedded architecture. The experimental assessments show that the proposed choice responds to the different requirements of the application.
10.1 Introduction
Faced with technological change and the multiplicity of potential problems, security changes dimension and evolves over time. Clearly, it is essential for the image distributor to find accurate measures to eliminate copyright infringement and ensure a high degree of security in legal distribution operations [1]. The literature has focused on watermarking, which consists of embedding a fingerprint in digital content in order to find its supplier and protect it from any piracy operation. But it is still deficient in the context of traitor tracing. Indeed, the main purpose of an image distributor is to ensure the safe use of press releases and to track down illegitimate users in a piracy trial. Our system, transparent to the user, allows the integrity of the images to be checked and warns the practitioner of violations of this integrity while indicating the parts of the image still usable for analysis.
10.1 Introduction Faced with technological change and the multiplicity of potential problems, security changes dimension and evolves over time. Clearly, it was essential for the image distributor to find accurate measures to eliminate copyright infringement and ensure a high degree of security in legal distribution operations [1]. The literature has focused on watermarking, which consists of embedding a fingerprint in digital content in order to find its supplier and protect it from any piracy operation. But it is still deficient in the context of traffickers. Indeed, the main purpose of an image distributor is to ensure the safe use of press releases and to track down illegitimate users in a piracy trial. Our system, transparent to the user, allows the integrity of the images to be checked and warns the practitioner of violations of this integrity while indicating the parts of the image still usable for analysis but today. J. Baaouni (B) · H. Choura · F. Chaabane · T. Frikha · M. Baklouti National Engineering School of Sfax (ENIS), University of Sfax, BP 1173, 3038 Sfax, Tunisia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_10
However, tightly connected integrated systems play a vital role in many day-to-day processes as well as in industry and critical infrastructure. Therefore, security engineering for embedded systems is a field that is currently attracting more interest. The demand for information is exploding on the internet. The QR code carries embedded information and is much simpler and cheaper to implement. Our objective is to design and develop an application that modifies an image by adding a unique fingerprint that can trace traitors who illegally rebroadcast the image, within a limited time. The fingerprint will be an anti-collusion code (a Tardos code converted into a QR code) to make the accusation more reliable and less complex. To realize this, we focus on an image-tracing scheme based on QR codes and implement it on a multiprocessor architecture to be embedded in a device. The insertion step of the image watermarking will be applied on an embedded platform, the Raspberry Pi 3.
10.2 Background and Related Work 10.2.1 The QR Code A QR code symbol consists of a 2D array of light and dark squares, known as modules [2]. The QR code structure contains modules for encoding data and for function patterns. Function patterns consist of finder patterns, separators, timing patterns and alignment patterns. For example, there are three identical finder patterns located at the upper left and right, and lower left corner of the symbol. The finder patterns are for a QR code reader to recognize a QR code symbol and to determine its orientation. In addition, the QR code structure has an inherent error correction mechanism that allows data to be recovered even if a certain number of modules have been corrupted. The data capacity of a QR code depends on its version and error correction level. There are forty different QR code versions and four error correction levels; namely, L (low), M (medium), Q (quartile) and H (high), which correspond to error tolerances of approximately 7%, 15%, 25% and 30% respectively.
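As a small illustration of these properties, a fingerprint payload can be rendered as a QR code with an off-the-shelf generator; the Python qrcode package is used here only as an example (the chapter does not prescribe a particular library), and the payload string is hypothetical.

```python
# pip install qrcode[pil]  -- illustrative only
import qrcode

qr = qrcode.QRCode(
    version=None,                                        # pick the smallest version that fits the data
    error_correction=qrcode.constants.ERROR_CORRECT_H,   # level H: roughly 30% error tolerance
    box_size=4,
    border=4,
)
qr.add_data("group-03|user-0001")   # hypothetical fingerprint payload
qr.make(fit=True)
qr.make_image(fill_color="black", back_color="white").save("fingerprint_qr.png")
```

The built-in Reed–Solomon error correction selected here (level H) is what later allows a damaged or attacked watermark to still be decoded.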
10.2.2 Discrete Wavelet Transform (DWT) The Discrete Wavelet Transform (DWT) is a technique that is widely used in image and signal processing. For digital images, the DWT technique involves the decomposition of an image into frequency channels of constant bandwidth on a logarithmic scale [3, 4]. When applying the DWT technique to a 2D image, the image is decomposed into four sub-bands, which are denoted as LL (low-low), LH (low-high), HL (highlow) and HH (high-high). Each sub-band in turn can be further decomposed at the next
level, and this process can continue until the desired number of levels is achieved. In view of the fact that the human visual system is more sensitive to the LL sub-band (i.e. the low frequency component), to maintain image quality watermark information is typically embedded within one or more of the other three sub-bands [3].
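A one-level 2-D DWT of this kind can be sketched in a few lines with the PyWavelets package; the wavelet choice (Haar) and the random stand-in image below are arbitrary illustrative assumptions, not the settings used in the works discussed here.

```python
# pip install PyWavelets  -- illustrative only
import numpy as np
import pywt

image = np.random.rand(256, 256)              # stand-in for a grayscale cover image (or one colour channel)
LL, (LH, HL, HH) = pywt.dwt2(image, "haar")   # one decomposition level: four 128x128 sub-bands

# A watermark would typically be added to LH, HL and/or HH (not LL, to preserve perceived quality),
# after which the image is rebuilt with the inverse transform:
reconstructed = pywt.idwt2((LL, (LH, HL, HH)), "haar")
```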
10.2.3 Related Work Digital watermarking has been proven effective for protecting digital media. It has recently gained considerable research interest. The watermarking process aims to embed secret data such that the resulting object is not greatly distorted. Also, the embedded watermark bits should resist malicious attacks to protect and/or verify the object ownership. There have been a variety of different uses of QR codes in the area of computer security. In previous work, Chow et al. [5] proposed the use of QR codes for watermarking using two techniques in the frequency domain. Their proposed approach combined the use of the DWT with the Discrete Cosine Transform (DCT) for QR code watermarking. In other work on QR code watermarking, an authentication method for medical images using a QR code based zero watermarking scheme was proposed [6]. In [5], Chow et al. proposed a watermarking method for digital images using the QR code images and based on the discrete wavelet transform (DWT) and the discrete cosine transform (DCT). The proposed method decomposed the cover image using the discrete wavelet transform after it applied the discrete cosine transform on each block of the cover image. The QR code image was transformed using Arnold transform to increase security. Then two pseudorandom number sequences are generated to embed the QR code information in the DCT block of the cover image. The main idea of the proposed watermarking method benefits from the QR code structure that is inherent in the error correction to improve the robustness of the watermarking against attacks. Kang et al. [7] proposed a watermarking approach based on the combination of DCT, QR codes and chaotic theory. In their approach, a QR code image is encrypted with a chaotic system to enhance the security of the watermark, and embedded within DCT blocks after undergoing block based scrambling. In related work, a digital rights management method for protecting documents by repeatedly inserting a QR code into the DWT sub-band of a document was investigated. Others have also proposed different QR code watermarking approaches, for example, by incorporating an attack detection feature to detect malicious interference by an attacker, or by embedding QR code watermarks using a just noticeable difference model to increase imperceptibility [8]. In related work on QR codes for security, Tkachenko et al. [9] described a modified QR code that could contain two storage levels. They called this a two-level QR code, as it had a public and a private storage level. The purpose of the two-level QR code was for document authentication.
Mohanty et al. [10] developed two versions (low-power and high-performance) of a watermarking hardware module. The DC component and the three low-frequency components are considered for insertion in the DCT domain. Maity et al. [11] suggested a fast Walsh transform (FWT)-based Spread Spectrum (SS) image watermarking scheme that would serve for authentication in data transmission. In [12], Korrapati Rajitha et al. proposed an FPGA implementation of a watermarking system using the Xilinx System Generator (XSG). Insertion and extraction of information were applied in the spatial domain. In [13], Rohollah Mazrae Khoshki et al. put forward a hardware implementation of a watermarking system based on DCT. Their work was developed using Matlab-Simulink followed by Altera DSP Builder (integrated with Simulink Embedded Coder) for automatic code generation. In [14], Rahate Kunal B. et al. suggested a hardware implementation of a fragile watermarking system operating in the spatial domain. Their proposed watermarking scheme was imperceptible and robust against geometric attacks, but fragile against filtering and compression. In [15], Manas N. et al. suggested a hardware implementation of a watermarking algorithm based on phase congruency and singular value decomposition. Their idea consisted in embedding watermark data in the host image by applying Singular Value Decomposition (SVD) at the phase congruency mapping points in the spatial domain. Their system was implemented using the Xilinx ISE 14.3 tool and a Virtex 5 FPGA device.
10.3 The Proposed Architecture
The general scheme of a multimedia distribution system, as shown in Fig. 10.1, consists of several steps: generating and encoding the fingerprints, embedding them in the corresponding releases before the purchasing operation, the construction of the colluded release which results from the application of a collusion attack, and the tracing process. In this paper, we focus on two of these steps: the watermarking and the tracing ones. The main issue is to propose a suitable image-tracing scheme, based on QR codes, and to carry it out on a multiprocessor architecture to be inserted into a device.
Fig. 10.1 The different steps of watermarking images
Fig. 10.2 General scheme
In fact, the challenge in developing an image personalization solution that modifies an image by adding a unique fingerprint is to be able to detect pirates who illegally retransmit the image within a limited time. The fingerprint size is optimized by converting it into a QR code; thus, the traitor tracing process consists in decoding the detected QR code and identifying the possible colluders (Fig. 10.2).
10.3.1 The Proposed QR-Code-Based Architecture
One of the main purposes of digital watermarking research is to identify the owner of the original product. To achieve this goal, the embedded QR code may contain relevant data that could help us detect the holder, thanks to its large capacity and the high-speed reading and decoding of this two-dimensional barcode. Figure 10.3 explains the process of our watermarking algorithm. As depicted in Fig. 10.3, we present the digital watermarking that protects images from fraudulent operations [16]. The watermarking operation must not change the quality of the watermarked media. Watermarking of images consists in the insertion of a signature, which must satisfy three constraints:
• Imperceptibility: its presence must not be visible to an untrained user, who must not be able to distinguish between the watermarked content and the original media.
• Robustness: detection of the signature must remain possible even after the image has been subjected to attacks.
• Capacity: the quantity of information carried by the signature must, in certain types of applications, be as large as possible, and it is important to find a trade-off between robustness and the length of the embedded information.
Fig. 10.3 Our watermarking algorithm
The watermarking approach we propose to adopt was suggested in [17] because it has high perceptual transparency and robustness to attacks. It is a data hiding method based on Quantization Index Modulation (QIM), and Rational Dither Modulation (RDM) was proposed as a solution to high-rate gain attacks. The basic idea behind RDM is to apply conventional Dither Modulation to the ratio of the current host sample to the previously generated watermarked sample. By watermarking the ratio between the current feature sample and an appropriate function of the L previously watermarked samples, the performance can be improved.
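The following sketch illustrates the idea of RDM on a one-dimensional sequence of feature samples. The quantization step, window length and gain function (mean absolute value of the L previously watermarked samples) are illustrative choices on our part and not the exact parameters of the scheme in [17].

```python
import numpy as np

def _dithered_quantize(v, delta, bit):
    d = bit * delta / 2.0                                  # binary dither: lattice shifted by delta/2 for bit 1
    return delta * np.round((v - d) / delta) + d

def rdm_embed(x, bits, delta=0.1, L=10):
    """Embed one bit per sample by quantizing the ratio sample / gain(previous watermarked samples)."""
    w = np.asarray(x, dtype=float).copy()
    for k, b in enumerate(bits):
        past = w[max(0, k - L):k]
        g = np.mean(np.abs(past)) if past.size else 1.0    # gain-invariant normalizer
        g = max(g, 1e-9)
        w[k] = g * _dithered_quantize(w[k] / g, delta, b)
    return w

def rdm_extract(y, n_bits, delta=0.1, L=10):
    """Decode each bit as the dither lattice closest to the gain-normalized received sample."""
    y = np.asarray(y, dtype=float)
    bits = []
    for k in range(n_bits):
        past = y[max(0, k - L):k]
        g = np.mean(np.abs(past)) if past.size else 1.0
        g = max(g, 1e-9)
        v = y[k] / g
        bits.append(min((0, 1), key=lambda b: abs(v - _dithered_quantize(v, delta, b))))
    return bits
```

Under a constant gain attack the numerator and the normalizer scale by the same factor, so the decoded bits are unchanged, which is the property that motivates RDM.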
10.3.2 The Proposed Tardos-Based Fingerprinting Approach
In this work, we studied how to build a fingerprinting framework that improves on the Tardos code. In order to obtain good results, we show that we can model the decoding process of a Tardos fingerprint as a high-dimensional nearest neighbor search problem. The Tardos code [18, 19] was proposed as an improvement over earlier tracing codes and has provided good results. It is considered to have the best code length, and the Tardos code is one of the most powerful tools for resisting the collusion process by identifying colluders. The Tardos code matrix is of size (n × m), with n the number of users and m the codeword length [20], defined as follows:
• Every user j has a codeword x_ji of length m, where i denotes the ith bit in the codeword and j the jth user
• C is the collusion size
• ε1, ε2 are the false positive and false negative probabilities

Table 10.1 Characteristics of Raspberry Pi 3 and PC

Performance | Personal computer | Raspberry Pi 3 card
RAM | 4 GB | 1 GB
Processors (CPU) | Two Intel Celeron CPUs (04 Core) | Single 64-bit CPU
Frequency | 1.6 GHz | 1.2 GHz
10.3.3 The Proposed Embedded System
For the embedded system, we use the Raspberry Pi 3, which contains the Broadcom BCM2837 system-on-chip. This single chip contains the CPU and several other devices, forming a complete computer system on one chip, whereas the PC system has more CPU power and memory than the Raspberry Pi 3. The Raspberry Pi 3 Model B is the earliest model of the third-generation Raspberry Pi (Table 10.1).
10.4 The Proposed Algorithm
As proposed in [21], the tracing algorithm consists in using two tracing codes, the Boneh–Shaw code and the Tardos code, which have the same length m. As mentioned in [21], it is a group-based fingerprinting system where each user belongs to a group and logically has a group identifier chained to his particular identifier. To cope with the code length and embedding time constraints, the authors in [1] proposed to convert the tracing code into a QR code and to increase the size of the alphabet from a binary one to an alphabet of cardinality 4. This conversion should not only provide less insertion time but also good robustness thanks to QR code characteristics: the internal channel encoding based on an error-correcting code, Reed–Solomon.
Algorithm 10.1: Tracing Algorithm
Data:
1. n: number of users
2. col: number of colluders
3. m: code length
4. x: the matrix of fingerprints
5. y: fingerprint of the colluded copy
Result:
1. Detected colluders
Steps:
1. Algorithm 10.2 (watermarking of the n images)
2. Algorithm 10.3 (check robustness)
3. Collusion attack on some images → y (colluded image)
4. Tracing algorithm
Return (robust or not, detected, detected colluders)
In line 4, the steps of the Tardos probabilistic code are: (1) initialization, (2) construction and (3) accusation. The mechanism of the Tardos code is based on these three essential steps:
1. First, an initialization step to define its various parameters, namely n, c, ε1, ε2 and m.
2. Second, a construction step that consists of: building an n × m matrix for the n users; calculation of the threshold Z; calculation of p(i); computing X for each user (from the probability density). Accusation: the sequence y is extracted from the pirate copy, and the score associated with user j is the following:

\[ S_{j} = \sum_{i=1}^{m} g\big(y(i),\, x(j,i),\, p(i)\big) \qquad (10.1) \]

3. Third, if the score is higher than Z, the user is declared guilty (a colluder).
Our work is based on the watermarking technique detailed in Algorithm 10.2:

Algorithm 10.2: QR Code Watermarking Algorithm
Data: original image and Tardos matrix
Result: watermarked image
1. Generating the fingerprint
2. Converting the fingerprint to a QR code
3. Embedding the fingerprint [17]
Return watermarked image

In line 3, we decompose the original image into a 24-bit matrix. Then the QR code is also broken down into 24 bits and stored in a matrix. Since the human visual system is less sensitive to the blue channel than to red or green, we embed the QR code in this third channel. Each pixel of the QR code is read, and its coordinates (x, y) are calculated with the CalculatePoints() function, for which x is randomly selected on each iteration in order to achieve key-based security. Each pixel of the QR code is then embedded in the host image using Rational Dither Modulation. Finally, the watermarked image is returned.
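To make the construction and accusation steps of Sect. 10.3.2 concrete, the sketch below generates a small Tardos matrix and computes scores of the form of Eq. (10.1). It uses the classical accusation function and an arcsine-distributed bias with cutoff t, which are textbook choices [18, 20]; the exact parameters, threshold Z and g function of the implemented system may differ.

```python
import numpy as np

def tardos_generate(n_users, m, c, rng=np.random.default_rng(0)):
    """Build the n x m fingerprint matrix X and the per-column biases p(i)."""
    t = 1.0 / (300.0 * c)                                   # cutoff used in Tardos's construction
    r = rng.uniform(np.arcsin(np.sqrt(t)), np.arcsin(np.sqrt(1 - t)), size=m)
    p = np.sin(r) ** 2                                      # arcsine-distributed biases on (t, 1 - t)
    X = (rng.random((n_users, m)) < p).astype(np.uint8)
    return X, p

def tardos_scores(y, X, p):
    """Scores S_j = sum_i g(y(i), x(j,i), p(i)); only positions with y(i) = 1 contribute."""
    g1 = np.sqrt((1 - p) / p)        # x_ji = 1 agrees with y_i = 1
    g0 = -np.sqrt(p / (1 - p))       # x_ji = 0 disagrees with y_i = 1
    return (np.where(X == 1, g1, g0) * (y == 1)).sum(axis=1)

# Users whose score exceeds the threshold Z (chosen from n, c and eps1) are accused as colluders.
X, p = tardos_generate(n_users=15, m=2000, c=5)
y = X[:5].max(axis=0)                                        # a simple collusion of the first five users
print(np.argsort(tardos_scores(y, X, p))[::-1][:5])          # indices of the highest-scoring users
```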
Algorithm 10.3: Checking the Robustness
Data:
1. Original QR code (.jpg)
2. QR code extracted after attack (.jpg)
Result: robust or not (returned to Algorithm 10.1)
Steps:
1. Calculate the evaluation tools
2. Test robustness
3. Return the result of the robustness test (robust or not)
10.5 Experimentation and Results Since the purpose of the proposed scheme is to insert an invisible watermark, the quality of the marked image must be as good as the original image. To verify this purpose, different steps were taken after embedding the watermark. In addition, a Gamma correction attack is performed to evaluate the robustness of the proposed scheme against attacks.
10.5.1 Evaluation Tools
To evaluate the robustness and invisibility of our algorithm, we decided to use three evaluation tools [17]:
• MSE (Mean Square Error): The MSE represents the mean squared error between the watermarked image and the original one.
\[ \mathrm{MSE}(f,g) = \frac{1}{M \cdot N}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(f_{ij}-g_{ij}\right)^{2} \qquad (10.2) \]
• PSNR (Peak Signal-to-Noise Ratio): The PSNR makes it possible to determine the imperceptibility of the signature. A watermark is generally considered imperceptible when the PSNR is greater than 36 dB.
\[ \mathrm{PSNR}(f,g) = 10 \cdot \log_{10}\frac{255^{2}}{\mathrm{MSE}(f,g)} \qquad (10.3) \]
• SSIM (Structural Similarity Index): For two image signals x and y, three components are compared: luminance, contrast and structure.
– Luminance
\[ l(x,y) = \frac{2\mu_{x}\mu_{y}+C_{1}}{\mu_{x}^{2}+\mu_{y}^{2}+C_{1}} \qquad (10.4) \]
– Contrast
\[ c(x,y) = \frac{2\sigma_{x}\sigma_{y}+C_{2}}{\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2}} \qquad (10.5) \]
– Structure
\[ s(x,y) = \frac{\sigma_{xy}+C_{3}}{\sigma_{x}\sigma_{y}+C_{3}} \qquad (10.6) \]
The three components are combined to get the overall similarity measure, where the exponents α, β and γ are positive integers that define the importance of each component.
\[ \mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha}\,[c(x,y)]^{\beta}\,[s(x,y)]^{\gamma} \qquad (10.7) \]
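For convenience, a direct NumPy transcription of Eqs. (10.2) and (10.3) is shown below; SSIM (Eqs. 10.4–10.7) is more involved and is typically taken from an image-processing library such as scikit-image. The 255 peak value assumes 8-bit images.

```python
import numpy as np

def mse(f, g):
    f = np.asarray(f, dtype=np.float64)
    g = np.asarray(g, dtype=np.float64)
    return np.mean((f - g) ** 2)                      # Eq. (10.2)

def psnr(f, g, peak=255.0):
    m = mse(f, g)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)   # Eq. (10.3)

# SSIM (Eqs. 10.4-10.7) can be obtained, for example, from
# skimage.metrics.structural_similarity in scikit-image.
```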
10.5.2 Experimental Results
For experimental purposes, a QR code of 153 × 900 px in .jpg format and 12 images in .jpg format with dimensions W × H px, where W ∈ [202, 1440] and H ∈ [153, 900], were used. Table 10.2 shows the dimensions of the images and their corresponding values of MSE, PSNR and SSIM after QR code watermarking, from which we conclude that our watermarking is invisible. Table 10.2 reports the values of the evaluation tools MSE, PSNR and SSIM calculated between the original image and the watermarked image; the results prove the invisibility of our watermarking technique:
• PSNR > 55 dB (for digital images, a PSNR ≥ 45 dB is sometimes considered good quality).
• SSIM = 0.999 (≈ 1, complete similarity).
10 Design of Multiprocessor Architecture for Watermarking …
119
Table 10.2 Comparison between original and watermarked images

Image | Dimension (px) | QR-code (px) | MSE | PSNR | SSIM
Bird | 203 × 153 | 153 × 153 | 0.169 | 55.84 | 0.999
Game player | 297 × 170 | 170 × 170 | 0.171 | 55.78 | 0.999
Glasses | 300 × 168 | 168 × 168 | 0.182 | 55.51 | 0.999
Human | 267 × 189 | 189 × 189 | 0.111 | 57.67 | 0.999
Hand | 275 × 183 | 183 × 183 | 0.156 | 56.19 | 0.999
Human face | 202 × 202 | 202 × 202 | 0.170 | 55.80 | 0.999
Face virtual reality | 1440 × 900 | 900 × 900 | 0.174 | 55.71 | 0.999
Hand virtual reality | 1280 × 720 | 720 × 720 | 0.178 | 55.61 | 0.999
Game virtual reality | 600 × 448 | 448 × 448 | 0.185 | 55.44 | 0.999
Lena_color_256 | 256 × 256 | 256 × 256 | 0.167 | 55.90 | 0.999
Lena_color_512 | 512 × 512 | 512 × 512 | 0.166 | 55.91 | 0.999
High dimension (HD) | 1024 × 1024 | 1024 × 1024 | – | 37.71 | 0.96
Table 10.3 summarizes the experimental results, showing the values of the evaluation tools after a gamma attack, computed between the original QR code and the QR code extracted from the watermarked image after the attack. It also illustrates the gamma attack applied to our watermarked images with Gamma = 0.5, 1.5, 2.5 and 3.0. Although the gamma attack is very severe, we observe the high robustness of the proposed scheme:
• PSNR > 45 dB.
• SSIM > 0.8 (robust watermarking algorithm).
10.5.3 Embedding Time
In the experimental tests on the embedded system, we observed that as we raise the number of iterations of the algorithm, the gap between the execution time on the Raspberry Pi 3 and on the PC grows. That motivates our idea of comparing the Raspberry Pi 3 with other embedded systems with more powerful memory and parallelism, in order to improve our algorithm and enhance performance, invisibility and robustness. We notice in Table 10.4 that the execution time of the final algorithm for 50 iterations is 14.1 s for the Raspberry Pi versus 11.9 s for the PC, and when we increase the number of iterations to 100 and 500, also shown in Table 10.4, we find that the performance of the PC is better and gives us good results in the shortest time. In Table 10.4, we present the result of the tracing algorithm that uses the random function for calculating probabilities (randomized algorithm), as a comparison between the personal computer and the Raspberry Pi card for 100 iterations of traceability.
Table 10.3 Comparison between original QR code and extracted QR code (PSNR/SSIM after gamma attack)

Image | Gamma attack | PSNR | SSIM
Bird | Gamma = 0.5 | 51.46 | 0.96
Bird | Gamma = 1.5 | 50.40 | 0.93
Bird | Gamma = 2.5 | 51.28 | 0.97
Bird | Gamma = 3.0 | 50.92 | 0.99
Game player | Gamma = 0.5 | 50.13 | 0.94
Game player | Gamma = 1.5 | 49.89 | 0.86
Game player | Gamma = 2.5 | 51.11 | 0.95
Game player | Gamma = 3.0 | 50.54 | 0.98
Glasses | Gamma = 0.5 | 50.39 | 0.94
Glasses | Gamma = 1.5 | 49.86 | 0.99
Glasses | Gamma = 2.5 | 54.88 | 0.50
Glasses | Gamma = 3.0 | 50.42 | 0.94
Human | Gamma = 0.5 | 52.85 | 0.75
Human | Gamma = 1.5 | 54.78 | 0.54
Human | Gamma = 2.5 | 55.14 | 0.52
Human | Gamma = 3.0 | 54.60 | 0.56
Hand | Gamma = 0.5 | 52.17 | 0.87
Hand | Gamma = 1.5 | 50.96 | 0.98
Hand | Gamma = 2.5 | 51.92 | 0.89
Hand | Gamma = 3.0 | 50.94 | 0.97
Lena-color-256 | Gamma = 0.5 | 51.75 | 0.91
Lena-color-256 | Gamma = 1.5 | 49.96 | 0.92
Lena-color-256 | Gamma = 2.5 | 51.22 | 0.99
Lena-color-256 | Gamma = 3.0 | 50.89 | 0.99
Lena-color-512 | Gamma = 0.5 | 51.76 | 0.90
Lena-color-512 | Gamma = 1.5 | 49.89 | 0.96
Lena-color-512 | Gamma = 2.5 | 51.30 | 0.99
Lena-color-512 | Gamma = 3.0 | 50.88 | 0.99
Table 10.4 Execution time of final algorithm (watermarking + traceability)

Execution time (s) | PC | Raspberry
50 traceability iterations | 11.9 | 14.1
100 traceability iterations | 17.40 | 19.46
500 traceability iterations | 48.34 | 62.62
We deduce from the first line that the execution time of a watermark is 10.825 s, and we detect that the image is hacked by 5 colluders with successive numbers 0, 1, 2, 3, 4. From line 13, we detect 10 fair users with successive numbers 5, 6, 7, 8, 9, 10, 11, 12, 13, 14. At the end of the table, the execution time of a complete watermarking and tracing operation is equal to 17.40922716 s on the PC but 19.465 s on the Raspberry Pi board.
10.6 Conclusion

In this paper, we propose a QR-code-based watermarking technique to trace traitors on embedded platforms. The fingerprint structure is concentrated in the QR code, which makes it possible to hide a large amount of information, so that a large number of users can be tracked with a short embedding time and without changing the image quality. We recommend using a powerful watermarking technique to hide this fingerprint in the image. In order to obtain a good detection rate, our system must not only resist different types of attacks but also provide good robustness. After testing, the proposed scheme shows some limitations for invisible watermarking of high-dimension (HD) images, as mentioned in Table 10.2. In future work, we will focus on increasing the level of robustness and invisibility for HD images by proposing a dedicated watermarking algorithm, and also on improving the traceability algorithm applied to HD images. We further suggest using a hybrid multiprocessor architecture on an FPGA, targeting high-performance embedded applications with parallel hardware execution and accelerated algorithms on the PYNQ platform.
References 1. Chaabane, F., Charfeddine, M., Chokri, P., Amar, B.: A QR-code based audio watermarking technique for tracing traitors. In: 23rd European Signal Processing Conference (EUSIPCO) 2. Chow, Y.W., Susilo, S., Baek, J., Kim, J.J.: QR code watermarking for digital images. In: Information Security Applications (2020) 3. Lai, C.C., Tsai, C.C.: Digital image watermarking using discrete wavelet transform and singular value decomposition. IEEE Trans. Instrum. Measur. 59(11), 3060–3063 (2010) 4. Mallat, S.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11(7), 674–693 (1989) 5. Chow, Y.-W., Susilo, W., Tonien, J., Zong, W.: A QR code watermarking approach based on the DWT-DCT technique. In: Pieprzyk, J., Suriadi, S. (eds.), Information Security and Privacy—22nd Australasian Conference, ACISP 2017, Auckland, New Zealand, July 3–5, 2017, Proceedings, Part II. Lecture Notes in Computer Science, vol. 10343, pp. 314–331. Springer (2017)
6. Seenivasagam, V., Velumani, R.: A QR code based zero-watermarking scheme for authentication of medical images in teleradiology cloud. Comput. Math. Methods Med. 2013(516465), 16 (2013) 7. Kang, Q., Li, K., Yang, J.: A digital watermarking approach based on DCT domain combining QR code and chaotic theory. In: 2014 Eleventh International Conference on Wireless and Optical Communications Networks (WOCN), pp. 1–7, Sept 2014 8. Lee, H.-C., Dong, C.-R., Lin, T.-M.: Digital watermarking based on JND model and QR code features. In: Advances in Intelligent Systems and Applications, vol. 2, pp. 141–148. Springer (2013) 9. Tkachenko, I., Puech, W., Destruel, C., Strauss, O., Gaudin, J., Guichard, C.: Two level QR code for private message sharing and document authentication. IEEE Trans. Inf. For. Secur. 11(3), 571–583 (2016) 10. Mohanty, S.P., Adamo, O.B., Kougianos, E.: VLSI architecture of an invisible watermarking unit for a biometric-based security system in a digital camera. In: Proceedings of the 2007 Digest of Technical Papers International Conference on Consumer Electronics, pp. 1–2. Las Vegas, NV, USA (2007) 11. Santi, M., Banerjee, A., Abhijit, A., Malay, K.: VLSI design of spread spectrum image watermarking. In: Proceedings of the 13th National Conference on Communication NCC, 2007. IIT Kanpur, India (2007) 12. Rajitha, K., Nelakuditi, U.R., Mandhala, V.N., Kim, T.-H.: FPGA implementation of watermarking scheme using XSG. Int. J. Secur. Appl. 9(1), 89–96 (2015) 13. Khoshki, R.M.: Hardware based implementation of an image watermarking system. Int. J. Adv. Res. Comput. Commun. Eng. 3(5) (2014) 14. Rahate Kunal, B., Bhalchandra, A.S., Agrawal, S.S.: VLSI implementation of digital image watermarking. Int. J. Eng. Res. Technol. (IJERT) 2(6) (2013) 15. Nayak, M.R., Bag, J., Sarkar, S., Sarkar, S.K.: Hardware implementation of a novel water marking algorithm based on phase congruency and singular value decomposition technique. Int. J. Electron. Commun. (2017) 16. Chaabane, F., Charfeddine, M., Ben Amar, C.: A survey on digital tracing traitors schemes. In: 9th International Conference on Information Assurance and Security (lAS) (2013) 17. Guzman-Candelario, C.L., Garcia-Hernandez, J.J., Gonzalez-Hernandez, L., A low-distortion QR code watermarking scheme for digital color images robust to gamma correction attack. In: 11th International Conference for Internet Technology and Secured Transactions (ICITST2016) 18. Skoric, B., Oosterwijk, J.: Binary and q-ary Tardos codes, revisited. IACR Cryptology ePrint Archive (2012) 19. He, S., Wu, M.: Collusion resistant video fingerprinting for large user group. In: ICIP’06 (2006) 20. Tardos, G.: Optimal probabilistic fingerprint codes. In: Proceedings of the 35th Annual ACM Symposium on Theory of Computing. ACM, San Diego, CA, USA (2003) 21. Desoubeaux, M., Le Guelvouit, G., Puech, W.: Fast detection of Tardos codes with BonehShaw types. In: Proceedings of the SPIE 8303, Media Watermarking, Security, and Forensics (2012)
Chapter 11
NIR-SWIR Spectroscopy and Imaging Techniques in Biomedical Applications—Experimental Results Yaniv Cohen, Ben Zion Dekel, Zafar Yuldashev, and Nathan Blaunstein
Abstract The present work is a continuation of previous research on the theoretical and experimental analysis of novel methods of infrared tomography for the earlier diagnosis of gastric, colorectal and cervical cancers, based on thermal imaging in mice. In this work, we present a preclinical experimental analysis of lung and mammary cancers, based on near-infrared (NIR)–shortwave infrared (SWIR) spectroscopy and imaging techniques, which can be applied to different biomedical applications. It was found experimentally that a highly significant reduction in temperature can be produced by relatively small tumors and, therefore, small metastases of less than 1 mm can be detected and observed. It was shown that the use of the SWIR camera during the experimental trials allows deeper penetration into the mucosae and a clearer view of a tumor compared to a visible-light camera.
11.1 Introduction Lung cancer remains the leading cause of cancer incidence and mortality, with 2.1 million new lung cancer cases and 1.8 million deaths predicted in 2018, representing close to 1 in 5 (18.4%) cancer deaths. Worldwide, there will be about 2.1 million newly diagnosed female breast cancer cases in 2018, accounting for almost 1 in 4 cancer cases among women. Recently improved devices have allowed for a more accurate assessment of infrared imaging as a screen for human breast cancer. Dynamic or active imaging of other tumors due to minute changes in environmental temperatures around these Y. Cohen (B) National Research University Higher School of Economics, Moscow 101000, Russia e-mail: [email protected] B. Z. Dekel · N. Blaunstein Ruppin Academic Center, 4025000 Emek Hefer, Israel Z. Yuldashev Bioengineering Systems Department, Saint Petersburg Electrotechnical University “LETI”, Saint Petersburg, Russia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_11
tumors may allow us to screen and to characterize these tumors more accurately and also at a very early stage. The objective of the present experiments is to explore the sensitivity of a short-wave infrared camera and of infrared thermal cameras in the range of 8–12 µm to identify and characterize mouse tumors formed by subcutaneous injections of two different tumor types (lung and mammary mouse tumor cells) and to follow the development of these tumors. By using the non-contact and non-invasive short-wave infrared and thermal cameras, we propose to monitor the mice and try to detect minuscule in-vivo temperature changes (±0.1 °C), regardless of whether they are elevated or depressed, as compared to the surrounding area and to non-infected controls.
11.2 Methodology of Experimental Trials

The study was approved by the ethics committees of the Hadassah medical center, Israel, and was conducted in accordance with the Declaration of Helsinki, NIH approval number OPRR-A01-5011. Four groups, each containing 10 mice, were injected subcutaneously as follows:
Group 1. Injected with H460 (mouse lung tumor).
Group 2. Injected with MMAC (mouse mammary adenocarcinoma).
Group 3. Injected with both types but at different sites.
Group 4. Injected with medium only, serving as control.
It should be noted that these tumors usually develop locally with no metastasis within the 20-day period in which we propose to screen the tumors. One day after the injection, and subsequently for 18 additional days, all groups will be anesthetized and screened for specific temperature changes. At each time point, one mouse from each group will have its peritoneum exposed and scanned, and it will be sacrificed. Biopsies of internal organs and skin will be harvested and sent for pathological examination in order to correlate tumor growth with the results obtained from the thermographs. Digital infrared thermal images were collected using the thermal camera "Gilboa" by OPGAL LTD: Pixel Size—25 µm, Spectral Range—7.5–14 µm, Number of Pixels—640 × 480, Thermal resolution—70 mK, Focus—12 cm, Lens—18 mm, FOV—25.5°. Digital SWIR images were collected using the IK1523 camera from ABS-Jena, Germany: Spectral range 0.9–1.7 µm, Pixel size 25 µm, Resolution 640 (H) × 512 (V), InGaAs sensor (Figs. 11.1, 11.2, 11.3, 11.4 and 11.5). The near infrared–short wave infrared (NIR-SWIR) region spans nominally the wavelength range of ~750–2500 nm, in which absorption bands correspond mainly to overtones and combinations of fundamental vibrations (depending on the overtone order and the bond nature and strength). The NIR-SWIR range, similar to visible light, responds primarily to light reflected from objects rather than to the thermal emission of those objects.
Fig. 11.1 Shaved mouse ready for imaging
The intensity of NIR-SWIR bands depends on the change in dipole moment and the anharmonicity of the bond. Because the hydrogen atom is the lightest, and therefore exhibits the largest vibrations and the greatest deviations from harmonic behavior, the main bands typically observed in the NIR-SWIR region correspond to bonds containing this and other light atoms (namely C–H, N–H, O–H and S–H); by contrast, the bands for bonds such as C=O, C–C and C–Cl are much weaker or even absent. The NIR spectrum contains not only chemical information of use to determine compositions, but also physical information that can be employed to determine physical properties of samples. Human tissues contain a variety of substances whose absorption spectra at NIR-SWIR wavelengths are well defined, and which are present in sufficient quantities to contribute significant attenuation to measurements of transmitted light. The concentration of some absorbers, such as water, melanin, and bilirubin, remains virtually constant with time. However, some absorbing compounds, such as oxygenated haemoglobin (HbO2), deoxyhaemoglobin (Hb), and oxidised cytochrome oxidase (CtOx), have concentrations in tissue which are strongly linked to tissue oxygenation and metabolism. The absorption and scattering characteristics of tissue components, such as water, fat, oxyhemoglobin (HbO2), deoxyhemoglobin (Hb) and melanin, determine the penetration depth of light in tissues. In the ultraviolet–visible part of the spectrum
A. Alsubie (B) Department of Basic Sciences, College of Science and Theoretical Studies, Saudi Electronic University, Riyadh, Saudi Arabia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_13
An important property of the HT distributions is the regular variation property. A HT distribution is called regularly varying if it obeys

lim_{x→∞} [1 − G(ux; )] / [1 − G(x; )] = u^(−k),  ∀ u > 0, k > 0,

where the quantity k is termed the index of regular variation. The HT distributions possess another important characteristic known as the slowly varying property; see Ahmad et al. [2]. Let L(.) be a slowly varying function; then it satisfies

lim_{x→∞} L(ux; ) / L(x; ) = 1,  ∀ u > 0.

The slowly varying property tells us that the function L(.) does not change much as we go out towards the tail of the distribution: the size of L(ux; ) compared to L(x; ) is roughly of the same scale, so the ratio of the two quantities tends to 1 as x → ∞. Equivalently, the slowly varying property can also be written as

lim_{x→∞} [L(ux; ) − L(x; )] / L(x; ) = 0,  ∀ u > 0.
Due to the applicability and importance of the HT distributions in the actuarial and financial sectors, numerous new HT distributions have been introduced; see Tomarchio and Punzo [11], Afuecheta et al. [1], Bakar et al. [6], Ahmad et al. [4], and Tomarchio et al. [12]. In this paper, we introduce a new statistical methodology to obtain new statistical distributions for modeling financial data sets.

Definition A random variable X has a new modified-G (for short "NM-G") family, if its DF M(x; δ; ) is given by

M(x; δ; ) = 1 − (1 + δ)² Ḡ(x; ) / [1 + δ Ḡ(x; )]²,  δ < 1, x ∈ R,   (13.1)

where Ḡ(x; ) = 1 − G(x; ) is the SF of the baseline model with parameter vector. The function defined in Eq. (13.1) is a valid DF if and only if δ < 1. Corresponding to M(x; δ; ), the PDF m(x; δ; ) is given by

m(x; δ; ) = (1 + δ)² g(x; ) [1 − δ Ḡ(x; )] / [1 + δ Ḡ(x; )]³,  x ∈ R.   (13.2)

Furthermore, in connection with M(x; δ; ) and m(x; δ; ), the survival function (SF) M̄(x; δ; ) and hazard function (HF) h(x; δ; ) are given by
M̄(x; δ; ) = (1 + δ)² Ḡ(x; ) / [1 + δ Ḡ(x; )]²,  x ∈ R,   (13.3)
and

h(x; δ; ) = g(x; ) [1 − δ Ḡ(x; )] / {Ḡ(x; ) [1 + δ Ḡ(x; )]},  x ∈ R,

respectively. In this paper, we implement the NM-G distributions approach and introduce an updated version of the generalized exponential (Gen-exponential) distribution, called the NMGen-exponential distribution. In Sect. 13.2, the expressions for the PDF, DF, SF, and HF of the NMGen-exponential distribution are obtained. Furthermore, different PDF behaviors of the NMGen-exponential distribution are also provided. In Sect. 13.3, the HT characteristics of the NMGen-exponential distribution are proved mathematically. Section 13.4 is devoted to analyzing a financial data set to show the importance and practical illustration of the NMGen-exponential distribution. Finally, some concluding remarks are presented in Sect. 13.5.
13.2 NMGen-exponential Distribution

Consider the DF G(x; ) of the Gen-exponential distribution, given by

G(x; ) = (1 − e^(−ϑx))^α,  ϑ ∈ R+, α ∈ R+, x ∈ R+,   (13.4)

with PDF

g(x; ) = αϑ e^(−ϑx) (1 − e^(−ϑx))^(α−1),  x ∈ R+,

and SF

Ḡ(x; ) = 1 − (1 − e^(−ϑx))^α,  ϑ ∈ R+, α ∈ R+, x ∈ R+,   (13.5)

where the baseline parameter vector equals (α, ϑ). Using Eq. (13.4) in Eq. (13.1), we get the CDF of the NMGen-exponential distribution, given by

M(x; δ; ) = 1 − (1 + δ)² [1 − (1 − e^(−ϑx))^α] / {1 + δ[1 − (1 − e^(−ϑx))^α]}²,  δ < 1, ϑ ∈ R+, α ∈ R+, x ∈ R+.   (13.6)
The PDF of the NMGen-exponential distribution is given by

m(x; δ; ) = (1 + δ)² αϑ e^(−ϑx) (1 − e^(−ϑx))^(α−1) {1 − δ[1 − (1 − e^(−ϑx))^α]} / {1 + δ[1 − (1 − e^(−ϑx))^α]}³,  x ∈ R+.   (13.7)

Furthermore, the SF and HF of the NMGen-exponential distribution are given by

M̄(x; δ; ) = (1 + δ)² [1 − (1 − e^(−ϑx))^α] / {1 + δ[1 − (1 − e^(−ϑx))^α]}²,  x ∈ R+,

and

h(x; δ; ) = αϑ e^(−ϑx) (1 − e^(−ϑx))^(α−1) {1 − δ[1 − (1 − e^(−ϑx))^α]} / ([1 − (1 − e^(−ϑx))^α] {1 + δ[1 − (1 − e^(−ϑx))^α]}),  x ∈ R+,

respectively. Some possible behaviors of m(x; δ; ) are shown in Fig. 13.1. The plots in Fig. 13.1 show that, as the value of δ increases, the NMGen-exponential distribution possesses the HT behavior.
Fig. 13.1 The PDF plots of the NMGen-exponential distribution for α = 1.2, ϑ = 1 and δ = 0.2, 0.5, 0.9
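As a quick numerical cross-check of Eqs. (13.4), (13.6) and (13.7), the following Python sketch evaluates the NMGen-exponential DF and PDF and verifies that the PDF integrates to one. It is an illustration written for this text, not code from the paper; the parameter values are taken from the legend of Fig. 13.1.

```python
import numpy as np
from scipy.integrate import quad

def gen_exp_cdf(x, alpha, theta):
    # Generalized exponential baseline DF, Eq. (13.4)
    return (1.0 - np.exp(-theta * x)) ** alpha

def nmgen_exp_cdf(x, alpha, theta, delta):
    # NMGen-exponential DF, Eq. (13.6): Eq. (13.4) plugged into Eq. (13.1)
    g_bar = 1.0 - gen_exp_cdf(x, alpha, theta)
    return 1.0 - (1.0 + delta) ** 2 * g_bar / (1.0 + delta * g_bar) ** 2

def nmgen_exp_pdf(x, alpha, theta, delta):
    # NMGen-exponential PDF, Eq. (13.7)
    g = alpha * theta * np.exp(-theta * x) * (1.0 - np.exp(-theta * x)) ** (alpha - 1.0)
    g_bar = 1.0 - gen_exp_cdf(x, alpha, theta)
    return (1.0 + delta) ** 2 * g * (1.0 - delta * g_bar) / (1.0 + delta * g_bar) ** 3

# Sanity checks for one of the parameter sets shown in Fig. 13.1
alpha, theta, delta = 1.2, 1.0, 0.5
total, _ = quad(nmgen_exp_pdf, 0, np.inf, args=(alpha, theta, delta))
print(round(total, 6))                                      # ~1.0
print(nmgen_exp_cdf(np.array([0.5, 2.0, 5.0]), alpha, theta, delta))
```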
13.3 The HT Characteristics

In the previous section, we showed visually (see Fig. 13.1) that the NMGen-exponential distribution possesses the HT behavior. In this section, we prove mathematically the HT characteristics of the NMGen-exponential distribution.
13.3.1 The Regularly Varying Tail Behavior

This subsection is devoted to analyzing the RVTB (regularly varying tail behavior) of the NMGen-exponential distribution. According to Karamata's theorem [10], in terms of the SF M̄(x; δ; ), we have

Theorem 1 If Ḡ(x; ) is the SF of a regularly varying distribution, then M̄(x; δ; ) is also a regularly varying distribution.

Proof Suppose lim_{x→∞} Ḡ(ux; )/Ḡ(x; ) = f(u) is finite but non-zero ∀ u > 0. Using Eq. (13.3), we have

lim_{x→∞} M̄(ux; δ; )/M̄(x; δ; ) = lim_{x→∞} Ḡ(ux; )/Ḡ(x; ) · lim_{x→∞} [1 + δ Ḡ(x; )]² / [1 + δ Ḡ(ux; )]².   (13.8)

Using Eq. (13.5) in Eq. (13.8), we get

lim_{x→∞} M̄(ux; δ; )/M̄(x; δ; ) = lim_{x→∞} Ḡ(ux; )/Ḡ(x; ) · lim_{x→∞} {1 + δ[1 − (1 − e^(−ϑx))^α]}² / {1 + δ[1 − (1 − e^(−ϑux))^α]}².

As x → ∞, e^(−ϑx) → 0 and e^(−ϑux) → 0, so both terms in braces tend to 1 + δ{1 − (1 − 0)^α} = 1 + δ{1 − 1},
that is, to 1 + δ × 0 = 1. Therefore

lim_{x→∞} M̄(ux; δ; )/M̄(x; δ; ) = lim_{x→∞} Ḡ(ux; )/Ḡ(x; ) = f(u),

which is finite but non-zero ∀ u > 0; thus, M̄(x; δ; ) is a regularly varying distribution.
13.3.2 A Supportive Application of the RVTB

Consider that the distribution of X has power law behavior (PLB); then we have Ḡ(x; ) = P(X > x) ∼ x^(−k). By Karamata's theorem, we can write Ḡ(x; ) as Ḡ(x; ) = x^(−k) L(x; ), where L(x; ) represents a slowly varying function. From Eq. (13.3), we have

M̄(x; δ; ) = (1 + δ)² Ḡ(x; ) / [1 + δ Ḡ(x; )]².   (13.9)

Since Ḡ(x; ) = x^(−k), we can write Eq. (13.9) as

M̄(x; δ; ) = (1 + δ)² x^(−k) / [1 + δ x^(−k)]² = x^(−k) L(x; ),  where

L(x; ) = (1 + δ)² / [1 + δ x^(−k)]².   (13.10)

If L(x; ) is a slowly varying function, then the obtained RVTB result is true. According to Resnick [9], we have to show that, for all u > 0,

lim_{x→∞} L(ux; ) / L(x; ) = 1.
By incorporating Eq. (13.10), we get

L(ux; ) / L(x; ) = [1 + δ x^(−k)]² / (1 + δ)² × (1 + δ)² / [1 + δ (ux)^(−k)]²,

L(ux; ) / L(x; ) = [1 + δ x^(−k)]² / [1 + δ (ux)^(−k)]².   (13.11)

By applying lim_{x→∞} to both sides of Eq. (13.11), and noting that x^(−k) → 0 and (ux)^(−k) → 0 as x → ∞, we have

lim_{x→∞} L(ux; ) / L(x; ) = [1 + δ × 0]² / [1 + δ × 0]² = 1.
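This convergence is easy to verify numerically. The short sketch below (an illustration, not part of the paper) evaluates the ratio L(ux; )/L(x; ) from Eq. (13.10) for growing x; the tail index and the values of δ and u are arbitrary assumptions chosen for the example.

```python
import numpy as np

def L(x, delta, tail_index):
    # Slowly varying factor from Eq. (13.10); 'tail_index' stands for the
    # positive index of regular variation
    return (1.0 + delta) ** 2 / (1.0 + delta * x ** (-tail_index)) ** 2

delta, tail_index, u = 0.5, 1.5, 3.0
for x in (1e1, 1e3, 1e5):
    print(x, L(u * x, delta, tail_index) / L(x, delta, tail_index))  # ratio -> 1
```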
13.4 Data Analysis

Here, we illustrate the NMGen-exponential distribution by taking a data set from the financial sector. The data consist of five hundred and seventy-five (575) observations and represent the initial claims of unemployment insurance. For ease of numerical computation, each observation is divided by 1500. The summary measures of the data set are provided in Table 13.1. Using the financial data, the NMGen-exponential distribution is compared with the Gen-exponential model (see Eq. (13.4)) and with the Weibull, exponentiated Weibull (Exp-Weibull), and Marshall-Olkin Weibull (MO-Weibull) distributions. The DFs of the Weibull, Exp-Weibull, and MO-Weibull distributions are given by
Table 13.1 The summary measures of the unemployment insurance data

Min | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Variance
32.84 | 55.14 | 66.68 | 72.79 | 86.49 | 205.57 | 559.2492
G(x) = 1 − e^(−ϑ x^λ),  x ∈ R+, ϑ ∈ R+, λ ∈ R+,

G(x) = (1 − e^(−ϑ x^λ))^α,  x ∈ R+, α ∈ R+, ϑ ∈ R+, λ ∈ R+,  and

G(x) = (1 − e^(−ϑ x^λ)) / (1 − (1 − φ) e^(−ϑ x^λ)),  x ∈ R+, φ ∈ R+, ϑ ∈ R+, λ ∈ R+,
respectively. The results of the NMGen-exponential, Gen-exponential, Weibull, Exp-Weibull, and MO-Weibull distributions are compared using the Kolmogorov–Smirnov (KS) test with the corresponding p-value. Using the financial data, the estimated values of the parameters along with the KS test and p-value are presented in Table 13.2. A model having a higher p-value and a smaller KS test value is considered a useful model for the underlying data.
Table 13.2 The values of α̂, δ̂, ϑ̂, λ̂, φ̂, KS and p-value of the fitted models

Model | α̂ | δ̂ | ϑ̂ | λ̂ | φ̂ | KS | p-value
NMGen-exponential | 28.5442 | −0.1629 | 0.0512 | – | – | 0.0411 | 0.2851
Gen-exponential | 10.3221 | – | 0.0396 | – | – | 0.0978 | 0.1476
Weibull | – | – | 0.0765 | 1.1983 | – | 0.0625 | 0.1582
Exp-Weibull | 28.5084 | – | 0.0516 | 1.0114 | – | 0.0462 | 0.1714
MO-Weibull | – | – | 0.0643 | 1.2074 | 2.0846 | 0.0439 | 0.2108

Fig. 13.2 The estimated PDF, DF, SF, QQ, and PP plots of the NMGen-exponential distribution for the financial data
From the results in Table 13.2, we can see that the NMGen-exponential distribution has the smallest KS test value and the highest p-value. These facts indicate that the NMGen-exponential distribution may be a better choice for analyzing the financial data. Furthermore, the fitted PDF, DF, SF, along with the PP (probability–probability) and QQ (quantile–quantile) plots of the NMGen-exponential distribution, are shown in Fig. 13.2. The plots in Fig. 13.2 show that the NMGen-exponential distribution provides a close fit to the financial data.
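A fitting-and-testing workflow of the kind summarized in Table 13.2 could be sketched as follows: the parameters are estimated by maximizing the likelihood, and the KS statistic and p-value are then computed from the fitted DF. This is a schematic illustration only; the gamma-distributed sample is a placeholder for the real unemployment insurance claims, and the starting values are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import kstest

def nmgen_cdf(x, a, t, d):
    # NMGen-exponential DF, Eq. (13.6)
    gb = 1.0 - (1.0 - np.exp(-t * x)) ** a
    return 1.0 - (1.0 + d) ** 2 * gb / (1.0 + d * gb) ** 2

def nmgen_pdf(x, a, t, d):
    # NMGen-exponential PDF, Eq. (13.7)
    g = a * t * np.exp(-t * x) * (1.0 - np.exp(-t * x)) ** (a - 1.0)
    gb = 1.0 - (1.0 - np.exp(-t * x)) ** a
    return (1.0 + d) ** 2 * g * (1.0 - d * gb) / (1.0 + d * gb) ** 3

def nll(params, x):
    # Negative log-likelihood with simple feasibility guards (a, t > 0, d < 1)
    a, t, d = params
    if a <= 0 or t <= 0 or d >= 1:
        return np.inf
    p = nmgen_pdf(x, a, t, d)
    return np.inf if np.any(~np.isfinite(p)) or np.any(p <= 0) else -np.sum(np.log(p))

rng = np.random.default_rng(1)
claims = rng.gamma(shape=3.0, scale=25.0, size=575)      # placeholder sample

res = minimize(nll, x0=np.array([2.0, 0.05, 0.1]), args=(claims,), method="Nelder-Mead")
a_hat, t_hat, d_hat = res.x
ks_stat, p_val = kstest(claims, lambda x: nmgen_cdf(x, a_hat, t_hat, d_hat))
print(res.x, ks_stat, p_val)
```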
13.5 Concluding Remarks

This paper contributed to the literature of distribution theory by introducing a new methodology for obtaining new HT distributions. The proposed method was called the NM-G family of distributions. Using the NM-G method, an updated version of the Gen-exponential distribution, called the NMGen-exponential distribution, was introduced. The HT characteristics of the NMGen-exponential distribution were shown visually and mathematically. To establish the applicability of the NMGen-exponential distribution, a financial data set related to insurance claims was analyzed. The practical illustration showed that the NMGen-exponential distribution can be a more suitable model for dealing with financial phenomena.
References 1. Afuecheta, E., Semeyutin, A., Chan, S., Nadarajah, S., Ruiz, D.A.P.: Compound distributions for financial returns. PLoS ONE 15(10), e0239652 (2020) 2. Ahmad, Z., Mahmoudi, E., Dey, S.: A new family of heavy tailed distributions with an application to the heavy tailed insurance loss data. In: Communications in Statistics-Simulation and Computation, pp. 1–24 (2020) 3. Ahmad, Z., Mahmoudi, E., Hamedani, G.: A class of claim distributions: properties, characterizations and applications to insurance claim data. Commun. Stat. Theory Methods 51(7), 2183–2208 (2022) 4. Ahmad, Z., Mahmoudi, E., Roozegarz, R., Hamedani, G.G., Butt, N.S.: Contributions towards new families of distributions: an investigation, further developments, characterizations and comparative study. Pak. J. Stat. Oper. Res. 18(1), 99–120 (2022) 5. Bhati, D., Ravi, S.: On generalized log-Moyal distribution: a new heavy tailed size distribution. Insur. Math. Econ. 79, 247–259 (2018) 6. Bakar, S.A., Nadarajah, S., Ngataman, N.: A family of density-hazard distributions for insurance losses. In: Communications in Statistics-Simulation and Computation, pp. 1–19 (2020) 7. He, W., Ahmad, Z., Afify, A.Z., Goual, H.: The arcsine exponentiated-X family: validation and insurance application. In: Complexity (2020) 8. Korkmaz, M.Ç.: A new heavy-tailed distribution defined on the bounded interval: the logit slash distribution and its application. J. Appl. Stat. 47(12), 2097–2119 (2020) 9. Resnick, S.I.: Discussion of the Danish data on large fire insurance losses. ASTIN Bull. J. IAA 27(1), 139–151 (1997)
10. Seneta, E.: Karamata’s characterization theorem, feller and regular variation in probability theory. Publ. Inst. Math. 71(85), 79–89 (2002) 11. Tomarchio, S.D., Punzo, A.: Dichotomous unimodal compound models: application to the distribution of insurance losses. J. Appl. Stat. 47, 2328–2353 (2020) 12. Tomarchio, S.D., Bagnato, L., Punzo, A. (2022). Model-based clustering via new parsimonious mixtures of heavy-tailed distributions. AStA Adv. Stat. Anal., 1–33
Part II
Recent Advances in Data Analysis and Machine Learning: Paradigms and Practical Applications
Chapter 14
Estimation and Prediction of the Technical Condition of an Object Based on Machine Learning Algorithms Under Conditions of Class Inequality Victor R. Krasheninnikov , Yuliya E. Kuvayskova , and Vladimir N. Klyachkin Abstract Estimation and prediction of the state of a technical object (serviceable or faulty) is necessary for a control system that ensures the reliable functioning of the object. Machine learning methods can be applied to solve this problem. However, usually the ratio of serviceable and faulty states of the object in the training sample is uneven. As a rule, there are more serviceable states than faulty ones. Therefore, for a more reliable forecast of the state of a technical object, it is necessary to find the best method of machine learning. In the paper, a study was made of the effectiveness of using binary classification metrics when searching for the best learning model for assessing the technical state of an object in conditions of class inequality. It is shown that the AUC ROC metric to evaluate a model that detects faulty states is preferable. If the value of this metric is close to 1, then the model predicts well the state of the object, including faulty ones. If the value of this metric is close to 0.5, then the model is not able to predict the class of faulty states and to identify critical situations in the operation of the object. To assess the technical condition of the object, a program with a graphical interface in the Python language was developed using the Scikit-Learn and PyQt5 libraries. The effectiveness of the fulfilled research is demonstrated on the example of a water treatment system.
14.1 Introduction Technical diagnostics is carried out in order to ensure the working condition of the object and prevent emergency situations [1]. Let there be a set of situations, that is, observations of the values of the diagnosed parameters X 1 , X 2 , …, X p during the operation of the object at different points in time, and a set of possible responses Y to these situations: Y = 1 if the object is V. R. Krasheninnikov (B) · Y. E. Kuvayskova · V. N. Klyachkin Ulyanovsk State Technical University, 32 Severny Venetz str, Ulyanovsk 432027, Russian Federation e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_14
serviceable and Y = 0 if the object is faulty. There is some unknown relationship between the values of the parameters X1, X2, …, Xp and the responses Y. It is required to build a model that approximates this unknown dependence and is able to give a fairly accurate answer about the state of the object (serviceable or faulty) for new input values of the object's parameters, based on the available data on the functioning of the object. However, the ratio of serviceable and faulty states of the object is uneven in the training sample. As a rule, there are more serviceable states than faulty ones. This requires a sufficiently accurate detection of violations of the object's performance, that is, the prediction of faulty states to prevent emergency situations in the operation of the object. To achieve this goal, it is necessary to choose the right learning model. To solve this problem, various time series models [2–4], image models [5–10], fuzzy models [11–13], as well as machine learning methods [14–22] can be used. In the paper, a study was made of the effectiveness of using binary classification metrics (accuracy, precision, recall, F-measure, AUC ROC) when searching for the best learning model to assess the state of an object in conditions of class inequality.
14.2 Machine Learning Algorithms

Let us use the following basic methods to evaluate the technical state of an object: decision trees [15, 16], logistic regression [17], support vector machine [18], discriminant analysis [19], Bayesian classifier [20], neural networks [21] and a model of a quasi-periodic process in the form of an image on a cylinder. The parameters of objects in many practical situations have a quasi-periodic character, that is, a noticeable periodicity with random variations of the quasi-periods. Examples are aircraft engine noise and vibration, daily fluctuations in the temperature of the atmosphere and water, etc. The image model on a cylinder is convenient for representing such processes [7, 8]. Let us consider a spiral grid on the cylinder (Fig. 14.1a). Rows of this grid are turns of a cylindrical spiral. The turns of this image can also be considered as closed circles on the cylinder (Fig. 14.1b). To describe the image defined on a cylindrical grid, we use the analog of Habibi's autoregressive model [6]:

x_n = ρ x_{n−1} + r x_{n−T} − ρr x_{n−T−1} + q ξ_n,   (14.1)

where n is the end-to-end image point number, T is a period, that is, the number of points in one turn, ξ_n are independent standard random variables, and ρ, r and q are model parameters. The process has zero mean and the variance

σ² = q²(1 + rs) / [(1 − ρ²)(1 − r²)(1 − rs)]
Fig. 14.1 The grids on a cylinder: a spiral grid, b circular grid
and covariance function

Cov(x_n, x_{n+m}) = q² [ (1/((1 − r²)T)) Σ_{k=0}^{T−1} z_k^m / ((1 − ρ z_k)(z_k − ρ)) + s ρ^m / ((1 − ρ²)(1 − rs)(s − r)) ],

where z_k = r^(1/T) exp(i2kπ/T) and s = ρ^T. The parameter ρ affects the correlation along the lines, that is, the smoothness of the process. The parameter r affects the correlation of readings over a period distance. For values of r close to 1, adjacent lines of the image (helix turns) will be strongly correlated, so this model can be used to describe and simulate quasi-periodic processes. Figure 14.2 shows an example of a simulation of a process with the parameters ρ = 0.85, r = 0.95, q = 1. The similarity of neighboring quasi-periods is noticeable due to the large value of the parameter r. The optimal estimate x̂_n of the value x_n given the previous values of the process has the form:

x̂_n = ρ x_{n−1} + r x_{n−T} − ρr x_{n−T−1}.   (14.2)
Fig. 14.2 An example of process simulation where quasi-periods are separated by vertical lines
Thus, it is possible to evaluate process values in order to predict the state of the object. The least squares method can be applied to solve the identification problem, that is, to determine the values of the model parameters ρ and r in Eq. (14.2), under which it best predicts a given characteristic of an object. Note that the considered model of the system of processes contains an essential parameter—the length of quasi-periods T. This parameter must also be determined when identifying the model, which can be done by the methods of revealing hidden periodicities described in [23], for example, by the correlation-extremal method.
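To make the model and its identification concrete, the sketch below simulates Eq. (14.1) and then recovers ρ and r by least squares applied to the one-step prediction of Eq. (14.2). It is an illustrative example, not the authors' implementation; the period T = 50 is an assumed value, while ρ = 0.85, r = 0.95 and q = 1 follow the simulation shown in Fig. 14.2.

```python
import numpy as np
from scipy.optimize import least_squares

def simulate_cylindrical_ar(n_samples, T, rho, r, q, seed=0):
    # Simulate Eq. (14.1): x[n] = rho*x[n-1] + r*x[n-T] - rho*r*x[n-T-1] + q*xi[n]
    rng = np.random.default_rng(seed)
    x = np.zeros(n_samples)
    for n in range(n_samples):
        x[n] = q * rng.standard_normal()
        if n >= 1:
            x[n] += rho * x[n - 1]
        if n >= T:
            x[n] += r * x[n - T]
        if n >= T + 1:
            x[n] -= rho * r * x[n - T - 1]
    return x

def residuals(params, x, T):
    # One-step prediction errors of Eq. (14.2) for given (rho, r)
    rho, r = params
    pred = rho * x[T:-1] + r * x[1:-T] - rho * r * x[:-T - 1]
    return x[T + 1:] - pred

T, rho_true, r_true, q = 50, 0.85, 0.95, 1.0   # T is assumed; rho, r, q as in Fig. 14.2
x = simulate_cylindrical_ar(5000, T, rho_true, r_true, q)
fit = least_squares(residuals, x0=[0.5, 0.5], args=(x, T))
print(fit.x)   # estimates of (rho, r), expected to be close to (0.85, 0.95)
```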
14.3 Study of the Effectiveness of Binary Classification Metrics for Assessing the Technical Condition of an Object

The results of assessing the technical condition of the object, obtained by the model, can be represented as a confusion matrix (Table 14.1) [22, 24]. The confusion matrix has four categories: True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN). Based on the confusion matrix, various metrics are calculated that characterize the predictive ability of the model. To assess the technical state of the object, it is especially desirable that the model predicts the faulty states of the object well in order to detect and prevent critical situations in the operation of the object. Let us consider the following metrics: Accuracy (Acc), Precision (P) and Recall (R):

Acc = (TP + TN) / (TP + TN + FP + FN),  P = TP / (TP + FP),  R = TP / (TP + FN).
However, for assessing the technical state of an object in conditions of class inequality, the Accuracy metric may be uninformative, since it assigns the same weight to all technical states of an object, which may be incorrect if the distribution of states in the training sample is strongly biased towards one state. Let us consider as an example 100 object observations, of which 90 correspond to serviceable states and 10 correspond to faulty ones, and compare two predictive models, Model 1 and Model 2, with the confusion matrices presented in Tables 14.1 and 14.2. The Accuracy metric for Model 1 assumes the value Acc = (80 + 5)/100 = 0.85. Model 2 predicts all object observation states as serviceable and has the Accuracy metric Acc = (90 + 0)/100 = 0.90, which is higher than for Model 1. But despite this, Model 2 does not have a good predictive ability, since it does not predict the faulty states of the object at all, that is, critical and emergency situations in the operation of the object.
Table 14.1 Confusion matrix of Model 1

 | Positive | Negative
Predicted positive | TP = 80 | FP = 10
Predicted negative | FN = 5 | TN = 5
Table 14.2 Confusion matrix of Model 2

 | Positive | Negative
Predicted positive | TP = 90 | FP = 10
Predicted negative | FN = 0 | TN = 0
Unlike the Accuracy metric, the Precision and Recall metrics [22, 24, 25] do not depend on the ratio of the classes, which indicates the suitability of their use in conditions of class inequality. However, the Precision metric does not allow a high estimate of the model if it predicts only one state of the object, since in this case we get an increase in the False Positive value. Recall, on the other hand, does not allow detecting this situation at all, since this metric does not take into account the False Positive and True Negative values, and the sum of these values is equal to the number of all faulty states of the object. Let us analyze these metrics on the example of Model 2, when the model classifies all objects as serviceable: P = 90/(90 + 10) = 0.90,
R = 90/(90 + 0) = 1.0.
These metrics rate the predictive power of the model very highly, at 90% and 100%. However, the model does not have a good predictive ability, since it does not predict the faulty states of the object at all, and the Precision and Recall metrics, although they do not depend on the class ratio, cannot correctly assess the adequacy of the model. The F-measure is a weighted harmonic mean of the Precision and Recall metrics; it shows how many states are correctly predicted by the model and how many true states the model does not miss [22, 24, 25]:

F = (β² + 1) · P · R / (β² · P + R),
where β assumes values in the range 0 ≤ β ≤ 1 if Precision is given priority, and β > 1 if Recall is given priority. In practice, the most commonly used value is β = 0.5 for giving priority to Precision and β = 2 for Recall. However, this metric is also not capable of giving a suitable assessment of the model when predicting the technical state of an object under conditions of class inequality. Let us consider model 2 and calculate the F-measure in three cases: β = 1,
F = 2 · (0.9 · 1)/(0.9 + 1) = 0.95,
β = 0.5, F = (1.25 · 0.9 · 1)/(0.25 · 0.9 + 1) = 0.92, β = 2, F = (5 · 0.9 · 1)/(4 · 0.9 + 1) = 0.98.
These results show that the F-measure in all three cases rated the predictive ability of the model very highly, although it does not predict the faulty states of the object. Another informative and generalizing metric is the area under the ROC-curve (receiver operating characteristic): AUC (area under the curve) [24–28]. The ROC-curve is a line from (0, 0) to (1, 1) in True Positive Rate (TPR) and False Positive Rate (FPR) coordinates:

FPR = FP / (FP + TN),  TPR = TP / (TP + FN).
The larger the FPR, the more faulty states are predicted incorrectly. To combine FPR and TPR into one metric, one needs to calculate these metrics and then plot them on the same chart with the FPR and TPR axes. The resulting curve is the ROC-curve, and the area under the curve is the AUC ROC metric:

AUC ROC = (1 + TPR − FPR) / 2.
The AUC ROC metric is also robust against unbalanced classes. The ideal case, when the model does not make errors, is reached at FPR = 0 and TPR = 1, and then the area under the ROC-curve is equal to 1. Otherwise, when the model randomly predicts the state of the object, the AUC ROC tends to 0.5, since the model predicts the same ratio of the amounts of TP and FP. An AUC ROC value close to 0.5 means that the model does not have a good predictive ability. Let us calculate the AUC ROC criterion for Model 2:

FPR = 10/(10 + 0) = 1,  TPR = 90/(90 + 0) = 1,

AUC ROC = (1 + 1 − 1)/2 = 0.5.
The value of the metric is 0.5, which means that the model is inadequate and has no predictive ability. Thus, if it is necessary to predict the faulty states of an object with sufficient accuracy, then the AUC ROC metric should be used to evaluate the model, since only this metric among the considered ones takes into account the errors of both classes. If the value of this metric tends to 1, then the model predicts object states well, including faulty ones. If the value of the metric is close to 0.5, then the forecast results obtained from the model are random, and the model is not able to predict the class of faulty states and, accordingly, to identify critical situations in the work of the object. If during modeling the value of the AUC ROC metric is close enough to 1, then the quality of the model can be additionally assessed by the value of the F-measure given above, calculated with the coefficient β = 0.5 to give priority to the Precision metric, since Precision takes into account the False Positive value, while Recall, in turn, does not take into account either False Positive or True Negative.
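The numerical example above can be reproduced directly from the confusion matrices of Tables 14.1 and 14.2. The following sketch (an illustration, not the authors' program) computes Accuracy, Precision, Recall, the F-measure and the single-point AUC ROC value used in this section.

```python
def metrics(tp, fp, fn, tn, beta=1.0):
    # Binary classification metrics computed directly from a confusion matrix
    acc = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_beta = (beta ** 2 + 1) * precision * recall / (beta ** 2 * precision + recall)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    auc_roc = (1 + tpr - fpr) / 2          # single-point AUC ROC used in the chapter
    return acc, precision, recall, f_beta, auc_roc

print(metrics(80, 10, 5, 5))    # Model 1 (Table 14.1): Acc = 0.85
print(metrics(90, 10, 0, 0))    # Model 2 (Table 14.2): Acc = 0.90 but AUC ROC = 0.5
```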
14.4 Numerical Experiment

To assess the technical condition of an object, a Python program was developed using the Scikit-Learn library, which allows predicting the technical condition of an object using machine learning methods and analyzing their quality using binary classification metrics. The program implements seven basic models: logistic regression, Bayesian classifier, discriminant analysis, support vector machine, decision trees, neural networks, and a model of a quasi-periodic process in the form of an image on a cylinder. The program allows choosing the machine learning methods with the help of which the assessment of the technical condition of the object is carried out. To assess the quality of the models, the program implements the quality criteria Accuracy, Precision, Recall, F-measure and AUC ROC. To calculate them, the initial data sample is divided into two parts: a training one, which is used to build the model, and a test one, used to calculate the values of the quality criteria. To avoid overfitting the models, a cross-validation procedure was implemented with the ability to select the number of sample splits into training and test ones [29, 30]. The program also allows performing modeling using compositions of machine learning methods, such as bagging, stacking, boosting and aggregation [29–33]. The program also implements a procedure for checking the significance of the diagnosed parameters of an object using Student's t-criterion and building models with only the significant parameters. The result of forecasting by the methods of logistic regression, Bayesian classifier, discriminant analysis, support vector machine, decision trees and neural networks is the probability that an object belongs to a serviceable or faulty state. As a result of using the autoregressive model of a cylindrical image, numerical predictions of the values of the object's functioning parameters for the corresponding time interval are obtained using Eq. (14.2). To interpret the numerical values of the forecasts obtained from the models and assess the technical condition of the object (serviceable or faulty), the forecast results are examined for going beyond critical boundaries. If at least one of the parameters of the object's functioning goes beyond the permissible limits, then the faulty state of the object is recorded, that is, Y = 0. The result of the program is a file containing the values of the Y vector for each selected machine learning method for the training and test sets, as well as a table with the values of the quality criteria for each built model. As an object of study, we consider a water treatment system at a natural surface water treatment and drinking water treatment plant in St. Petersburg, Russia [1]. The serviceability of the water treatment system Y was assessed by indicators of drinking water quality depending on the physical and chemical indicators of the water source "Western Kronstadt": Temperature, Chromaticity, Turbidity, pH, Alkalinity, Oxidizability, Ammonium and Chlorides. A total of 348 observations for each parameter were available; in 47 cases the state of the system was faulty (at least one of the indicators of the quality of purified drinking water went beyond the permissible limits).
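A minimal Scikit-Learn sketch of the kind of comparison performed by the program is shown below. It is not the actual program described here: the data are randomly generated stand-ins with the same dimensions as the water treatment example (348 observations, 8 parameters, 47 faulty states), the cylindrical image model is omitted, and five-fold cross-validation with the AUC ROC criterion is used for scoring.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data: 348 observations, 8 diagnosed parameters, 47 faulty states (Y = 0)
rng = np.random.default_rng(0)
X = rng.normal(size=(348, 8))
y = np.ones(348, dtype=int)
y[rng.choice(348, size=47, replace=False)] = 0

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Bayesian classifier": GaussianNB(),
    "Discriminant analysis": LinearDiscriminantAnalysis(),
    "Support vector method": SVC(),
    "Decision tree": DecisionTreeClassifier(),
    "Neural network": MLPClassifier(max_iter=2000),
}

# Five-fold cross-validation with the AUC ROC criterion, as in the chapter
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC ROC = {scores.mean():.3f}")
```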
Table 14.3 Forecasting results (with all parameters / only with significant ones)

Machine learning method | Accuracy | Precision | Recall | F-measure | AUC ROC
Logistic regression | 0.84/0.84 | 0.88/0.89 | 0.94/0.92 | 0.89/0.90 | 0.57/0.63
Bayesian classifier | 0.87/0.87 | 0.87/0.87 | 1.00/1.00 | 0.89/0.89 | 0.53/0.53
Discriminant analysis | 0.86/0.86 | 0.86/0.86 | 0.99/0.99 | 0.89/0.89 | 0.54/0.53
Support vector method | 0.86/0.86 | 0.86/0.86 | 1.00/1.00 | 0.88/0.88 | 0.53/0.50
Decision tree | 0.91/0.90 | 0.93/0.94 | 0.98/0.94 | 0.94/0.94 | 0.76/0.80
Neural networks | 0.88/0.89 | 0.88/0.88 | 0.99/1.00 | 0.90/0.90 | 0.59/0.60
Cylindrical image models | 0.89/0.91 | 0.93/0.93 | 0.95/0.97 | 0.94/0.95 | 0.72/0.74
Using the developed program, machine learning models are built and the best model is determined for further assessment of the technical state of the system. To assess the quality of the models, the initial data set is divided into a training set (90% of observations), on which the model is built, and a test set (10% of observations), which is used to assess the quality of the models and calculate the binary classification metrics. To exclude overfitting of the models, we apply the cross-validation procedure with the number of partitions into training and test sets equal to five. Before building the machine learning models, let us evaluate the significance of the object's diagnostic parameters using Student's criterion. The parameters Chromaticity and Ammonium turned out to be insignificant at the 0.05 level. Next, we build machine learning models taking into account all parameters and only the significant ones. Using the cylindrical model, we obtain the numerical values of the time series forecasts from Eq. (14.2). To assess the technical condition, we compare the forecasts with the critical boundaries, and if the value of any parameter has reached these boundaries, then we consider the state faulty. Then we calculate the values of the considered binary classification metrics for the test sample. The results obtained are shown in Table 14.3, where the values after the slash correspond to the models built with the insignificant parameters eliminated. To identify the best model, we first evaluate the adequacy of the models using the AUC ROC metric. It turned out that for the Logistic regression, Bayesian classifier, Discriminant analysis, Support vector method and Neural networks models, the AUC ROC value is close to 0.5. Thus, these models poorly predict the faulty states of the object. For the Decision tree and Cylindrical image models, the value of the AUC ROC metric exceeds 0.7, so we compare these models by the F-measure. The values of the F-measure for these models differ only slightly. Thus, two models (Decision tree and Cylindrical image) are preferable for assessing the technical condition of the object and predicting faulty states. The results with only the significant object parameters turned out to be similar, that is, the best models in terms of the AUC ROC and F-measure metrics were again the Decision tree and Cylindrical image models. However, when the insignificant parameters of the object are removed, the prediction results of the models are better on average by 2–6%.
14.5 Conclusions

In the paper, a study of the effectiveness of using binary classification metrics when searching for the best learning model for assessing the technical state of an object in conditions of class inequality was made. It is shown that, in order to identify faulty states in the operation of an object, it is preferable to use the AUC ROC metric to evaluate the model, since only this metric among the considered ones takes into account errors of both classes; if the value of the metric tends to 1, then the model predicts the state of the object well, including faulty ones. If the value of the metric is close to 0.5, then the model is not able to identify critical situations in the operation of the object. Based on the research, a program was developed in Python using the Scikit-Learn and PyQt5 libraries. The effectiveness of the conducted research is demonstrated on the example of a water treatment system. The described approach can be used by specialists to assess the technical condition of objects under conditions of class inequality in many technical applications.

Acknowledgements This study was funded by the RFBR, project number 20-01-00613.
References 1. Klyachkin, V.N., Krasheninnikov, V.R., Kuvayskova, Yu.E.: Forecasting and diagnostics of the stability of the functioning of technical objects. RUSAINS, Moscow (2020) 2. Box, J., Jenkins, G.: Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco (1970) 3. Stock, J.H., Watson, M.W.: Vector autoregressions. J. Econ. Perspect. 15(4), 101–115 (2001) 4. Kuvayskova, Y., Klyachkin, V., Krasheninnikov, V.: Recognition and forecasting of a technical object state based on its operation indicators monitoring results. In: 2020 International MultiConference on Industrial Engineering and Modern Technologies, FarEastCon 2020, pp. 1–6. IEEE, Vladivostok, Russia (2020) 5. Soifer, V.A. (Ed).: Computer image processing. Part I: Basic concepts and theory. VDM Verlag Dr. Muller e.K (2010) 6. Habibi, A.: Two-dimensional Bayesian estimate of images. Proc. IEEE 60(7), 878–883 (1972) 7. Krasheninnikov, V.R., Vasil’ev, K.K.: Multidimensional image models and processing. In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Control Systems-3, ISRL, vol. 135, pp. 11–64. Springer International Publishing, Switzerland AG (2018) 8. Krasheninnikov, V.R., Kuvayskova, Yu.E.: Modelling and forecasting of qusi-periodic processes in technical objects based on cylindrical image models. In: V International Conference on Information Technology and Nanotechnology (ITNT-2019), vol. 2416, pp. 387–393. CEUR Workshop Proceedings, Samara, Russia (2019) 9. Andriyanov, N.A., Dementiev, V.E., Tashlinskiy, A.G.: Detection of objects in the images: from likelihood relationships towards scalable and efficient neural networks. Comput. Opt. 46(1), 139–159 (2022) 10. Andriyanov, N.A., Tashlinskiy, A.G., Dementiev, V.E.: Detailed clustering based on Gaussian mixture models. Adv. Intell. Syst. Comput. 1251, 437–448 (2022) 11. Zadeh, L.A.: Fuzzy logic. In: Meyers, R. (ed.) Computational Complexity, pp. 1177–1200. Springer, New York (2012)
12. Kuvayskova, Y.E.: The prediction algorithm of the technical state of an object by means of fuzzy logic inference models. Procedia Eng. 201, 767–772 (2017) 13. Kuvayskova, Y., Klyachkin, V., Krasheninnikov, V., Alekseeva, A.: Fuzzy models for predicting the technical state of objects. In: Proceedings of the VI International conference Information Technology and Nanotechnology (ITNT-2020), vol. 2667, pp. 215–218. CEUR Workshop Proceedingsth, Samara, Russia (2020) 14. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann Publishers, San Francisco (2005) 15. Murthy, S.K.: Automatic construction of decision trees from data: A multi-disciplinary survey. Data Min. Knowl. Disc. 2, 345–389 (1998) 16. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001) 17. Menezes, F., Liska, G.R., Cirillo, M., Vivanco, M.: Data classification with binary response through the boosting algorithm and logistic regression. Expert Syst. Appl. 69, 62–73 (2017) 18. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, New York (2000) 19. Haghighat, M., Abdel-Mottaleb, M., Alhalabi, W.S.: Discriminant correlation analysis: Realtime feature level fusion for multimodal biometric recognition. IEEE Trans. Inf. Forensics Secur. 11, 1984–1996 (2016) 20. Rish, I.: An empirical study of the Naïve Bayes classifier. In: IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, pp. 41–46 (2001) 21. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press, Cambridge (2016) 22. Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd international conference on Machine learning, pp. 233–240. Pittsburgh, PA (2006) 23. Serebrennikov, M.G., Pervozvansky, A.A.: Revealing Hidden Periodicities. Nauka, Moscow (1965) 24. Ohsaki, M.: Confusion-matrix-based kernel logistic regression for imbalanced data classification. IEEE Trans. Knowl. Data Eng. 29(9), 1806–1819 (2017) 25. Zhukov, D.A., Klyachkin, V.N., Krasheninnikov, V.R., Kuvayskova, Yu.E.: Selection of aggregated classifiers for the prediction of the state of technical objects. In: Proceedings of the Data Science Session at the 5th International Conference on Information Technology and Nanotechnology (ITNT 2019), vol. 2416, pp. 361–367. CEUR Workshop Proceedings, Samara, Russia (2019) 26. Flach, P. A.: The geometry of ROC space: Understanding machine learning metrics through ROC isometrics. In: Proceedings of the 20th International Conference on Machine Learning (ICML-2003), p. 19. Washington, DC, USA (2003) 27. Bradley, A.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30, 1145–1159 (1997) 28. Ferri, C., Flach, P., Henrandez-Orallo, J.: Learning decision trees using area under the ROC curve. In: Proceedings of the 19th International Conference on Machine Learning, pp. 139–146. Morgan Kaufmann (2002) 29. Klyachkin, V.N., Kuvayskova, J.E., Zhukov, D.A.: Aggregated classifiers for state diagnostics of the technical object. In: 2019 International Multi-Conference on Industrial Engineering and Modern Technologies, FarEastCon 2019, pp. 1–4. IEEE, Vladivostok, Russia (2019) 30. Kochavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th international joint conference on Artificial intelligence (IJCAI), vol. 2, pp. 1137–1143 (1995) 31. 
Zhang, C., Ma, Y.: Ensemble Machine Learning. Methods and Applications, Springer, Boston (2012) 32. Bickel, P.J., Ritov, Y., Zakai, A.: Some theory for generalized boosting algorithms. J. Mach. Learn. Res. 7, 705–732 (2006) 33. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
Chapter 15
Automated System for the Personalization of Retinal Laser Treatment in Diabetic Retinopathy Based on the Intelligent Analysis of OCT Data and Fundus Images Nataly Ilyasova , Nikita Demin , Aleksandr Shirokanev , and Nikita Andriyanov Abstract In this work, we propose an automated system for the personalization of retina laser treatment in diabetic retinopathy. The system comprises fundus images processing methods, algorithms for photocoagulation pattern mapping, and intelligent analysis methods of OCT data and fundus images. The feasibility of introducing corrections at any interim stage of data processing makes for a safe treatment. A key module of the proposed software architecture is the system for the intelligent analysis of the photocoagulation pattern, allowing the proposed plan to be analyzed and the treatment outcome to be prognosticated. Working with the proposed system, the surgeon is able to map an effective photocoagulation pattern, which is aimed at providing a higher-quality diabetic retinopathy laser treatment when compared with the current approaches. The software developed is intended for the use of high-performance algorithms that can be parallelized using either a processor or a graphics processing unit. This allows achieving high data processing speed, which is so important for medical systems.
15.1 Introduction Artificial intelligence technologies are penetrating all spheres of human activity. The modern medicine is one of the most high-tech industries. Recently the introduction N. Ilyasova (B) · N. Demin · A. Shirokanev Image Processing Systems Institute of the RAS—Branch of the FSRC “Crystallography and Photonics” RAS, Molodogvardeyskay, 151, 443001 Samara, Russia e-mail: [email protected] N. Andriyanov Financial University Under the Government of the Russian Federation, Leningradsky pr-t, 49, Moscow, Russia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_15
of artificial intelligence and digital medicine technologies into healthcare practice is rapidly changing the methods of diagnosis and treatment [1, 2]. Increasingly, robotic systems are being used to support the diagnosis and treatment of diseases [3]. Ophthalmology is in dire need of a transition to personalized medicine, which would make it possible to make a qualitative leap in the treatment of eye diseases [4]. However, this transition is impossible without the development and implementation of fundamentally new intelligent methods for analyzing patients' biomedical data. Diabetic retinopathy (DR) is often found in diabetes patients, triggering severe complications [5–7]. With timely treatment, vision loss can be prevented in more than 50% of cases [8–12]. A key instrument for the treatment of diabetic retinopathy is laser photocoagulation, in which a series of well-measured photocoagulates are inflicted on retina areas with pathology [13–16]. Modern systems mainly rely on the use of a preset pattern for generating a photocoagulation map [14–16]. However, due to the highly variable shapes of the macular edema and vascular system, a uniform photocoagulation map cannot be realized using a standard pattern [14, 15]. The state-of-the-art system NAVINAS provides an option of manually mapping a coagulation plan, which is then used for automatically guiding the laser [17–21]. However, the ophthalmologist first needs to analyze the retina and eye fundus condition to ensure that the photocoagulates are inflicted in admissible areas. On the one hand, this method provides a more effective laser photocoagulation given a correctly mapped pattern, but on the other hand, it takes the surgeon extra time to analyze the retina condition. The aim of the research is to increase the efficiency of retinal laser coagulation by developing information technology that allows implementing a personalized approach to the treatment of diabetic macular edema (DME). To this end, a new technique for applying coagulates was used, which takes into account various properties of the overall arrangement of the coagulates. The method of preliminary planning of the coagulate locations takes into account the individual features of the arrangement of anatomical structures in the area of edema and its shape. To obtain optimal results of laser treatment of DME, a method was used with personalized placement of coagulates at equal distances from each other, taking into account the individual characteristics of anatomical structures and edema boundaries in a particular patient. In this work, we propose an architecture of the software for automatically mapping an effective photocoagulation pattern for the laser treatment of DR. The architecture comprises a number of modules, including image processing methods, algorithms for photocoagulation pattern mapping, and intelligent analysis methods.
15.2 Technology of Mapping a Laser Photocoagulation Plan

In the first stage of the proposed information technology for mapping a photocoagulation pattern, the laser photocoagulation region of interest (ROI) is extracted; for this, the result of the retinal image segmentation needs to be known. Methods for retinal image segmentation work by extracting different classes of objects in the image. In the retinal image, one may observe areas prohibited for the laser light
exposure. The prohibited areas include, first of all, the optic nerve disk, the fovea, and blood vessels. The retinal image may also contain pathological formations, such as edema and exudates, which should not be exposed to the laser light to avoid negative effects. Retinal image segmentation was done by calculating texture features in image neighborhoods, followed by feature-based classification. Out of the large set of features, the most informative ones were chosen to simplify the classification process. In biomedical image recognition and analysis, the majority of tasks are solved using texture features [22–24], which are abundant. The studies looked into calculating the following groups of features: (a) histogram features; (b) gradient features; and (c) Haralick features. Texture features have been utilized for extracting vessels, exudates, and other elements in the retinal image. Some elements are better extracted using geometric parameters. For instance, a method for extracting the disk of the optic nerve (DON) by means of a local fan transform and evaluation of the vessel direction at the DON edge has been reported [25]. The texture features are mainly utilized for extracting free-form objects that can be located anywhere across the image. The main drawback of texture-feature-based segmentation is its high computational complexity. An alternative approach is based on the use of the convolutional neural network U-Net, whose key drawback is the need for a fairly large sample with the ROIs marked off by an expert physician [26]. Figure 15.1 illustrates segmentation examples for various areas in the retinal image. Provided that the prohibited areas are extracted accurately, the system will ensure safe exposure of the retina to the laser light, as it will avoid the prohibited retina areas. In this respect, segmentation plays a key role in providing safe DR treatment. The method of extracting the ROI for laser treatment presupposes that prohibited areas are excluded from the pathological zone, with the latter being outlined based on optical coherence tomography (OCT) data. Using OCT images, it is possible to reconstruct a map of thickness deviations of a given retina from a healthy one, with high deviations indicating that the given region needs to be photocoagulated. The deviation map is built by evaluating the difference between the OCT-aided retinal thickness map and that of a healthy retina. As a result, a map of the regions to be photocoagulated is created [27, 28].
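As an illustration of the texture-feature stage, the following sketch computes Haralick (gray-level co-occurrence) features in a sliding window and classifies each neighborhood with a previously trained classifier. The window size, feature subset, and classifier interface are illustrative assumptions rather than the parameters used by the authors.

```python
# A minimal sketch of texture-feature-based segmentation (assumed parameters).
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def window_features(patch):
    """Haralick features of one grayscale uint8 patch."""
    glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "homogeneity", "energy", "correlation")
    return np.array([graycoprops(glcm, p).mean() for p in props])

def segment(image, classifier, win=15):
    """Label each pixel by classifying the texture of its neighborhood."""
    h, w = image.shape
    labels = np.zeros((h, w), dtype=np.uint8)
    half = win // 2
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = image[y - half:y + half + 1, x - half:x + half + 1]
            labels[y, x] = classifier.predict(window_features(patch)[None])[0]
    return labels
```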
Fig. 15.1 An example of fundus image segmentation: original image (left); segmented image: the optic disc, fovea, exudates and vessels are highlighted (right)
Table 15.1 Feature values for various algorithms for mapping a coagulation plan

Algorithm               Variance   Median   Number
Random map              6.32       31.62    223
Square map              6.09       30.00    220
Hexagonal map           7.68       30.00    248
Wave map                0.95       30.08    311
Boundary-adaptive map   0.70       30.07    315
Ordered map             0.19       30.08    312
The algorithms for creating a photocoagulation plan operate by marking future photocoagulation spots in the ROI while aiming to attain optimal parameters of a suitable photocoagulation pattern [28, 29]. Most importantly, the plan needs to provide safe photocoagulation, meaning that the inter-coagulate distance should not be smaller than the preset one. Equally important, the pattern-based treatment should produce a therapeutic effect. The effectiveness of a photocoagulation pattern can be estimated by calculating a number of features relating to the mutual position of the coagulates. The majority of these features are best described in terms of the inter-coagulate distance. The sample of distances can be formed in a variety of ways, for instance, by conducting a Delaunay triangulation on the marked points. In [28], seven algorithms for mapping a photocoagulation pattern were proposed. Each of the proposed algorithms offers its own photocoagulation plan, for which characteristics such as the variance, the median, and the number of coagulates primarily need to be analyzed (Table 15.1). The proposed techniques were used as a basis for a technology of mapping a photocoagulation pattern, allowing the efficacy of DR treatment to be enhanced (Fig. 15.2). The technology is aimed at mapping an effective photocoagulation pattern [27]. The ophthalmologist is able to correct the processing result for any block in the diagram in Fig. 15.2. For instance, the ophthalmological surgeon can correct the outline of the ROIs for laser treatment if, in their opinion, not all of the ROIs have been automatically marked off.
Fig. 15.2 Technology of mapping a photocoagulation plan of laser coagulation
Among the above-described stages, the data matching stage presupposes that key points can be marked off both manually and using automatic algorithms. The experience of practicing eye surgeons suggests that marking off the key points is not a problem. What does present a problem is the need to perform manual segmentation of retinal images and manual laying-out of the coagulation pattern, as both procedures are cumbersome and subjective. Because of this, automation of these procedures is key to developing the technology. For the pathological zone to be extracted, one needs information on deviations of the retina thickness from the normal values, which can be derived using the medical software SOCT [30]. By performing preprocessing, the software builds a map of deviations of a particular retina from the norm. Enhanced deviations indicate that the given zone has a pathology. A fundus image reconstructed from OCT data needs to be aligned with a fundus-camera retinal image. The technology proposed herein suggests that key points should be marked in the reconstructed retinal image and the fundus image, followed by marking off a pathological zone, which is then aligned with the fundus image. The zone of laser exposure, which is generated automatically based on the segmentation result and the pathological zone, can be corrected manually if necessary. At the final step, a photocoagulation pattern is mapped, for which quality characteristics are calculated and the probability of a successful outcome of laser treatment is evaluated.
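The data matching step can be illustrated with the following sketch, which estimates a similarity transform from the marked key points and warps the OCT-derived pathological-zone mask into the coordinates of the fundus-camera image. The function and variable names are hypothetical; the published system is a C#/C++ complex, and this is not its code.

```python
# A hedged sketch of key-point-based alignment of OCT data with a fundus image.
import cv2
import numpy as np

def align_pathology_mask(oct_points, fundus_points, pathology_mask, fundus_shape):
    src = np.asarray(oct_points, dtype=np.float32)
    dst = np.asarray(fundus_points, dtype=np.float32)
    M, _ = cv2.estimateAffinePartial2D(src, dst)        # rotation + scale + shift
    h, w = fundus_shape[:2]
    return cv2.warpAffine(pathology_mask, M, (w, h), flags=cv2.INTER_NEAREST)
```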
15.3 Technology of the Intelligent Analysis of a Preliminary Photocoagulation Pattern

As long as a minimal inter-coagulate distance is observed, mapping a photocoagulation pattern within a pathological zone guarantees safe photocoagulation, because such an approach excludes two possible problems: exposure of prohibited areas to the laser light and excessive retina damage due to a very small distance between neighboring coagulates. Nonetheless, even if the laser parameters are chosen correctly, minor damage to the retina due to micro-burns cannot be ruled out. Although such damage is usually insignificant, it should be avoided wherever possible. An important problem is analyzing the mutual arrangement of photocoagulates resulting from planning. Characteristics of the photocoagulation plan are able to provide a prognosis of the laser coagulation outcome. In any case, to be able to estimate its various properties, the preliminary plan needs to be described quantitatively. We note that the preliminary plan can be mapped using an arbitrary technique, including a manual one. A photocoagulation plan comprises an array of points, each of which is characterized by certain parameters. The parameters affect the degree of burn at the exposed points and can be evaluated using the technique described in [31, 32]. The laser treatment parameters can be fitted in an optimal way for any layout of points, given that the minimal distance is observed. With all the points known to be located
Fig. 15.3 Examples of the Delaunay triangulation using the present points corresponding to coagulate centers (left and right)
in the ROI that needs to be exposed to laser treatment, inter-point distances come to the forefront. For a distance sample to be generated, a point-connecting technique needs to be chosen based on some rule. Next, using the standard Euclidean measure, the values of the distances are calculated and written into a general sample. Noise distances are then excluded and statistical characteristics are calculated, before being written into the general set of features. Based on their expertise, ophthalmologists [14, 15] suggest using statistical characteristics such as the variance of mutual distances, the mathematical mean, and so on. Medical doctors used to analyze the uniformity of the photocoagulation pattern primarily based on the variance. The features used include various statistical characteristics of the inter-coagulate distance (mutual location features) and features corresponding to the coagulation pattern volume and the area covered (general features). Figure 15.3 illustrates a Delaunay triangulation performed with respect to the coagulate center points. The triangulation distances are written into the general sample, from which distances disobeying the three-sigma rule are then excluded. Such distances are marked red in Fig. 15.3. Among the statistical characteristics, the following were chosen: the arithmetic mean, variance, root-mean-square deviation, median, asymmetry, kurtosis, minimal value, and maximal value [33]. These characteristics form a basis for evaluating the uniformity and balance of the photocoagulation pattern. Alongside the statistical characteristics, an important feature is the number of points in the photocoagulation plan. As an extra feature, the number of local regions in the coagulation pattern may be used. The techniques proposed for finding matching points include the following algorithms: an algorithm for nearest point searching (NearestPoint algorithm), Delaunay triangulation (GenDelaunay algorithm), and an algorithm for extracting local regions followed by Delaunay triangulation in each region (LocDelaunay algorithm). The LocDelaunay algorithm is a generalization of the GenDelaunay algorithm. If the photocoagulation plan contains a single region, the results of both algorithms are identical. If there are several local regions in the plan, the GenDelaunay algorithm will connect even distant points (Fig. 15.3). However, as a rule, the distances for such points are excluded as noise.
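A minimal sketch of the GenDelaunay feature group is given below: the planned coagulate centers are triangulated, unique edge lengths are collected, distances outside the three-sigma band are discarded, and the eight statistics listed above are computed. Array shapes and function names are illustrative assumptions.

```python
# Sketch of Delaunay-based inter-coagulate distance statistics (assumed input:
# an (N, 2) array of coagulate centers).
import numpy as np
from scipy.spatial import Delaunay
from scipy.stats import skew, kurtosis

def delaunay_distance_features(points):
    tri = Delaunay(points)
    edges = set()
    for a, b, c in tri.simplices:                     # collect unique triangle edges
        edges |= {tuple(sorted(e)) for e in ((a, b), (b, c), (a, c))}
    d = np.array([np.linalg.norm(points[i] - points[j]) for i, j in edges])
    d = d[np.abs(d - d.mean()) <= 3 * d.std()]        # exclude "noise" distances
    return {
        "mean": d.mean(), "variance": d.var(), "std": d.std(),
        "median": np.median(d), "skewness": skew(d), "kurtosis": kurtosis(d),
        "min": d.min(), "max": d.max(),
    }
```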
Table 15.2 The number of features for each group

Name of feature group   Feature group NearestPoint   Feature group GenDelaunay   Feature group LocDelaunay   Extra features
Number of features      8                            8                           8                           2
Fig. 15.4 Technology of the intelligent analysis of the photocoagulation plan
Table 15.2 shows the number of features for each group. In total, 26 features were selected and then analyzed using an in-house technology of intelligent data analysis (Fig. 15.4). The technology allows analyzing the classification quality of both the initial features and the features selected on the basis of discriminant analysis, which relies on evaluating the linear separability of classes. Discriminant analysis aims to transform the initial features so as to maximize the separability criterion [33]. Ophthalmology surgeons argue that laser treatment is most effective when the photocoagulation pattern is laid out in the most uniform way. The evaluation of efficacy may be based on the use of intelligent analysis methods applied to a sample containing information on the photocoagulation pattern and the outcome of DR treatment. These methods allow one to construct a high-efficiency classifier for prognosticating the surgery outcome.
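The discriminant-analysis step can be sketched as follows: the 26 pattern features are projected onto the direction that maximizes class separability, and a simple classifier is scored on the transformed features. The feature matrix, the labels, and the choice of the downstream classifier are assumptions made for illustration.

```python
# A hedged sketch of feature transformation by discriminant analysis followed by
# classification quality estimation; X has shape (n_patterns, 26), y holds the
# binary treatment-outcome labels (assumed data).
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def outcome_classifier_quality(X, y):
    model = make_pipeline(LinearDiscriminantAnalysis(n_components=1),
                          KNeighborsClassifier(n_neighbors=5))
    return cross_val_score(model, X, y, cv=5).mean()   # mean classification accuracy
```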
15.4 Computer-Aided System for Mapping and Analyzing a Preliminary Photocoagulation Pattern

The above-described technology formed the basis of a computerized system for mapping and analyzing a preliminary photocoagulation pattern for DR laser treatment. The architecture of the system, shown in Fig. 15.5, comprises several modules, which realize the proposed methods and algorithms. The software complex was implemented on the Microsoft .NET Framework 4.5.2 platform, with C# used as the basic language, C++ used for implementing high-speed algorithms, and Microsoft Visual Studio serving as the development environment. The software developed is intended for the use of high-performance algorithms that can be parallelized using either the processor or a graphics processing unit. Parallelization
Fig. 15.5 Architecture of the software complex for DR laser treatment
using CUDA technology is used to quickly calculate texture features. The sequential algorithm for calculating features is implemented in C++. The architecture allows data exchange through the network if one needs to connect an external algorithm, say, an algorithm for retinal image segmentation based on neural networks. External algorithms work as RESTful web services with data exchange in JSON. The software complex is logically divided into the following systems: a system for retinal image data input, a system for aligning retinal images and OCT data, a system for extracting the ROI for laser treatment, a system for mapping a photocoagulation pattern, and a system for the intelligent analysis of the photocoagulation pattern. Data input involves uploading the retinal image and OCT data of a patient. Upon aligning the retinal image and OCT data, the key points are marked either manually or automatically, with a correction option available. When extracting a laser photocoagulation zone, the optic nerve disk and the fovea are extracted manually, with the subsequent stages of extracting the laser treatment zones being performed automatically. The system for mapping the photocoagulation pattern automatically lays out photocoagulation points within the extracted zone, but, if necessary, the surgeon can introduce corrections by shifting, removing, or adding photocoagulation spots. The intelligent analysis system calculates features and displays the quantitative data with which the surgeon can analyze the photocoagulation plan. Classification allows one to evaluate the probability of a successful laser treatment outcome. This module is trainable: after the corresponding data are input, it can be retrained to recognize new data, expanding the learning sample.
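Purely as an illustration of the external-algorithm interface mentioned above, the following sketch posts an image to a RESTful segmentation service and reads back a JSON response. The endpoint URL and field names are assumptions and are not part of the published system.

```python
# Illustrative only: calling an external segmentation algorithm over HTTP/JSON.
import base64, json
from urllib import request

def segment_remotely(image_bytes, url="http://localhost:8080/segment"):
    payload = json.dumps({"image": base64.b64encode(image_bytes).decode()}).encode()
    req = request.Request(url, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["mask"]   # hypothetical response field
```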
Fig. 15.6 Flowchart of a submodule for the intelligent analysis of features
Fig. 15.7 Graphical interface of the system
Figure 15.6 shows a flowchart of the submodule for the intelligent analysis of features, which allows the information on the laser treatment outcome to be output. Figure 15.7 shows the graphical interface of the system. Summing up, in this work we have proposed a software complex that allows the surgeon to map a preliminary laser photocoagulation plan and analyze its efficacy on the basis of quantitative data. The system also allows the interim stages of the procedure to be corrected.
15.5 Conclusion

We have proposed a software complex for mapping and analyzing a preliminary photocoagulation plan for the laser treatment of diabetic retinopathy. The software is aimed at automatically mapping a recommended photocoagulation plan and provides for the correction of interim results. The feasibility of introducing corrections at any interim stage of data processing in the computerized system makes for a safe treatment. A key module of the proposed software architecture is the system for the intelligent analysis of the photocoagulation pattern, allowing the proposed plan to be analyzed and the treatment outcome to be prognosticated. Working with the proposed system, the surgeon is able to input a patient's data and map an effective photocoagulation pattern, which is aimed at providing a higher-quality DR laser
treatment when compared with the current approaches. In the future, we plan to adapt the system to novel techniques of interim data processing.

Acknowledgements This work was funded by the Russian Foundation for Basic Research under RFBR grant no. 19-29-01135 and by the Ministry of Science and Higher Education of the Russian Federation within a government project of the FSRC "Crystallography and Photonics" RAS.
References
1. Rottier, J.B.: Artificial intelligence: reinforcing the place of humans in our healthcare system. Rev. Prat. 68(10), 1150–1151 (2018)
2. Fourcade, A., Khonsari, R.H.: Deep learning in medical image analysis: A third eye for doctors. J. Stomatol. Oral Maxillofacial Surg. 120(4), 279–288 (2019)
3. Gao, A., et al.: Progress in robotics for combating infectious diseases. Sci. Robot. 6(52), 1–17 (2021)
4. Trinh, M., Ghassibi, M., Lieberman, R.: Artificial Intelligence in retina. Adv. Ophthalmol. Optometry 6, 175–185 (2021)
5. Vorobieva, I.V., Merkushenkova, D.A.: Diabetic retinopathy in patients with type 2 Diabetes Mellitus. Epidemiology, a modern view of pathogenesis. Ophthalmology 9(4), 18–21 (2012)
6. Dedov, I.I., Shestakova, M.V., Galstyan, G.R.: Prevalence of type 2 Diabetes Mellitus in the adult population of Russia (NATION study). Diabetes mellitus 19(2), 104–112 (2016)
7. Tan, G.S., Cheung, N., Simo, R.: Diabetic macular edema. Lancet Diab. Endoc 5, 143–155 (2017)
8. Amirov, A.N., Abdulaeva, E.A., Minkhuzina, E.L.: Diabetic macular edema: Epidemiology, pathogenesis, diagnosis, clinical presentation, and treatment. Kazan Med. J. 96(1), 70–74 (2015)
9. Doga, A.V., Kachalina, G.F., Pedanova, E.K., Buryakov, D.A.: Modern diagnostic and treatment aspects of diabetic macular edema. Ophthalmol. Diabetes 4, 51–59 (2014)
10. Bratko, G.V., Chernykh, V.V., Sazonova, O.V.: On early diagnostics and the occurrence rate of diabetic macular edema and identification of diabetes risk groups. Siberian Sci. Med. J. 35(1), 33–36 (2015)
11. Wong, T.Y., Liew, G., Tapp, R.J.: Relation between fasting glucose and retinopathy for diagnosis of diabetes: three population-based cross-sectional studies. Lancet 371(9614), 736–743 (2008)
12. Acharya, U.R., Ng, E.Y., Tan, J.H., Sree, S.V., Ng, K.H.: An integrated index for the identification of diabetic retinopathy stages using texture parameters. J. Med. Syst. 36(3), 2011–2020 (2012)
13. Astakhov, Yu.S., Shadrichev, F.E., Krasavina, M.I., Grigorieva, N.N.: Modern approaches to the treatment of diabetic macular edema. Ophthalmol. Statements 4, 59–69 (2009)
14. Zamytsky, E.A., Zolotarev, A.V., Karlova, E.V., Zamytsky, P.A.: Analysis of the coagulates intensity in laser treatment of diabetic macular edema in a Navilas robotic laser system. Saratov J. Med. Sci. Res. 13(2), 375–378 (2017)
15. Zamytskiy, E.A., Zolotarev, A.V., Karlova, E.V., et al.: Comparative quantitative assessment of the placement and intensity of laser spots for treating diabetic macular edema. Russ. J. Clin. Ophthalmol. 21(2), 58–62 (2021)
16. Kotsur, T.V., Izmailov, A.S.: The effectiveness of laser coagulation in the macula and high-density microphotocoagulation in the treatment of diabetic maculopathy. Ophthalmol. Statements 9(4), 43–45 (2016)
17. Kozak, I., Luttrull, J.: Modern retinal laser therapy. Saudi J. Ophthalmol. 29(2), 137–146 (2014)
18. Kernt, M., Cheuteu, R., Liegl, R.: Navigated focal retinal laser therapy using the NAVILAS® system for diabetic macula edema. Ophthalmologe 109, 692–700 (2012)
19. Ober, M.D.: Time required for navigated macular laser photocoagulation treatment with the Navilas®. Graefes Arch. Clin. Exp. Ophthalmol 251(4), 1049–1053 (2013)
20. Syeda, A.M., Hassanb, T., Akramc, M.U., Nazc, S., Khalid, S.: Automated diagnosis of macular edema and central serous retinopathy through robust reconstruction of 3D retinal surfaces. Comput. Methods Programs Biomed. 137, 1–10 (2016)
21. Chhablani, J., Kozak, I., Barteselli, G., Oman, S. El-E.: A novel navigated laser system brings new efficacy to the treatment of retinovascular disorders. J. Ophthalmol. 6(1), 18–22 (2013)
22. Septiarini, A., Harjoko, A., Pulungan, R., Ekantini, R.: Automatic detection of peripapillary atrophy in retinal fundus images using statistical features. Biomed. Signal Process. Control 45, 151–159 (2018)
23. Hei Shun, Yu., Tischler, B., Qureshi, M.M., Soto, J.A., Anderson, S., Daginawala, N., Li, B., Buch, K.: Using texture analyses of contrast enhanced CT to assess hepatic fibrosis. Eur. J. Radiol. 85(3), 511–517 (2016)
24. Ilyasova, N., Paringer, R., Kupriyanov, A.: Intelligent feature selection technique for segmentation of fundus images. In: 7th International Conference on Innovative Computing Technology, pp. 138–143. Luton, UK (2017)
25. Anan'in, M.A., Ilyasova, N.Yu., Kupriyanov, A.V.: Estimating directions of optic disk blood vessels in retinal images. Pattern Recogn. Image Anal. Adv. Math. Theory Appl. 17(4), 523–526 (2007)
26. Mukhin, A., Kilbas, I., Paringer, R., Ilyasova, N.: Application of the gradient descent for data balancing in diagnostic image analysis problems. In: 2020 International Conference on Information Technology and Nanotechnology, pp. 1–4. IEEE Xplore, Russia (2020)
27. Ilyasova, N.Yu., Shirokanev, A.S., Kupriynov, A.V., Paringer, R.A.: Technology of intellectual feature selection for a system of automatic formation of a coagulate plan on retina. Computer Opt. 43(2), 304–315 (2019)
28. Shirokanev, A.S., Kirsh, D.V., Ilyasova, N.Yu., Kupriynov, A.V.: Investigation of algorithms for coagulate arrangement in fundus images. Computer Opt. 42(4), 712–721 (2018)
29. Kazakov, A.L., Lebedev, P.D.: Algorithms of optimal packing construction for planar compact sets. Comput. Methods Program. 16, 307–317 (2015)
30. Tamborski, S., Wróbel, K., Bartuzel, M., Szkulmowski, M.: Spectral and time domain optical coherence spectroscopy. Opt. Lasers Eng. 133, 106120 (2020)
31. Shirokanev, A., Ilyasova, N., Andriyanov, N., Zamytskiy, E., Zolotarev, A., Kirsh, D.: Modeling of fundus laser exposure for estimating safe laser coagulation parameters in the treatment of diabetic retinopathy. Mathematics 9, 967 (2021)
32. Shirokanev, A.S., Andriyanov, N.A., Ilyasova, N.Y.: Development of vector algorithm using CUDA technology for three-dimensional retinal laser coagulation process modeling. Comput. Opt. 45(3), 427–437 (2021)
33. Fukunaga, K.: Introduction to Statistical Pattern Recognition. Academic, New York (1979)
Chapter 16
Development and Research of Intellectual Algorithms in Taxi Service Data Processing Based on Machine Learning and Modified K-means Method

Nikita Andriyanov, Vitaly Dementiev, and Alexandr Tashlinskiy

Abstract The study is devoted to the development of an algorithm for classifying the work of queuing services based on cluster analysis with the refinement of clusters using a doubly stochastic model. The developed algorithm is compared with other known clustering/classification methods. Comparison is based on the labels set by experts for orders described by many parameters. The study identified the most significant parameters that were selected for the analysis of orders in queuing systems. The solution to the problem of predicting effective orders by the estimated parameters is obtained. The unsupervised learning approach using doubly stochastic autoregression proposed in the paper provided an increase in the accuracy in the classification task compared to traditional machine learning algorithms. The approach can be successfully used by queuing services to adjust pricing policy.
16.1 Introduction

The solution of a number of complex applied problems today is possible with the use of artificial intelligence systems. Moreover, such systems can often work even better than humans, and are naturally faster. In machine learning, most tasks involve developing and researching new, customized methods. The number of articles in the field of data mining using machine learning algorithms only grows every year. That being said, the vast majority of machine learning models are based on mathematical and statistical calculations.
At the same time, despite the increasing complexity of the methods and models used, binary classification problems remain extremely relevant. This is because they are easier to interpret, and such algorithms can be applied in business. For example, the problem of credit scoring can be singled out [1]. This problem is a special case of predicting the future success or failure of a loan repayment event based on a number of features. Advanced results in this group of problems are provided by algorithms based on ensembles of machine learning models [2]. Ensembles are often based on decision trees [3], which require a rather long training procedure before the model can be used. In addition, during training the model can be overfitted, which leads to poor quality on new data. This paper examines algorithms that do not require training and are based on clustering using Gaussian mixture models [4]. At the same time, the transition from clustering to classification, given that there are only two classes, is not a difficult task. It is important that the output of the Gaussian mixture model is the distributions of the parameters of objects within each cluster, the main parameters being the mathematical expectations, which act as conditional centers of the clusters. To improve the results, it is proposed in this work to re-estimate the mean values of the distributions by means of mathematical models [5–8]. Recently, information technologies have been introduced into almost all areas of activity, from estimating the state of technical objects [9] to transport, including the carriage of passengers [10–12]. By now, a great deal of information about the work of taxi ordering services has been collected during their operation. Data volumes can reach hundreds of gigabytes and terabytes. In this regard, the issue of developing algorithms for the automatic analysis of these data becomes relevant. One of the interesting areas, similar to credit scoring, is quick decision making on an order. In this case, for each order that has just arrived, it is decided whether it is profitable for the taxi ordering service or not. It is clear that, with a percentage-based fee or a daily payment, the service itself does not care what the internal content of the order is; however, the cost of the order is formed according to these parameters. On the basis of this cost, experts can assess whether it was overestimated, underestimated, or corresponded to the real state of affairs when the order was placed. One of the key areas in data mining is data classification, which, one way or another, comes down to recognizing features within the data. The solution to this problem today has a fairly large applied field of application. However, analysis of the literature shows that, although taxi service data can be processed using modern efficient algorithms, such processing usually remains an internal process of the data owner. In this regard, it is interesting to consider this particular applied problem, since the automation of processes in this area is already achieving significant results, and in the near future its share will only increase.
16.2 Methods and Data

The Gaussian Mixture Model (GMM) can easily be applied when the distribution of object parameters is successfully described by Gaussian functions. In the simplest case, there can be two classes of objects described by a single parameter, with the internal parameters of the distribution of this parameter differing between objects of different classes. If we are talking about a Gaussian or normal distribution, the internal parameters are the mathematical expectation and the variance. The one-dimensional Gaussian probability density function (PDF) is written in the form (16.1):

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp\left( -\frac{(x - m_x)^2}{2\sigma_x^2} \right),   (16.1)

where m_x is the average value of parameter X and \sigma_x^2 is the variance of parameter X. From Eq. (16.1) it follows that such a distribution describes only one object endowed with one parameter. At the same time, the transition to the description of a group of objects is not difficult: Eq. (16.1) can be extended by summation to Eq. (16.2):

f_{GMM}(x) = \frac{1}{N} \sum_{i=1}^{N} f_i(x),   (16.2)

where N is the total number of objects in the distribution. Figure 16.1 shows the form of the PDF for two objects with one parameter, where m_{x1} = 0, \sigma_{x1}^2 = 1 and m_{x2} = 7, \sigma_{x2}^2 = 2. Equation (16.2) also assumes that the ratio of objects in the sample is the same. If the proportional contributions of the objects are denoted as p_1, p_2, \ldots, p_N, where the sum of all proportions is equal to 1, then the mixture model takes the form of Eq. (16.3).

Fig. 16.1 PDF of the simplest model of Gaussian mixtures
Fig. 16.2 PDF of proportional Gaussian mixtures model
f_{GMM}(x) = \sum_{i=1}^{N} p_i f_i(x).   (16.3)
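The following small sketch (illustrative only) evaluates the proportional two-component mixture of Eqs. (16.1)–(16.3) with the parameters quoted for Figs. 16.1 and 16.2 and assigns an observation to the component with the larger weighted density, which is the clustering rule discussed in the text.

```python
# Illustrative two-component proportional Gaussian mixture (assumed parameters
# from Figs. 16.1 and 16.2: m1 = 0, var1 = 1, m2 = 7, var2 = 2, weights 0.75/0.25).
import numpy as np
from scipy.stats import norm

means, variances, props = np.array([0.0, 7.0]), np.array([1.0, 2.0]), np.array([0.75, 0.25])

def mixture_pdf(x):
    return sum(p * norm.pdf(x, m, np.sqrt(v)) for p, m, v in zip(props, means, variances))

def assign_cluster(x):
    densities = [p * norm.pdf(x, m, np.sqrt(v)) for p, m, v in zip(props, means, variances)]
    return int(np.argmax(densities))

print(assign_cluster(2.0))   # -> 0, closer to the first (heavier) component
```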
Figure 16.2 shows an example of a Gaussian mixture model in which 75% of the objects belong to the first class and 25% of the objects belong to the second class, with the same internal parameters of the distributions as in Fig. 16.1. From the graph shown in Fig. 16.2, it can be seen that the accepted parameter values correspond to the maximum probability that the object being clustered belongs to the first cluster. Analysis of the PDFs in Figs. 16.1 and 16.2 allows one to conclude that, when the mathematical expectations of the distributions of different objects are sufficiently far apart, extremely high clustering accuracy is achievable. At the same time, it is important to select precisely those parameters that describe their classes optimally. Nevertheless, even Eq. (16.3) describes only a fairly abstract, simple model. Increasing the depth of the model can facilitate more detailed cluster analysis. At the same time, the number of nested parameters grows, and there may be correlations between them. Let us consider the scheme for forming a deep Gaussian mixture model (DGMM) shown in Fig. 16.3. Such a model, like deep neural networks, is described by hidden layers. For the presented example, similarly to Eq. (16.2), we can write an expression for the PDF
Fig. 16.3 The structure of the Deep GMM formation
in the case of a two-layer model, which takes the form of Eq. (16.4):

f_{DGMM}(x) = \frac{1}{N} \sum_{i=1}^{N} f_i(x),   (16.4)

where f_i(x) = \frac{1}{M_i} \sum_{j=1}^{M_i} f_{ij}(x) and M_i is the number of subgroups in the ith group. Finally, the expression for the multilayer PDF is rewritten in the form of Eq. (16.5):

f_{DGMM}(x) = \frac{1}{k_1 k_2 \cdots k_N} \sum_{i=1}^{k_1 k_2 \cdots k_N} f_i(x),   (16.5)
where N is the total number of layers in the model and k_i is the number of subgroups in the ith layer of the model. Using Eq. (16.5), it is possible to write Eq. (16.6), which allows different proportions between all objects, both in the top layer and in the hidden layers:

f_{DGMM}(x) = \sum_{i=1}^{k_1 k_2 \cdots k_N} \left[ \prod_{j=1}^{N} p_{ij} \right] f_i(x),   (16.6)
where the sum of the proportions on each layer is equal to 1, and the sum of the products of the proportions that make up the open layer is 1. Figure 16.4 shows the distributions of the DGMM together with their components for a two-layer model. The components are shown with dashed lines.

Fig. 16.4 Family of PDFs of a two-layer Gaussian model
Now let us look at the preparation of the data to be analyzed. When Gaussian models are used for analysis, all of the above is applicable to the multidimensional case. The only difference is that, in addition to the single parameter X, there are a number of other parameters that determine the dimension of the model. In this case, each object is represented by a point in multidimensional space. To determine the cluster of an object, it is necessary to determine which peak is closest to this point. As part of the study, a small sample of orders was taken. Several characteristics of the order were selected, including the cost. The cost was normalized by the average value, and all characteristics were presented in numerical form. For example, the order time from 0 to 24 h was mapped to the range from 0 to 1. Figures 16.5 and 16.6 show scatter diagrams of prices against other parameters, namely order time (Fig. 16.5) and trip distance (Fig. 16.6). It is important to note that the price data are also presented with a certain proportionality coefficient, since they are confidential information of the taxi ordering service. From the presented figures, one can observe completely different correlations of these parameters with the price: in the first case, there is no correlation between the parameters, while in the second case there is a clear linear dependence. Similar scatter plots could be constructed for a number of the other studied parameters of the system.

Fig. 16.5 Relationship between order time (X-axis) and order price (Y-axis)
Fig. 16.6 Relationship between trip distance (X-axis) and order price (Y-axis)
Table 16.1 Expert assessment of the effective cost of taxi service orders

Order         Order time   Vehicle class   Distance (km)   Number of free cars in the area of order   Weather conditions   Price   Expert opinion
Order №1      08:30 a.m.   Standard        12              1                                          Weak rain            30 $    + (1)
Order №2      10:30 p.m.   Economy         3.6             3                                          Clear                7 $     − (0)
…
Order №1000   02:45 p.m.   Business        1.5             0                                          Cloudy               25 $    + (1)
However, the task of the experts (managers of the queuing service) was to evaluate the efficiency of each order, based on a tabular sample of parameters and the final cost of the order, using one of two options: effective (that is, the cost is reasonable for these parameters), labeled with the sign "+" (binary 1), or ineffective (i.e., the cost must be adjusted), labeled with the sign "−" (binary 0). These data look approximately as in Table 16.1. As can be seen from Table 16.1, such parameters as order time, vehicle class, trip distance, number of free cars in the area, and weather conditions were taken into account. The sample included information on 1000 taxi orders. In the further analysis, both methods that imply training and clustering methods without training were used. In each case, the same test samples were considered, including 300, 500, and 600 orders; accordingly, the learning outcomes in the models where training took place were also evaluated on the same samples. Finally, the paper proposes to refine the clustering using a stochastic model. Consider the idea of this approach. Cluster centers are formed iteratively: when new objects are analyzed and change their clusters, the transition of an object with its own parameters from one cluster to another causes a change in the characteristics of the Gaussian distribution within the cluster. Since the centers of the clusters are formed iteratively, it is possible to construct a doubly stochastic model describing the center of each feature in each cluster in accordance with Eq. (16.7):

X_{im}(t_k) = \hat{X}_{im}(t_k)\,\rho(t_{k-1}) + \sigma_{x_{im}}(t_{k-1}) \sqrt{1 - \rho^2(t_{k-1})}\; \xi_{im}(t_k),   (16.7)
where i is the serial number of a feature, m is the cluster number, t_k is the point in time at which the kth iteration of changing the cluster center is performed, \rho(t_{k-1}) is the correlation estimate based on the previous values of the cluster-center coordinates, \sigma^2_{x_{im}}(t_{k-1}) is the variance estimate of feature X_i in cluster m based on the previous values of the cluster-center coordinates, and \xi_{im}(t_k) is a random term at time t_k having zero mean and unit variance. The value \hat{X}_{im}(t_k) is the center coordinate calculated by standard K-means. On the one hand, Eq. (16.7) is used to generate alternative cluster centers, which provide alternative clustering results; on the other hand, the radius of the generated
cluster centers can be controlled by the correlation ρ in the sequence. However, to ensure convergence, the correlation should tend to unity over time. Thus, at each step the cluster is chosen by comparing a group of clustering results obtained with the K-means-estimated center and with the generated alternative centers. In addition, it is necessary to maintain vectors describing the change of the correlation and the variance of the features in each cluster; on their basis, new values can be predicted using autoregression models, which can be used to nominate two more candidate cluster centers. The decision on the final choice of the cluster is then made according to the majority principle.
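A simplified sketch of this refinement idea is given below (it is not the authors' implementation): alternative centers are generated around the K-means estimate according to Eq. (16.7), and an object is assigned to the cluster supported by the majority of candidate center sets. The number of variants and the way ρ and the per-feature standard deviations are tracked are assumptions.

```python
# Simplified doubly stochastic refinement of cluster centers (assumed interfaces).
import numpy as np

def alternative_centers(kmeans_centers, stds, rho, n_variants=3, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    noise_scale = stds * np.sqrt(1.0 - rho ** 2)                  # cf. Eq. (16.7)
    return [rho * kmeans_centers + noise_scale * rng.standard_normal(kmeans_centers.shape)
            for _ in range(n_variants)]

def assign_by_majority(x, kmeans_centers, variants):
    candidates = [kmeans_centers] + variants
    votes = [int(np.argmin(np.linalg.norm(c - x, axis=1))) for c in candidates]
    return np.bincount(votes).argmax()                            # majority principle
```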
16.3 Results and Discussion

For test samples of 300, 500, and 600 orders whose efficiency had been evaluated by experts, automatic recognition of the order classes was performed. After dividing the orders into two groups, the clustering algorithm was rebuilt to solve the classification problem. For comparison, algorithms based on decision trees (trained on the other 700, 500, and 400 orders, respectively) and an algorithm based on a neural network with error back-propagation, comprising 5 hidden layers with 8 neurons each, were used. The network was also trained on the corresponding samples of 700, 500, and 400 orders.

Table 16.2 Results of comparing the operation of various clustering algorithms for a variety of factors on 300 orders

Model                     Accuracy, %   Precision «+», %   Precision «−», %   Recall «+», %   Recall «−», %
Decision tree             78.0          82.1               63.6               88.9            50.0
Neural network            78.0          76.7               100.0              100.0           21.4
Gaussian mixtures model   78.0          83.8               61.5               86.1            57.1
Ours                      80.0          84.2               66.7               88.9            57.1
Table 16.3 Results of comparing the operation of various clustering algorithms for a variety of factors on 500 orders

Model                     Accuracy, %   Precision «+», %   Precision «−», %   Recall «+», %   Recall «−», %
Decision tree             78.0          82.1               63.6               88.9            50.0
Neural network            78.0          76.7               100.0              100.0           21.4
Gaussian mixtures model   78.0          83.8               61.5               86.1            57.1
Ours                      80.0          84.2               66.7               88.9            57.1
Table 16.4 Results of comparing the operation of various clustering algorithms for a variety of factors on 600 orders

Model                     Accuracy, %   Precision «+», %   Precision «−», %   Recall «+», %   Recall «−», %
Decision tree             83.3          86.4               75.0               90.5            66.7
Neural network            85.0          85.1               86.1               84.6            57.1
Gaussian mixtures model   83.3          88.1               72.2               88.1            72.2
Ours                      86.7          87.0               85.7               95.2            66.7
According to the data presented in Tables 16.2, 16.3, and 16.4, the Gaussian mixture model provides higher accuracy than the neural network, but the decision tree achieved an even better result. The best estimates are provided by the Gaussian mixture model combined with the doubly stochastic estimation of the cluster centers. It should be noted that no training was required for the mixture model. The main metric was the percentage of correct recognitions, i.e., the widely known accuracy. The choice of this metric is explained by the fact that it can be considered as an averaging of the per-class precision and recall metrics, while the original sample contains 720 objects of the positive class and 280 of the negative one. This motivates the further consideration of the precision and recall metrics as well. To accurately identify inefficient orders, it is better to use the decision tree model, which also allows efficient orders to be found with good results in terms of recall. Results for other splittings of the original sample are presented in Tables 16.3 and 16.4. It should be noted that our model provides better results there. In general, all algorithms provide acceptable precision, recall, and accuracy. Precision tells us the probability with which the algorithm correctly identifies a particular class, and recall shows how effectively the algorithm is able to find the objects of a given class among all available ones. On average, the proportion of correct recognitions (accuracy) increases when doubly stochastic models are used. Interestingly, the model showed better performance on the larger number of orders.
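For reference, the per-class metrics reported in Tables 16.2–16.4 can be computed as in the following sketch (the label arrays are assumed; 1 denotes an effective order and 0 an ineffective one).

```python
# Illustrative computation of the metrics used in Tables 16.2-16.4 (assumed data).
from sklearn.metrics import accuracy_score, precision_score, recall_score

def report(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision+": precision_score(y_true, y_pred, pos_label=1),
        "precision-": precision_score(y_true, y_pred, pos_label=0),
        "recall+": recall_score(y_true, y_pred, pos_label=1),
        "recall-": recall_score(y_true, y_pred, pos_label=0),
    }
```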
16.4 Conclusions

The work is devoted to the investigation and development of algorithms for clustering queuing-system information by several parameters, with special attention paid to taxi ordering services. Based on the results of the work of a taxi ordering service, a sample of 1000 orders and a summary of their characteristics were prepared. The following parameters were included in the order characteristics: order time (morning, afternoon, evening, night), distance between the starting and ending points of the route, weather conditions (hot, cold, precipitation, and comfortable conditions), and service class (economy, standard, business). For these orders, their total cost was also added. On the basis of the obtained dataset, experts of the taxi ordering service (taxi managers) performed a binary classification of the proposed orders, as a result of which one more parameter was added to the initial data,
called the "benefit from the order". The experts put the tags "order is beneficial to the taxi service" and "order is not beneficial to the taxi service". Based on the results of such a classification, an initial attempt was made to learn how to determine the profitability or unprofitability of orders, which will subsequently help to find the ratio of profitable to unprofitable orders and to take measures to optimize the pricing policy for unprofitable orders, taking into account the selected criteria. The best clustering results are provided by our proposed model with reconfiguration of the cluster centers using doubly stochastic models. This model gains about 2–3% in accuracy compared with common machine learning methods. In general, depending on the test data, the proposed model is also more effective in terms of recall for "not beneficial" orders, providing a gain of about 10%.

Acknowledgements This work was funded by the Russian Foundation for Basic Research under RFBR grant № 19-29-09048.
References
1. Kozodoi, N., Jacob, J., Lessmann, S.: Fairness in credit scoring: Assessment, implementation and profit implications. CoRR arXiv preprint, arXiv:2103.01907v3 (2021)
2. Niu, B., Ren, J., Li, X.: Credit scoring using machine learning by combing social network information: evidence from peer-to-peer lending. Information 10, 397 (2019)
3. Zhanga, Y., Chia, G., Zhanga, Z.: Decision tree for credit scoring and discovery of significant features: an empirical analysis based on Chinese microfinance for farmers. Filomat 32(5), 1513–1521 (2018)
4. Andriyanov, N.A., Tashlinsky, A.G., Dementiev, V.E.: Detailed clustering based on Gaussian mixture models. Adv. Intell. Syst. Comput. 1251, 437–448 (2021)
5. Filin, Ya. A., Lependin, A.A.: Application of a Gaussian mixture model for verifying a speaker using arbitrary speech and countering spoofing attacks. Multicore processors, parallel programming, FPGAs. Signal Process. Syst. 6, 64–66 (2016)
6. Andriyanov, N., Sonin, V.: The use of random process models and machine learning to analyze the operation of a taxi order service. ITM Web Conf. 30, 04014 (2019)
7. Avdeenko, T., Khateev, O.: Taxi service pricing based on online machine learning. Data Mining Big Data 1071, 289–299 (2019)
8. Krasheninnikov, V.R., Vasil'ev, K.K.: Multidimensional image models and processing. In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Control Systems-3, ISRL 135, 11–64. Springer International Publishing, Switzerland AG (2018)
9. Kuvayskova, Y., Klyachkin, V., Krasheninnikov, V.: Recognition and forecasting of a technical object state based on its operation indicators monitoring results. In: 2020 International Multiconference on Industrial Engineering and Modern Technologies, FarEastCon 2020, pp. 1–6 (2020)
10. Naseer, S., Liu, W., Sarkar, N.I., Shafiq, M., Choi, J.-G.: Smart city taxi trajectory coverage and capacity evaluation model for vehicular sensor networks. Sustainability 13, 10907 (2021)
11. Hassouna, F.M.A., Assad, M.: Towards a sustainable public transportation: replacing the conventional taxis by a hybrid taxi fleet in the west bank, Palestine. Int. J. Environ. Res. Public Health 17, 8940 (2020)
12. Lee, S., Kim, J.H., Park, J., Oh, C., Lee, G.: Deep-learning-based prediction of high-risk taxi drivers using wellness data. Int. J. Environ. Res. Public Health 17, 9505 (2020)
Chapter 17
Using Machine Learning Methods to Solve Problems of Monitoring the State of Steel Structure Elements

Maria Gaponova, Vitalii Dementev, Marat Suetin, and Aleksandr Tashlinskii

Abstract The paper considers the issues related to the monitoring of the state of steel structures. Such monitoring is proposed to be based on the processing of video images obtained during the inspection of structural elements. The processing is performed using combinations of artificial neural networks that have passed special training procedures based on transfer learning technologies. This approach allows the problems of training and further training of neural networks on small samples to be overcome. The received results allow us to claim the possibility of using the developed procedures for searching for important classes of defects (in particular, cracks) in images and determining their parameters (size, class, danger degree, etc.). The quality of these procedures (in particular, the probability of correct detection and of false alarms) appears to be close to the quality of work of competent personnel.
17.1 Introduction

Among the many tasks that arise in digital image processing, a special place is occupied by the tasks of detecting an object in a photo or video image and estimating the parameters of such an object. A typical example of such tasks is monitoring the state of various structural elements, when it is required to test a hypothesis about the presence of a particular defect in a particular image and to evaluate its parameters. This task has acquired particular importance in recent years due to the emergence of new methods for recording images of structural elements associated with the use of unmanned aerial vehicles. Such devices allow obtaining large volumes of high-quality photographic material at minimal cost. However, processing such material manually requires, firstly, significant costs associated with the involvement of high-level specialists and, secondly, can lead to natural human errors. The way out here is to use modern methods of machine learning, the processing quality of
which can be comparable with the results of the work of experts. This paper considers the implementation of such methods, as well as an assessment of their effectiveness. Note that the problem of detecting defects in digital images is a particular case of the general problem of detecting anomalies in such images. Two groups of methods are mainly used for solving these problems at present. The first group is associated with the search for and use of mathematical descriptions of the background image and of the detectable anomalies. A typical example is the Bayes and Neyman–Pearson criteria [1–3], which compare the likelihood ratio, based on the conditional densities of the distribution of observations in the presence and in the absence of anomalies, with threshold values usually determined by the probability of a false alarm [2]. Unfortunately, the complexity and variety of real images of metal structures do not allow adequate mathematical models of such images to be formed. This, in turn, leads to almost insurmountable difficulties in the synthesis of appropriate detectors. A deliberate simplification of the mathematical models, which for this problem use simple color and morphological features [4, 5], leads to significant errors in practical application. For example, Fig. 17.1 shows a characteristic image of a fragment of a steel structure with a crack beginning at the junction of the steel beam and a bridge support. Obviously, the use of classical methods based on the color (brown tint) and morphological (elongated shape) features of the detected object can lead to a significant number of false positives. Indeed, in Fig. 17.1a one can notice several large objects and a large number of small formations that meet these criteria. Figure 17.1b shows the highlighted areas potentially containing a defect. The second group of methods for identifying anomalies in images is associated with the use of machine learning algorithms [6–16]. In this case, specially prepared training samples of large volume containing realizations of a particular anomaly are used. By adapting the weights of artificial neural networks to these samples, for example, on the basis of the error back-propagation algorithm, it is possible to obtain stable procedures for detecting and selecting anomaly regions [7].
Fig. 17.1 Example of: a image containing a defect (crack), b the results of the color-morphological detector
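A schematic illustration (not taken from the paper) of the likelihood-ratio detection mentioned above: a window is declared anomalous when the ratio of the conditional densities under the "defect" and "background" hypotheses exceeds a threshold chosen from the admissible false-alarm probability. The Gaussian pixel model and all parameter values are assumptions.

```python
# Schematic likelihood-ratio (Neyman-Pearson style) anomaly test on a pixel window.
import numpy as np
from scipy.stats import norm

def lr_detector(window, bg_mean, bg_std, defect_mean, defect_std, threshold):
    log_lr = (norm.logpdf(window, defect_mean, defect_std)
              - norm.logpdf(window, bg_mean, bg_std)).sum()
    return log_lr > np.log(threshold)        # True -> window flagged as anomalous
```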
However, a feature of the problem to be solved here is the relatively small volume of images that can be used as elements of the training sample, together with the variety of possible defects. The paper has the following structure. Section 17.2 contains a description of the proposed approaches that make it possible to train a combination of neural networks on small samples of defect images. In Sect. 17.3, the effectiveness of the found algorithms is evaluated in the processing of real photo and video images of defects in metal structures. Section 17.4 contains the main conclusions of the work.
17.2 Processing Architecture Synthesis

In previous work [17, 18], an approach to detecting defects based on the use of the artificial neural network U-Net [11] has already been described. There, we used the classical approach associated with the formation of a training sample, optimization of the weights and network structure for this sample, and verification of the quality of the network on images that were not included in the training sample. To improve the quality of processing, a preliminary selection of areas of attention that potentially contain defects was applied. The fundamental feasibility of such an approach was established, and efficiency estimates were found. In particular, it was shown that the false alarm level of the synthesized detectors is clearly lower than the practical requirements that arise in such tasks. At the same time, serious disadvantages of that solution were identified, associated with high requirements for the volume and quality of the training sample, as well as low performance, which does not allow processing data in real time. To overcome these shortcomings, it is proposed to use joint processing of the images of structural elements potentially containing defects with a combination of artificial neural networks of the YOLO [13] and U-Net [11, 14, 15] families. At the first stage of processing, the original image is passed through the YOLO architecture, as a result of which rectangular areas most likely containing a defect (attention areas) and obvious structural elements (bolts, holes, marks), which can be used to estimate the parameters of the defect, are identified in the image. The identified rectangular regions are then transformed into input data and fed into a modification of the U-Net neural network for segmentation. The encoding part of U-Net was trained for classification on a large sample of real-world ImageNet images and further trained on a specialized Kaggle sample of images of cracks in concrete [19]. The decoding part of U-Net was trained for defect segmentation on a specialized sample of defects using augmentation procedures as part of the transfer learning procedure. Figure 17.2 shows example images from such a labelled training sample. The U-Net training procedure also included a mandatory partition of the images into separate areas of size 256 × 256 (tiling). Note that the transfer learning procedure for image segmentation is used in various fields: medical research, satellite image processing, object segmentation, etc.
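A hedged sketch of such a transfer-learning setup is shown below, written with the segmentation_models_pytorch package (an assumption; the chapter does not name its framework): a U-Net whose VGG-11 encoder is initialized with ImageNet weights, the encoder frozen, and only the decoder trained on the 256 × 256 defect tiles with binary cross-entropy and Adam, as described in Sect. 17.3.

```python
# A minimal transfer-learning sketch; library choice and interfaces are assumptions.
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(encoder_name="vgg11", encoder_weights="imagenet",
                 in_channels=3, classes=1)

for p in model.encoder.parameters():          # freeze the pre-trained encoder
    p.requires_grad = False

loss_fn = torch.nn.BCEWithLogitsLoss()        # binary cross-entropy, as in Sect. 17.3
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)

def train_step(images, masks):
    """One optimization step on a batch of 256 x 256 tiles and binary masks."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), masks)
    loss.backward()
    optimizer.step()
    return loss.item()
```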
Fig. 17.2 Examples of defects from the training set
Table 17.1 Training results of the U-Net neural network (training sample from Russian Railways, defects of various types)

Training the U-Net neural network   Number of trainable parameters   Dice coefficient on the validation set   Dice coefficient on the test set
Training from scratch               7,846,657                        0.72                                     0.71
Further training                    3,134,433                        0.8                                      0.77
Transfer learning results are generally superior to those of training a neural network from scratch. In fact, a direct comparison of U-Net neural networks with identical architecture in detecting defects in photos of steel structures showed that significantly better results are provided by the instance that has undergone the transfer learning procedure. In particular, Table 17.1 shows the results of the search for and selection of cracks in images of steel structures. In this case, the Dice coefficient is used as the metric for selecting the optimal parameters of the specialized part of the neural network. The analysis shows that such an approach allows one to obtain a stable gain of 8–10% in the Dice coefficient for the detected defect area. A certain disadvantage of using the transfer learning technology was some decrease in the confidence of the neural network in its predictions. This fact is expressed in the so-called "reinsurance" of the neural network, when a negative decision on the hypothesis that an object is present in an image fragment is accepted more often. Figure 17.3 shows a typical example of such "reinsurance". Such over-insurance can have a negative impact on the quality of the subsequent analysis of the segmentation results aimed at determining the type, size, shape, and dynamics of crack propagation. A kind of compensation for this effect can be the joint use of semantic segmentation based on the pre-trained U-Net neural network together with the defect detector based on the YOLO network. This design relies on the advantages of pre-localization of attention areas, which are associated with natural markers in the images of the steel structure elements. In this case, in addition to the defect area itself, YOLO highlights areas potentially containing the defect. The analysis shows that for this purpose the network also uses indirect features, such
Fig. 17.3 Defect segmentation in a photographic image of a steel structure: a marked image, b predictions made by U-Net neural network after direct training, c predictions made by U-Net with a coder further trained on crack classification
as typical connections of steel structure elements and other objects of interest, such as drilled crack ends or high-strength bolts. This approach allows one to expect that the localization of attention areas will, in turn, have a positive effect on the segmentation quality and will make it possible to estimate the parameters of the detected defects, in particular their size.
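The combined YOLO + U-Net processing can be summarized by the following high-level sketch; the detect/predict interfaces of the two models are hypothetical placeholders for whatever wrappers are actually used.

```python
# High-level sketch of the joint pipeline: YOLO attention areas -> U-Net masks.
import numpy as np
import cv2

def detect_and_segment(image, yolo_model, unet_model, size=256):
    full_mask = np.zeros(image.shape[:2], dtype=np.uint8)
    for (x, y, w, h) in yolo_model.detect(image):            # attention areas (assumed API)
        crop = cv2.resize(image[y:y + h, x:x + w], (size, size))
        prob = unet_model.predict(crop)                       # (size, size) probabilities (assumed API)
        mask = cv2.resize((prob > 0.5).astype(np.uint8), (w, h),
                          interpolation=cv2.INTER_NEAREST)
        full_mask[y:y + h, x:x + w] |= mask
    return full_mask
```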
17.3 Experiment Description

To implement the above method, two artificial neural networks, YOLO and U-Net, with the architectures shown in Figs. 17.4 and 17.5, were used. The U-Net encoder consists of the convolutional layers of the VGG-11 [12, 15] neural network pre-trained for classification on a set of real-world ImageNet images. To mitigate the problems associated with the vanishing gradient, the parameters of the convolutional layers of the U-Net decoder were initialized with a Kaiming-type method before training. Further training of the U-Net layers for semantic segmentation of images was realized in two stages. In the first stage, the final layers of the U-Net encoder, formed from the convolutional layers of the VGG-11 network, were retrained on images of defects (surface cracks) on the surface of concrete. The volume of this training dataset was 40,000 images (Surface Cracks Detection Dataset, Kaggle). To eliminate the effect of overfitting, augmentation procedures such as scaling, horizontal reflection, and image rotation were used. A tenth of the images from the training sample were reserved for validation during training. In the second stage, the decoder layers of the U-Net network are trained for semantic segmentation of images with defects on the surface of steel structures. The training dataset consists of 278 photographic images of defects (cracks) in steel structures labelled at the pixel level. The labelling consists in the preliminary preparation of a black-and-white mask for each image from the dataset, in which the
Fig. 17.4 U-Net neural network architecture
Fig. 17.5 YOLO neural network architecture
pixels belonging to the defect are highlighted in white and the background pixels in black. The preparation of the training dataset ends with the localization of defects, which consists of selecting rectangular fragments of the images that directly contain defects and resizing the selected regions to 256 × 256 pixels for feeding to the input of the neural network. During localization, four training samples were prepared: 100 images with markers in the defect area, 100 images with defects only, 100 images of the first sample with the markers painted over in the background color, and 100 mixed images with and without markers. The test dataset is a separate set of 89 labelled photographic images of defects in steel structures. To mitigate the imbalance between defect pixels and background pixels in the photographic images, each whole image is cut into separate square fragments of 256 × 256 pixels (tiling).
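As an illustration of the tiling step just described, the following sketch cuts an image into non-overlapping 256 × 256 fragments. The zero-padding behaviour and the function name are assumptions for illustration and are not taken from the authors' code.

```python
import numpy as np

def tile_image(image: np.ndarray, tile: int = 256):
    """Split an H x W (x C) image into non-overlapping tile x tile fragments.

    The image is zero-padded on the right/bottom edges so that its sides
    become multiples of `tile` (an assumed convention).
    """
    h, w = image.shape[:2]
    pad_h, pad_w = (-h) % tile, (-w) % tile
    pad_widths = ((0, pad_h), (0, pad_w)) + ((0, 0),) * (image.ndim - 2)
    padded = np.pad(image, pad_widths)
    tiles = []
    for y in range(0, padded.shape[0], tile):
        for x in range(0, padded.shape[1], tile):
            tiles.append(padded[y:y + tile, x:x + tile])
    return tiles
```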
Table 17.2 The results of individual and joint work of the trained artificial neural networks on the control sample

Type of network                                   False alarm probability   Probability of correct detection
YOLO only                                         0.07                      0.86
U-Net neural network without transfer learning    0.04                      0.75
U-Net neural network with transfer learning       0.01                      0.81
Joint use of U-Net and YOLO networks              0.01                      0.91
The Sorensen coefficient (Dice coefficient) is used as the quality metric for testing. The standard Adam optimization algorithm is used during training, with binary cross-entropy as the loss function. Training is carried out over 100 epochs with a learning rate of 10⁻⁴. The training time is 8–17 h on an RTX 3000 graphics card, and the segmentation time of a single image is 1.3 s on average. The YOLO neural network was trained in a standard way on the same set of pre-marked images of steel structure elements; 20% of the images from the training set were reserved for validation and were not used for training, in order to monitor the dynamics of learning. Training was done on an Nvidia Quadro RTX 3000 video card for 8 h, and the average processing time of a single frame was 0.05 s. The results of the individual and joint work of the trained artificial neural networks on the control sample, whose images were not used for training or validation, are shown in Table 17.2. The volume of the validation set is 100 images. Notably, the described neural network approaches to detecting defects in images of steel structures can, in turn, be successfully used to solve the problem of assessing the temporal dynamics of defect propagation. Within that task, it is advisable to use the segmentation results to assess the size of the identified defects and to compare the selected attention areas in images taken at different time intervals. The size of defects can be determined by using a priori information about the known dimensions of the specified markers in the images of steel structures (see Fig. 17.6). Such markers include elements embedded in the structures themselves (high-strength bolts) or special markers artificially applied to the structural elements. Comparison of defect images is performed using the pseudogradient adaptation method. The developed algorithms for defect matching and detection were applied to the problem of assessing changes in multi-view and multi-temporal images of the defects themselves. The task was to determine whether it is possible to establish that two multi-view images contain the same object; solving this problem made it possible to quantify the changes that occurred in the defects. For this purpose, about 30 pairs of multi-view images containing a crack-type defect were made, and matching of these images was performed. It was determined that 22 pairs matched with a very high
Fig. 17.6 Preliminary localization of the objects of interest by YOLOv3
degree of accuracy (inter-frame correlation over 90%), 4 pairs matched well (inter-frame correlation over 85%), and 4 pairs matched poorly (inter-frame correlation less than 85%). The analysis of the results showed that the quality of matching naturally depends on the difference between the shooting points: the greater this difference, the worse the images are matched. Nevertheless, the obtained results show that high-quality matching of such images is possible in principle. To illustrate this, the following experiment was performed. One of the images in one of the pairs was modified in a photo editor by artificially lengthening the crack and removing one of the visible objects. These images were then matched together. As a result, despite the significant difference between these images (including a scale difference of 1.3 times), the algorithm performed the matching with a correlation of 0.89 (see Fig. 17.7), which allows us to conclude that the images correspond to each other. Then, by successively performing detection and assessment of the defects in these images, it is possible to draw an unambiguous conclusion about the dynamics of crack development in this case.
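The inter-frame correlation figures quoted above can be illustrated with a simple zero-mean normalized cross-correlation between two already aligned grayscale images. This is only a stand-in for the pseudogradient matching procedure actually used, and the function below is an assumption made for illustration.

```python
import numpy as np

def normalized_cross_correlation(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two aligned grayscale images."""
    a = img_a.astype(np.float64).ravel()
    b = img_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0
```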
Fig. 17.7 The result of combination of two images taken at different points in time: a first image, b second image, c result of the combination
17.4 Conclusions The paper considers the automation of monitoring the state of metal structures based on a combination of modified neural network detectors. In particular, the combined use of U-Net and YOLO networks made it possible to detect complex defects (in particular, cracks) in real test video images with an overall probability of correct detection above 0.9 and a false alarm probability below 0.01, which is comparable with the performance of a qualified expert. It is important that these quantitative characteristics were achieved as a result of training on samples of small volume (less than 500 images). In addition, the application of pseudogradient image matching procedures for the comparison of multi-temporal and multi-focus images, together with the proposed detectors, made it possible to solve the important problem of estimating defect parameters and their dynamics.
References 1. Soyfer, V.A.: Methods of Computer Image Processing, 2nd edn. FIZMATLIT, Moscow (in Russian) (2003) 2. Vasil’ev, K.K.: Optimal Signal Processing in Discrete Time. Radiotechnika, Moscow (in Russian) (2016) 3. Andriyanov, N.A, Vasil’ev, K.K., Dementev, V.E.: Investigation of filtering and objects detection algorithms for a multizone image sequence. In: International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences—ISPRS Archives, vol. 42, pp. 7–10. ISPRS Archives, Moscow, Russia (2019) 4. Bouman, C.A.: Model Based Imaging Processing. Purdue University, West Lafayette, IN, USA (2013) 5. Vizilter, Yu.V., Gorbatsevich, V.S., Vishniakov, B.V., Sidyakin, S.V.: Searching for objects in an image using morphlet descriptions. Comput. Opt. 41(3), 406–411 (in Russian) (2017) 6. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521, 436–444 (2015) 7. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989) 8. Chervyakov, N.I., Lyakhov, P.A., Nagornov, N.N., Valueva, M.V., Valuev, G.V.: Hardware implementation of a convolutional neural network using computations in the residual class system. Comput. Opt. 43(5), 857–868 (in Russian) (2019) 9. Raghu, M., Zhang, C., Kleinberg, J., Bengio, S.: Transfusion: Understanding transfer learning for medical imaging. In: 33rd Conference on Neural Information Processing Systems (NeurIPS), pp. 3347–3357. NeurIPS Proceedings, Vancouver, Canada (2019) 10. Gan, Z., Chen, Y., Li, L., Zhu, C., Cheng, Y., Liu, J.: Large-scale adversarial training for visionand-language representation learning. In: 34th Conference on Neural Information Processing Systems (NeurIPS), vol. 1, pp. 1–13. NeurIPS Proceedings, Vancouver, Canada (2020) 11. Ronneberger, O., Fisher, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), vol. 9351, pp. 234–241. Springer, Cham (2015) 12. Ciresan, D.C., Meier, U., Masci, J., Gambardella, L.M., Schmidhuber, J.: Flexible, high performance convolutional neural networks for image classification. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pp. 1237–1242. Barcelona, Spain (2011)
13. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788. IEEE, Las Vegas, NV, USA (2016) 14. Kohl, S.A.A., Romera-Paredes, B., Meyer, C., De Fauw, J., Ledsam, J.R., Maier-Hein, K.H., Eslami, S.M.A., Rezende, D.J., Ronneberger, O.: A probabilistic U-Net for segmentation of ambiguous images. In: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), pp. 1–11. IEEE, Montréal, Canada (2018) 15. Iglovikov, V., Shvets. A.: TernausNet: U-Net with VGG11 encoder pre-trained on ImageNet for image segmentation. CoRR arXiv preprint, arXiv:1801.05746v1 (2018) 16. Ye, L., Liu, Z., Wang, Y.: Learning semantic segmentation with diverse supervision. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–9. IEEE, Lake Tahoe, NV, USA (2018) 17. Dementev, V.E., Gaponova, M.A., Suetin, M.N.: Specific features of software development to solve problems of monitoring the condition of steel and reinforced concrete products. In: Modern Problems of Design, Production and Operation of Radio Technical Systems, pp. 100– 103. Ulyanovsk State Technical University, Ulyanovsk (in Russian) (2020) 18. Dementev, V.E., Gaponova, M.A., Suetin, M.N.: The use of machine learning methods to detect defects in images of metal structures. Pattern Recognit. Image Anal. 31, 506–512 (2021) 19. Concrete Crack Images for Classification, https://www.kaggle.com/thesighsrikar/concretecrack-images-for-classification. Last accessed 31 Jan 2022
Part III
Advances in Intelligent Data Processing and Its Applications
Chapter 18
Accurate Extraction of Human Gait Patterns Using Motion Interpolation Margarita N. Favorskaya
and Konstantin A. Gusev
Abstract Gait-based human recognition is a promising biometric modality for automatic visual surveillance. Over the past three decades, human gait has been the subject of continuous research based on video sequences of varying resolution, scale and duration. The range of distinct features expands when various physical devices are used to capture data about the gait cycle. However, the use of a single camera remains the main technical solution in many surveillance and recognition tasks. This study aims to improve the extraction of human gait patterns using a novel approach to motion interpolation based on optical flow and deep learning techniques. Interpolated optical flow maps provide more information than the binary silhouette technique does, which allows better recognition results to be obtained. We obtained good experimental results for silhouette representations, especially for low-frame-rate test videos. The proposed technique is useful for constructing a gait model of an individual at the training stage.
18.1 Introduction Various physiological and behavioral characteristics can be used to recognize a person, including face, iris, fingerprint, palm print, veins, DNA, gait, voice, typing rhythm, and signature. Each of the characteristics is defined in its own way and has advantages and disadvantages, possibilities and limitations, methods and implementations in our daily life. Gait as a walking style is a well-recognized biometric modality that operates at a distance, and, at the same time, gait features are difficult to hide or spoof. The gait can be studied from a variety of perspectives, for example, in the treatment of diseases of the musculoskeletal system, forensic evidence in court, the recognition of the human personality, and so on. The first video-based gait analysis dates back to the 1990s, and since then it has been a wide field of research activity focused on robust gait representation that is invariant to clothing styles, carrying M. N. Favorskaya (B) · K. A. Gusev Reshetnev Siberian State University of Science and Technology, 31 Krasnoyarsky Rabochy ave., Krasnoyarsk 660037, Russian Federation e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_18
objects, viewing angles, age, gender, walking speed, footwear, time duration between two recordings, injuries, and even emotions. Obviously, the topics mentioned above are difficult to handle in realistic video sequences, leading to new attempts to improve the accuracy and reliability of gait recognition while minimizing computational costs. Vision-based and sensor-based approaches are available for collecting gait information. The sensor-based approach uses wearable sensors, such as an accelerometer and gyroscope attached to the human body, and floor sensors, which typically capture footprint information for medical purposes. However, one of the simplest ways is to obtain gait information captured by conventional indoor and outdoor CCTV cameras. Thus, many algorithms have been developed for this type of information. Our contribution is to improve the quality and quantity of the original video data using motion interpolation based on optical flow analysis. We can augment the data and additionally find the features of a person's internal movement, especially in complex cases. Moreover, inaccurate segmentation of the human body against the background in the wild negatively affects the recognition accuracy, and optical flow analysis helps to solve this problem. The drawback of our approach is the pre-processing required for motion interpolation, which can push the method outside real-time implementation. The rest of the paper is structured as follows. Section 18.2 provides a literature review on methods for improving the original gait data, as well as frame interpolation methods. The proposed method of motion interpolation is presented in Sect. 18.3. Experimental results are reported in Sect. 18.4. Section 18.5 concludes the paper.
18.2 Related Work The physical length of body parts and current motor skills form a person's gait cycle. The gait cycle includes two periods: stance, when the foot is touching the ground, and swing, when the limb is moving through the air, accounting for 60% and 40% of the gait cycle duration, respectively. Gait style means not only stride length and walking speed, but also the 3D movement of the torso, the swinging of the arms and the head position; in other words, all parts of the body are involved in walking, and all global and local features must be taken into account. The vision-based approach involves exploring marker-based and marker-less solutions. The marker-based techniques compute dynamic features such as joint angle trajectories and hip rotation patterns for medical research or virtual representation of patients or actors [1]. At the same time, the marker-less techniques use spatio-temporal features such as shape, color, edges, and motion estimates to capture gait patterns for human recognition [2]. As is well known, gait recognition techniques are categorized into model-based and model-free. The model-based methods create a structure of the human body and construct the features of various parts of the body based on motion information. Formally, the model-based methods are divided into structural models and motion models. On the other hand, the model-free methods, including template creation,
shape analysis and spatial–temporal representation, extract gait features directly from video sequences without building any structural model. Since gait is the result of a person motion, most methods employ motion information as crucial features. Castro et al. [3] were among the first who applied convolutional neural network (CNN) for gait identification using optical flow features. Optical flow cuboid consisted of optical flow maps in time series providing low-level motion features as input to CNN. Experiments with the challenging TUM-GAID dataset were implemented on a low resolution version of the original video sequences, but have shown results compared to other authors. A deep learning approach using optical flow with the additional descriptors was explored in [4] and tested on several deep neural network architectures using CASIA B gait dataset and TUM-GAID dataset. In [5], the use of optical flow to construct a discriminative biometric signature for gait recognition was studied. Local and global optical flow features were combined into histogram-based descriptor. Skeleton-based model called Siamese denoising autoencoder network was proposed in [6]. It was trained to remove position noise, recover missing skeleton points and correct outliers in joint trajectories. One of the promising directions is the construction of spatio-temporal gait feature descriptors based on the 2D human poses, as it was done by Hasan and Mustafa in [7]. Their spatio-temporal gait descriptors included trajectories of joint angles, temporal displacement and length of body-parts. These 50 descriptors were the inputs of a recurrent convolutional neural network (RCNN) with 2 layers called bidirectional gated recurrent units. The authors argued that their approach achieved comparable performance to gait energy image (GEI) methods or costly 3D poses for gait descriptors. Reducing the frame rate of recorded gait sequences may degrade gait recognition due to the sparsity of the observed gait phases. Thus, an analysis of gait fluctuations with following phase synchronization was proposed in [8], taking into account the detection of a false gait period due to the low frame-rate videos. The average gait over the whole sequence was used as input feature template for the gait recognition system in [9]. Frame interpolation is a well known technique applied in various tasks of computer vision, for example in stereo vision [10], video stabilization [11], outdoor surveillance, and so on. Frame interpolation performs frame rate conversion, frame recovery, or adaptive animation rendering using optical flow-based and kernel-based methods. Conventional algorithms create an intermediate frame between two base frames without additional processing or by slightly removing noise and sharpening the frame. This approach cannot produce a qualitative result and has limitations in its application. In [12], linear frame interpolation was used to normalize gait cycles of different lengths and approximately identify the start frame and end frame of half cycle. The advanced algorithms are based on additional interpolation parameters. The original approach to reduce the gap between adjacent frames was proposed in [13]. Generative adversarial network (GAN) called Frame-GAN was designed to feed frames to a generator and use the following frames as supervision. Thus, the frame rate was doubled compared to the original videos. The results were tested on the CASIA-B and OU-ISIR datasets. Liao et al. 
[14] studied a view angle as one form of the gait variations using Dense-View GAN to generate new gait energy images
(GEIs) with various views. The interpolation was defined by linear transformation of two GEI images. The proposed method was evaluated on the CASIA-B and OU-ISIR datasets with better gait recognition results at some angles. Xu et al. [15] proposed phase-aware gait cycle reconstructor (PA-GCRNet), which consists of two parts: PA-GCR for reconstructing a full gait cycle of a silhouette sequence and recognition network GaitSet [16].
18.3 Proposed Method of Motion Interpolation The proposed method consists of two stages: estimation of optical flow and scene depth, and motion interpolation itself. It should be noted that the application of deep learning methods leads to better accuracy, minimizing losses on different training sets. Sections 18.3.1 and 18.3.2 present the estimation of optical flow and scene depth, respectively. Section 18.3.3 describes the process of motion interpolation.
18.3.1 Estimation of Optical Flow Optical flow helps to describe a walking pattern by generating a velocity field between two consecutive frames based on corresponding feature points or patch displacements from frame to frame. The Lucas-Kanade, Horn-Schunck and Buxton-Buxton methods, as well as variational methods, are well-known earlier methods. The initial assumption is that the estimation of optical flow is based on the brightness constancy constraint represented by Eq. 18.1, where I(x, y, t) denotes the intensity of the pixel with coordinates (x, y) in the tth frame.

I(x, y, t) = I(x + Δx, y + Δy, t + Δt)    (18.1)

The Taylor series is used to simplify Eq. 18.1 into the form

(∂I/∂x)·u + (∂I/∂y)·v + ∂I/∂t = 0,    (18.2)

where u and v are the horizontal and vertical components of the optical flow, respectively. The methods mentioned above use different constraints on data and smoothness and have varying degrees of generalization. Thus, the Lucas-Kanade method imposes a constraint on the optical flow within a sliding window and uses the Hessian matrix. The Horn-Schunck method minimizes flow distortion and prefers smoother solutions.
Variational methods are modifications of the Horn-Schunck method. The Buxton-Buxton method is based on a model of the motion of object boundaries in successive frames. Currently, the family of CNNs for optical flow calculation includes FlowNet2, LiteFlowNet2, PWC-Net, CNN with long short-term memory (LSTM), and DeepFlow, in which multiple "short" CNNs are used instead of one "long" CNN. Recently, LiteFlowNet3 was introduced by Hui and Loy [17]. LiteFlowNet2 and LiteFlowNet3, the latter characterized by a novel warping of the flow field, have the following advantages over their "parent" network FlowNet2: the use of short neural networks, pyramidal feature extraction, cascaded flow inference, flow regularization, fewer parameters (30 times fewer parameters while being 1.36 times faster than FlowNet2), as well as high accuracy and high speed. This neural network consists of two compact networks that specialize in pyramidal feature extraction and optical flow estimation. One of the networks transforms images into two pyramids of multidimensional features, and the other network includes a cascaded flow inference module and a regularization module. As Hui and Loy mention in [17], LiteFlowNet3 is built upon LiteFlowNet2 with the incorporation of flow field deformation and cost volume modulation to further improve the flow accuracy. To extract optical flow from gait videos, the LiteFlowNet3 architecture was used. The cascaded flow inference module outputs useful information from each layer of the two neural networks. Each of these layers has a feature deformation layer for displacing the feature maps of the second image towards the first image using the flow estimate from the previous level. The residual flow is then calculated to obtain a more accurate result between pixels. This scheme, which uses intermediate information for subsequent computations, resembles schemes using LSTM. Cascaded flow inference corrects the flow error early without passing the accumulated error to the next level. If a person moving in a cross-view direction with respect to the camera is the only moving object in the scene, then the optical flow remains practically unchanged, which effectively reduces the computational load associated with explicit matching. However, the flow field usually has blurred boundaries and unwanted artifacts. To solve this problem, an additional local convolution with different kernels is applied to regularize the local parts of the flow field based on the output of the cascaded flow inference. The kernels of this local convolution adapt to the pyramidal parameters of the encoder.
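For illustration only, a dense velocity field (u, v) satisfying Eq. 18.2 can also be obtained with a classical estimator such as Farnebäck's algorithm from OpenCV; the parameter values below are common defaults and are assumptions, and the paper itself relies on LiteFlowNet3 rather than this method.

```python
import cv2
import numpy as np

def dense_flow(prev_frame: np.ndarray, next_frame: np.ndarray) -> np.ndarray:
    """Return an H x W x 2 optical flow map (u, v) between two BGR frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    # Classical Farnebäck estimator as a stand-in for a learned flow network:
    # args are pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return flow
```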
18.3.2 Estimation of Scene Depth The depth map contains information about the distance of objects from the corresponding sensor, which is very important for many surveillance tasks. Scene depth can be estimated using active and passive depth sensors, such as structured light sensors or time-of-flight sensors. However, monocular depth estimation is an attractive approach for many applications, including gait recognition. Recently, various techniques were proposed for analyzing images and videos, for example, non parametric learning to improve the inferred depth maps, while optical flow was used to
ensure temporal depth consistency, MonoGRNet predicting the 3D bounding boxes of objects in a single forward pass [18], PDANet using perceptual and data augmentation consistency for self-supervised monocular depth estimation [19], a deep RCNN and training method for generating depth maps from video sequences [20], and so on. Sometimes, a hybrid approach is used, in which the depth of the scene is estimated using preliminary information about the optical flow. An algorithm for depth map generation transforms input color images into grayscale or color output depth maps, whose pixel values indicate the relative distance from the camera to the object in the scene. However, monocular images do not contain explicit distance information, and various approaches try to find objective cues. One such cue is depth-from-motion, which can be determined from consecutive frames. Experiments show that an RCNN is efficient for generating depth maps based on depth-from-motion and for interpolating frames at motion boundaries. This approach is similar to gait recognition using depth sensors. As a prototype, we used the RCNN architecture proposed in [20] and modified it for our task. The RCNN has an autoencoder architecture composed of an encoder (E layers), a bottleneck layer, and a decoder (D layers). Some layers are reinforced by GRU cells, which have the same or better performance than LSTM cells but are easier to implement and train. The encoder reduces the size of the layer outputs; thus, transpose reshaping is added to some layers of the decoder in order to increase the output size and avoid artifacts in the output images. Leaky rectified linear units (LReLU) were used in two layers, as shown in Table 18.1. The LReLU is defined by Eq. 18.3, where α is the leak parameter, set to α = 0.1.

f(x) = α max(0, x) + (1 − α) max(0, −x)    (18.3)

Table 18.1 Configuration of RCNN for depth map generation

Layer        Filter size   Stride   Layer depth   Activation function
E1           (3, 3)        2        64            LReLU
E2           (3, 3)        2        128           GRU
E3           (3, 3)        2        256           GRU
E4           (3, 3)        2        512           GRU
D1           (1, 1)        1        512           LReLU
D2           (3, 3)        1        512           GRU
Reshaping
D3           (3, 3)        1        256           GRU
Reshaping
D4           (3, 3)        1        128           GRU
Reshaping
D5           (3, 3)        1        64            GRU
D6           (1, 1)        1        3             sigmoid
As can be seen from Table 18.1, non-linear activation functions are used at each level where GRU cells are not applied. Thus, overlaying the optical flow and scene depth maps makes it possible to detect foreground moving objects instead of creating saliency maps. In addition, the optical flow maps contain segmented, silhouette-like moving objects.
18.3.3 Motion Interpolation Keeping smooth boundaries of moving objects is the main problem of interpolation based on conventional motion estimates and deep neural networks. This problem is important for gait recognition due to its nature. Linear interpolation, which calculates the average of two frames, cannot provide accurate results. In [21], the idea of residual learning between the average of the frames and the ground-truth middle frame was proposed for frame interpolation. We adapted this idea to the creation of human gait patterns in such a way that the inputs of the deep neural network are optical flow maps with detected foreground moving objects. We can consider consecutive optical flow maps as transformed original frames and interpolate the optical flow maps. Residual learning is effective because the model needs to learn to adjust only the moving regions rather than generate the entire frame. The deep neural network architecture for motion interpolation is similar to the U-Net architecture, where the encoder uses ConvLSTM layers and the encoder and decoder are based on Conv2D layers. Another feature is the use of four consecutive maps M1, M2, M3, and M4 when interpolating only one intermediate map. Thus, the CNN is not trained to generate an intermediate map, but is trained to correct the moving regions. In other words, to avoid the ambiguities caused by motion estimates, the algorithm implicitly analyzes the motion between maps and generates an intermediate map. The CNN architecture is depicted in Fig. 18.1. The interpolated map is the result of concatenating the CNN output with the average of M2 and M3.
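A minimal sketch of the residual interpolation idea described above is given below: the network only predicts a correction that is combined with the average of the two middle maps, while the four consecutive maps M1–M4 are stacked as input. The tiny convolutional head and the additive combination are placeholder assumptions; the actual model uses a deeper U-Net-like architecture with ConvLSTM layers.

```python
import torch
import torch.nn as nn

class ResidualInterpolator(nn.Module):
    """Predict an intermediate flow map as average(M2, M3) plus a learned residual."""

    def __init__(self, channels: int = 2):
        super().__init__()
        # Placeholder residual head; the paper uses a deeper U-Net-like model.
        self.residual = nn.Sequential(
            nn.Conv2d(4 * channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, kernel_size=3, padding=1),
        )

    def forward(self, m1, m2, m3, m4):
        base = 0.5 * (m2 + m3)                  # linear part of the interpolation
        x = torch.cat([m1, m2, m3, m4], dim=1)  # stack four consecutive maps
        return base + self.residual(x)          # corrected intermediate map
```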
Fig. 18.1 The CNN architecture
18.4 Experimental Results TUM-GAID is a database of walking humans created by Technische Universität München to study gait-based human identification [22]. This database simultaneously contains RGB video (with a resolution of 640 × 480 pixels), depth (16 bits per pixel) and audio for 305 people in different variations, making it one of the largest to date. Videos are 2–3 s long apiece at a frame rate of 30 fps. All subjects were captured from the side view under different conditions: ten videos per subject, including six normal walks, two walks in coating shoes, and two walks with a backpack. Examples of video frames are shown in Fig. 18.2. Also, to further study the issues related to time variation, a subset of 32 people was recorded a second time. The database is divided into several files, each of which contains a feature set or modality for all subjects. The RGB image and depth image sequences are presented in two variations: full-frame capture and tracking. The neural network for motion interpolation was trained by transfer learning on a dataset built from the TUM-GAID database. The dataset was formed according to the following principle: three consecutive frames were taken from the original RGB video sequence, the input was formed as the optical flow map between the first and third frames, and the two reference outputs were formed as the optical flow maps between the first and second frames and between the second and third frames, respectively. The dataset was divided into training and test subsets at a ratio of 80 to 20, respectively. The dataset was also extended by horizontally mirrored copies to improve robustness to motion direction. We compared the results of gait recognition without and with motion interpolation (MI), using traditional methods, namely the gait energy image (GEI), gait variance image (GVI), skeleton energy image (SEIM), and skeleton variance image (SVIM). The mean values of gait recognition without motion interpolation were taken from [23]. Normal (N), carrying a bag (B) and shoes (S) samples were selected from the TUM GAID dataset. Table 18.2 presents the results of gait recognition.
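A rough sketch of the training-pair construction described above is given below, where `flow(a, b)` stands for any dense optical flow estimator between two frames (such as the one in Sect. 18.3.1); the function and variable names are illustrative assumptions rather than the authors' code.

```python
def make_interpolation_samples(frames, flow):
    """Build (input, targets) pairs from consecutive RGB frames.

    `frames` is a list of frames; `flow(a, b)` returns the optical flow map
    between frames a and b. For every triple (f1, f2, f3), the input is the
    flow between f1 and f3, and the two reference outputs are the flows
    f1->f2 and f2->f3, as described above.
    """
    samples = []
    for f1, f2, f3 in zip(frames, frames[1:], frames[2:]):
        x = flow(f1, f3)                    # network input
        y = (flow(f1, f2), flow(f2, f3))    # reference outputs
        samples.append((x, y))
    return samples
```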
Fig. 18.2 Examples of frames from the TUM-GAID database
Table 18.2 Mean values of gait recognition (%) for the TUM GAID database

Method   N: without MI   N: with MI   B: without MI   B: with MI   S: without MI   S: with MI
GEI      99.7            99.7         19.0            30.1         96.5            97.4
SEIM     98.7            99.2         18.4            29.8         96.1            98.1
GVI      99.0            99.2         47.7            56.5         94.5            96.7
SVIM     98.4            98.9         64.2            68.3         91.6            93.8
As can be seen from Table 18.2, the largest improvement with the proposed motion interpolation was achieved for the cases of carrying a bag, and improvements were also obtained for the cases with different shoes.
18.5 Conclusions Obviously, a low frame rate greatly degrades any gait recognition algorithm, especially a model-free one. In this work, we interpolate optical flow maps by pre-creating them and concatenating them with scene depth maps. Our approach requires less information for motion processing, because only moving foreground regions are analyzed, and at the same time it provides rich data for gait recognition. The experiments with model-free (silhouette) methods based on our interpolated optical flow maps show better results compared to traditional binary silhouette methods.
References 1. Booth, A.T.C., van der Krogt, M.M., Buizer, A.I., Steenbrink, F., Harlaar, J.: The validity and usability of an eight marker model for avatar-based biofeedback gait training. Clin. Biomech. 70, 149–152 (2019) 2. Zeng, W., Wang, C., Yang, F.: Silhouette-based gait recognition via deterministic learning. Pattern Recognit. 47(11), 3568–3584 (2014) 3. Castro, F.M., Marín-Jiménez, M.J., Guil, N., Pérez de la Blanca, N.: Automatic learning of gait signatures for people identification. In: Rojas, I., Joya, G., Catala, A. (eds.) Advances in Computational Intelligence (IWANN 2017) LNCS, vol. 10306, pp. 257–270. Springer, Cham (2017) 4. Sokolova, A., Konushin, A.: Gait recognition based on convolutional neural networks. In: ISPRS International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLII-2/W4, pp. 207–212 (2017) 5. Mahfouf, Z., Merouani, H.F., Bouchrika, I., Harrati, N.: Investigating the use of motion-based features from optical flow for gait recognition. Neurocomputing 283, 140–149 (2018) 6. Sheng, W., Li, X.: Siamese denoising autoencoders for joints trajectories reconstruction and robust gait recognition. Neurocomputing 395, 86–94 (2020) 7. Hasan, M.M., Mustafa, H.A.: Multi-level feature fusion for robust pose-based gait recognition using RNN. Int. J. Comput. Sci. Inform. Secur. 18(1), 20–31 (2020)
8. Mori, A., Makihara, Y., Yagi, Y.: Gait recognition using period-based phase synchronization for low frame-rate videos. In: 2010 20th International Conference on Pattern Recognit. (ICPR), pp. 2194–2197. IEEE, Istanbul, Turkey (2010) 9. Guan, Y., Li, C.-T., Choudhury, S.D.: Robust gait recognition from extremely low frame-rate videos. In: 2013 International Workshop Biometrics Forensics (IWBF), pp. 1–4. IEEE, Lisbon, Portugal (2013) 10. Favorskaya, M., Pyankov, D., Popov, A.: Accurate motion estimation based on moment invariants and high order statistics for frames interpolation in stereo vision. In: Tweedale, J.W., Jain, L.C., Watada, J., Howlett, R.J. (eds.) Knowledge-Based Information Systems in Practice, SIST, vol. 30, pp. 329–351. Springer International Publishing, Switzerland (2015) 11. Favorskaya, M.N., Buryachenko, V.V.: Warping techniques in video stabilization. In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Control Systems-3, ISRL, vol. 135, pp. 177–215. Springer International Publishing Switzerland (2018) 12. Lee, C.P., Tan, A.W.C., Tan, S.C.: Gait recognition via optimally interpolated deformable contours. Pattern Recogn. Lett. 34, 663–669 (2013) 13. Xue, W., Ai, H., Sun, T., Song, C., Huang, Y., Wang, L.: Frame-GAN: Increasing the frame rate of gait videos with generative adversarial networks. Neurocomputing 380, 95–104 (2020) 14. Liao, R., An, W., Li, Z., Bhattacharyya, S.S.: A novel view synthesis approach based on view space covering for gait recognition. Neurocomputing 453, 13–25 (2021) 15. Xu, C., Makihara, Y., Li, X., Yagi, Y., Lu, J.: Gait recognition from a single image using a phase-aware gait cycle reconstruction network. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision—ECCV 2020. LNCS, vol. 12364, pp. 386–403. Springer, Cham (2020) 16. Chao, H., He, Y., Zhang, J., Feng, J.: GaitSet: Regarding gait as a set for cross-view gait recognition. In: Proceedings of the 33th AAAI Conference on Artificial Intelligence (AAAI 2019), pp. 8126–8133. Honolulu, Hawaii, USA (2019) 17. Hui, T.W., Loy, C.C.: LiteFlowNet3: Resolving correspondence ambiguity for more accurate optical flow estimation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision—ECCV 2020. LNCS, vol. 12365, pp 169–184. Springer, Cham (2020) 18. Qin, Z., Wang, J., Lu, Y.: MonoGRNet: A general framework for monocular 3D object detection. Proc. AAAI Conf. Artif. Intell. 33(1), 8851–8858 (2019) 19. Gao, H., Liu, X., Qu, M., Huang, S.: PDANet: Self-supervised monocular depth estimation using perceptual and data augmentation consistency. Appl. Sci. 11, 5383.1–5383.15 (2021) 20. Mern, J., Julian, K., Tompa, R.E., Kochenderfer, M.J.: Visual depth mapping from monocular images using recurrent convolutional neural networks. In: AIAA Scitech 2019 Forum, pp. 1–10. San Diego, California, USA (2019) 21. Suzuki, K., Ikehara, M.: Residual learning of video frame interpolation using convolutional LSTM. IEEE Access 8, 134185–134193 (2020) 22. Hofmann, M., Geiger, J., Bachmann, S., Schuller, B., Rigoll, G.: The TUM gait from audio, image and depth (GAID) database: Multimodal recognition of subjects and traits. J. Visual Commun. Image Represent. 25(1), 195–206 (2014) 23. Whytock, T., Belyaev, A., Robertson, N.M.: On covariate factor detection and removal for robust gait recognition. Mach. Vis. Appl. 26, 661–674 (2015)
Chapter 19
Vision-Based Walking Style Recognition in the Wild Margarita N. Favorskaya
and Vladimir V. Buryachenko
Abstract Human recognition at a distance has always been an attractive solution for many applications. However, the challenges associated with this task, especially in the wild, are difficult to overcome. Most gait recognition methods take into account the shape of the body as a whole but not the walking style, which helps to recognize a person more accurately. We offer a method for recognizing walking style at a reasonable distance from the camera, so that the visual information of a walking person can be captured and analyzed. The method is based on the latest achievements in deep object detection and tracking, as well as deep networks for human action recognition, adapted for walking style recognition. We study six main categories of walking style based on head position, torso position, arm swing, stride length, and walking speed. Additionally, we analyze several clips from each video sequence for accurate recognition. Experiments show that sometimes we cannot explicitly assign a walking style to one category, but two categories, such as driver and influencer, describe human walking in the wild well. We achieved recognition results of 82.4% Top-1 and 94.7% Top-2.
19.1 Introduction Walking style recognition is useful in many computer science tasks, including human identification, patient observation [1], training of humanoid robot [2], and so on. We are interested in walking style to identify a person. An individual walking style is determined by many subjective and objective factors and remains original for many years without destructive physical events on a person. The main focus of the existing algorithms is to analyze the movement of the legs as the most active part of the body M. N. Favorskaya (B) · V. V. Buryachenko Reshetnev Siberian State University of Science and Technology, 31 Krasnoyarsky Rabochy ave., Krasnoyarsk 660037, Russian Federation e-mail: [email protected] V. V. Buryachenko e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_19
when walking. Some algorithms build motion trajectories not only of the legs, but also of the arms, dividing the body silhouette into several regions. However, walking is not only a periodic motion. We can find various cues useful for gait recognition if certain features of body movement are extracted by analyzing the walking style at a distance from the camera, especially considering the variety of shoes, clothing, carried objects, speed, ground surface condition, illumination, age, gender, and mood. Using information about the walking style, it is possible to make an assumption about the category of the observed person, thereby reducing the search area for gait patterns. It is well known that model-based and model-free approaches involve motion analysis based on different data. A 3D avatar with a clear kinematic model is an ideal solution for gait recognition, but it is currently a computationally expensive task. At the same time, computer vision has extensive experience in 2D and 3D shape analysis and in motion estimation of foreground and background objects, taking into consideration the depth of the scene. It should be noted that the observation of human behavior provides additional information for gait recognition in the wild. Walking style can reveal a lot about personality, health and emotion. A general taxonomy of walking styles does not exist in the technical literature. Different lists of walking styles (from 4 to 11 items) can be found, but the main ones are the following: driver, influencer, supporter, corrector, thinker, and multitasker. This list can be continued with the short strider, arm crosser, arm swinger, foot shuffler, and stomper, related to gender and/or health features. Walking style reflects a person's character, such as being intelligent, imaginative and dependable, as well as their state of health. Our contribution is three-fold. First, we introduce categories of walking styles that are applicable to gait recognition as additional information. Note that under complex illumination conditions (i.e. night conditions), recognition of walking style becomes significant. Second, we have carefully analyzed deep methods for human detection and tracking for a model-free approach in the wild and have come up with a simple and effective solution based on the latest deep trackers. Third, we employed a 3D CNN for walking style recognition and then experimented with our own dataset. In the remaining part of the paper, Sect. 19.2 summarizes related work on human movement analysis and gait analysis. Section 19.3 presents the proposed categorization of walking styles in the wild. Section 19.4 describes the proposed method for walking style recognition. The experimental results using our own dataset are given in Sect. 19.5. Section 19.6 contains conclusions.
19.2 Related Work Gait-based recognition refers to biometric methods with a wide variety of qualitative and quantitative features, which lead to an individual walking style. At the same time, the individual gait pattern changes depending on physical, emotional or external conditions, making this type of recognition a “soft” biometrics, as opposite
to a “hard” biometrics of face recognition or fingerprints. In this study, we are interested in a vision-based approach with a marker-less solution. Walking style recognition is a composition of several topics, including human detection and tracking and classification methods. The approaches to human detection and pedestrian detection differ because the human body scale is larger than the pedestrian target, but human detection datasets often include crowded urban scenes [3]. Thus, our task is close to human detection, more precisely, individual detection and tracking. Object detectors are subdivided into anchor-based and anchor-free detectors. In anchor-based detectors, the location of an object is defined by multiple anchor boxes, and the bounding box is considered as reference. Various deep network architectures were developed, for example Faster R-CNN, SSD, Cascade R-CNN, and YOLOv4. It should be noted that enhancement of anchor-based detectors continues at the algorithmic and optimal levels. The latest YOLOv5-v6.0 achieves high inference speed (processing 140 frames per second) and model precision (0.895 mean average precision (mAP)) even for small-sized objects. The main weakness of anchor-based detectors is a necessity to set hyperparameters that are sensitive to specific detection tasks. The anchor-free detectors directly process each pixel in the image to generate the bounding box based on paired or triplet keypoints as it made in CornerNet and CenterNet. These methods aggregate temporal information using proposal-level aggregation and feature aggregation at the level of the entire feature map. Some original solutions for object detection can be found in the literature. Thus, the one-shot conditional object detection framework detects all objects belonging to the target object category using a support image as input [4]. There are other aspects of object detection as well. A spatio-temporal CNN called STDnet-ST was designed to detect small video object of less than 16 × 16 pixels in size using two input frames simultaneously [5]. Only the most promising regions with objects are analyzed. Objects with huge scale variation require the use of a multi-scale pyramidal top-down structure or other similar approaches, where the first layers of the network extract information about edges and textures, and the deep layers provide information about the contours, then fusing all the data. A cascaded fully convolution network model with motion attention was proposed in [6]. This network, called CFCN-MA, included a semantic fully convolutional network (FCN) that captures the spatial context in order to create a coarse saliency map and a lightweight refinement FCN to obtain a final fine saliency map. Recently, implementation of visual object trackers was shifted from the correlation filter-based visual trackers [7], which used histograms of oriented gradients, color names and so on, to deep learning solutions [8]. The CNN-based tracking methods can be either discriminative or generative. The discriminative methods classify the tracked object, highlighting it in the scene [9], while the generative methods accurately match the object template in a given region [10]. It is also possible to use trackers based on recurrent neural networks (RNN), trackers in the form of a combination of CNN and RNN parts or autoencoder networks. 
It should be noted that end-to-end networks can be roughly divided into three categories that generate the output in terms of object score, confidence map and bounding box, respectively.
Fig. 19.1 The schematic silhouettes of: a driver, b influencer, c supporter, d corrector, e thinker, f multitasker
There are many publications on human action recognition and gait recognition, especially based on the gait energy image as a representation averaged over each gait cycle, but not on the walking style.
19.3 Walking Style Categories Gait recognition is a large-scale task, and analysis of body shape and body position in each time instance, captured by a CCTV camera, is essential. However, this topic is rarely discussed in the technical literature. To the best of our knowledge, we are the first to propose categories of walking styles adapted to gait recognition problems. Due to fuzzy interpretation of the roles of driver, influencer, supporter, corrector, thinker, and multitasker, the categorization is conditional. Let us briefly discuss the gait features of these six categories. The driver’s weight is directly straight ahead, and he/she walks forward quickly. The influencer walks with chest forward and shoulder back with head held high. The weight of the supporter is over the legs, not forward or backward. The corrector walks light on his/her toes and looks at the floor. The thinker is oblivious to the environment and often walks slowly. The multitasker does several activities at the same time, such as eating, talking, texting on a mobile, listening to music, etc. The schematic silhouettes of these roles are depicted in Fig. 19.1. Thus, head position, torso position, relative width of arm swing, and relative stride length (relative to the size of a pre-determined boundary box) are the main parameters of the walking style.
19.4 Method of Walking Style Analysis Walking style analysis involves two main steps, including pre-processing video to detect a person in frames (Sect. 19.4.1) and walking style recognition using a set
of bounding boxes belonging to the individual (Sect. 19.4.2). These subtasks can be solved both by traditional methods of video processing and classification and by deep learning methods. We are interested in applying deep learning methods that provide more accurate results.
19.4.1 Human Detection in Videos The architecture of deep CNN object detectors resembles a Lego-style chain of subnetworks. YOLOv5 is a single-stage anchor-based detector architecture, which differs from YOLOv4 in its PyTorch implementation. This approach predicts the coordinates of a certain number of bounding boxes together with the classification results and the probability of finding an object, and also corrects their locations. In recent years, many authors have improved the YOLO architectures by adding interesting solutions to every part of the object detector. It should be noted that the basic goal of the YOLO family of detectors is a fast operating speed of the neural network and optimization for parallel computations. A modern detector usually consists of two parts [11]: a backbone, which plays a significant role in object detection and is pre-trained on ImageNet, and a head, which is used to predict the classes and bounding boxes of objects. In addition, some layers are added between the backbone and the head to collect feature maps from different stages. Usually this part of the object detector is called the neck and includes several bottom-up paths and several top-down paths. Thus, a conventional object detector consists of the following parts:
• Input: image, patches, image pyramid.
• Backbones: VGG16, ResNet-50, SpineNet, among others.
• Neck: additional blocks and path-aggregation blocks.
• Heads: dense prediction (single stage) and sparse prediction (double stages).
To improve the training stage, the YOLO architecture involves special activation functions, bounding box regression losses, data augmentation, regularization methods, normalization of the network activations by their mean and variance, and skip-connections. Usually, a conventional object detector is trained offline. The slang term "bag of freebies" (BoF) refers to the use of additional training methods that provide higher accuracy without increasing the inference cost. To enhance certain attributes of the model, some plugin modules are added and post-processing methods are employed, forming a "bag of specials" (BoS). Different BoF and BoS are implemented for the backbone and for the detector. All these improvements make YOLOv5 one of the best object detectors. Figure 19.2 depicts examples of YOLOv4 outputs for videos from the KITTI Vision Benchmark Suite [12] and MOT16 [13]. Thanks to its high speed, we can transform the original video sequence into a sequence of human bounding boxes, without using a human tracker, for walking style analysis.
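A possible way to turn a video into a sequence of person bounding boxes with an off-the-shelf YOLOv5 model is sketched below; the torch.hub entry point, the confidence threshold, and the function name are assumptions for illustration, not the exact configuration used in the experiments.

```python
import cv2
import torch

# Off-the-shelf YOLOv5 small model from the Ultralytics hub (assumed entry point).
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

def person_crops(video_path: str, conf: float = 0.5):
    """Yield cropped person regions, frame by frame, from a video file."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = model(rgb)
        for x1, y1, x2, y2, score, cls in results.xyxy[0].tolist():
            if int(cls) == 0 and score >= conf:      # class 0 is 'person' in COCO
                yield frame[int(y1):int(y2), int(x1):int(x2)]
    cap.release()
```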
Fig. 19.2 Example of a sequence transformation: a original frames from videos Kitti_0016.mp4, Kitti_0018.mp4, Kitti_0024.mp4, b human bounding boxes from the same frames
19.4.2 Walking Style Recognition Various pre-trained convolutional network (ConvNet) models have been available since the 2010s to solve the problem of human action recognition. This was a move from 2D CNN models to 3D CNN models with 3D kernels that can analyze spatio-temporal information based on various architectural solutions. Walking is one such action, and a walking style can be seen as a special feature of walking; at the same time, this feature is very useful for gait recognition in the wild. Usually, 3D CNNs are pre-trained using the ImageNet dataset. In [14], a deep 3D CNN model was developed for gait recognition, taking into account the limitations of gait recognition datasets. 3D CNNs directly create hierarchical representations of spatio-temporal data using spatio-temporal filters. Moreover, they can be two-stream, combining spatial information and optical flow for better prediction. The C3D architecture proposed in [15] was a generic, compact, simple, and efficient implementation of the 3D ConvNet architecture, and some interesting modifications of it have been suggested recently. In this study, we used a two-stream inflated 3D ConvNet (I3D) that was pre-trained on the Kinetics and ImageNet datasets [16] to extract important features. To achieve more accurate results, we suggest cutting the video sequence into N short parts containing the captured person. Then, from each part, K bounding boxes of 224 × 224 pixels from K frames are selected to create a clip, which is fed to one of the streams based on the I3D network. The final score is determined by voting according to the classification results of each stream. The proposed architecture for walking style recognition is depicted in Fig. 19.3. The number of clips is selected depending on the duration of the video sequence and computing resources. In our opinion, such a solution makes it possible to accurately capture the walking style of a person, which may change rapidly in the wild, as well as the walking trajectory. The I3D architecture has some special features, such as filters and pooling kernels inflated to 3D structures, bootstrapping 3D filters from 2D filters, pacing receptive field growth in space, time and network depth using the Inflated Inception-V3 architecture, and two 3D streams, which simultaneously process images and optical flow. Classifying the outputs of the I3D networks can be done by different ways of voting, using
Fig. 19.3 The proposed architecture for walking style recognition
an average function, a max function or a weighted function, among others. The average function implies calculating the mean response of each element of the I3D outputs and obtaining a final score of the same dimension. The basic assumption of averaging is to utilize the local predictions of all clips for walking style recognition and to use their mean responses as the global prediction. The I3D outputs can also be used to select the most discriminative clip for every walking style category as the classification of the whole video sequence. Voting based on a weighted linear function with pre-determined coefficients is not reasonable for this task. We use a simple voting strategy to obtain the final walking style score in a given situation.
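The clip-construction and voting steps can be illustrated with the sketch below, where `person_boxes` is the sequence of cropped person images and each row of `clip_scores` is a softmax output of the I3D stream for one clip. The even frame spacing, the default values of N, K and the clip size, and both combination strategies shown are illustrative assumptions rather than the exact configuration used in the experiments.

```python
import cv2
import numpy as np

def build_clips(person_boxes, n_parts: int = 4, k_frames: int = 16, size: int = 224):
    """Split a sequence of cropped person images into N clips of K resized frames."""
    clips = []
    part_len = max(len(person_boxes) // n_parts, 1)
    for p in range(n_parts):
        part = person_boxes[p * part_len:(p + 1) * part_len]
        if not part:
            break
        # Evenly spaced frame indices inside the part (assumed sampling scheme).
        idx = np.linspace(0, len(part) - 1, num=min(k_frames, len(part))).astype(int)
        clip = np.stack([cv2.resize(part[i], (size, size)) for i in idx])
        clips.append(clip)                 # shape: (K, 224, 224, 3)
    return clips

def vote_walking_style(clip_scores: np.ndarray, strategy: str = "average") -> int:
    """Combine per-clip class scores (n_clips x n_classes) into one final label."""
    if strategy == "average":
        return int(np.argmax(clip_scores.mean(axis=0)))   # mean response per class
    votes = np.argmax(clip_scores, axis=1)                # majority vote variant
    return int(np.bincount(votes).argmax())
```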
19.5 Experimental Results Our dataset includes video sequences that present six gait styles such as Driver, Influencer, Supporter, Corrector, Multitasker and Thinker. Due to the fact that the video sequences were captured in the wild and contain dynamic scenes with different camera’s viewing angles, lighting, background, as well as several people in some clips, the task of gait recognition becomes much more complicated. In addition, the presented categories have a similar structure. For example, Driver and Influencer, Supporter and Corrector are composed of movement types that are often highly correlated. Our WalkingStyles dataset includes 70–90 video sequences for each category captured from YouTube, as well as a description of all RNN architectures used in the experiments [17]. The WalkingStyles dataset consists of two parts: open source videos from Internet, mostly outdoor shoots with various lighting conditions, and indoor self-captured video sequences with walking people. The long videos were split into several short parts of no more than 5 s, which is sufficient to extract gait features using 3D-CNN. The entire dataset was divided into the training set and the test set in a ratio of 70 to 30. Typical frames from some video sequences of the
WalkingStyles dataset and the corresponding gait categories are presented in Table 19.1. Several RNN architectures were tested during the experiments. The architectures called RNN_v1, RNN_v2, RNN_v3, and I3D differ in the total number of layers (from 14 to 20), as well as in the type and number of delay layers: gated recurrent units (GRUs) and long short-term memory (LSTM) cells. To reduce the influence of noise and overfitting during training, dropout layers with a factor of 0.3 were added after the last GRU layer or after each GRU layer. All architectures were pre-trained on the ImageNet dataset for action classification/action recognition, and then re-trained on our own WalkingStyles dataset. The details of the architectures and the experimental results are shown in Table 19.2. The experiments have shown that walking style recognition sometimes confuses closely related categories, but using the Top-2 prediction metric can improve the recognition accuracy by up to 94.7%. The best accuracy values were shown by the I3D network trained for 50 epochs (Fig. 19.4).
Frame
Main style
Second style Duration, min
[HOT] Top Models’ Different Walking Styles on Runway.mp4
Driver
Influencer
10 Tips To ALWAYS Walk with Confidence.mp4
Influencer
17 Different styles of walking.mp4
Corrector
The Perfect Walk.mp4
Influencer
05:17
Sherlock Holmes Scenes.mp4
Thinker
06:21
60 Model Poses In 1 Minute.mp4
Multitasker Supporter
01:01
03:28
07:33
Supporter
07:20
Table 19.2 RNN architectures and obtained accuracy results

| Model | Types and number of delay layers | Dropout layers | Training epochs | Top-1 result | Top-2 result |
|---|---|---|---|---|---|
| RNN_v1 | 2 GRU | 1, after last layer | 20 | 59.75 | 71.40 |
| RNN_v2 | 2 GRU, 1 LSTM | 1, after last layer | 20 | 62.26 | 69.15 |
| RNN_v3 | 5 GRU | 5, after each layer | 50 | 71.17 | **94.61** |
| I3D | 4 GRU-RNN | 1, after last layer | 80 | **82.39** | 91.85 |

The best results are in bold
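A minimal sketch of the kind of recurrent classifier summarized in Table 19.2 is given below (closest to RNN_v1: two GRU layers, one dropout layer with factor 0.3 after the last recurrent layer, six walking-style classes). The per-frame feature extractor, feature dimension, and hidden size are assumptions made for illustration; this is not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class WalkingStyleRNN(nn.Module):
    """Two stacked GRU layers over per-frame CNN features, dropout 0.3
    after the last recurrent layer, and a 6-way walking-style classifier."""
    def __init__(self, feat_dim=1024, hidden=256, n_classes=6):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, num_layers=2, batch_first=True)
        self.dropout = nn.Dropout(0.3)      # reduces noise influence and overfitting
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                   # x: (batch, time, feat_dim)
        out, _ = self.gru(x)
        return self.head(self.dropout(out[:, -1]))   # score from the last time step

# usage: logits = WalkingStyleRNN()(torch.randn(8, 16, 1024))   # -> (8, 6)
```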
Fig. 19.4 Comparison of training accuracy and training loss for different RNN architectures using the WalkingStyles dataset: a training accuracy, b training loss
Thus, a pre-trained deep RNN for action recognition, combined with an ensemble classifier based on the I3D network, makes it possible to recognize a person's walking style with high accuracy in a complex environment. The identified gait style can be used as additional information for personal biometric recognition or for increased safety in crowds in the wild.
19.6 Conclusions
Currently, soft biometric methods are less powerful than traditional biometric methods, but their role in urban surveillance is growing. This motivates the further development of gait recognition methods using multimodal soft biometric features, and the walking style is among them. We propose a method for walking style recognition based on a 2D CNN as a human detector and multiple 3D CNNs for the
classification of bounding-box images with the captured person, followed by a final assessment. During training, we faced the problem of finding public datasets of walking persons in the wild. This problem has been partially addressed using pre-trained CNNs and our own dataset. The I3D network shows the best results of walking style recognition in the wild.
References 1. Adolph, D., Wolfgang Tschacher, W., Niemeyer, H., Michalak. J.: Gait patterns and mood in everyday life: A comparison between depressed patients and non-depressed controls. Cogn. Therapy Res. 45, 1128–1140 (2021) 2. Shiguematsu, Y.M., Brandao, M., Takanishi, A.: Effects of walking style and symmetry on the performance of localization algorithms for a biped humanoid robot. In: 2019 IEEE/SICE International Symposium on System Integration (SII), pp. 307–312. IEEE, Paris, France (2019) 3. Xie, Y., Zheng, J., Hou, X., Xi, Y., Tian, F.: Dynamic dual-peak network: A real-time human detection network in crowded scenes. J. Vis. Commun. Image R. 79, 103195.1–103195.10 (2021) 4. Fu, K., Zhang, T., Zhang, Y., Sun, X.: OSCD: A one-shot conditional object detection framework. Neurocomputing 425, 243–255 (2021) 5. Yang, Z., Soltanian-Zadeh, S., Farsiu, S.: BiconNet: An edge-preserved connectivity-based approach for salient object detection. Pattern Recognit. 121, 108231.1–108231.11 (2022) 6. Zheng, Q., Li, Y., Zheng, L., Shen, Q.: Progressively real-time video salient object detection via cascaded fully convolutional networks with motion attention. Neurocomputing 467, 465–475 (2022) 7. Danelljan, M, Khan, F, Felsberg, M, Weijer, J.: Learning spatially regularized correlation filters for visual tracking. In: Proceedings of IEEE International Conference on Computer Vision Workshop (ICCV), pp. 4310–4318. IEEE, Santiago, Chile (2015) 8. Li, P.X., Wang, D., Wang, L.J., Lu, H.C.: Deep visual tracking: review and experimental comparison. Pattern Recognit. 76, 323–338 (2018) 9. Chi, Z., Li, H., Lu, H., Yang, M.: Dual deep network for visual tracking. IEEE Trans. Image Process. 26(4), 2005–2015 (2017) 10. Gao, J., Zhang, T., Yang, X., Xu, C.: Deep relative tracking. IEEE Trans. Image Process. 26(4), 1845–1858 (2017) 11. Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. CoRR arXiv preprint, arXiv:2004.10934v1 (2020) 12. Geiger, A., Lenz, P. Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: IEEE 2012 Conference on Computer Vision and Pattern Recognition (CVPR 2012), pp. 3354–3361. IEEE, Providence, Rhode Island (2012) 13. Milan, A., Leal-Taixé, L., Reid, I., Roth, S., Schindler, K.: MOT16: A benchmark for multiobject tracking. CoRR arXiv preprint, arXiv: 1603.00831 (2016) 14. Alotaibi, M., Mahmood, A.: Improved gait recognition based on specialized deep convolutional neural network. Comput. Vis. Image Underst. 164, 103–110 (2017) 15. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: The 15th IEEE International Conference on Computer Vision (ICCV), pp. 4489–4497. IEEE, Santiago, Chile (2015)
16. Carreira, J., Zisserman A.: Quo Vadis, action recognition? A new model and the Kinetics dataset. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4724–4733. IEEE, Honolulu, HI, USA (2017) 17. WalkingStyles dataset v.211812, https://drive.google.com/drive/folders/1KA392PUfNTnDyvL6kiDSeZ-Wm6elKhi?usp=sharing. Last accessed 26 Dec 2021
Chapter 20
Specifics of Matrix Masking in Digital Radar Images Transmitted Through Radar Channel
Vadim Nenashev, Anton Sentsov, and Alexander Sergeev
Abstract This paper presents research results concerning systems of radar image masking in real-time mode when transmitting fast-changing data. A promising masking approach to visual data protection is employed here. The method entails the utilization of a certain class of orthogonal two-level matrices, including the Hadamard matrices. It is intended to convert radar images to a noise-like representation, while ensuring proper smoothing, on the demasked image, of pulse interference or deliberate distortions that arise in the transmission channel. This research is based on the consideration of special structural features of the images during masking and demasking, as well as on the properties of the masking matrices. Evaluations of the demasked images are given, with pulse interference or deliberate distortions introduced in the transmission channel. The evaluations are based on the values of some widely used metrics.
20.1 Introduction
Radiofrequency-based location has been utilized in the aerospace industry for decades. The current maturity level of this technology makes it possible not only to perform radiolocation surveys of the area of interest, but also to employ radar stations (RS) for the transmission of radar images (RI), thereby significantly expanding the applicability of this technology. In particular, radar units with exceptionally compelling characteristics can now be mounted on small-sized unmanned aerial vehicles, acting together as a spatially distributed system [1, 2]. The control and comprehensive data processing hub of the spatially distributed system acquires the RIs from multiple locations, which improves the accuracy, fidelity and descriptiveness of this data during cooperative processing and further enhances the completion of various location-related tasks [3–8]. One of the most important tasks related to digital radar image
translation is to ensure adequate data protection, as well as to search for any instances of data distortion, faking and spoofing, taking into account the hardware features of the radar station. Data transmitted through public radio channels are readily available to everyone and prone to unauthorized access and spoofing. Hence, when establishing public communication channels, significant effort is directed to enabling data protection. In the context of RIs, this problem has some specific features. The protection of reflected radio pulses against the background of natural and artificial noise requires an analysis of the structural specificity of the RIs, as well as of the algorithms employed for data protection and transfer. Evidently, RI protection via image destruction can be achieved with cryptographic methods. But for encryption methods in real-time systems, only short keys are applicable to complete the transformation within a reasonably short timeframe. Thereby, with regard to the informational redundancy of radar images, which is common for this kind of data, cryptographic methods require provisional decomposition of low-frequency components in the RIs. For example, radar images containing radar-contrast objects such as bridges, shores, rivers, and lakes are characterized by high redundancy, which incurs complications in the process of visual information destruction. Another, more efficient, approach to solve the aforementioned problem involves the utilization of algorithms based on strip transformation of images [9, 10] and masking methods [11, 12], where a unique orthogonal matrix acts as the key of the transformation. The purpose of this work is to study the real-time protective matrix transformation of RIs in a radar channel based on a comprehensive consideration of characteristic features of the image structure and the requirements for the matrices. The rest of the paper is organized as follows. Section 20.2 describes the basics of radar image masking and the respective matrices. The proposed method of masking-transmission-demasking is explained in Sect. 20.3. Section 20.4 concludes the paper.
20.2 Radar Image Masking and Respective Matrices
The term «masking» is used in photography, chemistry, healthcare, and other domains of human activity. However, regarding the radar images transmitted via public communication channels, a more precise definition of masking should be formulated.
Definition Image masking is a transformation bringing the images to a noise-like form, which is characterized by smoothing of pulse disturbances or deliberate distortions on a demasked image. Evidently, the demasking transformation is symmetric to the masking transformation.
Masking methods, as well as the strip transformation, are based on bilateral matrix multiplication, which can be essentially reduced to computations of sums of paired
products, efficiently implemented in processing units for digital signal processing. However, their actual implementations show some differences. The strip transformation [9, 10], as a way of noise-resistant encoding of grayscale images, consists in «slicing» an image P into strips, which are subsequently shuffled and subjected to bilateral multiplication with orthogonal square matrices A and A^T of the form Z = A^T P A. Any pulse interference in the data transmission channel influencing a local image area is represented (when reconstructing this image) as «smeared» across the image fragments. An example of such an effect is given in Fig. 20.1, which is borrowed from the paper [13]. The noise amplitude depends on the value of the maximum of the matrix A [10]. Consequently, the problem of searching for optimal matrices for the strip transformation can be reduced to the search for orthogonal matrices with the least maximum. Their neighbor matrices can also be used here. In particular, Hadamard matrices H_n meet this property [14], being of orders n = 4t, where t is a natural number, and satisfying the condition H_n^T H_n = nI, where I = diag(1, 1, …, 1). The elements of H_n matrices can have the values 1 and −1. As per the performed analysis, the bilateral multiplication with orthogonal Hadamard matrices induces not only the «smearing» of the noise introduced into the channel, but also distorts the source image fed into the channel, thereby hiding this image (see Fig. 20.1).
Fig. 20.1 The effect of noise «smearing» on the reconstructed image: a source image, b fragmented image, c noise introduction in channel, d reconstructed image
To improve the transformation quality and achieve better characteristics of the strip transformation, a Kronecker multiplication is often preferred here to a usual matrix multiplication. The most promising protection method for visual data with a short lifetime is based on the utilization of a relatively recently discovered class of matrices, namely quasi-orthogonal matrices. Because of insufficient knowledge about the properties of these matrices and of the masking methods based on them, their practical applicability is extremely limited. Although the masking process is implemented through matrix operations, it differs from the strip transformation because, first, it relies on the utilization of normalized quasi-orthogonal matrices, which satisfy the equation M_n^T M_n = ω(n)I, where ω(n) is a weight function [1, 16]. These matrices generalize the Hadamard matrices and have a specific element-related constraint of the form |m_ij| ≤ 1. Second, the source image is not «sliced» into strips with shuffling; instead, when possible, it is multiplied by a matrix of order n commensurable with the order of the image in question. This ensures good «smoothing» of the brightness relief of the image, reducing it to a noise-like form. Third, such masking can be applied to color images by means of transformation of their components, such as RGB or YCrCb.
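The bilateral masking and demasking described in this section can be illustrated with the following sketch. It uses a Sylvester-type Hadamard matrix from SciPy as a stand-in for the quasi-orthogonal masking matrices discussed in the chapter, a random array as a stand-in "radar image", and a zeroed block of the masked image as a crude model of lost data frames; all of these choices are illustrative assumptions, not the authors' setup.

```python
import numpy as np
from scipy.linalg import hadamard      # Sylvester-type Hadamard matrix, order 2**k

n = 256
H = hadamard(n).astype(float)          # H.T @ H = n * I
rng = np.random.default_rng(0)
P = rng.integers(0, 256, size=(n, n)).astype(float)   # stand-in "radar image"

Y = H.T @ P @ H                        # masking: noise-like representation
Y_noisy = Y.copy()
Y_noisy[100:108, :] = 0.0              # lost data frames / pulse interference

P_hat = H @ Y_noisy @ H.T / n**2       # demasking (uses H.T @ H = n I)

mse = np.mean((P - P_hat) ** 2)
print("worst local error:", np.max(np.abs(P - P_hat)), "MSE:", mse)
```

Because H^T H = nI, the demasking step recovers the image up to the injected error, and that error ends up spread over the whole reconstruction rather than concentrated in the lost block.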
20.3 Masking-Transmission-Demasking
The masking routine of the radar images transmitted through a public channel (where there is some probability of image distortion or unauthorized data access) is shown in Fig. 20.2. Here, M is a square quasi-orthogonal matrix, P is a source (reconstructed) radar image, and Y is a masked image. A potential noise range in the channel is given in the bottom part of Fig. 20.2. As mentioned previously, the essence of the masking method consists in the transformation of the radar images into a noise-like form along with the improvement of noise resistance in data transmission systems, with simultaneous smoothing of the amplitude of possible pulse interference.
Fig. 20.2 Masking routine applied to the radar images transmitted through a public channel
A more detailed representation of the radar image transmission process via a radio channel involving masking and demasking can be divided into the following stages:
• establishing a masking quasi-orthogonal matrix whose order is commensurable with (a multiple of) the RI size;
• a conditional check whether the image size is equal to the matrix order; if this condition is not met, one of the following routines is performed: (a) radar image truncation via pruning of some rows and columns, or (b) addition of mean squared deviation values of the image being masked to the RI per columns and per rows;
• bilateral multiplication of the RI with the masking matrix, as shown in Fig. 20.2;
• transmission of the masked image through the radio channel;
• reception and demasking of the RI, as shown in Fig. 20.2.
To demonstrate the matrix masking method in action, some test radar images obtained in the modes of discovery of dangerous moisture targets and the Earth survey are represented in Fig. 20.3a, b, respectively. The research was performed using an experimental model of the «MFRC» radar station, a three-dimensional robotic (unmanned) station which enables electronic scanning in the horizontal plane, developed within the research initiative «MARS» at the JSC CSPA «LENINETZ» [16]. Technical parameters of the experimental model of the «MFRC» radar station are provided in Table 20.1 [16]. The Hadamard matrix H768 was chosen for masking purposes. It is represented as a portrait in Fig. 20.4. This matrix was obtained via generalization [17] of a Kronecker product of two matrices with indirect inheritance of numerical and structural elements of quasi-orthogonal Mersenne matrices [18]. As we can see, the proposed matrix is commensurable with the radar image. Figure 20.5a, b present the results of masking for the radar images given in Fig. 20.3a, b, respectively.
Fig. 20.3 Radar images with: a discovered dangerous moisture targets, b the Earth survey
Table 20.1 Principal technical parameters of the «MFRC» radar station

| Parameter name | Value |
|---|---|
| Operational frequency range, GHz | 9.3–9.4 |
| Pulsed radiation power, W | 200 |
| Polarization (depending on the operational mode) | Linear horizontal / linear vertical |
| Width of radiation pattern in horizontal plane, ° | 3.1 |
| Width of radiation pattern in vertical plane, ° | 3.3 |
| Maximum detection range (at least), object with RCS = 0.02 m² | 6.0 |
| Maximum detection range (at least), object with RCS = 100 m² | 21.5 |
| Rotation range of the antenna curtain in azimuth, ° | ±60 |
| Rotation range of the antenna curtain in roll, ° | −30 … +120 |
| Gain by zero ray deflection, dB | −32 |
| Speed of mechanical rotation of the antenna curtain along the azimuth (roll) axis | 60 °/s |
| Antenna curtain dimensions (without width), mm | 720 × 670 |
| Weight, kg (minimum) | 55 |
Fig. 20.4 Portrait of the masking matrix H768
To assess the data reconstruction method and the quality of the demasked radar images relative to the source images after transmission through a public channel, we employed common quality metrics. These metrics allow us to estimate numerically the correspondence of the demasked radar image to the source one, specifically: PSNR (peak signal-to-noise ratio), MSE (mean squared error), SSIM (structural similarity index measure), and MSSIM (structural similarity index measure with subsequent averaging over each of the RGB channels). Table 20.2 summarizes the values of the metrics used for quality assessment of the reconstructed images together with the noise introduced in the communication channel.
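For reference, the PSNR and MSE used in Table 20.2 can be computed as in the short sketch below; an 8-bit peak value of 255 is assumed for PSNR. SSIM and MSSIM are normally taken from an image processing library rather than re-implemented.

```python
import numpy as np

def mse(ref, img):
    return float(np.mean((ref.astype(float) - img.astype(float)) ** 2))

def psnr(ref, img, peak=255.0):
    e = mse(ref, img)
    return float("inf") if e == 0 else 10.0 * np.log10(peak ** 2 / e)

# SSIM / MSSIM can be obtained, for example, from scikit-image:
# from skimage.metrics import structural_similarity
```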
Fig. 20.5 Masked radar images with: a discovered dangerous moisture targets, b the Earth survey
Table 20.2 Quality assessment metrics for the demasked images

| Source images | PSNR | MSE | MSSIM | SSIM |
|---|---|---|---|---|
| Figure 20.3a | 63.2070 | 0.0311 | 1.0000 | 1.0000 |
| Figure 20.6a | 35.0094 | 20.5181 | 0.9464 | 0.8848 |
| Figure 20.6b | 25.9343 | 165.8244 | 0.7979 | 0.6613 |
| Figure 20.3b | 54.8841 | 0.2112 | 1.0000 | 1.0000 |
| Figure 20.6c | 33.3340 | 30.1775 | 0.9507 | 0.9556 |
| Figure 20.6d | 22.5367 | 362.5847 | 0.7704 | 0.7609 |
The research showed that significant distortion of the image is caused by the data frame losses occurring during transmission of the masked radar image through UHF channels, which are integral to radar station functioning. Further, we give the results of demasking of the received radar image (Fig. 20.3), which are typical in the case of partial loss of the data packets in the transmission channel or of deliberate distortion of the data. Only some parts of the masked image were subjected to distortion. Figure 20.6 presents some samples of the demasked images with distortions introduced in the public channel during radar signal transmission. As follows from the presented metric values, the quality level of the demasked radar images, even with channel-induced disturbances, changes insignificantly. The analysis of the visualizations of the demasked images (with the source images taken as reference) shows that no critical distortions or alterations arise in them. It should be noted that when artificial noise from an interference source is introduced into the masked RIs, the source image (echo reflections from the distributed targets) is also influenced by background noise; during reconstruction, some distortions arise, being «smeared» all over the image. This feature is especially telltale for the masked images, in which the brightness range of the components is presented in floating-point numbers.
Fig. 20.6 Reconstructed masked radar images with distortions introduced during transmission: for each of the cases a)–d), the columns show the source RI, the masked RI with added noise, the demasked (reconstructed) RI, and the local map of the SSIM metric
The evaluation of the results presented in this paper made it possible to improve the functional software of an embedded system for matching high-resolution images to a digital area map (DAM), taking into account the rate of radar image arrival [19]. The developed algorithms, intended for matching high-resolution images with the DAM to display actual data, can be implemented in onboard systems of aerospace information processing units, which can receive and accept data incoming from various sources. The results of the experiments presented in this paper can be adapted for processing radar images from a broad class of multi-range radar stations, with subsequent matching of these images to the DAM. This is important for informational support of navigational problem solving, aircraft navigation, interaction with radar beacons and
responder beacons, as well as for ensuring flight safety in close proximity to dangerous moisture targets. The experimental results can be used in the formulation of recommendations for improving software intended for processing matrix-like big data structures in real-time mode. The results of the research can also be customized for the development of algorithms applicable within a matrix-like framework for digital transformations of visual data (image compression, pattern classification) [20, 21], as well as for encoding of established radar images recorded during field tests.
20.4 Conclusions
We performed research on radar image masking systems operating in real-time mode, taking into account the specific structural features of the images. We employed a masking method for visual data based on a class of quasi-orthogonal two-level matrices generalized from Hadamard matrices. The most significant contribution to the distortion of the demasked radar images can be caused by data frame losses during transmission of the masked images in radar UHF channels, which are integral to radar station functioning; hence, not only data exchange but also operational mode control is performed in these channels. Field experiments intended to discover dangerous moisture targets on the radar images and to perform the Earth survey were carried out on an experimental radar station unit developed under the technology of cascading active slotted-waveguide phased antenna arrays. The presented results reveal minor artifacts in the demasked radar images upon introduction of deliberate distortions in the radar channel. This is due to specific features of the masking method and requires additional research to reduce the noise influence in the process of masking/demasking of radar images.
Acknowledgements The reported study was funded by RFBR, project number 19-29-06029.
References 1. Kapranova, E.A., Nenashev, V.A., Sergeev, A.M., Burylev, D.A., Nenashev, S.A.: Distributed matrix methods of compression, masking and noise-resistant image encoding in a high-speed network of information exchange, information processing and aggregation. In: Proceedings of the SPIE Future Sensing Technologies, pp. 111970T-1–111970T-7. SPIE, Tokyo, Japan (2019) 2. Shepeta A.P., Nenashev V.A.: Accurate characteristics of coordinates determining of objects in a two-position system of small-size on-board radar. Informatsionno-upravliaiushchie sistemy [Information and Control Systems] 2, 31–36 (in Russian) (2020) 3. Christodoulou, C., Blaunstein, N., Sergeev, M.: Introduction to Radio Engineering. CRC Press, Taylor & Francis Group, Boca Raton (2016) 4. Klemm, R. Nickel, U., Gierull, C., Lombardo, P., Griffiths, H., Koch, W (eds.): Novel Radar Techniques and Applications: Real Aperture Array Radar, Imaging Radar, and Passive And Multistatic Radar, vol. 1. SciTech Publishing, London (2017)
5. Nenashev, V.A., Khanykov, I.G.: Formation of fused images of the land surface from radar and optical images in spatially distributed on-board operational monitoring systems. J. Imaging 7(12), 251.1–251.20 (2021) 6. Toro, G.F.; Tsourdos, A. (eds.) UAV Sensors for Environmental Monitoring. MDPI AG: Belgrade, Serbia (2018) 7. Mokhtari, A., Ahmadi, A., Daccache, A., Drechsler, K.: Actual evapotranspiration from UAV images:aA multi-sensor data fusion approach. Remote Sens. 13, 2315.1–2315.22 (2021) 8. Klemm, R. (ed.): Novel Radar Techniques and Applications: Waveform Diversity and Cognitive Radar, and Target Tracking and Data Fusion, vol. 2. Scitech Publishing, London (2017) 9. Mironovsky, L.A., Slaev, V.A.: Strip-Method for Image and Signal Transformation. De Gruyter, Berlin (2011) 10. Mironovsky, L.A., Slaev, V.A.: Strip transformation of images with given invariants. Meas. Tech. 3, 19–25 (2019) 11. Vostrikov, A., Sergeev, M.: Expansion of the quasi-orthogonal basis to mask images. In: Damiani, E., Howlett, R.J., Jain, L.C., Gallo, L., De Pietro, G. (eds.) Intelligent Interactive Multimedia Systems and Services, SIST, vol. 40, pp. 161–168. Springer, Cham (2015) 12. Vostrikov, A., Sergeev, M., Balonin, N., Sergeev, A.: Use of symmrtric Hadamard and Mersenne matrices in digital image processing. Procedia Computer Sci. 126, 1054–1061 (2018) 13. Rassokhina, A.A.: Investigation of strip-method for signal and image processing. Syst. Inform. 1, 97–106 (2012) 14. Seberry, J., Yamada, M.: Hadamard Matrices: Constructions Using Number Theory and Linear Algebra. Wiley (2020) 15. Balonin, N.A., Sergeev, M.B.: Helping Hadamard Conjecture to Become a theorem. Part 2. Informatsionno-upravliaiushchie sistemy [Information and Control Systems] 1, 2–10 (2019) 16. Sentsov A.A., Ivanov S.A., Nenashev S.A., Turnetskaya E.L.: Classification and recognition of objects on radar portraits formed by the equipment of mobile small-size radar systems. In: 2020 Wave Electronics and its Application in Information and Telecommunication Systems (WECONF), pp. 1–4. IEEE, St. Petersburg, Russia (2020) 17. Balonin, N., Vostrikov, A., Sergeev, M.: Mersenne-Walsh matrices for image processing. In: Damiani, E., Howlett, R., Jain, L., Gallo, L., De Pietro, G. (eds.) Intelligent Interactive Multimedia Systems and Services, SIST, vol. 40, pp. 141–147. Springer, Cham (2015) 18. Balonin, N.A., Sergeev, M.B., Petoukhov, S.V.: Development of matrix methods for genetic analysis and noise-immune coding. In: Hu Z., Petoukhov S., He M. (eds) Advances in Artificial Systems for Medicine and Education III. AIMEE 2019. Advances in Intelligent Systems and Computing, vol. 1126, pp. 33–42. Springer, Cham (2020) 19. Nenashev V.A., Sentsov A.A., Shepeta A.P: The problem of determination of coordinates of unmanned aerial vehicles using a two-position system ground radar. In: 2018 Wave Electronics and its Application in Information and Telecommunication Systems (WECONF), pp. 1–4. IEEE, St. Petersburg, Russia (2018) 20. Polyakov, V.B., Ignatova, N.A., Sentsov, A.A.: Multi-criteria selection of the radar data compression method. In: 2021 Wave Electronics and its Application in Information and Telecommunication Systems (WECONF), pp. 1–4. IEEE, St. Petersburg, Russia (2021) 21. Guterman, A., Herrero, A., Thome, N.: New matrix partial order based on spectrally orthogonal matrix decomposition. Linear Multilinear Algebra 64(3), 362–374 (2015)
Chapter 21
Matrix Mining for Digital Transformation
Nikolaj Balonin, Yury Balonin, Anton Vostrikov, Alexander Sergeev, and Mikhail Sergeev
Abstract The process of search for Hadamard matrices is interpreted as “mining” which includes not only specifying the initial conditions and choosing the implementation method but also filtering in order to “enrich” the set of the generated sequences. We discuss the main difficulties in matrix mining and ways to overcome them. We propose to facilitate the search for matrices by means of preliminarily freezing their structure, using the example of building Hadamard matrices based on Balonin–Seberry construction. An efficient way to speed up the search can be preliminary filtering of the generated sequences using Fourier spectrum. This allows you to reject sequences with apparent spectrum peaks. The idea is to use symmetry properties of the target matrices. We introduce the definitions of Odin and shadow matrices with symmetries related to Hadamard matrices accompanying the orders n = 4t − 1, 4t − 3 and n = 4t − 2, 4t − 4 respectively, whose values are prime numbers or prime number powers.
21.1 Introduction
Long before the dawn of the digital era, the word "mining" was associated with the extraction of mineral resources and the huge efforts involved in this process. Today this word is closely related to creating new digital structures (blockchain blocks) in order to provide the functioning of cryptocurrency platforms. This process is also arduous, requiring an enormous amount of computation and vast energy expenditure. When today we consider the search for Hadamard matrices H [1] with elements 1 and −1 of order n, satisfying the condition H^T H = nI, this problem can involve similar difficulties and computation cost. Here, I is an identity matrix, and n = 4t, where t is a natural number. Each new matrix, taking into account its structure and order, is a
result of developing a special algorithm, long computation and orthogonality check, requiring as much time and effort as a bitcoin or coal from a mine. Computation of another matrix often takes as long as several months. It is critical to reduce this time. Of course, we can focus on using supercomputers. However, today they are not easily available yet, and the computer performance is just one instrumental factor which is important but not determinant. What is really crucial is the used computation method, its modifications and ways to speed up the routine calculations. The goal of this work is to show the ways of improving the efficiency of mining for Hadamard matrices of high orders and symmetric structures, as well as for other similar matrices with a small number of element values [2, 3].
21.2 Hadamard Matrices
Hadamard matrices, being orthogonal (quasi-orthogonal), are widely used in data transmission and storage systems where symmetric orthogonal transformations are common. The problems in which Hadamard or similar matrices can be useful include signal and image processing [4–6], noise-immune image coding [7], obtaining data protection codes from matrix rows [8], and many others. The constantly growing image sizes (4K, 8K, and more) and higher demands on the length of coding sequences force us to search for matrices of higher and higher orders, while the application fields listed above require that the matrices be structurally limited. Symmetric structures are a very common case. The problems of Hadamard matrix mining, with the requirements mentioned above, are the following. First, there is no universal method capable of searching, with the same efficiency, for Hadamard matrices of all possible structures in all orders 4t where they exist. Methods which have already become classical are those by Sylvester, Paley, Williamson and Scarpi, as well as numerous modifications of these methods. However, their capabilities are not enough to cover all the orders of Hadamard matrices, as there are many gaps in the orders peculiar to these methods. Second, the wide circle of searchers still lacks a well-established practice of using powerful computing resources for matrix mining. Supercomputers are still not freely available for mass usage. This situation makes the efficiency of new algorithms, as results of using new approaches and programming techniques, especially important. Third, as there is an infinite number of Hadamard matrices, their mining becomes more and more difficult in higher orders. Each newly found Hadamard matrix of order higher than 428 in the known problematic set of orders becomes a matter of discussion in the scientific community involved in this research [9]. Often a new Hadamard matrix is a result of many months spent adjusting a special algorithm and implementing it in high-performance systems. A newly found Hadamard matrix appears much more rarely than a new cryptocurrency unit.
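To make the defining condition H^T H = nI concrete, the following sketch builds a Hadamard matrix by the classical Sylvester doubling and verifies the condition; it covers only the orders 2^k and is given purely as an illustration, not as a search method used in this chapter.

```python
import numpy as np

def sylvester(k):
    """Hadamard matrix of order 2**k by Sylvester doubling."""
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

H = sylvester(5)                                   # order 32
n = H.shape[0]
assert np.array_equal(H.T @ H, n * np.eye(n, dtype=int))
```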
21.3 The Main Approaches to Improve the Efficiency of Matrix Mining
With our long-term experience in Hadamard matrix search and usage, we can talk about the ways to improve the efficiency of their mining. First, we should always try to find alternatives to our current approaches and techniques. In this sense, pioneering methods are based on optimization procedures not commonly used in the computation field under consideration. Hadamard matrices have always been considered matrices which are supposed to be found via generating sequences of elements 1 and −1 and subsequent permutations. Thereby, the entire field of search for them was ranked as an advanced branch of combinatorics. However, Hadamard himself was the first to notice that these matrices are maximum-determinant ones. The optimization of a matrix determinant cannot be reduced to permutation algorithms, as it changes the absolute values of all matrix elements from 0 to 1. There are, though little known, algorithms for increasing the determinant [10–12], including iterative ones [13]. Of course, they need a starting first approximation, so that the optimizer does not stop at a local maximum point, which is a notorious weak point of iterative procedures. Nevertheless, this is a serious proposal for rare matrix mining. Combinatorial methods are unable to correct even a small defect of the first approximation. Optimization-based mining is a totally different affair. We note it here as an apparent but still not well studied tool for the development of our subject. As a matter of fact, the matrix structure itself can be the first approximation necessary for the optimization. As a prompt, we can force the optimizer to choose the structure; it may ruin this structure during the optimization, but we can bring the optimizer back to it. This is the reason why optimization can be a very powerful search tool. Second, the development of efficient ways of searching for Hadamard matrices often assumes that we rigidly fix certain restrictions on their structures (cyclic, bicyclic, tricyclic, symmetric, etc.) or orders. One would think that such a fixation can only complicate the problem, but in fact it can make the search much more efficient due to its simplification or lower computational cost. Essentially, the results of our quest have demonstrated that novel approaches in algorithm development can provide a vast improvement in matrix mining.
21.4 Preliminary Fixation of Matrix Structure Let us consider how the above-mentioned structure fixation can make Hadamard matrix mining more efficient. In high orders, Hadamard matrices H can be a variety of Williamson’s four-blocked array [14, 15] of the form
H = \begin{pmatrix} A & B & C & D \\ C & D & -A & -B \\ B & -A & -D & C \\ D & -C & B & -A \end{pmatrix}
in which the blocks A, B, C and D are called Williamson matrices. These blocks are usually cyclic and always symmetric. This fact used to be seen as a key to search simplification. Let us explain briefly why this is incorrect. What symmetry can affect is rather the memory size of the computer used for the search. In other words, it is important only in technical aspects. The first successful programs used for Hadamard matrix search (and the matrices themselves) were stored in the computer memory piece by piece. The coercive symmetry of Williamson matrices can have a negative effect on the search time if there are many more constructions which are unsymmetric by blocks. While improving how well an algorithm scales with the target matrix sizes, the search programs worsened another important factor of the mining, namely computer time. As a result of the proposal formulated in [16], the matrix H can be built as a symmetric Balonin–Seberry construction based on only three blocks A, B (C = B) and D, where only the block A is symmetric, while the other ones are not. This does not decrease the number of possible solutions, as there are proofs that all Hadamard matrices are either symmetric or skew-symmetric as a whole, not by blocks. This proposal was very powerful and contributed to obtaining many new matrices. The Balonin–Seberry construction [17, 18] and skew-symmetric arrays similar to it guarantee that you obtain an Hadamard matrix, whatever the order is. The Williamson array cannot guarantee that: when the block size is 35, there are no solutions at all. If you continue the search, you will meet even more negative consequences of being forced to choose a structure uncharacteristic of Hadamard matrices. The research originally aimed at putting things in order and providing a summary table for Williamson matrices had an unexpected result: instead of a table, we have a sieve with order gaps of growing intensity. Extra-large catalogues of search sequences discussed in [19] contain potential first rows of cyclic blocks (circulants) A, B and D of an Hadamard matrix of Balonin–Seberry construction. These sequences are formed and accumulated as long as various algorithms for their generation keep functioning. The general algorithm for operations with catalogues can be split into three stages:
• generation of the sequences necessary to obtain the circulants A, B and D;
• filtering of the sequences;
• search for compatibility of the sequences among possible variants of block implementation in order to form a Balonin–Seberry matrix of the order we need.
As the matrix order grows, the search speed plummets due to the increasing volume of the catalogue. It cannot hold together all the three pairs of sequences necessary for A, B and D. It makes sense to thin out the catalogue, similarly to what is called "ore dressing" in real mining.
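The following sketch assembles a Williamson-type plug-in array from four symmetric circulant blocks and checks the Hadamard condition. The sign arrangement used is the classical Williamson one (one of several equivalent variants), and the order-3 blocks are a known small example; the searches discussed in the text look for such blocks, or for the three Balonin–Seberry blocks, at much higher orders.

```python
import numpy as np

def circulant(first_row):
    first_row = np.asarray(first_row)
    return np.stack([np.roll(first_row, i) for i in range(len(first_row))])

def williamson_array(A, B, C, D):
    """Classical Williamson-type plug-in array (one common sign arrangement)."""
    return np.block([[ A,  B,  C,  D],
                     [-B,  A, -D,  C],
                     [-C,  D,  A, -B],
                     [-D, -C,  B,  A]])

# known symmetric circulant blocks of order 3: A^2 + B^2 + C^2 + D^2 = 12 * I
A = circulant([1, 1, 1])
B = C = D = circulant([1, -1, -1])
H = williamson_array(A, B, C, D)
n = H.shape[0]                                     # 12
assert np.array_equal(H.T @ H, n * np.eye(n, dtype=int))
```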
In the case of the simplest implementation of the generator [19], the number of randomly generated combinations of 1 and −1 grows so quickly that the computer needs several weeks to get a good combination. Often all these extra-large data are lost because the probability of meeting all three suitable sequences is tiny. It is enough to lack just one, and then the huge comparison table will be cross-checked in vain.
21.5 Ways to Enrich Initial Sequences
Useful filtering includes signs of compatibility between sequences for building Hadamard matrix blocks according to Williamson or similar constructions. When, in a matrix of Propus construction, one sequence out of three is compared with the other two, it cannot be entirely random. The compatibility filter shortens the search field, just like an ore-dressing plant separates commercially valuable minerals from their ores. Thus, not all sequences can be sources of circulant blocks suitable for building an Hadamard matrix of Balonin–Seberry construction. It turns out that the filtering can be performed by a simple and well-known procedure, namely the discrete Fourier transform (DFT) or its fast version (FFT) [1, 20]. An unsuitable sequence has a spectrum peak and, hence, a harmonic component which is in conflict with the potential solution. We can specify a threshold beyond which a harmonic makes the sequence unsuitable. A very simple threshold filter can remove up to 99% of bad sequences by their spectrum peaks. If you, for example, synthesize a matrix out of 100 paired sequences, instead of their 100 * 100 = 10,000 cross-comparisons you need to perform just one hundred filterings. A nice feature of the filtering is that the number of cross-comparisons grows quadratically, while the number of filterings (or compatibility filters) grows linearly. The "ore dressing" of huge search catalogues is fully warranted. The search problem solvability for high matrix orders directly depends on whether the generated sequences have been favourably accumulated. You can speed up the search for such sequences without changing the algorithm in general, but only replacing a simple generator by a more sophisticated one with a filter. Of course, the Fourier matrix is not the only orthogonal matrix used to build up filters. Paradoxically enough, in order to find new Hadamard matrices, you can use already found Hadamard matrices in filtering procedures arranged similarly to the DFT. The matrix order can be either the same or truncated down to the value you need. Our main efforts should be focused on solving the problem of pointless sequences flooding the generator outputs and making the catalogues too large. Low quality of their filling leads to wasting the mining resources on processing "poor ore": spectrally unstable and incompatible sequences. The new transformations of the algorithm are of prime importance. They comprise the vast experience accumulated by our domestic school of rare matrix mining. In a certain way, well-arranged mining characterizes the search teams.
Fig. 21.1 Two Hadamard matrices with bases which are Mersenne matrices
The naive stage of "washing baskets" was peculiar to the search teams at the beginning of their development. As an example, we can consider the search for Mersenne matrices M [1, 2]. This is a variety of Hadamard matrices obtained by removing their edge (Fig. 21.1). In this case, the mining can be focused on the search for a suitable sequence of length 11 and of the form [−1, 1, −1, 1, 1, 1, −1, −1, −1, 1, −1], which gives you two orthogonal matrices at once: an Hadamard matrix of order 12 and a Mersenne matrix of order 11. The matrix M11, being the core of the matrix H12, becomes orthogonal when the value of its negative element is reduced until the orthogonality is restored. If there is a Galois field GF(11), the search is elementary, being reduced to finding quadratic residues or to calculating the exponential function which gives addresses (places in the matrix) to the negative elements. But when there is no Galois field, this example easily turns into a test in which the generated sequences of length 11 are put into the catalogue. The given model example is good for the first mining experiments, and we describe it just because it has a simple implementation. A single sequence does not involve cross-comparisons. However, this sequence is used in the bicyclic construction of a Mersenne matrix with a binary edge (see Fig. 21.1). In this case, the catalogue is composed of two halves of the matrix, and its thinning will have a more profound effect on the mining time. However, there is no reason why we cannot conventionally split the first starting sequence into halves, choosing for the catalogue its even and odd elements, except the starting one. As you can see, both statements of the problem lead to the use of a filter.
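The "reduce the negative element until orthogonality is restored" step for the length-11 sequence quoted above can be sketched as follows; the value of b is found here by a plain grid search rather than from Galois-field arithmetic, and the cyclic matrix is built in the obvious way.

```python
import numpy as np

seq = np.array([-1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1], dtype=float)
C = np.stack([np.roll(seq, i) for i in range(len(seq))])   # cyclic matrix from the sequence

def off_diagonal_residual(b):
    M = np.where(C > 0, 1.0, -b)          # shrink the negative element from 1 to b
    G = M.T @ M
    return np.max(np.abs(G - np.diag(np.diag(G))))

# grid search for the value of b that restores orthogonality of the 11x11 core
bs = np.linspace(0.0, 1.0, 20001)
b_star = bs[np.argmin([off_diagonal_residual(b) for b in bs])]
print(round(float(b_star), 4), off_diagonal_residual(b_star))
```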
21.6 Examples of Sequence Enrichment In [20], the authors discuss a fairly simple but efficient procedure of enriching sequences by cutting off the peaks in Fourier spectrum. In Fig. 21.2, you can see a
Fig. 21.2 Cut-off by Fourier spectrum peaks
situation common for 99% of generated sequences. The numbers of the sequences are given horizontally. The presence of a conspicuous harmonic revealed by amplitude spectrum peaks is in conflict with the need to build an orthogonal array out of this sequence: a bicyclic Euler matrix or a bicyclic block of an Hadamard matrix with a double edge, such as in [20]. The level of the threshold line depends on the matrix type. As a rule, the threshold should be proportional to the size of Hadamard or Euler matrices [2], as the latter are an adaptable core of matrices with integer elements. For both matrix types, the patterns of their portraits and the threshold are the same. The advantage of this approach is that you can build such diagrams and study the statistics of the frequency with which target sequences for undiscovered matrices appear. For example, the above-mentioned work [20] lacks the information about an Euler matrix of order 154, whose Fourier spectrum is shown in Fig. 21.3. In order to check whether the matrix mining conditions are stably reproduced on computers of different generations, in Table 21.1 we have compared the percentage of
Fig. 21.3 Fourier spectrum of sequences for Euler matrices of order 154
Table 21.1 Percentage of sequences found suitable for building orthogonal matrices after filtering

| Order | Data from [19], 2001 (%) | Experiment, 2021 (%) |
|---|---|---|
| 42 | 7.22 | 7.12 |
| 50 | 3.63 | 3.7 |
| 62 | 1.45 | 1.52 |
| 70 | 0.8 | 0.82 |
| 82 | 0.31 | 0.29 |
| 90 | 0.16 | 0.16 |
suitable sequences for several orders, obtained 20 years ago [20] and in the experiment conducted in 2021. The suitable sequence percentage drop graph (Fig. 21.4) built according to Table 21.1 provides strong evidence about the growing role of Fourier filters in the procedures of search for Hadamard matrices and matrices built on their orthogonalized bases. This allows us to find matrices of new even or odd orders. In both these cases, Fourier spectrum is a very efficient mining control tool. For real filters constantly functioning for many days, it would be reasonable to develop a user-friendly search software interface in order to make the search control as convenient as possible. The use of a Fourier filter is not the only way of rejecting bad sequences. Fourier matrix is an orthogonal matrix determining the frequencies. Its counterpart in discrete mathematics is Walsh matrix, and its counterpart in continual mathematics is Mersenne–Walsh matrix [2, 6]. In other words, to control the search for Hadamard matrices, you can use previously found Hadamard matrices or their derivative matrices. There are other ways as well, including the analysis of correlation or autocorrelation functions. These interesting opportunities are not mentioned in [20]. The search for rare sequences is really difficult, so we should never neglect any additional information. This is included in the search culture, too [21].
Fig. 21.4 Suitable sequence percentage drop graph
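A minimal sketch of the spectral pre-filtering is given below: candidate ±1 sequences whose amplitude spectrum contains a dominant peak are rejected before any cross-comparison. The threshold rule (relative to the mean non-DC amplitude) and its value are illustrative assumptions, since the text only says that the threshold is chosen in proportion to the matrix size.

```python
import numpy as np

rng = np.random.default_rng(1)

def spectrum_peak_ratio(seq):
    """Largest non-DC amplitude of the DFT divided by the mean non-DC amplitude."""
    amp = np.abs(np.fft.rfft(seq))[1:]     # drop the DC component
    return amp.max() / amp.mean()

def enrich(candidates, threshold=2.5):
    """Keep only sequences without a conspicuous harmonic (threshold is illustrative)."""
    return [s for s in candidates if spectrum_peak_ratio(s) < threshold]

candidates = [rng.choice([-1, 1], size=154) for _ in range(10000)]
kept = enrich(candidates)
print(f"kept {len(kept)} of {len(candidates)} generated sequences")
```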
21.7 Matrix Symmetry as a Source of Higher Search Speed
Hadamard matrices accompany orders n = 2^k and the even numbers 4t in which the powers of two are nested (Hadamard's conjecture). Belevitch matrices (weighed matrices, or conference matrices, obtained by the assignment of zeros to their diagonal elements) cover additional orders equal to numbers of the form 4t − 2 resolved into a sum of two squares. The last statement is a conjecture [1]; the form of extremal matrices becomes so much more complex as the order grows that the first theoretically unsolved cases include the orders 66 and 86 (unlike 668 in the case of Hadamard matrices). Mersenne matrices M [2, 6] (which are Cretan) accompany orders equal to Mersenne sequence numbers n = 2^k − 1 and the odd numbers of the form 4t − 1 into which this sequence is nested. They differ from Hadamard matrices by just one irrational value −b, where b = t/(t + √t), instead of the element −1. The next interesting problem is to find orders of Cretan matrices which accompany prime numbers and powers of prime numbers of the form 4t − 1 and 4t − 3. The case of 4t − 1 is the simplest one, as the existence of a finite field GF(n) guarantees the existence of skew-symmetric (by the signs of the elements, not by their values) Mersenne matrices [2]. It is essential here that, just like in the case of Hadamard matrices, the values of the diagonal elements of these matrices vary from 0 to 1 without losing the orthogonality while b changes. Actually, Belevitch matrices C can be transformed into Hadamard matrices as H = C + I. In other words, we should first make sure that C is skew-symmetric. However, mutual transitions in this particular case do not change the value of the negative element. Generally, when the values of the diagonal elements are frozen, the number of possible orders decreases from 4t − 1 or 4t − 3 down to the desired set of prime number powers, and the matrices will differ only by their symmetries. Such Cretan matrices have been considered in article [22], so we will describe them in the most general form, without mentioning their bicyclic construction. Let us call them Odin matrices. They can be obtained from chaotic matrices (determinant optimization does not impose any demands about maintaining the structure) or from Hadamard/Belevitch matrices, being closely related to their cores via the proofs of the existence theorems [2].

Definition 1 An Odin matrix of order 4t − 1 which is a prime number or its power is a Cretan matrix whose elements take the values 1, −b and d = 0 (on the diagonal), where b = (v − 1)/(v + √(2v − 1)) and v = (n − 1)/2 is the half size of the matrix, excluding its edge d.

Definition 2 An Odin matrix of order 4t − 3 which is a prime number or its power is a Cretan matrix whose elements take the values 1, −b and d = 1/(1 + √n) (on the diagonal), where b = 1 − 2d.

An Odin matrix has an invariant, which is a matrix with an equal number of the values of its non-diagonal elements. Such a structure allows you to easily specify the first
row and column, which will be the edge made of the elements of the vectors e and −be, where e is a vector of ones of length v. The structures of both matrices are described by a skew-symmetric (by signs of the elements) or symmetric form, respectively:

O_{4t−1} = \begin{pmatrix} d & e & -b\,e \\ -b\,e & A & B \\ e & [-B^T] & D \end{pmatrix}, \qquad O_{4t−3} = \begin{pmatrix} d & -b\,e & e \\ -b\,e & A & B \\ e & -B^T & [-D^T] \end{pmatrix}
The operation marked here as [−B^T] and [−D^T] means a replacement common in three-valued matrices: all the positive elements of the transposed matrix are replaced by 1 and all the negative elements by −b. Adding an edge of 1 and −1 in the row and column (taking into account the symmetries) produces a Belevitch matrix C with element values −b = −1 and d = 0. Removing the edge from Odin matrices produces Cretan shadow matrices of the form

\begin{pmatrix} A & B \\ B^T & -D^T \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} A & B \\ -B^T & D^T \end{pmatrix}.
21.8 Conclusions We have discussed the development of a modern technique for high-order Hadamard matrix mining, which is crucial for the methods of digital orthogonal transformation of data. The scientific novelty of this research lies in the fact that there had been hardly any study of sample enrichment as a base for building symmetric Hadamard matrices. All the search procedures that we have mentioned were used without supercomputers, on a “naked” and unenriched sample. The development of matrix mining in this direction shows a high-quality solution of the same problems at higher orders and with a better search culture. Nowadays, after rare findings and “record-breaking”, matrix mining is entering the stage of getting regular and guaranteed results in acceptable time. What contributes to it is the newly proposed control of Hadamard matrix search using already found Hadamard matrices, along with the analysis of correlation and autocorrelation functions. Acknowledgements We express our deep gratitude to Ms. Jennifer Seberry (Pearcey Award winner, Professor Emeritus at University of Wollongong, Australia) for her creative recommendations and participation in seminars which promoted Hadamard matrix mining technology. The paper was prepared with the financial support of the Ministry of Science and Higher Education of the Russian Federation, grant agreement No. FSRF-2020-0004.
References 1. Seberry, J., Yamada, M.: Hadamard matrices: constructions using number theory and linear algebra. Wiley (2020) 2. Balonin, N.A., Sergeev, M.B., Seberry, J., Sinitsyna, O.I.: Circles on lattices and Hadamard matrices. Informatsionno-upravliaiushchie sistemy [Inf. Control Syst.] (3), 2–9 (2019) (In Russian). https://doi.org/10.31799/1684-8853-2019-3-2-9 3. Mohan, M.T.: p-almost Hadamard matrices and λ-planes. J. Algebraic Combin. (2020). https:// doi.org/10.1007/s10801-020-00991-y 4. Wang, R.: Introduction to orthogonal transforms with applications in data processing and analysis. Cambridge University Press (2010) 5. Seberry, J., Wysocki, B., Wysockiet, T.: On some applications of Hadamard matrices. Metrika 62(2–3), 221–239 (2005) 6. Vostrikov, A., Sergeev, M., Balonin, N., Sergeev, A.: Use of symmetric hadamard and mersenne matrices in digital image processing. Procedia Comput. Sci. 1054–1061 (2018). https://doi.org/ 10.1016/j.procS.2018.08.042 7. Mironovsky, L.A., Slaev, V.A.: Strip-Method for image and signal transformation. Berlin, Boston: De Gruyter (2021). https://doi.org/10.1515/9783110252569 8. Evangelaras, H., Koukouvinos, C., Seberry, J.: Applications of Hadamard matrices. J Telecommun Inf Technol 2, 3–10 (2003) 9. Kharaghani, H., Tayfeh-Rezaie, B.A.: Hadamard matrix of order 428. J. Comb. Des. 13, 435– 440 (2005) 10. Orrick, W.P., Solomon, B.: Large determinant sign matrices of order 4k+1. Discrete Math 307, 226–236 (2007)
248
N. Balonin et al.
11. Seberry, J., Xia, T., Koukouvinos, C., Mitrouli, M.: The maximal determinant and subdeterminants of ±1 matrices. Linear. Algebra Appl. (373), 297–310 (2003). Combinatorial Matrix Theory Conference (Pohang, 2002). https://doi.org/10.1016/S0024-3795(03)00584-6 12. Orrick, W.P.: The maximal {−1,1}-determinant of order 15. Metrika 62, 195–219 (2005) 13. Balonin, N.A., Sergeev, M.B., Suzdal, V.S.: Dynamic generators of the quasi-orthogonal hadamard matrices family. SPIIRAS Proc. 5(54), 224–243 (2017). https://doi.org/10.15622/ sp.54.10 14. Acevedo, S., Dietrich, H.: New infinite families of Williamson Hadamard matrices. Aust. J. Comb. 73(1), 207–219 (2019) 15. Holzmann, W.H., Kharaghani, H., Tayfeh-Rezaie, B.: Williamson matrices up to order 59. Des. Codes Crypt. 46(3), 343–352 (2008) 16. Seberry, J., Balonin, N.A.: Two infinite families of symmetric Hadamard matrices. Aust J Comb 69(3), 349–357 (2017) 17. Balonin, N.A., Sergeev, M.B.: Helping hadamard conjecture to become a theorem. Part 1. Informatsionno-upravliaiushchie sistemy [Information and Control Systems] (6), 2–13 (2018) (In Russian). https://doi.org/10.31799/1684-8853-2018-6-2-13 18. Balonin, N.A., Sergeev, A.M., Sinitsyna, O.I.: Finite field and group algorithms for orthogonal sequence search. Informatsionno-upravliaiushchie sistemy [Information and Control Systems] (4), 2–17 (2021) (In Russian). https://doi.org/10.31799/1684-8853-2021-4-2-17 19. Balonin, Y., Abuzin, L., Sergeev, A., Nenashev, V.: The study of generators of orthogonal pseudo-random sequences. Smart Innovation Syst Technol. 143, 125–133 (2019). https://doi. org/10.1007/978-981-13-8303-8 20. Fletcher, R.J., Gysin, M., Seberry, J.: Application of the discrete Fourier transform to the search for generalised Legendre pairs and Hadamard matrices. Australas. J Comb. 23, 75–86 (2001) 21. Turner, J.S., Kotsireas, I.S., Bulutoglu, D.A., Geyer, A.J.: A Legendre pair of length 77 using complementary binary matrices with fixed marginal. Des. Codes Cryptogr. 89(6), 1321–1333 (2021) 22. Balonin, N.A., Sergeev, M.B.: Cretan matrices of Odin and shadows accompanying primes and their powers. Informatsionno-upravliaiushchie sistemy [Inf. Control Syst.] (1), 2–7 (2022) (In Russian). https://doi.org/10.31799/1684-8853-2022-1-2-7
Chapter 22
Using Chimera Grids to Describe Boundaries of Complex Shape Alena V. Favorskaya
and Nikolay Khokhlov
Abstract Stress-strain analysis of railway using computer simulation and fullwave modeling of ultrasonic non-destructive testing require significant computing resources. In addition to this, the complex shape of the rail can be noted. The use of computational Chimera grids might be a solution to this problem, because it reduces the amount of computing resources. The background computational grids are structured grids with a constant coordinate step, and the Chimera curvilinear structured grid is a thin layer surrounding the outer boundary of the rail and accurately describing its shape. Interpolation is performed between different types of computational grids. The process of interpolation by points of a quadrangle of an arbitrary shape is considered in detail in this work. We used the grid-characteristic numerical method on structured curvilinear and regular grids, respectively, to carry out the calculations. The results of simulation the propagation of ultrasonic waves in a rail are presented in the work.
22.1 Introduction

The development of railway transport [1] leads to the need to improve the safety of exploitation, which in turn requires the development of methods for diagnosing damage and determining the terms of exploitation. Therefore, scientists are faced with the issue of developing novel methods for computer simulation of elastic wave phenomena in the rails and railway. In this paper, we propose to use Chimera computational grids for a more accurate description of the rail shape. Chimera or overlapping grids began to be used to solve hyperbolic systems of equations and problems of hydrodynamics in [2, 3]. In
the works [4, 5], we applied them to solve geophysical problems using the gridcharacteristic computational method. To calculate elastic wave phenomena in solids, one can use finite-difference methods [6, 7], finite element methods [8], discontinuous Galerkin method [9], boundary integral equation method [10]. In this work, we use the grid-characteristic method, which is one of the finite-difference methods. Previously, it has proven itself for solving problems in geophysics [11–13], including inverse ones using machine learning [14, 15] and using parallel algorithms [16], for solving problems of ultrasonic non-destructive testing of a rail [17, 18], for calculation of deformations of composite materials [19] and ice islands [20, 21], for modeling elastic waves in the anisotropic case [22, 23], for modeling ultrasound in medicine [24]. We organized this paper as follows. Section 22.2 is devoted to the problem statement. The system of equation to be solved, the used computational grids are presented. The computational algorithm is presented in Sect. 22.3. Section 22.4 deals with interpolation in the quadrangle of an arbitrary shape. Section 22.5 presents the results of computer simulations. Section 22.6 concludes the paper.
22.2 Mathematical Model

To simulate ultrasonic waves inside a rail, we solve the following system of Eqs. 22.1 and 22.2:

ρ ∂v(r, t)/∂t = (∇ · σ(r, t))^T   (22.1)
∂σ(r, t)/∂t = (ρcP² − 2ρcS²)(∇ · v(r, t))I + ρcS²(∇ ⊗ v(r, t) + (∇ ⊗ v(r, t))^T)   (22.2)

Here, v is the derivative of the displacement with respect to time, σ is the symmetric Cauchy stress tensor, ρ = 7800 kg/m³ is the density of steel, cP = 6250.1282 m/s is the speed of pressure waves, and cS = 3188.5210 m/s is the speed of shear waves. The pressure on the upper side of the rail at the intersection of a circle with a center (0 mm, 152 mm) 15 mm in diameter with the outer boundary of the rail was set as a source. A sinusoidal wavelet with 10 periods and a frequency of 12.5 MHz was considered. On the rest of the outer boundary of the Chimera grid, the condition of a free boundary was set. Zero initial conditions were used.

Figure 22.1 shows the computational grids used, i.e., three grids with a constant coordinate step equal to 0.2 mm and one Chimera curved structured grid. In accordance with the stability conditions [4], the time step was equal to 0.36 · 0.02/cS ≈ 32 ns, and 12,600 time steps were performed. The above problem statement can be used both for solving the problems of non-destructive testing [17] and for calculating the train impact on the railway [25, 26].
Fig. 22.1 Computational meshes in different scales: a shape, b nodes in detail
22.3 Computational Algorithm

At each time step, the following calculation algorithm is used:

1. Interpolation from background regular grids to the nodes [1, 2] × [1, NL] of the Chimera grid (to the nodes marked with purple squares in Fig. 22.2a).
2. Calculation in the directions OX in regular background grids and OS in the Chimera grid.
3. Correction in accordance with the boundary condition at the outer boundary of the Chimera grid in points (NS, j), j ∈ [1, NL].
4. Copying in pairs in contacting background regular grids into ghost nodes [mX,I, nX,I] × [−1, 0] of an adjacent grid, where mX,I, nX,I runs through the numbers of contacting nodes for the contact with number I = 1, 2.
5. Copying to ghost nodes at the beginning and end of the Chimera grid from real nodes of the Chimera grid:

[1, NS] × [NL − 1, NL] → [1, NS] × [−1, 0]   (22.3)
Fig. 22.2 The nodes to which the interpolation is carried out: a the nodes of the Chimera grid, b the nodes of the regular background grids
[1, NS] × [0, 1] → [1, NS] × [NL + 1, NL + 2]   (22.4)

6. Calculation in the directions OY in regular background grids and OL in the Chimera grid.
7. Interpolation from the Chimera grid to the nodes of regular background grids, which are covered by a subdomain of the Chimera grid with the numbers of the nodes [3, NS] × [1, NL] (to the nodes marked with blue squares in Fig. 22.2b).
8. Save and copy the time layer n + 1 of all computational grids into time layer n. (In order to save RAM, only two time layers of each computational grid are stored.)
Here, NS = 16 is the number of nodes in the Chimera grid across it, NL = 3051 is the number of nodes along it. The corresponding directions are named OS (short) and OL (long). Ghost nodes differ from real ones in that the unknowns in them are used to calculate unknowns in real nodes, but the calculation of unknowns on a time layer is not performed in these ghost nodes. Note that nodes of Chimera grid [1, 2] × [1, NL ] might be ghost ones. However, since interpolation is used, the shape of the Chimera grid is important and the usual ghost node creating algorithm for curvilinear structured grid is not applicable here, so these nodes are considered as real. Bilinear interpolation from a regular background grid to the nodes of the Chimera grid does not raise any questions. The next section is devoted to interpolation from the Chimera grid to the nodes of regular background grids.
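As a concrete illustration of step 5, the ghost-node copying of Eqs. 22.3 and 22.4 amounts to two array-slice assignments. The sketch below assumes, purely for illustration, that the Chimera-grid unknowns are stored in a NumPy array with two ghost columns appended at each end of the long direction; this storage layout is an assumption, not taken from the paper.

```python
import numpy as np

NS, NL = 16, 3051
# Unknowns on the Chimera grid with two ghost layers at each end of the OL direction.
# Column index 1 holds j = 0, index 2 holds j = 1, ..., index NL + 1 holds j = NL.
u = np.zeros((NS, NL + 4))

def fill_chimera_ghosts(u):
    """Step 5 of the algorithm: Eqs. 22.3 and 22.4."""
    u[:, 0:2] = u[:, NL:NL + 2]        # [1,NS] x [NL-1, NL] -> [1,NS] x [-1, 0]
    u[:, NL + 2:NL + 4] = u[:, 1:3]    # [1,NS] x [0, 1]     -> [1,NS] x [NL+1, NL+2]
    return u
```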
22.4 Features of Interpolation

The disadvantages of the usual bilinear interpolation with basis functions 1, ξ, η, ξ · η in the case of a quadrangle of arbitrary shape are the ambiguity of the choice of the coordinate system ξ(x, y), η(x, y) and the need to solve a system of linear equations for each cell of the Chimera grid. Below we propose the following type of interpolation, which allows these disadvantages to be avoided. Consider a quadrangle ABCD of arbitrary shape (Fig. 22.3), oriented to satisfy the inequality AB + AD ≥ BC + CD. The point at which the interpolation is needed is indicated by K. We also introduce two auxiliary lines EF and GH in such a way that:

α = AE/AB = DF/DC = GK/GH,  β = AG/AD = BH/BC = EK/EF   (22.5)
Knowing the coefficients α and β, by analogy with bilinear interpolation in a rectangle, the following formula can be used:

uK = (1 − α)(1 − β)uA + α(1 − β)uB + αβ · uC + (1 − α)β · uD   (22.6)
Fig. 22.3 Interpolation at point K by points of a quadrangle ABCD of arbitrary shape: a statement of the interpolation problem, b explanation of the definition of the coefficients α and β
Here, u is a vector of unknowns to be interpolated. Using the scheme in Fig. 22.3 and Eq. 22.5, it is easy to obtain the following system of quadratic equations for finding the coefficients α and β:

xK = xA + α(xB − xA) + β(xD − xA) + αβ(xA + xC − xB − xD)   (22.7)

yK = yA + α(yB − yA) + β(yD − yA) + αβ(yA + yC − yB − yD)   (22.8)
It should be noted that the closer the quadrangle is to a parallelogram, the closer the system of Eqs. 22.7 and 22.8 is to a system of linear equations. Therefore, we have proposed the following algorithm for finding the coefficients α and β. Let the error ε be given, with which we seek the coordinates of the point K by Eqs. 22.7 and 22.8. Moreover, if it is found that the distance from point K to one of the nodes of the Chimera grid is less than ε, or the distance from point K to one of the edges of the quadrilateral ABCD is less than ε, then either copying or linear interpolation along the corresponding segment is used, respectively. Otherwise, the following steps are performed.

1. Firstly, we try to find α using Eq. 22.9, and β by Eq. 22.11. If Eqs. 22.7, 22.8 are satisfied with an error less than ε, the interpolation problem is solved. This is the case when the quadrangle ABCD is close enough to the trapezoid AD // BC or to a parallelogram.

α = r(K, AD) / (r(K, AD) + r(K, BC)),  β = r(K, AB) / (r(K, AB) + r(K, CD))   (22.9)

α = [(xK − xA) − (xD − xA)β] / [(xB − xA) + (xA + xC − xB − xD)β] = [(yK − yA) − (yD − yA)β] / [(yB − yA) + (yA + yC − yB − yD)β]   (22.10)

β = [(xK − xA) − (xB − xA)α] / [(xD − xA) + (xA + xC − xB − xD)α] = [(yK − yA) − (yB − yA)α] / [(yD − yA) + (yA + yC − yB − yD)α]   (22.11)
In Eq. 22.9, r(K, MN) denotes the distance from the point K to the segment MN, M, N ∈ {A, B, C, D}.

2. Otherwise, we try to find β by Eq. 22.9, and α by Eq. 22.10. If Eqs. 22.7 and 22.8 are satisfied with an error less than ε, the interpolation problem is solved. This is the case when the quadrangle ABCD is close enough to the trapezoid AB // CD.

3. Otherwise, we are in a situation where it is necessary to solve the system of nonlinear Eqs. 22.7 and 22.8. It remains to determine whether to solve the quadratic equation for α and to find β by Eq. 22.11, or to solve the quadratic equation for β and use Eq. 22.10 to find α. It is also necessary to choose the method for solving the quadratic equation, since the coefficient of the quadratic term will in most cases be small and the use of the formula with the discriminant may give a significant calculation error. Therefore, we introduce one more parameter, εN; the recommended value of εN is 0.3. If for one of the quadratic equations

aθ² − bθ + c = 0, θ ∈ {α, β}   (22.12)
the following inequality is true

|a/b| < εN   (22.13)

then to find this θ we use Newton's iterative method:

θn+1 = θn + [c/b − θn(1 − θn · a/b)] / [1 − 2θn · a/b]   (22.14)
In the corresponding iterative process, we find the remaining coefficient by Eq. 22.10 or Eq. 22.11, respectively, and terminate the iterative process when Eqs. 22.7, 22.8 are satisfied with an error less than ε. The initial approximation of θ is given by Eq. 22.9. If Eq. 22.13 is not true for either of the quadratic equations for the coefficients α and β, we solve the quadratic equation with the largest coefficient a using the formula with the discriminant.

Let us also consider the geometric meaning of the coefficients of the quadratic Eq. 22.12 in order to justify the choice of the orientation of the quadrangle and the given interpolation algorithm as a whole. For definiteness, we put θ = α. Then the coefficients of Eq. 22.12 can be found by the following formulae:

2a = (xB − xA)(yD − yA) − (xD − xA)(yB − yA) + (xC − xA)(yB − yA) − (xB − xA)(yC − yA)   (22.15)
Fig. 22.4 The geometric meaning of the coefficients of the quadratic Eq. 22.12
2b = (xB − xA)(yD − yA) − (xD − xA)(yB − yA) + (xC − xA)(yK − yA) − (xK − xA)(yC − yA) + (xK − xA)(yB − yA) − (xB − xA)(yK − yA) + (xK − xA)(yD − yA) − (xD − xA)(yK − yA)   (22.16)

2c = (xK − xA)(yD − yA) − (xD − xA)(yK − yA)   (22.17)
The geometric meaning of the coefficient a is the sum of the positive area of the triangle ABD and the negative area of the triangle ABC (Fig. 22.4a). The geometric meaning of the coefficient b is the sum of the positive area of the triangle ABD, the positive area of the triangle AKD, the negative area of the triangle ABK and the area of the triangle ACK with a sign that varies depending on the position of the point K. The last three areas are colored in Fig. 22.4b. The geometric meaning of the coefficient c is the positive area of the triangle AKD.
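To make the procedure of Sect. 22.4 concrete, the following sketch finds α and β for a point K inside an arbitrary quadrangle and then applies Eq. 22.6. It forms the quadratic equation by eliminating β with 2D cross products (algebraically equivalent to Eqs. 22.7–22.8 and, up to sign conventions, to Eqs. 22.15–22.17) and refines a linearized initial guess with Newton iterations in the spirit of Eq. 22.14. The trapezoid shortcuts and the discriminant branch of steps 1–3 are omitted for brevity, and all function names are illustrative rather than the authors' implementation.

```python
import numpy as np

def cross2(u, v):
    """Scalar cross product of two 2D vectors."""
    return u[0] * v[1] - u[1] * v[0]

def inverse_bilinear(A, B, C, D, K, tol=1e-12, max_iter=30):
    """Find (alpha, beta) of Eqs. 22.7-22.8 for point K in quadrangle ABCD."""
    A, B, C, D, K = map(np.asarray, (A, B, C, D, K))
    e1, e2 = B - A, D - A          # edge vectors from A
    e3 = A + C - B - D             # bilinear (non-parallelogram) term
    q = K - A
    # Quadratic a*alpha^2 + b*alpha + c = 0 obtained by eliminating beta.
    a = cross2(e1, e3)
    b = cross2(e1, e2) - cross2(q, e3)
    c = -cross2(q, e2)
    alpha = -c / b                 # linearized guess, exact for a parallelogram
    for _ in range(max_iter):      # Newton refinement, robust when |a/b| is small
        step = (a * alpha**2 + b * alpha + c) / (2.0 * a * alpha + b)
        alpha -= step
        if abs(step) < tol:
            break
    # Recover beta from alpha (a cross-product form of Eq. 22.11).
    beta = cross2(e1, q) / cross2(e1, e2 + alpha * e3)
    return alpha, beta

def interpolate(uA, uB, uC, uD, alpha, beta):
    """Bilinear-type interpolation of Eq. 22.6."""
    return ((1 - alpha) * (1 - beta) * uA + alpha * (1 - beta) * uB
            + alpha * beta * uC + (1 - alpha) * beta * uD)

# usage sketch:
# alpha, beta = inverse_bilinear((0, 0), (1.0, 0.1), (1.1, 1.0), (0.0, 0.9), (0.5, 0.5))
```

For a quadrangle close to a parallelogram the quadratic coefficient nearly vanishes, so the linearized guess is already accurate and Newton's method converges in one or two steps, which mirrors the remark made after Eq. 22.8.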
22.5 Simulation Results Figures 22.5 and 22.6 show the propagation of ultrasonic waves in the rail at different times. Snapshots of the velocity modulus are presented, perfectly demonstrating the propagation of both pressure and shear waves [18]. Comparing Figs. 22.5b, c and 22.6a, b, one can verify that the wave fields in the regions of background regular grids covered by the Chimera grid correspond to the wave fields in the Chimera grid.
22.6 Conclusions The obtained results showed the possibility of using the grid-characteristic method and Chimera grids to solve systems of hyperbolic equations in objects of complex shape such as a rail by the example of a boundary value problem of an elastic wave equation. In this case, it becomes possible to accurately describe the shape of the rail
Fig. 22.5 Wave patterns: a time moment 4.3545 μs, with Chimera grid, b time moment 14.031 μs, with Chimera grid, c time moment 14.031 μs, background regular grids only
Fig. 22.6 Wave patterns: a time moment 24.1915 μs, with Chimera grid, b time moment 24.1915 μs, background regular grids only, c time moment 26.61 μs, with Chimera grid
and significantly reduce the cost of computing resources due to the use of grids with a constant coordinate step in most of the integration domain. Acknowledgements This work has been performed with the financial support of the Russian Science Foundation (project No. 20-71-10028).
References 1. Vaiciunas, G.: Assessment of railway development in Baltic Sea region. In: Transport Means— Proceedings of the International Conference, pp. 790–796. Kaunas University of Technology, Kaunas (2020) 2. Berger, M.J., Joseph, E.O.: Adaptive mesh refinement for hyperbolic partial differential equations. J. Comput. Phys. 53(3), 484–512 (1984)
3. Steger, J.L.: A chimera grid scheme: advances in grid generation. Am. Soc. Mech. Eng. Fluids Eng. Div. 5, 55–70 (1983) 4. Khokhlov, N., Favorskaya, A., Stetsyuk, V., Mitskovets, I.: Grid-characteristic method using Chimera meshes for simulation of elastic waves scattering on geological fractured zones. J. Comput. Phys. 446, 110637 (2021) 5. Favorskaya, A., Khokhlov, N.: Accounting for curved boundaries in rocks by using curvilinear and Chimera grids. Procedia Comput. Sci. 192, 3787–3794 (2021) 6. Landinez, G., Rueda, S., Lora-Clavijo, F.D.: First steps on modelling wave propagation in isotropic-heterogeneous media: Numerical simulation of P–SV waves. Eur. J. Phys. 42(6), 065001 (2021) 7. Cuenca, E., Ducousso, M., Rondepierre, A., Videau, L., Cuvillier, N., Berthe, L., Coulouvrat, F.: Propagation of laser-generated shock waves in metals: 3D axisymmetric simulations compared to experiments. J. Appl. Phys. 128(24), 244903 (2020) 8. Benatia, N., El Kacimi, A., Laghrouche, O., El Alaoui Talibi, M., Trevelyan, J.: Frequency domain Bernstein-Bézier finite element solver for modelling short waves in elastodynamics. Appl. Math. Model. 102, 115–136 (2022) 9. Favorskaya, A.V., Petrov, I.B.: Combination of grid-characteristic method on regular computational meshes with discontinuous Galerkin method for simulation of elastic wave propagation. Lobachevskii J Math 42(7), 1652–1660 (2021) 10. Fomenko, S.I., Golub, M.V., Doroshenko, O.V., Wang, Y., Zhang, C.: An advanced boundary integral equation method for wave propagation analysis in a layered piezoelectric phononic crystal with a crack or an electrode. J. Comput. Phys. 447, 110669 (2021) 11. Golubev, V.I., Shevchenko, A.V., Petrov, I.B.: Application of the Dorovsky model for taking into account the fluid saturation of geological media. J. Phys. Conf. Ser. 1715(1), 012056 (2021) 12. Favorskaya, A.V., Khokhlov, N.I.: Types of elastic and acoustic wave phenomena scattered on gas- and fluid-filled fractures. Procedia Comput. Sci. 176, 2556–2565 (2020) 13. Favorskaya, A., Golubev, V.: Study of anisotropy of seismic response from fractured media. Smart Innovation Syst. Technol. 238, 231–240 (2021) 14. Muratov, M.V., Ryazanov, V.V., Biryukov, V.A., Petrov, D.I., Petrov, I.B.: Inverse problems of heterogeneous geological layers exploration seismology solution by methods of machine learning. Lobachevskii J. Math. 42(7), 1728–1737 (2021) 15. Muratov, M.V., Petrov, D.I., Biryukov, V.A.: The solution of fractures detection problems by methods of machine learning. Smart Innovation Syst. Technol. 215, 211–221 (2021) 16. Nikitin, I.S., Golubev, V.I., Ekimenko, A.V., Anosova, M.B.: Simulation of seismic responses from the 3D non-linear model of the Bazhenov formation. IOP Conf. Ser. Mater. Sci. Eng. 927(1), 012020 (2020) 17. Favorskaya, A.V., Kabisov, S.V., Petrov, I.B.: Modeling of ultrasonic waves in fractured rails with an explicit approach. Dokl. Math. 98(1), 401–404 (2018) 18. Favorskaya, A., Petrov, I.: A novel method for investigation of acoustic and elastic wave phenomena using numerical experiments. Theor. Appl. Mech. Lett. 10(5), 307–314 (2020) 19. Beklemysheva, K., Golubev, V., Petrov, I., Vasyukov, A.: Determining effects of impact loading on residual strength of fiber-metal laminates with grid-characteristic numerical method. Chin. J. Aeronaut. 34(7), 1–12 (2021) 20. Petrov, I.B., Muratov, M.V., Sergeev, F.I.: Elastic wave propagation modeling during exploratory drilling on artificial ice island. Smart Innovation Syst. Technol. 217, 171–183 (2021) 21. 
Favorskaya, A., Petrov, I.: Calculation of the destruction of ice structures by the gridcharacteristic method on structured grids. Procedia Comput. Sci. 192, 3768–3776 (2021) 22. Petrov, I.B., Golubev, V.I., Petrukhin, V.Y., Nikitin, I.S.: Simulation of seismic waves in anisotropic media. Dokl. Math. 103(3), 146–150 (2021) 23. Golubev, V.: The grid-characteristic method for applied dynamic problems of fractured and anisotropic media. CEUR Workshop Proc. 3041, 32–37 (2021) 24. Beklemysheva, K., Vasyukov, A., Ermakov, A.: Grid-characteristic numerical method for medical ultrasound. J. Phys. Conf. Ser. 2090(1), 012164 (2021)
25. Kozhemyachenko, A.A., Petrov, I.B., Favorskaya, A.V., Khokhlov, N.I.: Boundary conditions for modeling the impact of wheels on railway track. Comput. Math. Math. Phys. 60(9), 1539– 1554 (2020) 26. Nozhenko, O., Gorbunov, M., Vaiˇciunas, G., Porkuian, O.: Preconditions for creating a methodology for diagnosing of increase dynamic impact a rolling stock on the rail. In: Transport Means—Proceedings of the International Conference, pp. 1591–1595. Kaunas University of Technology, Kaunas (2019)
Chapter 23
Ultrasonic Study of Sea Ice Ridges Alena V. Favorskaya
and Maksim V. Muratov
Abstract Active hydrocarbon production on the Northern shelf leads to necessity to calculate ice loads on oil platforms and other offshore objects. Therefore, there is a demand to study the internal structure of sea ice ridges. In this paper, we are exploring the possibility to investigate the structure of complex heterogeneous ice formations such as sea ice ridges using ultrasonic (strictly speaking, the waves in the sound range were found to be the most effective) waves and data recording on nearby ice floes. Since carrying out physical laboratory experiments is expensive, it is advisable to preliminary study the issue with the help of computer simulation. In this work, the grid-characteristic method was used on structured regular computational grids to calculate the propagation of elastic and acoustic waves in sea ice ridges, water and bottom geological rock. Comparisons of synthetic seismograms obtained using full-wave modeling are carried out in order to determine the possibility of using the proposed approach for studying the internal structure of sea ice ridges.
23.1 Introduction

Depletion of hydrocarbon reserves leads to seismic prospecting and production in hard-to-reach fields, such as the shelf of the Northern seas. This circumstance poses the issue of ensuring the safety of oil production facilities exploitation in difficult ice conditions [1, 2]. Collisions with sea ice ridges are one of the key factors for the design of offshore structures in the Northern seas [3] and ensuring the safety of navigation [4]. Usually, drilling is used to study the internal structure of sea ice ridges [5–7]. In the work [3], the geometric characteristics of sea ice ridges were investigated using a sonar. In the work [4], modeling of electromagnetic waves was carried out, it was
shown that the characteristics of the surface and structure of the sea ice ridge affect the nature of the scattered signal. In this work, we propose to use elastic and acoustic waves to study the internal structure of sea ice ridges. An interest in the influence of sea ice ridges on the propagation of acoustic waves has arisen for a long time [8]. The work [9] presents the results of experimental studies of seismoacoustic phenomena associated with the growth, melting, re-freezing, cracking, crest formation and rafting of sea ice. In the work [10], the discontinuation of the propagation of S-waves (shear) in ice due to the sea ice ridge is noted. We used the grid-characteristic method on structured regular computational grids for computer simulation of wave phenomena in a sea ice ridge. Previously, this computational method was successfully applied to simulate elastic wave phenomena [11, 12] in geological media [13] with fractures [12, 14, 15], porous media [16], anisotropic media [17] and media with gas pockets [18], to take into account the effect of ice formations on seismograms [19], to simulate impacts on composite materials [20]. The development of bicompact grid-characteristic schemes [21], combination with the discontinuous Galerkin method [22] and parallel algorithms [23] is underway. This paper has the following structure. In Sect. 23.2, we present the 12 problem statements under consideration. The results of numerical modeling and comparison of seismograms are discussed in Sect. 23.3. Section 23.4 concludes the paper.
23.2 Computational Model We use the mathematical model of sea ice ridges from the work [24]. The survey systems are shown in Figs. 23.1, 23.2 and 23.3. Red asterisks mark the positions of the sources that coincide with the positions of the first receivers from the source side. The rest of the receivers are conventionally marked with black triangles. Receivers were located every 0.1 m, 201 receivers on each side of the sea ice ridge. Note that,
Fig. 23.1 Left position of the source, sea ice ridge model: a «Middle», b «Big»
Fig. 23.2 Right and close position of the source, sea ice ridge model: a «Middle», b «Big»
Fig. 23.3 Right and distant position of the source, sea ice ridge model: a «Middle», b «Big»
Figs. 23.1, 23.2 and 23.3 are shown horizontally not to scale in the areas where the receivers are located. Sources of sinusoidal shape with one period and frequencies of 10 kHz and 5 kHz were used. This results in 12 problem statements, for each of which we consider seismograms on the side of the source and on the side opposite to the source. These 12 problem statements are summarized in Tables 23.1 and 23.2.

Table 23.1 Problem statements 1–6, frequency of 10 kHz

No                1                  2                          3                   4                  5                          6
Model             «Middle»           «Middle»                   «Middle»            «Big»              «Big»                      «Big»
Source position   Left, Fig. 23.1a   Right, close, Fig. 23.2a   Right, Fig. 23.3a   Left, Fig. 23.1b   Right, close, Fig. 23.2b   Right, Fig. 23.3b
Frequency (kHz)   10                 10                         10                  10                 10                         10

Table 23.2 Problem statements 7–12, frequency of 5 kHz

No                7                  8                          9                   10                 11                         12
Model             «Middle»           «Middle»                   «Middle»            «Big»              «Big»                      «Big»
Source position   Left, Fig. 23.1a   Right, close, Fig. 23.2a   Right, Fig. 23.3a   Left, Fig. 23.1b   Right, close, Fig. 23.2b   Right, Fig. 23.3b
Frequency (kHz)   5                  5                          5                   5                  5                          5

The time step was taken equal to 5.076 µs, the coordinate steps were 2 cm, and 4500 time steps were calculated. The size of the ice block in the sea ice ridge models was taken equal to 1 m. The joint boundary value problem of the elastic and acoustic
wave equations and the elastic characteristics of materials are considered in detail in the work [24] and so we do not present them here.
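Since the 12 problem statements in Tables 23.1 and 23.2 are simply the Cartesian product of two ridge models, three source positions, and two source frequencies, they can be enumerated programmatically. The sketch below only illustrates this bookkeeping; the dictionary keys and names are hypothetical, not part of the authors' code.

```python
from itertools import product

models = ["Middle", "Big"]
source_positions = ["left", "right, close", "right, distant"]
frequencies_khz = [10, 5]

# Numbering follows Tables 23.1 and 23.2: statements 1-6 use 10 kHz, 7-12 use 5 kHz.
problems = [
    {"no": i + 1, "frequency_khz": f, "model": m, "source": s}
    for i, (f, m, s) in enumerate(product(frequencies_khz, models, source_positions))
]
```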
23.3 Simulation Results This section is devoted to a discussion of the results of computer simulation. Examples of wavefield snapshots, synthetic seismograms and results of comparison of synthetic seismograms are given.
23.3.1 Wave Patterns Some examples of snapshots of velocity modulus wave field are given in Figs. 23.4 and 23.5. Comparing Figs. 23.4a, b and 23.5a, b, respectively, one can see the difference between cases of various source frequencies.
Fig. 23.4 Wave patterns, sea ice ridge model «middle», source position «left», time moment 5.3298 ms: a frequency 10 kHz, problem No. 1, b frequency 5 kHz, problem No. 7
Fig. 23.5 Wave patterns, sea ice ridge model «big», source position «right, close», time moment 6.31962 ms: a frequency 10 kHz, problem No. 5, b frequency 5 kHz, problem No. 11
23.3.2 Seismograms Several synthetic seismograms are given in Figs. 23.6, 23.7, 23.8 and 23.9. All seismograms are normalized on maximum value on time and receiver number. Comparing Figs. 23.6, 23.7, 23.8 and 23.9, respectively, one can see the difference between cases of various source frequencies. A more detailed analysis of all synthetic seismograms for all the 12 problem statements is given in the next section.
Fig. 23.6 Seismograms, sea ice ridge model «middle», source position «left», frequency 10 kHz, problem No. 1, opposite to the source side: a velocity modulus, b horizontal component of velocity, c vertical component of velocity, d scales
Fig. 23.7 Seismograms, sea ice ridge model «middle», source position «left», frequency 5 kHz, problem No. 7, opposite to the source side: a velocity modulus, b horizontal component of velocity, c vertical component of velocity
Fig. 23.8 Seismograms, sea ice ridge model «big», source position «right», frequency 10 kHz, problem No. 6, opposite to the source side: a velocity modulus, b horizontal component of velocity, c vertical component of velocity
Fig. 23.9 Seismograms, sea ice ridge model «big», source position «right», frequency 5 kHz, problem No. 12, opposite to the source side: a velocity modulus, b horizontal component of velocity, c vertical component of velocity
23.3.3 Analysis and Discussion

To compare synthetic seismograms, we used the norms L∞ and L1 for the normalized components and modulus of velocity in accordance with the following expressions (the maxima in the denominators are taken over k ∈ [1, NT], l ∈ [1, NR]; the outer maxima and sums are taken over i ∈ [1, NT], j ∈ [1, NR]):

L∞{vX} = max_{i,j} | vX^{1,i,j} / max_{k,l} |vX^{1,k,l}| − vX^{2,i,j} / max_{k,l} |vX^{2,k,l}| |   (23.1)

L∞{vY} = max_{i,j} | vY^{1,i,j} / max_{k,l} |vY^{1,k,l}| − vY^{2,i,j} / max_{k,l} |vY^{2,k,l}| |   (23.2)

L∞{|v|} = max_{i,j} [ ( vX^{1,i,j} / max_{k,l} |v^{1,k,l}| − vX^{2,i,j} / max_{k,l} |v^{2,k,l}| )² + ( vY^{1,i,j} / max_{k,l} |v^{1,k,l}| − vY^{2,i,j} / max_{k,l} |v^{2,k,l}| )² ]^{1/2}   (23.3)

L1{vX} = (1/(NT · NR)) Σ_{i,j} | vX^{1,i,j} / max_{k,l} |vX^{1,k,l}| − vX^{2,i,j} / max_{k,l} |vX^{2,k,l}| |   (23.4)

L1{vY} = (1/(NT · NR)) Σ_{i,j} | vY^{1,i,j} / max_{k,l} |vY^{1,k,l}| − vY^{2,i,j} / max_{k,l} |vY^{2,k,l}| |   (23.5)

L1{|v|} = (1/(NT · NR)) Σ_{i,j} [ ( vX^{1,i,j} / max_{k,l} |v^{1,k,l}| − vX^{2,i,j} / max_{k,l} |v^{2,k,l}| )² + ( vY^{1,i,j} / max_{k,l} |v^{1,k,l}| − vY^{2,i,j} / max_{k,l} |v^{2,k,l}| )² ]^{1/2}
(23.6) In Eqs. 23.1 – 23.6, NT is the number of time steps being equal to 301, NR is the number of receivers being equal to 201 (in each calculation there were 201 receivers s,i, j on the source side and 201 receivers on the opposite to the source side), vX is the s,i, j horizontal component of velocity, vY is the vertical one, vs,i, j denotes the modulus of velocity, index i runs from 1 to NT , index j runs from 1 to NR , and index s equals 1 or 2 and corresponds to two different compared seismograms. Figs. 23.10, 23.11 and 23.12 show the results of seismograms comparison. Note that in Figs. 23.10, 23.11 and 23.12 scales are normalized to the maximum. One can see that the results for different norms L∞ and L1 do not generally coincide. The greatest differences for all components and both norms are present for comparisons 7 (model “middle”, frequency of 5 kHz, source on the left) and 10 (model “big”, frequency of 5 kHz, source on the left) on the side of the source and 7 (model “middle”, frequency of 5 kHz, source on the left) and 12 (model “big”,
Fig. 23.10 Comparison of seismograms of the velocity modulus for 12 problem statements: a the source side, b the opposite to the source side
Fig. 23.11 Comparison of seismograms of the velocity horizontal component for 12 problem statements: a the source side, b the opposite to the source side
Fig. 23.12 Comparison of seismograms of the velocity vertical component for 12 problem statements: a the source side, b the opposite to the source side
frequency of 5 kHz, source on the right at a distance) on the opposite to the source side. Also, from Figs. 23.10, 23.11 and 23.12 it can be concluded that it is more efficient to use the frequency of 5 kHz than the frequency of 10 kHz. It is more efficient to compare the velocity modulus. The differences in the norm L1 on the opposite to the source side exceed the differences on the side of the source by more than 3 times, while the same differences in the norm L∞ do not exceed 50%, therefore, the use of the norm L1 is more appropriate.
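Under the reconstruction of Eqs. 23.1–23.6 given above, the comparison of two seismograms reduces to a few NumPy reductions. The sketch below is illustrative only: the array names, the assumed shape (NT, NR) and the dictionary output are assumptions, not the authors' code.

```python
import numpy as np

def compare_seismograms(vx1, vy1, vx2, vy2):
    """Normalized L_inf and L_1 differences between two seismograms,
    given as arrays of shape (NT, NR), following Eqs. 23.1-23.6."""
    # Component seismograms are normalized by their own maximum absolute value.
    nx1, nx2 = vx1 / np.max(np.abs(vx1)), vx2 / np.max(np.abs(vx2))
    ny1, ny2 = vy1 / np.max(np.abs(vy1)), vy2 / np.max(np.abs(vy2))
    # For the modulus, both components are normalized by the maximum velocity modulus.
    m1, m2 = np.max(np.hypot(vx1, vy1)), np.max(np.hypot(vx2, vy2))
    dmod = np.hypot(vx1 / m1 - vx2 / m2, vy1 / m1 - vy2 / m2)

    dx, dy = np.abs(nx1 - nx2), np.abs(ny1 - ny2)
    linf = {"vX": dx.max(), "vY": dy.max(), "|v|": dmod.max()}
    l1 = {"vX": dx.mean(), "vY": dy.mean(), "|v|": dmod.mean()}  # mean = sum / (NT*NR)
    return linf, l1
```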
23.4 Conclusions The results obtained using performed numerical experiments showed the promising of development of the investigation of sea ice ridges internal structure using ultrasound diagnostics and seismograms recording on the nearby ice floes. The perspectives of this approach are the absence of the need to transport sea ice ridges to the laboratory and, at the same time, the possibility of a detailed study of the internal structure. Separately, these two indicators are achievable using another observation techniques (see [3, 5–7], respectively), however, the approach proposed in this work allows them to be achieved using one observation technique. It can be noted that, strictly speaking, the optimal frequencies used are in the sound range. The value of optimal frequencies is related to the speed of elastic waves in ice formations and the specific size of inhomogeneities inside them. The accurate clarification of the optimal frequency ranges requires further research. The performed studies also demonstrate the possibility of using computer modeling with help of the computational grid-characteristic method belonging to finite-difference methods for solving direct problems of ultrasonic non-destructive testing of sea ice ridges. Acknowledgements This work was carried out with the financial support of the Russian Science Foundation, project no. 21-71-10015, https://rscf.ru/en/project/21-71-10015/.
References 1. Dalane, O., Aksnes, V., Loset, S., Aarsnes, J.V.: A moored arctic floater in first-year sea ice ridges. In: Proceedings of the International Conference on Offshore Mechanics and Arctic Engineering—OMAE, pp. 159–167. ASME, New York City (2009) 2. Ogorodov, S.A., Magaeva, A.A., Maznev, S.V., Yaitskaya, N.A., Vernyayev, S., Sigitov, A., Kadranov, Y.: Ice features of the Northern Caspian under sea level fluctuations and ice coverage variations. Geogr. Envir. Sustain. 13(3), 129–138 (2020) 3. Obert, K.M., Brown, T.G.: Ice ridge keel characteristics and distribution in the Northumberland Strait. Cold Reg. Sci. Technol. 66(2–3), 53–64 (2011) 4. Bobby, P., Gill, E.W.: Modeling scattering differences between sea ice ridges. In: OCEANS 2019-Marseille, pp. 1–4. IEEE, New York City (2019) 5. Guzenko, R.B., Mironov, Y.U., Kharitonov, V.V., May, R.I., Porubaev, V.S., Khotchenkov, S.V., Kornishin, K.A., Efimov, Y.O., Tarasov, P.A.: Morphometry and internal structure of ice ridges in the Kara and Laptev seas. Int. J. Offshore Polar Eng. 30(2), 194–201 (2020) 6. Bonath, V., Petrich, C., Sand, B., Fransson, L., Cwirzen, A.: Morphology, internal structure and formation of ice ridges in the sea around Svalbard. Cold Reg. Sci. Technol. 155, 263–279 (2018) 7. Kharitonov, V.V., Borodkin, V.A.: On the results of studying ice ridges in the Shokal’skogo Strait, part I: Morphology and physical parameters in-situ. Cold Reg. Sci. Technol. 174, (2020). Article No. 103041 8. Diachok, O.I.: Effects of sea-ice ridges on sound propagation in the Arctic Ocean. J. Acoust. Soc. Am. 59(5), 1110–1120 (1976)
9. Chamuel, J.R.: Seismoacoustic ultrasonic modeling characterization of sea ice processes. J. Acoust. Soc. Am. 97(5), 3335–3336 (1995) 10. Farmer, D.M., Xie, Y.: Recent approaches to the acoustic-seismic study of ice mechanics. J. Acoust. Soc. Am. 94(3), 1759–1760 (1993) 11. Favorskaya, A., Petrov, I.: A novel method for investigation of acoustic and elastic wave phenomena using numerical experiments. Theor. Appl. Mech. Lett. 10(5), 307–314 (2020) 12. Khokhlov, N., Favorskaya, A., Stetsyuk, V., Mitskovets, I.: Grid-characteristic method using Chimera meshes for simulation of elastic waves scattering on geological fractured zones. J. Comput. Phys. 446, (2021). Article No. 110637 13. Favorskaya, A., Khokhlov, N.: Accounting for curved boundaries in rocks by using curvilinear and Chimera grids. Procedia Computer Science 192, 3787–3794 (2021) 14. Stognii, P.V., Khokhlov, N.I., Petrov, I.B.: Schoenberg’s model-based simulation of wave propagation in fractured geological media. Mech. Solids 55(8), 1363–1371 (2020) 15. Favorskaya, A.V., Khokhlov, N.I.: Types of elastic and acoustic wave phenomena scattered on gas- And fluid-filled fractures. Procedia Computer Science 176, 2556–2565 (2020) 16. Golubev, V.I., Shevchenko, A.V., Petrov, I.B.: Application of the Dorovsky model for taking into account the fluid saturation of geological media. J. Phys. Conf. Ser. 1715(1), (2021). Article No. 012056 17. Favorskaya, A., Golubev, V.: Study of anisotropy of seismic response from fractured media. Smart Innovation Syst. Technol. 238, 231–240 (2021) 18. Stognii, P.V., Khokhlov, N.I., Petrov, I.B.: The numerical solution of the problem of the contact interaction in models with gas pockets. J. Phys. Conf. Ser. 1715(1), (2021). Article No. 012058 19. Stognii, P., Petrov, I., Favorskaya, A.: The influence of the ice field on the seismic exploration in the Arctic region. Procedia Comput. Sci. 159, 870–877 (2019) 20. Beklemysheva, K., Golubev, V., Petrov, I., Vasyukov, A.: Determining effects of impact loading on residual strength of fiber-metal laminates with grid-characteristic numerical method. Chin. J. Aeronaut. 34(7), 1–12 (2021) 21. Golubev, V.I., Shevchenko, A.V., Khokhlov, N.I., Nikitin, I.S.: Numerical investigation of compact grid-characteristic schemes for acoustic problems. J. Phys. Conf. Ser. 1902(1), (2021). Article No. 012110 22. Favorskaya, A.V., Petrov, I.B.: Combination of grid-characteristic method on regular computational meshes with discontinuous Galerkin method for simulation of elastic wave propagation. Lobachevskii J. Math. 42(7), 1652–1660 (2021) 23. Fofanov, V., Khokhlov, N.: Optimization of load balancing algorithms in parallel modeling of objects using a large number of grids. Commun. Comput. Inf. Sci. 1331, 63–73 (2020) 24. Favorskaya, A., Petrov, I.: Calculation of the destruction of ice structures by the gridcharacteristic method on structured grids. Procedia Comput. Sci. 192, 3768–3776 (2021)
Chapter 24
Technique of Central Nervous System’s Cells Visualization Based on Microscopic Images Processing Alexey Medievsky , Aleksandr Zotin , Konstantin Simonov , and Alexey Kruglyakov Abstract The study of the principles of formation and development of the structure of the brain is necessary to replenish fundamental knowledge both in the field of neurophysiology and in medicine. A detailed description of all the features of the brain will allows to choose the most effective method of therapy or to check the effectiveness of the drugs being developed. The mathematical model of a biological neural network is based on microscopic data describing the relative positions of the cells of interest, their synaptic connections, directions, and branches of axons with dendrites, as well as the sizes of the parts themselves. Studying microscopic images is a difficult task. A technique for processing images based on the shearlet transform algorithm with contrasting using color coding is proposed. It is aimed to improve the process of creating a model of biological neural network. Experimental studies on model images have shown a significant improvement in the detailing of objects under study. The obtained results will make it possible to implement a complex in which to assess the functional characteristics of each cell, it is assumed to use a modified multi-electrode array that allows positioning to the desired coordinates, in accordance with the data from the analyzed microscopic images.
24.1 Introduction To understand the principles of work and organization of higher nervous activity, it is important to be able to study in detail the histoarchitectonics of the white and gray matter of the brain [1]. As a result of the analysis of all incoming information (visual, auditory, etc.), we create an idea about the world around us and about ourselves, which is expressed in the features of our consciousness [2]. The results of the work of higher nervous activity are due to the connectomes formed during ontogenesis, as well as the ability of the biological neural network (BNN) to transform the signal at the moment of impulse transmission from the presynaptic to the postsynaptic membrane [3]. A large number of scientific experiments with laboratory animals, as a rule, are finished by a microscopic examination of histological preparations of the brain. The obtained information may indicate certain features of behavior. However, in order to explain every decision made by an animal during an experiment, a complete mathematical model of the connectome is needed. The bases of a mathematical model creation are graphs built from the bodies of neurons with their processes, diffusely spreading in the structures of all parts of the central nervous system (CNS). Analysis of the relative position of cells is carried out using data obtained by processing microscopic images. Taking into account the need for a functional study of the BNN, there are a number of requirements for the preparation of the histological sample, which are reduced to a general principle of maximum preservation of the viability of the selected organ. Therefore, when working with histological material, special attention is paid to visualization methods. To study the development and functioning of the BNN, the best option is to use native microscopy. Neurons in this case are not exposed to the external environment and their physiology remains the same. However, the main disadvantage of this method is the weak contrast of the obtained images. The main objectives of the study are as follows: development of a computational technique for the selection of objects in native microscopic images; visualization of cells of interest with enhanced contrast by color-coding, as well as the creation of principles for the formation of a model of a biological neural network. The images obtained as a result of calculations are necessary: to study the development of a neural network; identifying and counting the number of contacts between cells; monitoring of pathological conditions and research of methods of treatment; qualitative analysis of signal transmission from one cell to another in combination with the operation of electrophysiological equipment. As a result, prerequisites are created for the development of a mathematical model of the culture of nerve cells. Based on the results of image processing information about nerve cells, their coordinates and relationships (using processes data) will be obtained. In this case, it is possible to supplement the neural network model with data obtained from a multi-electrode array or a patch-clamp. The paper is organized as follows. Section 24.2 describes different approaches to the analysis of biological neural network cells. Section 24.3 presents description of
proposed technique. Section 24.4 provides information about experimental research. Concluding remarks are outlined in Sect. 24.5.
24.2 Approaches of Biological Neural Network Cells Analysis The morphological description of neural ensembles formed in the course of life activity is carried out using a variety of labor-intensive techniques [4]. Most of the approaches are applicable only for dead cells or the technology conditions are too unfavorable, which leads, after a certain period of time, to the death of neurons. An important characteristic of the brain is due to the fluctuation of action potentials in the BNN. Based on this, it is desirable to be able to simultaneously register pulses and combine the obtained data with the structure on microscopic images. The object of study also makes its adjustments to the design of the experiment. Morphofunctional study of BNN is possible on a culture of nerve cells, as well as on a section of the brain. Each technology has its advantages and disadvantages. The acute slice preparation method of the brain makes it possible to preserve the histoarchitectonics of the extracted area and, at the same time, maintain neuronal activity for several days [5, 6]. This method is of great importance in neuroscience and allows to study of brain cells such as microglia, astrocytes, neurons, and their interactions. The acute slice technique is a very time-consuming sample preparation for research that begins with the removal of the brain. This is followed by cutting off the section of interest from the entire brain and gluing a tissue to the mounting cylinder. Slicer parameters for slicing on each machine must be determined empirically. At this stage, the sections can be stored for 1–5 h before being transferred to the camera for recording. The presence of HEPES (4-(2-hydroxyethyl)-1piperazineethanesulfonic acid) and the ascorbate/thiourea combination reduces cut edema and delays deterioration [7, 8]. Another popular object for studying the nervous system is cultured cells in a Petri dish, which are a two-dimensional brain organoid with simplified histoarchitectonics [9]. All types of cells involved in the formation of the brain are used and transferred to a Petri dish. The resulting cell culture after a certain period of time begins to simulate the work of the brain. Another method used for neurons brain study is calcium imaging. The essence of this method consists in fluorescence microscopy of only those neurons that are excited. After activation, the neuron restores its membrane potential and simultaneously reduces the calcium concentration in the cytoplasm which leads to the cessation of fluorescence. This effect is possible due to the fluorescent protein, which begins to emit photons at the moment calcium enters the cell. Dyes with the addition of this protein can be introduced both into cell culture and into an animal. Calcium imaging
makes it possible to study the brain at the cellular level, detecting and tracking nerve impulses without the need to dissect CNS tissues [10, 11]. Calcium imaging allows recording and tracing the propagation of a nerve impulse through a neural network, which demonstrates possible synaptic connections and also the outlines of neurons. However, the obtained data characterize only that part of the neural network, in which the cells changed the membrane potential, and the stimulus was sufficient to trigger the action potential. Therefore, unexcited cells will not be noticed. Among the shortcomings of the method, one should note the impossibility of obtaining data on the qualitative characteristics of the synapse, through which the stimulus was not transmitted. In recent years a large amount of work has been published on processing images and highlighting the paths of neurites or neural processes with the bodies themselves [12]. Many attempts have been made to reduce the amount of manual labor, but most of the algorithms have been adapted only for stained preparations [13–15]. This is due to the fact that images of native preparations have a poor contrast-to-noise ratio. Also, various chemical staining methods make it easier to visualize the boundaries of the cell walls. A small part of all algorithms has been developed to trace undyed neurons [16]. They are able to visualize the cells of interest, but their capabilities are limited to working with two-dimensional preparations. To work with cells located in three-dimensional space, it is necessary to use microscopy methods that can visualize biological tissue in layers. Deep layers with fluorescence microscopy are visualized worse than with differential interference contrast (DIC). Among the disadvantages of fluorescence microscopy, in addition, to the lower imaging depth, one can also emphasize the time-consuming preparation of the preparation [17] and high phototoxicity [18]. Therefore, for the structural and functional study of volumetric sections it is better to use the DIC method. Due to the combination of infrared illumination and differential interference contrast, unstained living neurons can be visualized at a depth of 50–100 µm in 300 µm brain sections [19].
24.3 Proposed Technique According to studies [20], better cultures are obtained from individuals (mice) in the embryonic stage of development, aged 12–14 days. In this case, the preparation of a neuron culture consists of three stages. The first stage ends with the extraction of the brain, and the manipulation algorithm is similar to the preparatory stage for acute slice, which was described above. The second stage is obtaining dissociated cells. The essence of the third stage is to transfer brain cells to a Petri dish with a nutrient. For tracing neurons, in addition to the DIC microscope, the best option is a phasecontrast microscope, which works on a similar principle. It has a simpler structure, which leads to lower costs and, as a result, greater popularity. Within the framework of the proposed technique, three main stages of processing and analysis of visual data can be distinguished [21]:
• Pre-processing stage that includes noise reduction, as well as brightness and contrast correction. • Main stage, during which the formation of contour representation and colorcoding take place. • Final stage with the extraction of features (markers) and interpretation of the results. Pre-processing is necessary to eliminate the possible noise component, as well as to increase the contrast and correct the brightness characteristics of images. The correction of brightness parameters is especially important for bright-field microscopic images, since native objects slightly change the amplitude of transmitted light rays, due to which the cells under study can merge with a bright field and cause difficulties in determining their morphology. Noise suppression is implemented based on the modification of the weighted median filter. It was decided to set the distribution of weight coefficients taking into account the distance from the processed pixel and the local data of the neighborhood. In order to improve the brightness characteristics, variation of the method based on the Retinex technology [22] is used, within which the balance contrast enhancement technique (BCET) is used for the generated response. The example of microscopic images enhancement is shown in Fig. 24.1. After noise suppression and enhancement of brightness characteristics of microscopic image a contour representation is formed using the shearlet transform and color-coding [23]. Regions are mapped in color according to color-coded data and form segments for further analysis. Then the texture characteristics of the objects of interest are calculated, which are necessary to obtain estimates for each section of the neuron. Figure 24.2 shows an example of microscopic image processing by the modified shearlet transform algorithm. An example of color-coding results is depicted in Fig. 24.3. On the image after color-coding of the phase-contrast image (Fig. 24.3a), it is possible to see all cells and study their morphostructure without difficulty. Great
Fig. 24.1 Example of microscopic image processing: a original, b enhancement by Retinex with BCET stretching
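The pre-processing stage described above relies on a weighted median filter whose weights depend on the distance from the processed pixel. A minimal sketch of such a filter is given below; the particular weighting rule (integer weights decreasing with distance from the window center) is an assumption for illustration, since the authors additionally adapt the weights to local neighbourhood statistics.

```python
import numpy as np

def weighted_median_filter(img, radius=2):
    """Weighted median filter: each neighbour enters the median a number of
    times equal to an integer weight that decreases with distance from the
    central pixel of the window."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    dist = np.hypot(ys, xs)
    weights = np.maximum(1, np.round(radius + 1 - dist)).astype(int)
    padded = np.pad(img, radius, mode="reflect")
    out = np.empty_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            out[i, j] = np.median(np.repeat(window.ravel(), weights.ravel()))
    return out
```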
Fig. 24.2 Example of microscopic image after processing by modified shearlet transform
Fig. 24.3 Example of color-coding results: a phase contrast image, b bright-field image
contrast and color diversity help to differentiate nerve cells from cells of another histogenetic series. With this method of color-coding the bodies of neurons are painted in turquoise and the contrast of the processes improves, which makes it possible to simplify the determination of the relationships between cells in the future. The variant of image processing from a light microscope (Fig. 24.3b) allows the glial cells to be stained in dark colors (since they are of no interest), and the neurons and their processes in purple. At the same time, appendages appear that are problematic to detect on the image without processing. In comparison with other image processing algorithms developed in the same direction [24, 25], our approach has the following advantages. This made it possible
to work with unstained cell cultures. As a result, a bright-field microscope is suitable for research. In case of difficulties of cells population identification due to the characteristics of the microscope or other factors that lead to a distorted operation of the color-coding algorithm, a quick and easy correction of the settings is possible, which will provide a different coloration of the image. The biological neural network modeling process is carried out based on the structure of the cell culture obtained from the processed images. However, a full-fledged model also requires a functional characteristic of neurons, which includes the ability of nerve cells to transmit excitation impulse to the next part of the neural ensemble [26]. To assess the functional characteristics of each cell, it is proposed to use a modified version of the multi-electrode arrays, which has movable microelectrodes and is capable of homing to the desired coordinates. The proposed approach allows us to explore the physiology of the neural network in a new way. To create a model can be used data obtained from microscopic images (Fig. 24.4a, b) and a multi-electrode array (Fig. 24.4c). This takes into account two important bases on which the operation of the neural network is based and creates a more realistic digital model. As part of the methodology, after processing microscopic images each neuron will be determined by the coordinates that characterize their location within the cell culture. This allows us to set the position of electrodes on neurons of interest (neuronal areas) and display bursts of electrical activity on a microscopic image (Fig. 24.5) in real time directly. Figure 24.5 shows the maximum change in the membrane potential in red, the process of its restoration is in yellow, and the restored membrane potential remains without color changes. The visual representation on the improved image makes it easier to follow the activity of the culture and reduces the chance that the researcher will not notice the spike, as it is displayed in bright colors.
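The mapping between detected cell coordinates and electrode recordings described above can be illustrated with a small plotting sketch: cells whose electrodes register a spike are drawn in red, cells in the recovery phase in yellow, and quiescent cells are left unmarked. The thresholds, array names and colour rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_activity_map(image, cell_xy, membrane_potential,
                      spike_thr=-20.0, rest_thr=-60.0):
    """Overlay electrode activity on an enhanced microscopic image.

    cell_xy: (N, 2) array of cell coordinates in pixels, from image analysis.
    membrane_potential: (N,) array of current potentials from the MEA, in mV (assumed).
    """
    plt.imshow(image, cmap="gray")
    spiking = membrane_potential >= spike_thr
    recovering = (membrane_potential < spike_thr) & (membrane_potential > rest_thr)
    plt.scatter(*cell_xy[spiking].T, c="red", s=40, label="spike")
    plt.scatter(*cell_xy[recovering].T, c="yellow", s=40, label="recovery")
    plt.legend(loc="lower right")
    plt.show()
```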
Fig. 24.4 Sources data for the formation of a neural network model: a original bright-field image, b color-coded representation, c schematic representation of MEA contact
Fig. 24.5 Example of a map visualizing the electrical activity of neurons
24.4 Results of Experimental Studies The essence of the experimental study was the experienced processing of images of the proposed method. The study assessed the accuracy of determining the contours of the cells of interest. It is used in the study of microscopic images of a culture of nerve cells and a slice of the brain, which were obtained from a light-field, phase-contrast and infrared differential interference-contrast microscope. The study was performed on a set of images, including 12 samples. As part of the research stage aimed at determining objects of interest, a series of images were used, for which the expert formed a coordinate description of objects of interest (neuronal cells of the brain) with contouring. This stage of the study showed that the proposed method of image pre-processing allows increasing the accuracy of determining the coordinates of the cell center by 2.7–3.8% and the detection of nerve cell boundaries by 2.1–4.7%. In this case, the average accuracy of determining contours in model images was 0.980 ± 0.012 according to the Figure Of Merit metric and 0.971 ± 0.013 Dice similarity coefficient. During the study on the automated determination of bodies (neurons, axons and dendrites), numerical indicators were obtained that showed comparable results obtained with the help of an expert. Moreover, the time for describing the image was significantly reduced and acceptable accuracy was obtained. The accuracy of the description with the use of the proposed technique was about 98.1–98.7% depending on the class of the body. However, the proposed technique also has false positives, which amounted to 1.3–2.8%. The verification of the technique for analyzing and interpreting images based on the modified shearlet transform algorithm with color-coding showed that it makes it possible to isolate complex texture sections with the formation of quantitative features (markers). The obtained features can later be used to determine dynamic
changes in cell morphology under the influence of various factors, trace neuronal ensembles of large sizes, and form the connectome of a single organism. As a result, good results were obtained when analyzing the data of model images (phase-contrast and light-field). We also note that a similar accuracy is obtained when the results are compared with those of applying the technique to fluorescent images.
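For reference, the Dice similarity coefficient used in the evaluation above can be computed for two binary segmentation masks as follows; this is a generic sketch with synthetic masks, not the evaluation code used in the study.

```python
import numpy as np

def dice_coefficient(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0

# Example with two small synthetic masks
a = np.zeros((8, 8), dtype=bool); a[2:6, 2:6] = True
b = np.zeros((8, 8), dtype=bool); b[3:7, 3:7] = True
print(round(dice_coefficient(a, b), 3))   # 0.562 for this toy overlap
```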
24.5 Conclusions

To study the nerve cells of the mouse brain, a new visualization technique was developed and tested which makes it possible to highlight the boundaries of bodies, axons, and dendrites in native microscopic images. The proposed approach creates conditions for maintaining the viability of the objects of study for a longer time. As part of the experimental studies and testing of the proposed technique, a series of images (images of mouse brain cells) was prepared. At the first stage of the experiment, the expert manually marked the studied set of images. At the second stage, automated processing and analysis of the nerve cells were performed based on the proposed method. Preliminary calculations on model images made it possible to obtain acceptable results in comparison with the method of analysis of fluorescent images.
References 1. Jiang, X., Johnson, R.R., Burkhalter, A.: Visualization of dendritic morphology of cortical projection neurons by retrograde axonal tracing. J. Neurosci Methods 50(1), 45–60 (1993) 2. Bogen, J.E.: Some neurophysiologic aspects of consciousness. Semin. Neurol. 17(2), 95–103 (1997) 3. Poo, M.M., Pignatelli, M., Ryan, T.J., Tonegawa, S., Bonhoeffer, T., Martin, K.C., Rudenko, A., Tsai, L.H., Tsien, R.W., Fishell, G., Mullins, C., Gonçalves, J.T., Shtrahman, M., Johnston, S.T., Gage, F.H., Dan, Y., Long, J., Buzsáki, G., Stevens, C.: What is memory? The present state of the engram. BMC Biol. 14, 40 (2016) 4. Inal, M.A., Banzai, K., Kamiyama, D.: Retrograde tracing of drosophila embryonic motor neurons using lipophilic fluorescent dyes. J. Visualized Exp. JoVE 155 (2020) 5. Oka, H., Shimono, K., Ogawa, R., Sugihara, H., Taketani, M.: A new planar multielectrode array for extracellular recording: application to hippocampal acute slice. J. Neurosci. Methods 93(1), 61–67 (1999) 6. Qi, G., Mi, Y., Yin, F.: Characterizing brain metabolic function ex vivo with acute mouse slice punches. STAR Protoc. 2(2), 100559.1–23 (2021) 7. Eguchi, K., Velicky, P., Hollergschwandtner, E., Itakura, M., Fukazawa, Y., Danzl, J.G., Shigemoto, R.: Advantages of acute brain slices prepared at physiological temperature in the characterization of synaptic functions. Front. Cell. Neurosci. 14(63), 1–17 (2020) 8. Ting, J.T., Daigle T.L., Chen Q., Feng G.: Acute brain slice methods for adult and aging animals: application of targeted patch clamp analysis and optogenetics. Methods Mol. Biol. (Clifton, N.J.) 1183, 221–242 (2014) 9. Kaech, S., Banker, G.: Culturing hippocampal neurons. Nat. Protoc. 1(5), 2406–2415 (2006)
10. Stosiek, C., Garaschuk, O., Holthoff, K., Konnerth, A.: In vivo two-photon calcium imaging of neuronal networks. Proc. Natl. Acad. Sci. U.S.A. 100(12), 7319–7324 (2003) 11. Grienberger, C., Konnerth, A.: Imaging calcium in neurons. Neuron 73(5), 862–885 (2012) 12. Weaver, C.M., Pinezich, J.D., Lindquist, W.B., Vazquez, M.E.: An algorithm for neurite outgrowth reconstruction. J. Neurosci. Methods 124(2), 197–205 (2003) 13. Ikeno, H., Kumaraswamy, A., Kai, K., Wachtler, T., Ai, H.: A segmentation scheme for complex neuronal arbors and application to vibration sensitive neurons in the honeybee brain. Front. Neuroinform. 12(61), 1–11 (2018) 14. Bao, Y., Soltanian-Zadeh, S., Farsiu, S., Gong, Y.: Segmentation of neurons from fluorescence calcium recordings beyond real-time. Nature Mach Intell. 3(7), 590–600 (2021) 15. Lee, P.C., Chuang, C.C., Chiang, A.S., Ching, Y.T.: High-throughput computer method for 3D neuronal structure reconstruction from the image stack of the Drosophila brain and its applications. PLoS Comput. Biol. 8(9), e1002658.1–12 (2012) 16. Pang, J., Özkucur, N., Ren, M., Kaplan, D.L., Levin, M., Miller, E.L.: Automatic neuron segmentation and neural network analysis method for phase contrast microscopy images. Biomed. Opt. Express 6(11), 4395–4416 (2015) 17. Legant, W.R., Shao, L., Grimm, J.B., Brown, T.A., Milkie, D.E., Avants, B.B., Lavis, L.D., Betzig, E.: High-density three-dimensional localization microscopy across large volumes. Nat. Methods 13(4), 359–365 (2016) 18. Wäldchen, S., Lehmann, J., Klein, T., van de Linde, S., Sauer, M.: Light-induced cell damage in live-cell super-resolution microscopy. Sci. Rep. 5(15348), 1–12 (2015) 19. Dodt, H.U., Zieglgänsberger, W.: Visualizing unstained neurons in living brain slices by infrared DIC-videomicroscopy. Brain Res. 537(1–2), 333–336 (1990) 20. Ransom, B.R., Neale, E., Henkart, M., Bullock, P.N., Nelson, P.G.: Mouse spinal cord in cell culture. I. Morphology and intrinsic neuronal electrophysiologic properties. J. Neurophysiology 40(5), 1132–1150 (1977) 21. Zotin, A., Hamad, Y., Simonov, K., Kurako, M.: Lung boundary detection for chest X-ray images classification based on GLCM and probabilistic neural networks. In: 23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems. Procedia Comput. Sci. 159, 1439–1448 (2019) 22. Zotin, A.G.: Fast algorithm of image enhancement based on multi-scale Retinex. Int. J. Reasoning-Based Intell. Syst. 12(2), 106–116 (2020) 23. Zotin, A., Simonov, K., Kapsargin, F., Cherepanova, T., Kruglyakov, A.: Tissue germination evaluation on implants based on shearlet transform and color coding. In: Favorskaya, M., Jain, L. (eds.) Computer Vision in Advanced Control Systems-5. Intelligent Systems Reference Library, vol. 175, pp. 265–294 (2020) 24. Ross, J.D., Cullen, D.K., Harris, J.P., LaPlaca, M.C., DeWeerth, S.P.: A three-dimensional image processing program for accurate, rapid, and semi-automated segmentation of neuronal somata with dense neurite outgrowth. Front. Neuroanat. 9(87), 1–15 (2015) 25. Ho, S.Y., Chao, C.Y., Huang, H.L., Chiu, T.W., Charoenkwan, P., Hwang, E.: NeurphologyJ: an automatic neuronal morphology quantification method and its application in pharmacological discovery. BMC Bioinf. 12(230), 1–18 (2011) 26. Anokhin, K.V., Burtsev, M.S., Ilyin, V.A., Kiselev, I.I., Kukin, K.A., Lakhman, K.V., Paraskevov, A.V., Rybka, R.B., Sboev, A.G., Tverdokhlebov, N.V.: A review of computational models of neuronal cultures in vitro. Math. Biol. Bioinf. 
7(2), 372–397 (2012)
Chapter 25
Multi-kernel Analysis Method for Intelligent Data Processing with Application to Prediction Making Miltiadis Alamaniotis
Abstract One of the major issues in engineering is the development of systems that make accurate predictions. The advances in machine learning and data science have given rise to intelligent data processing that is used for developing smart engineering systems. In this paper, a new method is developed that makes use of multiple learning kernels to expand a dataset into a set of patterns and then selects a subset of those patterns, putting them together to make predictions. The proposed framework utilizes a set of kernel-modeled Gaussian processes where each one is equipped with a different kernel function. The proposed method is applied for prediction making on a set of electric load patterns and provides higher accuracy as compared to single Gaussian process models.
25.1 Introduction

The recent advances in data science and machine learning have found wide use in several application domains. Engineering is one of the areas that have significantly benefited from the new data technologies [1]. From a broader point of view, all engineering problems can be consolidated into the simple tasks of prediction and classification. Notably, engineering problems are associated with inherent uncertainty, given that there is the need to determine values of unknown parameters, especially of future events [2]. Prediction is inherently a challenging problem that has been studied for a long time. In engineering, one of the main research challenges is the development and implementation of prediction methods that are highly accurate for long ahead-of-time horizons. It is known, though, that the longer the horizon, the higher the uncertainty is, and thus the accuracy significantly drops [3]. To that end, several approaches have been proposed for prediction making utilizing physical models [4], data models [5] M. Alamaniotis (B) University of Texas at San Antonio, San Antonio, TX 78249, USA e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_25
or combinations of them; the prediction approach strongly depends on the type of application [6, 7]. It should be noted that data processing and analytics have been fully identified as an area of high interest and potential in developing data-driven prediction methods. Such methods are based on the identification of patterns in historical data with the hope that these patterns will be repeated in the future [8, 9]. The term implies the automated processing of data, the identification of data patterns, and the feeding of those patterns to a learning model that provides new predictions based on the learned patterns. Intelligent data processing—also known as intelligent data analysis—has been utilized in developing systems that find application in several domains. In [10], a system has been developed for real-time supervision in the Industry 4.0 domain, while in [11] fuzzy clustering has been used for developing an intelligent processing system for maritime target identification. A neuro-fuzzy model for detection of harmful information has been proposed in [12], and an intelligent data processing tool for load data disaggregation is introduced in [13]. Other processing systems were developed based on fuzzy measures for estimating partial information [14], fog computing [15], edge computing for IoT [16], hardware-based algorithms to counter cyberattacks in electric vehicles [17], and fuzzy logic decision making [18]. Intelligent data processing techniques have also found use in big data analytics, and more specifically for load forecasting [19], mobile heart monitoring systems [20], and cross-media data using deep learning [21]. Delving into the aforementioned works, there is a wide array of approaches for developing intelligent data processing systems, but the use of AI as the core of those systems is still limited [22]. Therefore, there is a need for advancing intelligent data processing by developing AI-based methods that may be independent of hardware and device parameters. Notably, artificial intelligence (AI) offers a set of tools that may be utilized for identifying patterns in the data [23]; the merging of AI with data processing may provide a more accurate definition of the term intelligent data processing. In this work, the focus will be on the use of AI—and more specifically on kernel machines—for developing an intelligent method that processes past datasets and makes projections (predictions) pertaining to future values. To that end, a set of multiple kernel-modeled Gaussian processes (GP) [23] is used to expand the historic datasets into a set of patterns, and then a subset of those patterns is selected and utilized for prediction making. This paper is organized as follows: in the next section the proposed method is described and its individual steps are discussed. In Sect. 25.3, the proposed method is applied on a set of electric load data, while Sect. 25.4 concludes the paper.
25.2 Multi-kernel Processing Method

Kernel machines are a class of tools in artificial intelligence that are a function of a kernel function (or simply called a kernel) and are used in regression and classification
problems [23]. A kernel is any valid function that may be recast in the dual form given below:

$$k(\mathbf{x}_1, \mathbf{x}_2) = f(\mathbf{x}_1)^{T} f(\mathbf{x}_2) \qquad (25.1)$$
where $T$ denotes the transpose of the valid function $f(\cdot)$, and $\mathbf{x}_1$, $\mathbf{x}_2$ are the input vectors. There are several types of kernels, and their form determines the relation between the data. Therefore, the selection of the kernel is significant for the output of the kernel machine, as it determines the relation between the input and output. Thus, its form depends on the specifics of the application as well as the system modeler [24]. In this work, Gaussian processes [23] in the form of kernel machines are utilized as the cornerstone upon which the multi-kernel expansion is built. In the context of kernel machines, a Gaussian process is given by:

$$GP \sim \mathcal{N}\big(m(\mathbf{x}),\, K(\mathbf{x}', \mathbf{x})\big) \qquad (25.2)$$
where $K(\mathbf{x}', \mathbf{x})$ is the covariance function, given by a kernel function $k(\cdot)$. Thus, a GP is modeled as a kernel machine by setting its covariance function equal to a kernel. More specifically, a set of four kernels is utilized for analyzing the dataset into a set of four datasets [25]. The multi-kernel expansion method is depicted in Fig. 25.1, where the individual steps are presented. In the first step the dataset under consideration is sampled at a rate N, which is determined by the modeler. Next, the sampled data are forwarded to the four Gaussian process models, with each model equipped with a different kernel function as given below:

Linear kernel

$$k(\mathbf{x}_1, \mathbf{x}_2) = \theta\, \mathbf{x}_1^{T} \mathbf{x}_2 \qquad (25.3)$$
with $\theta$ being a scale parameter.

Gaussian kernel

$$k(\mathbf{x}_1, \mathbf{x}_2) = \exp\!\left(-\|\mathbf{x}_1 - \mathbf{x}_2\|^{2} / 2\sigma^{2}\right) \qquad (25.4)$$
where $\sigma^2$ is a parameter that expresses the data variance.

Matern kernel

$$k(\mathbf{x}_1, \mathbf{x}_2) = \frac{2^{1-\theta_1}}{\Gamma(\theta_1)} \left( \frac{\sqrt{2\theta_1}\, |\mathbf{x}_1 - \mathbf{x}_2|}{\theta_2} \right)^{\theta_1} K_{\theta_1}\!\left( \frac{\sqrt{2\theta_1}\, |\mathbf{x}_1 - \mathbf{x}_2|}{\theta_2} \right) \qquad (25.5)$$
Fig. 25.1 Block diagram of the multi-kernel expansion method
where $\theta_1$ and $\theta_2$ are two scale parameters, with $\theta_1$ in this work taken to be equal to 3/2 [23].

Neural net kernel

$$k(\mathbf{x}_1, \mathbf{x}_2) = \theta_0 \sin^{-1}\!\left( \frac{2\,\tilde{\mathbf{x}}_1^{T} \Sigma\, \tilde{\mathbf{x}}_2}{\sqrt{\left(1 + 2\,\tilde{\mathbf{x}}_1^{T} \Sigma\, \tilde{\mathbf{x}}_1\right)\left(1 + 2\,\tilde{\mathbf{x}}_2^{T} \Sigma\, \tilde{\mathbf{x}}_2\right)}} \right) \qquad (25.6)$$

where $\tilde{\mathbf{x}} = (1, x_1, \ldots, x_D)^{T}$, $\Sigma$ is the data covariance matrix and $\theta_0$ is a scale parameter [23]. In this work, the sampled data are set as the training data that the learning models require during the training phase. Thus, the GP models use the sampled data in order
to evaluate their parameters. In the next step, the trained GP models are utilized for data interpolation by estimating the values (i.e., interpolation) of the initial datapoints that are not part of the sampled data. Then, the estimated values are superimposed on the sampled data and a new dataset is created, driven by the form of the kernel (marked as the respective "kernel" expansion in Fig. 25.1). It should be noted that each GP model provides a single dataset, and hence we get an overall population of four kernel expansions. The process of obtaining the expansions is given in Fig. 25.2.

Fig. 25.2 Process of creating data expansions from the initial data

In the next step, the kernel expansions are put together and all of their combinations are taken in the form of data averages. For instance, for "expansion pairs" the combination takes the form:

$$\text{Pair Combination} = \frac{\text{Expansion}_1 + \text{Expansion}_2}{2} \qquad (25.7)$$

and for triads respectively:

$$\text{Triad Combination} = \frac{\text{Expansion}_1 + \text{Expansion}_2 + \text{Expansion}_3}{3} \qquad (25.8)$$
and similarly for the rest of the combinations, including the single-expansion cases. Overall, the combinations that are examined are given in Table 25.1. The goal of the combinations is to identify the expansion that best resembles the original data. The idea behind the kernel expansion is that every kernel models a different set of data properties—e.g., smoothness, stationarity, etc.—and thus, by putting together various assemblies of kernels, it is likely that the underlying data properties will be identified and used for prediction making.
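For concreteness, the four kernels of Eqs. (25.3)–(25.6) can be written as plain functions. The sketch below is illustrative only: the parameter values are placeholders, and the covariance matrix Σ of the neural net kernel is taken as the identity for simplicity.

```python
import numpy as np
from scipy.special import gamma, kv   # kv: modified Bessel function of the second kind

def linear_kernel(x1, x2, theta=1.0):                  # Eq. (25.3)
    return theta * np.dot(x1, x2)

def gaussian_kernel(x1, x2, sigma=1.0):                # Eq. (25.4)
    return np.exp(-np.sum((x1 - x2) ** 2) / (2.0 * sigma ** 2))

def matern_kernel(x1, x2, theta1=1.5, theta2=1.0):     # Eq. (25.5), theta1 = 3/2 here
    r = np.linalg.norm(x1 - x2)
    if r == 0.0:
        return 1.0
    z = np.sqrt(2.0 * theta1) * r / theta2
    return (2.0 ** (1.0 - theta1) / gamma(theta1)) * (z ** theta1) * kv(theta1, z)

def neural_net_kernel(x1, x2, theta0=1.0):             # Eq. (25.6), with Sigma = I (assumption)
    x1t = np.concatenate(([1.0], np.atleast_1d(x1)))   # augmented input (1, x1, ..., xD)
    x2t = np.concatenate(([1.0], np.atleast_1d(x2)))
    num = 2.0 * np.dot(x1t, x2t)
    den = np.sqrt((1.0 + 2.0 * np.dot(x1t, x1t)) * (1.0 + 2.0 * np.dot(x2t, x2t)))
    return theta0 * np.arcsin(num / den)

KERNELS = {"Linear": linear_kernel, "Gaussian": gaussian_kernel,
           "Matern": matern_kernel, "NNet": neural_net_kernel}
```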
Table 25.1 List of combinations of the kernel expansions

1-expansion | 2-expansion      | 3-expansion              | 4-expansion
Linear      | Linear, Gaussian | Linear, Gaussian, Matern | Linear, Gaussian, Matern, NNet
Gaussian    | Linear, Matern   | Linear, Matern, NNet     |
Matern      | Linear, NNet     | Linear, Gaussian, NNet   |
NNet        | Gaussian, Matern | Gaussian, Matern, NNet   |
            | Gaussian, NNet   |                          |
            | Matern, NNet     |                          |

Overall: 15 combinations
Once the kernel expansions are formed, a metric is utilized to indicate the combination that is closest to the original data. To that end, the Theil-II coefficient is utilized as the metric of resemblance [24]:

$$\text{THEIL-II} = \frac{\sqrt{\sum_{n=1}^{N} (d_{1n} - d_{2n})^{2}}}{\sqrt{\sum_{n=1}^{N} (d_{1n})^{2}} + \sqrt{\sum_{n=1}^{N} (d_{2n})^{2}}} \qquad (25.9)$$
where $d_1$ and $d_2$ are the initial and combined datasets respectively, while $N$ denotes the length of the dataset. The above coefficient denotes high resemblance when its value is close to zero and dissimilarity when its value approaches one. Lastly, the combination that provides the lowest THEIL-II value is selected as the one that best expands the initial dataset and is forwarded as the method's final dataset expansion. The respective GP models are utilized for prediction making by averaging their predictions.
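The selection step of Eq. (25.9) then reduces to scoring every combination against the initial dataset and keeping the minimizer; a minimal sketch, continuing the illustrative helpers above:

```python
import numpy as np

def theil_ii(d1, d2):
    """THEIL-II coefficient of Eq. (25.9): 0 = perfect resemblance, 1 = dissimilar."""
    d1, d2 = np.asarray(d1, float), np.asarray(d2, float)
    num = np.sqrt(np.sum((d1 - d2) ** 2))
    den = np.sqrt(np.sum(d1 ** 2)) + np.sqrt(np.sum(d2 ** 2))
    return num / den if den > 0 else 0.0

def select_best_combination(initial, combos):
    """Return the (subset, expansion, score) with the lowest THEIL-II value."""
    scored = {subset: theil_ii(initial, series) for subset, series in combos.items()}
    best = min(scored, key=scored.get)
    return best, combos[best], scored[best]
```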
25.3 Application on Electricity Load Data

In this section the presented multi-kernel expansion method is applied on a set of electricity load data taken from the New England ISO [26]. The data is in the form of time series that express the load demand every five minutes for the period January 19–January 25, 2022. The goal is to predict the 5-min demand for a horizon of 30 min ahead of time by using the 36 most recent measured demand values (in other words, we use the observed 3-h data to predict the 5-min demand in the next half hour). Demand prediction for such short intervals is challenging due to dynamically varying factors in power system operation (such as weather, faults, and consumer behavior). For visualization purposes, an example of a daily load pattern for the day of January 19, 2022 is given in Fig. 25.3.
Fig. 25.3 Electricity demand load pattern for day January 19, 2022 taken from [26]
It should be noted that the sampling rate—i.e., the value of N—is set equal to 1/6, which means that we keep one sample every six datapoints (the choice is based on the author's prior experience). Given that our initial dataset is comprised of 36 datapoints, the sampled dataset is comprised of 6 datapoints. These 6 datapoints are utilized by the GP models to provide values for the remaining 30 datapoints in the initial dataset. The 30 provided values (estimations) together with the 6 sampled values are superimposed, and subsequently the new dataset is formed. The results obtained for the testing period are given in Table 25.2. The method is tested for prediction making of load demand for the designated period of 7 days. The results are given with respect to the mean absolute percentage error (MAPE), which expresses the difference between the predicted values and the real values (see [27] for details on load prediction). Furthermore, for visualization purposes, Fig. 25.4 depicts the actual demand against the predicted values for the day of January 25, 2022. In addition, the proposed method is compared to the individual GP models.
Table 25.2 Prediction results obtained with respect to MAPE

Day    | GP linear | GP gaussian | GP matern | GP NNet | Expansion method
Jan 19 | 10.34     | 7.89        | 7.23      | 8.01    | 7.01 (Gaussian + Matern + NNet)
Jan 20 | 11.56     | 8.01        | 7.98      | 7.65    | 7.24 (Matern + NNet)
Jan 21 | 10.32     | 7.04        | 7.02      | 8.10    | 6.98 (Gaussian + Matern)
Jan 22 | 13.53     | 7.53        | 6.78      | 8.09    | 6.43 (Gaussian + Matern)
Jan 23 | 17.22     | 10.04       | 9.02      | 12.53   | 8.34 (Gaussian + Matern)
Jan 24 | 11.02     | 7.65        | 7.67      | 8.43    | 7.11 (Gaussian + Matern)
Jan 25 | 9.01      | 6.43        | 6.88      | 7.99    | 6.21 (Gaussian + Matern)
Fig. 25.4 Predicted versus actual electricity demand load for day January 25, 2022
It is apparent in Table 25.2 that the proposed expansion method provides the most accurate predictions compared to the rest of the tested methods for all days. Moreover, Table 25.2 reports the combination that the expansion method identified as the best one for making predictions. Therefore, it is observed that the proposed intelligent method attains the highest prediction accuracy in comparison to the single GP models. This is explained by the mechanism of the method: the assessment of all combinations by Theil-II allows the method to identify the most accurate kernels. Notably, it is also observed that the presented method tends to determine as the best expansion those kernels whose individual MAPE values are very close to each other. This observation is also supported by the fact that the linear kernel, which provides high MAPE values far from those of the other kernels, is not contained in the selected expansion for any of the tested days.
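The MAPE values reported in Table 25.2 can be reproduced from paired actual and predicted series with a short helper; the numbers in the example below are illustrative and are not the New England ISO data.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

# Illustrative 5-min demand values (MW) over half an hour
actual = [12100, 12180, 12260, 12300, 12350, 12420]
predicted = [12000, 12210, 12150, 12380, 12290, 12500]
print(round(mape(actual, predicted), 2))   # prints 0.63 for these toy values
```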
25.4 Conclusion

In this paper a new method is proposed for implementing intelligent data processing in the form of a multi-kernel expansion. In particular, a set of four kernel-modeled Gaussian processes is utilized to identify an expansion of kernels that best represents the data, and this expansion is then used to make predictions. The presented method is applied to a set of 5-min electricity demand load data spanning a period of 7 days. Results demonstrate the superiority of the method as compared to single GP models.
Future work will include the use of a higher number of kernels and the extensive testing of the method on a larger number of datasets (beyond load demand).
References 1. Dunn, P.F., Davis, M.: Measurement and data analysis for engineering and science. CRC Press, Boca Raton, FL (2017) 2. Xinqing, L., Tsoukalas, L.H., Uhrig, R.E.: A neurofuzzy approach for the anticipatory control of complex systems. In: Proceedings of IEEE 5th International Fuzzy Systems, vol. 1, pp. 587– 593, IEEE (1996) 3. Kim, T.H., Sugie, T.: Adaptive receding horizon predictive control for constrained discrete-time linear systems with parameter uncertainties. Int. J. Control 81(1), 62–73 (2008) 4. Dolara, A., Leva, S., Manzolini, G.: Comparison of different physical models for PV power output prediction. Sol. Energy 119, 83–99 (2015) 5. Dinov, I.D.: Data science and predictive analytics. Springer Science and Business Media LLC: Berlin/Heidelberg, Germany (2018) 6. Alamaniotis, M., Mathew, J., Chroneos, A., Fitzpatrick, M., Tsoukalas, L.H.: Probabilistic kernel machines for predictive monitoring of weld residual stress in energy systems. Eng. Appl. Artif. Intell. 71, 138–154 (2018) 7. Sooby, E., Alamaniotis, M., Heifetz, A.: Gaussian process ensemble for corrosion modeling and prediction in molten salt reactors. In: Proceedings of the 12th Nuclear Plant Instrumentation, Control and Human-Machine Interface Technologies (NPIC&HMIT 2021), pp. 239–250. Providence, RI (2021) 8. Fukunaga, K.: Introduction to statistical pattern recognition. Elsevier, Netherlands (2013) 9. Liu, H., Cocea, M., Ding, W.: Multi-task learning for intelligent data processing in granular computing context. Granular Computing 3(3), 257–273 (2018) 10. Peres, R.S., Rocha, A.D., Leitao, P., Barata, J.: IDARTS–Towards intelligent data analysis and real-time supervision for industry 4.0. Comput Ind 101:138–146 11. Zhang, Z., Du, Y.: Intelligent data processing of marine target tracking process based on fuzzy clustering. In: Proceedings of the 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP), pp. 1007–1010. IEEE (2021) 12. Kotenko, I.V., Parashchuk, I.B., Omar, T.K.: Neuro-fuzzy models in tasks of intelligent data processing for detection and counteraction of inappropriate, dubious and harmful information. In: Proceedings of the 2nd International Scientific-Practical Conference Fuzzy Technologies in the Industry, pp. 116–125. (2018) 13. Deák, A., Jakab, F.: Energy disaggregation based intelligent data processing tool. In: Proceedings of the 17th International Conference on Emerging eLearning Technologies and Applications (ICETA), pp. 145–148. IEEE (2019) 14. Sahaida, P.: Model and method of processing partial estimates during intelligent data processing based on fuzzy measure. In: Proceedings of the IEEE KhPI Week on Advanced Technology (KhPIWeek), pp. 114–118. IEEE (2020) 15. Mihai, V., Hanganu, C.E., Stamatescu, G., Popescu, D.: Wsn and fog computing integration for intelligent data processing. In: Proceedings of the 2018 10th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), pp. 1–4. IEEE (2018) 16. Young, R., Fallon, S., Jacob, P.: An architecture for intelligent data processing on iot edge devices. In: Proceedings of the UKSim-AMSS 19th International Conference on Computer Modelling & Simulation (UKSim), pp. 227–232. IEEE (2016) 17. Dey, S., Chandwani, A., Mallik, A.: Real time intelligent data processing algorithm for cyber resilient electric vehicle onboard chargers. In: Proceedings of the IEEE Transportation Electrification Conference & Expo (ITEC), pp. 1–6. IEEE (2021)
18. Matrosova, E., Tikhomirova, A.: Intelligent data processing received from radio frequency identification system. Procedia Comput. Sci. 145, 332–336 (2018) 19. Xu, M., Huang, G., Zhang, M., Cui, P., Wang, C.: Load forecasting research based on high performance intelligent data processing of power big data. In: Proceedings of the 2nd International Conference on Algorithms, Computing and Systems, pp. 55–60. (2018) 20. Kuzmin, A., Mitrokhin, M., Mitrokhina, N., Rovnyagin, M., Alimuradov, A.: Intelligent data processing scheme for mobile heart monitoring system. In: Proceedings of the IEEE International Conference on Soft Computing and Measurements, pp. 571–573. IEEE (2017) 21. Fang, M.: Intelligent processing technology of cross media intelligence based on deep cognitive neural network and big data. In: Proceedings of the 2nd International Conference on Machine Learning, Big Data and Business Intelligence, pp. 505–508. IEEE (2020) 22. Górriz, J.M., Ramírez, J., Ortíz, A., Martinez-Murcia, F.J., Segovia, F., Suckling, J., Leming, M., Zhang, Y.D., Álvarez-Sánchez, J.R., Bologna, G., Bonomini, P.: Artificial intelligence within the interplay between natural and artificial computation: advances in data science, trends and applications. Neurocomputing 410, 237–270 (2020) 23. Bishop, C.M.: Machine learning and pattern recognition. Information science and statistics. Springer, Heidelberg (2006) 24. Alamaniotis, M., Ikonomopoulos, A., Tsoukalas, L.H.: Probabilistic kernel approach to online monitoring of nuclear power plants. Nucl. Technol. 177(1), 132–144 (2012) 25. Alamaniotis, M.: Multi-Kernel decomposition paradigm implementing the learning from loads approach in smart power systems. In: Tsihrintzis, G., Virvou, M., Sakkopoulos, E., Jain L., (eds.) Machine Learning Paradigms—Applications of Learning and Analytics in Intelligent Systems, vol. 1, pp. 131–148. Springer, Berlin (2019) 26. New England ISO Homepage. http://iso-ne.com. Last accessed 25 Jan 2022 27. Alamaniotis, M., Ikonomopoulos, A., Tsoukalas, L.H.: Evolutionary multiobjective optimization of kernel-based very-short-term load forecasting. IEEE Trans. Power Syst. 27(3), 1477–1484 (2012)
Part IV
Blockchains and Intelligent Decision Making
Chapter 26
Potentials of Blockchain Technologies in Supply Chain Management—Empirical Results Ralf-Christian Härting, Nathalie Hoppe, and Sandra Trieu
Abstract This paper focuses on Blockchain Technologies in Supply Chain Management (SCM). It is based on the qualitative study “Potentials of Blockchain Technologies in Supply Chain Management—A Conceptual Model” by Härting et al. (Science Direct. 24th International conference on knowledge-based and intelligent information & engineering systems. Elsevier B.V., pp 1–9, 2020 [1]). The paper aims to investigate the potentials of Blockchain Technology in Supply Chain Management based on the conceptual model with six influencing factors and four moderators. A quantitative study is used to gain further insights and to examine relevant factors influencing the potentials of Blockchain Technologies in SCM. The results of the quantitative research approach show that the factors efficiency and scalability positively and significantly influence the potentials of Blockchain Technologies in SCM.
26.1 Introduction As digitalization is advancing rapidly, different digital information technologies are being used in the areas of society and business. One of these technologies is the distributed ledger technology that has promising potential in business [2]. The focal point of this paper is Blockchain Technology which is known as a distributed ledger technology. It is commonly used in the financial sector to track and trace transactions. However, more industries are starting to recognize the potentials of blockchain particularly in the supply chain sector where the technology can be applied to expand
R.-C. Härting (B) · N. Hoppe · S. Trieu Aalen University of Applied Sciences, Aalen, Germany e-mail: [email protected] N. Hoppe e-mail: [email protected] S. Trieu e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_26
communication networks and improve internal and external processes [3]. Therefore, the qualitative study "Potentials of Blockchain Technologies in Supply Chain Management—A Conceptual Model" from 2020 by Härting et al. is used as a basis [1]. Within the previous qualitative study, the authors examined the main potentials and risks of Blockchain Technologies applied to SCM. The qualitative research design is based on Grounded Theory by Glaser [4] in order to develop a new theory including the main factors influencing the potentials of Blockchain Technologies in SCM. The authors created a conceptual model based on the results of eleven semi-structured interviews and identified six direct influencing factors as well as four moderating effects. To validate the model and to gain additional insights on the potential benefits of Blockchain Technologies in SCM, a quantitative research design was conducted, which is the subject of this paper. It includes modifying and supplementing the existing model with additional indicators identified through a systematic literature review (SLR) within the present study. The chapter is structured as follows: First, a brief introduction and an explanation of Blockchain Technology and its relation to SCM are provided. In the second chapter, the research design including the prior SLR and a revised conceptual model is illustrated. The third chapter presents the results by initially providing information on the data collection process. It is followed by the examination of different quality criteria (Sect. 26.3.2) and the evaluation of the revised model based on structural equation modeling (SEM). The last section summarizes the results, discusses limitations of the work, and outlines potential further actions needed.
26.1.1 Blockchain Technology

Blockchain Technology is a technology that has emerged recently. It is a decentralized ledger which stores transaction data in blocks. These blocks are added together in chronological order to form an incorruptible chain, which is shared and distributed to all the participating entities [2]. In nearly all businesses where a mediator or trusted third party is needed, blockchain makes it possible to exclude third parties from transfers. Due to the visible public transactions and the asymmetric cryptography, all participants can validate and confirm real transactions while still remaining anonymous [5, 6]. Blockchain offers the opportunity to work in a decentralized network of many equally privileged nodes, also called a peer-to-peer network, and on one distributed ledger [7, 8]. Blockchain in itself can be termed a metatechnology because it is the result of the integration of several other technologies such as cryptographic technology or database technology [9]. Blockchain can make transactions faster and cheaper [10].
26.1.2 Blockchain Technology in Supply Chain Management

In the literature, the integration of Blockchain Technology into supply chains is examined from several different fields. Plenty of papers address it; for instance, Tian [11] deals with the application of blockchain in supply chains in the agriculture and food sector. It has also been examined in the luxury goods market. Due to high margins and low trading volumes, the luxury market offers, in comparison to the food market, better environmental conditions for the implementation of a blockchain solution [12, 13]. It can be concluded that Blockchain Technologies can be integrated into supply chains in different ways.
26.1.3 Potentials of Blockchain Technologies in Supply Chain Management

In a typical supply chain, material, cash and information flow through facilities such as vendors, plants, or distribution centers. In every supply chain, some transactions take place at these interfaces, for example between plant and distribution center or between vendor and plant. Due to the existence of finance-related redundancies in record keeping, trust-related issues are inevitable [14]. Blockchain can work as a sole source of information and will help to integrate all functions of the supply chain [15]. Blockchain Technology is expected to make the whole process faster and more reliable [16]. Because of its ability to create records of activities, Blockchain Technology can help organizations with accurate demand forecasting, managing resources effectively, and reducing inventory carrying costs. Compared to traditional supply chains, where high stocks of inventory, excess capacity, and third-party backup sources are developed in anticipation of disruptions, this can help supply chains mitigate risks at lower costs [17]. Applications of Blockchain Technology can help in enhancing the scale and scope of tracing and tracking systems [18]. Moreover, the architecture of blockchain brings the benefits of better traceability and a tamper-proof nature and has the potential to resolve trust issues in a typical supply chain [19]. Blockchain technology can improve the visibility and transparency of transactions in supply chains [20].
26.2 Research Design

26.2.1 Prior Research Design

The present work is based on the prior qualitative study "Potentials of Blockchain Technologies in Supply Chain Management—A Conceptual Model" by Härting et al. [1]. The focus of the former paper was the factors that "[…] influence the willingness
Fig. 26.1 Prior conceptual model
of companies, to use a blockchain-based technology in supply chain […]" [1]. The paper aimed to identify relationships between six different core factors with four moderating effects on the potentials of blockchain in SCM (Fig. 26.1). These factors and effects that influence the potentials of Blockchain Technologies in SCM were identified through semi-structured expert interviews. The expert panel consisted of nine male and two female experts from Germany, Switzerland, and the USA. One of the limitations of the prior research was its limited duration of four months. As the number of experts with adequate experience and knowledge in this field is small, the search for, selection of, and scheduling of interviews were difficult. Furthermore, the many fields of application and the different company structures had an impact on the research. The aim of the former qualitative study was to question experts in blockchain and SCM, whereas many experts did not have experience in SCM. Additionally, the experts were mostly from German-speaking countries.
26.2.2 Systematic Literature Review (SLR)

The objective of this quantitative study is to further investigate the potentials of Blockchain Technology for SCM and either confirm or reject the presented influences, moderators, and the already existing conceptual model of Härting et al. [1]. Since no substantive literature review was made in Härting et al. [1], an SLR (Table 26.1) was conducted to supplement the existing conceptual model. Based on the modified model, hypotheses were generated and integrated into the survey. The resources were highly ranked (>C rank) international journals, papers, and empirical studies. The objective of the literature analysis was to identify further factors relating
to the potentials of Blockchain Technologies in SCM, to confirm the existing items, and to update the conceptual model by these factors.

Table 26.1 Summary of the SLR

Determinant     | Indicator                                                                               | References
Trust           | Transparency, traceability, anti-counterfeiting, permanent record                      | [21, 22]
Efficiency      | Process automatization, real-time processing of transactions, cost-effective processes | [22–24]
Control         | Standardized regulations, monitoring legal systems, regular checks                     | [21, 25]
Privacy         | Data access, encryption techniques, competitive information                            | [21, 26]
Privacy → Trust | Privacy affects trust                                                                   | [27, 28]
Scalability     | Size of blockchain, technical maturity, validating real applications                   | [22, 25, 29]
26.2.3 Revised Conceptual Model and Hypotheses

In accordance with the findings of the SLR, indicators were added to measure each determinant (Fig. 26.2). Due to redundancies between the variables "Efficiency" and "Costs", "Costs" has been integrated as an item in "Efficiency" within the revised conceptual model. Because of the relationship between "Privacy" and "Trust" found in the literature analysis, "Privacy" directly influences "Trust" and indirectly influences the potentials of Blockchain Technologies in SCM. The result is a modified conceptual model with the following hypotheses (see Fig. 26.2):
• Privacy positively influences trust.
• Trust, Efficiency, Control and Scalability positively influence the potentials of Blockchain Technologies in SCM.
• Use case, Knowledge, Collaboration and Regulations affect the influences on the potentials of Blockchain Technologies in SCM (moderating effects).
26.3 Results

26.3.1 Data Collection

To enable comparability of the results in the quantitative survey, closed-ended questions were used to examine the mentioned influences on the potentials of Blockchain Technologies in SCM. The questionnaire contained statements regarding the items found in the literature analysis. The participants were asked to evaluate them from 1
Fig. 26.2 Revised conceptual model
(completely agree) to 5 (completely disagree). The quantitative data were collected utilizing an online questionnaire, which was created with "LimeSurvey". After pretesting, the survey was primarily shared through the online business networks "LinkedIn" and "Xing" between June 2020 and May 2021. The target group of the survey were experts in Blockchain Technologies for SCM. A total of 179 respondents completed the online survey on LimeSurvey. 76 participants did not have experience with Blockchain Technologies in SCM and thus did not meet the criteria for the evaluation. The adjusted sample is n = 103. The sample of the 103 respondents that have blockchain experience concerning SCM consists of 79 (77%) male, 16 (16%) female and 1 diverse (1%) respondent. 7 respondents (7%) did not disclose their gender. Most of the respondents work in the Information and Communication/IT sector (30%). Other fields are Manufacturing Industry/Production (13%), Financial and Insurance services (13%), Trade/Traffic (10%), Health/Social Insurance (3%), Pharmacy (3%), Farming (2%) and Building Trade/Construction (1%). The remaining respondents (26%) work in other sectors such as Software Development, Automotive, Logistics, Education, and Consulting. With 43%, most respondents belong to the Management/Board of directors. This is followed by the functions of Team leader (12%), Department manager (9%), Person responsible (6%), and Division manager (6%). 16% have other job functions including: Founder/CEO, software developer, project manager, data analyst and business analyst. 8% did not provide any information about their job function.
14% of the participants have been using Blockchain Technologies for less than a year in their company. Most of the participants have been using them for 1–4 years (63%), and 24% of the participants have used them for more than 5 years. The following blockchain approaches are used by the experts: Smart Contracts (82%), Proof of identity (60%), Proof of ownership (60%), Anti-counterfeit solutions (47%), Trading and Financial platforms (41%), Services for consumer privacy (15%), Tax automation (5%) and other approaches (24%) like Tokenization, Automated Warehouse solutions and Delegated Proof of Stake (DPoS). Furthermore, 82 of the respondents provided information on their company's location. 17 of them are located in Germany, 16 in the United States, and 5 each in India and Canada. Between one and four companies are located in the following countries: Australia, Austria, Brazil, France, Hong Kong, Iran, Italy, Ireland, Netherlands, Saudi Arabia, Slovenia, Spain, Sweden, Switzerland, UK and Ukraine. Most of the respondents work in large enterprises with more than 250 employees (45%) and a turnover of more than 40 million € (37%). Another large share is attributable to small companies with less than 50 employees (39%) and less than 12 million € turnover (32%). In contrast, few respondents work for medium-sized companies with between 50 and 250 employees (10%) and a turnover of between 12 and 40 million € (4%). In order to analyze the relationship between the impact factors and the potentials of Blockchain Technologies in SCM, the structural equation modeling (SEM) approach was used. SEM is a second-generation multivariate data analysis method which is often used in research because it can test theoretically supported linear and additive causal models [30]. The authors chose SmartPLS due to its robustness and data requirements. SmartPLS is a statistics software for Partial Least Squares Structural Equation Modeling (PLS-SEM) [31].
26.3.2 Examination of Quality Criteria

Before running the Structural Equation Model, the quality criteria were examined taking into consideration Cronbach's Alpha (CA), Composite Reliability (CR), Average Variance Extracted (AVE) (Table 26.2) and R2.

Table 26.2 Measures of reliability and validity

Variable    | CA    | CR    | AVE
Privacy     | 0.648 | 0.812 | 0.597
Trust       | 0.721 | 0.824 | 0.541
Efficiency  | 0.612 | 0.784 | 0.554
Control     | 0.773 | 0.866 | 0.684
Scalability | 0.756 | 0.859 | 0.670
CA and CR are measures of internal consistency reliability. CA is considered the lower bound and CR the upper bound for internal consistency reliability. Typically, the true value is viewed as lying within these two extremes. The recommended value is 0.7–0.9 [32]. The CA values of Privacy and Efficiency are below 0.7; the other variables meet the recommended value. The value of CR is within this range for all variables. To measure the convergent validity of each construct, the AVE is used. An AVE of 0.5 or higher is acceptable [32]. All variables show an acceptable AVE value. The coefficient of determination R2 is a measure of the quality of a linear regression [33]. To reduce bias, the adjusted R2 has been used. The adjusted R2 of the dependent variable "Potential Benefits of Blockchain Technologies in SCM" is 0.236. This means that 23.6% of the variance of the dependent variable can be explained by the model. Consequently, there is improvement potential in regard to new indicators or additional constructs. As it is >0.19, it is considered "weak" to "moderate" [34].
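For reference, the reported reliability and validity measures can be computed from item scores and standardized outer loadings with the standard formulas; the sketch below uses made-up values rather than the survey data.

```python
import numpy as np

def cronbachs_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x indicators matrix of one construct."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

def composite_reliability(loadings: np.ndarray) -> float:
    """CR = (Σλ)² / ((Σλ)² + Σ(1 - λ²)) for standardized outer loadings λ."""
    lam_sum_sq = loadings.sum() ** 2
    error = np.sum(1.0 - loadings ** 2)
    return lam_sum_sq / (lam_sum_sq + error)

def average_variance_extracted(loadings: np.ndarray) -> float:
    """AVE = mean of the squared standardized loadings."""
    return np.mean(loadings ** 2)

# Illustrative example: one construct with three indicators
items = np.array([[4, 5, 4], [3, 3, 4], [5, 4, 5], [2, 3, 2], [4, 4, 5]], dtype=float)
loadings = np.array([0.78, 0.81, 0.72])
print(round(cronbachs_alpha(items), 3),
      round(composite_reliability(loadings), 3),
      round(average_variance_extracted(loadings), 3))
```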
26.3.3 Evaluation of the SEM

After the examination of the quality criteria, the model based on SEM is evaluated by means of SmartPLS [35]. Figure 26.3 depicts the structural model with the coefficients of the respective paths. It shows the relation between the five determinants and the potentials of Blockchain Technologies in SCM. The statistical analysis is based on a one-sided significance test with a significance level of 10%. In this case, the critical t-value is 1.28 [30]. T-statistics are used to determine the significance of a parameter estimator. If the measured t-value exceeds the critical t-value, the null hypothesis has to be rejected [30]. In order to confirm a significant influence at a significance level of 10%, the p-value has to be below 0.1 [30]. Table 26.3 shows the path coefficients, t-statistics and p-values of the different constructs.
H1: Considering hypothesis 1, the p-value is 0.000. The t-value of 4.559 exceeds the critical value of 1.28 [30]. The path coefficient shows that privacy positively affects (+0.387) trust.
Fig. 26.3 SEM with coefficients (*p < 0.1, **p < 0.05)
Table 26.3 SEM-coefficients

Hypothesis | SEM-path                | Path coefficient | T-statistics | Significance (p-values)
H1         | Privacy → Trust         | +0.387           | 4.559        | 0.000
H2         | Trust → BC in SCM       | +0.018           | 0.156        | 0.438
H3         | Efficiency → BC in SCM  | +0.175           | 1.426        | 0.077
H4         | Control → BC in SCM     | −0.042           | 0.285        | 0.388
H5         | Scalability → BC in SCM | +0.431           | 2.708        | 0.003
Consequently, statistical significance can be assumed, and the hypothesis can be accepted.
H2: For hypothesis 2, the coefficient shows a positive result (+0.018). However, the p-value is 0.438, which indicates that it is not significant, and the t-value is 0.156. Therefore, trust does not positively influence the dependent variable, and consequently neither does privacy, whose effect runs through trust.
H3: Hypothesis 3 refers to efficiency as a possible influence on the potentials of Blockchain Technologies in SCM. As there is a p-value of 0.077 and a path coefficient of +0.175, a significant influence can be assumed. Moreover, the t-value is 1.426, so the hypothesis can be supported.
H4: Regarding hypothesis 4, the p-value of 0.388 shows no significant influence. Combined with the path coefficient of −0.042 and the t-value of 0.285, the values indicate that the hypothesis cannot be confirmed.
H5: Hypothesis 5 shows the most significant p-value of 0.003 and a high path coefficient of +0.431. The significance is confirmed because the t-value is 2.708. Consequently, the hypothesis can be accepted.
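The decision rule applied to each hypothesis (one-sided test at the 10% level, critical t-value of 1.28, p < 0.1, and a positive path coefficient as hypothesized) can be summarized in a few lines; the values are those of Table 26.3.

```python
CRITICAL_T = 1.28   # one-sided test, 10% significance level

paths = {  # hypothesis: (path coefficient, t-statistic, p-value) from Table 26.3
    "H1 Privacy -> Trust":         (+0.387, 4.559, 0.000),
    "H2 Trust -> BC in SCM":       (+0.018, 0.156, 0.438),
    "H3 Efficiency -> BC in SCM":  (+0.175, 1.426, 0.077),
    "H4 Control -> BC in SCM":     (-0.042, 0.285, 0.388),
    "H5 Scalability -> BC in SCM": (+0.431, 2.708, 0.003),
}

for name, (coef, t, p) in paths.items():
    supported = (t > CRITICAL_T) and (p < 0.1) and (coef > 0)
    print(f"{name}: {'supported' if supported else 'not supported'}")
```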
26.4 Conclusion

The research project aimed to review and further develop the conceptual model within the qualitative study "Potentials of Blockchain Technologies in Supply Chain Management—A Conceptual Model" by Härting et al. [1]. Therefore, an SLR was conducted to elaborate the model as a basis for a quantitative study. As a result, new indicators were added to the existing conceptual model and redundancies were excluded. A main goal was to examine the influence of the determinants and moderating effects on the potentials of Blockchain Technologies in SCM. The results show that efficiency and scalability have a positive and significant influence. According to the indicators of the conceptual model, efficiency can be considered as automation of processes, real-time processing and cost-effective processes. Scalability is generated by an increasing blockchain network, technical maturity, and a large number of real applications. In contrast, the influencing factors trust and control do not have a significant impact on the potentials of Blockchain Technologies in SCM. This is probably due to the fact that trust and control act neutrally.
In addition, the question regarding company size shows interesting results. Both in terms of turnover and number of employees, Blockchain Technologies are used either by very large companies or by very small ones. The number of medium-sized companies using Blockchain Technologies is very small. Large companies benefit strongly from Blockchain Technologies in SCM because supply chains in international corporations are multitiered and complex, which is not the case for medium-sized companies. The high percentage of small companies can be explained by the increasing number of start-ups specialized in Blockchain Technologies offering their services. Limitations of this research also have to be considered. The quantitative research includes a sample consisting of 103 blockchain and supply chain experts. The data were collected during June 2020–May 2021, and thus are based on the opinions of the experts during this period. Future work should increase the sample size in order to derive more generally applicable statements. This can be achieved by extending the investigation period. Additionally, the sample consists of experts from 20 different countries with different research statuses and expertise. Therefore, future research in this field could be conducted specifically for individual countries as well as in a sector- or company-specific manner. Since blockchain in the supply chain is still a little-explored topic, further hypotheses on potentials and risks can be obtained through additional literature research. The present empirical study illustrates that enterprises can benefit from the implementation of Blockchain Technologies into their supply chain environment. Efficiency and scalability are both factors that could be confirmed by the study and that play a significant role for enterprises having to consider economic aspects. Scalability might be improved by an increasing and interacting blockchain network and the technical maturity of blockchain systems. In order to exploit this potential, it is important to identify and use the appropriate blockchain technology as well as to align it to the individual corporate processes.
References 1. Härting, R., Sprengel, A., Wottle, K., Rettenmaier, J.: Potentials of blockchain technologies in supply chain management—a conceptual model. In: Science Direct. 24th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, pp. 1–9. Elsevier B.V. (2020) 2. Underwood, S.: Blockchain beyond bitcoin. Commun. ACM 59(11), 15–17 (2016). https://doi. org/10.1145/2994581 3. Lacity, M.: Addressing key challenges to making enterprise blockchain applications a reality. MIS Q. Exec. 17, 3 (2018) 4. Glaser, B.: Doing Grounded Theory: Issues and Discussions. Sociology Press (1998) 5. Hinckeldeyn, J.: Blockchain-Technologie in der Supply Chain. Springer Fachmedien Wiesbaden, Wiesbaden (2019). https://doi.org/10.1007/978-3-658-26440-6 6. Nakamoto, S.: Bitcoin: A Peer-to-Peer Electronic Cash System (2008). Retrieved from https:// bitcoin.org/bitcoin.pdf
7. Abeyratne, S.A., Monafared, R.P.: Blockchain ready manufacturing supply chain using distributed ledger. Int. J. Res. Eng. Technol. 05(09), 1–10 (2016). https://doi.org/10.15623/ ijret.2016.0509001 8. Fill, H.-G., Meier, A.: Blockchain kompakt. Springer Fachmedien Wiesbaden, Wiesbaden (2020). https://doi.org/10.1007/978-3-658-27461-0 9. Mougayar, W.: The Business Blockchain: Promise, Practice, and Application of the Next Internet Technology. Wiley, Hoboken, NJ (2016) 10. Peters, G.W., Panayi, E.: Understanding modern banking ledgers through blockchain technologies: future of transaction processing and smart contracts on the internet of money. In: Banking Beyond Banks and Money, pp. 239–278. Springer, Cham (2016) 11. Tian, F.: An agri-food supply chain traceability system for China based on RFID & blockchain technology. In: Yang, B. (ed.) 2016 13th International Conference on Service Systems and Service Management (ICSSSM): June 24–26, 2016, KUST, Kunming, China, pp. 1–6. IEEE, Piscataway, NJ (2016). https://doi.org/10.1109/ICSSSM.2016.7538424 12. Loebbecke, C., Lueneborg, L., Niederle, D.: Blockchain Technology Impacting the Role of Trust in Transactions: Reflections in the Case of Trading Diamonds (2018). ECIS Retrieved from https://aisel.aisnet.org/ecis2018_rip/68 13. McEntire, J., Kennedy, A.W.: Food Traceability. Springer International Publishing, Cham (2019). https://doi.org/10.1007/978-3-030-10902-8 14. Ammous, S.: Blockchain Technology: What is it Good for? Center on Capitalism and Society at Columbia University Working Paper #91 (2016). https://doi.org/10.2139/ssrn.2832751 15. Korpela, K., Hallikas, J., Dahlberg, T.: Digital supply chain transformation toward blockchain integration. In: Proceedings of the 50th Hawaii International Conference on System Sciences (2017). https://doi.org/10.24251/HICSS.2017.506 16. Kim, H., Laskowski, M.: Towards an Ontology-Driven Blockchain Design for Supply Chain Provenance (2016). https://doi.org/10.2139/ssrn.2828369 17. Ivanov, D., Dolgui, A., Sokolov, B.: The impact of digital technology and Industry 4.0 on the ripple effect and supply chain risk analytics. Int. J. Prod. Res. 25, 1–18 (2018) 18. Hofmann, E., Strewe, U.M., Bosia, N.: Supply Chain Finance and Blockchain Technology. Springer International, Heidelberg (2018) 19. Kshetri, N.: Blockchain’s roles in meeting key supply chain management objectives. Int. J. Inf. Manage. 3980–3989 (2018) 20. Pilkington, M.: Blockchain technology: principles and applications. In: Xavier Olleros, F., Zhegu, M. (ed.) Research Handbook on Digital Transformations, pp. 1–38. Edward Elgar, Glos (2016) 21. Behnke, K., Janssen, M.F.W.H.A.: Boundary conditions for traceability in food supply chains using blockchain technology. Int. J. Inf. Manage. 52, 1–10 (2020) 22. Schmidt, C.G., Wagner, S.M.: Blockchain and supply chain relations: a transaction cost theory perspective. J. Purchasing Supply Manage. 25, 1–13 (2019) 23. Kurpjuweit, S., Schmidt, C.G., Klöckner, M., Wagner, S.M.: Blockchain in additive manufacturing and its impact on supply chains. J. Bus. Logistics 2019, 1–25 (2019) 24. Kamble, S., Gunasekaran, A., Arha, H.: Understanding the Blockchain technology adoption in supply chains-Indian context. Int. J. Prod. Res. 57(7), 2009–2033 (2019) 25. Saberi, S., Kouhizadeh, M., Sarkis, J., Shen, L.: Blockchain technology and its relationships to sustainable supply chain management. Int. J. Prod. Res. 57(7), 2117–2135 (2019) 26. 
Ciriani, V., De Capitani di Vimercati, S., Foresti, S., Jajodia, S., Paraboschi, S., Samarati, P.: Combining fragmentation and encryption to protect privacy in data storage. ACM Trans. Inf. Syst. Secur. (TISSEC), 22–33 (2010) 27. Zhang, R., Xue, R., Liu, L.: Security and privacy on blockchain. ACM Comput. Surv. 52(3), article 51, 1–34 (2019) 28. Karame, G., Capkun, S.: Blockchain security and privacy. IEEE Secur. Priv. 16(4), 11–12 (2018) 29. Chauhan, A., Malviya, O., Verma, M., Singh Mor, T.: Blockchain and scalability. In: IEEE International Conference on Software Quality, Reliability and Security Companion, pp. 122– 128 (2018)
30. Hair, J.F., Hult, G.T.M., Ringle, C.M., Sarstedt, M., Richter, N.F., Hauff, S.: Partial Least Squares Strukturgleichungsmodellierung. Eine anwendungsorientierte Einführung. Verlag Franz Vahlen, München (2017) 31. Wong, K.K.K.: Partial least squares structural equation modeling (PLS-SEM) techniques using SmartPLS. Mark. Bull. 24(1), 1–32 (2013) 32. Hair, J.F., Risher, J.J., Sarstedt, M., Ringle, C.M.: When to use and how to report the results of PLS-SEM. Eur. Bus. Rev. 31(1), 2–24 (2019). https://doi.org/10.1108/EBR-11-2018-0203 33. Nakagawa, S., Johnson, P.C.D., Schielzeth, H.: The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. J. R. Soc. Interface 14(20170213), 1–11 (2017) 34. Chin, W.W.: The partial least squares approach for structural equation modeling. In: Marcoulides, G.A. (ed.) Modern Methods for Business Research (Quantitative Methodology Series), pp. 295–336. Psychology Press, New York, NY (1998) 35. Ringle, C.M., Wende, S., Becker, J.-M.: SmartPLS 3 (2015). Available online at http://www. smartpls.com
Chapter 27
Synchronization of Mobile Grading and Pre-cooling Services Xuping Wang, Yue Wang, Na lin, and Ya Li
Abstract At present, the "first-mile" of fruit and vegetable products in China suffers from problems such as the low efficiency of manual grading and untimely pre-cooling services, and the scattered, small- and medium-scale character of fruit and vegetable producing areas makes it difficult to implement post-harvest processes. The emerging mobile grading and pre-cooling equipment used in the field can solve these problems, but there is a break in the chain between grading and pre-cooling, and mobile grading and pre-cooling resources need to be coordinated to reduce the service interval and thus reduce the rate of fruit and vegetable spoilage. In this paper, taking into account the service sequencing constraints of grading and pre-cooling, the maximum service interval constraint between grading and pre-cooling, the delayed pre-cooling effect, and other factors, a two-demand collaborative optimization model for mobile grading and pre-cooling is developed to minimize the total cost. A hybrid genetic algorithm with variable neighborhood search considering customer clustering is developed to solve the problem and is compared with genetic algorithms and variable neighborhood search algorithms to verify its effectiveness. The rationality of the model is demonstrated by solving a case of grading and pre-cooling apples in Luochuan County, Shaanxi Province, and a sensitivity analysis is designed to find the most suitable service level and percentage of users with pre-cooling needs for the county, which provides a basis for decision making for the future use of mobile grading and pre-cooling resources in promoting them among small and medium-sized farmers.

27.1 Introduction

In recent years, as more and more small and medium-sized farmers adopt the B2C model to sell fruit and vegetable produce, the problems of post-harvest commercialization and the "first-mile" link are increasingly exposed. China's fruit and vegetable
27.1 Introduction

In recent years, as more and more small and medium-sized farmers adopt the B2C model to sell fruit and vegetable produce, the problems of post-harvest commercialization and the "first-mile" link are increasingly exposed. China's fruit and vegetable production emphasizes pre-harvest over post-harvest: post-harvest handling operations are missing or untimely, grading relies largely on manual labor, and pre-cooling stations are too far from small and medium-sized farmers, all of which lead to losses in the quality and quantity of fruits and vegetables after harvest and affect farmers' income [1–3]. Given the above, and considering that China's fruit and vegetable production areas are scattered and of small and medium scale, developing mobile grading equipment [4, 5], mobile pre-cooling equipment [6] and supporting transport vehicles suitable for China's conditions can effectively address the low efficiency of manual grading and the untimeliness of pre-cooling. Obviously, in the context of China's smallholder economy, mobile grading and pre-cooling resources will better serve individual smallholder farmers. However, at present few scholars pay attention to the operation optimization of grading and pre-cooling in the "first mile" of fruits and vegetables; most work focuses on the collection of agricultural products [7, 8] and the layout of cold chain networks [9, 10]. Therefore, this paper focuses on the collaborative scheduling of grading and pre-cooling vehicles to reduce process waiting and ensure the timeliness of pre-cooling.

X. Wang (B) · Y. Wang · N. Lin · Y. Li
School of Economics and Management, Dalian University of Technology, Linggong Road, Dalian 116024, Liaoning, People's Republic of China
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_27

The problem studied here is in essence a dual-demand vehicle routing problem, a variant of the VRP. Subramanian et al. [11] used a parallel algorithm to solve the vehicle routing problem with simultaneous delivery and pickup. In that setting, a vehicle can deliver and pick up goods at a customer node at the same time, whereas the dual needs of the customers studied in this paper fall into different categories that cannot be met by the same vehicle, which is similar to delivery and installation in the home appliance industry. Sun [12] set the delivery vehicle service level as the duration between the delivery and installation services and used an endosymbiotic evolutionary algorithm based on a random search mechanism for the collaborative optimization of delivery and installation vehicle routing. Sim et al. [13] studied vehicle scheduling for dual appliance delivery and installation services, solved it with an ant colony algorithm, and tested it at a logistics company. Kim et al. [14] studied the vehicle routing problem considering both delivery and installation service demands and formulated a mixed-integer nonlinear programming model that minimizes the travel time of transportation and installation vehicles. Bae et al. [15] designed a phased genetic algorithm to solve the multi-depot vehicle routing problem with time windows considering delivery and installation vehicles: the first stage builds the routes of the delivery vehicles and the second stage optimizes the routes of the installation vehicles. Hojabri et al. [16] studied the vehicle routing problem with synchronization constraints and designed an adaptive large neighborhood search to solve it. Sarasola and Doerner [17] considered customers with multiple needs served by different vehicles, formulated a vehicle path optimization model with synchronization constraints, and developed an adaptive large neighborhood algorithm to solve it.
Reviewing the related research on the integration of delivery and installation in the home appliance industry, it can be seen that when the dual needs of customers must be served by different types of vehicles, both the service sequence constraints and the maximum service time interval constraints should be considered. Building on the existing literature and on the special requirement of fruits and vegetables for timely pre-cooling, this paper develops a collaborative dispatching model for mobile grading and pre-cooling vehicles and a hybrid genetic algorithm with variable neighborhood search considering customer clustering to solve it, so as to address the current problem of broken chains in the "first mile" of fruit and vegetable produce and provide a basis for managers' decisions. Because fruit and vegetable produce has high freshness requirements, the problem differs from delivery and installation in the appliance industry as follows:
• Changes in freshness and the increased cost of fruits and vegetables caused by delayed pre-cooling should be taken into account when developing the model.
• Compared with the traditional VRP there is no vehicle load constraint; however, since the grading and pre-cooling trucks serve in the field in situ, each truck has a maximum service capacity, and its grading or pre-cooling service time becomes a variable of concern.
27.2 Problem Description

The "first-mile" mobile grading and pre-cooling resource dispatching problem for a single category of fruit and vegetable produce studied in this paper can be described as follows. A cold chain logistics center (single depot) owns both mobile grading and pre-cooling trucks that provide in-field grading or pre-cooling services to farmer sites with mixed needs. A farmer with grading needs only is visited exactly once by a grading vehicle, which must arrive within the required time window, treated as a soft time window. A farmer with both grading and pre-cooling needs is visited once by a grading vehicle and once by a pre-cooling vehicle; the services are performed in the order grading first and then pre-cooling, under a double soft time window. In addition to meeting the grading time window constraint, a dual-demand node also requires that the interval between the end of the grading truck's service and the start of the pre-cooling truck's service does not exceed a given maximum service interval, to ensure the timeliness and effectiveness of fruit and vegetable pre-cooling; pre-cooling trucks are allowed to arrive earlier (see Fig. 27.1). This setting also effectively solves the problem of broken chains between grading and pre-cooling. Therefore, optimizing the service routes of the grading and pre-cooling trucks while satisfying the double time window constraint is the key problem addressed in this paper (see Fig. 27.2).
Fig. 27.1 Grading-pre-cooling timeline
Fig. 27.2 Single-category multi-demand vehicle route (VRPSTW) in the single depot
27.3 Optimization Model

27.3.1 Model Assumptions

To simplify the model, the following assumptions are made about the problem.
1. The cold chain logistics center doubles as a depot; all grading/pre-cooling vehicles depart from the same depot and return to it after completing the pre-cooling/grading service. Each vehicle makes only one round trip from the depot per day.
2. Farmers' demands are all small to medium batches and do not exceed the maximum service workload of the mobile grading/pre-cooling trucks.
3. The time window for providing grading services to each farmer is known.
27.3.2 Symbol Description

The decision variables are listed below; the remaining sets and parameters are given in Table 27.1.

- ST_{ik}^s: the starting time of the s-type service performed by vehicle k at farmer i;
- RT_{ik}^s: the arrival time of vehicle k at farmer i for the s-type service;
- et_{ki}^s: the waiting time when vehicle k arrives at farmer i earlier than the grading/pre-cooling time window;
- lt_{ki}^s: the penalty time when vehicle k arrives at farmer i later than the grading/pre-cooling time window;
- x_{ik}^s = 1 if vehicle k corresponding to the s-type demand is selected to serve farmer i, and 0 otherwise;
- y_{ijk}^s = 1 if vehicle k corresponding to the s-type demand travels directly from farmer i to farmer j, and 0 otherwise;
- g_k^s = 1 if the s-type vehicle k is scheduled for service, and 0 otherwise.

Table 27.1 Nomenclature of sets and parameters

Notation | Meaning
D | The cold chain logistics center (depot), D = {0}
S | Types of services the depot can provide, S = {s | 1, 2}, where 1 represents the grading service and 2 the pre-cooling service
N^1 | The set of all farmers with grading needs (i.e., the set of all farmers)
N^s | The set of farmers with s-type demand, N^2 ⊆ N^1
N | The set of the depot and all farmers with grading needs, N = D ∪ N^1
N_DP | The set of the depot and all farmers with pre-cooling needs, N_DP = D ∪ N^2
V^s | The set of available vehicles providing s-type services, V^1 = {k | 1, 2, ..., K_1}, V^2 = {k | K_1 + 1, K_1 + 2, ..., K}
FC^s | The fixed cost of the s-type vehicle
VC^s | The travel cost per unit distance of the s-type vehicle
EC^s | The energy consumption cost per unit time of the equipment on the s-type vehicle
PC_1^s | The waiting cost per unit time for the s-type vehicle arriving earlier than the required time
PC_2^s | The penalty cost per unit time for the s-type vehicle arriving later than the required time
v^s | The travel speed of the s-type vehicle
w^s | The working speed of the s-type vehicle when providing the corresponding service
T_max^s | The maximum duration of the s-type vehicle
D_i^s | The demand of farmer i for the s-type service
ET_i^1 | The lower bound of the grading time window of farmer i
LT_i^1 | The upper bound of the grading time window of farmer i
SL_i | The maximum time interval between the grading and pre-cooling services acceptable to farmer i
t_{ij}^s | The travel time of the s-type vehicle from node i to node j, t_{ij}^s = d_{ij} / v^s
t_i^s | The time required to perform the s-type service at farmer i
M | A sufficiently large positive number
d_{ij} | The distance from node i to node j
T_D | The maximum working hours of the logistics center
C_max^s | The maximum handling capacity of the s-type vehicle

27.3.3 Mathematical Model

The mixed-integer programming model of this paper is formulated to minimize the total cost, as follows:

min z = \sum_{s\in S}\sum_{k\in V^s} FC^s \cdot g_k^s + \sum_{i\in N}\sum_{j\in N}\sum_{k\in V^1} VC^1 \cdot d_{ij} \cdot y_{ijk}^1 + \sum_{i\in N}\sum_{j\in N}\sum_{k\in V^1} EC^1 \cdot t_i^1 \cdot y_{ijk}^1
  + \sum_{i\in N_{DP}}\sum_{j\in N_{DP}}\sum_{k\in V^2} VC^2 \cdot d_{ij} \cdot y_{ijk}^2 + \sum_{i\in N_{DP}}\sum_{j\in N_{DP}}\sum_{k\in V^2} EC^2 \cdot t_i^2 \cdot y_{ijk}^2
  + \sum_{s\in S}\sum_{k\in V^s}\sum_{i\in N^s} (PC_1^s \cdot et_{ki}^s + PC_2^s \cdot lt_{ki}^s)    (27.1)

s.t.

\sum_{i\in N^1} y_{i0k}^s = \sum_{i\in N^1} y_{0ik}^s = g_k^s, \forall k \in V^s, s \in S    (27.2)
\sum_{i\in N^s} x_{ik}^s \cdot D_i^s \le C_{max}^s, \forall k \in V^s, s \in S    (27.3)
\sum_{k\in V^s} x_{ik}^s = 1, \forall i \in N^s, s \in S    (27.4)
\sum_{k\in V^1} g_k^1 \le K_1    (27.5)
\sum_{k\in V^2} g_k^2 \le K - K_1    (27.6)
ST_{ik}^1 = \max\{RT_{ik}^1, ET_i^1\}    (27.7)
ST_{ik}^2 = \max\{RT_{ik}^2, ST_{ik}^1 + t_i^1\}    (27.8)
et_{ki}^1 = \max\{ET_i^1 - RT_{ik}^1, 0\}, \forall i \in N^1, k \in V^1    (27.9)
lt_{ki}^1 = \max\{RT_{ik}^1 - LT_i^1, 0\}, \forall i \in N^1, k \in V^1    (27.10)
et_{ui}^2 = \max\{ST_{ik}^1 + t_i^1 - RT_{iu}^2, 0\}, \forall i \in N^2, k \in V^1, u \in V^2    (27.11)
lt_{ui}^2 = \max\{RT_{iu}^2 - ST_{ik}^1 - t_i^1 - SL_i, 0\}, \forall i \in N^2, k \in V^1, u \in V^2    (27.12)
ST_{ik}^s + t_i^s + t_{ij}^s \le RT_{jk}^s + M(1 - y_{ijk}^s), \forall i \in N, k \in V^s, s \in S    (27.13)
ET_i^1 \le RT_{ik}^1 + et_{ki}^1 - lt_{ki}^1 \le LT_i^1, \forall i \in N^1, k \in V^1    (27.14)
ST_{ik}^1 - M(1 - x_{ik}^1) \le RT_{iu}^2 + M(1 - x_{iu}^2), \forall i \in N^2, k \in V^1, u \in V^2    (27.15)
ST_{ik}^1 + t_i^1 + SL_i \ge RT_{iu}^2 + et_{ui}^2 - lt_{ui}^2 \ge ST_{ik}^1 + t_i^1, \forall i \in N^2, k \in V^1, u \in V^2    (27.16)
\sum_{i\in N}\sum_{j\in N} y_{ijk}^s \cdot t_{ij}^s \le T_{max}^s, \forall s \in S, k \in V^s    (27.17)
ST_{ik}^s + t_i^s + t_{ij}^s \le T_D + M(1 - y_{ijk}^s), \forall i \in N, j \in N, k \in V^s    (27.18)
\sum_{i=0}^{n} y_{ijk}^s = x_{jk}^s, \forall j \in N^s, k \in V^s, s \in S    (27.19)
\sum_{i=0}^{n} y_{jik}^s = x_{jk}^s, \forall j \in N^s, k \in V^s, s \in S    (27.20)
x_{ik}^s, y_{ijk}^s, g_k^s \in \{0, 1\}, \forall i, j \in N, k \in V^s, s \in S    (27.21)
Equation (27.1) is the objective function, which includes the fixed cost, the variable (travel) cost, the energy consumption cost of the grading and pre-cooling service equipment, and the time window penalty cost. Equation (27.2) means that both the pre-cooling and the grading vehicles depart from the logistics center (a closed depot) and return to it after completing the service. Equation (27.3) indicates that the total demand served by a vehicle of a given type on the same route shall not exceed its maximum handling capacity. Equation (27.4) indicates that each demand of each farmer is served only by a vehicle of the corresponding type and cannot be split. Equations (27.5) and (27.6) limit the number of grading vehicles and pre-cooling vehicles, respectively. Equations (27.7) and (27.8) represent the relationship between the arrival time and the starting service time of the grading and pre-cooling vehicles, respectively. Equations (27.9)–(27.12) define the waiting time for early arrival and the penalty time for late arrival of the grading and pre-cooling vehicles. Equation (27.13) represents the time relationship between the arrivals of a vehicle at consecutive farmers on its route. Equation (27.14) is the grading time window constraint of the farmers. Equation (27.15) represents the service sequence of farmers with dual needs (grading first and then pre-cooling). Equation (27.16) is the time interval constraint when a grading vehicle and a pre-cooling vehicle serve the same farmer. Equation (27.17) indicates that the total travel time of a vehicle shall not exceed its maximum duration. Equation (27.18) indicates that each vehicle shall return no later than the return deadline of the depot. Equations (27.19) and (27.20) are the flow balance relationships between the arrival and departure of vehicles. Equation (27.21) defines the binary decision variables.
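To make the timing logic of Eqs. (27.7)–(27.12) concrete, the following is a minimal Python sketch of how the start-of-service times and the early/late penalty times could be computed for one dual-demand farmer once the arrival times of the assigned grading and pre-cooling vehicles are known. The function name and the small dataclass are illustrative assumptions of this sketch, not part of the original formulation.

```python
from dataclasses import dataclass

@dataclass
class FarmerTiming:
    ET1: float   # lower bound of the grading time window
    LT1: float   # upper bound of the grading time window
    t1: float    # grading service duration at this farmer
    SL: float    # maximum acceptable grading-to-pre-cooling interval

def timing_and_penalties(f: FarmerTiming, RT1: float, RT2: float):
    """Apply Eqs. (27.7)-(27.12) for one dual-demand farmer.

    RT1 / RT2 are the arrival times of the grading and pre-cooling
    vehicles; returns the start times plus the early/late penalty durations."""
    ST1 = max(RT1, f.ET1)                      # Eq. (27.7): grading starts no earlier than the window
    ST2 = max(RT2, ST1 + f.t1)                 # Eq. (27.8): pre-cooling starts after grading ends
    et1 = max(f.ET1 - RT1, 0.0)                # Eq. (27.9): grading vehicle waiting time
    lt1 = max(RT1 - f.LT1, 0.0)                # Eq. (27.10): grading vehicle lateness
    et2 = max(ST1 + f.t1 - RT2, 0.0)           # Eq. (27.11): pre-cooling vehicle waiting time
    lt2 = max(RT2 - ST1 - f.t1 - f.SL, 0.0)    # Eq. (27.12): lateness beyond the max interval SL
    return ST1, ST2, et1, lt1, et2, lt2

# Example: grading window [14, 18] h, 0.5 h grading service, ~1.67 h (100 min) max interval
print(timing_and_penalties(FarmerTiming(14, 18, 0.5, 1.67), RT1=13.5, RT2=15.2))
```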
27.4 Solution Method

The proposed problem is an extension of the VRP that optimizes the service routes of grading vehicles and pre-cooling vehicles while considering the customers' dual time windows and service sequencing. It is NP-hard and difficult to solve with exact algorithms, so this paper develops a hybrid genetic algorithm with variable neighborhood search. The steps are as follows:
Step 1 Set the algorithm parameters, such as MAXGEN and the population size N.
Step 2 Select individuals with better fitness values for the subsequent search.
Step 3 Cross individuals to increase the diversity of the population.
Step 4 Apply the neighborhood search methods, namely 2-opt, swap and insert, to each individual in sequence.
Step 5 Check whether the termination condition is met. If yes, output the current optimal solution; if no, go to Step 2.
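As a rough illustration of Steps 1–5, the sketch below outlines one way the hybrid loop could be organized: a GA-style selection and crossover stage followed by 2-opt/swap/insert neighborhood moves applied to every individual. All function arguments (fitness evaluation, crossover, the move operators) are placeholders that would have to implement the model of Sect. 27.3; they are assumptions of this sketch, not the authors' implementation.

```python
import random

def hga_vns(customers, fitness, crossover, neighborhoods, pop_size=100, max_gen=1000):
    """Hybrid genetic algorithm with variable neighborhood search (sketch).

    `fitness` maps a customer permutation to a cost (lower is better),
    `crossover` combines two parents, and `neighborhoods` is a list of
    local-search moves such as [two_opt, swap, insert]."""
    # Step 1: initial population of random customer permutations
    population = [random.sample(customers, len(customers)) for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(max_gen):
        # Step 2: tournament selection keeps individuals with better fitness
        selected = [min(random.sample(population, 2), key=fitness) for _ in range(pop_size)]
        # Step 3: crossover to increase population diversity
        offspring = [crossover(random.choice(selected), random.choice(selected))
                     for _ in range(pop_size)]
        # Step 4: variable neighborhood search on each individual (2-opt, swap, insert in sequence)
        improved = []
        for ind in offspring:
            for move in neighborhoods:
                cand = move(ind)
                if fitness(cand) < fitness(ind):
                    ind = cand
            improved.append(ind)
        population = improved
        # Step 5: keep the incumbent; the generation counter is the termination condition
        best = min(population + [best], key=fitness)
    return best
```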
27.4.1 Generation of the Initial Population

Firstly, the routes of the grading vehicles are obtained by assigning customers to grading vehicles in the order given by each chromosome, subject to constraints such as the service capacity and maximum duration of the grading vehicles. Then, the customers with pre-cooling demand in each chromosome are selected and arranged according to the customer sequence given by the chromosome, and they are assigned to pre-cooling vehicles subject to constraints such as the pre-cooling vehicle service capacity, the maximum duration, and the maximum time interval between the arrival of the pre-cooling vehicle and the grading vehicle, thereby obtaining the pre-cooling vehicle routes.

The decoding process of each chromosome is shown in Fig. 27.3. Taking 10 customers as an example, the initial population is obtained by randomly arranging the customers, as shown in Fig. 27.3a. Starting from the first customer of the chromosome, the service capacity and duration of the grading vehicle are checked cumulatively. If the vehicle constraints cannot be met at customer 5, the position is recorded, depot 0 is inserted before it, a new vehicle is started at the recorded position, and the above process is repeated to obtain the grading vehicle routes; depot 0 is also inserted at the beginning and end of the chromosome to complete the decoding. Four routes are finally formed, as shown in Fig. 27.3b. The customers with pre-cooling needs are then screened and arranged as described above (Fig. 27.3c), the pre-cooling vehicle routes are built considering the relevant constraints, and two pre-cooling vehicle routes are obtained, as shown in Fig. 27.3d.

Fig. 27.3 Schematic diagram of chromosome encoding and decoding: (a) initial population; (b) the graded vehicle routes; (c) customers with pre-cooling needs; (d) the pre-cooling vehicle routes
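The following sketch restates the decoding rule in code: customers are read in chromosome order and a new route is opened whenever the accumulated load or route duration of the current vehicle would violate its limit. The bookkeeping is deliberately simplified (the full model would also check the grading–pre-cooling interval), so the helper names and data layout are illustrative assumptions only.

```python
def decode_routes(chromosome, demand, service_time, travel_time, capacity, max_duration):
    """Split a customer permutation into vehicle routes (depot = 0).

    A customer is appended to the current route only if adding it keeps the
    accumulated demand within `capacity` and the route duration (including a
    return leg to the depot) within `max_duration`; otherwise the route is
    closed and a new vehicle starts from the depot."""
    routes, current, load, duration, last = [], [0], 0.0, 0.0, 0
    for c in chromosome:
        extra = travel_time[last][c] + service_time[c] + travel_time[c][0]
        if load + demand[c] > capacity or duration + extra > max_duration:
            current.append(0)            # close the current route at the depot
            routes.append(current)
            current, load, duration, last = [0], 0.0, 0.0, 0
        current.append(c)
        load += demand[c]
        duration += travel_time[last][c] + service_time[c]
        last = c
    current.append(0)
    routes.append(current)
    return routes
```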
27.4.2 Neighbourhood Search

Three kinds of neighborhood search are set up: 2-opt, exchange, and insert. The three search methods are as follows:
1. 2-opt: randomly select points i and j in the individual, and reverse the order of the points between point i and point j. As shown in Fig. 27.4a, points 3 and 5 are randomly selected, and points 4, 1 and 2 between them are reversed.
2. Exchange: randomly select points i and j in the individual, and exchange the positions of point i and point j. As shown in Fig. 27.4b, points 3 and 5 are randomly selected and their positions are exchanged.
3. Insert: randomly select points i and j in the individual, and move point i behind point j. As shown in Fig. 27.4c, points 3 and 5 are randomly selected, and point 3 is moved after point 5.

Fig. 27.4 Schematic diagram of neighborhood search: (a) 2-opt; (b) Exchange; (c) Insert
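A compact Python rendering of the three moves is given below; each operator returns a new candidate sequence, mirroring the behavior illustrated in Fig. 27.4. The random index choices are the only assumption added here.

```python
import random

def two_opt(seq):
    """Reverse the segment between two randomly chosen positions (2-opt move)."""
    i, j = sorted(random.sample(range(len(seq)), 2))
    return seq[:i] + seq[i:j + 1][::-1] + seq[j + 1:]

def exchange(seq):
    """Swap the customers at two randomly chosen positions (exchange move)."""
    i, j = random.sample(range(len(seq)), 2)
    s = list(seq)
    s[i], s[j] = s[j], s[i]
    return s

def insert(seq):
    """Remove the customer at one random position and reinsert it at another (insert move)."""
    i, j = random.sample(range(len(seq)), 2)
    s = list(seq)
    s.insert(j, s.pop(i))
    return s
```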
27.5 Computational Experiment and Analyses

27.5.1 Testing the Performance of the Algorithm

To test the effectiveness of the hybrid variable neighborhood algorithm, we adapt data from the Solomon standard instance library [18] to construct instances for the proposed problem. The adaptation is as follows: instances C101, C201, R101, R201, RC101 and RC201 are selected from the Solomon library, and from each of them three groups of 25, 50 and 100 customers are taken, forming 18 test instances. The geographical locations, demands and time windows of the customers are kept unchanged, and the service time at each customer is obtained as the ratio of the customer's demand to the service speed. For the adapted instances, 10, 25 and 50 dual-demand customers were generated randomly. The fixed costs of the grading and pre-cooling vehicles are 200 and 300, respectively. The early-arrival and late-arrival penalty factors for the grading vehicles are 0.04 and 0.06, and the maximum duration of both vehicle types is 200 min. After several experiments, the algorithm parameters were set as follows: the population size is 100 for all instances, and the maximum number of iterations for the instances of size 25, 50 and 100 is 1000, 1000 and 2000, respectively. Each instance was solved 20 times with the traditional genetic algorithm (GA), the variable neighborhood search (VNS), and the hybrid genetic algorithm with variable neighborhood search (HGA_VNS) proposed in this paper; the computational results are given in Table 27.2. A sketch of the adaptation rule follows.
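As a small illustration of that adaptation rule, the snippet below derives the service time of each customer as demand divided by working speed and randomly marks a given number of customers as having dual (grading and pre-cooling) demand. The dictionary layout of a customer record is an assumption, since the Solomon files themselves are not parsed here.

```python
import random

def adapt_instance(customers, work_speed, n_dual):
    """Derive service times and dual-demand flags for an adapted Solomon instance.

    `customers` is a list of dicts with at least 'id' and 'demand'; the
    geographical positions, demands and time windows are kept unchanged."""
    for c in customers:
        c["service_time"] = c["demand"] / work_speed   # service time = demand / working speed
    dual_ids = set(random.sample([c["id"] for c in customers], n_dual))
    for c in customers:
        c["dual_demand"] = c["id"] in dual_ids
    return customers
```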
27.5.2 Numerical Experiments

Basic information of the example

Apple picking in 25 typical villages of Luochuan County, Shaanxi Province, was selected as the instance. Suppose a cold chain logistics center intends to provide post-harvest commercialization services for apples in Luochuan County. The center is equipped with mobile grading vehicles and pre-cooling vehicles, which can provide mobile grading and pre-cooling services to farmers. The instance data are as follows.
Table 27.2 Results of the three algorithms

Instance | GA Best | GA Average | GA Gap (%) | VNS Best | VNS Average | VNS Gap (%) | HGA-VNS Best | HGA-VNS Average
C101-25 | 1773.55 | 1893.47 | 8.41 | 1809.18 | 1904.52 | 10.58 | 1636.01 | 1683.81
C201-25 | 1609.66 | 1757.56 | 9.57 | 1667.13 | 1780.63 | 13.48 | 1469.04 | 1607.25
R101-25 | 2007.43 | 2061.44 | 5.31 | 2034.65 | 2086.38 | 6.74 | 1906.21 | 1927.41
R201-25 | 1869.22 | 1974.44 | 5.41 | 1899.21 | 1978.01 | 7.10 | 1773.28 | 1786.22
RC101-25 | 1824.45 | 1893.89 | 6.99 | 1874.17 | 1929.56 | 9.90 | 1705.32 | 1713.56
RC201-25 | 1436.61 | 1587.61 | 9.18 | 1487.72 | 1591.22 | 13.06 | 1315.86 | 1329.84
C101-50 | 3759.91 | 4032.22 | 11.65 | 3851.24 | 4146.34 | 14.36 | 3367.65 | 3447.85
C201-50 | 2920.54 | 3383.48 | 13.58 | 3066.57 | 3364.03 | 19.26 | 2571.43 | 2777.58
R101-50 | 3708.21 | 3959.42 | 11.16 | 3819.89 | 4108.74 | 14.51 | 3335.87 | 3411.58
R201-50 | 2925.47 | 3281.22 | 13.08 | 3019.02 | 3413.12 | 16.70 | 2587.31 | 2677.63
RC101-50 | 4099.87 | 4454.57 | 10.22 | 4202.86 | 4444.35 | 12.99 | 3719.55 | 3740.14
RC201-50 | 2707.35 | 3437.15 | 13.60 | 2847.55 | 3116.34 | 19.48 | 2383.29 | 2456.07
C101-100 | 9657.52 | 10,307.09 | 14.68 | 10,137.99 | 10,884.79 | 20.39 | 8421.08 | 8566.74
C201-100 | 6113.46 | 6801.15 | 18.82 | 6787.57 | 7423.85 | 31.93 | 5145.01 | 5556.18
R101-100 | 7377.51 | 7827.21 | 15.79 | 7819.19 | 8348.34 | 22.72 | 6371.44 | 6453.47
R201-100 | 5954.13 | 6525.91 | 18.66 | 6304.28 | 6808.85 | 25.64 | 5017.69 | 4384.36
RC101-100 | 8200.49 | 8876.11 | 16.11 | 8426.66 | 8764.46 | 19.31 | 7062.93 | 7191.51
RC201-100 | 6274.82 | 6835.15 | 19.62 | 6829.43 | 7448.83 | 30.20 | 5245.48 | 4323.94
Average | 4123.34 | 4493.83 | 12.32 | 4326.90 | 4641.24 | 17.13 | 3613.01 | 3613.06

Note: Gap refers to the improvement of HGA_VNS relative to the best value obtained by the corresponding algorithm; the same applies below.
Due to the difficulty of collecting specific apple production data, a typical household in each typical village was selected as a customer point. The total quantity of apples picked in a day was obtained from field research on these typical households in 2020, based on per-mu yield (1 mu ≈ 0.067 ha), the number of mu planted, and the total apple-picking hours. The grading time windows were generated randomly according to actual conditions and follow a uniform distribution U(14, 18) (unit: hours). The fixed costs of grading vehicles and pre-cooling vehicles are 500 ¥/vehicle and 300 ¥/vehicle, respectively. The variable costs of grading vehicles and pre-cooling vehicles are 2 ¥/km and 1.5 ¥/km, respectively. The energy consumption costs of the on-board equipment of the grading vehicle and the pre-cooling vehicle are 1.5 ¥/min and 1.2 ¥/min, respectively. The traveling speeds of the grading vehicle and the pre-cooling vehicle are 30 km/h and 40 km/h, respectively. The working efficiencies of grading vehicles and pre-cooling vehicles are 30 and 15 kg/min. The maximum durations for grading vehicles and pre-cooling vehicles are 500 and 600 min. The maximum interval between grading and pre-cooling services that farmers can accept (calculated from the end of the grading vehicle's service) is 100 min. The maximum handling capacity is 7000 kg for grading vehicles and 4000 kg for pre-cooling vehicles. The penalty coefficients for early-arrival waiting and late arrival of grading vehicles are 0.5 ¥/min and 1.5 ¥/min, respectively; for pre-cooling vehicles they are 0.8 ¥/min and 20 ¥/min, respectively. The pre-cooling method of the pre-cooling vehicles is pressure-differential pre-cooling.

Experimental process and results analysis

The case is solved by HGA_VNS, and the best result of 10 runs is shown in Table 27.3. The cold chain logistics center dispatched 5 grading vehicles and 4 pre-cooling vehicles to serve the 25 customers, at a total cost of 9448.91 ¥. From Table 27.3 it can be concluded that the unit cost of the mobile grading service is about 0.2 ¥/kg and the unit cost of the mobile pre-cooling service is about 0.3 ¥/kg, which are lower than the actual manual grading cost (0.4 ¥/kg) and the pre-cooling cost of a pre-cooling warehouse (0.8 ¥/kg). This suggests that using mobile equipment for post-harvest processing can be cost-effective, although the parameter settings are somewhat idealized. In addition, grading vehicle 5 served only one farmer, resulting in a unit grading cost of 0.55 ¥/kg, higher than the manual cost of 0.4 ¥/kg, which indicates that when the number of farmers served is small, it is not cost-effective to use mobile service equipment for post-harvest processing.

Table 27.3 Vehicle scheduling plan

Vehicle type | Vehicle route | Service volume | Service rate (%) | Cost | Unit cost
Grading vehicle 1 | 0-22-21-12-24-5-7-0 | 5980 | 85.43 | 1179.90 | 0.20
Grading vehicle 2 | 0-18-15-23-10-8-19-0 | 6070 | 86.71 | 1235.62 | 0.20
Grading vehicle 3 | 0-14-11-13-16-9-20-0 | 6770 | 96.71 | 1292.76 | 0.19
Grading vehicle 4 | 0-3-1-6-4-25-17-0 | 6010 | 85.86 | 1206.92 | 0.20
Grading vehicle 5 | 0-2-0 | 1210 | 17.29 | 661.25 | 0.55
Pre-cooling vehicle 1 | 0-22-21-24-0 | 3370 | 84.25 | 994.79 | 0.30
Pre-cooling vehicle 2 | 0-18-23-8-0 | 3070 | 76.75 | 937.88 | 0.31
Pre-cooling vehicle 3 | 0-14-13-9-0 | 3180 | 79.5 | 955.79 | 0.30
Pre-cooling vehicle 4 | 0-3-1-4-0 | 3370 | 84.25 | 983.98 | 0.29
Total cost | – | – | – | 9448.91 | –

Note: Unit cost = cost/service volume (unit: ¥/kg)
27.6 Conclusions

Aiming at the problems of low grading efficiency, untimely pre-cooling, and broken chains in the first mile of fruit and vegetable produce, this paper proposed a service mode in which mobile grading and pre-cooling resources carry out on-site processing for a single category of fruit and vegetable produce. To minimize the total operating cost, a collaborative optimization model of mobile grading and pre-cooling resources considering dual time windows was developed. The HGA_VNS was designed to solve the test instances and was compared with GA and VNS, which demonstrates its effectiveness. Finally, solving the case of apple picking and processing in Luochuan County, Shaanxi Province, verifies the rationality of the model and provides a reference for the future collaborative operation of mobile grading and pre-cooling. The above analysis shows that the emerging mobile grading and pre-cooling technology has significant advantages over the traditional mode of manual sorting and pre-cooling at fixed stations. In the future, cold chain logistics enterprises should gradually apply this technology to post-harvest processing practice for fruits and vegetables to enhance their competitiveness.

Acknowledgements This work is supported in part by the National Key Research and Development Program of China under Grant 2019YFD1101103, the National Natural Science Foundation of China under Grants 72071028, 71973106, and 71933005, and the Annual Project of the Social Science Foundation of Shaanxi Province under Grant 2020R002.
References 1. Wu, S.S.: Analysis of the key factors of the “first kilometer” of fresh agricultural products under the background of supply-side reform. Rural. Agric. Farmers (B version) (10), 27–30 (2019) 2. Cao, J.P., Chen, Y.Z., Sun, C., Wang, Y., Chen, K.S., Zhang, C.F., Sun, C.D.: Development status of the commercialization technology support system of fruit and vegetable production areas in China. J. Zhejiang Univ. (Agriculture and Life Sciences) 46(01), 1–7+16+2 (2020) 3. Chen, L.X., Huang, L., Ma, L.J.: Research on revenue sharing contract of agricultural product supply chain participated by TPL under freight cost sharing. Chin. J. Manage. Eng. 35(06), 218–225 (2021) 4. Wang, Q., Sun, L., Li, X.M., Zhang, M., Lv, Q., Cai, J.R.: Design of field grading system for navel oranges based on machine vision. J. Jiangsu Univ. (Natural Science Edition) 38(06), 672–676 (2017) 5. Xu, L.B., Zhu, Q.B., Huang, M.: Design of an ARM-based Apple Postharvest Field Grading Inspection System. Comput. Eng. Appl. 51(16), 234–238 (2015)
6. Wang, Q., Dai, S.B., Deng, Z.K.: The application of mobile pre-cooling equipment in the preservation of fruits and vegetables. Refrigeration 30(03), 47–52 (2011) 7. Ge, X.L., Zhang, Y.T.: Optimization of fresh food logistics collection path based on proactive scheduling. Syst. Eng. 38(06), 70–80 (2020) 8. Wang, M.Y., Wu, S.C.: Optimization of the first kilometer path of agricultural products based on batch transportation. Logistics Eng. Manage. 42(09), 116–118 (2020) 9. Li, Y.M., Wang, X.L., Guo, X.Y.: Agricultural products cold chain network layout based on set covering model. J. China Agric. Univ. 22(09), 212–220 (2017) 10. Wang, X.L.: Research on the “First Kilometer” Cold Chain Network Layout of Fresh Agricultural Products. Zhengzhou University (2017) 11. Subramanian, A., Drummond, L.M.A., Bentes, C., et al.: A parallel heuristic for the vehicle routing problem with simultaneous pickup and delivery. Comput. Oper. Res. 37(11), 1899–1911 (2010) 12. Sun, J.: Synchronizing the schedule of delivery and service vehicle using endosymbiotic evolutionary algorithm. J. Korean Soc. Supply Chain Manage. 7(1), 79–90 (2007) 13. Sim, I., Kim, T., Cha, J., Lee, H.: A design and analysis of scheduling for the dual of appliances delivery and installation services. Korean Soc. Supply Chain Manage. 11, 41–53 (2011) 14. Kim, K.C., Sun, J.U., Lee, S.W.: A hierarchical approach to vehicle routing and scheduling with sequential services using the genetic algorithm. Int. J. Ind. Eng. Theor. Appl. Pract. 20(1), 99–113 (2013) 15. Bae, H., Moon, I., et al.: Multi-depot vehicle routing problem with time windows considering delivery and installation vehicles. Appl. Math. Model. 40(13), 6536–6549 (2016) 16. Hojabri, H., Gendreau, M., et al.: Large neighborhood search with constraint programming for a vehicle routing problem with synchronization constraints. Computers 92(01), 87–97 (2018) 17. Sarasola, B., Doerner, K.F.: Adaptive large neighborhood search for the vehicle routing problem with synchronization constraints at the delivery location. Networks 75(1), 64–85 (2020) 18. Solomon, M.M.: Algorithms for the vehicle routing and scheduling problems with time window constraints. Oper. Res. 35(2), 254–265 (1987)
Chapter 28
The Challenge of Willingness to Blockchain Traceability Adoption: An Empirical Investigation of the Main Drivers in China Xueying Zhai and Xiangpei Hu

Abstract With the rapid development of blockchain and other new technologies, more and more traditional industries have become integrated with them. This paper considers the practical application of blockchain technology in the field of modern agriculture and builds on the technology-organization-environment (TOE) framework. Six variables were considered as driving or inhibiting the willingness to adopt blockchain traceability: technical complexity (TC), prospective earnings (PE), network diversity (ND), organizational innovation (OI), degree of government supervision (DG), and government support (GS). The proposed research model was tested using structural equation modeling with data collected from 155 agricultural producers in China. The results show that PE and GS have a positive impact on the willingness to use blockchain traceability, while ND has a negative impact. The results will help government departments and relevant marketers expand their understanding of the determinants affecting the adoption of blockchain traceability in agriculture and provide valuable insights for its promotion.
28.1 Introduction

The rapid rise of new technologies has upended the production and trading patterns of many traditional industries and has also brought new opportunities to solve long-standing problems. After the Bitcoin concept and the blockchain data structure were first proposed by Satoshi Nakamoto in 2008, blockchain technology gradually came into view. With the development of smart agriculture and agricultural e-commerce, the application of blockchain technology in the agricultural supply chain has been recognized by the operations management field and by consumers [1], and exploring the agricultural supply chain empowered by blockchain technology is one of the directions of future research. Based on the existing literature, there are still few studies on producers' willingness to adopt blockchain in agriculture and on the factors that drive its adoption. On this basis, this paper studies the application of blockchain technology in agricultural product traceability and examines the willingness to adopt blockchain traceability and the factors that drive producers to adopt it in the agricultural supply chain.

X. Zhai (B) · X. Hu
School of Economics and Management, Dalian University of Technology, 2 Linggong Road, Dalian, Liaoning 116024, People's Republic of China
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_28
28.2 Theoretical Background

28.2.1 Blockchain Traceability Review

Blockchain is a revolutionary technology that has emerged in recent years. It is defined as a way for businesses, industries and public organizations to conduct and verify transactions in near real time through distributed ledgers without relying on a central authority [1]. Studies have found that blockchain technology offers the potential to address food safety issues by providing a reliable and accessible record of the entire product trajectory from agriculture to consumption. Blockchain technology has also been shown to have potential for improving information transfer speed, data authenticity and security, providing a theoretical basis for implementing blockchain traceability of agricultural products. Regarding the willingness to adopt blockchain technology, Queiroz and Fosso Wamba studied the impact of performance expectation, social influence, convenience, blockchain transparency and supply chain trust on blockchain adoption [2]. Clohessy and Acton confirmed that senior management support, organizational readiness, and organizational support are the main considerations for blockchain adoption in the supply chain [3]. Wong et al. verified that facilitating conditions, technological readiness and technological affinity positively affect the application of blockchain technology, while policy support can affect facilitating conditions [4].
28.2.2 TOE Model

The technology-organization-environment (TOE) framework is a theoretical framework for the study of technology adoption proposed by Tornatzky and Fleischer in 1990. It analyzes the driving forces of enterprise adoption of new technology from three aspects: technology, organization and environment. Although "technology adoption" covers many theoretical models, scholars tend to choose the TOE framework or innovation diffusion theory and to extend the variables based on them [5, 6]. Recently, Clohessy et al. drew on this practice and enriched the TOE framework, summarizing the factors influencing the adoption of blockchain technology [3].
28.2.3 Hypotheses Development

The TOE model constitutes the underlying theoretical basis of the research model. Based on the research and analysis framework shown in Fig. 28.1, the technical, organizational and environmental factors are analyzed as follows.
28.2.3.1 Technical Complexity

In Diffusion of Innovations, Rogers explains that multiple factors affect the diffusion of a technological innovation, including complexity, that is, the effort users must expend and the ease they perceive when learning the innovative technology. In agriculture, one of the major obstacles to the adoption of new technologies is the ability of agricultural producers and operators to understand the technology [7]. If the application conditions and operation of blockchain traceability technology are too complicated, its adoption will be affected. Based on this, the following hypothesis is proposed:

H1. Technical complexity has a negative impact on the willingness of new agricultural operating entities to adopt blockchain traceability technology.
Fig. 28.1 Research framework
28.2.3.2 Prospective Earnings

Golan summarized the private benefits of applying a traceability system as increasing transparency, reducing liability risk, improving logistics efficiency, ensuring market access and obtaining a product premium [8]. Blockchain traceability technology discloses product information so as to enhance consumer trust in product quality. Such investment in innovation can protect brand assets and visibility and has a positive spillover effect on brand value [9]. The improvement of brand value can enhance the bargaining power of products, thereby improving the profits of enterprises for the same sales volume. From the perspective of technology adoption, the acquisition of private benefits is transformed into the perception of the "usefulness" of traceability technologies, and the stronger this perception, the better the incentive effect [10]. Based on this, the following hypothesis is proposed:

H2. Prospective earnings have a positive impact on the willingness of new agricultural business entities to adopt blockchain traceability technology.
28.2.3.3 Network Diversity

Network diversity represents the amount of variation among network partner types [11], and a high level of network diversity means that the members of the network differ from each other [12]. In the agricultural supply chain, the high cost of coordinating with diverse partners may slow down the absorption of external resources and may also reduce the efficiency with which enterprises convert resources from central positions into new products [11]. At the same time, in an interconnected ego network, the norms shared with multiple partners may limit an enterprise's innovation autonomy and its willingness to apply industry information [13]. Based on this, the following hypothesis is proposed:

H3. Network diversity has a negative impact on the willingness of new agricultural operating subjects to adopt blockchain traceability technology.
28.2.3.4 Organizational Innovation

The innovativeness of an organization refers to the degree to which it is willing to actively adopt new management techniques and technologies to improve itself. Existing studies consider innovativeness a key factor in new technology adoption [14, 15], and it has been shown to influence organizational change, such as the adoption of information systems. When facing new technologies, the more innovative an organization is, the more willing it is to accept and understand them and, therefore, the more likely it is to adopt them. Based on this, the following hypothesis is proposed:

H4. Organizational innovation has a positive impact on the willingness of new agricultural operating subjects to adopt blockchain traceability technology.
28.2.3.5 Degree of Government Supervision

For agricultural products, the government supervises and standardizes unqualified products, and may even take them off the shelves and impose fines [16]. Resende-Filho pointed out that, regardless of whether a traceability system is mandatory, decision-makers embed external beliefs about the importance of the system into their own cognitive structure, which affects their management decisions [17]. Therefore, when the government strictly supervises the quality of agricultural products, agricultural subjects perceive a high risk of loss, which stimulates the willingness to adopt a blockchain traceability system. Based on this, the following hypothesis is proposed:

H5. The degree of government supervision has a positive impact on the willingness of new agricultural operation subjects to adopt blockchain traceability technology.
28.2.3.6 Government Support

Agriculture has always been a key area of concern for all countries and has been supported by preferential policies in recent years. Existing studies have shown that government subsidies can stimulate production and investment activities with external effects [18]. At the same time, several papers have considered the impact of government support on blockchain adoption [19, 20]. Based on this, the following hypothesis is proposed:

H6. The intensity of government support has a positive impact on the willingness of new agricultural operation subjects to adopt blockchain traceability technology.
28.3 Research Methodology

28.3.1 Sampling Design and Data Collection

We collected relevant information about new agricultural subjects through project practice, and stratified sampling was conducted according to regional information. A total of 196 new-type agricultural subjects participated in the survey. After screening based on the screening questions, 167 questionnaires remained; 12 questionnaires with missing data were then removed, leaving 155 valid questionnaires. All items were measured on a 7-point Likert scale (from strongly disagree to strongly agree). We then operationalized the latent variables by taking the average value of the indicators corresponding to each construct; except for network diversity, the specific calculation method follows Liu [21].
28.3.2 Structural Equation Model (SEM) Analysis

28.3.2.1 Measurement Model

Our model is developed from the previous literature [2, 22, 25]; these constructs and their indicators were chosen because they are considered appropriate for explaining technology adoption behavior [21, 22]. The selected indicators and references are presented in the Appendix. Based on the micro-level data of 155 new agricultural operation subjects, a structural equation model is used to analyze and test the theoretical model of the factors influencing the adoption of blockchain traceability. SPSS 26.0 and AMOS Graphics were used to analyze the reliability and validity of the technical, internal and external factor modules. Gender, educational background and product type were selected as control variables with reference to the existing literature, the theoretical model was modified according to the test results, and finally the test results were interpreted.

The third column of Table 28.1 shows the outer loadings. All values are above the 0.7 threshold except for PE4, which is 0.654 [23]. The Cronbach's alpha, composite reliability (CR) and AVE values of all constructs are given in the last three columns of Table 28.1. All values exceed their respective thresholds of 0.70, 0.70 and 0.50, so it is reasonable to use all constructs in the proposed research model. Furthermore, an AVE of more than 0.50 for each construct indicates that the observed item variance can account for the hypothesized construct. It can therefore be said that the model passes the reliability and validity tests.
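For readers who want to reproduce the reliability checks, composite reliability and AVE can be computed directly from the standardized outer loadings. The short sketch below uses the usual formulas, CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)) and AVE = Σλ² / n, with the TC loadings from Table 28.1 as an example; it is an illustrative calculation, not the authors' analysis script.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + error)

def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

tc_loadings = [0.729, 0.887, 0.822]   # TC1-TC3 from Table 28.1
print(round(composite_reliability(tc_loadings), 3))        # ~0.855, matching the table
print(round(average_variance_extracted(tc_loadings), 3))   # ~0.665, matching the table
```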
28.3.2.2
Structural Model
The results of hypothesis testing with β value, p value and t value are shown in Fig. 28.2. Excluding the interference of gender, educational background and product type, PE had a significant positive effect on BI of new agricultural subjects (β = 0.425, P < 0.001). Therefore, assume that H2 is supported. It was found that ND had a negative effect on BI (β = −0.255, P < 0.001). Therefore, the sample data supports hypothesis H3. GS had a significant positive effect on BI (β = 0.521, P < 0.001), so the data validated H6. TC (β = −0.058, P = 0.387), OI (β = −0.046, P = 0.563), DG (β = 0.067, P = 0.369), p values were all greater than 0.05, so there was no correlation between TC, OI, DG and BI. That is, H1, H4 and H5 are not supported by data. Hypothesis 1, 4, 5, not formed reasons may include the following aspects: one is surveyed object for managers of new agricultural management main body, is not responsible for relevant technical operation, and technology in the promotion, the relevant developers on the market will be simplified as much as possible to the operation of the equipment, so as to solve technical operation complexity; Secondly, the respondents all have a strong sense of innovation, but at the same time, many of
28 The Challenge of Willingness to Blockchain Traceability …
323
Table 28.1 Outer loadings, Cronbach’s Alpha, CR and AVE values Construct
Item
Loadings
Cronbach’s Alpha
CR
AVE
TC
TC1
0.729
0.852
0.855
0.665
TC2
0.887 0.819
0.824
0.541
0.894
0.896
0.634
0.903
0.903
0.756
0.916
0.917
0.690
0.947
0.947
0.899
PE
OI
DG
GS
BI
TC3
0.822
PE1
0.743
PE2
0.789
PE3
0.749
PE4
0.654
OI1
0.793
OI2
0.831
OI3
0.753
OI4
0.792
OI5
0.810
DG1
0.851
DG2
0.898
DG3
0.858
GS1
0.796
GS2
0.871
GS3
0.830
GS4
0.853
GS5
0.800
BI1
0.958
BI2
0.938
Fig. 28.2 Outcome of the structural model examination using AMOS graphics
324
X. Zhai and X. Hu
them are willing to choose innovation from other perspectives in order to achieve higher profits. Therefore, the sense of innovation alone does not affect the willingness to adopt a single technology. Third, although the government’s supervision improves the quality awareness of agricultural producers, managers prefer simple and low-cost methods such as inspection rather than constructing digital equipment for traceability. Therefore, the government’s supervision intensity has no significant impact on the willingness to adopt blockchain traceability.
28.4 Discussion Although many achievements have been made in blockchain research in recent years, there is not enough practical research on this topic, especially research on blockchain adoption behavior of agricultural producers. Therefore, this study aims to bridge this gap and help understand the willingness to adopt blockchain traceability technology in China’s agricultural sector through a survey of agricultural organization managers.
28.4.1 Implications for Research

In this study, we developed and tested a TOE-based model to better understand the organizational adoption behavior of blockchain traceability, a disruptive and emerging technology. The research model refers to the framework of Queiroz and Fosso Wamba [2] and is extended based on social network theory to enrich the related research. Through field research, data were collected for hypothesis testing, further enriching empirical studies on producers' blockchain adoption in the agricultural field. We apply SEM in the Chinese context, and the proposed model provides important insights for the blockchain adoption and agricultural supply chain literature.
28.4.2 Implications for Practice

Our statistical results reveal interesting findings about blockchain adoption behavior in China. Compared with TC and DG, agricultural producers in China pay more attention to the benefits obtained after technology adoption and to government subsidy policies when considering the introduction of blockchain technology. This finding helps scholars and practitioners better understand how organizations act when blockchain is being promoted. In technology publicity, the direct and indirect benefits brought by the technology should be highlighted; establishing demonstration bases that achieve good results will speed up technology promotion. In addition, since blockchain traceability is based on digital agriculture, cost is a concern of the surveyed subjects. Government subsidy policies can ease investment pressure and promote adoption willingness, which also reinforces recent studies on blockchain adoption [24]. Enterprises with diversified partners in the supply chain are less willing to adopt the technology, so policy subsidies should target such enterprises to encourage adoption.
28.5 Conclusions

Based on the TOE framework of technology adoption theory and on social network theory, our study identifies the main technical, organizational and environmental factors that influence an organization's willingness to adopt blockchain traceability. We also introduced the research background, the challenges and opportunities faced by China's agricultural sector, and the application advantages and potential requirements of blockchain technology in agriculture. The study has practical significance and offers theoretical guidance for popularizing the technology, but it also has several limitations. First, our model applies only to the Chinese environment and provides limited practical guidance for other countries, so future studies in different countries are worthwhile. Second, in this study TC, OI and DG showed no impact on blockchain traceability adoption, which requires further study and discussion in other contexts.

Acknowledgements This work is supported by the National Key R&D Program of China under Grant #2019YFD1101103.
Appendix A: Indicators of the Research Model

Construct | Code | Indicators | Adapted from
Behavior intention | BI1 | Willing to try to introduce blockchain traceability technology | Maruping et al. (2017)
| BI2 | Resources will be invested in the blockchain traceability system |
Technical complexity | TC1 | Blockchain traceability involves software systems that are difficult to operate | Martins (2016)
| TC2 | Data related work in blockchain traceability systems is time consuming |
| TC3 | The daily operation of the blockchain traceability system takes a lot of energy |
Prospective earnings | PE1 | The application of blockchain traceability technology can improve the ability of product quality and safety monitoring | Queiroz and Fosso Wamba (2019), Venkatesh et al. (2003)
| PE2 | The application of blockchain traceability technology can realize data transparency and reduce fraud |
| PE3 | The application of blockchain traceability technology can save time and reduce costs |
| PE4 | The application of blockchain traceability technology can increase company revenue |
Organizational innovation | OI1 | Tend to be creative in my work | Jeon et al. (2006)
| OI2 | Tend to accept the new agricultural information technology |
| OI3 | Want to make some changes to the existing model |
| OI4 | Be willing to accept the challenge |
| OI5 | Like to try new ways of doing things |
Network diversity | ND1 | Number of suppliers (purchase of machinery, seed and fertilizer, etc.) cooperating with your company (organization) | Liu (2019)
| ND2 | The number of customers cooperating with your company (organization) (for example, how many channel customers the products are sold to) |
| ND3 | The number of universities or research institutes with which the company (organization) cooperates |
| ND4 | The number of public institutions such as banks and governments that your company (organization) cooperates with |
| ND5 | What other types of organizations do you work with and how many |
Degree of government supervision | DG1 | The government often inspects the production process | He (2016)
| DG2 | The degree of government supervision is strict |
| DG3 | The government has imposed heavy penalties on non-compliant products |
Government support | GS1 | The government encourages organizations to develop the application of blockchain traceability technology | Lv (2020)
| GS2 | The government has issued corresponding policy documents to help organizations carry out the application of blockchain traceability systems |
| GS3 | If blockchain traceability technology is adopted, the government will give corresponding financial subsidies |
| GS4 | The government has provided various forms of support for organizations to introduce blockchain traceability systems |
| GS5 | The government encourages the adoption of blockchain traceability systems by promoting successful case studies and technical training |
References 1. Ronaghi, M.H.: A blockchain maturity model in agricultural supply chain. Inf. Process. Agric. (2020) 2. Queiroz, M.M., Wamba, S.F.: Blockchain adoption challenges in supply chain: an empirical investigation of the main drivers in India and the USA. Int. J. Inf. Manage. 46, 70–82 (2019) 3. Clohessy, T., Acton, T., Rogers, N.: Blockchain adoption: technological, organisational and environmental considerations. Business Transformation Through Blockchain, pp. 47–76. Palgrave Macmillan, Cham (2019) 4. Wong, L., Tan, G., Lee, V., Ooi, K., Sohal, A.: Unearthing the determinants of Blockchain adoption in supply chain management. Int. J. Prod. Res. 58(7), 2100–2123 (2020) 5. Alkhater, N., Walters, R., Wills, G.: An empirical study of factors influencing cloud adoption among private sector organizations. Telematics Inform. 35(1), 38–54 (2018) 6. Sun, S., Cegielski, C.G., Jia, L., Hall, D.J.: Understanding the factors affecting the organizational adoption of big data. J. Comput. Inform. Syst. 58(3), 193–203 (2018) 7. Ningbo, C.: Analysis of farmers’ technology adoption behavior based on modern agricultural development. Acad. Exch. 01, 81–84 (2010) 8. Golan, E.H., et al.: Traceability in the US Food Supply: Economic Theory and Industry Studies. No. 1473-2016-120760 (2004) 9. Zhang, Y.: Threshold effect analysis of corporate social responsibility, innovation investment and brand value promotion. Bus. Econ. Res. 04, 86–89 (2021) 10. Venkatesh, V., Davis, F.D.: A theoretical extension of the technology acceptance model: four longitudinal field studies. Manage. Sci. 46(2), 186–204 (2000) 11. Fang, E., Lee, J., Palmatier, R., Han, S.: If it takes a village to foster innovation, success depends on the neighbors: the effects of global and ego networks on new product launches. J. Mark. Res. 53(3), 319–337 (2016) 12. Jackson, S.E., Joshi, A.: Work team diversity. In: APA Handbook of Industrial and Organizational Psychology, Vol 1: Building and Developing the Organization, pp. 651–686. American Psychological Association (2011) 13. Burt, R.S.: Autonomy in a social topology. Am. J. Sociol. 85(4), 892–925 (1980) 14. Zheng, Z., Xie, S., Dai, H., Chen, X., Wang, H.: Blockchain challenges and opportunities: a survey. Int. J. Web Grid Serv. 14(4), 352–375 (2018) 15. Angelis, J., da Silva, E.R.: Blockchain adoption: a value driver perspective. Bus. Horiz. 62(3), 307–314 (2019) 16. He, P., Liyuan, Z.: Study on the incentive factors for pig farmers to adopt and implement ear tag traceability system. Agric. Mod. Res. 37(04), 716–724 (2016) 17. Resende-Filho, M.A., Buhr, B.L.: A principal-agent model for evaluating the economics value of a traceability system: a case study with injection-site Lesion control in fed cattle. Am. J. Agr. Econ. 4, 1091–1102 (2008) 18. Guoqiang, C., Mande, Z.: Agricultural subsidy system and policy choice in the middle stage of China’s industrialization. Manage. World 01, 9–20 (2012) 19. Schuetz, S., Venkatesh, V.: Blockchain, adoption, and financial inclusion in India: research opportunities. Int. J. Inf. Manage. 52, 101936 (2020) 20. Crosby, M., Nachiappan Pattanayak, P., Verma, S., Kalyanaraman, V.: Blockchain technology: beyond bitcoin. Appl. Innovation Rev. 1(2), 6–10 (2016) 21. Liu, Y., Jia, X., Jia, X., Koufteros, X.: CSR orientation incongruence and supply chain relationship performance—a network perspective. J. Oper. Manage. 67(2), 237–260 (2021) 22. 
Maruping, L.M., et al.: Going beyond intention: integrating behavioral expectation into the unified theory of acceptance and use of technology. J. Assoc. Inf. Sci. Technol. 68(3), 623–637 (2017) 23. Hair, J.F., Hult, G.T.M., Ringle, C.M., Sarstedt, M.: A primer on partial least squares structural equation modeling. Long Range Planning, vol. 46, 2nd edn. Sage Publications (2017)
24. Lindman, J., Tuunainen, V.K., Rossi, M.: Opportunities and risks of blockchain technologies— a research agenda. In: Proceedings of the 50th Hawaii International Conference on System Sciences, Hawaii (2017) 25. Martins, R., Oliveira, T., Thomas, M.A.: An empirical analysis to assess the determinants of SAAS diffusion in firms. Comput. Hum. Behav. 62, 19–33 (2016)
Part V
Intelligent Processing and Analysis of Multidimensional Signals and Images
Chapter 29
Decorrelation of a Sequence of Color Images Through Hierarchical Adaptive Color KLT Roumen Kountchev and Roumiana Kountcheva
Abstract This work presents a new algorithm, the Hierarchical Adaptive Color Karhunen–Loève Transform (HACKLT), aimed at the decorrelation of sequences of color RGB images. To increase the decorrelation, a Color KLT (CKLT) is applied to each triad of same-color components taken from three consecutive images. To achieve maximum decorrelation, for each three sequential color images the optimum orientation of the 3-component vectors used to calculate their 3 × 3 covariance matrix is detected in one of the directions x, y or z. The chosen orientation enhances the diagonalization of the covariance matrix. The hierarchical organization of the algorithm permits the calculations to stop at the level in which the covariance matrix is diagonalized. In addition, HACKLT achieves a very high power concentration in the first eigen images of the decomposition. Compared to the "classic" KLT, applied to vectors which represent a sequence of images and are oriented along the direction z (for example, time), the computational complexity of the new algorithm is about twice lower. The high decorrelation efficiency of HACKLT defines its most suitable application areas: video compression, computer vision, feature extraction for object recognition, object tracking in video sequences, etc.
29.1 Introduction
Sources of correlated sequences of color images are video and photo cameras, surveillance TV cameras, endoscopy devices, medical scanners producing pseudo-color image sequences, etc. In most cases, the color video frames in a non-compressed RGB sequence with a frame rate of 25 fps are highly correlated. Related investigations show that the inter-frame correlation of such sequences usually involves 8–12
frames [1]. The highest correlation exists between components of the same color, for example, sequences of R, G or B components. The decorrelation of sequences of RGB images makes it possible to achieve: (1) efficient compression through reduction of the color information redundancy; (2) reduction of the number of features needed for object recognition based on color; (3) object segmentation in video sequences based on color, etc. Various methods and algorithms have already been developed for the decorrelation of image sequences, presented in many monographs [1–7]. These methods can be divided into two basic groups. To the first group belong methods aimed at reducing the correlation between the RGB components of a single color image. For this purpose the RGB space is transformed by the Karhunen–Loève Transform (KLT) [8–12], which is optimal in the mean-square-error sense. It is also known as the Hotelling Transform (HT) or Principal Component Analysis (PCA). Its most important advantage over the well-known deterministic color transforms of the kind YCrCb, YUV, YIQ, YCoCg, RCT, HSV, HSL, CMY, PhotoYCC, Lab, etc. [1, 4, 5], is the maximum decorrelation of the transformed eigen color components. The main disadvantage of the color KLT is its higher computational complexity compared to that of the deterministic transforms. With the fast development of contemporary computer technologies, this disadvantage is easily overcome. The second group of methods is aimed at reducing the correlation in a sequence of images. To this group belong the well-known 3D orthogonal transforms: the Discrete Fourier Transform, the Walsh–Hadamard Transform, the Discrete Wavelet Transform, etc. [6, 10], and also the KL and SVD transforms [11, 12]. The "classic" SVD is usually used for single-image decomposition, while the KLT is applied to vectors representing pixels of the same spatial position in each image of the sequence. As a result of the KLT, the covariance matrix of these vectors is diagonalized. If the sequence comprises F color images, the covariance matrix of the vectors is of size 3F × 3F. The number of frames processed in one video sequence is defined by the correlation range length; when it involves 12 frames, the size of the corresponding covariance matrix is 36 × 36. In this case its characteristic polynomial is of order 36, and the calculation of its roots (the eigenvalues) requires operations of high computational complexity [13]. To overcome this problem, various iterative methods have been developed for the calculation of the covariance-matrix eigenvectors which build the KLT matrix. For this purpose the QR decomposition [14] and the Householder transforms [15] are used, the methods of Jacobi and Givens [16, 17], and also calculation by neural networks [18] of the Generalized Hebbian or Multilayer Perceptron kind, which need a high number of iterations. A fast KLT algorithm (FKLT) is known for the particular case when the images are represented through a first-order Markov model [19]. In the PCA randomization algorithm [20], a certain number of rows (or columns) of the covariance matrix is selected at random, and on the basis of this approximation the computational complexity of the KLT is reduced.
As an alternative, a new algorithm called the Hierarchical Adaptive KLT (HAKLT) was proposed, which has lower computational complexity and does not require
iterations [21, 22]. In this case, the processed sequence of F images is divided into groups of three images, and the KLT, whose transform matrix is defined by exact mathematical relations [21], is applied to the 3-component vectors of each group. In this work the Hierarchical Adaptive Color KLT (HACKLT) algorithm is developed, which combines the HAKLT and the Adaptive Color KLT (ACKLT) and achieves higher decorrelation for the processed sequence of color RGB images. The paper is arranged as follows: Sect. 29.2 describes the Hierarchical Adaptive Color KLT for a group of three RGB images; Sect. 29.3 gives a brief presentation of the CKLT algorithm for a single color image; Sect. 29.4 defines the choice of the orientation direction of the input color vectors; Sect. 29.5 gives the principle used to set the number of hierarchical levels and the decorrelation properties of HACKLT; Sect. 29.6 presents the results of the HACKLT algorithm modeling; and Sect. 29.7 contains the conclusions.
29.2 Hierarchical Adaptive Color KLT for Decorrelation of a Sequence of Color Images
The block diagram in Fig. 29.1 illustrates the HACKLT algorithm for the case when the processed sequence contains only 3 color images (F = 3). In this case, the total number of images (R, G, B color components) in the sequence is D = 3F = 9. To increase the decorrelation efficiency, the CKLT is applied to each three color components, rearranged in accordance with the figure. This rearrangement takes into account the higher correlation between components of the same color (R, G or B) in the sequence of three color images, compared to the correlation between the R, G, B components of one single image. As a result of the rearrangement, after applying the CKLT to the rearranged triad of components, the power concentration in the first component (the eigen image) is increased at the expense of the remaining two, and the power of the third component is the lowest. The CKLT is executed three times for each triad of color components, in two hierarchical levels. In the first level of HACKLT (Fig. 29.1), the transforms CKLT-i1 for i = 1, 2, 3 are executed for each triad of components of the same color. As a result, the corresponding eigen images are obtained, which have different power concentration. In the figure, the eigen images of highest power are colored in yellow, those of lower power in blue, and the eigen images of lowest power in green. Prior to applying CKLT-i2 in the next level, the triads of eigen images are rearranged. The rearrangement is adaptive and is executed in accordance with the requirement for continuous power reduction:

$$P_1 \ge P_2 \ge \cdots \ge P_9, \quad \text{where} \quad P_d = \sum_{i=1}^{M} \sum_{j=1}^{N} L_{i,j}^{d} \quad \text{for } d = 1, 2, \ldots, 9. \qquad (29.1)$$
Fig. 29.1 Two-level HACKLT algorithm for a sequence (group) of 3 color images (GOI)
Here $L_{i,j}^{d}$ is the (i,j)th pixel of the dth eigen image, and D = 9 is the number of eigen images $L_d$ for d = 1, 2, …, 9, which corresponds to the number of rearranged input components. Figure 29.1 shows one of the possible rearrangements of the eigen images in the first level, in correspondence with Eq. (29.1). In this case, the first three eigen images (colored in yellow) are those of highest power from each transformed triad of eigen images before the rearrangement. The next three rearranged eigen images (blue) are those of lower power, and the last three (green) are of lowest power. The transform of the 9 input components through the CKLT, followed by the adaptive rearrangement of the obtained eigen images in correspondence with their power, is called the Adaptive CKLT (ACKLT). In the second level of HACKLT, all ACKLT operations are executed again in a similar way for the eigen images from the first level. The total number of algorithm levels n, which in the presented case is n = 2, is defined in accordance with the considerations given in detail in Sect. 29.5. As a result, the final output sequence consists of eigen images arranged by decreasing power. After the processing, the main part of the power of the initial input sequence of 9 RGB images is concentrated in the first three eigen images shown in the upper part of Fig. 29.1, which have the darkest grey color. The next sections give the details of the operations in the sequential steps and hierarchical levels of the HACKLT algorithm.
29.3 Color KLT for Single Image RGB Components
Here, in correspondence with [22], the CKLT is presented for a group of color vectors $\vec{C}_s = [C_{1s}, C_{2s}, C_{3s}]^T$, where s = 1, 2, …, S (S = M × N), as shown in Fig. 29.2. The components of these vectors correspond to pixels of the same spatial position in the matrix images [C1], [C2] and [C3], each of size M × N. As a result of the CKLT, the corresponding output vectors $\vec{L}_s = [L_{1s}, L_{2s}, L_{3s}]^T$ are obtained.
Fig. 29.2 The RGB components [C1], [C2], [C3] of the original color image and the corresponding components [L1], [L2], [L3] obtained after the CKLT (each of size M × N)
The direct and inverse CKLT of the vectors are represented by the relations:

$$\vec{L}_s = [\Phi](\vec{C}_s - \bar{\mu}_C) \quad \text{and} \quad \vec{C}_s = [\Phi]^T \vec{L}_s + \bar{\mu}_C, \qquad (29.2)$$

or:

$$\begin{bmatrix} L_{1s} \\ L_{2s} \\ L_{3s} \end{bmatrix} = \begin{bmatrix} \Phi_{11} & \Phi_{12} & \Phi_{13} \\ \Phi_{21} & \Phi_{22} & \Phi_{23} \\ \Phi_{31} & \Phi_{32} & \Phi_{33} \end{bmatrix} \left( \begin{bmatrix} C_{1s} \\ C_{2s} \\ C_{3s} \end{bmatrix} - \begin{bmatrix} \bar{C}_1 \\ \bar{C}_2 \\ \bar{C}_3 \end{bmatrix} \right); \qquad (29.3)$$

$$\begin{bmatrix} C_{1s} \\ C_{2s} \\ C_{3s} \end{bmatrix} = \begin{bmatrix} \Phi_{11} & \Phi_{21} & \Phi_{31} \\ \Phi_{12} & \Phi_{22} & \Phi_{32} \\ \Phi_{13} & \Phi_{23} & \Phi_{33} \end{bmatrix} \begin{bmatrix} L_{1s} \\ L_{2s} \\ L_{3s} \end{bmatrix} + \begin{bmatrix} \bar{C}_1 \\ \bar{C}_2 \\ \bar{C}_3 \end{bmatrix}, \qquad (29.4)$$

where

$$\bar{\mu}_C = [\bar{C}_1, \bar{C}_2, \bar{C}_3]^T; \quad \bar{C}_1 = E(C_{1s}); \ \bar{C}_2 = E(C_{2s}); \ \bar{C}_3 = E(C_{3s}) \ \text{ for } \ \bar{x} = E(x_s) = \frac{1}{S}\sum_{s=1}^{S} x_s; \qquad (29.5)$$

$$\Phi_{m1} = A_m / P_m; \quad \Phi_{m2} = B_m / P_m; \quad \Phi_{m3} = D_m / P_m, \quad \text{for } m = 1, 2, 3; \qquad (29.6)$$

$$\begin{aligned} A_m &= (k_3 - \lambda_m)[k_5(k_2 - \lambda_m) - k_4 k_6]; \quad B_m = (k_3 - \lambda_m)[k_6(k_1 - \lambda_m) - k_4 k_5]; \\ D_m &= k_6[2 k_4 k_5 - k_6(k_1 - \lambda_m)] - k_5^2 (k_2 - \lambda_m); \quad P_m = \sqrt{A_m^2 + B_m^2 + D_m^2}; \end{aligned} \qquad (29.7)$$

$$\begin{aligned} k_1 &= E(C_{1s}^2) - \bar{C}_1^2; \quad k_2 = E(C_{2s}^2) - \bar{C}_2^2; \quad k_3 = E(C_{3s}^2) - \bar{C}_3^2; \\ k_4 &= E(C_{1s} C_{2s}) - \bar{C}_1 \bar{C}_2; \quad k_5 = E(C_{1s} C_{3s}) - \bar{C}_1 \bar{C}_3; \quad k_6 = E(C_{2s} C_{3s}) - \bar{C}_2 \bar{C}_3; \end{aligned} \qquad (29.8)$$

$$\lambda_1 = 2\sqrt{\tfrac{|p|}{3}} \cos\tfrac{\varphi}{3} - \tfrac{a}{3}; \quad \lambda_2 = -2\sqrt{\tfrac{|p|}{3}} \cos\tfrac{\varphi + \pi}{3} - \tfrac{a}{3}; \quad \lambda_3 = -2\sqrt{\tfrac{|p|}{3}} \cos\tfrac{\varphi - \pi}{3} - \tfrac{a}{3}; \qquad (29.9)$$

$$q = 2(a/3)^3 - (ab)/3 + c; \quad p = -(a^2/3) + b; \quad \varphi = \arccos\!\left( \frac{-q/2}{\sqrt{(|p|/3)^3}} \right); \qquad (29.10)$$

$$a = -(k_1 + k_2 + k_3); \quad b = k_1 k_2 + k_1 k_3 + k_2 k_3 - (k_4^2 + k_5^2 + k_6^2); \qquad (29.11)$$

$$c = k_1 k_6^2 + k_2 k_5^2 + k_3 k_4^2 - (k_1 k_2 k_3 + 2 k_4 k_5 k_6). \qquad (29.12)$$
The CKLT algorithm, presented through Eqs. (29.2)–(29.12), is executed for each triad of images in the first and second levels of HACKLT, in correspondence with Fig. 29.1.
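To make these closed-form relations easier to follow, a minimal NumPy sketch of the CKLT of one triad of component images is given below. The function names and array layout are illustrative assumptions, not taken from the paper; the eigenvalues are computed from Eqs. (29.9)–(29.12) and the rows of the transform matrix from Eqs. (29.6)–(29.8).

```python
import numpy as np

def cklt_matrix(C1, C2, C3):
    """Build the 3x3 CKLT matrix [Phi] and the mean vector mu_C for one triad
    of equally sized component images, following Eqs. (29.5)-(29.12)."""
    X = np.stack([C1.ravel(), C2.ravel(), C3.ravel()]).astype(np.float64)   # 3 x S
    mu = X.mean(axis=1)                                                     # C1_bar, C2_bar, C3_bar

    # Covariance coefficients k1..k6 (Eq. 29.8)
    k1, k2, k3 = np.mean(X ** 2, axis=1) - mu ** 2
    k4 = np.mean(X[0] * X[1]) - mu[0] * mu[1]
    k5 = np.mean(X[0] * X[2]) - mu[0] * mu[2]
    k6 = np.mean(X[1] * X[2]) - mu[1] * mu[2]

    # Characteristic-polynomial coefficients and eigenvalues (Eqs. 29.9-29.12)
    a = -(k1 + k2 + k3)
    b = k1 * k2 + k1 * k3 + k2 * k3 - (k4 ** 2 + k5 ** 2 + k6 ** 2)
    c = k1 * k6 ** 2 + k2 * k5 ** 2 + k3 * k4 ** 2 - (k1 * k2 * k3 + 2 * k4 * k5 * k6)
    q = 2 * (a / 3) ** 3 - a * b / 3 + c
    p = -(a ** 2) / 3 + b
    phi = np.arccos(np.clip(-q / 2 / np.sqrt((abs(p) / 3) ** 3), -1.0, 1.0))
    r = 2 * np.sqrt(abs(p) / 3)
    lam = [r * np.cos(phi / 3) - a / 3,
           -r * np.cos((phi + np.pi) / 3) - a / 3,
           -r * np.cos((phi - np.pi) / 3) - a / 3]

    # Rows of [Phi]: normalized eigenvectors (Eqs. 29.6-29.7)
    Phi = np.empty((3, 3))
    for m, lm in enumerate(lam):
        A = (k3 - lm) * (k5 * (k2 - lm) - k4 * k6)
        B = (k3 - lm) * (k6 * (k1 - lm) - k4 * k5)
        D = k6 * (2 * k4 * k5 - k6 * (k1 - lm)) - k5 ** 2 * (k2 - lm)
        Phi[m] = np.array([A, B, D]) / np.sqrt(A ** 2 + B ** 2 + D ** 2)
    return Phi, mu

def cklt(C1, C2, C3):
    """Direct CKLT (first relation of Eq. 29.2): returns the eigen images L1, L2, L3.
    The inverse transform is Phi.T @ L + mu (second relation of Eq. 29.2)."""
    Phi, mu = cklt_matrix(C1, C2, C3)
    X = np.stack([C1.ravel(), C2.ravel(), C3.ravel()]).astype(np.float64)
    L = Phi @ (X - mu[:, None])
    return [Li.reshape(C1.shape) for Li in L]
```

Because the basis is built from a closed-form solution of the 3 × 3 eigenproblem, no iterative eigen-decomposition routine is needed, which is the main computational argument of the chapter.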
29.4 Choice of the Input Color Vectors Orientation
Each triad of images, consisting of color components R, G or B, is represented by the color vectors $\vec{C}_s(u) = [C_{1s}(u), C_{2s}(u), C_{3s}(u)]^T$ for s = 1, 2, …, S and u = 1, 2, 3. Here u denotes one of the possible vector orientations (horizontal, vertical and lateral). The choice of u is based on the distribution of the coefficients of the corresponding covariance matrices.
29.4.1 Covariance Matrices for the Vector Orientations
The covariance matrices of size 3 × 3 for the vectors $\vec{C}_s(u)$ are calculated in accordance with the relation:

$$[K_C(u)] = E\{\vec{C}_s(u)\vec{C}_s^{T}(u)\} - \bar{\mu}_C(u)\bar{\mu}_C^{T}(u) = \begin{bmatrix} k_1(u) & k_4(u) & k_5(u) \\ k_4(u) & k_2(u) & k_6(u) \\ k_5(u) & k_6(u) & k_3(u) \end{bmatrix} \quad \text{for } u = 1, 2, 3. \qquad (29.13)$$
29.4.2 Analysis of the Covariance Matrices
To estimate the correlation of the vectors $\vec{C}_s(u)$, the ratio Δ(u) between the sum of the squares of the coefficients placed outside the main diagonal of the matrix $[K_C(u)]$ and the sum of the squares of those on the diagonal is used:
$$\Delta(u) = \frac{2\,[k_4(u)^2 + k_5(u)^2 + k_6(u)^2]}{k_1(u)^2 + k_2(u)^2 + k_3(u)^2} \quad \text{for } u = 1, 2, 3. \qquad (29.14)$$

Fig. 29.3 a, b, c Orientation of the color vectors $\vec{C}_s(u)$ for u = 1, 2, 3 in one of the directions x, y, z
This ratio attains its maximum value when the vectors $\vec{C}_s(u)$ of the same orientation u have the highest correlation.
29.4.3 Choice of the Vectors' Orientation
The optimal orientation u₀ of the vectors $\vec{C}_s(u_0)$ is determined by the condition of maximizing the ratio:

$$\Delta(u_0) = \max\{\Delta(1), \Delta(2), \Delta(3)\}, \qquad (29.15)$$

where
• if u₀ = 1: horizontal orientation of the vectors $\vec{C}_s(1)$ is chosen (Fig. 29.3a);
• if u₀ = 2: vertical orientation of the vectors $\vec{C}_s(2)$ is chosen (Fig. 29.3b);
• if u₀ = 3: lateral orientation of the vectors $\vec{C}_s(3)$ is chosen (Fig. 29.3c).
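A small NumPy sketch of this selection rule is given below. It assumes the three candidate covariance matrices have already been computed as in Eq. (29.13); the function names and the symbol Δ(u) follow the notation used above and are not part of the original text.

```python
import numpy as np

def off_to_diag_ratio(K):
    """Eq. (29.14): ratio of the squared off-diagonal coefficients of a
    3x3 covariance matrix to the squared diagonal coefficients."""
    k1, k2, k3 = K[0, 0], K[1, 1], K[2, 2]
    k4, k5, k6 = K[0, 1], K[0, 2], K[1, 2]
    return 2 * (k4**2 + k5**2 + k6**2) / (k1**2 + k2**2 + k3**2)

def choose_orientation(K_candidates):
    """Eq. (29.15): K_candidates = [K_C(1), K_C(2), K_C(3)] for the
    horizontal, vertical and lateral vector orientations."""
    ratios = [off_to_diag_ratio(K) for K in K_candidates]
    u0 = int(np.argmax(ratios)) + 1          # orientations are numbered 1..3
    return u0, ratios
```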
29.5 Setting the HACKLT Structure
29.5.1 Number of Hierarchical Levels
The minimum number of levels n_min needed for the execution of HACKLT on a sequence of F color images is defined through analysis of the decorrelation degree of the corresponding transformed D-dimensional vectors obtained after each hierarchical level. For this purpose, after the calculation of the first HACKLT level, the transformed vectors $\vec{L}_s$ of each 3-component group are assembled into the corresponding D-dimensional vectors $\vec{L}_s^1 = [L_{1s}^1, L_{2s}^1, \ldots, L_{Ds}^1]^T$. After rearrangement of its components, each vector $\vec{L}_s^1$ is transformed into the vector
$\vec{L}_s^1(r) = [L_{1s}^1(r), L_{2s}^1(r), \ldots, L_{Ds}^1(r)]^T$. At this point the decision must be taken whether to continue with the second level of HACKLT or to stop the transform. For this purpose, the covariance matrix $[K_L^1(r)]$ of the rearranged vectors $\vec{L}_s^1(r)$, s = 1, 2, …, S, which defines the decorrelation achieved in the first level, is analyzed. If the decorrelation of the rearranged vectors is full, their matrix $[K_L^1(r)]$ is diagonal and the algorithm is stopped. The proposed algorithm also permits the processing to stop earlier, even though full decorrelation has not been achieved, if the result is satisfactory. For this the following inequality is used:

$$\sum_{i=1}^{D}\sum_{j=1}^{D} [k_{i,j}(r)]^2 \Big|_{i \ne j} \Bigg/ \sum_{i=1}^{D}\sum_{j=1}^{D} [k_{i,j}(r)]^2 \Big|_{i = j} \le \delta. \qquad (29.16)$$
Here $k_{i,j}(r)$ is the (i,j)th element of the matrix $[K_L^1(r)]$, and δ is a threshold of small value, set in advance. If the condition is satisfied, the processing stops; otherwise it continues with the next (second) HACKLT level. When the calculations for the second level are finished, the result is checked again, but in this case $k_{i,j}(r)$ are the elements of the matrix $[K_L^2(r)]$ of the rearranged vectors $\vec{L}_s^2(r)$, and so on.
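The level-termination test of Eq. (29.16) reduces to comparing the off-diagonal and diagonal energies of the D × D covariance matrix of the rearranged vectors; a minimal sketch is given below, in which the default threshold value is only an illustrative assumption.

```python
import numpy as np

def stop_after_level(K_rearranged, delta=1e-3):
    """Eq. (29.16): stop the hierarchical transform when the off-diagonal
    energy of the D x D covariance matrix is at most delta times the
    diagonal energy (delta is chosen by the user)."""
    diag_energy = np.sum(np.diag(K_rearranged) ** 2)
    off_energy = np.sum(K_rearranged ** 2) - diag_energy
    return off_energy / diag_energy <= delta
```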
29.5.2 Evaluation of the Decorrelation Properties of HACKLT
These qualities of the algorithm correspond to the decorrelation degree of the output eigen images, which defines the number n of execution levels needed. As an example, the evaluation of the decorrelation properties of the algorithm is given here for the case D = 9. In the first hierarchical level of HACKLT, the covariance matrix $[K_L^1]$ of size 9 × 9, which represents the transformed 9-component vectors $\vec{L}_s^1 = [L_{1s}^1, L_{2s}^1, \ldots, L_{9s}^1]^T$ prior to rearrangement, is:

$$[K_L^1] = E\{\vec{L}_s^1 (\vec{L}_s^1)^T\} - E\{\vec{L}_s^1\}\, E\{\vec{L}_s^1\}^T = \begin{bmatrix} [K_L^{1,1}] & [K_{L_1,L_2}^1] & [K_{L_1,L_3}^1] \\ [K_{L_1,L_2}^1] & [K_L^{1,2}] & [K_{L_2,L_3}^1] \\ [K_{L_1,L_3}^1] & [K_{L_2,L_3}^1] & [K_L^{1,3}] \end{bmatrix}. \qquad (29.17)$$

Here the sub-matrices $[K_L^{1,p}]$ for p = 1, 2, 3, defined as

$$[K_L^{1,p}] = E\{\vec{L}_s^{1,p} (\vec{L}_s^{1,p})^T\} - E\{\vec{L}_s^{1,p}\}\, E\{\vec{L}_s^{1,p}\}^T = \begin{bmatrix} \lambda_1^{1,p} & 0 & 0 \\ 0 & \lambda_2^{1,p} & 0 \\ 0 & 0 & \lambda_3^{1,p} \end{bmatrix}, \qquad (29.18)$$

are the covariance matrices of the transformed vectors $\vec{L}_s^{1,p} = [L_{s1}^{1,p}, L_{s2}^{1,p}, L_{s3}^{1,p}]^T$ in group p of the first level, and $\lambda_1^{1,p}, \lambda_2^{1,p}, \lambda_3^{1,p}$ are the eigenvalues of the covariance matrices $[K_C^{1,p}]$ of the 3-component vectors $\vec{C}_s^{p} = [C_{s1}^{p}, C_{s2}^{p}, C_{s3}^{p}]^T$ in each group. Respectively,

$$[K_{L_p,L_k}^1] = E\{\vec{L}_s^{1,p} (\vec{L}_s^{1,k})^T\} - E\{\vec{L}_s^{1,p}\}\, E\{\vec{L}_s^{1,k}\}^T \quad \text{for } p, k = 1, 2, 3 \ (p \ne k) \qquad (29.19)$$

are the mutual covariance matrices of size 3 × 3 of the 3-component vectors $\vec{L}_s^{1,p}$ and $\vec{L}_s^{1,k}$ for the couples of groups p and k of the first level.
In the second hierarchical level of HACKLT, the covariance matrix $[K_L^2]$ of size 9 × 9 of the transformed vectors $\vec{L}_s^2 = [L_{1s}^2, L_{2s}^2, \ldots, L_{9s}^2]^T$, obtained from the vectors $\vec{L}_s^1(r) = [L_{1s}^1(r), L_{2s}^1(r), \ldots, L_{9s}^1(r)]^T$ rearranged in the first level, is represented as:

$$[K_L^2] = E\{\vec{L}_s^2 (\vec{L}_s^2)^T\} - E\{\vec{L}_s^2\}\, E\{\vec{L}_s^2\}^T = \begin{bmatrix} [K_L^{2,1}] & [K_{L_1,L_2}^2] & [K_{L_1,L_3}^2] \\ [K_{L_1,L_2}^2] & [K_L^{2,2}] & [K_{L_2,L_3}^2] \\ [K_{L_1,L_3}^2] & [K_{L_2,L_3}^2] & [K_L^{2,3}] \end{bmatrix}. \qquad (29.20)$$

Here

$$[K_L^{2,p}] = E\{\vec{L}_s^{2,p} (\vec{L}_s^{2,p})^T\} - E\{\vec{L}_s^{2,p}\}\, E\{\vec{L}_s^{2,p}\}^T = \begin{bmatrix} \lambda_1^{2,p} & 0 & 0 \\ 0 & \lambda_2^{2,p} & 0 \\ 0 & 0 & \lambda_3^{2,p} \end{bmatrix} \quad \text{for } p = 1, 2, 3 \qquad (29.21)$$

are the covariance matrices of the transformed vectors $\vec{L}_s^{2,p} = [L_{s1}^{2,p}, L_{s2}^{2,p}, L_{s3}^{2,p}]^T$ in group p, and $\lambda_1^{2,p}, \lambda_2^{2,p}, \lambda_3^{2,p}$ are the eigenvalues of the covariance matrices $[K_L^{1,p}(r)]$ of the 3-component vectors $\vec{L}_s^{1,p}(r) = [L_{s1}^{1,p}(r), L_{s2}^{1,p}(r), L_{s3}^{1,p}(r)]^T$ in group p, obtained after the rearrangement in the first level. Further,

$$[K_{L_p,L_k}^2] = E\{\vec{L}_s^{2,p} (\vec{L}_s^{2,k})^T\} - E\{\vec{L}_s^{2,p}\}\, E\{\vec{L}_s^{2,k}\}^T \quad \text{for } p, k = 1, 2, 3 \ (p \ne k) \qquad (29.22)$$

are the mutual covariance matrices of size 3 × 3 of the 3-component vectors $\vec{L}_s^{2,p}$ and $\vec{L}_s^{2,k}$ in the couples of groups p and k of the second level. After rearrangement, the vectors $\vec{L}_s^2 = [L_{1s}^2, L_{2s}^2, \ldots, L_{9s}^2]^T$ are transformed into the vectors $\vec{L}_s^2(r) = [L_{s1}^2(r), L_{s2}^2(r), \ldots, L_{s9}^2(r)]^T$. To evaluate the result of the HACKLT, the covariance matrix $[K_L^2(r)]$ of the rearranged vectors $\vec{L}_s^2(r)$ is analyzed, from which, in accordance with Eq. (29.16), the decorrelation degree achieved in the second level is determined.
29.6 Results for the HACKLT Modeling
For the algorithm modeling, color video sequences with a frame rate of 25 fps were used. On the basis of the 2-level HACKLT algorithm shown in Fig. 29.1, the experiments below used sequences of cropped color frames of size 720 × 576 pixels, 24 bpp. Figure 29.4 shows three consecutive frames, numbered 69, 70 and 71, and their RGB components. For this example, the maximum value of the coefficient Δ(r₀) is obtained for r₀ = 1. Columns 2 and 3 of Table 29.1 give the values of the power (P) and the relative power (RP) of the 9 output eigen images obtained after applying the HACKLT algorithm to the sequence of frames 69, 70 and 71. Columns 4 and 5 of the table give the mean values of P and RP for the same 9 eigen images after transforming the sequence of 3 × 7 = 21 RGB images, i.e. 9 × 7 = 63 components. On the basis of the data from Table 29.1, Fig. 29.5 shows the distribution of the powers and the relative powers of the 9 output eigen images for the sequence of 3 color images from Fig. 29.4. Table 29.1 shows that the mean power of the first eigen image of the sequence is more than 250 times larger than that of each of the next 8 eigen images. The analysis of the algorithm modeling confirms the high power concentration of the sequence of color images in the first 3 output eigen images. This property determines the high decorrelation degree of the sequence of output images and opens wide possibilities for their efficient compression. Moreover, the HACKLT algorithm is reversible and ensures the ability to restore the sequence of color images with the minimum mean square error, which is typical for the KLT. As already shown in [22], the total number of operations TO_HACKLT(S) is at least 1.7 times smaller than TO_CKLT(S) for each value of S (on average, about 2 times). For higher values of F (the number of images in one sequence), in the range from 9 up to 16, and for larger values of S, the reduction coefficient of the total number of operations η(S) = TO_CKLT(S)/TO_HACKLT(S) increases from 1.7 up to 2.1.
29.7 Conclusions and Future Work
This work presents a new approach for the efficient decorrelation of sequences of RGB color images, based on two algorithms previously developed by the authors: the adaptive color space transform and the hierarchical adaptive PCA [21, 24].
Fig. 29.4 A sequence of 3 video frames (No. 69, 70, 71) and their R, G and B components
The newly developed HACKLT algorithm takes into account the existing high similarity between the same-color (homonymous) components. Its main advantage is that the input sequence of color images is transformed into a non-correlated one, whose power is concentrated in a small number of components only. Besides, the new algorithm has lower computational complexity than the "classic" KLT. The structure of the algorithm permits parallel implementation with minimum memory requirements. These qualities of HACKLT open good possibilities for its application in the compression of moving images, computer vision systems, machine learning, etc.
Table 29.1 Power distribution of the eigen images

Eigen image (D = 9) | Eigen image power (D = 9) | Relative power RP (D = 9) | Mean RP (D = 9 × 7) | Mean RP % (D = 9 × 7)
1 | 53,041 | 220 | 259.6 | 91.4
2 | 1100 | 5 | 6.8 | 93.8
3 | 686 | 3 | 5.6 | 95.7
4 | 710 | 3 | 3.8 | 97.1
5 | 316 | 1 | 2.2 | 97.8
6 | 305 | 1 | 2.0 | 98.6
7 | 523 | 2 | 1.9 | 99.2
8 | 326 | 1 | 1.3 | 99.6
9 | 242 | 1 | 1.0 | 100.0
Fig. 29.5 Power distribution a and relative power distribution b for the output eigen images
The new algorithm will be further developed in various directions, such as: adaptive decomposition of tensor color images through the 3D Frequency-Ordered Hierarchical KLT [23], acceleration of the transform through recursion in correspondence with the approach presented in [24], extraction of color features for the recognition of moving 3D objects, etc.
Acknowledgements This work was supported by the National Science Fund of Bulgaria: Project No. KP-06-H27/16 "Development of efficient methods and algorithms for tensor-based processing and analysis of multidimensional images with application in interdisciplinary areas".
References 1. Mukhopadhyay, J.: Image and Video Processing in the Compressed Domain. CRC Press (2011) 2. Fieguth, P.: Statistical Image Processing and Multidimensional Modeling. Springer, Science+Business Media (2011) 3. Goodman, R.: Discrete Fourier and wavelet transforms: an introduction through linear algebra with applications to signal processing. World Scientific Publishing (2016)
4. Celebi, M., Lecca, M., Smolka, B.: Color Image and Video Enhancement. Springer, Intern. Publishing Switzerland 5. Tekalp, A.: Digital Video Processing, 2nd edn. Pearson College (2015) 6. Wang, R.: Introduction to orthogonal transforms with applications in data processing and analysis. Cambridge University Press (2012) 7. Dumas, S.: Karhunen-Loeve Transform and Digital Signal Processing—Part 1, Technical Report (2016) 8. Abadpour, A.: Color Image Processing Using Principal Component Analysis, Thesis for Degree of Master of Science (2005) 9. Abadpour, A., Kasaei, S.: Color PCA Eigen Images and Their Application to Compression and Watermarking, Image and Video Computing, vol. 26, pp. 878–890. Butterworth-Heinemann Newton, MA USA (2008) 10. Walter, G., Shen, X.: Wavelets and Other Orthogonal Systems, 2nd edn. CRC Press (2019) 11. Orfanidis, S.: SVD, PCA, KLT, CCA, and all that, Rutgers University Electrical & Computer Engineering Department. Optimum Signal Processing, pp. 1–77 (2007) 12. Liwicki, S., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: Euler principal component analysis. Int. J. Comput. Vision 101(3), 498–518 (2013) 13. Strang, G.: Linear Algebra and Learning from Data. Cambridge Press, Wellesley MA, 02482 (2019) 14. Watkins, D.: Fundamentals of Matrix Computations, 2nd edn. Wiley (2004) 15. Zhang, F.: Matrix Theory: Basic Results and Techniques. Springer, NY (2011) 16. Press, W., Teukolsky, S., Vetterling, W.: Numerical recipes in C. In: The Art of Scientific Computing, 2nd edn. Cambridge University Press (2001) 17. Carlen, E.: Calculus++, Ch. 3: The Symmetric Eigen Value Problem. Georgia Tech (2003) 18. Du, K., Swamy, M.: Principal component analysis. In: Neural Networks and Statistical Learning, pp. 373–425. Springer, London (2019) 19. Yilmaz, O., Torun, M., Akansu, A.: A fast derivation of Karhunen-Loève transform kernel for first-order autoregressive discrete process. ACM SIGMETRICS Perf. Eval. Rev. 41(4), 61–64 (2014) 20. Rokhlin, V., Szlam, A., Tygert, M.: A randomized algorithm for principal component analysis. SIAM J. Matrix Anal. Appl. 31, 1100–1124 (2009) 21. Kountchev, R., Kountcheva, R.: PCA-based adaptive hierarchical transform for correlated image groups. In: Proceedings of the International Conference on Telecommunications in Modern Satellite, Cable and Roadcasting Services (TELSIKS’13), pp. 323–332. IEEE, Serbia (2013) 22. Kountchev, R., Kountcheva, R.: Adaptive hierarchical KL-based transform: algorithms and applications. In: Favorskaya, M., Jain, L. (eds) Computer Vision in Advanced Control Systems: Mathematical Theory, vol. 1, pp. 91–136. Springer (2015) 23. Kountchev, R., Mironov, R., Kountcheva, R.: Complexity estimation of cubical tensor represented through 3D frequency-ordered hierarchical KLT. MDPI Symmetry 12(10), 1605. Special Issue: Advances in Symmetric Tensor Decomposition Methods. Open Access (2020) 24. Kountchev, R., Kountcheva, R.: Color space transform of correlated images group based on recursive adaptive color KLT. IARAS Int. J. Sig. Process. 2, 72–80 (2017)
Chapter 30
Moving Objects Detection in Video by Various Background Modelling Algorithms and Score Fusion
Ivo Draganov and Rumen Mironov
Abstract The paper presents results from testing ten of the fastest background modelling algorithms applied to detecting moving objects in video. The algorithms are Fast Principal Component Pursuit (Fast PCP), Grassmann Average (GA), Grassmann Median (GM), Go Decomposition (GoDec), Greedy Semi-Soft Go Decomposition (GreGoDec), Low-Rank Matrix Completion by Riemannian Optimization (LRGeomCG), Robust Orthonormal Subspace Learning (ROSL), Non-Negative Matrix Factorization via Nesterov's Optimal Gradient Method (NeNMF), Deep Semi Non-negative Matrix Factorization (Deep-Semi-NMF) and Tucker Decomposition by Alternating Least Squares (Tucker-ALS). Two new algorithms are proposed, employing score fusion from Fast PCP and ROSL, which on their own yielded the highest Detection Rate, Precision and F-measure. The first algorithm has a higher Detection Rate than all the others, and the second has the highest Precision. Both are considered applicable in various practical scenarios when seeking either higher reliability of object detection or higher precision of the area covered by each object.
30.1 Introduction
Detecting moving objects and the associated tracking in video have been studied for a long time [1–5]. A large number of fields exploit the benefits of one or another algorithm of this kind—surveillance systems, traffic control, manufacturing, gaming, archival mining, defense, and many others [2]. Background modelling and subtraction in videos is one of the major directions towards efficient detection of moving objects. It includes traditional approaches, applicable mainly to static cameras, and more recent ones, created for the case of moving cameras,
where two distinctive stages occur—background model initialization and background subtraction [6]. One of the traditional methods relies on statistical models for subtracting the video background [7]. It copes well with shadows, but the authors note that it may be less efficient under dynamic changes within the scene, when new objects appear and may merge with the background; reflections could also decrease the algorithm performance. Hofmann et al. [8] propose segmentation of the background using adaptive feedback at the pixel level. A non-parametric model is generated using inter-pixel states and dynamic controllers. As a drawback, the authors point out the large number of parameters that control the process, which should possibly be reduced; additionally, shadow modeling is not included in that particular implementation. Another example of a traditional approach is stochastic approximation during the modelling of the background [9]. Mixtures of uniform distributions and Gaussians, instead of Gaussians only, are applied to handle the dynamics of backgrounds, and the resulting implementation works in real time. Some of the tuning parameters set during initialization of this algorithm may influence its overall performance. Particle filters, as a more traditional technique, are another means to track objects in videos, with the possibility of building a fuzzy model from the spatial displacement over time and applying rough set dependencies [10]. Fuzzy texture and color cues are used, although their selection in various challenging situations may need additional resources to achieve sufficient representativeness. The background modelling and subtraction algorithms employing tensor decompositions can be considered more recent. The authors of [11] propose 3D total variation in order to improve the continuity both in space and time for the foreground content, and use the Tucker decomposition to model the background in a unified framework. Complex foreground tends to be challenging, so they plan to further model distinctive layers from it by introducing Gaussian mixture models as well; the presence of fog or snow, along with illumination changes, is suggested to be handled by dynamic background modelling. Sobral, Bouwmans and Zahzah describe an extensive library, named LRS, in which the most popular decomposition algorithms employing low-rank and sparse representations, including tensor ones, are implemented. They are grouped into Robust Principal Component Analysis (RPCA), Subspace Tracking (ST), Matrix Completion (MC), Low-Rank Recovery (LRR), Three-Term Decomposition (TTD), Non-Negative Matrix Factorization (NMF), Non-Negative Tensor Factorization (NTF), and Tensor Decompositions (TD) algorithms. Some of them, requiring the least computational time, are the primary focus of the research presented in this paper.
30.2 Related Work
In a recent study, Draganov et al. [12] apply High-order Robust Principal Component Analysis solved by Inexact Augmented Lagrange Multipliers (HoRPCA IALM),
Tucker decomposition solved by Alternating Least Squares (Tucker-ALS) [13], Canonical Polyadic Decomposition (CP), and Tensor Singular Value Decomposition (t-SVD) as decomposition techniques for background modelling and subtraction in videos containing fires. The aim is to estimate the fire dispersal area over time. Accuracy, measured at the pixel level, varies between 73.01 and 99.88% for different videos, with false positives varying between 0.52 and 13.37%. Tucker-ALS and CP-ALS are faster than HoRPCA IALM and t-SVD by a few orders of magnitude. Further, Draganov et al. [14] use RPCA, Go Decomposition (GoDec), Low-rank Matrix Completion by Riemannian Optimization (LRGeomCG), Robust Orthonormal Subspace Learning (ROSL), and Non-negative Matrix Factorization via Nesterov's Optimal Gradient Method (NeNMF) [15] to assess wild animal populations from thermographic videos, again incorporating background modelling and subtraction. The achieved accuracy of spotting individual and closely spaced animals on a per-frame basis varies between 96.51 and 99.30%, with very close processing times for all tested decompositions. Additional functionality that would reconnect body parts separated by partial occlusions is suggested as further development to enhance accuracy even more. Another study [16] reveals the applicability of the Tucker-ALS, CP-ALS, Tucker-ADAL and HoRPCA-S decompositions for modelling and subtracting the background in videos containing domestic animals, with the aim of monitoring local farm environments. The detection rate at the pixel level reaches 0.9999 for the Tucker-ALS and Tucker-ADAL algorithms in particular videos captured with a thermographic camera, and falls to around 0.7 on average for CP-ALS and around 0.3 for HoRPCA-S. The variability in intensity within the patterns of different breeds of animals, recorded at closer distances, is considered higher, which leads to these uneven results among the different decompositions. Tucker-ALS and CP-ALS are an order of magnitude faster than Tucker-ADAL and two orders faster than HoRPCA-S. In [17], Andriyanov et al. accomplish trajectory tracking of multiple objects in real time using the YOLO v3 convolutional neural network (CNN) along with stochastic filters and gradient-like functions; in this way they manage to align image fragments. Mao et al. [18] use the average delay as a metric to estimate the detection accuracy of video objects, and in this way manage to trace complex trajectories. Various detection algorithms show a significant increase in accuracy when fed with this metric while preserving the average precision; autonomous vehicle perception is one of the potential applications of this approach. Lempitsky and Zisserman [19] developed a framework employing supervised learning in order to count objects in videos, including people in surveillance videos. They estimate the image density by introducing a regularized risk function, which can be efficiently calculated through the maximum subarray algorithm; the whole procedure then reduces to a convex quadratic problem with a well-established solution. The aim of the study presented in this paper is to compare the efficiency, in terms of detection rate and precision, of decomposition algorithms for background modelling in videos containing moving objects of a more general type (office workers, pedestrians, vehicles, vegetation moving in the wind, birds flying by, etc.) for general-
purpose applications (under changing illumination, persistent shadows, etc.). The tested algorithms have among the lowest execution times of the existing ones, making them potentially suitable for real-time implementation. These are Fast Principal Component Pursuit (Fast PCP) [20], Grassmann Average (GA) [21], Grassmann Median (GM), GoDec [22], Greedy Semi-Soft Go Decomposition (GreGoDec) [23], LRGeomCG [24], ROSL [25], NeNMF, Deep Semi Non-negative Matrix Factorization (Deep-Semi-NMF) [26] and Tucker-ALS. Based on the obtained results, two new score-fusion algorithms are proposed, incorporating the union and the intersection of the results from multiple decomposed frames, which yield higher detection rate and higher precision, respectively. The rest of the paper is organized as follows: the next section presents the descriptions of the newly proposed algorithms, then the experimental results are given, followed by a discussion, and at the end a conclusion is drawn.
30.3 Proposed Algorithms
30.3.1 Object Detection in Video Based on Score Fusion from Decompositions Union (Fusion OR)
The first algorithm (Algorithm 1) proposed in this study forms a union among the output results of several decompositions applied to one and the same input, in particular a color video whose brightness component is processed with the aim of background modelling and foreground object extraction. Then, score fusion is applied to the resulting videos, binarized so that 0 corresponds to background and 1 to foreground, with fair voting on a pixel-by-pixel basis. Only if at least half plus one of the tried decompositions yield a positive result for a moving object is the corresponding pixel classified as part of such an object. In the case of only 2 algorithms, both need to agree on that decision. A short sketch of this voting rule is given after the pseudocode below.

Algorithm 1
procedure TScoreFusionUnion (OutputVideo) {Applying D number of decompositions};
var x, y, t, D, R, C, T, InputVideo, OutputVideo: Integer;
begin
  x := 1; y := 1; t := 1;
  ReadLn(D); ReadLn(InputVideo);
  OutputVideo = 0;
  OutputVideo1 = Decomp1(InputVideo);
  OutputVideo2 = Decomp2(InputVideo);
  ...
  OutputVideoD = DecompD(InputVideo);
  OutputVideoUnion = OutputVideo1 || OutputVideo2 || ... || OutputVideoD;
  repeat
    if OutputVideoUnion(x,y,t) >= (C/2 + 1) then
      begin
        OutputVideo(x,y,t) = 1;
      end;
  until (x
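As a complement to the Pascal-like pseudocode of Algorithm 1, a minimal NumPy sketch of the pixel-wise majority voting described in the text is given below; the array layout (frames × rows × columns) and the function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def score_fusion_union(masks, min_votes=None):
    """masks: list of D binary foreground masks of equal shape (T, R, C),
    one per background-modelling decomposition (1 = foreground).
    A pixel is kept as foreground when at least half plus one of the
    decompositions vote for it (both must agree when D = 2)."""
    masks = np.asarray(masks, dtype=np.uint8)      # D x T x R x C
    D = masks.shape[0]
    if min_votes is None:
        min_votes = D // 2 + 1                     # "half plus one" rule
    votes = masks.sum(axis=0)                      # per-pixel vote count
    return (votes >= min_votes).astype(np.uint8)
```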
Table 39.1 Chisholm grading based on the number of lymphocytes per 4 mm² of glandular tissue (… > 50 per 4 mm²)

out the presence of the syndrome. Here a negative diagnosis will give rise to regular monitoring. Additionally, Chisholm gave criteria for grading the syndrome based on the number of lymphocytes per glandular tissue area [4] (Table 39.1). Besides those methods, ultrasound imaging is a promising tool for Sjögren's syndrome diagnosis. We will present this method in detail in the next sections [5].
39.3 Related Work
39.3.1 Machine Learning Applied to Medical Images
39.3.2 Sjögren's Syndrome Detection
Several works have been proposed for the segmentation and classification of Sjögren's syndrome on ultrasound images. Two main approaches have been used for automatic classification: either training a deep neural network [6], or using feature extraction methods combined with a machine learning classifier [7, 8]. Berthomier et al. [7] proposed an approach using a scattering operator as a feature extractor to characterise SjS, whereas other works focus on features based on gray-level textures and statistics, following the radiomics approach [9, 10]. Deep learning has become popular in image processing since 2012, with the first human-competitive results on a digit classification dataset [11], and has been largely applied to medical image analysis [12–14]. Other works focus on segmenting the glands on ultrasound images with deep neural networks. Vukicevic et al. compare several deep convolutional neural network (CNN) architectures, such as FCN8, U-Net and FC-DenseNet, for the segmentation of salivary glands on 1184 ultrasound images from 287 patients, all diagnosed with primary SjS, and obtain a Dice score of 0.91 with FCN8 [15]. To detect diseased glands on ultrasound images, Kise et al. propose to use the VGG16 network [6], pre-trained on ImageNet, to classify SjS into the following 4 classes, in which experts express their confidence in the classification: definitely SjS, probable SjS, probable non-SjS, definitely not SjS. They used a 200-image database, and on the test set the trained model produced an AUC of 0.810 on parotid glands and an AUC of 0.894 on submandibular glands [6].
39.3.3 Siamese Networks for Image Classification
Bromley et al. [16] proposed a Siamese time delay neural network for signature verification, whereas Baldi and Chauvin [17] proposed neural networks for fingerprint recognition. The network proposed in [16] for the comparison of two signatures consists in encoding each image into a feature vector and using the cosine of the angle between the two vectors as a distance metric. Chopra et al. [18] introduced the contrastive loss as a training metric for a Siamese architecture with two CNNs on a dataset with a large number of classes. Yamada et al. [19] applied this method to texture segmentation with a model pre-trained on a texture dataset. We choose to use the same architecture to train a Siamese network to learn how to differentiate gray-level textures, and apply it to salivary glands segmentation.
39.4 Method
39.4.1 Sjögren's Syndrome Imaging
The ultrasound imaging technique is commonly used in human and veterinary medicine. The variation in wave speed through the different material layers is detected and processed to reconstruct the image of the tissue. Based on the work of Berthomier et al. [7], which highlights the possibility of representing the texture present in an image, we propose here to train a deep neural network to detect texture differences in gray-scale images. The model is then used to segment salivary glands without retraining. We will name this model SJSCNN in this manuscript.
39.4.2 Data Preparation and Training Settings
To train our Siamese model, we use images from a texture dataset in order to obtain a network with discriminative features for texture classification.
Training Database We use a gray-scale texture dataset as the training database for the unsupervised phase. Similarly to [19], we chose the Brodatz dataset [20] and divided it into small patches of size 32 × 32 (https://multibandtexture.recherche.usherbrooke.ca/original_brodatz.html) (Fig. 39.1).
Test SjS Ultrasound Imaging Database Our salivary glands database was acquired with an Aplio 800, with 140 images of glands (parotid and sub-mandibular) from 35 patients. There are 60% healthy glands and 40% diseased glands. Table 39.2 presents the different types of images in the database (Fig. 39.2).
Fig. 39.1 Brodatz texture data-set of gray-scale images

Table 39.2 Number of images per gland type
                 Nb image   Healthy   SjS
Parotid          73         44        29
Sub-mandibular   70         44        25
Fig. 39.2 Siamese model
Training phase We train a Siamese network based on a convolutional architecture to learn how to recognize textures of the same nature. A Siamese network [16] contains two identical branches sharing the same model and the same weights, used to differentiate between two images. We use the SJSCNN described in Fig. 39.3 in both branches of the Siamese network. The network consists of two convolutional layers (Conv2d), each followed by a hyperbolic tangent activation (Tanh) and a max-pooling operator (MaxPool2d) that down-samples the image in order to produce features at various resolutions [21]. Finally, two fully connected layers are applied to produce the output feature vector. Training a Siamese network requires a loss function known as the contrastive loss:

L = Y · D² + (1 − Y) · max(m − D, 0)²   (39.1)

where Y is the true expected label, m is the margin, i.e. a tolerance value on the similarity of different D values, and D is the norm of the difference of the two outputs from the branches of our Siamese network. During the training of the SJSCNN model for texture differentiation, at each training step we provide 2 images as inputs to the model and compute the contrastive loss. We optimize the loss with the Adam algorithm [22] and a learning rate of 5 × 10⁻⁴ for 100 epochs with a batch size of 16. After the training of the SJSCNN model on texture differentiation, we infer predictions on the salivary glands US dataset with the SJSCNN applied to 32 × 32 patches centered on each pixel of the input image, obtaining a vector of size 2 for each pixel. Finally, we train a k-means clustering model producing 4 clusters for the segmentation, applied pixel-wise to the vector of size 2 obtained for each pixel during the inference of the SJSCNN (Fig. 39.3).
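A minimal PyTorch sketch of the pair-wise training step described above is given below; the channel and hidden-layer sizes, the margin value and the variable names are illustrative assumptions, since the exact SJSCNN dimensions are only given in Fig. 39.3.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SJSCNNBranch(nn.Module):
    """One branch of the Siamese network: two Conv2d+Tanh+MaxPool2d stages
    followed by two fully connected layers producing a 2-D feature vector."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.Tanh(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.Tanh(), nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(nn.Linear(16 * 8 * 8, 32), nn.Tanh(), nn.Linear(32, 2))

    def forward(self, x):                          # x: (B, 1, 32, 32) patches
        return self.fc(self.features(x).flatten(1))

def contrastive_loss(out1, out2, y, margin=1.0):
    """Eq. (39.1): y = 1 for pairs of the same texture, 0 otherwise."""
    d = F.pairwise_distance(out1, out2)
    return torch.mean(y * d ** 2 + (1 - y) * torch.clamp(margin - d, min=0) ** 2)

branch = SJSCNNBranch()                            # shared weights for both inputs
opt = torch.optim.Adam(branch.parameters(), lr=5e-4)

def train_step(x1, x2, y):                         # one batch of 32x32 patch pairs
    opt.zero_grad()
    loss = contrastive_loss(branch(x1), branch(x2), y)
    loss.backward()
    opt.step()
    return loss.item()
```

At inference time the same branch is applied to the patch centered on every pixel, and the resulting 2-D vectors are clustered with k-means into 4 classes, as described in the text.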
Fig. 39.3 SJSCNN architecture
39.5 Results
We evaluate our texture differentiation model on the gland dataset by selecting the cluster corresponding to the gland, and obtain a precision of 93% and a mean intersection over union (mIoU) of 0.85 for the segmentation of the salivary glands.
39.6 Discussion
The transfer of our texture differentiation model to the segmentation of salivary glands produces a precision of 93%, without using any labeled ultrasound images of salivary glands. First, these results show that texture information is a valuable feature for salivary gland segmentation. Secondly, the success of the application of the texture differentiation model confirms that gray-level texture information can successfully be encoded in deep neural networks. Thirdly, this highlights the potential of transfer learning methods in case of limited training data. Finally, this segmentation method could provide a strong basis for extracting the texture information of the salivary gland tissue for further classification of primary SjS.
39.7 Conclusion
In this manuscript, we proposed a new approach that applies a texture differentiation model based on a Siamese network, trained on a gray-level texture database, to segment salivary glands in ultrasound images without using a labelled training database from our target task. We obtained segmentation results with a good precision for the intended applications. Our future work will consider other networks as well as augmenting the texture dataset used for training the texture differentiation model. In addition, other feature extraction approaches can improve the segmentation results and allow the classification of sick glands, enhancing the overall diagnosis of this disease.
References 1. Ramos-Casals, M., Brito-Zerón, P., Sisó-Almirall, A., Bosch, X.: Primary Sjögren syndrome. BMJ 344 (2012). https://doi.org/10.1136/bmj.e3821 2. Devoize, L., Salivation, R.D.: EMC—Médecine Buccale (2010) 3. Harris, V.M., Scofield, R.H., Sivils, K.L.: Genetics in Sjögren’s syndrome: where we are and where we go. Clin. Exp. Rheumatol. 37(Suppl. 118), 234–239 (2019) 4. Chisholm, D., Mason, D.: Labial salivary gland biopsy in Sjögren’s disease. J. Clin. Pathol. 21(5), 656–660 (1968) 5. Doare, E., Jousse-Joulin, S., Pers, J.-O., Devauchelle-Pensec, V., Saraux, A.: Syndrome de gougerot-sjogren primitif. Appareil locomoteur 1(1), 1 (2020). https://doi.org/10.1016/S02460521(20)41579-7 6. Kise, Y., Shimizu, M., Ikeda, H., Fujii, T., Kuwada, C., Nishiyama, M., Funakoshi, T., Ariji, Y., Fujita, H., Katsumata, A., Yoshiura, K., Ariji, E.: Usefulness of a deep learning system for diagnosing Sjögren’s syndrome using ultrasonography images. Dentomaxillofac. Radiol. 49(3), 20190348 (2020). https://doi.org/10.1259/dmfr.20190348. PMID: 31804146 7. Berthomier, T., Mansour, A., Bressollette, L., Le Roy, F., Mottier, D.: Venous blood clot structure characterization using scattering operator. In: International Conference on Frontiers of Signal Processing (ICFSP), pp. 73–80 (2016). https://doi.org/10.1109/ICFSP.2016.7802960 8. Berthomier, T., Mansour, A., Bressollette, L., Le Roy, F., Mottier, D., Fréchier, L., Hermenault, B.: Scattering operator and spectral clustering for ultrasound images: application on deep venous thrombi. In: World Academy of Science, Engineering and Technology, vol. 11, pp. 630–637 (2017) 9. Kumar, V., Gu, Y., Basu, S., Berglund, A., Eschrich, S.A., Schabath, M.B., Forster, K., Aerts, H.J., Dekker, A., Fenstermacher, D.: Radiomics: the process and the challenges. Magn. Reson. Imaging 30(9), 1234–1248 (2012) 10. Vukicevic, A.M., Milic, V., Zabotti, A., Hocevar, A., De Lucia, O., Filippou, G., Frangi, A.F., Tzioufas, A., De Vita, S., Filipovic, N.: Radiomics-based assessment of primary Sjögren’s syndrome from salivary gland ultrasonography images. IEEE J. Biomed. Health Inform. 24(3), 835–843 (2019) 11. Ciregan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3642–3649 (2012). https://doi.org/10.1109/CVPR.2012.6248110 12. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241 (2015). Springer
13. Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18(2), 203–211 (2021) 14. Olivier, A., Hoffmann, C., Mansour, A., Bressollette, L., Clement, B.: Survey on machine learning applied to medical image analysis. In: 2021 14th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 1–6 (2021). IEEE 15. Vukicevic, A.M., Radovic, M., Zabotti, A., Milic, V., Hocevar, A., Callegher, S.Z., De Lucia, O., De Vita, S., Filipovic, N.: Deep learning segmentation of primary Sjögren’s syndrome affected salivary glands from ultrasonography images. Comput. Biol. Med. 129, 104154 (2021). https:// doi.org/10.1016/j.compbiomed.2020.104154 16. Bromley, J., Bentz, J.W., Bottou, L., Guyon, I., Lecun, Y., Moore, C., Säckinger, E., Shah, R.: Signature verification using a “Siamese” time delay neural network. Int. J. Pattern Recognit. Artif. Intell. 07(04), 669–688 (1993). https://doi.org/10.1142/S0218001493000339 17. Baldi, P., Chauvin, Y.: Neural networks for fingerprint recognition. Neural Comput. 5(3), 402– 418 (1993). https://doi.org/10.1162/neco.1993.5.3.402 18. Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 1, pp. 539–5461 (2005). https://doi.org/10.1109/CVPR. 2005.202 19. Yamada, R., Ide, H., Yudistira, N., Kurita, T.: Texture segmentation using Siamese network and hierarchical region merging. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 2735–2740 (2018). https://doi.org/10.1109/ICPR.2018.8545348 20. Al-Sahaf, H., Al-Sahaf, A., Xue, B., Johnston, M., Zhang, M.: Automatically evolving rotationinvariant texture image descriptors by genetic programming. IEEE Trans. Evol. Comput. 21(1), 83–101 (2017). https://doi.org/10.1109/TEVC.2016.2577548 21. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015) 22. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1412.6980
Chapter 40
A Lightweight and Accurate RNN in Wearable Embedded Systems for Human Activity Recognition
Laura Falaschetti, Giorgio Biagetti, Paolo Crippa, Michele Alessandrini, Di Filippo Giacomo, and Claudio Turchetti
Abstract Human activity recognition (HAR) is an important technology for a wide range of applications including elderly people monitoring, ambient assisted living, sport and fitness activities. The aim of this paper is to address the HAR task directly on a wearable device, implementing a recurrent neural network (RNN) on a low cost, low power microcontroller, ensuring the required performance in terms of accuracy and low complexity. To reach this goal we first develop a lightweight RNN on the Human Activity Recognition Using Smartphones dataset in order to accurately detect human activity and then we port the RNN to the embedded device Cloud-JAM L4, based on an STM32 microcontroller. Experimental results show that this HAR RNN-based detector can be effectively implemented on the chosen constrained-resource system, achieving an accuracy of about 90.50% with a very low memory cost (40.883 KB) and inference time (67.131 ms), allowing the design of a wearable embedded system for human activity recognition.
40.1 Introduction
Human activity recognition (HAR) is important for a wide range of applications including elderly people monitoring for assisted living, robotics, human–computer interaction, surveillance, and keeping track of athletic activities. Existing HAR systems can be classified into two main categories: (i) fixed sensors, where information is gathered from static sensors mounted at fixed locations, and (ii) mobile sensors, where the sensors are wearable. Fixed sensors require limited environments where the sensors are installed (e.g. healthcare facilities) and specialized sensing equipment. Mobile or wearable sensors have become very popular over the years as they are flexible and easy to operate, but also durable and less costly. Wearables integrate low-power sensors that allow them to sense movement and other physiological signals such as heart rate, temperature and blood pressure, as well as to support gesture recognition and acceleration and inertial monitoring during human movement. This paper focuses on HAR for wearable systems. HAR can be treated as a pattern recognition problem, and in this context machine learning (ML) techniques have proven particularly successful [1–7]. These methods can be categorized into two main approaches: (i) conventional machine learning techniques, such as support vector machine (SVM) [8], k-nearest neighbors (kNN) [9, 10], Decision Tree (DT) [11]; (ii) deep learning-based techniques, such as deep neural networks (DNNs) [12, 13], convolutional neural networks (CNNs) [14], autoencoders [15], recurrent neural networks (RNNs) [16, 17]. A typical approach for ML applications involves streaming data acquired from IoT sensors and actuators to external computing systems (such as cloud servers) for further processing. However, such an approach worsens latency, leads to high communication costs and suffers from privacy concerns. In recent years, the trend is to address this problem by running the ML algorithms directly on the device that generates the data. Typically this is an embedded device based on a microcontroller (MCU) with limited computational power and very low energy consumption, without the need to transfer data to a more powerful computer for processing [18, 19]. This goal leads to several challenges: the reduction of computational complexity and memory occupation so that the algorithm can be integrated directly into the microcontroller, together with the reduction of inference time in order to create real-time applications, while preserving the performance in terms of accuracy. Following the above motivations, the aim of this paper is to propose a lightweight and accurate recurrent neural network (RNN) that can be implemented on a low-cost, low-power core, while preserving good performance in terms of accuracy. To reach this goal we proceed as follows. First we design an RNN using inertial data in order to detect human activity, using the publicly available Human Activity Recognition Using Smartphones (HAR) dataset [20] for its design and testing. Then we investigate the porting and performance of the implemented network on an embedded device, the low-power, low-cost Cloud-JAM L4 board (STM32L476RG microcontroller). To this end the STM32Cube.AI development environment [21, 22], which allows the porting of a pre-built DNN onto the constrained hardware platform, is used.
40.2 Dataset

The Human Activity Recognition Using Smartphones (HAR) dataset [20] is a public dataset available on the well-known UCI repository [23], which consists of a collection of inertial data acquired from smartphone accelerometers and gyroscopes, targeting the recognition of six different human activities: standing, sitting, laying, walking, walking downstairs, and walking upstairs. The data were collected from a group of 30 volunteers within an age bracket of 19-48 years, who performed the six aforementioned activities wearing a smartphone (Samsung Galaxy S II) on the waist. Specifically, the recorded data consist of the 3-axial linear acceleration and the 3-axial angular velocity, both sampled at a constant rate of 50 Hz, obtained using the accelerometer and gyroscope embedded in the smartphone. The dataset contains the pre-processed signals (accelerometer and gyroscope), obtained by applying noise filters and then segmenting the streams into fixed-width sliding windows of 2.56 s (128 readings per window) with 50% overlap. In addition, a Butterworth low-pass filter was used to separate the gravitational and body motion components of the sensor acceleration signal. In this way, nine signals are available with the same number of windows and readings: the body acceleration along the three axes, the body 3-axial angular velocity, and the total 3-axial acceleration. The dataset has been randomly partitioned into two sets: 70% of the volunteers were selected for generating the training data and 30% for the test data, corresponding to 21 subjects for training and 9 for testing.
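For illustration, the following sketch shows how the nine pre-processed inertial signals could be stacked into the windows x timesteps x features tensor used later in the paper. It assumes the standard layout of the UCI archive ("Inertial Signals" folder); the function and variable names are our own.

```python
import numpy as np

# The nine per-window signals of the UCI HAR archive ("Inertial Signals" folder),
# each file holding one window per row (128 readings per window).
SIGNALS = [
    "body_acc_x", "body_acc_y", "body_acc_z",
    "body_gyro_x", "body_gyro_y", "body_gyro_z",
    "total_acc_x", "total_acc_y", "total_acc_z",
]

def load_split(split, root="UCI HAR Dataset"):
    """Return X with shape (windows, 128, 9) and one-hot labels y with shape (windows, 6)."""
    # Stack the nine signals along the last axis (features).
    X = np.stack(
        [np.loadtxt(f"{root}/{split}/Inertial Signals/{name}_{split}.txt")
         for name in SIGNALS],
        axis=-1,
    )
    labels = np.loadtxt(f"{root}/{split}/y_{split}.txt").astype(int) - 1  # classes 1..6 -> 0..5
    y = np.eye(6)[labels]  # one-hot encoding for categorical cross entropy
    return X, y

# X_train, y_train = load_split("train")   # expected shape: (7352, 128, 9)
# X_test,  y_test  = load_split("test")    # expected shape: (2947, 128, 9)
```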
40.3 Design of the TensorFlow RNN Architecture for Embedded System

40.3.1 RNN Architecture

The architecture chosen for this dataset is the Long Short-Term Memory (LSTM) network, an RNN that, thanks to its feedback connections, is able to learn long-term dependencies. While traditional neural networks are characterized by complete connections between adjacent layers, RNNs can map target vectors from the entire history of previous inputs. A similar structure was used in [24] and proved to be effective in performing HAR using the photoplethysmography (PPG) signal. This architecture is particularly suitable for the HAR dataset, since the windows to be learned include 128 samples. Therefore, it was applied to the classification of six different activities represented by a third-order tensor of shape windows x timesteps x features, where the windows, as mentioned above, last 2.56 s each with an overlap of 50% and each of them contains 128 timesteps of the nine acquired sensor signals. In particular, the input to the neural network consists of a training set of shape 7352 x 128 x 9
Fig. 40.1 Network architecture
(7352 data windows, each containing 128 timesteps of the 9 sensor signals), while the testing set has a shape of 2947 x 128 x 9, with a number of classes equal to 6. The proposed network is sequential and is composed as follows. The first layer is an LSTM layer, an RNN layer with memory, which takes as input the third-order tensor described above and returns a tensor whose first dimension is None, representing the variable batch size, and whose second dimension is 16, representing the number of features extracted by the LSTM, equal to the number of chosen nodes. The second layer performs dropout: its function is to reduce overfitting, which occurs when the network specializes too much on the training set, resulting in worse classification performance. In the proposed network the dropout rate is set to 0.5, so half of the inputs are randomly set to zero. The third and last layer is a dense layer, i.e., a classic densely connected neural network layer, which takes the tensor output by the dropout layer and returns one with only 6 features, that is, the number of classes. The activation function chosen is the softmax, which was preferred to the sigmoid after verifying that it performs better in this network. The network was then trained for 200 epochs on the HAR dataset partitioned as reported in the previous section, using an RMSprop optimizer with categorical cross entropy as loss function and a batch size of 32. Figure 40.1 shows a graphical representation of the proposed network.
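A minimal Keras sketch consistent with this description (16-unit LSTM, 0.5 dropout, 6-way softmax, RMSprop with categorical cross entropy, 200 epochs, batch size 32) is given below; the function, variable, and file names are our own.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_har_rnn(timesteps=128, features=9, classes=6):
    model = models.Sequential([
        layers.LSTM(16, input_shape=(timesteps, features)),  # output shape: (None, 16)
        layers.Dropout(0.5),                                  # half of the inputs randomly zeroed
        layers.Dense(classes, activation="softmax"),          # 6 activity classes
    ])
    model.compile(optimizer=tf.keras.optimizers.RMSprop(),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_har_rnn()
# model.fit(X_train, y_train, epochs=200, batch_size=32, validation_data=(X_test, y_test))
# model.save("har_lstm.h5")   # HDF5 file later imported into STM32Cube.AI
```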
40.3.2 Hardware

The low-cost, low-power HAR detector has been implemented on the STM32L4 Cloud-JAM L4 board which, thanks to its small form factor and its set of integrated inertial sensors, can represent a valid prototyping base for a wearable system. The board features an STM32L476RG microcontroller, with an ARM® 32-bit Cortex®-M4 CPU + FPU, frequency up to 80 MHz, 1 MiB flash memory, 128 KiB RAM (but limited to 96 KiB for practical reasons), and about 3 mA of CPU current consumption at full speed.
Cloud-JAM L4 board: https://github.com/rushup/Cloud-JAM-L4
STM32L476RG microcontroller: https://www.st.com/content/st_com/en/products/microcontrollers-microprocessors/stm32-32bit-arm-cortex-mcus/stm32-ultra-low-power-mcus/stm32l4-series/stm32l4x6/stm32l476rg.html
40.3.3 Software

The model was implemented on Google Colaboratory using TensorFlow with TensorFlow Keras v. 2.4.0 as a backend in Python v. 3.7.10, and trained for 200 epochs on the HAR dataset partitioned as reported in the previous section, using an RMSprop optimizer with categorical cross entropy as loss function and a batch size of 32. The model was saved in the Hierarchical Data Format version 5 (HDF5) binary format (.h5). HDF5 is a hierarchical file format for structured data that is ideal for storing multi-dimensional arrays of numbers. Keras saves models in this format, so that the weights and model configuration can be easily stored in a single file. Once the model has been trained to a satisfactory accuracy, it must be converted to executable code that runs on the embedded device. For this purpose, we used STM32Cube.AI [21, 22] v. 6.0.0, a framework integrated in the STM32Cube IDE that allows the porting of a pre-built DNN by converting it to optimized code to be run on the constrained hardware platform. The results for all tensors were obtained with 32-bit floating-point precision (h5 model).
40.3.4 Porting the RNN to the Embedded System STM32L4 Cloud-JAM L4

The porting of the neural network to the STM32 architecture is made possible by a software framework from ST, named STM32Cube.AI [21, 22], integrated in the STM32Cube IDE. The software is a complete solution to import a Keras, TFLite, or Caffe model, analyze it for compatibility and memory requirements, and convert it to an optimized C implementation for the target architecture. This allows users to develop applications in a high-level programming language rather than directly in C. The generated network can then be evaluated with test input data, both on the computer and on the embedded device, to obtain various metrics related to computational complexity and classification accuracy. The process for using STM32Cube.AI with the STM32L4 Cloud-JAM L4 is described in Fig. 40.2.
Fig. 40.2 Process of integrating the RNN into the STM32L4 Cloud-JAM L4 board with STM32Cube.AI
40.4 Experimental Results

In order to validate the intelligent sensor for HAR detection previously discussed, two different experiments were conducted: (i) an experiment on desktop and (ii) an experiment on the STM32L4 Cloud-JAM L4 board. The first aims to compare the performance of the implemented RNN with state-of-the-art ML methods. The second aims to determine the final performance obtained by the network implemented on the embedded processor. The experiments were conducted using TensorFlow Keras v. 2.4.0 to train the models on Google Colaboratory with the HAR dataset partitioned as reported in Sect. 40.2, and STM32Cube.AI v. 6.0.0 to analyze the generated model, generate the C code, program the board, and run the final on-board validation.
40.4.1 Testing on Desktop

In this experiment a variety of ML methods have been applied to the HAR dataset in order to compare their performance with that achieved by the proposed lightweight RNN. In particular, the following ML classification methods have been used: SVM with polynomial kernel (quadratic order), kNN, and DT. Table 40.1 reports the results achieved in terms of storage cost, accuracy, loss, and inference time. The inference time is computed both on the entire testing set and on a single observation. As can be seen, the RNN proposed in this paper outperforms all the other models both in terms of memory cost and inference time. As far as the testing accuracy is concerned, the value achieved with the proposed RNN is just a few percentage points below the best performance obtained with SVM, but greater than that obtained by the other classification methods (kNN, DT). This does not constitute a real issue for the final HAR detection, since this loss of accuracy is compensated by the low inference time the proposed RNN is able to achieve. Indeed, to design and develop a low-cost, low-power, real-time detector based on an RNN architecture for HAR classification, several constraints have to be met: inference time and memory occupancy are of primary concern, while still guaranteeing a good classification accuracy. To obtain the smallest possible model size, quantization is a recommended approach. Quantization can be applied during the training stage (quantization-aware training) or directly on the model file (post-training quantization); this process takes the float graph of a trained net and converts it down to 8 bits. The post-training INT8 quantization of the proposed RNN leads to a storage cost of 14.781 kB, further reducing the model occupancy, but unfortunately STM32Cube.AI does not yet support some specific operations generated by the TFLiteConverter for RNN models, so this model cannot be implemented on the STM32L4 Cloud-JAM L4 board and could not be included in the comparison. To summarize the results, Table 40.2 reports the confusion matrix of the proposed RNN (h5 model format).
Table 40.1 Performance of the proposed network on the HAR dataset in terms of storage cost, train/test accuracy, loss and inference time

Model        | Storage cost (KB) | Accuracy (%) Train | Accuracy (%) Test | Loss Train | Loss Test | Inference time (ms) All data | Inference time (ms) Single obs.
Proposed RNN | 40.883            | 94.87              | 90.50             | 0.1709     | 0.4258    | 693.765                      | 0.2354
SVM poly     | 7738.149          | 98.38              | 95.58             | 0.0418     | 0.1188    | 1070.2                       | 0.3631
kNN          | 32,568.066        | 99.19              | 89.00             | 0.0326     | 1.5738    | 1197.7                       | 0.4064
DT           | 49.774            | 99.00              | 85.98             | 5.2747     | 4.8286    | 961.92                       | 0.3264

Best results are displayed in bold

Table 40.2 Confusion matrix

                   | Laying | Sitting | Standing | Walking | Walking downstairs | Walking upstairs
Laying             | 537    | 0       | 0        | 0       | 0                  | 0
Sitting            | 0      | 361     | 108      | 1       | 0                  | 21
Standing           | 0      | 58      | 472      | 2       | 0                  | 0
Walking            | 0      | 2       | 0        | 467     | 11                 | 16
Walking downstairs | 0      | 0       | 0        | 13      | 394                | 13
Walking upstairs   | 0      | 0       | 0        | 20      | 15                 | 436
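As noted above, post-training quantization would shrink the model further; the sketch below illustrates the standard TensorFlow Lite post-training INT8 conversion recipe. The representative-dataset generator and file names are assumptions of ours, and, as discussed, the resulting TFLite RNN operations are not yet supported by STM32Cube.AI.

```python
import tensorflow as tf

def representative_data_gen():
    # A few hundred calibration windows from the training set (assumed available as X_train).
    for window in X_train[:200]:
        yield [window[None, ...].astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("har_lstm_int8.tflite", "wb") as f:
    f.write(tflite_model)
```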
40.4.2 Testing on the Embedded Platform STM32L4 Cloud-JAM L4

In order to validate the suitability of the previously discussed RNN architecture for real-time HAR classification, we evaluated the portability of the model to the low-cost, low-power embedded platform STM32L4 Cloud-JAM L4 and carried out the validation on target. In particular, we follow the steps described in Fig. 40.2: (1) load the model into STM32Cube.AI; (2) analyze the model; (3) generate the C code of the pre-built RNN network and program the embedded system with the generated binary file; (4) run the final validation on board to estimate the testing accuracy by providing input data from the test set to the board via a serial interface. Figures 40.3 and 40.4 show in detail the outputs of step 2 and step 4, respectively, to validate the proposed approach. As shown in Fig. 40.3, step 2 provides a complete analysis of the model in terms of complexity and memory storage and, in order to check whether the model meets the memory constraints imposed by the microcontroller, the following metrics
Fig. 40.3 Integrating the RNN into the STM32L4 Cloud-JAM L4 board with STM32Cube.AI: analyze network
are computed: number of model parameters, MACC, ROM bytes (weights), and RAM bytes. As can be seen, the model occupancy amounts to 7.09 KiB of flash memory and 5.02 KiB of RAM, thus much lower than the memory constraints of the chosen STM32L476RG microcontroller (1 MiB flash memory, 96 KiB RAM). Once the board is programmed with the generated C code embedding the RNN, the final validation on target can be carried out. When the ground-truth output values are provided with the associated input samples, STM32Cube.AI uses the predicted values to calculate several metrics related to classification accuracy: classification accuracy (ACC), root mean square error (RMSE), mean absolute error (MAE), L2 relative error (L2r), and the confusion matrix. These values are reported in Fig. 40.4 and show that the results obtained on board are equivalent to those obtained on desktop (Tables 40.1 and 40.2), without any loss in accuracy.
Fig. 40.4 Integrating the RNN into the STM32L4 Cloud-JAM L4 board with STM32Cube.AI: validation on target, with average inference time of 67.131 ms
40.5 Conclusions

This work demonstrated that an activity detection system to recognize human activities can be realized using an RNN and inertial signals. In particular, it was shown that the proposed RNN, trained on the publicly available Human Activity Recognition Using Smartphones dataset, can be implemented on an embedded system based on a low-cost, low-power, computationally limited STM32 microcontroller, ensuring low complexity and achieving good accuracy. Experimental results show that the proposed RNN, compared with other machine learning techniques, reaches a good accuracy (90.50%) with a very small memory footprint (40.883 KB) and inference time (67.131 ms), making it suitable for implementation on low-cost wearable sensors.
References 1. Zhang, S., Li, Y., Zhang, S., Shahabi, F., Xia, S., Deng, Y., Alshurafa, N.: Deep learning in human activity recognition with wearable sensors: a review on advances. CoRR abs/2111.00418 (2021). https://arxiv.org/abs/2111.00418 2. Gupta, N., Gupta, S.K., Pathak, R.K., Jain, V., Rashidi, P., Suri, J.S.: Human activity recognition in artificial intelligence framework: a narrative review. Artif. Intell. Rev. 1–54 (2022) 3. Qiu, S., Zhao, H., Jiang, N., Wang, Z., Liu, L., An, Y., Zhao, H., Miao, X., Liu, R., Fortino, G.: Multi-sensor information fusion based on machine learning for real applications in human activity recognition: state-of-the-art and research challenges. Inf. Fusion 80, 241–265 (2022) 4. Cornacchia, M., Ozcan, K., Zheng, Y., Velipasalar, S.: A survey on activity detection and classification using wearable sensors. IEEE Sens. J. 17(2), 386–403 (2017) 5. Uddin, M.Z., Soylu, A.: Human activity recognition using wearable sensors, discriminant analysis, and long short-term memory-based neural structured learning. Sci. Rep. 11(1), 1–15 (2021)
6. Biagetti, G., Crippa, P., Falaschetti, L., Orcioni, S., Turchetti, C.: A portable wireless sEMG and inertial acquisition system for human activity monitoring. In: International Conference on Bioinformatics and Biomedical Engineering, pp. 608–620. Springer (2017) 7. Biagetti, G., Crippa, P., Falaschetti, L., Orcioni, S., Turchetti, C.: An efficient technique for real-time human activity classification using accelerometer data. In: International Conference on Intelligent Decision Technologies, pp. 425–434. Springer (2016) 8. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995) 9. Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967) 10. Dasarathy, B.V.: Handbook of data mining and knowledge discovery. Data Mining Tasks and Methods: Classification: Nearest-Neighbor Approaches, pp. 288–298. Oxford University Press, Inc., New York, NY, USA (2002) 11. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Wadsworth & BrooksCole Advanced Books & Software, Monterey, CA (1984) 12. Hammerla, N.Y., Halloran, S., Plötz, T.: Deep, convolutional, and recurrent models for human activity recognition using wearables. arXiv preprint arXiv:1604.08880 (2016) 13. Chen, Y., Xue, Y.: A deep learning approach to human activity recognition based on single accelerometer. In: 2015 IEEE International Conference on Systems, Man, and Cybernetics, pp. 1488–1492. IEEE (2015) 14. Jiang, W., Yin, Z.: Human activity recognition using wearable sensors by deep convolutional neural networks. In: Proceedings of the 23rd ACM International Conference on Multimedia, pp. 1307–1310 (2015) 15. Almaslukh, B., AlMuhtadi, J., Artoli, A.: An effective deep autoencoder approach for online smartphone-based human activity recognition. Int. J. Comput. Sci. Netw. Secur. 17(4), 160–165 (2017) 16. Singh, D., Merdivan, E., Psychoula, I., Kropf, J., Hanke, S., Geist, M., Holzinger, A.: Human activity recognition using recurrent neural networks. In: International Cross-Domain Conference for Machine Learning and Knowledge Extraction, pp. 267–274. Springer (2017) 17. Ordóñez, F.J., Roggen, D.: Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors 16(1), 115 (2016) 18. Novac, P.E., Boukli Hacene, G., Pegatoquet, A., Miramond, B., Gripon, V.: Quantization and deployment of deep neural networks on microcontrollers. Sensors 21(9) (2021) 19. Novac, P.E., Castagnetti, A., Russo, A., Miramond, B., Pegatoquet, A., Verdier, F., Castagnetti, A.: Toward unsupervised human activity recognition on microcontroller units. In: 2020 23rd Euromicro Conference on Digital System Design (DSD), pp. 542–550 (2020) 20. Anguita, D., Ghio, A., Oneto, L., Parra, X., Reyes-Ortiz, J.L.: A public domain dataset for human activity recognition using smartphones. In: 21th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN) (2013) 21. STMicroelectronics: Artificial Intelligence Ecosystem for STM32. https://www.st.com/ content/st_com/en/ecosystems/stm32-ann.html (2022). Accessed 08 Jan 2022 22. STMicroelectronics: AI Expansion Pack for STM32CubeMX. https://www.st.com/en/ embedded-software/x-cube-ai.html (2022). Accessed 08 Jan 2022 23. Reyes-Ortiz, J.L., Anguita, D., Ghio, A., Oneto, L., Parra, X.: Human Activity Recognition Using Smartphones Data Set. https://archive.ics.uci.edu/ml/datasets/ human+activity+recognition+using+smartphones. Accessed 21 Jan 2022 24. 
Alessandrini, M., Biagetti, G., Crippa, P., Falaschetti, L., Turchetti, C.: Recurrent neural network for human activity recognition in embedded systems using PPG and accelerometer data. Electronics 10(14) (2021)
Chapter 41
Direction-of-Arrival Based Technique for Estimation of Primary User Beam Width

Zeinab Kteish, Jad Abou Chaaya, Abbass Nasser, Koffi-Clément Yao, and Ali Mansour

Abstract We consider a single transmitter and multiple receivers equipped with multiple antennas to exploit the spatial characteristics of the transmission. To allow a receiver to transmit simultaneously with the transmitter but in a different direction, the receivers use two methods to estimate the half-power beamwidth of the transmitted signal. The first method uses the power received from the transmitter, measured with the Energy Detector (ED). The second method relies on the Directions-of-Arrival (DoA) at the receivers, collected using the conventional beamforming (BF) method. The performance of the beamwidth estimation is investigated by calculating two metrics: the first is the angular interval that is missed and not detected as part of the beam; the second is the falsely detected interval. The simulation results show that the DoA-based technique provides lower false detections compared to the ED-based method, at the cost of a slight increase in the angular missed detection. In addition, our DoA-based approach handles a Rayleigh channel, whereas the ED-based technique fails to recover the BW in such a scenario.
41.1 Introduction

Spectrum sensing (SS) approaches have been investigated extensively in recent decades to offer safety for primary users (PU) in cognitive radio (CR) networks. When a primary user (PU) is silent, a secondary user (SU) can transmit data on the same band as the latter [1]. The SU does this by continuously sensing the spectrum to
find and fill spectrum voids, when feasible. A comprehensive assessment of recent achievements in SS for CR systems is offered in [2]. Several forms of SS in CR, such as underlay and opportunistic, have been explored. Because both the PU and the SU use the same frequency band, the underlay CR system relies on channel state information between them to ensure that the SU does not cause significant interference to the PU [3, 4]. In opportunistic CR, on the other hand, the SU does not cooperate with the PU but senses the channel to track the PU's activity. Recently, the authors of [5] used reconfigurable antennas (RA) to boost transmission rates in opportunistic CR while accounting for two types of errors: inaccurate sensing of the existing PU and channel estimation error between the SU's transmitter and receiver. However, it is assumed in [5, 6] that the PU is an omnidirectional source. Furthermore, the authors in [7] studied the trade-off between the sensing time required to monitor the PU's activity and the transmission capacity that can be achieved at the SU; they considered a basic scenario in which the SU transmitter is aware of the PU's direction.

Thanks to beamforming methodologies, researchers have concentrated on new spectrum sensing approaches based on angle exploration. Rather than relying on the frequency or time dimensions of the spectrum, SUs with multiple antennas may detect the Direction-of-Arrival (DoA) of the PU's signal and transmit in a new direction while maintaining the same frequency. The position of the PU should be known in order to administer the CR network, as well as to facilitate power management mechanisms and discover spectrum gaps. Cabric et al. suggested a DoA-based localization technique for the PU in a CR network [8], and the performance of the localization was tested in the presence of specific system factors. By adding the received-signal-strength (RSS) metric to the DoA, the quality of this investigation was further improved in [9]. Similarly, the authors of [10] developed an RSS-based PU localization approach using a sectorized antenna at the SU end. Considering the near-field aspect, the authors of [11] account for the coherently distributed behaviour of the transmitters, as it becomes more relevant, to better localize the transmitters and estimate their shape distribution.

None of the aforementioned publications discussed the idea of estimating the interval of angles occupied by the PU's beam. To allow simultaneous transmission of the SUs and the PU, we proposed in a previous publication [12] an approach to estimate the Half Power Beamwidth (HPBW) of the PU's beam. This approach is based on using the Energy Detector (ED) to collect the received power at the SUs. We define two main impacts of misestimating the HPBW, namely the false and the missed angular detections. One of the drawbacks of the ED-based estimation technique was the high false detection. In this paper, we propose a method to estimate the HPBW which decreases the false detection. This method is based on conventional beamformers [13], which provide the Direction-of-Arrival (DoA) at each SU position. Given that the SUs within the main lobe of the PU can accurately localize the latter, we check the PU's localization error at the SU side. Therefore, the SUs with the lowest errors are considered to be within the PU's beam. We compare the performance of the proposed method to that of [12]. Also, we evaluate the estimation parameters of the PU's BW.
41.2 System Model

We consider a single PU equipped with a ULA of N elements and K SUs, each one using a ULA with M elements. The SUs make their observations on the channel and send them to a fusion center, which makes the final decision on the channel status. The fusion center is chosen to be one of the SUs and knows the positions of all the SUs. The signal transmitted by the PU, s(t), where 1 <= t <= N_s and N_s is the number of samples transmitted by the PU, is multiplied by a weight vector w(\theta_{PU}), where \theta_{PU} is the beamforming angle, i.e., the transmission angle of the PU in the azimuth plane. The transmitted signal at the PU is given by:

x(t) = w(\theta_{PU}) s(t)    (41.1)

with

w(\theta_{PU}) = \sqrt{\eta P_{tx} / N} \, a_{PU}^{*}    (41.2)

where \eta is the power control coefficient, P_{tx} is the transmitted power, and the steering vector a_{PU} \in C^{N \times 1} is given by:

a_{PU} = [1, e^{-j 2\pi d_{PU} \sin\theta_{PU} / \lambda}, \ldots, e^{-j 2\pi d_{PU} (N-1) \sin\theta_{PU} / \lambda}]^{T}    (41.3)
d_{PU} is the distance between the elements of the PU array and \lambda is the wavelength of the signal. The SUs are uniformly distributed in the area of a circle circumscribed about the PU with a maximum distance of R. The channel between the PU and the kth SU is divided into a Line of Sight (LOS) channel and Non-Line of Sight (NLOS) channels formed by reflecting objects in the environment. We adopt in this paper the Rician fading channel model. The LOS channel H_{LOS} between the PU and the kth SU forms a matrix of dimension M x N and is given as [14]: H_{LOS} = a_{SU,k} a_{PU,SU}^{T}, where a_{SU,k} \in C^{M \times 1} is the array response vector at each SU:

a_{SU,k} = [1, e^{j 2\pi d_{SU,k} \sin\theta_{SU,k} / \lambda}, \ldots, e^{j 2\pi d_{SU,k} (M-1) \sin\theta_{SU,k} / \lambda}]^{T}    (41.4)

\lambda and d_{SU,k} are defined in a similar manner to (41.3), whereas \theta_{SU,k} is defined as the angle of arrival (AoA) of the signal at the kth SU. Similarly, a_{PU,SU} \in C^{N \times 1} is defined as the array response vector that connects the PU to the kth SU:

a_{PU,SU} = [1, e^{-j 2\pi d_{PU} \sin\theta_{SU,k} / \lambda}, \ldots, e^{-j 2\pi d_{PU} (N-1) \sin\theta_{SU,k} / \lambda}]^{T}    (41.5)

The NLOS channel H_{NLOS} is an M x N matrix describing a Rician fading channel. Thus, the total channel model is the weighted sum of the two components H_{LOS} and H_{NLOS} [14]:
H_k = \sqrt{\frac{C_0 d_k^{-\gamma_k} \alpha_k}{1+\alpha_k}} \, H_{LOS} + \sqrt{\frac{C_0 d_k^{-\gamma_k}}{1+\alpha_k}} \, H_{NLOS}    (41.6)
where C_0 is the path loss power at distance d = 1 m, d_k is the distance between the PU and the kth SU, and \gamma_k and \alpha_k are the path loss exponent and the Rician K-factor, respectively. The Rician K-factor is the ratio of the signal power in the direct path over the scattered power. The signal received at each SU is given by:

y_k(t) = H_k x(t) + g_k(t)    (41.7)

where g_k(t) is an M x 1 additive Gaussian noise component with zero mean and variance \sigma_g^2.
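A minimal NumPy sketch of this signal model (steering vectors, Rician channel, and received snapshots) is given below. The square-root power normalization follows the standard Rician formulation assumed in the reconstructed equations above, and all function, variable, and parameter names are our own.

```python
import numpy as np

def ula_steering(n_elem, spacing, theta, sign=-1.0, wavelength=1.0):
    """Steering vector of an n_elem-element ULA for azimuth angle theta (rad)."""
    k = np.arange(n_elem)
    return np.exp(sign * 1j * 2 * np.pi * spacing * k * np.sin(theta) / wavelength)

def rician_channel(a_su, a_pu_su, d, c0=1e-3, gamma=2.5, alpha=5.0, rng=np.random):
    """Total PU -> SU channel as a weighted sum of LOS and NLOS components, Eq. (41.6)."""
    M, N = a_su.size, a_pu_su.size
    h_los = np.outer(a_su, a_pu_su)                              # rank-one LOS component
    h_nlos = (rng.randn(M, N) + 1j * rng.randn(M, N)) / np.sqrt(2)
    path_loss = c0 * d ** (-gamma)                               # c0 = -30 dB
    return (np.sqrt(path_loss * alpha / (1 + alpha)) * h_los
            + np.sqrt(path_loss / (1 + alpha)) * h_nlos)

# Example: one SU at angle theta_su and distance d from a PU steering its beam toward 0 rad.
N, M, Ns = 10, 3, 1000
theta_pu, theta_su, d = 0.0, np.deg2rad(20.0), 50.0
a_pu = ula_steering(N, 0.5, theta_pu)                            # PU array response, Eq. (41.3)
w = np.sqrt(1.0 / N) * np.conj(a_pu)                             # weights, Eq. (41.2) with eta*Ptx = 1
s = (np.random.randn(Ns) + 1j * np.random.randn(Ns)) / np.sqrt(2)
x = np.outer(w, s)                                               # transmitted signal, Eq. (41.1)
H = rician_channel(ula_steering(M, 0.5, theta_su, sign=+1.0),    # SU response, Eq. (41.4)
                   ula_steering(N, 0.5, theta_su), d)            # PU->SU response, Eq. (41.5)
noise = 1e-6 * (np.random.randn(M, Ns) + 1j * np.random.randn(M, Ns))
y = H @ x + noise                                                # received snapshots, Eq. (41.7)
```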
41.3 PU's Half Power Beamwidth (HPBW) Estimation

In this section, we present our methods to find the Half Power Beamwidth (HPBW) of the PU's signal. Our methods are based on the received power and on the DoA error estimate. In [12], the power-based technique has been discussed thoroughly. It has been shown that the SUs facing the PU's main beam detect higher powers than those outside the beam or far away from it. Hence, it is appropriate to distribute the SUs randomly in the area so as to collect variable powers. The power received by all the SUs then outlines the PU's beam.
41.3.1 Energy Detector Based Technique

The conventional energy detector is adopted for the power-based technique since there is no prior information about the PU's signal. The collected power indicates the direction of the signal and the angular region occupied by its beam. The power received at the kth SU is given by:

p_k = \frac{1}{M N_s} \sum_{m=1}^{M} \sum_{t=1}^{N_s} |y_{k,m}(t)|^2    (41.8)

where y_{k,m}(t) denotes the mth element of y_k(t).
To cope with the unpredicted channel variations, we use a moving average (MA) filter along the power data stored at the fusion center, represented by the vector p_{ED} of length K x 1:

p_{ED} = [p_1, p_2, \ldots, p_K]^{T}    (41.9)
Applying the MA filter to (41.9) results in the filtered form p_f(k) of p_{ED}:

p_f(k) = \frac{1}{\mu} \sum_{i=0}^{\mu-1} p_{ED}(k - i)

where \mu is the filter's order. We indicated in [12] the best way to choose \mu. The method is based on calculating the sum of the absolute differences (SAD) between the estimated power vector and its smoothed version for multiple order values \mu_j:

SAD = \sum_{k=1}^{K} |p_{f,\mu_j}(k) - p_{ED}(k)|

SAD converges to a constant value after a specific order threshold \mu_t, beyond which the filtering no longer has any effect. For the rest of the paper, we fix \mu to 85% of \mu_t.
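A small sketch of the power collection, the moving-average smoothing, and the SAD-based choice of the filter order under the notation above follows; the convergence test used to pick mu_t is a simple heuristic of ours, and all names are our own.

```python
import numpy as np

def received_power(y):
    """Average received power at one SU, Eq. (41.8); y has shape (M, Ns)."""
    return np.mean(np.abs(y) ** 2)

def moving_average(p_ed, mu):
    """Causal moving average of order mu applied to the power vector p_ED."""
    p_ed = np.asarray(p_ed, dtype=float)
    p_f = np.empty_like(p_ed)
    for k in range(len(p_ed)):
        lo = max(0, k - mu + 1)
        p_f[k] = p_ed[lo:k + 1].mean()
    return p_f

def choose_order(p_ed, orders):
    """Pick mu = 85% of the order mu_t beyond which the SAD curve stops changing noticeably."""
    sad = np.array([np.sum(np.abs(moving_average(p_ed, mu) - p_ed)) for mu in orders])
    # mu_t: first order whose SAD is within 1% of the final (converged) SAD value (heuristic).
    mu_t = orders[np.argmax(sad >= 0.99 * sad[-1])]
    return int(round(0.85 * mu_t))
```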
41.3.2 Beamforming (BF) Based Technique

Given that the SUs are equipped with multiple antennas, we can use the BF technique [13]. This method estimates both the PU's DoA and the received power. The BF, also known as the delay-and-sum method, is based on estimating the DoA of the sources by maximizing the power received from the desired direction of the user. The DoA estimate of the PU is computed by:

\hat{\theta} = \arg\max_{\theta} \, u^{H} R_{yy} u    (41.10)

where R_{yy} = E{y(t) y^{H}(t)} is the covariance matrix of the received signal and E(.) denotes the expectation operator. The optimal weight vector for the conventional BF is obtained by Lagrangian optimization [13] as u = a(\theta) / (a^{H}(\theta) a(\theta)), where a(\theta) is the M x 1 steering vector of the SU array. Assuming that the received snapshots are stochastic and ergodic signals, we estimate the expectation of a random variable by its average; therefore, the covariance matrix is estimated from N_s snapshots as \hat{R}_{yy} = \frac{1}{N_s} \sum_{t=1}^{N_s} y_k(t) y_k^{H}(t). The spatial spectrum of the BF can then be re-written as:

\hat{\theta} = \arg\max_{\theta} \, \frac{1}{M^2} a^{H}(\theta) \hat{R}_{yy} a(\theta)    (41.11)
Thus, the DoA estimate at each SU is sent to a fusion center, assuming that the fusion center knows the position and the array orientation of the SUs. All the values are collected in the vector \hat{\theta}_{BF}:

\hat{\theta}_{BF} = [\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_K]^{T}    (41.12)
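A sketch of the conventional (Bartlett) beamformer of (41.10)-(41.11) at one SU, scanning a grid of candidate angles, is given below; the angle grid, steering-vector sign convention (matching (41.4)), and names are our own choices.

```python
import numpy as np

def bf_doa_estimate(y, spacing=0.5, grid=np.deg2rad(np.arange(-90, 90.25, 0.25))):
    """Conventional BF DoA estimate from the snapshots y of shape (M, Ns), Eq. (41.11)."""
    M, Ns = y.shape
    R = (y @ y.conj().T) / Ns                       # sample covariance matrix
    spectrum = np.empty(len(grid))
    for i, theta in enumerate(grid):
        a = np.exp(1j * 2 * np.pi * spacing * np.arange(M) * np.sin(theta))
        spectrum[i] = np.real(a.conj() @ R @ a) / M ** 2
    return grid[np.argmax(spectrum)], spectrum

# theta_hat, _ = bf_doa_estimate(y)   # y from the signal-model sketch above
```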
The SUs facing the PU’s main lobe are more likely to localize the PU with lower error. Relying on this fact, our second proposed method to calculate the HPBW
depends on the variation of the localization error with respect to the SUs' distribution. Considering the case of no orientation of the SUs' arrays, where they simply face the PU, the error is defined by:

\theta_e = |\hat{\theta}_{BF} - \theta_{true}|    (41.13)

where \theta_{true} is the vector of the true DoAs at all the SUs. As done in the power-based technique, we apply an MA filter to \theta_e to better observe the data.
41.4 Performance Analysis

To manage the transmission of the SUs alongside the PU, the SUs should precisely indicate the nominal direction and the range of angles occupied by the signal of the licensed user. The nominal direction of the PU is the position of the SU with max(p_f) for the ED-based technique, and min(\theta_e) for the BF-based technique. Any error in estimating these parameters severely affects the PU signal. To elaborate, any phase shift in the beam position and any error in estimating the HPBW result in two problematic cases, illustrated in Fig. 41.1 and defined, together with their impacts, as follows:

1. Angular False Detection f_d: an angular region that is falsely estimated to belong to the PU's beam. SUs that fall within this range are prohibited from transmitting, which decreases the throughput of the SU system.
2. Angular Missed Detection m_d: the angular range that is missed and not detected as part of the beam. SUs in this range are free to transmit, which causes high interference to the PU.

To calculate these two parameters, taking into account the shift that may occur and the inaccuracy in estimating the beamwidth, we define \theta_{min} and \theta_{max} as the minimum and maximum angles of the true HPBW W. Similarly, \hat{\theta}_{min} and \hat{\theta}_{max} are the minimum and maximum angles of the estimated HPBW \hat{W}, for either of the proposed techniques. Thus, for W \cap \hat{W} \neq \emptyset, we define f_d and m_d as follows:

f_d = \hat{\theta}_{max} - \theta_{max},   for \hat{\theta}_{max} > \theta_{max} and \hat{\theta}_{min} > \theta_{min}
f_d = (\hat{\theta}_{max} - \theta_{max}) - (\hat{\theta}_{min} - \theta_{min}),   for \hat{\theta}_{max} > \theta_{max} and \hat{\theta}_{min} < \theta_{min}    (41.14)

m_d = \hat{\theta}_{min} - \theta_{min},   for \hat{\theta}_{min} > \theta_{min} and \hat{\theta}_{max} > \theta_{max}
m_d = (\hat{\theta}_{min} - \theta_{min}) - (\hat{\theta}_{max} - \theta_{max}),   for \hat{\theta}_{min} > \theta_{min} and \hat{\theta}_{max} < \theta_{max}    (41.15)
Fig. 41.1 f_d error (in red) and m_d error (in blue) regions caused by the shift in the estimated HPBW
Nevertheless, if W ∩ Wˆ = Ø, f d = Wˆ and m d = W . Both metrics are studied with respect to the variation of affecting parameters in the following section.
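Both metrics can be computed directly from the true and estimated HPBW intervals. The short sketch below uses an interval-overlap formulation of our own that reduces to the cases of (41.14)-(41.15) when the intervals overlap, and to f_d = \hat{W}, m_d = W when they are disjoint; names are ours.

```python
def fd_md(true_bw, est_bw):
    """Angular false detection f_d and missed detection m_d (same units as the inputs).

    true_bw = (theta_min, theta_max) is the true HPBW W,
    est_bw  = (theta_min_hat, theta_max_hat) is the estimated HPBW W_hat.
    """
    t_min, t_max = true_bw
    e_min, e_max = est_bw
    overlap = max(0.0, min(t_max, e_max) - max(t_min, e_min))
    fd = (e_max - e_min) - overlap   # part of the estimated beam outside the true beam
    md = (t_max - t_min) - overlap   # part of the true beam not covered by the estimate
    return fd, md

# Example: true HPBW of [-3, 3] degrees and an estimate shifted to [-1, 6] degrees.
# fd_md((-3.0, 3.0), (-1.0, 6.0))   # -> (3.0, 2.0)
```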
41.5 Simulation Results

In this section, we validate our proposed approaches by computing the variation of the false and missed ranges with respect to different parameters. For example, we study the variation of the errors with different numbers of SUs. Knowing that the number of antenna elements at the PU defines the width of its signal, we also check the effect of varying the PU's beam on the estimation accuracy. It is worth mentioning that, since both the PU and the SUs use Uniform Linear Arrays (ULA), and due to their symmetric characteristics, it is enough to look at the SUs that are in the half-plane facing the PU. So, we only consider the power values obtained at the SUs in the range [-pi/2, pi/2]. Unless otherwise specified, the parameters used in our simulations are: N_s = 1000 samples, N = 10 and M = 3 antennas, power coefficient \eta = 1, P_{tx} = 30 dBm, noise power P_n = -90 dBm, C_0 = -30 dB, \gamma = 2.5, and \alpha_k = \alpha = 5.
Fig. 41.2 (a) m_d and (b) f_d versus number of SUs with \mu = 85% \mu_t
Setting \eta = 1 means that P_{tx} is distributed equally among the N antenna elements. For the following simulations, we define 100 position distributions for the SUs, and for each configuration we run 100 Monte Carlo simulations. We assume that the PU's receiver is at 0° from the PU's transmitter, which requires \theta_{PU} = 0°. The true BW is defined [15] as \beta \lambda / D, where D is the antenna array's diameter and \beta is a proportionality constant, set in most cases to \beta = 1. In the following simulations, the SUs are distributed randomly in the half area facing the PU, with a maximum distance R = 100 m. It is assumed that each SU knows its corresponding d_k; in a real-life scenario, a localization technique can be applied at the SU to compute this distance. The following simulations consider the variation of the f_d and m_d errors with respect to the number of antennas N and the number of SUs K.

In our first simulation, we check the variation of m_d and f_d with respect to the number of SUs K. The results are provided in Fig. 41.2. We denote by BF error the DoA-error-based technique, by BF power a power-based technique where the power is collected from the BF method, and by ED the energy detector technique. The results show that as the number of SUs increases, m_d and f_d decrease for the power-based techniques. ED succeeds in reaching a zero m_d for a high number of SUs; however, ED also has the highest f_d. This is due to the fact that the power techniques result in a wide estimated beamwidth, so even if a shift in the peak power occurs, all true angles will be detected within the estimated beam. On the other hand, although the DoA-error-based technique has a slightly higher m_d (about 2° for a low number of SUs and less than 1° for a high number of users), it has a significantly lower f_d. It
Fig. 41.3 (a) m_d and (b) f_d versus the number of antennas N at the PU
is noticed that f_d is constant with the variation of K for the error-based technique. This means that the estimated HPBW does not vary with the number of SUs; however, since m_d decreases, the shift in the peak position is compensated by the increase of K. The second simulation studies the effect of varying the PU's beamwidth; we vary N from 5 antennas up to 40 antennas. As N increases, the PU's beamwidth gets narrower and has a higher chance of being estimated. Figure 41.3 presents the variation of m_d and f_d as a function of N. In this simulation, the number of SUs is K = 60. For the DoA-based method, it is noticed that as N increases m_d sharply decreases, reaching zero when N = 20 antennas, where the PU's beamwidth is around 5.7°. Yet, the power-based techniques exhibit a higher f_d even for small N, where the PU's beam is already wide enough. This confirms that the estimated beam is mostly wider than the true one. One more parameter that affects the simulation is the Rician K-factor \alpha. For \alpha = 0, the channel becomes totally Rayleigh fading. We studied the estimation accuracy of both techniques when the channel is Rayleigh; the results are presented in Fig. 41.4. It is interesting to notice that the DoA-based technique outperforms the power-based technique. The latter suffers from a distinctly high and constant f_d and m_d that do not improve even with high K, whereas the former shows a zero m_d beyond K = 60 users.
Fig. 41.4 (a) m_d and (b) f_d versus number of SUs with \alpha = 0
41.6 Conclusion

This paper proposes a novel method based on the DoAs to estimate the HPBW of the PU's beam. Two errors, the angular missed and the angular false detections, result from misidentifying the correct beamwidth. We compute the variation of these errors with various system parameters. The results are compared to those of the ED-based technique proposed in a previous publication. The DoA-based technique achieves lower false detections with a slight increase in the missed detections. In addition, this approach presents good results for a narrow PU beam. It also shows high performance in a Rayleigh fading channel. After identifying the range of angles occupied by the PU's beam, the SUs can perform an interference-free transmission. This will be examined and managed in our future work.
References 1. Yucek, T., Arslan, H.: A survey of spectrum sensing algorithms for cognitive radio applications. IEEE Commun. Surv. Tutor. 11(1), 116–130 (2009) 2. Nasser, A., Al Haj Hassan, H., Abou Chaaya, J., Mansour, A., Yao, K.-C.: Spectrum sensing for cognitive radio: recent advances and future challenge. Sensors 21(7) (2021). https://doi. org/10.3390/s21072408 3. Goldsmith, A., Jafar, S.A., Maric, I., Srinivasa, S.: Breaking spectrum gridlock with cognitive radios: an information theoretic perspective. Proc. IEEE 97(5), 894–914 (2009)
4. Joneidi, M., Yazdani, H., Vosoughi, A., Rahnavard, N.: Source localization and tracking for dynamic radio cartography using directional antennas. In: 16th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), USA, pp. 1–9 (2019) 5. Yazdani, H., Vosoughi, A., Gong, X.: Achievable rates of opportunistic cognitive radio systems using reconfigurable antennas with imperfect sensing and channel estimation. IEEE Trans. Cognit. Commun. Netw. (2021). https://doi.org/10.1109/TCCN.2021.3056691 6. Yazdani, H., Vosoughi, A.: On cognitive radio systems with directional antennas and imperfect spectrum sensing. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, pp. 3589–3593 (2017). https://doi.org/10.1109/ ICASSP.2017.7952825 7. Yazdani, H., Vosoughi, A.: On optimal sensing and capacity trade-off in cognitive radio systems with directional antennas. In: IEEE Global Conference on Signal and Information Processing (GlobalSIP), November 2018, pp. 1015–1019. https://doi.org/10.1109/GlobalSIP.2018. 8646661 8. Penna, F., Cabric, D.: Cooperative DoA-only localization of primary users in cognitive radio networks. EURASIP J. Wirel. Commun. Netw. 2013(107) (2013). https://doi.org/10.1186/ 1687-1499-2013-107 9. Wang, J., Chen, J., Cabric, D.: Cramer-Rao bounds for joint RSS/DoA-based primary-user localization in cognitive radio networks. IEEE Trans. Wirel. Commun. 12(3), 1363–1375 (2013). https://doi.org/10.1109/TWC.2013.012513.120966 10. Saeed, N., Nam, H., Al-Naffouri, T.Y., Alouini, M.S.: Primary user localization and its error analysis in 5G cognitive radio networks. Sensors 19(9) (2019). https://doi.org/10.3390/ s19092035 11. Abou Chaaya, J., Picheral, J., Marcos, S.: Localization of spatially distributed near-field sources with unknown angular spread shape. Signal Process. 106, 259–265 (2015) 12. Kteish, Z., Chaaya, J.A., Nasser, A., Yao, K.-C., Mansour, A.: Estimation of the primary user’s beam width using cooperative secondary users. In: IEEE 94th Vehicular Technology Conference (VTC2021-Fall), pp. 1–5 (2021). https://doi.org/10.1109/VTC2021-Fall52928. 2021.9625154 13. Krim, H., Viberg, M.: Two decades of array signal processing research: the parametric approach. IEEE Signal Process. Mag. 13(4), 67–94 (1996) 14. Jeon, K., Su, X., Hui, B., Chang, K.: Practical and simple wireless channel models for use in multipolarized antenna systems. Int. J. Antennas Propag. 2014, 10 pp. (2014). https://doi.org/ 10.1155/2014/619304 15. Skolnik, M.: Introduction to RADAR Systems, 3rd edn. McGraw Hill, New York (2001)
Part VIII
Artificial Intelligence Innovation in Daily Life
Chapter 42
Fuzzy Decision-Making in Crisis Situation: Crowd Flow Control in Closed Public Spaces in COVID’19

Wejden Abdallah, Oumayma Abdallah, Dalel Kanzari, and Kurosh Madani
Abstract Human behavior in a crisis situation can be very different from what is expected. Although it is intrinsically related to the personality of individuals and to several other educational and innate parameters, a crisis situation can trigger various emotional characters and spontaneous behaviors in the search for a way out. These psychological expressions are multiple and can lead to decision-making that is more or less adequate to the situation. This paper presents the impact of fuzzy behavior on the decision-making process in a crisis. The objective is to control the flow in closed public spaces while avoiding crowd formation, to prevent contamination by the COVID’19 virus. We evaluate our model by simulating passenger behaviors in a closed public area in the post-pandemic COVID’19 context. In the experiments, we show the impact of the combination of rationality and emotional characters on the traffic flow and on the risk of pandemic spread.
42.1 Introduction

When an emergency occurs in certain large and crowded areas, individuals are susceptible to chaos and congestion, resulting in massive economic losses and even deaths [4]. As a result, in recent years there has been a great deal of interest in
modeling and simulating evacuation crowds. In the literature, crowd behavior models can be divided into two categories: macroscopic models and microscopic models. Macroscopic models describe the overall movement of the population but do not offer individual details. On the other hand, microscopic models focus on the details of individuals; they have been used in several crowd simulation studies to better understand crowd behavior during emergencies. Microscopic models include cellular automata models, force-based models, and agent-based models [6]. Agent-based models are widely used because they add more human-like features to crowd simulations, allowing each agent in the crowd to be entirely autonomous [6]. Changes in the environment and in the surrounding agents may affect the variation of emotions during such an emergency scenario, and this variation of emotions may in turn influence agent decision-making as well as the spread of the virus. In this paper, we propose an agent-based model that combines rational behavior, evolving emotional behavior, and fuzzy decision-making to simulate human behavior in a crisis situation. We aim to analyze the impact of these behaviors on making the right decision and overcoming the crisis.

The rest of this paper is organized as follows. Section 42.2 gives an overview of the related works. Section 42.3 describes the formulation of the model and the proposed solution. In Sect. 42.4 we detail and discuss some experiments and results. Finally, Sect. 42.5 concludes the paper and gives future perspectives.
42.2 Related Works

Simulating uncertainty and ambiguity in human behavior is crucial in many different situations, such as emergency evacuation, to increase the credibility of crowd simulations. A fuzzy logic approach offers some benefits over other approaches, such as the ability to exploit perceptual information and human knowledge and to mimic human thought. Therefore, the fuzzy logic approach provides a natural and appropriate tool to simulate the dynamic behavior of individuals [10]. This section details some works based on fuzzy logic for human crowd evacuation. Zhu et al. [12] developed a fuzzy-logic-based mathematical model for pedestrian evacuation, in which each pedestrian's behavior is linguistically described by some simple qualitative physical and psychological principles. In addition, Dell'Orco et al. [3] presented a microscopic, time-discrete behavioral model of crowd dynamics that incorporates the fuzzy perception and anxiety embedded in human reasoning. The model was able to reproduce typical crowd evacuation phenomena such as rainbow-like arching structures; it is also capable of handling choice behavior while taking the 'herding behavior' factor into account. In [7] the authors used genetic algorithms, neural networks, and fuzzy logic to create learning agents that can change their behavior during an evacuation. Emotions, memory, behavior rules, and decision-making abilities are all characteristics of these agents, and the authors claim that the prototype implementation produces simulations that are similar to real-world scenarios. Zakaria et al. [11] proposed a model that aims to create a human-like
evacuation model by incorporating key emotions observed in real-world videos into their approach. Without ignoring the most important aspect of an emergency, stress, the Lazarus theory of cognitive appraisal in stressful situations was chosen to be included in their model. While the aforementioned research has produced great findings on crowd dynamics using a fuzzy logic approach, human interaction (communication) and emotional variation within the crowd are neglected; the resulting models therefore cannot closely match real-life scenarios. A fuzzy logic method can nonetheless be applied to map the relationships between emotions and decision-making.

The significant increase in daily cases during the COVID’19 pandemic has encouraged researchers to invest in this field, in order to reduce the number of infected individuals and, in particular, to establish a safe evacuation while avoiding crowding, which is a critical task in an extremely contagious situation. Şuvar et al. [9] present in their work the main challenges to which emergency evacuation models must respond, particularly those that consider social distancing and interaction between individuals within a given distance, all to reduce the risk of disease transmission during the evacuation of a building. Ronchi et al. [5] examined in their study the use of crowd modeling tools during pandemics, focusing on aspects related to physical distancing, and presented a methodology intended to guide crowd model users. To motivate our choice, we compare previous works using some parameters, as illustrated in Table 42.1. A review of previous works allowed us to identify several flaws. Firstly, the majority of these works have one similar theme: they use the multi-agent system to simulate crowd evacuation. Secondly, FL-based works do not use a communication module to simulate a crowd, and the authors do not integrate different emotions to mimic real human behavior in emergencies. No work combines emotions, fuzzy logic, and the agent-based modeling approach together with communication between agents. In the case of COVID’19, existing works investigate the use of crowd modeling tools with a focus on aspects related to social distancing.
42.3 Proposed Approach

To better illustrate the role of decision-making in crowd situations, we propose an architecture that combines adaptive rational reasoning with fuzzy logic. In a crowd situation, and more specifically in an emergency, the most important thing is to get out quickly with the least damage. In such a case, several emotional characteristics, as well as external parameters, can impact the decision. Hence, the idea is, first, to apply fuzzy logic to follow the dynamics of the emotional characters during the evacuation process and, second, to insert fuzzy logic in the final decision-making based on the individual and external parameters. The proposed approach is called the Fuzzy Susceptible Pedestrian Agent (FSPA). It combines Rational Reasoning with Emotional Reasoning. The architectural process is made up of five distinct modules, as shown in Fig. 42.1:
Table 42.1 Comparative table of previous works

References | Agent based | Fuzzy logic | Application of fuzzy logic | Emotions
[3]        |             | X           | The deviation from the original path |
[7]        | X           | X           | Explore how agents can learn and adapt their behavior during an evacuation | X
[11]       |             | X           | Imitating human behaviour during real evacuation | X
[2]        |             | X           | Translate the forces modeled by SFM equations into desire and interaction effects |
[8]        | X           | X           | Analyze the impact of stress and panic on aircraft evacuation times | X
[12]       |             | X           | Influence of some parameters on time evacuation and dynamical features of pedestrian |
• Computing nearest gate: based on the positions of the gates, each agent computes its distance to every gate using the Euclidean distance d(x, y) = \sqrt{\sum_{j=1}^{m} (y_j - x_j)^2} and then selects the closest gate.
• Knowledge Data Base: contains the internal knowledge of an agent, e.g., current state, current position, emotional parameters, and their variation over time.
• Rational Adaptive Reasoning: based on computing the shortest distance between the agent and the selected gate. The reasoning is adaptive because, for each new position, it adapts to the minimum distance to the closest and safest gate; the safest one because, as the agent advances, its status is updated. Depending on the state of its neighbors and the number of infected agents detected, the state is updated by applying the SIR model [1].
• Fuzzy Temperament Modelling: based on fuzzy logic, it includes two opposite emotional characters, fear and confidence. The module is designed to trace the evolution of the emotional characters during the simulation.
Fig. 42.1 Fuzzy susceptible pedestrian agent (FSPA) architecture
• Dominant Character Emergence: to select the most dominant character for each simulation time step.
• Fuzzy Decision-Making: the final agent action with state-based fuzzy logic.
42.3.1 Rational Adaptive Reasoning

In this module, the agent decides its next position based on the gate it selected at the beginning and on its current position. Then, each FSPA receives information from the neighboring agents, namely their states and positions. Through this information, the agent can know how many of its neighboring agents are infected and, if necessary, its state will be updated. If the number of infected agents is significant, it will affect the agent's current state: if the number of infected neighbors is greater than 2, the agent state will change from Susceptible to Infected using the SIR model [1].
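A minimal sketch of this module, selecting the closest gate by Euclidean distance and updating the agent's state from the neighbours' reports, is given below. The threshold of 2 infected neighbours follows the rule above; the class, attribute, and state names are our own.

```python
import math

class FSPA:
    def __init__(self, position, state="susceptible"):
        self.position = position            # (x, y)
        self.state = state                  # "susceptible" or "infected" (SIR-style states)

    def nearest_gate(self, gates):
        """Select the closest gate using the Euclidean distance."""
        return min(gates, key=lambda g: math.dist(self.position, g))

    def update_state(self, neighbour_states):
        """Susceptible -> infected when more than 2 infected neighbours are detected."""
        infected = sum(1 for s in neighbour_states if s == "infected")
        if self.state == "susceptible" and infected > 2:
            self.state = "infected"
        return self.state

# agent = FSPA(position=(12.0, 4.0))
# gate = agent.nearest_gate([(0.0, 0.0), (25.0, 10.0)])
# agent.update_state(["infected", "infected", "infected", "susceptible"])
```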
42.3.2 Fuzzy Temperament Modelling

Human behavior patterns are strongly influenced by psychological processes, especially emotions, which impact cognitive processes and decision-making. In an emergency crowd situation, certain emotional characters appear and strongly influence decision-making. To provide a better global understanding of the impact of emotions on decision-making, we choose two divergent emotional traits: "Fear" and "Confidence". The choice of these two characters is a first step to observe this impact, and others can be added later. This module is based on a fuzzy inference system that learns the temperament of each agent and has the following steps: fuzzification of the input variables, rules-based reasoning using IF-THEN rules, and defuzzification. Starting with the first step, fuzzification, which converts each piece of input data into a degree of membership, there are four input
variables to consider: the number of infected neighbors, noted NBIF (Low, Medium, and High), the distance to the selected gate (Near, Medium, and Far), the intensity of fear (CH1), and the intensity of confidence (CH2), the latter two with linguistic ranges Totally, Partially, and Not. The fuzzification generates two outputs that denote the variation of the two emotional characters: VFear and VConfidence. The various linguistic variables of the inputs and outputs are shown in Table 42.2.

Table 42.2 Linguistic variables for the fuzzy temperament variation module

Inputs                               | Linguistic ranges
Number of infected neighbors (NBIF)  | Low, Medium, High
Distance                             | Near, Medium, Far
Fear                                 | Totally fear, Partially fear, Not fear
Confidence                           | Totally confident, Partially confident, Not confident

The second step is the rules-based reasoning, which contains all possible fuzzy relations between inputs and outputs. These rules are expressed in IF-THEN format. Some of the rules of this module, which are based on assumptions, are the following:

• IF (CH1 is Totally fear) AND (CH2 is Not confident) AND (NBIF is Low) THEN (VFear is Partially fear) AND (VConfidence is Partially confident)
• IF (CH1 is Partially fear) AND (CH2 is Totally confident) AND (NBIF is High) THEN (VFear is Totally fear) AND (VConfidence is Partially confident)

The resulting fuzzy values of the emotional characters from the second step are then converted into numbers in the last step, defuzzification. The result of this step is the variation of each emotional character. The defuzzified output is evaluated in the next module of our contribution.
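As an illustration of the fuzzification and defuzzification steps, the self-contained sketch below implements a tiny Mamdani-style inference for the temperament module with triangular membership functions and centroid defuzzification. The membership breakpoints and the use of only the two sample rules above are assumptions for demonstration only.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0)

# Assumed membership breakpoints: characters on a 0-100 scale, NBIF on a 0-10 scale.
FEAR_CONF = {"not": (-1, 0, 50), "partially": (20, 47, 75), "totally": (50, 100, 101)}
NBIF = {"low": (-1, 0, 4), "medium": (2, 5, 8), "high": (6, 10, 11)}

def temperament_variation(ch1, ch2, nbif):
    """Return (VFear, VConfidence) from the two sample IF-THEN rules, centroid-defuzzified."""
    u = np.linspace(0, 100, 501)            # output universe for both characters

    def centroid(m):
        s = np.sum(m)
        return float(np.sum(u * m) / s) if s > 0 else 50.0

    # Rule 1: totally fear AND not confident AND low NBIF -> partially fear, partially confident
    w1 = min(tri(ch1, *FEAR_CONF["totally"]), tri(ch2, *FEAR_CONF["not"]), tri(nbif, *NBIF["low"]))
    # Rule 2: partially fear AND totally confident AND high NBIF -> totally fear, partially confident
    w2 = min(tri(ch1, *FEAR_CONF["partially"]), tri(ch2, *FEAR_CONF["totally"]), tri(nbif, *NBIF["high"]))
    fear_agg = np.maximum(np.minimum(w1, tri(u, *FEAR_CONF["partially"])),
                          np.minimum(w2, tri(u, *FEAR_CONF["totally"])))
    conf_agg = np.maximum(np.minimum(w1, tri(u, *FEAR_CONF["partially"])),
                          np.minimum(w2, tri(u, *FEAR_CONF["partially"])))
    return centroid(fear_agg), centroid(conf_agg)

# v_fear, v_conf = temperament_variation(ch1=80, ch2=10, nbif=2)
```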
42.3.3 Dominant Character Emergence

After obtaining the variation of the emotional characters (the defuzzified output), the agent selects its dominant character (DChar), i.e., the character with the higher value of the two, based on their variations.
42.3.4 Fuzzy Decision-Making

The decision-making process is a sophisticated process influenced by a plethora of parameters, including emotion and behavior. An external agent (Advisor) is added to control the agent by providing advice to move (good) or halt (not good). This
advice helps to investigate the agent's cooperative behavior with the external agent. The fuzzy decision-making is based on fuzzy logic, as in the Fuzzy Temperament Modelling module. There are three input variables to take into account: the Advice received from the external agent, the Current State, and the dominant character (DChar). The result of the fuzzy logic is the agent's Cooperative Behavior, which can take one of five linguistic values: Very High, High, Medium, Low, and Very Low, expressing the degree of cooperation of each agent with the external agent. Since fuzzy decision-making is also based on fuzzy logic, all the fuzzy logic steps are applied as in the Fuzzy Temperament Modelling module. The various linguistic variables of the inputs and outputs are shown in Table 42.3. After the fuzzification of the various variables, the next step is to generate fuzzy rules based on assumptions. Some rules of the Fuzzy Behavior module are illustrated as follows:

• IF (DChar is Totally fear) AND (State is High Infected) AND (Advice is Not Good) THEN (Cooperative Behavior is High)
• IF (DChar is Partially confident) AND (State is High Infected) AND (Advice is Not Good) THEN (Cooperative Behavior is Low)

The same rules apply to susceptible agents (Low Infected). The agent's decision may be influenced by the cooperative behavior that has been defuzzified using fuzzy logic. The agent's decision can be one of two propositions: stop at its current position or move to the next position. The decision is based on three major variables: the current position, the next position, and the cooperative behavior, which influences the final next position. To accomplish this, we again use IF-THEN rules. Based on assumptions, some rules must be followed to reach the final new position, for example:

1. IF Cooperative Behavior is Very High THEN the final new position is the next position
2. IF Cooperative Behavior is Medium THEN the final new position is the current position
Table 42.3 Linguistic variables for fuzzy behavior
Inputs and their linguistic ranges:
- Current state: Low infected, High infected
- Advice: Good, Not good
- Dominant character: Totally, Partially, Not
Output and its linguistic ranges:
- Cooperative behavior: Very low, Low, Medium, High, Very high
42.4 Experiments and Results
Experiments were performed to evaluate the effectiveness of our fuzzy approach to managing flows in a crowd situation. To validate our approach, we used a closed public space as a simulation context in the case of COVID-19. Each public space has several emergency exits; for our simulation, we chose two gates. Multiple scenarios were created to perform a preliminary evaluation of our proposed model. The scenarios are based on a simulation of a variable number of agents over many steps to reach the gates. The values obtained for the fear and confidence variation and for the dominant character are the result of our contribution's assumptions, which use fuzzy logic as described in Table 42.4. By hypothesis, we suppose that there is a quarantine period when the agent is infected. In our simulation, the infected agent is blocked for a few steps in its position and is then redirected to a gate reserved for infected agents, to avoid contaminating others. In this case, the role of the advisor is important: it gives good advice for the routing of these infected agents.
We began by simulating a simple evacuation scenario in an epidemic situation. We started with 100 agents, during 20 steps, some of whom are infected and the others are not; as shown in Fig. 42.2, we have 50 infected agents at the beginning of this simulation. According to Fig. 42.3, in step 3 the dominant character (Partially Fear or Partially Confidence) becomes more prominent, and the number of infected agents decreases, as shown in Fig. 42.4. This dominance of Partially Fear or Partially Confidence influences the agent's cooperative behavior. The decrease in the number of infected agents after step 3 is explained by the external agent's control and authority in this situation. As a result, in the crowd situation the external agent was able to reduce contamination.
After the first simulation, we launched a simulation of 200 agents over 40 steps. The simulation result is presented in Fig. 42.4. We notice that at the start of the simulation there are 100 infected agents; this number increases in the subsequent steps. At step 3 we notice a decrease in the number of infected agents, with only 20 remaining. This number gradually decreases until we have no infected agents at step 32. As illustrated in Fig. 42.5, at step 0 we have various values of the emotional characters (Totally, Partially and Not fear as well as Totally, Partially and Not confidence). However, in step 2 we notice that more than 150 agents have a partially fearful and partially confident character, while only 25 infected agents are not fearful and not confident, as illustrated in Fig. 42.5. The decrease in the number of infected agents
Table 42.4 Significance of the intervals for the emotional characters and the dominant character
- ]0, 50[: Not (fear or confidence)
- ]20, 75[: Partially (fear or confidence)
- ]50, 100[: Totally (fear or confidence)
Fig. 42.2 Number of susceptible and infected agents (y-axis) at each step (x-axis) for the simulation with 100 agents, where blue (0) represents susceptible agents and red (1) represents infected agents
in the following steps is due to this change of the emotional characters from step 0 to step 2. In a crisis, the psychological factor plays a key role. We can optimize each agent's final decision by considering emotions. This reinforces our choice to prevent crowding and contain the disease by adding the emotional aspect to the COVID-19 pandemic simulation. In particular, fuzzy logic is effective at imitating real human emotion; its application in crowd situations to simulate agent temperament is therefore crucial for correct agent decision-making. Based on the experiments mentioned in the previous section, and on others, we notice the impact of the emotional characters on the final decision, as well as the role of the advisor in controlling the decisions of the agents that show cooperative behavior. The emotions in our simulations are based on hypotheses. To get closer to reality and make simulations in real time, we could use EEG (electroencephalogram) signals to detect the emotion at time t and observe its variation in our simulation context.
Fig. 42.3 Agent dominant character variation, in which the x-axis represents the steps and the y-axis shows the value of the dominant character as described in Table 42.4
Fig. 42.4 Number of susceptible and infected agents (y-axis) at each step (x-axis) for the simulation with 200 agents, where blue (0) represents susceptible agents and red (1) represents infected agents
Fig. 42.5 Variation of fear and confidence from step 0 to step 2, where the x-axis represents the value of each emotional character as defined in Table 42.4 and the y-axis depicts the number of agents
42.5 Conclusion
To understand agent behavior and decision-making during the COVID-19 pandemic, we combine rational and emotional reasoning, using two emotional characters: fear and confidence. In this paper, we have integrated fuzzy logic into two parts of our approach: on the one hand, in the temperament module, to follow the dynamics of the emotional characters during the simulation; on the other hand, in the final decision. According to the simulations, the emotional variation influences the agent's decision-making as well as its cooperative behavior with the advice of the external agent; these results show the importance of studying emotion variation during an emergency. Future research will focus on incorporating more emotions into our work to be more representative of real human behavior, on integrating negotiation between multiple agents to resolve conflicts and generate value in their interactions, and on using machine learning to enhance agent decisions through experience.
Chapter 43
Perception of 3D Scene Based on Depth Estimation and Point-Cloud Generation Shadi Saleh , Shanmugapriyan Manoharan, Julkar Nine, and Wolfram Hardt
Abstract Depth estimation and object instance recognition capabilities in 3D space are critical for autonomous navigation, localization, mapping, and robotic object manipulation. RGB-D images and LiDAR (Light Detection and Ranging) point clouds are the most descriptive formats for depth information. However, these depth sensors have many shortcomings, such as low effective spatial resolution and capturing a scene from a single perspective. Our work focuses on reproducing a denser and more comprehensive 3D scene structure from given monocular RGB images, employing depth estimation and 3D object detection. The first contribution is the proposal of two architectures for estimating depth maps based on an unsupervised learning system, which are used to analyze structure from motion and 3D geometric constraints and deliver better results than existing methodologies. The second contribution is the generation of a 3D point cloud based on the estimated depth map and a monocular RGB frame; the primary goal of this part is to enhance the visualization of the 3D scene structure. Our proposed solutions achieve better or comparable outcomes in terms of ARD, RMSE, RMSE (log), accuracy and other evaluation metrics compared to state-of-the-art methods, with a maximum depth of 80 m.
S. Saleh (B) · J. Nine · W. Hardt Department of Computer Engineering, Chemnitz University of Technology, Straße der Nationen 62, 09111 Chemnitz, Germany e-mail: [email protected] J. Nine e-mail: [email protected] W. Hardt e-mail: [email protected] S. Manoharan Embedded Systems, Chemnitz University of Technology, Straße der Nationen 62, 09111 Chemnitz, Germany © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_43
43.1 Introduction
Automobiles have become a primary means of transportation in today's world. Notable advancements in the automotive domain are Advanced Driver Assistance Systems (ADAS) and autonomous driving [1]. As road traffic increases rapidly, these advancements mainly ensure the safety of the driver and passengers. Every year, 20-50 million people are injured and around 1.2 million die in car accidents. Several applications developed in ADAS, like Adaptive Cruise Control (ACC), collision and lane-departure warning, blind-spot detection, and others, reduce accidents [2]. It is advantageous to be aware of the three-dimensional (3D) view of the surroundings in order to react more effectively in dangerous situations.
In computer vision (CV), reconstructing the 3D scene from a two-dimensional (2D) image is one of the fundamental challenges [3, 4]. We use depth estimation to perceive the spatial structure of the scene. In the real world, each object has three dimensions, and the third dimension of an object is lost when it is projected onto the image plane. In 3D computer vision and computer graphics (CG), 3D structure and scene geometry reconstruction are fundamental. Restoring the 3D structure from multiple views requires considerable effort, using techniques such as multi-view stereo and structure from motion; single-view metrology methods have likewise been employed to recover 3D geometry from a single view. Similarly, depth map estimation is one of the techniques used to reconstruct the scene geometry and 3D structure.
Common depth-sensing techniques include Sound Navigation Ranging (SONAR), Radio Detection and Ranging (RADAR), LiDAR, and stereo vision and its variants such as structured-light sensors. Still, these depth sensing techniques suffer from many shortcomings (such as sparse depth, sensitivity to weather conditions, and high cost). The available depth sensors cannot achieve both better Range, Resolution, and Accuracy (RRA) and lower Size, Weight, and Power (SWP). These limitations make the available sensors inadequate for most hardware platforms, small robots, and embedded systems. The monocular depth estimation technique performs better than stereo vision as the monocular camera works efficiently in low-texture regions, and it is less expensive, lighter, consumes less power, and produces a denser depth map. However, estimating the depth of a monocular RGB image is an ill-posed problem.
Existing research estimates the depth map using CV and deep learning. The CV approaches mainly focus on indoor scenes with fewer environmental conditions, and they are not suited for outdoor use due to a few limitations: these algorithms only address geometric differences (triangulation), and predicting the camera motion and the scene geometry is a challenging task. Thus, the depth estimated by the CV approaches is sparse and not dense. With the advancement of deep learning techniques, various CNN architectures have been proposed to understand 3D and scene geometry. Besides depth estimation, 3D object detection is also a crucial task for autonomous driving. The recognition of 3D objects in CV consists of recovering the 3D information of the object's pose, volume, or shape. The reconstruction of a 3D object from a single 2D image is an ambiguous task.
Fig. 43.1 Comprehension of 3D Scene geometry from 2D RGB image
Humans are capable of perceiving a 3D structure from a 2D image. In Fig. 43.1b, the gray car is closer to the camera than the white car, and the white car is closer to the camera than the building in the background. The road surface faces up towards the sky (see Fig. 43.1c), and the gray car has a cuboidal shape even without looking at its occluded regions (see Fig. 43.1d). The projection of a 2D image into a 3D world results in infinite solutions; hence, humans rely on learning from previous experience to resolve the ambiguity. Depth estimation helps us understand the third dimension from a single 2D image (its 3D properties), and regressing the 3D bounding boxes based on the 2D object and its 3D properties helps us perceive the 3D structure and the scene geometry more precisely.
This paper proposes a precise, cost-effective, and low-power strategy for perceiving a 3D scene in real time. An effective solution based on a CNN was developed to recognize depth information and generate point clouds from a frontal monocular camera, capable of discerning which components of the scene are close to and which are further away from the camera pose. With the help of the estimated depth and the produced point cloud, we may examine objects and surfaces that appear in flat photos as 3D scenes.
43.2 Background Research There are various approaches that focus on comprehending the 3D scene based on estimating the depth map and detecting objects in 3D space. This is performed either from stereo image pairs, from images captured from multiple views, from static scenes, or with a fixed camera location. Here we focus on improving the results of the existing approaches [5, 6] by proposing a new architecture to estimate the depth map.
43.2.1 Supervised Monocular Depth Estimation The supervised techniques get a single image for training the depth network using the depth data from supervision, which is measured by RGB-D cameras or multi-channel laser scanners. Saxena et al. [7] presented a method for acquiring the functional
mapping of depth from visual cues using a Markov random field and learning 3D orientation and local plane position using a patch-based method [8]. A multi-scale convolutional architecture was proposed by Eigen et al. [9], which learns the global depth predictions (coarse) using one network and refines the predictions with the help of another network. These two networks learn the depth from the raw pixels without any prior feature extraction. Eigen et al. [9] also incorporates strong scene priors for surface normal estimation [10], improves accuracy by employing conditional random fields or by shifting the learning problem from regression to classification [11]. The state-of-the-art result was achieved by Fu et al. [12] using quantized ordinal regression. Yin et al. [13] use the geometric constraints in the reconstructed 3D space to determine the randomly chosen points.
43.2.2 Semi-supervised Monocular Depth Estimation Some research has been carried out to estimate the depth map using semi-supervised or weakly supervised learning frameworks. Chen et al. [14] proposed an approach that predicts the depth map from unconstrained images by utilizing the relative depth and a depth-ranking loss function. Kuznietsov et al. [15] proposed a technique to overcome the challenges of obtaining high-quality depth data by providing sparse LiDAR data for supervision and using an image alignment loss for training the depth network.
43.2.3 Self-supervised Monocular Depth Estimation A self-supervised approach requires only rectified stereo image pairs for the training of the depth network. Garg et al. [16] and Godard et al. [17] presented a method for reconstructing an image from an intermediate depth estimate. The network produces the right view from the left view with the estimated disparities and uses the error between the left and right views as the reconstruction loss. This approach makes clear why a synchronized stereo pair, rather than ground truth depth data, is needed to estimate the depth map, which reduces the effort required to acquire new scenes. There are a few accuracy gaps when compared to supervised learning techniques. An encoder-decoder architecture was introduced by Garg et al. [16], which is trained using the photometric reconstruction error. Godard et al. [17] also presented a network with a left-right consistency loss, which improves the depth prediction. An effective architecture utilizing a robust re-projection loss, multi-scale sampling, and an auto-masking loss to improve the estimation of depth was proposed by Godard et al. [18].
43.2.4 Unsupervised Monocular Depth Estimation There are several approaches that employ sequential data to perform monocular depth estimation. Yin et al. [19] present an architecture comprised of two generative networks which are trained to estimate the disparity map. Zhou et al. [5] proposed an approach to estimate the depth map and ego-motion from a monocular video using unsupervised learning. Mahjourian et al. [6] presented an approach that additionally considers the scene's 3D geometry and enforces consistency of the estimated 3D point clouds and ego-motion across successive frames. Wang et al. [20] implemented a differentiable pose predictor to improve the performance of the depth network. The supervised and semi-supervised approaches require ground truth data to be gathered using LiDAR technology, which is an expensive and time-consuming procedure. The self-supervised approach performs better than the previous approaches, but the network degrades in performance and requires post-processing techniques to enhance accuracy. The unsupervised approach uses an encoder-decoder architecture to achieve better results, and its overall performance is enhanced by monocular visual cues. In addition, a point cloud generation algorithm is proposed to better visualize the 3D scene structure from a monocular image.
43.3 Dataset Localizing and recognizing the object along with its estimated distance are not enough for autonomous cars. More accurate information about the 3D environment is necessary for autonomous cars using depth estimation and 3D object detection techniques. While driving an autonomous car, these techniques are used to avoid collisions or accidents. This section describes the dataset used for depth estimation and 3D object detection.
43.3.1 KITTI Various scenarios, such as city, residential, road, campus, and people, are included in the KITTI dataset [21]. Complex CV tasks can be solved using this dataset. The dataset is categorized into stereo, depth, optical flow, 2D and 3D object detection, tracking, and visual odometry. The RGB images are used to train the unsupervised depth models. There are nearly 42 K stereo pairs from 61 scenes for training the depth model, with an image resolution of 1242 × 375. This dataset is also used to train the pose network, and the pose network enhances the depth network's performance.
43.3.2 Cityscapes Most of the available datasets for depth estimation do not deal with high variance. For example, the KITTI dataset includes 6 h of video recorded in a German metropolitan area [22]. The Cityscapes dataset is used to train and evaluate multiple approaches, namely depth prediction and semantic labeling. This dataset includes many stereo video sequences recorded in the streets of 50 different cities, to reduce city-specific over-fitting [22]. There are 5 K high-quality images with pixel-level annotations and 20 K coarsely annotated images to enable techniques that support large volumes of weakly labeled data. The images were acquired over several months, covering different seasons (summer, autumn, and spring). All the images in this dataset include the logo of the car used to capture the dataset. This dataset is mainly used to train depth networks; in the preprocessing step, the car logo was removed before training the network.
43.4 Implementation
Estimating depth information using classical computer vision approaches is a well-studied topic. The majority of these studies have focused on binocular vision (stereopsis) and on algorithms that involve multiple images, such as Structure From Motion (SFM) and depth from defocusing. However, these algorithms only address geometric triangulation differences, and the traditional and SFM approaches are limited in predicting the camera's motion and the 3D scene structure. Standard methods tackle the problem by detecting the same scene point in successive frames, constant across the images; however, the depth estimation problem remains even when the points of correspondence between the left and right images are considered. Conventional methods also rely on several manual assumptions (such as local planarity and a fixed camera position); methods based on deep learning are applied to avoid such assumptions. A CNN can learn complex features that map an image onto its depth map. The primary task of the CNN is to capture, with real-time capability, the relationships between the depth of the scene and the image composition, the scene interpretation, and the local and global connections in the image. Through the implementation of CNNs, both stereo cameras and monocular camera sensors are used to estimate depth. The stereo approaches employ standard geometry to compute disparity maps for image pairs; the trained CNN then uses the predicted disparity maps to estimate the depth map. Usually, the monocular camera approach estimates a depth map by training the CNN with a single view. This section deals with depth estimation based on unsupervised learning from a single image. This depth estimation approach was extended by the proposal of an algorithm to generate a point cloud based on the estimated depth map and the input image, as shown in Fig. 43.2. The depth estimation model was trained using only RGB images based on an unsupervised learning framework.
Fig. 43.2 Systematic design of the proposed system
The trained model is capable of predicting the depth map at test time from a single image.
43.4.1 Proposed Architecture The proposed architectures were inspired by DispNet from SfMLearner [5], which estimates accurate depth from a single view. This architecture was chosen because of its encoder-decoder structure (which includes skip connections) and its multi-scale predictions. Figure 43.3 depicts the Depth Network (D-Net) 1 architecture. The red font indicates the change in architecture compared to the state-of-the-art
Fig. 43.3 D-Net 1 architecture
Fig. 43.4 D-Net 2 architecture
architecture. In this architecture, the kernel sizes of the first four convolutional layers are 7, 7, 5, and 5, and the kernel size of the remaining layers is 3. The total number of parameters for this architecture is 21.3 million, compared to the state-of-the-art model's 33.2 million. The convolutional layers' output channels are 16, 32, 64, and so on. This network architecture concatenates the convolutional and deconvolutional layers through so-called "skip connections": the skip connections leap over some layers in the network and provide the output of one layer as input to later layers, rather than only to the next one. The D-Net 2 architecture is shown in Fig. 43.4. Its 7th convolutional and deconvolutional layers have 1024 output channels, indicated in white font. The increase in the number of output channels increases the number of abstractions that the network architecture can extract from the input RGB image. Similar to the D-Net 1 architecture, the D-Net 2 architecture has the same kernel sizes but different output channels (32, 64, 128, and so on) for the convolutional layers. The total number of parameters for this architecture is 61.5 million, compared to the D-Net 1 model's 21.3 million. The output channel count was increased at the final convolutional and deconvolutional layers to transfer more features to the successive layers. Like D-Net 1, this architecture also includes skip connections.
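As an illustration of the kind of encoder-decoder with skip connections described above, the following Keras sketch builds a reduced network with the reported kernel sizes (7, 7, 5, 5, then 3) and a 16-32-64-... channel progression. The number of layers, the single sigmoid disparity head and all other details are simplifying assumptions and do not reproduce the exact D-Net 1 or D-Net 2 layout.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, kernel):
    # Stride-2 convolution: halves the spatial resolution at each encoder level.
    return layers.Conv2D(filters, kernel, strides=2, padding="same",
                         activation="relu")(x)

def build_depth_net(input_shape=(128, 416, 3)):
    inp = layers.Input(shape=input_shape)

    # Encoder: kernel sizes 7, 7, 5, 5, 3 and channels 16, 32, 64, 128, 256
    # (a truncated version of the progression described in the text).
    kernels = [7, 7, 5, 5, 3]
    filters = [16, 32, 64, 128, 256]
    skips, x = [], inp
    for k, f in zip(kernels, filters):
        x = conv_block(x, f, k)
        skips.append(x)

    # Decoder: upsample, concatenate the matching encoder feature map
    # (the "skip connection"), then refine with a 3x3 convolution.
    for f, skip in zip(reversed(filters[:-1]), reversed(skips[:-1])):
        x = layers.Conv2DTranspose(f, 3, strides=2, padding="same",
                                   activation="relu")(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)

    # Final upsampling back to the input resolution and a disparity head.
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same",
                               activation="relu")(x)
    disp = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
    return tf.keras.Model(inp, disp)

model = build_depth_net()
```

The 128 × 416 input used here matches the training resolution quoted later for Method 3 and keeps all skip-connection shapes compatible.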
43.4.2 Depth Network Training The three different ways that the training is carried out to estimate the depth map are described below: Method 1 uses the structure from the motion approach and is trained using the Tensorflow deep learning framework and KITTI dataset. The preprocessed data is used to train the network architecture such that the training image’s resolution is 126 × 418. In the network architecture, all the layers are trained using batch
normalization, except the output layers, as batch normalization stabilizes each layer's input for every mini-batch. The optimizer used for this approach is the Adam optimizer, which updates the network's weights iteratively based on the training data. The learning rate, which controls the weight updates in the optimization algorithm, is 0.0002. The momentum, which accumulates the gradients from the previous steps and determines the direction of the training, is 0.9 in this method. The other hyperparameters are a smoothness weight of 0.5, a batch size of 2, and a total of 20 training epochs. Although the training data has a fixed resolution, at inference time the model can predict the depth of an image of any arbitrary size.
Method 2 also uses the structure-from-motion approach and is trained using the Chainer deep learning framework and the KITTI dataset. The Chainer framework provides faster performance compared to the Tensorflow framework; hence, this framework is used to train and evaluate the model performance. As in Method 1, the hyperparameters used are the Adam optimizer, a learning rate of 0.0002, momentum 0.9, batch size 2, and 20 epochs. This method also predicts the depth map for images with arbitrary resolutions.
Method 3 uses the 3D geometric constraint approach and is trained using the Tensorflow deep learning framework and the KITTI and Cityscapes datasets. Hyperparameters similar to Method 1 are used; additionally, a reconstruction weight of 0.85 and a structure similarity loss weight of 0.15 are used to improve the model performance. As in Method 1, this method is trained with images resized to a resolution of 128 × 416, and during testing or inference the model can predict depth for any arbitrary image size.
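The reported hyper-parameters can be summarized in a small configuration sketch; interpreting the quoted momentum of 0.9 as Adam's beta_1 is an assumption made here for illustration.

```python
import tensorflow as tf

# Optimizer as described: Adam with learning rate 2e-4 and momentum 0.9.
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.9)

# Loss weights and training settings quoted in the text.
config = {
    "batch_size": 2,
    "epochs": 20,
    "image_size": (128, 416),
    "smoothness_weight": 0.5,       # Method 1
    "reconstruction_weight": 0.85,  # Method 3
    "ssim_weight": 0.15,            # Method 3 (structure similarity loss)
}
```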
43.4.3 Point Cloud Generation Algorithm The algorithm used to generate the point cloud information from the RGB image and the corresponding depth map is shown in Fig. 43.5. As the resolution of the depth map differs from that of the RGB picture, the depth map is scaled. The open3D library
Fig. 43.5 Point cloud generation algorithm
capability aids in the generation of an RGB-D picture using an RGB image and a depth map as input. The RGB-D picture pair consists of color and depth images seen at the same resolution and from the same viewpoint. The RGB-D picture and the camera intrinsic characteristics are then utilized to build the point cloud data. As an example, the camera parameters of the KITTI dataset for the sequence 2011 09 26 drive 0001 are: focal length [828.285 813.462] pixels, principal point [487.963 332.708] pixels, and radial distortion [828.285 813.462 0 0 0]. Finally, the generated point cloud data is stored in the binary (.bin) format.
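A possible realization of this step with the Open3D library is sketched below. The file names, the depth scale and the truncation distance are assumptions made for illustration; the intrinsic parameters are the KITTI values quoted above.

```python
import numpy as np
import open3d as o3d

color = o3d.io.read_image("frame.png")             # monocular RGB frame
depth = o3d.io.read_image("estimated_depth.png")   # depth map scaled to the RGB resolution

# Combine colour and depth into an RGB-D image (scale/truncation are assumed values).
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
    color, depth, depth_scale=1000.0, depth_trunc=80.0,
    convert_rgb_to_intensity=False)

# Camera intrinsics quoted for KITTI sequence 2011_09_26_drive_0001.
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    width=1242, height=375,
    fx=828.285, fy=813.462, cx=487.963, cy=332.708)

pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)

# Store the generated points in binary (.bin) format, as described above.
np.asarray(pcd.points, dtype=np.float32).tofile("pointcloud.bin")
```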
43.5 Result and Evaluation
The depth maps estimated by the proposed methods are compared with those of the existing unsupervised depth estimation methods, including the state-of-the-art approaches, in Fig. 43.6. Methods 1 and 2 fail for roads enclosed by trees and regions with heavy shadow, and also for the open scene, whereas Method 3 only fails in the scene with many trees on both sides of the road; this shadow prevents the model from predicting a better depth map. In the final two columns of Fig. 43.6, all the state-of-the-art and proposed methods fail, except Garg et al. [16]. Table 43.1 quantitatively analyzes our depth estimation results against the existing approaches, with the evaluation metrics calculated over the Eigen test set. This table also reports a separate result for a depth cap of 50 m, as these are the results reported by Zhou et al. [5] and Garg et al. [16]. When trained only on the KITTI dataset, the proposed methods lower the RMSE (log) from 0.258 to 0.250 and raise the accuracy from 0.740 to 0.748 (δ < 1.25), from 0.915 to 0.922 (δ < 1.25²) and from 0.968 to 0.971 (δ < 1.25³), which is a vital improvement. When the proposed method is trained with both the KITTI (K) and Cityscapes (CS) datasets, the proposed methods lower the RMSE value from 4.975 to 4.792.
Fig. 43.6 Comparison with existing approaches (Orange box: Model fails to predict depth map) ([5, 6, 16])
Table 43.1 Quantitative comparison of the proposed D-Net 1 and D-Net 2 with existing approaches (Eigen et al. [9], Liu et al. [23], Zhou et al. [5], Mahjourian et al. [6], Garg et al. [16]) on the Eigen test set. For each method the table reports the training data (K = KITTI, CS = Cityscapes), the depth cap (80 m, with a separate block capped at 50 m) and the evaluation metrics ARD, RMSE, RMSE (log) and the accuracies δ < 1.25, δ < 1.25² and δ < 1.25³. The bold numbers indicate our improved results compared to other studies.
The TCTL syntax is defined by the following grammar:
φ ::= a | ¬φ | φ ∨ φ | E φ U_I φ | A φ U_I φ
where a ∈ AP (we denote with AP a set of atomic propositions), and I is an interval of R+ with integral bounds. There are two possible semantics for TCTL: one is called continuous, the other, more discrete one, is called pointwise. We consider the second one, i.e., the pointwise semantics (Table 47.1). For instance, E φ U_I ψ holds on a run ρ if there exists a position π > 0 along ρ such that ρ[π] |= ψ, for every position 0 < π' < π, ρ[π'] |= φ, and duration(ρ≤π) ∈ I, where ρ[π] is the state of ρ at position π, ρ≤π is the prefix of ρ ending at position π, and duration(ρ≤π) is the sum of all delays along ρ up to position π. In the pointwise semantics, a position in a run
ρ = s0 −(τ1,e1)→ s1 −(τ2,e2)→ s2 . . . s(n−1) −(τn,en)→ sn
is an integer i and the corresponding state si. In this semantics, formulas are checked only right after a discrete action has been done. Sometimes, the pointwise semantics is given in terms of actions and timed words, but that does not change anything. As usual in CTL and TCTL, we write tt ≡ a ∨ ¬a standing for true, ff ≡ ¬tt standing for false, the implication φ → ψ ≡ (¬φ ∨ ψ), the eventually operator F_I φ ≡ tt U_I φ and the globally operator G_I φ ≡ ¬(F_I ¬φ).
Formal Verification Environment: once the model and the temporal logic properties have been defined, we need a way to check whether the timed automata based model satisfies the defined properties. To this aim we consider formal verification, a process exploiting mathematical reasoning to verify whether a system (i.e., the model) satisfies some requirements (i.e., the timed temporal properties). Several verification techniques have been proposed in recent years; in this paper model checking is considered. We consider as model checker UPPAAL,2 an integrated tool environment for modeling, validation and verification of real-time systems modeled as timed automata networks.
2 http://www.uppaal.org/.
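As a purely illustrative example (the atomic propositions are hypothetical and are not among the properties defined later in this work), a timed response requirement such as "whenever an attack is detected, a mitigation state is reached within 5 time units" can be written in TCTL, using the abbreviations above, as:

A G (attack → A F_[0,5] mitigated)

In UPPAAL's query language, the corresponding untimed response pattern is typically expressed with the leads-to operator, i.e., attack --> mitigated.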
Fig. 47.1 Formal model creation
47.3 The Method The aim of the proposed method is to detect attacks on SCADA gas distribution networks. In detail, we consider two features: the pressure measurement and the pump (i.e., how much fluid it is pumping through the pipe). The proposed approach consists of two main phases: the Formal Model Creation (Fig. 47.1) and the Formal Model Verification (Fig. 47.4). The output of the Formal Model Creation phase is the formal model representing the SCADA system. From the SCADA system under analysis (i.e., the gas distribution system in Fig. 47.1) and the technical report (i.e., technical report in Fig. 47.1) containing the pressure and pump feature values, a technician manually marks a specific log as an attack. This information, i.e., the features gathered from the SCADA gas distribution system and the label marking a specific trace as an attack, is stored in textual files containing the pressure and pump measurements at a fixed time interval (for instance, every millisecond). The next step of Formal Model Creation is the Discretisation, aimed at discretising each feature. The registered pressure and pump continuous values are divided into three intervals, i.e., we map the numeric feature values into one of the following classes: Up, Basal, and Low. There are several methods proposed by the research community to discretise continuous values; we resort to the one proposed by the authors in [7]: basically, this method divides each feature into three intervals, Low, Basal and Up. We consider equal-width partitioning, dividing the values of a given attribute into 3 intervals of equal size. The width of the interval is computed using the following formula: W = (max − min)/3, where max and min are respectively the maximum and the minimum values achieved by the feature. The equal-width partitioning has been applied to every feature under analysis. Furthermore, each discretised feature previously obtained is converted into a timed automaton (i.e., the formal model in Fig. 47.1). To better understand the adopted discretisation method, Table 47.2 shows an example of a discretised feature fragment. The first column (i.e., Time) indicates the time interval (in the example 1 ≤ t ≤ 6), while the F1 and F2 columns are respectively related to the two considered features. For instance, at the t3 time interval, F1 exhibits the u value, while F2 exhibits the l one.
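The discretisation step can be illustrated with the following minimal sketch of equal-width binning into the three classes; the function name and the example readings are illustrative and not taken from the paper.

```python
import numpy as np

def discretise(values, labels=("l", "b", "u")):
    """Map continuous readings to Low / Basal / Up classes with W = (max - min) / 3."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    width = (hi - lo) / 3.0
    edges = [lo + width, lo + 2 * width]
    out = []
    for v in values:
        if v < edges[0]:
            out.append(labels[0])   # Low
        elif v < edges[1]:
            out.append(labels[1])   # Basal
        else:
            out.append(labels[2])   # Up
    return out

# Example: pressure readings sampled at a fixed time interval
print(discretise([2.1, 7.9, 8.3, 5.0, 1.2, 8.8]))   # -> ['l', 'u', 'u', 'b', 'l', 'u']
```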
Table 47.2 Example of feature fragmentation with three intervals
Time  F1  F2
t1    u   u
t2    u   l
t3    u   l
t4    b   u
t5    l   b
t6    u   b
u = Up, b = Basal, l = Low
Starting from the feature discretisation, the next step of Formal Model Creation is the formal model itself. In this step a network of timed automata is generated (i.e., the formal model in Fig. 47.1): in detail, for each discretised feature an automaton is built. Returning to the feature fragmentation example provided in Table 47.2, if the same value is repeated across consecutive time intervals the automaton will contain a loop: the automaton generated from the F1 feature will contain a loop for the t1, t2 and t3 time intervals (the repeated value is u), while the automaton generated from the F2 feature will contain two loops, the first one for the t2 and t3 time intervals (the repeated value is l) and the second one for the t5 and t6 time intervals (the repeated value is b). The exit condition from the loops is guaranteed by a guard, while the entering one is guaranteed by an invariant. In detail, two different clocks are considered for each automaton: the first one (i.e., x) to respect the loop entering condition, and the second one (i.e., y) to respect the exit one. Furthermore, each automaton locally stores the count of u, b and l values. The variables related to the Fx automaton are marked with a subscript x: considering two features, x ∈ {1, 2}. Only the channel (i.e., s), allowing synchronisation between automata, is not stored locally: in fact, it must guarantee the continuous and progressive advancement of the automata. One sender automaton (i.e., s!) can synchronise with an arbitrary number of receiver automata (i.e., s?). In practice, considering that each line of the discretised features in Table 47.2 corresponds to the values of the features in the same time interval, the synchronisation allows switching from one time interval to the next, obliging the automata to advance to the next transition and to update the feature values with those of the next time interval. This mechanism avoids inconsistencies between the values of the features and the time intervals. Figures 47.2 and 47.3 show the automata respectively obtained from the F1 and F2 discretisation. For each loop we note the presence of a guard; furthermore, the two automata are synchronised by using the s channel. The enabled transitions for the F1 and F2 automata are shown in Table 47.3. As shown in Table 47.3, the F1 automaton iterates in the loop (node 1 in Fig. 47.2) for three time intervals (i.e., y1 < 3), while the F2 automaton, after the increment of the u2 variable (node 1 in Fig. 47.3), iterates for two time intervals
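The following simplified sketch illustrates how a discretised sequence can be turned into loop and step transitions of this kind (for F1, the run of three u values yields the loop guard y < 3 described above). The data-structure layout is an illustrative assumption and does not reproduce the UPPAAL model format.

```python
def build_transitions(sequence):
    """sequence: list of discretised values, e.g. ['u', 'u', 'u', 'b', 'l', 'u']."""
    transitions, node, i = [], 0, 0
    while i < len(sequence):
        value, run = sequence[i], 1
        while i + run < len(sequence) and sequence[i + run] == value:
            run += 1
        if run > 1:  # repeated value -> self-loop guarded by the local clock y
            transitions.append({"from": node, "to": node, "value": value,
                                "loop_guard": f"y < {run}"})
        # value change (or end of the run) -> step to the next node on channel s
        transitions.append({"from": node, "to": node + 1, "value": value,
                            "sync": "s"})
        node += 1
        i += run
    return transitions

for t in build_transitions(["u", "u", "u", "b", "l", "u"]):
    print(t)
```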
Fig. 47.2 The F1 automaton
Fig. 47.3 The F2 automaton
Table 47.3 Enabled transitions for the F1 and F2 automata (F1 part)
Trans.  Node  u1  b1  l1
1       1     1   0   0
2       1     2   0   0
3       1     3   0   0
4       2     0   1   0
5       3     0   0   1
6       4     1   0   0
7       4     1   0   0
Extracted features, with their description and type (the short feature name is given where available):
- There is at least one noun before a pronoun (1 (T), 0 (F))
- Number of conjunctions (Integer)
- There is more than one pronoun (1 (T), 0 (F))
- There are at least two singular nouns (1 (T), 0 (F))
- Number of verbs is greater than four (1 (T), 0 (F))
- Number of punctuation marks is greater than two (1 (T), 0 (F))
- Nouns count (Integer)
- Pronouns count (Integer)
- Verb count (Integer)
- Punctuation marks count (Integer)
- There is one relative clause and one pronoun together (1 (T), 0 (F))
- >1 CapLet.: there are at least two capital letters (1 (T), 0 (F))
- >2 IN: there are at least two prepositions (1 (T), 0 (F))
- isThereNNSorNNPS: there are at least two plural nouns (1 (T), 0 (F))
- Anaphoric: anaphoric words count (Integer)
- Word counter: number of words per sentence (Integer)
- Vague word: vague words count (Integer)
- Adverbs: modal adverbs count (Integer)
- Passive verb: passive verb count (Integer)
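As an illustration only, a few of the features above could be computed from part-of-speech tags as in the following sketch. NLTK is used here as an example toolkit and is not necessarily the implementation adopted in this work; the tag prefixes (NN, VB, PRP, IN) follow the Penn Treebank tagset.

```python
import nltk
# Requires the 'punkt' and 'averaged_perceptron_tagger' NLTK resources.

def extract_features(sentence):
    tokens = nltk.word_tokenize(sentence)
    tags = [tag for _, tag in nltk.pos_tag(tokens)]
    return {
        "word_counter": len(tokens),
        "nouns_count": sum(t.startswith("NN") for t in tags),
        "verb_count": sum(t.startswith("VB") for t in tags),
        "pronouns_count": sum(t.startswith("PRP") for t in tags),
        ">2 IN": int(sum(t == "IN" for t in tags) >= 2),
        "isThereNNSorNNPS": int(sum(t in ("NNS", "NNPS") for t in tags) >= 2),
    }

print(extract_features("The L3 Trackside shall be able to reduce or remove "
                       "an existing Reserved Status Area of track."))
```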
two possible configurations: ETCS Level 1 and ETCS Level 2, still based on fixed blocks. These two levels are well specified in a complex set of specifications; among them, the SUBSET-026 version 3.0.0 [26]. With the ETCS-L3 initiative, railway stakeholders envisage implementing a promising train control system: the Moving Block (MB). Eventually, train position detection will not be based on physical track circuits, enabling a considerable increase in network capacity. To date, ETCS-L3 is still in the research phase and is the subject of different EU-funded research projects involving both industry and
academia. To name a few: X2Rail-3,5 ASTRail6 and PERFORMINGRAIL.7 Only a few pilot lines have been built and used for experimenting with technologies and solutions; international organizations have still not started the standardization process. To this aim, and according to the methodology defined in Sect. 48.3, the comparison is made on the basis of the following documents:
1. SUBSET-026 v3.0.0 part 5: reporting the main system requirements of the procedures of ETCS Level 2 [26]. The first specification is an assessed document; it has been refined during different formal revisions and through feedback from concrete system construction and operation. A requirement example is 5.12.2.5: When the driver closes a desk and opens the other one of the same engine, the ERTMS/ETCS on-board equipment shall be able to calculate the new train position data (train front position, train orientation), by use of the previous data.
2. X2Rail-3 D4.2 part 3: containing the tentative specification for the ETCS-L3 [27]. The document constitutes a part of a public deliverable in the EU research project X2Rail-3, run by the main European industrial companies involved in railway signalling and systems. A requirement example is REQ-Reserved-2: Following a request from the TMS, the L3 Trackside shall be able to reduce or remove an existing Reserved Status Area of track only after adjusting any given authorization accordingly.
In this context, a part of the SUBSET-026 v3.0.0, about 52 requirements, is considered as the golden dataset; X2Rail-3 D4.2 is the target dataset, containing 142 requirements. All these 142 requirements are then manually re-written according to the Gherkin notation, constituting the improved dataset. An example of this activity is reported here, with the rewriting of REQ-Reserved-2:
Given the L3 trackside on
When the TMS sends a request and the L3 Trackside adjusts any given authorisation accordingly
Then the L3 Trackside shall be able to reduce or remove an existing Reserved Status Area of track only after adjusting any given authorisation accordingly.
48.5 Results As a preliminary activity, the list of vague words used by the HOMER tool has been extended with other words.8 Then, HOMER has been applied to both the considered documents [26, 27], giving some generic insights: reading time in minutes, readability scores (Flesch reading ease and Dale-Chall readability scores), total paragraphs, sentences and words, average sentences per paragraph, average words per sentence and number of vague words (see Table 48.2).
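For illustration, readability indicators of the kind reported by HOMER can also be computed in Python, for instance with the textstat package. This is not the HOMER tool itself, and the example requirement is adapted from the SUBSET-026 excerpt quoted above.

```python
import textstat

requirement = ("When the driver closes a desk and opens the other one of the same "
               "engine, the on-board equipment shall be able to calculate the new "
               "train position data by use of the previous data.")

print("Flesch reading ease:", textstat.flesch_reading_ease(requirement))
print("Dale-Chall score:   ", textstat.dale_chall_readability_score(requirement))
print("Words per sentence: ", textstat.avg_sentence_length(requirement))
```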
https://projects.shift2rail.org/s2r_ip2_n.aspx?p=X2RAIL-3. http://www.astrail.eu/. 7 https://www.performingrail.com/. 8 https://www.blinn.edu/writing-centers/handouts-and-worksheets.html. 6
Table 48.2 Results of HOMER execution
                               Subset-026        X2Rail-3
Flesch reading ease            Very difficult    Difficult
Dale-Chall readability score   7–8th grade       4th grade
Paragraphs                     399               652
Avg sentences per paragraph    2.27              2.49
Sentences                      904               1623
Avg words per sentence         20.7              16.68
Vague words                    126               117
Fig. 48.2 Comparison between original and modified requirements
Then the tool, developed according to the methodology reported in Sect. 48.3, has been applied to the three datasets considered in Sect. 48.4. Figure 48.2 reports a graph illustrating the scaled measures for each considered feature, for both the L3 requirements reported in [27] and a manual attempt to rewrite such requirements in the Gherkin notation. Furthermore, the overall scores of these sets of requirements are respectively 3.73908E−10 and 3.88056E−10.9 As it is possible to note, notwithstanding the rewriting of the L3 requirements according to the Gherkin approach, the overall score of L3-Gherkin worsens. The single measures that most affect such an increase are: >4VB (i.e., more than four verbs), >2IN (i.e., more than two conjunctions) and word counter (i.e., the number of words). Better structuring of the requirements is not rewarded but penalized by the quantitative evaluation.
9 Lower score means better results.
Fig. 48.3 Overall score on considered datasets
With respect to the target dataset, the improved dataset has the following characteristics. First, an increased number of conjunctions, words, verbs and nouns is observed. This increase is justifiable, as BDD templates lead to breaking complex sentences into simpler ones, reducing punctuation and passive verbs; conversely, simple sentences increase the number of objects, active verbs, predicates and the conjunctions "and"/"or". The increase in the number of nouns and active verbs implies an improvement in the clarity of BDD sentences, as each verb has an explicit subject and action. The total number of words used increases accordingly. This is not negative, as the ratio between the number of all words and the number of vague words does not imply a loss of clarity; indeed, a decrease in anaphoric words, adverbs, passive verbs and punctuation improves readability. As a final consideration, there is a strong need for a novel approach to quantitatively measure requirements according to BDD. Traditional methods are not able to capture the advantages of a methodology that international expert panels agree in considering one of the most promising techniques in RE [28]. In fact, even by applying the extended feature set (see Sect. 48.3), the mean overall score of the Gherkin-shaped requirements is still the same (see Fig. 48.3).
48.6 Conclusions The present paper tackles the problem of evaluating quantitative scoring of requirements when the BDD methodology is adopted. This still constitutes an open problem since, notwithstanding a general improvement of requirement comprehensibility, traditional NLP-based metrics and indices worsen: e.g., an
increase of conjunctions, words, verbs and nouns was observed in the analysis of BDD-formatted requirements. This study constitutes a first research step oriented toward the definition of a more comprehensive set of criteria. The next research will be devoted to this direction, collecting different basic measures and generating a general score, also by means of Artificial Intelligence (AI) methods. As an example, a Named Entity Recognition (NER) process could be applied to the requirements to highlight the technical words of the specific domain under consideration. In this way, requirement specification activities could be integrated more closely with Domain-Driven Design (DDD) approaches.
Acknowledgements The work of Maria Stella de Biase is granted by PON Ricerca e Innovazione 2014/2020 MUR, Ministero dell'Università e della Ricerca (Italy), with the Ph.D. program XXXVI cycle. The work of Mariapia Raimondo is granted by INPS, Istituto Nazionale di Previdenza Sociale (Italy), with the Ph.D. program XXXVI cycle.
References 1. Silva, D., Gonçalves, T.G., da Rocha, A.R.C.: A requirements engineering process for IoT systems. In: Proceedings of the XVIII Brazilian Symposium on Software Quality, pp. 204–209 (2019) 2. North, D.: Introducing BDD (Mar 2006). http://dannorth.net/introducing-bdd/ 3. Binamungu, L.P., Embury, S.M., Konstantinou, N.: Maintaining behaviour driven development specifications: challenges and opportunities. In: IEEE 25th SANER Conference (2018) 4. Solis, C., Wang, X.: A study of the characteristics of behaviour driven development. In: 2011 37th EUROMICRO Conference on Software Engineering and Advanced Applications. pp. 383–387. IEEE (2011) 5. Arrabito, M., Fantechi, A., Gnesi, S., Semini, L.: An experience with the application of three NLP tools for the analysis of natural language requirements. In: International Conference on the Quality of Information and Communications Technology (2020) 6. Vinay, S., Aithal, S., Desai, P.: An NLP based requirements analysis tool. In: International Advance Computing Conference. IEEE (2009) 7. Oliveira, G., Marczak, S., Moralles, C.: How to evaluate BDD scenarios’ quality? In: ACM International Conference Proceeding Series, pp. 481–490 (2019) 8. Campanile, L., et al.: Towards the use of generative adversarial neural networks to attack online resources. In: Web, Artificial Intelligence and Network Applications. pp. 890–901. Springer International Publishing, Cham (2020) 9. Marulli, F., et al.: A comparison of character and word embeddings in bidirectional LSTMs for POS tagging in Italian. In: International Conference on Intelligent Interactive Multimedia Systems and Services, pp. 14–23. Springer (2018) 10. Marulli, F., et al.: Exploring a federated learning approach to enhance authorship attribution of misleading information from heterogeneous sources. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2021) 11. Marulli, F., Verde, L., Campanile, L.: Exploring data and model poisoning attacks to deep learning-based NLP systems. Procedia Computer Sci. 192, 3570–3579 (2021) 12. Berry, D.M., Kamsties, E., Krieger, M.: From contract drafting to software specification: Linguistic sources of ambiguity (2003) 13. Davis, A., et al.: Identifying and measuring quality in a software requirements specification. In: Proceedings of 1st International Software Metrics Symposium, pp. 141–152 (1993)
14. Gleich, B., Creighton, O., Kof, L.: Ambiguity detection: Towards a tool explaining ambiguity sources. In: 16th International Conference on Requirements Engineering: Foundation for Software Quality. Lecture Notes in Computer Science, vol. 6182, pp. 218–232 (2010) 15. Lami, G., Gnesi, S., Trentanni, G., Fabbrini, F., Fusani, M.: An automatic tool for the analysis of natural language requirements. Computer Syst. Sci. Eng. 20, 53–62 (2005) 16. Rosadini, B., et al.: Using NLP to detect requirements defects: an industrial experience in the railway domain. In: 23rd International Conference, REFSQ 2017 (2017) 17. Saavedra, R., Ballejos, L.C., Ale, M.A.: Requirements quality evaluation : state of the art and research challenges (2013) 18. Gnesi, S., Trentanni, G.: QuARS: A NLP Tool for Requirements Analysis. In: REFSQ Workshops (2019) 19. Génova, G., Fuentes, J.M., Morillo, J.L., Hurtado, O., Moreno, V.: A framework to measure and improve the quality of textual requirements. Require. Eng. 18, 25–41 (2011) 20. Arora, C., Sabetzadeh, M., Briand, L., Zimmer, F.: Automated checking of conformance to requirements templates using natural language processing. IEEE Trans. Software Eng. 41(10), 944–968 (2015) 21. Jurafsky, D., Martin, J.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Pearson Prentice Hall (2009) 22. Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill Inc., New York (1986) 23. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python—Analyzing Text with the Natural Language Toolkit. O’Reilly Media Inc. (2009) 24. Zhang, J., Nora, M.E.G.: Semantic NLP-based information extraction from construction regulatory documents for automated compliance checking. J. Comput. Civil Eng. 30 (2016) 25. Agac, S., Akan, Y., B.T.: Detecting and resolving referential ambiguity. In: CEUR Workshop Proceedings (2021) 26. Hougardy, A.: ERTMS/ETCS. System requirements specification. Chapter 5—procedures, version 3.0.0. Technical report (2008) 27. Group, E.E.U.: Deliverable D4.2 Moving Block Specifications Part 3—System Specification. Technical report (2020) 28. Nascimento, N., Santos, A., Sales, A., Chanin, R.: Behavior-driven development: an expert panel to evaluate benefits and challenges. In: ACM International Conference Proceeding Series, pp. 41–46 (2020)
Chapter 49
Break the Fake: A Technical Report on Browsing Behavior During the Pandemic Lelio Campanile, Mario Cesarano, Gianfranco Palmiero, and Carlo Sanghez
Abstract The widespread use of the internet as the main source of information for many users has led to the spread of fake news and misleading information as a side effect. The pandemic that in the last 2 years has forced us to change our lifestyle and to increase the time spent at home, has further increased the time spent surfing the Internet. In this work we analyze the navigation logs of a sample of users, in compliance with the current privacy regulation, comparing and dividing between the different categories of target sites, also identifying some well-known sites that spread fake news. The results of the report show that during the most acute periods of the pandemic there was an increase in surfing on untrusted sites. The report also shows the tendency to use such sites in the evening and night hours and highlights the differences between the different years considered.
49.1 Introduction Nowadays, the diffusion of information through the internet has changed the way content is consumed. At the same time, the ways in which content is created have also been influenced and changed accordingly. Social networks and websites have assumed a primary role in this transformation, becoming the protagonists of this inversion of the information model. L. Campanile Dipartimento di Matematica e Fisica, Università degli Studi della Campania “L. Vanvitelli”, Caserta, Italy e-mail: [email protected] M. Cesarano · G. Palmiero · C. Sanghez (B) GAM Engineering srlu, Caserta, Italy e-mail: [email protected] URL: https://www.gameng.it M. Cesarano e-mail: [email protected] G. Palmiero e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 309, https://doi.org/10.1007/978-981-19-3444-5_49
The traditional news model involves control by the publisher and journalists over the writing and creation of the news and newspaper articles. The news is first verified by the journalist, then reviewed and validated by the editor, and finally published. The result is more reliable, but less immediate, content creation. The availability of affordable and widespread technological means, such as modern smartphones and cameras, together with the availability of the internet for the immediate dissemination of the captured content, has triggered the inversion of the traditional model of information. The direct consequence of these changes is a decrease in the worth of information and the loss of editorial control over the quality and truthfulness of information. The economic worth of visualizations on the web and social networks has moreover created an uncontrollable race towards sensationalism of the news, which does not favour quality and instead often leads to the fraudulent creation of fake news.
In this work, we propose a quantitative analysis of web browsing and news searching, also examining the period of the pandemic. Our goal was to understand how people search for news on the web and whether their habits changed during the pandemic. We collected the browsing data of a few hundred people with the declared purpose of making statistical analyses relating to their habits on the web. All participants were informed that we would be collecting browsing data completely anonymously. We were interested in how people browse the web for news and wanted to know if people's habits had changed during the pandemic: Do people read the news on newspaper sites? Do people read the news on sites that spread fake news? Do people rely solely on social networks? We were particularly interested in understanding what percentage of visitors accessed debunking sites, what percentage of visitors accessed fake news sites, and what percentage of accesses to fake news originated from social networks.
The paper is organized as follows: in Sect. 49.2 the state of the art on fake news recognition and browsing data analysis is presented; Sect. 49.3 presents the proposed approach and the methodology used; Sect. 49.4 presents the case study analyzed in this work and the dataset used, and also presents the results of this evaluation. Finally, Sect. 49.5 concludes the paper and summarizes the next steps.
49.2 Related Works
The production and dissemination of fake news and misleading information represent a long-standing problem of human society. In the digital era, this phenomenon has been exacerbated and its automatic detection has recently become one of the problems considered in the context of computer science, exploiting the great results obtained in Natural Language Processing (NLP) and Data Analytics (DA) through advanced artificial intelligence algorithms [4]. Naturally, as access to news media has become very convenient, several news channels have tried to make their news more appealing, in order to deliver news to as many users as possible on the web. This availability has resulted in the fast dissemination of news to millions of
users through the most varied news sources [3]. Excluding well-established sources of reliable information and reputable news, there has been a fast proliferation of a great number of smaller news sources which mostly deliver news and information that cannot be considered trustworthy. In addition to this, social media and social networking platforms enable anyone from anywhere to publish and disseminate any kind of unverified and unchecked information, including statements, personal opinions and conjectures, contributing to the spread of misleading information and fake news, sometimes simply out of pure recklessness, but more often for achieving fraudulent or illegal goals. A great trap consists in the fact that several popular sources of information that are generally considered authentic are also prone to false information or fake news. Fake news can be considered one of the greatest threats to the welfare of the whole community, because it can lead to huge collateral damage. If we think of the COVID-19 pandemic, fake medical information circulated freely on the media during the pandemic peaks and a lot of confusion was introduced into common opinion about safety measures, precautions and vaccines. In order to counteract the effects of fake news spreading and, mainly, to reduce the trust that most users place in them, several artificial intelligence-based approaches have been used, not only for the early detection of diseases, such as in [12, 13], but also to analyze the consequences and trustworthiness of information. In the work of [6], an NLP-based approach is, in fact, proposed to identify some characterizing stylometric features of fake news sources, in order to flag potentially misleading news and false information. The great improvements obtained in NLP strategies [7, 11], further supported by artificial intelligence facilities, have led to a heavy exploitation of these techniques in designing fake news classification and detection systems, in addition to information debunking systems [1]. Furthermore, artificially crafted data, provided by fake news and false information, raise an additional threat, represented by the creation of data sets that could be used to train classification and detection models, such as machine and deep learning-based systems. In such a perspective, misleading information and fake news contribute to a cybersecurity threat to machine and deep learning systems, known as data poisoning attacks [8, 9]. The rise of these kinds of adversarial attacks, performed to cheat intelligent detection systems, has led to the emergence of a new research strand, known as resilient machine learning [2]. Adversarial attacks are mostly known to be performed against image processing systems [5], but the most recent research literature has evidenced that NLP systems too exhibit vulnerabilities to adversarial and data poisoning attacks [10, 14]. In [15] a defense strategy against adversarial attacks in NLP, exploiting a Dirichlet neighborhood ensemble, is proposed. The work of [16] provides a study pointing out that fake news detection performed via NLP techniques is vulnerable to adversarial attacks. So, fake news can represent the ideal candidate to prepare a corrupted data set that affects the training of a machine learning model, and it needs to be contrasted by adopting several strategies, among which supporting the information literacy and the information awareness of users, by providing both effective detection and classification
systems and building effective information debunking strategies. Finally, the study performed in this work represents a step toward building this information awareness of web users, after the catastrophic experience of the COVID-19 pandemic misinformation campaign.
49.3 Methodology
We collected the browsing data of about 500 people over a long period of time, from 2019 until 2022, and we focused the experiments on March 2019, March 2020 and March 2021. We interviewed people, asked them the names of the websites they knew, and asked whether we could collect their browsing data. We divided the websites into several categories: newspapers and press, junk websites that spread fake news, and social networks. The list of official press websites includes larepubblica.it, ilcorriere.it, ansa.it, etc. In our experiments we decided to include Facebook.com, Twitter.com and tiktok.com among the social networks, while among the debunking websites we included NoBufale.it and www.butac.it, which are the ones most known by the test participants. In our experiments we considered a few dozen websites known to spread fake news: AgenPress.it, AmbienteBio.it, AutismoVaccini.org, ByoBlu.com, CaffeinaMagazine.it, ComeDonChisciotte.org, ControInformazione.info, Contro.tv, Corvelva.it, Dagospia.com, DataBaseItalia.it, Disinformazione.it, FanMagazine.it, FonteVerificazione.it, GospaNews.net, IlPopulista.it, IlPrimatoNazionale.it, ImolaOggi.it, Infosannio.com, JedaNews.com, It.SputnikNews.com, LAntidiplomatico.it, LaCrunaDellAgo.net, LaNuovaBQ.it, LaVerita.info, Leggilo.org, LuogoCommune.net, MaurizioBlondet.it, MedNat.org, NoGeoingegneria.com, PandoraTv.it, Renovatio21.com, ScenariEconomici.it, SegniDalCielo.it, StefanoMontanari.net, StopCensura.online, ViralMagazine.it, VoxNews.info, etc. We considered the sites of newspapers, sites that manifestly publish fake news, and debunking sites. Our main objective was to test whether, among the subjects involved in this study, there were people who, after visiting or reading a news item, searched for further information on debunking sites. We were also particularly interested in whether people go directly to news sites or go to social networks first. The algorithm used in our tests is very simple (a minimal sketch is given below): • the script takes as input a text file containing the IP addresses of the websites and the file with the web traffic log of the test participants; • the script divides the 24 h into three time bands: morning, afternoon, evening; • the script counts the visits to the sites for each time slot; • the script saves the result in a CSV file for later processing. The CSV file includes: the time slot (T1 for morning, T2 for afternoon, T3 for evening), the visited site and the number of occurrences. An XLS view of this CSV file is shown in Fig. 49.1.
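As an illustration of the counting step, the following is a minimal Python sketch of the script described above; it is not the exact implementation used in the study. The log layout (comma-separated timestamp, source IP, destination IP, protocol), the file names and the ISO timestamp format are assumptions made for the example.

```python
import csv
from collections import Counter
from datetime import datetime


def time_slot(ts: datetime) -> str:
    """Map a timestamp to the time bands used in the report:
    T1 = morning (8:00-13:59), T2 = afternoon (14:00-17:59),
    T3 = evening/night (18:00-7:59 of the next day)."""
    if 8 <= ts.hour < 14:
        return "T1"
    if 14 <= ts.hour < 18:
        return "T2"
    return "T3"


def count_visits(log_path, sites_path, out_path):
    # IP addresses of the monitored websites, one per line (assumed format)
    with open(sites_path) as f:
        monitored = {line.strip() for line in f if line.strip()}

    counts = Counter()
    with open(log_path) as f:
        for line in f:
            # assumed log layout: timestamp,source_ip,destination_ip,protocol
            ts_str, _src, dst, _proto = line.rstrip("\n").split(",")
            if dst in monitored:
                counts[(time_slot(datetime.fromisoformat(ts_str)), dst)] += 1

    # CSV output: time slot (T1/T2/T3), visited site, number of occurrences
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["time_slot", "site", "occurrences"])
        for (slot, site), n in sorted(counts.items()):
            writer.writerow([slot, site, n])


if __name__ == "__main__":
    count_visits("traffic.log", "monitored_ips.txt", "visits_by_slot.csv")
```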
49.4 Use Case and Results
Our experiments involved users' browsing data for the months of March 2019, March 2020 and March 2021, for a total of about 450 GB. The experiment was conducted on a raw data set of 185,409,239 rows × 27 columns. We looked for correlations between the output values and the “Day of week” and “Lock down” characteristics, repeating the correlation calculation for T1, T2 and T3. Basically, the output values are correlated with two characteristics: the day of the week and the lockdown condition. Fake T1: • Day of week 0.124031 • Lock down 0.370385
Fake T2: • Lock down 0.286414
Fake T3: • Day of week 0.129604 • Lock down 0.381774
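The correlation step can be reproduced with a few lines of pandas; the snippet below is a sketch under assumed column names (day_of_week, lockdown, fake_T1, fake_T2, fake_T3), since the exact schema of the 27-column data set is not reported.

```python
import pandas as pd

# Hypothetical aggregated file: one row per day with the number of accesses
# to fake news websites per time slot and the two explanatory characteristics.
df = pd.read_csv("visits_aggregated.csv")

for slot in ["fake_T1", "fake_T2", "fake_T3"]:
    # Pearson correlation of the slot's accesses with each characteristic,
    # analogous to the values reported above.
    corr = df[["day_of_week", "lockdown", slot]].corr()[slot].drop(slot)
    print(slot, corr.round(6).to_dict())
```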
To verify the quality of the model, we analyzed the distances between the actual values and the values predicted by the model, see Figs. 49.2, 49.3 and 49.4. We thus obtained a model that can predict the number of single accesses to junk websites by day of the week and time of day. The structure of the log file containing the navigation data is simple: every line includes the timestamp, the source IP, the destination IP and the protocol. The day was divided into three time slots: morning (T1, from 8:00 to 13:59), afternoon (T2, from 14:00 to 17:59) and evening (T3, from 18:00 to 7:59 of the next day). Every single access to a website was counted as a single occurrence. The first step was to collect the logs from March 2019. The first collected data were those relating to visits to social network websites (Fig. 49.7). Then, access data to the press websites were captured, see Fig. 49.5. Finally, access data to the fake news websites were collected, see Fig. 49.6. The second step was to save the logs from March 2020 (Figs. 49.7, 49.8 and 49.9).
Fig. 49.1 EXCEL file
Fig. 49.2 T1 residual plot
Fig. 49.3 T2 residual plot
As for 2019, data relating to visits to social network websites (Fig. 49.10), to the press websites (Fig. 49.11) and to the fake news websites (Fig. 49.12) were collected. The last step was to collect the logs from March 2021. Also in this case, data on visits to social network websites (Fig. 49.13), press websites (Fig. 49.14) and fake news websites (Fig. 49.15) were considered; the overall visits for March 2021 are shown in Fig. 49.16. The obtained results show that people read news mainly from social networks or fake sites in both 2019 and 2020 (see Figs. 49.8 and 49.9, respectively). This trend is also confirmed in 2021: the results show that in this year, too, people read the news mainly from social networks or fake sites (see Fig. 49.16).
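The text does not specify the predictive model behind the residual plots of Figs. 49.2, 49.3 and 49.4; as a hedged illustration, the sketch below fits an ordinary least-squares regression of the per-slot access counts on the day of the week and the lockdown indicator and plots the residuals. The library choice, the column names and the model family are assumptions, not the authors' implementation.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

df = pd.read_csv("visits_aggregated.csv")  # same hypothetical file as above
X = df[["day_of_week", "lockdown"]]

for slot in ["fake_T1", "fake_T2", "fake_T3"]:
    y = df[slot]
    model = LinearRegression().fit(X, y)
    residuals = y - model.predict(X)  # actual minus predicted accesses

    # Residual plot, analogous to the T1/T2/T3 plots in the report
    plt.figure()
    plt.scatter(model.predict(X), residuals, s=10)
    plt.axhline(0, linewidth=1)
    plt.xlabel("predicted accesses")
    plt.ylabel("residual")
    plt.title(f"{slot} residuals")
plt.show()
```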
Fig. 49.4 T3 residual plot
Fig. 49.5 Visits to press websites during March 2019
49.5 Conclusions
Our experiments show a disheartening situation. Few people read the news from official press websites. Most of the news comes from junk websites. In 2019, people were already reading the news from junk websites, and the trend became even more dramatic during the first month of the pandemic lockdown. Test results show that when people are at home, they tend to read the news during the last hours of the afternoon and at night. Visits to junk websites peak during the evening and late at night.
Fig. 49.6 Visits to fake news websites during March 2019
Fig. 49.7 Visits to social networks websites during March 2019
Fig. 49.8 Visits to websites during March 2019
Fig. 49.9 Visits to websites during March 2020
Fig. 49.10 Visits to social networks websites during March 2020
Fig. 49.11 Visits to press websites during March 2020
Fig. 49.12 Visits to fake news websites during March 2020
Fig. 49.13 Visits to social networks websites during March 2021
Fig. 49.14 Visits to press websites during March 2021
Fig. 49.15 Visits to fake news websites during March 2021
Fig. 49.16 Visits to websites during March 2021
The proposed approach has considerable potential: it is theoretically possible to predict visits to junk websites. Our question is whether it is possible to reduce access to junk websites. Unfortunately, the proposed approach has a significant limitation due to the privacy constraints imposed by the GDPR: we cannot know whether people are using smartphones, tablets or laptops to access junk websites. In the future, we intend to legally overcome the limitations imposed by the GDPR in order to identify the main tool used to access junk websites and, above all, to find the correlation between the news broadcast by the mainstream media and the accesses to junk websites. Acknowledgements The research described in this work is co-funded and realized within the activities of the Research Program “Vanvitelli V:ALERE 2020—WAIILD TROLS”, financed by the University of Campania “L. Vanvitelli”, Italy.
References 1. Campanile, L., Cantiello, P., Iacono, M., Marulli, F., Mastroianni, M.: Vulnerabilities assessment of deep learning-based fake news checker under poisoning attacks. In: Computational Data and Social Networks, p. 385 (2021) 2. Eigner, O., Eresheim, S., Kieseberg, P., Klausner, L.D., Pirker, M., Priebe, T., Tjoa, S., Marulli, F., Mercaldo, F.: Towards resilient artificial intelligence: survey and research issues. In: 2021 IEEE International Conference on Cyber Security and Resilience (CSR), pp. 536–542. IEEE (2021) 3. Khurana, P., Kumar, D.: Sir model for fake news spreading through whatsapp. In: Proceedings of 3rd International Conference on Internet of Things and Connected Technologies (ICIoTCT), pp. 26–27 (2018) 4. Kumar, S., Shah, N.: False information on web and social media: a survey. arXiv preprint arXiv:1804.08559 (2018) 5. Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236 (2016) 6. Marulli, F., Balzanella, A., Campanile, L., Iacono, M., Mastroianni, M.: Exploring a federated learning approach to enhance authorship attribution of misleading information from heterogeneous sources. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2021) 7. Marulli, F., Pota, M., Esposito, M.: A comparison of character and word embeddings in bidirectional LSTMs for POS tagging in Italian. In: International Conference on Intelligent Interactive Multimedia Systems and Services, pp. 14–23. Springer (2018) 8. Marulli, F., Verde, L., Campanile, L.: Exploring data and model poisoning attacks to deep learning-based NLP systems. Proc. Comput. Sci. 192, 3570–3579 (2021) 9. Marulli, F., Visaggio, C.A.: Adversarial deep learning for energy management in buildings. In: SummerSim, pp. 50–51 (2019) 10. Morris, J.X., Lifland, E., Yoo, J.Y., Grigsby, J., Jin, D., Qi, Y.: Textattack: a framework for adversarial attacks, data augmentation, and adversarial training in NLP. arXiv preprint arXiv:2005.05909 (2020) 11. Piccialli, F., Marulli, F., Chianese, A.: A novel approach for automatic text analysis and generation for the cultural heritage domain. Multimed. Tools Appl. 76(8), 10389–10406 (2017) 12. Verde, L., De Pietro, G.: A neural network approach to classify carotid disorders from heart rate variability analysis. Comput. Biol. Med. 109, 226–234 (2019) 13. Verde, L., De Pietro, G., Ghoneim, A., Alrashoud, M., Al-Mutib, K.N., Sannino, G.: Exploring the use of artificial intelligence techniques to detect the presence of coronavirus covid-19 through speech and voice analysis. IEEE Access 9, 65750–65757 (2021) 14. Zhang, W.E., Sheng, Q.Z., Alhazmi, A., Li, C.: Adversarial attacks on deep-learning models in natural language processing: a survey. ACM Trans. Intell. Syst. Technol. (TIST) 11(3), 1–41 (2020) 15. Zhou, Y., Zheng, X., Hsieh, C.J., Chang, K.W., Huang, X.: Defense against adversarial attacks in NLP via dirichlet neighborhood ensemble. arXiv preprint arXiv:2006.11627 (2020) 16. Zhou, Z., Guan, H., Bhat, M.M., Hsu, J.: Fake news detection via NLP is vulnerable to adversarial attacks. arXiv preprint arXiv:1901.09657 (2019)
Chapter 50
A Federated Consensus-Based Model for Enhancing Fake News and Misleading Information Debunking Fiammetta Marulli, Laura Verde, Stefano Marrore, and Lelio Campanile
Abstract Misinformation and fake news are hard to dislodge. According to experts on this phenomenon, fighting disinformation requires a less credulous public; current AI techniques can therefore support the debunking of misleading information, given the human tendency to believe “facts” that confirm biases. Much effort has recently been spent by the research community on this plague: several AI-based approaches for the automatic detection and classification of fake news have been proposed; unfortunately, fake news producers have refined their ability to elude automatic ML- and DL-based detection systems. Debunking false news therefore represents an effective weapon to contrast users' reliance on false information. In this work, we propose a preliminary study approaching the design of effective fake news debunking systems, harnessing two complementary federated approaches. We propose, firstly, a federation of independent classification systems that accomplishes a debunking process by applying a distributed consensus mechanism. Secondly, a federated learning task, involving several cooperating nodes, is accomplished to obtain a unique merged model, including the features of the single participants' models, trained on different and independent data fragments. This preliminary work aims to point out the feasibility and the comparability of the proposed approaches, thus paving the way to an experimental campaign on real data that will provide evidence for an effective and feasible model for detecting potentially heterogeneous fake news. Debunking misleading information is mission critical to increase the awareness of facts on the part of news consumers.
50.1 Introduction
Fake news is not a novel threat to society, since well-prepared false and misleading information has long had the power to destroy individuals' careers or ruling powers. In recent times, this threat has become more powerful: fast communication services have supported the widespread and immediate sharing of untrusted information among users. Large amounts of fake news are posted, read, believed and frequently, carelessly shared every day by a large number of users. On the other hand, an increasing number of users is trying to combat this threat, attempting to raise awareness in other users by sharing posts that debunk fake news and by publishing information from fact-checking websites. In addition to debunking, a complementary countermeasure to fight the fake news plague is represented by the increasing development of artificial intelligence (AI) systems aiming to support fake news identification and classification, distinguishing fake news from real news. Several fake news checking and detection approaches [11, 17] have been proposed in the last few years. Most of these approaches take advantage of AI and deep learning (DL) methods and algorithms. Mainly, classification techniques, ranging from the analysis of linguistic [6, 10, 15] and psycholinguistic features [4, 7] up to content analysis (the number and kinds of hyperlinks or references to reliable information sources, e.g.), have been explored in the recent literature [5, 9]. Anyway, even if all these classification approaches are promising as a useful base to raise warnings about suspicious information, they can only provide a surface analysis of the information: along with the parallel improvements obtained in automatic content generation [14, 18], distinguishing linguistic features, as in stylometry-based methods, has lost its effectiveness in detection, because malicious users producing and targeting fake news have become more professional, supported by automatic text generation systems. Moreover, as fake news detectors become smarter and more sophisticated, the automatic generation systems become smarter too. Finally, malicious users can also exploit evasion attacks [3] against fake news classification systems, by performing poisoning attacks [15, 16] that are able to drive the classification process into a default state, acting as back-doors. In such a perspective, an effective means to debunk fake news is represented by fact-checking, as evidenced by the recent literature [19]. Anyway, since fake news and misleading information cover several fields, from politics to climate change, to economics, to cultural and social trends, up to the critical fields of healthcare and pandemics, developing an effective automatic fake news checking and debunking system is anything but a trivial task, since information and misinformation change continually and too quickly. So, recalibrating systems to automatically perform an almost real-time track-back of information is very far from being a solved task. In this work, we aim to propose two complementary approaches to model potential debunking systems able to manage the feature heterogeneity of different fake news
sources, and to lay the basis for investigating their feasibility and effectiveness in supporting the debunking of untrusted and fake information. So, two federated approaches are proposed. The first one is based on the completion of a federated learning task accomplished by several nodes training the same model, based on the same features but adopting different domain data fragments. The aim of this system is to create a unique merged model that takes into account the influence of all the different domain news. Such a system, if it works, could be used as a cross-domain system to support fake news detection and debunking suggestions. The second approach is conceived to harness a federation of independently trained clusters of models, where, within the same cluster, several models have been trained on the same domain data fragment, so as to obtain more opinions and more islands of decision on different specific domains. A final decision would be reached by adopting a consensus-based mechanism involving all the specific-domain clusters. The work is organized as follows. Section 50.2 provides the motivations and background issues inspiring this work. Section 50.3 provides a brief literature survey concerning the major research contributions on fake news issues and mitigation strategies. Section 50.4 presents the proposed approaches and the overall idea inspiring the federated approaches. Section 50.5 concludes this work by providing some observations and highlighting some issues concerning the feasibility of the proposed approaches, paving the way for a further, more exhaustive work, enriched by the testing campaign required to transform the contribution of this work from a theoretical study of potential models to a practical implementation of fake news debunking systems.
50.2 Motivation and Background
During the very recent coronavirus pandemic, fake news and misleading medical information about several kinds of diseases have been circulating widely on social media, posing a potential threat to public health as they spread rapidly and effectively. False information is potentially dangerous when it suggests prevention measures or cures that are dangerous to a person's health, beyond disseminating anxiety and undesirable, unnecessary fear. These disinformation campaigns can be deliberate and require collective and individual efforts to avoid the dangers they represent. In other cases, the spread of misleading advice is simply due to the fact that people do not know where to find correct data and facts. During the coronavirus pandemic, the dissemination of fake news and the associated risks have reached a wider dimension, to the point that medical organizations managed an extensive communication campaign to fight the fake infodemic, activating their social media channels to provide up-to-date information and reliable advice on the coronavirus, and providing rolling updates, including alert news, to instant messaging platforms too. Some countries also
activated anti-fake news agencies that attempt to counter false information about the disease when the news source cannot be removed from social media, and that work together with health bodies to provide accurate medical advice. As a result, official healthcare organizations encourage, now more than ever before, everyone to follow official advice and to check the reliability of any news source claiming to offer medical advice. Among the possible signs indicating that news may be false, there are: • a questionably high level of certainty in the advice; • poorly written texts, or news that is particularly surprising, upsetting or seems “too good to be true”; • fake social media accounts designed to look like legitimate ones. On the other hand, the evolution of the Internet of Things has contributed to advancing inter-connectivity and to speeding up data sharing and spreading, thus supporting the circulation of both useful and misleading, corrupted news. Building on the great advances in artificial intelligence-based systems, both fake news detectors and fact checker systems, for example, have recently been developed, following different approaches, ranging from automatic news debunking and verification to content feature analysis, such as authorship attribution and the analysis of links and stylometric text features. Unfortunately, the better these systems become at identifying the characteristics of fake news, the more skilled the evasion systems become. More generally, all AI-based systems are exposed to attacks such as noise, privacy invasion, replay and false data injection, and evasion attacks, which affect their reliability and trustworthiness [24]. In this way, such attacks can undermine the inclination and confidence in adopting AI-based solutions in critical domains, such as healthcare [23] or autonomous driving. So, countermeasures to make AI systems more robust and resilient by design are required now more than ever before.
50.3 Related Works
The rapid advancement of artificial intelligence has motivated a large number of experiments aiming to solve problems which had been only marginally considered in the context of computer science. Among these problems, fake news detection and debunking has gained more and more importance in recent times [11], as access to news media has become very convenient to people around the globe. This opportunity in turn results in the fast dissemination of news to millions of news consumers through an increasing number of news sources such as news channels, articles, websites, blogs and micro-blogs, and social networking sites. The term fake news is often described in the related literature as misinformation, disinformation, hoax, or rumor, in order to refer, in different variations, to false information. Misinformation is used to refer to the spreading of false information disregarding the true intent. False information can be the result of false labeling and can easily
spread among users that do not care much about the veracity and reliability of what they are reading or sharing. Disinformation implies an intent to mislead the target of the information: it refers to false information that is disseminated in a tactical way in order to bias and manipulate facts. Rumours and hoaxes can be used interchangeably to refer to false information that is deliberately built to seem true; the facts they report are either inaccurate or false, although they are presented as genuine. Excluding well-known reputable news sources, such as trusted and recognized agencies and organizations, there has been a tremendous growth of smaller news producers, which deliver information that is not trustworthy. In addition to this, on popular social networking and social media platforms anyone from anywhere around the worldwide web is allowed to publish and disseminate any kind of statement and unrecognized, unverified theories, thus contributing to spreading fake and misleading information that is shared unreservedly and without any awareness among credulous users. The purpose of producing and spreading fake information can be manifold, such as, for example, manipulating and influencing public opinion on an event or simply pursuing fraudulent or illegal goals. Furthermore, even some popular sources of informational services that are considered authentic and reliable, such as Wikipedia, can be affected by and prone to false information or fake news [12]. Additionally, the problem is further intensified by the actions of some official news aggregators that may deliberately spread false or fake news in order to gain popularity, achieve some political objective, or earn money. Finally, another factor contributing to the spread of fake news may be organized biasing campaigns attempting to mock or spoil a specific organization, competitors' services and products, companies, or groups of people, e.g. for political, social, or financial reasons [20]. Currently, fake news can be considered one of the greatest threats to democracy, commerce and journalism, and a critical element able to crack social and political balances all over the world, causing huge collateral damage. The dissemination of fake news through different media channels has not yet been dammed to a degree that limits the adverse effects fake news can lead to. One main reason can be found in the unavailability of systems able to control fake news with no human involvement. Experiments indicate that machine and deep learning algorithms may have the ability to detect fake news, given an initial set of cases to be trained on. Several research projects, tools and applications have dealt with fake news detection [27] and fact checking [19, 25], mostly examining the problem as a veracity classification. Anyway, misinformation and fake news are hard to dislodge. Despite great progress in the classification and detection of false information, supported by advanced AI-based techniques, the only truly effective weapon is the denial of falsely reported facts, through fact checking and debunking of fake news [8]. According to psychologists studying this phenomenon, to fight disinformation a less credulous public is needed; so, current AI techniques can support misleading information debunking, given the human tendency to believe “facts” that confirm biases.
Much effort has recently been spent by the research community on this plague: several AI-based approaches for the automatic detection and classification of fake news have been proposed; unfortunately, fake news producers have refined their ability to elude automatic ML- and DL-based detection systems. So, debunking false news represents an effective weapon to contrast users' reliance on false information. In this work, we propose a preliminary study approaching the design of effective fake news debunking systems, harnessing two complementary federated approaches, as detailed in the next sections. Several studies in the literature propose and discuss methods to detect fake news. Diverse aspects of fake news, such as the algorithms used for counterfeit news detection, the categories of fake news, the actors involved in spreading false information, the quantification of the impact of false information and future plans, were described in [11], which provides a comprehensive overview of the state of false information on the web and social media. Online fake reviews were used in [1] to realise the proposed automated detector of fake content. The authors compared the results obtained by their methods with the performance of six Machine Learning (ML) models, showing improved accomplishments compared to existing state-of-the-art benchmarks. Wang [26] also compared the performance of the proposed model for fake news detection with that achieved by existing models, such as Support Vector Machines (SVM), Logistic Regression and Bidirectional Long Short-Term Memory networks (BILSTM). In detail, a hybrid Convolutional Neural Network (CNN) was proposed on a benchmarked dataset, combining metadata with text; this model achieved a better accuracy than the other algorithms. Deep learning techniques have great prospects in the fake news detection task, yet very few studies highlight the importance of neural networks in this area. The work of [17] proposes a hybrid neural network model combining a CNN and a Recurrent Neural Network (RNN). As this model is required to distinguish between fake news and legitimate news, the problem is cast as a binary classification problem. CNN and BILSTM models were also used in [21]: the proposed ensemble architecture attempts to capture the pattern of information from the short statements and to learn the characteristic behaviour of the source speaker from the different attributes provided in the dataset; finally, all the learned knowledge is integrated to produce a fine-grained multiclass classification. An attention-based Long Short-Term Memory (LSTM) model integrating speaker profiles for detecting fake news was proposed in [13]. Speaker profiles such as party affiliation, speaker title, location and credit history contribute to improving the accuracy of fake news detection by 14.5% compared to state-of-the-art methods, tested on a benchmark fake news detection dataset. Various Deep Learning (DL) algorithms were analysed in [2]. These were applied to a dataset of fake news available on Kaggle1 and to authentic news articles extracted from Signal Media News.2
1 https://www.kaggle.com/mrisdal/fake-news.
2 https://research.signal-ai.com/newsir16/signal-dataset.html.
The performance of classifiers based on Gated Recurrent Units (GRU), LSTM and BILSTM was estimated, showing better accuracy than that achieved by the CNN-based classifiers. A social media dataset was, instead, analysed in [22]. The authors developed a hybrid DL model and showed that analysing the temporal behaviour of articles and learning the characteristics of the source from user behaviour is useful for detecting fake news; the integration of these two elements improves the performance of the model.
50.4 Methodology
In this work, we have conducted a preliminary study approaching the design of effective fake news debunking systems, harnessing two complementary federated approaches. The first approach consists in composing a federation of independent classification systems to accomplish a debunking process; the final decision is taken by applying a distributed consensus mechanism. The second approach proposes a model for accomplishing a federated learning task that involves several cooperating nodes, to obtain a unique merged model. Each participant trains the same shared model, including the same features, but each single participant trains its model on a different and independent data fragment. This preliminary work aims to point out the feasibility and the comparability of the proposed approaches, thus paving the way to an experimental campaign on real data that will provide evidence for an effective and feasible model for detecting potentially heterogeneous fake news. Debunking misleading information is mission critical to increase the awareness of facts on the part of news consumers. More precisely, in the first approach we propose to exploit a distributed consensus-based mechanism to perform an almost real-time debunking of fake news. Here, we focus our experiment on a federation of nodes that are trained to debunk news of specific fields, such as politics, healthcare, social aspects or the environment. Each of the nodes participating in the federation will be able to provide, for the news proposed as input, a list of candidate reliable sources to debunk that specific kind of information. Moreover, each of them will be trained on the same model, as happens in a Federated Learning paradigm, but on different topics or domains (politics, medicine, climate, etc.) and with a different set of reliable debunking sites to verify the facts. So, our experimental modeling is designed in two steps: 1. Federated Learning Task: in this model, a federated learning task is developed, exchanging the same model between the participating nodes and sharing intermediate models to create a single model that acts as a sort of super-classifier of the debunking sites, according to the topic. In this federation, topic redundancy is required, i.e. each topic should be handled by at least 3 nodes (a federation of topic-oriented clusters).
2. Federation of Cooperating Neural Networks: in this model, a federation of independently trained but cooperating neural networks (NNs) is realised. Each NN is separately trained on a specific topic, without requiring recalibration of the overall global model, which represents a super-classifier federation based on: – Topic identification: this is necessary to address the right cluster of nodes to be involved in the identification of the debunking sources (we suggest the 3 most authoritative debunking sites for that topic). – Single node evaluation: each node processes the news provided as input and proposes, as output, its own list of the 3 most authoritative debunking sites. – Consensus step: a consensus mechanism is implemented to evaluate the proposals of all the nodes participating in the cluster and to provide the final decision (the final list of debunking sites). The realisation of this network means that when the content of a set of news to debunk precisely matches a topic, the process of suggesting the debunking list is addressed and managed by the federation of individual independent models (topic-oriented sub-cluster or federation). When a set of news is cross-topic, the process of proposing the output (the list of debunking sites) is managed by the overall global model obtained after the federated learning task. In this step, a maximum of 2 topic categories can be provided and 3 debunking sites will be provided for each topic. A minimal sketch of both mechanisms is given below.
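As an illustration only, the following sketch shows how the two mechanisms could be realised: a FedAvg-style averaging of per-node model weights for the federated learning task, and a majority vote over the per-node top-3 lists for the consensus step. The function names, the weight representation (NumPy arrays keyed by layer) and the example site lists are hypothetical and not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List

import numpy as np


def federated_average(node_weights: List[Dict[str, np.ndarray]]) -> Dict[str, np.ndarray]:
    """Step 1 (federated learning task): merge per-node models into a single
    global model by averaging the weights of each layer (FedAvg-style)."""
    return {
        layer: np.mean([weights[layer] for weights in node_weights], axis=0)
        for layer in node_weights[0]
    }


def consensus_top_k(node_proposals: List[List[str]], k: int = 3) -> List[str]:
    """Step 2 (single node evaluation + consensus): each node of a
    topic-oriented cluster proposes its ranked list of debunking sites;
    the cluster answer is the k sites with the most votes across nodes."""
    votes = Counter(site for proposal in node_proposals for site in proposal)
    return [site for site, _ in votes.most_common(k)]


# Usage with made-up proposals from three nodes of the same topic cluster
proposals = [
    ["www.butac.it", "NoBufale.it", "factcheck.example"],
    ["NoBufale.it", "www.butac.it", "other.example"],
    ["www.butac.it", "factcheck.example", "NoBufale.it"],
]
print(consensus_top_k(proposals))  # -> ['www.butac.it', 'NoBufale.it', 'factcheck.example']
```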
50.5 Conclusions
In this preliminary study, we approached the problem of supporting systems and strategies to mitigate the negative effects of fake news by addressing, as the experts of the problem suggest, the debunking of fake news, since the most effective weapon seems to be a more aware public that verifies whether a piece of information and its source can be considered reliable and trustworthy or not. Automatic debunking of news and fact checking is not a trivial task, since misleading and false information is produced in great amounts by several sources that are more and more refined and heterogeneous in their style and in the surface features they exhibit. Furthermore, fake news and false information can cover several kinds of domains or involve cross-domain topics. So, designing systems that are able to automatically detect fake news and propose the corresponding debunking action is an open issue and a matter of research investigation. This work, which simply represents an exploratory study approaching the modeling of effective working debunking strategies and systems, aims to point out the feasibility of two federated models, in order to compare their effectiveness in accomplishing a debunking task and their performance, in terms of accuracy of the performed task and of the effort needed to build and scale the systems when further domains, topics and sources are added. In future developments of
this study, the implementation details of the two proposed approaches will be defined and the evaluation steps will be performed, considering firstly single-topic news, in order to estimate: • the level of agreement between the output provided by the global model and that of each single topic-oriented, consensus-based cluster; • the general accuracy obtained by the global model, since it mixes the results from several different data fragments. Acknowledgements The research described in this work is co-funded and realized within the activities of the Research Program “Vanvitelli V:ALERE 2020—WAIILD TROLS”, financed by the University of Campania “L. Vanvitelli”, Italy.
References 1. Ahmed, H., Traore, I., Saad, S.: Detection of online fake news using n-gram analysis and machine learning techniques. In: International Conference on Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments, pp. 127–138. Springer (2017) 2. Bajaj, S.: The pope has a new baby! Fake news detection using deep learning. CS 224N, pp. 1–8 (2017) 3. Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndi´c, N., Laskov, P., Giacinto, G., Roli, F.: Evasion attacks against machine learning at test time. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 387–402. Springer (2013) 4. Campanile, L., Cantiello, P., Iacono, M., Marulli, F., Mastroianni, M.: Vulnerabilities assessment of deep learning-based fake news checker under poisoning attacks. Comput. Data Soc. Netw. 385 (2021) 5. Chora´s, M., Demestichas, K., Giełczyk, A., Herrero, Á., Ksieniewicz, P., Remoundou, K., Urda, D., Wo´zniak, M.: Advanced machine learning techniques for fake news (online disinformation) detection: a systematic mapping study. Appl. Soft Comput. 101, 107050 (2021) 6. Choudhary, A., Arora, A.: Linguistic feature based learning model for fake news detection and classification. Expert Syst. Appl. 169, 114171 (2021) 7. Giachanou, A., Ghanem, B., Ríssola, E.A., Rosso, P., Crestani, F., Oberski, D.: The impact of psycholinguistic patterns in discriminating between fake news spreaders and fact checkers. Data Knowl. Eng. 138, 101960 (2022) 8. Humprecht, E.: How do they debunk “fake news”? A cross-national comparison of transparency in fact checks. Digit. J. 8(3), 310–327 (2020) 9. Jiang, T., Li, J.P., Haq, A.U., Saboor, A., Ali, A.: A novel stacking approach for accurate detection of fake news. IEEE Access 9, 22626–22639 (2021) 10. Kaliyar, R.K., Goswami, A., Narang, P.: Fakebert: fake news detection in social media with a BERT-based deep learning approach. Multimed. Tools Appl. 80(8), 11765–11788 (2021) 11. Kumar, S., Shah, N.: False information on web and social media: a survey. arXiv preprint arXiv:1804.08559 (2018) 12. Kumar, S., West, R., Leskovec, J.: Disinformation on the web: Impact, characteristics, and detection of Wikipedia hoaxes. In: Proceedings of the 25th International Conference on World Wide Web, pp. 591–602 (2016) 13. Long, Y.: Fake News Detection Through Multi-perspective Speaker Profiles. Association for Computational Linguistics (2017) 14. Marulli, F.: IoT to enhance understanding of cultural heritage: Fedro authoring platform, artworks telling their fables. In: Future Access Enablers of Ubiquitous and Intelligent Infrastructures, pp. 270–276. Springer (2015)
15. Marulli, F., Balzanella, A., Campanile, L., Iacono, M., Mastroianni, M.: Exploring a federated learning approach to enhance authorship attribution of misleading information from heterogeneous sources. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2021) 16. Marulli, F., Visaggio, C.A.: Adversarial deep learning for energy management in buildings. In: SummerSim, pp. 50–51 (2019) 17. Nasir, J.A., Khan, O.S., Varlamis, I.: Fake news detection: a hybrid CNN-RNN based deep learning approach. Int. J. Inf. Manag. Data Insights 1(1), 100007 (2021) 18. Piccialli, F., Marulli, F., Chianese, A.: A novel approach for automatic text analysis and generation for the cultural heritage domain. Multimed. Tools Appl. 76(8), 10389–10406 (2017) 19. Rashkin, H., Choi, E., Jang, J.Y., Volkova, S., Choi, Y.: Truth of varying shades: analyzing language in fake news and political fact-checking. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2931–2937 (2017) 20. Ratkiewicz, J., Conover, M., Meiss, M., Gonçalves, B., Flammini, A., Menczer, F.: Detecting and tracking political abuse in social media. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 5, pp. 297–304 (2011) 21. Roy, A., Basak, K., Ekbal, A., Bhattacharyya, P.: A deep ensemble framework for fake news detection and classification. arXiv preprint arXiv:1811.04670 (2018) 22. Ruchansky, N., Seo, S., Liu, Y.: CSI: a hybrid deep model for fake news detection. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 797–806 (2017) 23. Verde, L., De Pietro, G.: A neural network approach to classify carotid disorders from heart rate variability analysis. Comput. Biol. Med. 109, 226–234 (2019) 24. Verde, L., De Pietro, G., Ghoneim, A., Alrashoud, M., Al-Mutib, K.N., Sannino, G.: Exploring the use of artificial intelligence techniques to detect the presence of coronavirus covid-19 through speech and voice analysis. IEEE Access 9, 65750–65757 (2021) 25. Vo, N., Lee, K.: The rise of guardians: Fact-checking URL recommendation to combat fake news. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 275–284 (2018) 26. Wang, W.Y.: “liar, liar pants on fire”: a new benchmark dataset for fake news detection. arXiv preprint arXiv:1705.00648 (2017) 27. Zubiaga, A., Aker, A., Bontcheva, K., Liakata, M., Procter, R.: Detection and resolution of rumours in social media: a survey. ACM Comput. Surv. (CSUR) 51(2), 1–36 (2018)
Author Index
A Abbas, Albo Jwaid Furqan, 509 Abdallah, Oumayma, 483 Abdallah, Wejden, 483 Abe, Jair Minoro, 137 Achaal, Batoul, 417 Akama, Seiki, 137 Akkad, Ghattas, 441 Alamaniotis, Miltiadis, 279 Alessandrini, Michele, 429, 459 Alonso, Santiago, 15 Alsubie, Abdelaziz, 149 Andriyanov, Nikita, 171, 183 Anisetti, Marco, 63
B Baaouni, Jalel, 109 Bakholdin, Petr P., 99 Baklouti, Mouna, 109 Balonin, Nikolaj, 237 Balonin, Yury, 237 Banks, Alec, 87 Bellandi, Valerio, 63 Benaben, Frederick, 523 Biagetti, Giorgio, 429, 459 Biase, Maria Stella de, 561 Blaunstein, Nathan, 123 Bobadilla, Jesús, 15 Bressollette, Luc, 449 Buryachenko, Vladimir V., 215
C Calinescu, Radu, 87
Campanile, Lelio, 561, 573, 587 Cassata, Roberto, 63 Cavaciuti, Alessandro, 63 Cesarano, Mario, 573 Chaabane, Faten, 109 Chaaya, Jad Abou, 469 Choura, Hedi, 109 Clement, Benoit, 449 Cohen, Yaniv, 123 Crippa, Paolo, 429, 459 Czarnowski, Ireneusz, 535
D Damiani, Ernesto, 63 Dan, Hayato, 363 Dekel, Ben Zion, 123 Dementev, Vitalii, 193 Dementiev, Vitaly, 183 Demin, Nikita, 171 Dondi, Riccardo, 77 Draganov, Ivo, 347
F Falaschetti, Laura, 429, 459 Favorskaya, Alena V., 249, 259 Favorskaya, Margarita N., 205, 215 Fodop, Gabin, 449 Frikha, Tarek, 109
G Gaiduk, Maksym, 405 Gaponova, Maria, 193
Giacomo, Di Filippo, 459 Gianini, Gabriele, 63 González, Álvaro, 15 Gusev, Konstantin A., 205 Gutiérrez, Abraham, 15
H Härting, Ralf-Christian, 291 Hardt, Wolfram, 495 He, Liu, 25 Hoffmann, Clément, 449 Homma, Katsumi, 363 Hoppe, Nathalie, 291 Hu, Xiangpei, 317
I Ilyasova, Nataly, 171
J Jeany, Julien, 523 Jousse-Joulin, Sandrine, 449
K Kanzari, Dalel, 483 Khokhlov, Nikolay, 249 Klyachkin, Vladimir N., 161 Kountchev, Roumen, 333 Kountcheva, Roumiana, 333 Krasheninnikov, Victor R., 161 Kremlev, Artem S., 99 Kruglyakov, Alexey, 269 Kteish, Zeinab, 469 Kudenko, Daniel, 87 Kuvayskova, Yuliya E., 161
L Lauras, Matthieu, 523 Le Duff, Clara, 523 Li, Ya, 303 Lin, Na, 303
M Madani, Kurosh, 483 Mahmood, Mohammed Shakir, 509 Makarenko, Zinaida V., 99 Manoharan, Shanmugapriyan, 495 Mansour, Ali, 417, 441, 449, 469 Margun, Alexey A., 99
Author Index Marrone, Stefano, 561 Marrore, Stefano, 587 Martinelli, Fabio, 549 Marulli, Fiammetta, 587 Medievsky, Alexey, 269 Mercaldo, Francesco, 549 Mironov, Rumen, 347 Mizuno, Takafumi, 395 Mizuyama, Hajime, 363 Montreuil, Benoit, 523 Moradkhani, Nafe, 523 Morita, Natsuki, 363 Mortada, Mohamad Rida, 417 Murai, Tetsuya, 137 Muratov, Maksim V., 259
N Nakayama, Yotaro, 137 Nasser, Abbass, 417, 469 Nenashev, Vadim, 227 Nguyen, Viet-Dung, 441 Nine, Julkar, 495 Norikumo, Shunei, 375
O Ogawa, Masatoshi, 363 Ohya, Takao, 385 Olivier, Aurélien, 449
P Palmiero, Gianfranco, 573 Paterson, Colin, 87 Pavlov, Viktor A., 99 Politsinsky, Alexander S., 99
R Raimondo, Mariapia, 561 Rätzer, Sebastian, 405 Riley, Joshua, 87
S Saga, Ryosuke, 25 Sakunova, Anastasia, 509 Saleh, Hadi, 509 Saleh, Shadi, 495 Sanghez, Carlo, 573 Santone, Antonella, 549 Sato, Mizuho, 363 Seepold, Ralf, 405
Sęk, Oskar, 535 Sentsov, Anton, 227 Sergeev, Alexander, 227, 237 Sergeev, Mikhail, 237 Shirokanev, Aleksandr, 171 Simonov, Konstantin, 269 Suetin, Marat, 193 Suginouchi, Shota, 363 Sumikawa, Yasunobu, 3 Suntsova, Darya I., 99
T Takagi, Ryusei, 3 Tashlinskii, Aleksandr, 193 Tashlinskiy, Alexandr, 183 Terbeh, Naim, 49 Teyeb, Rim, 49 Trieu, Sandra, 291 Turchetti, Claudio, 429, 459
V Verde, Laura, 561, 587 Vostrikov, Anton, 237
W Wang, Xuping, 303 Wang, Yue, 303
Y Yanami, Hitoshi, 363 Yao, Koffi-Clément, 469 Yuldashev, Zafar, 123
Z Zarri, Gian Piero, 37 Zhai, Xueying, 317 Zotin, Aleksandr, 269 Zrigui, Mounir, 49